Wednesday 31 December 2014

The Sound of 2014

I still listen to music all day, every day during the working week. So here's my personal Top 10 from 2014 in Letterman-style reverse order.

This is available as a Spotify playlist and each track links to the album it comes from on Spotify.

10. Royal Blood - "Come On Over"
    - Best rock album released this year in my view. Of course this is the year that rock gave up.
9. Phantogram - "Black Out Days"
    - Electro pop trip-hop. Check out their back catalogue if you like this.
8. ODESZA - "Memories That You Call"
    - Ibiza-style chillout via Seattle.
7. Lia Ices - "Sweet as Ice"
    - Electro folk? Mystical lyrics plus an ethereal voice with a subtle electronic sheen.
6. Bombay Bicycle Club - "Home By Now"
    - British indie at its best. Eclectic and sublime.
5. HAERTS - "Hemiplegia"
    - An electro Fleetwood Mac. Kind of like HAIM but better.
4. I Break Horses - "Denial"
    - Downtempo Swedish electronica, dare I say "chillout".
3. RHODES - "Raise Your Love"
    - British indie folk. Better than the much touted Hozier in my view.
2. London Grammar - "Hey Now"
    - Exceptional. The KEXP live video actually sounds better than the album.
1. The Jezabels - "The End"
    - Still the best band you've never heard of.

I'm on last.fm as joeharris76 (of course). Stop by and say hi.

Monday 8 December 2014

Micro-review of InformaticaCloud for Amazon Redshift

I recently finished an evaluation of Informatica Cloud's platform as a candidate to take over our Redshift ETL workload.

I'm posting this 'thumbnail' for the sake of my own memory as much as anything else. :)

TL;DR - It's very _close_ to being an ideal solution but, sadly, has some flaws we can't work around right now.

Good:

  • Very straightforward agent-based architecture. Very simple to install and consume.
  • Reliably syncs/replicates *simple* data from source DBs via flat files and cleans up after itself.
  • Performs simple and complex joins on the data source. This is a *major* weakness of competing products, which can't do this.
  • All functionality is wizard-driven and probably consumable by line-of-business users.


Bad:

  • Cannot create SQL-based transforms without uploading PowerCenter mappings. This is too much to ask of LOB users.
  • Extracts land as plain text on the agent host before gzipping. This is a *very* slow approach. They should pipe directly to gzip.
  • Logging is *extremely* verbose. Every single row is written to the success or error log. You'll quickly create many GB of logs.
  • Upsert operations update *every* matching row in Redshift. This is handled as a delete plus an insert within Redshift, which means we then need a very slow vacuum to reclaim space.
  • Finally, the deal breaker: it cannot handle `timestamp with timezone` data types. In 2014. Redshift cannot handle these either but I need the ETL platform to be able to work with them.
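
To illustrate the "pipe directly to gzip" point in the list above, here is a minimal Python sketch of what I mean. It is not how Informatica Cloud works internally, and the function name `write_extract_gzipped` and the CSV layout are my own assumptions for illustration. The idea is just that rows are compressed as they are written, so no uncompressed copy ever lands on the agent host.

```python
import csv
import gzip


def write_extract_gzipped(rows, path):
    """Stream rows into a gzip-compressed CSV file (hypothetical helper).

    Rows are compressed as they are written, so there is no intermediate
    plain-text file to write, re-read and then compress.
    """
    count = 0
    # Text-mode gzip.open wraps the compressed stream in a text layer;
    # newline="" lets the csv module control line endings itself.
    with gzip.open(path, "wt", newline="", encoding="utf-8") as gz:
        writer = csv.writer(gz)
        for row in rows:  # rows can be any iterable, e.g. a DB cursor
            writer.writerow(row)
            count += 1
    return count
```

Because `rows` is consumed lazily, this should work with a database cursor as well as a list. The resulting file is the kind of thing Redshift's `COPY` can load with its `GZIP` option.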
