AWS provides a scalable SQL service called Redshift. It is commonly used for data warehousing and can scale to store petabytes of data. However, turning raw data into a properly formatted table suitable for Redshift (or any other database, for that matter) is often problematic.
We added a module to Essentia to address issues that include:
In order to link Essentia and Redshift, the following is needed:
Transferring data is straightforward. First register the Redshift cluster with Essentia, then provide the ETL operation, whose format is very similar to that of the ‘stream’ command described in the ETL tutorial:
$ ess redshift register redshift_cluster_name
$ ess redshift stream Standard 2014-12-01 2014-12-10 "command" -U username -d table -p password
Here, ‘command’ is typically aq_pp, but it can also be any other program that accepts text data from stdin and writes its results to stdout.
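To illustrate the "any other program" case, here is a minimal sketch of a custom filter that reads text from stdin and writes to stdout. It is not part of Essentia. The script name clean_rows.py, the function names, and the assumption that the input is comma-separated text are all hypothetical and chosen only for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical stdin-to-stdout filter (illustrative, not part of Essentia)."""
import csv
import sys


def clean_rows(rows, expected_fields):
    """Yield only rows with the expected field count, trimming whitespace."""
    for row in rows:
        if len(row) != expected_fields:
            continue  # drop malformed rows rather than abort the whole load
        yield [field.strip() for field in row]


def main(expected_fields, stream_in=sys.stdin, stream_out=sys.stdout):
    """Read CSV text from stream_in and write the cleaned rows to stream_out."""
    reader = csv.reader(stream_in)
    writer = csv.writer(stream_out, lineterminator="\n")
    writer.writerows(clean_rows(reader, expected_fields))


if __name__ == "__main__":
    # Assumed invocation: clean_rows.py FIELD_COUNT, with data piped to stdin.
    if len(sys.argv) == 2:
        main(int(sys.argv[1]))
    else:
        print("usage: clean_rows.py FIELD_COUNT", file=sys.stderr)
```

Assuming the script is executable on the machine running Essentia, a string such as "python3 clean_rows.py 3" could in principle stand in for "command" above. Whether the resulting columns match the target table is something to verify against your own schema.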