Extract, Transform, Load
Raw to ready in minutes or hours instead of days or weeks
Essentia can accelerate your ETL workflow, especially when working with semi-structured or fully unstructured data. It reduces the number of steps needed to go from raw data to cleansed, normalized data, and it does so without moving or modifying the original data files. In addition to simplifying the workflow, Essentia’s scalable architecture lets any ETL job be split seamlessly across multiple nodes. Because throughput scales linearly with the number of nodes, it is straightforward to estimate how much additional compute is needed to reach a specific performance target.
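Under the linear-scaling assumption, the node count for a runtime target follows from simple arithmetic: with N nodes, runtime = data volume / (N × per-node throughput). The sketch below is a back-of-the-envelope estimate, not an Essentia feature. The data volume and the 50 GB per node-hour throughput are hypothetical, and real jobs add fixed startup and coordination overhead that this model ignores.

```python
import math


def nodes_needed(data_gb: float, gb_per_node_hour: float, target_hours: float) -> int:
    """Smallest node count that meets target_hours, assuming perfectly linear scaling.

    With N nodes, runtime = data_gb / (N * gb_per_node_hour), so we solve
    data_gb / (N * gb_per_node_hour) <= target_hours for the smallest integer N.
    """
    if min(data_gb, gb_per_node_hour, target_hours) <= 0:
        raise ValueError("all inputs must be positive")
    return math.ceil(data_gb / (gb_per_node_hour * target_hours))


def estimated_hours(data_gb: float, gb_per_node_hour: float, nodes: int) -> float:
    """Predicted runtime for a given node count under the same linear model."""
    return data_gb / (nodes * gb_per_node_hour)


# Hypothetical figures: 1.2 TB of raw data, one node measured at 50 GB/hour.
n = nodes_needed(1200, 50, target_hours=4)
print(n, estimated_hours(1200, 50, n))  # 6 4.0
```

In practice you would measure per-node throughput on a representative sample first, then use the formula to size the cluster.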
1. Data Collection
No ETL or preprocessing is required before storage. Data can be left in its original state and format for all subsequent processing and analysis, which keeps the data immutable.
2. Exploring the Data
When you are ready, formalize what you have learned as rule-based virtual data categories that map back to the original files in object storage.
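The sketch below is not Essentia’s syntax; it only illustrates the idea of a rule-based virtual category in plain Python, assuming a category is defined by a glob-style pattern over object-storage keys. The keys, category names, and patterns are hypothetical. Because a category stores only a rule and resolves to the matching keys, the same file can belong to several categories without being copied.

```python
from fnmatch import fnmatchcase


def categorize(keys, rules):
    """Map each category name to the original object keys that match its pattern.

    Nothing is copied or rewritten: a category is just a named view over
    existing keys, so the raw files stay untouched.
    Note: in fnmatch, "*" also matches "/", so patterns are not path-aware.
    """
    return {
        name: sorted(k for k in keys if fnmatchcase(k, pattern))
        for name, pattern in rules.items()
    }


# Hypothetical object-storage listing and category rules.
keys = [
    "logs/2024/01/web-01.log.gz",
    "logs/2024/01/web-02.log.gz",
    "logs/2024/02/web-01.log.gz",
    "exports/orders-2024-01.csv",
    "exports/orders-2024-02.csv",
]
rules = {
    "web_logs": "logs/*/*/web-*.log.gz",
    "jan_2024_logs": "logs/2024/01/*",
    "orders": "exports/orders-*.csv",
}

categories = categorize(keys, rules)
for name, members in categories.items():
    print(name, len(members))
```

Note that `logs/2024/01/web-01.log.gz` appears in both `web_logs` and `jan_2024_logs`: overlapping categories cost nothing because they only reference the original key.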
3. Preparing Data
You can join data sets and apply transformations to the combined data.
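As a generic illustration of joining two data sets and then transforming the combined records (not Essentia’s own join syntax), the sketch below uses an inner hash join in plain Python. The field names and sample rows are hypothetical; a hash join is chosen simply because it builds one lookup table and then makes a single pass over the other input.

```python
from collections import defaultdict
from decimal import Decimal


def hash_join(left, right, key):
    """Inner join two lists of dicts on `key`, indexing `right` in a hash table."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    # Rows in `left` with no match in `right` are dropped (inner-join semantics).
    return [{**l_row, **r_row} for l_row in left for r_row in index.get(l_row[key], [])]


def transform(row):
    """Normalize fields on the combined record."""
    return {
        "customer_id": row["customer_id"],
        "country": row["country"].strip().upper(),
        # Decimal avoids binary floating-point error; assumes at most two decimal places.
        "amount_cents": int(Decimal(row["amount"]) * 100),
    }


# Hypothetical data sets, e.g. parsed from two different raw sources.
orders = [
    {"customer_id": 1, "amount": "19.99"},
    {"customer_id": 2, "amount": "5.00"},
    {"customer_id": 3, "amount": "7.50"},
]
customers = [
    {"customer_id": 1, "country": " us "},
    {"customer_id": 2, "country": "DE"},
]

result = [transform(r) for r in hash_join(orders, customers, "customer_id")]
print(result)
```

Customer 3 has no matching customer record, so the inner join drops that order; an outer join would keep it with missing fields instead.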
Iterative Data Transformations at Scale
With Essentia, target data is streamed directly from the raw files in cloud storage, cleansed and normalized on the fly, and then written to a file, loaded into an in-memory database, or piped directly to another analytics tool.
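To illustrate the streaming pattern (this is not Essentia’s implementation), the sketch below chains Python generators so that raw lines are parsed, cleansed, and normalized one record at a time and written to any text stream. The log format (timestamp, user, HTTP status) and the cleansing rules are hypothetical.

```python
import csv
import io


def cleanse(records):
    """Drop malformed rows and trim whitespace, one record at a time."""
    for rec in records:
        if len(rec) != 3:
            continue  # wrong field count (including blank lines)
        ts, user, status = (field.strip() for field in rec)
        if not user or not status.isdigit():
            continue  # missing user or non-numeric status
        yield ts, user, status


def normalize(records):
    """Convert cleansed fields into consistent types and casing."""
    for ts, user, status in records:
        yield {"ts": ts, "user": user.lower(), "status": int(status)}


def run(raw_lines, out):
    """Stream raw lines through cleanse/normalize and write CSV to `out`.

    Every stage is a generator, so records are processed one at a time
    rather than loading the whole input into memory.
    """
    writer = csv.DictWriter(out, fieldnames=["ts", "user", "status"])
    writer.writeheader()
    for row in normalize(cleanse(csv.reader(raw_lines))):
        writer.writerow(row)


# Hypothetical raw log lines; in practice this could be an open file object.
raw_lines = [
    "2024-01-05T10:00:00Z, Alice ,200",
    "garbage line",
    "",
    "2024-01-05T10:00:01Z,BOB,404",
]
out = io.StringIO()
run(raw_lines, out)
print(out.getvalue())
```

To read an original gzip-compressed file without modifying it, you could pass `gzip.open(path, "rt", newline="")` as `raw_lines`; to pipe the output to another tool, pass `sys.stdout` as `out`.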