The Essentia Documentation Portal

We wrote Essentia to help solve the day-to-day ‘big data’ analysis problems we faced when processing different types of data from different types of users. Specifically, we needed a framework that would allow us to quickly:

Determine the data types stored in the data.
Organize the data into a catalog that could be used to look up exactly the data we needed.
Clean the data to enable analytics.

Essentia combines fast, scalable ETL operations with an in-memory NoSQL database to simplify many common problems encountered by data engineers and scientists. The documentation in these pages is meant to train users to use Essentia and integrate it into their data processing workflow. The Essentia Forums are another useful resource that supplements this documentation.

Tutorial Data

We maintain a GitHub repository that contains test data and source code for some of the tutorials and use cases you will find in this documentation.

To get started, clone the tutorial repository:

$ git clone https://github.com/auriq/EssentiaPublic.git

The data and scripts for most of the documentation tutorials are under tutorials, and those for the examples and integrations are under case studies.
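To confirm the clone succeeded and see how the repository is laid out, you can list its top-level directories. The following is a minimal sketch: the helper name list_repo_dirs is hypothetical, it assumes the clone sits in ./EssentiaPublic (the default for the command above), and the directory names it prints may not match the wording used in this page exactly.

```shell
# Hypothetical helper: print the top-level directories of a local clone.
list_repo_dirs() {
    repo_dir="$1"
    if [ ! -d "$repo_dir" ]; then
        echo "No clone found at $repo_dir; run the git clone command first." >&2
        return 1
    fi
    # The */ glob matches only directories and skips hidden ones such as .git
    for entry in "$repo_dir"/*/; do
        if [ -d "$entry" ]; then
            basename "$entry"
        fi
    done
}

# Assumes the repository was cloned into the current directory.
list_repo_dirs "EssentiaPublic" || true
```

Run this from the directory where you ran git clone; pass a different path if you cloned elsewhere.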

Once you have the tutorial data, go to Essentia Tutorials.

The Essentia Platform

Essentia is designed to run in the cloud, where we can spin up as many worker nodes as needed to scale to difficult problems. The Amazon and Microsoft clouds are currently supported. Essentia can also run on an on-premises cluster; contact us for details. We also offer a single-node version that can be run from a desktop; however, the power of Essentia lies in the cloud.

Installation

Note

The tutorials assume you are using the bash shell.
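If you are unsure which shell you are running, a quick check like the one below can help. This is a minimal sketch that relies on BASH_VERSION, a variable bash sets for itself; the name shell_status is only illustrative.

```shell
# BASH_VERSION is set by bash itself, so it is empty in other shells.
if [ -n "${BASH_VERSION:-}" ]; then
    shell_status="bash"
else
    shell_status="other"
fi
echo "Current shell: $shell_status"
```

If this reports "other", typing bash at the prompt usually starts a bash session in which the tutorial commands can be run.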

Note

The Azure version does not support worker nodes. A release in the near future will bring the Azure and Amazon versions of Essentia to the same set of capabilities.