The Essentia Documentation Portal

We wrote Essentia to help solve the day-to-day ‘big data’ analysis problems we faced when processing different types of data from different types of users. Specifically, we needed a framework that would allow us to quickly:

  • Determine the data types present in the data.
  • Organize the data into a catalog that could be used to look up exactly the data we needed.
  • Clean the data to enable analytics.

Essentia combines fast, scalable data processing operations with an in-memory NoSQL database to simplify many common problems encountered by data engineers and scientists. The documentation in these pages is meant to train users to use Essentia and integrate it into their data processing workflows. The Essentia Forums are another useful resource and a supplement to this documentation.

Tutorial Data

We maintain a GitHub repository that contains test data and source code for some of the tutorials and use cases you will find in this documentation.

To get started, pull the tutorial repository via:

$ git clone

The data and scripts for most of the documentation tutorials are under tutorials, and those for the examples and integrations are under case studies.

To get started, go to Essentia Tutorials.

The Essentia Platform

Essentia is designed to run in the cloud, where we can spin up as many worker nodes as needed to scale to difficult problems. Currently, the Amazon cloud is supported. Essentia can also be used on an on-premises cluster; contact us for details. We also offer a single-node version that can be run from a desktop; however, the power of Essentia lies in the cloud. To run Essentia on the Microsoft cloud, you can install this single-node version on an Azure Linux VM.


The tutorials assume you are using the bash shell.