Essentia Overview
Essentia is a highly efficient and highly scalable solution for managing, processing and analyzing vast amounts of unstructured, semi-structured and structured data stored in cloud data lakes, often described as big data or complex data.
Essentia's two main components are a data pre-processing engine and an analytic engine. Essentia uses parallel processing, so all operations can be distributed across multiple virtual machines, allowing it to scale out as the data processing workload grows. Data analysis is performed in memory for fast results.
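As a rough illustration of how a workload can be split across workers and merged in memory, the Python sketch below partitions input lines, counts words in each partition concurrently, and merges the partial counts. It is not Essentia's implementation: threads stand in for separate virtual machines, and `partition`, `count_words` and `parallel_word_count` are hypothetical names.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(items, n):
    """Split items into n roughly equal chunks, round-robin."""
    return [items[i::n] for i in range(n)]


def count_words(lines):
    """Map step: count words in one partition of the input."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def parallel_word_count(lines, workers=4):
    """Run count_words on each partition concurrently, then merge the results."""
    chunks = partition(lines, workers)
    total = Counter()
    # Threads stand in for separate worker VMs in this single-process sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_words, chunks):
            total.update(partial)  # reduce step: merge partial counts in memory
    return total


if __name__ == "__main__":
    print(parallel_word_count(["a b a", "b c", "a c c"], workers=2))
```

Because each partition is processed independently and only the small partial results are merged, adding workers spreads the expensive part of the job without changing the final answer.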
Architecture
Support for Popular Tools and Languages
Capabilities
Data Collection & Ingestion
Essentia eliminates the need to perform any ETL on the original data at the time of ingest. Once data is transferred to Amazon S3 or Azure Blob Storage, it can be left in place, as-is, for all subsequent data processing and analysis workflows.
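One way to picture leaving data as-is is that structure is applied when the data is read rather than when it is loaded, an approach often called schema-on-read. The Python sketch below illustrates the idea with an in-memory CSV string standing in for a raw file in object storage; `RAW`, `read_with_schema` and the column schema are hypothetical, and this is not how Essentia itself parses data.

```python
import csv
import io

# Hypothetical raw file, exactly as it landed in storage; nothing was transformed at ingest.
RAW = """ts,user,amount
2024-01-01T10:00:00,alice,12.50
2024-01-01T10:05:00,bob,7.25
2024-01-02T09:00:00,alice,3.00
"""


def read_with_schema(raw_text, schema):
    """Select columns and apply their types at read time (schema-on-read)."""
    reader = csv.DictReader(io.StringIO(raw_text))
    for row in reader:
        yield {col: cast(row[col]) for col, cast in schema.items()}


# A different analysis could read the same RAW text with a different schema.
schema = {"user": str, "amount": float}
rows = list(read_with_schema(RAW, schema))

if __name__ == "__main__":
    print(rows)
```

The original text is never rewritten, so the same raw data can serve many analyses, each applying only the columns and types it needs.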
Data Virtualization
Data Exploration
Scan files to view their underlying structure, view sample data, run SQL-like queries, and test pre-processing rules. All of this can be done on the raw data, even while it is still compressed, on a single node. You don’t have to be an expert on the source data to start exploring it and getting meaningful insights with Essentia.
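To make the idea of scanning compressed data concrete, the Python sketch below streams a gzip-compressed CSV, reads only its header and a few sample rows, and guesses a type for each column. It assumes comma-delimited text with a header line; `scan_gzip_csv` and `infer_type` are hypothetical helpers and do not reproduce Essentia's own scanning commands.

```python
import gzip
import io
import itertools


def infer_type(values):
    """Return the narrowest of int, float or str that parses every sample value."""
    for cast in (int, float):
        try:
            for v in values:
                cast(v)
            return cast.__name__
        except ValueError:
            continue  # at least one value failed; try the next, wider type
    return "str"


def scan_gzip_csv(source, sample_size=5, delimiter=","):
    """Read the header and first few rows of a gzip-compressed delimited file.

    `source` may be a path or a binary file object. The file is decompressed
    as a stream and reading stops after the sample, so it is never expanded
    to disk.
    """
    with gzip.open(source, "rt", encoding="utf-8") as f:
        header = next(f).rstrip("\n").split(delimiter)
        sample = [line.rstrip("\n").split(delimiter)
                  for line in itertools.islice(f, sample_size)]
    columns = zip(*sample)  # transpose rows into per-column value tuples
    structure = {name: infer_type(col) for name, col in zip(header, columns)}
    return structure, sample


if __name__ == "__main__":
    raw = b"id,user,score\n1,alice,0.5\n2,bob,1.25\n3,carol,2\n"
    structure, sample = scan_gzip_csv(io.BytesIO(gzip.compress(raw)), sample_size=2)
    print(structure)  # {'id': 'int', 'user': 'str', 'score': 'float'}
    print(sample)
```

Type guesses drawn from a small sample can be wrong for later rows, which is why testing rules against sample data comes before running them over the full data set.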
Advanced Analysis
Essentia scripts can be run from the web UI or from the CLI (command line interface). Scripts let users apply more powerful and complex queries to one or more categories. An integrated in-memory, parallelized database enables fast, iterative analysis of your data, and data held in memory can be passed to machine learning algorithms for more complex analyses.
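Essentia's script syntax is not shown in this overview, so the following is only a Python analogy for applying one query across one or more categories and aggregating the results in memory. It assumes, for illustration, that a category is a named glob pattern selecting files from storage; `STORE`, `CATEGORIES`, `files_in` and `total_by_user` are hypothetical and are not part of Essentia.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# Hypothetical object store: key -> parsed (user, amount) records.
STORE = {
    "sales/2024-01-01.csv": [("alice", 12.5), ("bob", 7.25)],
    "sales/2024-01-02.csv": [("alice", 3.0)],
    "returns/2024-01-02.csv": [("bob", -7.25)],
    "logs/web/2024-01-01.csv": [],
}

# Hypothetical categories: each name selects files by a glob pattern on the key.
CATEGORIES = {
    "sales": "sales/*.csv",
    "returns": "returns/*.csv",
}


def files_in(category, store=STORE):
    """List the keys that belong to one category, in sorted order."""
    pattern = CATEGORIES[category]
    return [key for key in sorted(store) if fnmatchcase(key, pattern)]


def total_by_user(categories, store=STORE):
    """Sum amounts per user across every file in the given categories, in memory."""
    totals = defaultdict(float)
    for name in categories:
        for key in files_in(name, store):
            for user, amount in store[key]:
                totals[user] += amount
    return dict(totals)


if __name__ == "__main__":
    print(total_by_user(["sales"]))             # {'alice': 15.5, 'bob': 7.25}
    print(total_by_user(["sales", "returns"]))  # {'alice': 15.5, 'bob': 0.0}
```

Once the totals sit in memory, follow-up questions (for example, re-running with an extra category) can be answered without touching the raw files again.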