Apache Spark and Essentia are both scalable distributed processing platforms for big data workloads. Both let you quickly transform data at scales larger than Python can handle effectively. Apache Spark is supported in Amazon EMR, and Essentia is offered through AWS Marketplace. Performance differences between the two platforms are not well known, so I tested both to […] continue reading »
Author: auriqadmin
Case Study: Massive Data Integration and Analytics for Large Airline Company
In 2016, AuriQ Systems was engaged by a large airline company to assist in a massive data integration and analytics project. AuriQ is a Select Technology Partner with Amazon Web Services (AWS). The airline wanted to integrate all of its customer touchpoint data to better understand the behavior and intent of its customers. These data sources […] continue reading »
Analyzing Archived Apache Logs Stored in Cloud-Based Data Stores
Objective: Apache logs are a common source of log data that almost every organization analyzes. Access logs are used by both IT and Marketing to understand what is being accessed, when, and by whom. Error logs provide valuable information about the health and performance of an organization's servers and the applications running on them. Typically, due […] continue reading »
Building an End-to-End Marketing Analysis and Attribution Solution
Objective: Providing marketing attribution modeling at big data scale is difficult for most data analysis platforms. Ingesting a variety of data from different sources in multiple formats can be a data integration nightmare, as there are multiple levels of ETL to perform, then joining the disparate data into a cohesive customer journey table, and finally […] continue reading »
Analyzing Global Weather Measurements from the GSOD for the period 1929 – 2009
Objective: The Essentia data science team wanted to demonstrate how easy it is for any researcher to immediately gain value from the large pools of public climate data sets by using Essentia in the cloud. In this example, they used the public dataset Daily Global Weather Measurements, 1929-2009 (NCDC, GSOD), available for free on Amazon […] continue reading »
Boosting Trading Models with SageMaker and Essentia
Building accurate models takes a great deal of time, resources, and technical ability. The biggest challenge? You almost never know which model or feature combination will end up working. Like many others, I was spending a painstaking amount of time preparing data for a specific algorithm and then developing all the code and processing […] continue reading »