Open Data: Climate
Analyzing Global Weather Measurements from the GSOD for the period 1929 – 2009
The Essentia data science team wanted to demonstrate how easy it is for any researcher to immediately gain value from the large pools of public climate data sets by using Essentia in the cloud.
In this example, they used the public dataset Daily Global Weather Measurements, 1929-2009 (NCDC, GSOD), available for free on Amazon Web Services. With a total data size of 20 GB, they estimated that a single Essentia instance would be enough to perform all the processing and analysis.
1. Connecting to the Data
The data was already publicly available on AWS as an EBS snapshot, so they just had to copy the snapshot to an EBS volume mounted on their EC2 instance, then transfer the files to an S3 bucket. From the Essentia UI, they added the S3 bucket as a data repository, and that’s it.
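The transfer from the mounted volume to S3 could be sketched as follows. This is a minimal illustration, not the team's actual procedure: the function names, the `gsod` key prefix, and the directory layout are assumptions, and the upload step relies on the `boto3` library and configured AWS credentials.

```python
from pathlib import Path


def plan_uploads(local_root, prefix="gsod"):
    """Map each file under local_root to an S3 object key.

    The relative directory layout is preserved under the (hypothetical) prefix.
    """
    root = Path(local_root)
    return [
        (path, f"{prefix}/{path.relative_to(root).as_posix()}")
        for path in sorted(root.rglob("*"))
        if path.is_file()
    ]


def upload_all(local_root, bucket, prefix="gsod"):
    """Upload every planned file to the given bucket (bucket name is an assumption)."""
    import boto3  # imported lazily so plan_uploads works without boto3 installed

    s3 = boto3.client("s3")
    for path, key in plan_uploads(local_root, prefix):
        s3.upload_file(str(path), bucket, key)
```

Separating the planning step from the upload makes it possible to inspect the resulting key names before any data is sent.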
2. Exploring the Data
They then created some exploratory data categories to get an idea of the structure and types of the data. This was done without performing any ETL or moving data out of S3.
3. Preparing Data
Using the command line tools, they created a script that filtered out all non-US weather stations based on country code and then removed any stations that did not contain any data. Once the script was executed, all the target data was loaded into Essentia’s in-memory DB.
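The filtering logic described above can be illustrated in plain Python. This is only a sketch of the two conditions (country code, non-empty data) and does not use Essentia's command line tools; the `Station` record, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Station:
    station_id: str
    country: str  # hypothetical two-letter country code field
    observations: list = field(default_factory=list)


def select_stations(stations, country_code="US"):
    """Keep stations in the given country that have at least one observation."""
    return [
        s for s in stations
        if s.country == country_code and s.observations
    ]


# Made-up sample records, for illustration only.
stations = [
    Station("A", "US", [51.2, 49.8]),  # kept: US station with data
    Station("B", "NO", [30.1]),        # dropped: not a US station
    Station("C", "US", []),            # dropped: no data
]

kept = select_stations(stations)
print([s.station_id for s in kept])
```

Both conditions are applied in a single pass, so each station record is read only once.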
4. Analyzing the Data in R
This is just a simple example of what can be accomplished using Essentia. The data science team could also have combined this data with other data sets, or applied predictive or machine learning algorithms for forecasting and modeling.