Datastore category setup and management
Create category
- Click on Data Viewer in the top menu and select a Repository from the drop-down.
- Click on the +Add icon to open the input form.
- Define your Category by entering:
- Category name - any arbitrary name (no spaces)
- Pattern - glob matching pattern that describes which files to include in your category
- Comment - any arbitrary comment
- Define Category Options (optional) to speed up data scanning (see Define Category Options below for more detail).
- Click on the Save button to create your category. This may take a few minutes while Essentia scans your data.
- After the scan is complete, the derived column specifications will be displayed along with metadata about your files. You can also now choose Direct Edit to edit the column specification (see Directly Edit Column Specification below for more detail).
- Your newly added category will be displayed in the category table for the selected repository. From here you can edit, copy or delete a category, view a sample of the data or see the list of files that make up your category.
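As a rough illustration of how a glob pattern selects files for a category, the sketch below uses Python's standard fnmatch module. The file paths and the pattern are hypothetical, and Essentia's own matcher may treat wildcards differently (for example around directory separators), so this is only a way to reason about a pattern before saving it.

```python
from fnmatch import fnmatchcase

# Hypothetical file paths, invented for illustration only.
files = [
    "logs/purchase_20140901.csv.gz",
    "logs/purchase_20140902.csv.gz",
    "logs/browse_20140901.csv.gz",
    "logs/readme.txt",
]

def match_category(pattern, paths):
    """Return the paths selected by a glob pattern (Python fnmatch rules)."""
    return [p for p in paths if fnmatchcase(p, pattern)]

# A pattern that groups only the purchase logs into one category.
purchase_files = match_category("*purchase_*.csv.gz", files)
print(purchase_files)
```

Note that in fnmatch the `*` wildcard also matches the `/` separator, which is an assumption here and not a documented property of Essentia patterns.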
Define Category Options
- Follow steps 1-3 of creating a category.
- Define either or both of the following options:
- Date Format - matching date format pattern found in filename structure
- Delimiter - the type of delimiter (comma, space, tab, etc.) used in your data.
- Or click on the options drop-down arrow to display category options and define either of the following options:
- Archive - matching pattern to describe filenames within a compressed file
- Preprocess - command to modify your raw data before it is scanned by Essentia.
Directly Edit Column Specification
- Follow steps 1-5 of creating a category.
- Click on the Direct Edit checkbox to allow the current column spec to be edited.
- From here, you can change column headers (no spaces) and assign data types in case the scan was not correct.
- Click on the Save button to save your changes.
Exploring Your Data Repository
- Click Explore.
- Click the + next to a directory to navigate through the directories on your Repository.
- Your current path is displayed at the top, next to your repository name. This is useful when defining a pattern for the files you want to group into a category.
You can click Size to calculate the total number of files and bytes in your Repository.
You can click Refresh to get the latest list of files on your Repository.
Query setup and management
Create a Query
- Click on Direct Data Query in the top menu and select a Repository from the drop-down.
- Enter your SQL-like query in the Input your query here area. You can optionally enter a label for this query so you can reference it later.
- Click on the Run button to view your query results on your screen. Alternatively, click Download and enter a filename to save the results to a file on your instance, or click OData to generate an OData link for easy loading into Tableau.
- From this point you can access a saved query or run a new query.
Note: If you need to view available categories, click on the Categories drop-down arrow to view a list of available categories.
Query Format
select [column_name] | [*] from [category_name]:[start_date | *]:[end_date | *] where ... order by ... limit ...
select count(distinct [column_name] | [*]) from [category_name]:[start_date | *]:[end_date | *] where ...
select [column_name], count(*) from [category_name]:[start_date | *]:[end_date | *] where ... group by [column_name]
Rules
The first query format above is a "select" query.
The second and third query formats above are "count" queries.
1. Group By is NOT supported for SELECT queries.
2. Order By is NOT supported for COUNT queries.
3. Limit is NOT supported for COUNT queries.
4. Group By can only be used when there is no DISTINCT in COUNT queries.
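The four rules above can be expressed as a small pre-check, sketched below. This is not Essentia's parser: it relies on naive keyword matching, so a keyword that appears inside a string literal would be misread, and the function name and messages are illustrative only.

```python
import re

def check_query(query):
    """Return a list of rule violations for a query string (empty if none found)."""
    q = query.lower()
    is_count = re.search(r"\bcount\s*\(", q) is not None
    has_distinct = re.search(r"\bcount\s*\(\s*distinct\b", q) is not None
    has_group_by = re.search(r"\bgroup\s+by\b", q) is not None
    has_order_by = re.search(r"\border\s+by\b", q) is not None
    has_limit = re.search(r"\blimit\b", q) is not None

    problems = []
    if not is_count and has_group_by:
        problems.append("Group By is not supported for SELECT queries")
    if is_count and has_order_by:
        problems.append("Order By is not supported for COUNT queries")
    if is_count and has_limit:
        problems.append("Limit is not supported for COUNT queries")
    if is_count and has_distinct and has_group_by:
        problems.append("Group By cannot be combined with DISTINCT in COUNT queries")
    return problems

# userID is a hypothetical column name used only for this example.
print(check_query("select count(distinct userID) from purchase:*:* group by userID"))
```

An empty list means none of the four rules were triggered. It does not guarantee that the query is otherwise valid.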
Example
select * from myfavoritedata:*:* where payment >= 50
select * from purchase:2014-09-01:2014-09-15 where articleID>=46 limit 10
To see more examples of the types of queries we allow, and to work with some sample queries of our public data, please go through our Direct Data Query Examples.
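If you generate queries from a script, the category:start_date:end_date form can be assembled with a small helper like the one below. The function and its parameters are hypothetical conveniences, not part of Essentia, and no escaping or validation is performed on the inputs.

```python
def build_select(category, columns="*", start="*", end="*",
                 where=None, order_by=None, limit=None):
    """Assemble a select query in the category:start_date:end_date form."""
    parts = ["select", columns, "from", f"{category}:{start}:{end}"]
    if where:
        parts += ["where", where]
    if order_by:
        parts += ["order by", order_by]
    if limit is not None:
        parts += ["limit", str(limit)]
    return " ".join(parts)

# Rebuild the two example queries shown above.
q1 = build_select("myfavoritedata", where="payment >= 50")
q2 = build_select("purchase", start="2014-09-01", end="2014-09-15",
                  where="articleID>=46", limit=10)
print(q1)
print(q2)
```

Leaving start and end at their defaults produces the `*:*` form, which places no date restriction on the category.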
Transfer Data with OData
- Create a query following the steps above and click the OData button to generate an OData link to your query.
- Copy this link using the Copy option on the right of the URL box, or highlight the URL and copy it to your clipboard.
- Open Tableau and go to the “To a server” connection section.
- Select OData. Note, you need to click “More Servers” to see the OData option if you are using Tableau Desktop.
- Paste the URL into the box after “Server:” and select No Authentication (this should be the default).
Note:
Our OData service is still in its Beta version and is currently limited to sending 10,000 lines of data (and 100,000 values) into Tableau. However, you can query larger amounts of data as long as the output is less than 10,000 lines (and 100,000 values). This will be improved in the full version, which will be released in the near future, along with support for OData clients other than Tableau.
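A quick way to judge whether a result will fit within the Beta limits is to compare its shape against both numbers. The sketch below assumes that "values" means rows multiplied by columns and follows the "less than" wording of the note, so both comparisons are strict; the function names are illustrative.

```python
MAX_LINES = 10_000     # Beta limit on lines, taken from the note above
MAX_VALUES = 100_000   # Beta limit on values, taken from the note above

def fits_odata_beta(rows, columns):
    """True if a rows x columns result stays under both Beta limits."""
    return rows < MAX_LINES and rows * columns < MAX_VALUES

def max_rows(columns):
    """Largest row count that stays under both limits for a given column count."""
    return min(MAX_LINES - 1, (MAX_VALUES - 1) // columns)

print(fits_odata_beta(5000, 10))   # 50,000 values
print(fits_odata_beta(9999, 20))   # 199,980 values
print(max_rows(25))
```

With wide results the value limit is reached before the line limit, so adding a limit clause or selecting fewer columns may be needed before clicking OData.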
Working with Saved Queries
- Select your Saved Query from the dropdown. The query should appear in the “Input your query here” area. If you labeled your query, the label should appear next to the saved query dropdown.
- Now you can click the Run button to view your query results on your screen. You can also click Download and enter a filename to save the results to a file on your instance, click HTTP to access the query via an HTTP link, or click OData to generate an OData link for easy loading into Tableau.
You can generate a new HTTP link for your query by clicking HTTP and then clicking Reset. This is useful if you want to share the link with others, but only want to provide them access for a limited amount of time.
You can search your saved queries by entering any part of the query you are looking for into the Search box.
Using RStudio
Setting up RStudio
If you plan to use our RStudio Integration and you haven’t enabled it yet, you need to:
- Go to the AWS Console.
- Right-click on your Instance, click Instance State, and Stop your instance.
- Right-click on your Instance, click Instance Settings, and click View/Change User Data.
- Enter “rstudio”.
- Right-click on your Instance, click Instance State, and Start your instance.
Accessing RStudio
Go to the UI and then click the RStudio link in the top menu.
Enter “essentia” as the username and enter the Instance ID of your instance as your password.
You can now use all the capabilities of RStudio directly from your browser.
Running Essentia via RStudio
Essentia’s R Integration package is installed by default. To access it, you simply need to enter the R command library(RESS). See our R Integration Tutorial to see how to use the RESS package to integrate R and Essentia.
To run an Essentia Bash Script that already exists on your file system, navigate within RStudio to the directory that contains your script and enter system("sh Your_Script_Name.sh").
To create an Essentia Bash Script from within RStudio:
- Click File → New File → Text File
- Click File → Save As
- Enter your desired filename followed by .sh (Ex: Your_Script_Name.sh)
You are now free to enter any Essentia commands to accomplish your data preparation, integration, or analysis.
To save your script, use a shortcut or click File → Save.
To run your script, navigate to the directory that contains your script and then either run system("sh Your_Script_Name.sh") or click on Run Script in the top right of the Script Panel.