Datastore category setup and management

Create category

  1. Click Categorize in the top menu and select a Repository from the drop-down list.
  2. Click on the + icon to open the input form.
  3. Define your Category by entering:
    • Category Name – any arbitrary name (no spaces).
    • Pattern – glob pattern(s) describing which files to include in your category.
  4. Optionally define any number of the following options to speed up data scanning or make data management easier:
    • Comment – any arbitrary comment.
    • Exclude – glob pattern describing which files to leave out of your category.
      Note: this further restricts the set of files matched by your Pattern.
    • Use cached file list – reference the local file list for the current category instead of accessing the repository.
  5. Optionally, click the Advanced Options drop-down arrow to display additional category options and define either or both of the following:
    • Date Format – a regular expression pattern that extracts the date from your file path or name; see Date Regex.
    • Delimiter – the type of delimiter (comma, space, tab, etc.) used in your data.
  6. Click on the Save button to create your category. This may take a few minutes while Essentia scans your data.
  7. After the scan is complete, the derived column specifications are displayed along with metadata about your files. You can now also do any of the following:
    • Define a Preprocess Command
    • Select a Pattern for Internal Files within Archive Files
    • Directly Edit Column Specification
  8. Your newly added category is displayed in the category table for the selected repository. From here you can edit, copy, scan, or delete a category, view a sample of the data, or see the list of files that make up your category.
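
To illustrate how Pattern, Exclude, and Date Format interact, here is a minimal Python sketch. It is not Essentia code: it approximates glob matching with Python's fnmatch module, whose rules may differ from Essentia's, and it uses an ordinary Python regular expression rather than the Date Regex syntax. The function names, file paths, and patterns are hypothetical.

```python
import re
from datetime import date
from fnmatch import fnmatchcase


def select_files(paths, pattern, exclude=None):
    """Keep paths that match `pattern` and do not match `exclude`."""
    selected = []
    for path in paths:
        if not fnmatchcase(path, pattern):
            continue  # not covered by the Pattern
        if exclude and fnmatchcase(path, exclude):
            continue  # Exclude further restricts the Pattern matches
        selected.append(path)
    return selected


# Hypothetical date layout: YYYY-MM-DD somewhere in the path.
DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def extract_date(path):
    """Return the first YYYY-MM-DD date found in `path`, or None."""
    match = DATE_RE.search(path)
    if match is None:
        return None
    year, month, day = map(int, match.groups())
    return date(year, month, day)


# Hypothetical repository listing.
paths = [
    "logs/sales_2020-01-01.csv",
    "logs/sales_2020-01-02.csv",
    "logs/sales_2020-01-02_test.csv",
    "logs/readme.txt",
]

chosen = select_files(paths, "logs/sales_*.csv", exclude="*_test.csv")
dates = [extract_date(p) for p in chosen]
```

In this sketch the test file matches the Pattern but is dropped by Exclude, and the readme never matches the Pattern at all.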

Define a Preprocess Command

  1. Follow steps 1-6 of creating a category.
  2. Click the Advanced Options drop-down arrow and enter a Preprocess Command next to Preprocess. You can then Check or save this command to preprocess your data:
    • Preprocess – command to modify your raw data before it is scanned by Essentia.

Select a Pattern for Internal Files within Archive Files

  1. Follow steps 1-6 of creating a category.
  2. Click the Advanced Options drop-down arrow and enter a pattern next to Archive.
    • Archive – matching pattern to describe filenames within a compressed or uncompressed archive file.
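
The idea behind the Archive pattern can be sketched in Python: list the members of an archive and keep only those whose names match a pattern. This is an illustration under assumptions, not how Essentia implements it. It uses a zip archive built in memory, fnmatch-style matching, and hypothetical member names.

```python
import io
import zipfile
from fnmatch import fnmatchcase


def matching_members(archive, pattern):
    """Return the names of archive members that match `pattern`."""
    with zipfile.ZipFile(archive) as zf:
        return [name for name in zf.namelist() if fnmatchcase(name, pattern)]


# Build a small in-memory zip archive with hypothetical contents.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data/part1.csv", "a,b\n1,2\n")
    zf.writestr("data/part2.csv", "a,b\n3,4\n")
    zf.writestr("README.txt", "notes")
buf.seek(0)

members = matching_members(buf, "*.csv")
```

Only the two CSV members are selected; README.txt is ignored because it does not match the pattern.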

Directly Edit Column Specification

  1. Follow steps 1-6 of creating a category.
  2. Choose the table or text display icon on the far right of Column Spec Details to display the determined Column Specification in your chosen format.
  3. From here, you can change column headers (no spaces) and assign data types if the scan determined them incorrectly.
  4. Click on the Save button to save your changes.
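
As a rough illustration of the two things you adjust in a column specification (space-free headers and per-column data types), the following Python sketch normalizes a header and guesses a type from sample values. The helper names and the type labels "int", "float", and "string" are hypothetical and are not Essentia's column specification syntax.

```python
import re


def normalize_header(name):
    """Trim the header and replace runs of whitespace with underscores."""
    return re.sub(r"\s+", "_", name.strip())


def guess_type(values):
    """Guess 'int', 'float', or 'string' for a column of text values."""

    def all_parse(cast):
        # True only if every value converts without error.
        try:
            for value in values:
                cast(value)
        except ValueError:
            return False
        return True

    if all_parse(int):
        return "int"
    if all_parse(float):
        return "float"
    return "string"


header = normalize_header("  unit price ")
price_type = guess_type(["1.5", "2"])
```

A guess like this can be wrong for columns such as zip codes or IDs that look numeric but should be treated as text, which is the kind of case where editing the specification by hand helps.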

In the main Categorize tab of the UI you can also click the download symbol to the right of the search box to save all of your categories for a single Repository to an Essentia settings file. Similarly, you can click the upload symbol to the right of the search box to read in all of your categories for a single Repository from an Essentia settings file. This makes sharing your categories with other people easy and makes your work easily transferable between computers.

Caution
Uploading an Essentia settings file for a data repository to your instance will overwrite any existing categories you have defined for that repository.

If new files have been uploaded to your repository recently, you should click Refresh to update all of the summary information shown for your categories in the Categorize tab of the UI. Whenever you use a category for analysis, however, that category always refreshes itself to ensure that your analysis uses the most accurate view of the files in your Repository. The Refresh button in the Categorize tab is only needed to update the displayed summary information.

By clicking the number in the File Count column of your category, you can view a graph displaying the Daily Trend of File Count. You can also click the number in the Total Size column to view a graph showing the Daily Trend of File Size for that category. These graphs are useful for tracking day-to-day changes to your category. File Size is a particularly important metric: as it grows for a category, your analyses using that category may require instances with more resources (CPU, memory, disk space, …).
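
A daily trend of this kind amounts to grouping files by date and then counting them and summing their sizes. The Python sketch below shows that aggregation on hypothetical (date, size) pairs; it assumes each file's date has already been determined and is not Essentia code.

```python
from collections import defaultdict
from datetime import date


def daily_trend(files):
    """Map each date to (file_count, total_size_in_bytes), ordered by date.

    `files` is an iterable of (date, size_in_bytes) pairs.
    """
    counts = defaultdict(int)
    sizes = defaultdict(int)
    for day, size in files:
        counts[day] += 1
        sizes[day] += size
    return {day: (counts[day], sizes[day]) for day in sorted(counts)}


# Hypothetical file dates and sizes in bytes.
files = [
    (date(2020, 1, 2), 500),
    (date(2020, 1, 1), 100),
    (date(2020, 1, 2), 700),
]

trend = daily_trend(files)
```

Here 2020-01-01 has one file totaling 100 bytes, and 2020-01-02 has two files totaling 1200 bytes.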

By clicking the icon in the right-most column of the category table, you can access additional options to gain information about or manage each category:

  • List Files: View a list of the files currently matched by your category pattern.
  • Sample: View a sample of the raw data in the category.
  • Scan: Run a deep scan of the category to determine detailed information such as type and number of unique elements for each column in that category’s data.
  • Download Files: Save up to 1GB of files from your category onto your local computer.
  • Copy: Create a new category from your existing category. The new category will need to be named and will use the same file pattern and column specification as the original category by default.
  • Export: Save your category definition for your Repository to an Essentia settings file. You can then share this file with others or import it on another computer to load your category definition.
  • Remove: Remove your category definition. This step cannot be undone!