Log Analysis: Apache Logs
Analyzing Archived Apache Logs Stored in Cloud-Based Data Stores
Apache logs are a common source of log data that almost every organization analyzes. Access logs are used by both IT and Marketing to understand what is being accessed, when, and by whom. Error logs provide valuable information about the health and performance of the servers and the applications that run on them.
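As a concrete reference point, an access log line in Apache's "combined" format can be split into fields with a short regular expression. The sketch below is plain Python, not part of Essentia, and assumes the default combined LogFormat:

```python
import re

# Regex for the Apache "combined" log format:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of named fields, or None if the line is malformed."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None

sample = ('203.0.113.7 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
          '"http://www.example.com/start.html" "Mozilla/4.08"')
fields = parse_line(sample)
print(fields["host"], fields["status"])  # 203.0.113.7 200
```

Returning None for malformed lines (rather than raising) makes it easy to count and skip bad records during cleansing.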
Typically, due to storage constraints, these logs are rotated out and archived into low-cost cloud storage such as Amazon S3 or Microsoft Azure Blob Storage. It often becomes necessary to revisit these files after the initial analysis: for security investigations, for deeper dives into customer behavior, or to enrich them with other data sources.
Although there are plenty of tools for analyzing logs, both in real time and in batch, these solutions are not readily capable of handling large quantities of Apache logs that have been compressed, archived, and stored in cloud storage. Archived files in the cloud present many issues, among them:
- Large overall data set size
- Target data can be spread across multiple files, in different directories and/or buckets
- Decompression of compressed files slows down data processing
- Significant data cleansing is required due to high occurrence of errors, duplicates, or omissions in the data
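The decompression and cleansing issues above can be illustrated with a minimal, hand-rolled sketch: stream gzipped files line by line without unpacking them to disk, and drop blank or duplicate lines along the way. The file names and the cleaning rules here are illustrative only, not Essentia's implementation:

```python
import gzip
import os
import tempfile

def clean_log(paths):
    """Yield unique, non-blank lines from a set of gzipped log files."""
    seen = set()
    for path in paths:
        # gzip.open streams the file; no need to decompress to disk first.
        with gzip.open(path, "rt", errors="replace") as f:
            for line in f:
                line = line.rstrip("\n")
                # Skip blanks and exact duplicates (e.g. re-shipped rotations).
                if not line or line in seen:
                    continue
                seen.add(line)
                yield line

# Demo: two overlapping gzipped files, merged and de-duplicated.
tmp = tempfile.mkdtemp()
a, b = os.path.join(tmp, "a.gz"), os.path.join(tmp, "b.gz")
with gzip.open(a, "wt") as f:
    f.write("line1\nline2\n")
with gzip.open(b, "wt") as f:
    f.write("line2\nline3\n\n")
print(list(clean_log([a, b])))  # ['line1', 'line2', 'line3']
```

Note that an in-memory `seen` set works for a demo but not at archive scale, which is exactly why purpose-built tooling matters here.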
With Essentia, preparing and analyzing vast amounts of archived log data is straightforward:
1. Process & Analyze In-Place
2. Log Specific Tools
3. Scale Out or Up Seamlessly
Analysis & Visualization
For example, your data from the cloud can be efficiently parsed, cleaned, and reduced directly into Tableau Extract files (.tde) to generate interactive and compelling visualizations as shown in the examples above.
You can also load your prepared log data directly into any other analytics platform or data store, such as Amazon Redshift, MySQL, or Hadoop, as part of your data pipeline.
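As an illustration of such a handoff, parsed rows can be serialized to CSV, a format that bulk loaders such as Amazon Redshift's COPY or MySQL's LOAD DATA INFILE accept. The column names and row layout below are assumptions for illustration, not a fixed Essentia schema:

```python
import csv
import io

# Hypothetical parsed rows: (host, event_time, request, status, bytes).
rows = [
    ("203.0.113.7", "2000-10-10T13:55:36-07:00",
     "GET /index.html HTTP/1.0", 200, 2326),
    ("198.51.100.2", "2000-10-10T13:56:01-07:00",
     "GET /missing HTTP/1.0", 404, 0),
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["host", "event_time", "request", "status", "bytes"])
writer.writerows(rows)
csv_text = buf.getvalue()
# In a real pipeline this text would be written to a file or object store,
# then ingested by the target database's bulk loader.
```

Delimited text keeps the pipeline database-agnostic: the same output file can feed Redshift, MySQL, or a Hadoop table definition without re-parsing the logs.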