Primary Lines in this Script
Line 8
- Create a vector (vector1) in the apache database, keyed on referrer, that sums the values in the pagecount column for each unique referrer.
- The pagecount column only ever contains the number ‘1’, so the sum is a count of how many times each referrer was seen in the web logs.
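To make the aggregation concrete, here is a rough sketch of the same idea using awk rather than the Essentia database. The function name sum_by_referrer and the sample rows are made up for illustration; the real work is done by the pkey and +add attributes on the vector.

```shell
#!/bin/bash
# Illustration only: emulate "s,pkey:referrer i,+add:pagecount" with awk.
# Input: tab-separated rows of "referrer<TAB>pagecount".
# sum_by_referrer is a hypothetical helper, not part of Essentia.
sum_by_referrer() {
  awk -F'\t' '
    { total[$1] += $2 }                       # one running sum per unique referrer
    END { for (r in total) printf "%s\t%d\n", r, total[r] }
  ' | sort                                    # awk iteration order is unspecified
}

# Made-up sample rows; every pagecount is 1, as in the script.
printf '%s\t1\n' "http://a.example/" "http://b.example/" "http://a.example/" \
  | sum_by_referrer
```

Because each row contributes 1, the total for a referrer equals the number of rows in which it appears.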
Line 12
- Create a new rule that puts any file under the EssentiaPublic directory in your home directory whose name matches the pattern *accesslog*125-access_log* into the 125accesslogs category.
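The wildcard pattern can be tried out in plain bash. The sketch below assumes shell-style matching in which * may span any characters, including slashes; Essentia's own matching rules may differ. The base directory, the file names, and the in_category helper are all hypothetical.

```shell
#!/bin/bash
# Illustration only: check paths against the same wildcard pattern as the category.
# base is a hypothetical stand-in for $HOME/EssentiaPublic.
base="/home/user/EssentiaPublic"

in_category() {
  case "$1" in
    "$base"/*accesslog*125-access_log*) return 0 ;;  # 'accesslog' must come before '125-access_log'
    *) return 1 ;;
  esac
}

# Made-up file names for demonstration.
for f in \
  "$base/accesslogs/125-access_log-20141130.gz" \
  "$base/accesslogs/126-access_log-20141130.gz" \
  "/tmp/accesslogs/125-access_log-20141130.gz"
do
  if in_category "$f"; then echo "match:    $f"; else echo "no match: $f"; fi
done
```

Only the first path satisfies both parts of the pattern and sits under the base directory.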
Line 14
- Pipe the files in the category 125accesslogs that were created between November 30th and December 7th, 2014 to the aq_pp command.
- In the aq_pp command, tell the preprocessor to take data from stdin, ignoring errors and not outputting any error messages.
- Then define the incoming data’s columns, skipping all of the columns except referrer, and create a column called pagecount that always contains the value 1.
- Then import the data into vector1 in the apache database so the attributes defined there (referrer as the key, pagecount as a running sum) are applied.
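The column handling described above can be approximated with awk. This is a simplified sketch, not a replacement for aq_pp: it splits each line on double quotes, so it does not handle escaped quotes the way the clf attribute is meant to. The extract_referrer helper and the log line are made up for illustration.

```shell
#!/bin/bash
# Illustration only: pull the referrer out of Apache "combined" log lines
# and append a constant pagecount of 1.
# extract_referrer is a hypothetical helper, not part of Essentia.
extract_referrer() {
  # Fields when splitting on '"': $2 = request, $4 = referrer, $6 = user agent.
  # Lines with too few fields are skipped silently.
  awk -F'"' 'NF >= 6 { printf "%s\t1\n", $4 }'
}

# Made-up log line for demonstration.
line='203.0.113.7 - - [30/Nov/2014:10:15:32 -0800] "GET /index.html HTTP/1.1" 200 5120 "http://www.example.com/start" "Mozilla/5.0"'
printf '%s\n' "$line" | extract_referrer
```

Each output row has the same two columns, referrer and pagecount, that the script imports into the vector.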
Line 21
- Export the aggregated data from the database, sorted by pagecount in descending order and limited to the 25 most common referrers. Also export the total number of unique referrers.
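For comparison, the export step can be imitated with standard tools on tab-separated "referrer, pagecount" rows. The helper names top_referrers and count_referrers and the sample rows are hypothetical; in the script itself, aq_udb performs both operations against the database.

```shell
#!/bin/bash
# Illustration only: emulate the export step on "referrer<TAB>pagecount" rows.
# Both helpers are hypothetical, not part of Essentia.
top_referrers() {            # $1 = number of rows to keep
  sort -t $'\t' -k2,2nr | head -n "$1"   # numeric, descending on the count column
}

count_referrers() {
  awk 'END { print NR }'     # one row per unique referrer, so row count = unique count
}

# Made-up aggregated rows for demonstration.
rows="$(printf 'http://a.example/\t2\nhttp://b.example/\t7\nhttp://c.example/\t4\n')"
printf '%s\n' "$rows" | top_referrers 2
printf '%s\n' "$rows" | count_referrers
```

With 25 in place of 2, the first call corresponds to the "-sort,dec pagecount -top 25" export, and the second to the "-cnt" call.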
#!/bin/bash
# Simple essentia script to process Apache web logs
#
ess cluster set local
ess purge local
ess server reset
ess create database apache --ports=1
ess create vector vector1 s,pkey:referrer i,+add:pagecount
ess udbd restart

ess select local
ess category add 125accesslogs "$HOME/EssentiaPublic/*accesslog*125-access_log*"

ess stream 125accesslogs "2014-11-30" "2014-12-07" \
"aq_pp -f,qui,eok,div - -d X sep:' ' X sep:' ' X sep:' [' \
X sep:'] \"' X sep:' ' X sep:' ' X sep:'\" ' X sep:' ' \
X sep:' \"' s,clf:referrer \
sep:'\" \"' X sep:'\"' X \
-eval i:pagecount \"1\" -ddef -udb -imp apache:vector1"

ess exec "aq_udb -exp apache:vector1 -sort,dec pagecount -top 25; \
aq_udb -cnt apache:vector1"