One common type of log file is the access log collected from Apache web servers. However, in many cases the raw log
needs some preprocessing before it can be properly used. Here we demonstrate logcnv, another
application in the Essentia toolkit, which allows us to perform ETL and analysis of Apache data in a fluid manner.
The script and the data used in this brief demo can be found in the git repository under casestudies/apache. The script
is called apache.sh and is designed to find the most popular ‘referrers’.
It uses logcnv to parse each line of the log and turn it into a CSV record. This record is then fed into aq_pp
for ETL operations, and finally imported into the UDB database. We use the UDB to sort and count the number of
records for each referrer.
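To make the data flow concrete, here is a rough sketch of the same parse, extract, and count idea using only standard Unix tools (awk, sort, head). It is not how Essentia works internally and does not use any Essentia command. The function name top_referrers and the sample log lines are hypothetical, and the sketch assumes logs in Apache's combined format with no embedded double quotes inside the quoted fields.

```shell
#!/bin/sh
# Hypothetical helper (not part of Essentia): reads Apache combined-format
# log lines on stdin and prints "count referrer" pairs, most frequent first.
top_referrers() {
    # awk splits each line on double quotes; in the combined format the
    # 4th piece is the referrer. Lines with too few pieces are skipped.
    # sort puts the highest count first and breaks ties by referrer.
    # head keeps at most 25 rows, mirroring the "-top 25" in the script.
    awk -F'"' 'NF >= 5 { count[$4]++ }
               END { for (r in count) print count[r], r }' |
        sort -k1,1nr -k2 |
        head -n 25
}

# Made-up sample input: two hits referred by a.example, one by b.example.
top_referrers <<'EOF'
1.2.3.4 - - [30/Nov/2014:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "http://a.example/" "UA"
1.2.3.5 - - [30/Nov/2014:00:00:02 +0000] "GET /x HTTP/1.1" 200 128 "http://b.example/" "UA"
1.2.3.6 - - [30/Nov/2014:00:00:03 +0000] "GET /y HTTP/1.1" 404 0 "http://a.example/" "UA"
EOF
```

The Essentia script does the same three things at scale: logcnv handles the field-by-field parsing, aq_pp keeps only the referrer column and attaches a count of 1, and the UDB accumulates and sorts the counts.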
ess server reset
ess create database apache --ports=1
ess create vector vector1 s,pkey:referrer i,+add:pagecount
ess udbd restart
ess select local
ess category add 125accesslogs "$HOME/*accesslog*125-access_log*"
ess stream 125accesslogs "2014-11-30" "2014-12-07" \
"logcnv -f,eok - -d ip:ip sep:' ' s:rlog sep:' ' s:rusr sep:' [' \
i,tim:time sep:'] \"' s,clf:req_line1 sep:' ' s,clf:req_line2 sep:' ' s,clf:req_line3 sep:'\" ' i:res_status sep:' ' \
i:res_size sep:' \"' s,clf:referrer \
sep:'\" \"' s,clf:user_agent sep:'\"' X | \
aq_pp -f,qui,eok - -d X X X X X X X X X s:referrer X \
-eval i:pagecount \"1\" -ddef -udb_imp apache:vector1"
ess exec "aq_udb -exp apache:vector1 -sort pagecount -dec -top 25; \
aq_udb -cnt apache:vector1"
Line 3
Line 7
Line 9
Line 17