One common type of log file is the access log collected from Apache web servers. However, in many cases the raw log
needs some preprocessing before it can be properly used. Here we demonstrate the use of logcnv, another
application in the Essentia toolkit, which lets us perform ETL and analysis of Apache data in one continuous pipeline.
The script and the data used in this brief demo can be found in the git repository under usecases/. The script
is designed to find the most popular referrers.
It uses logcnv to parse each line of the log and turn it into a CSV record. This record is then fed into aq_pp
for ETL operations, and finally imported into the UDB database. We use the UDB to count the number of
records for each referrer and to sort the referrers by that count.
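To make the parse-and-reduce step concrete, here is a minimal sketch of the same idea using only awk rather than logcnv and aq_pp. It assumes the Apache combined log format with no embedded double quotes or commas in the referrer; the function name referrer_csv and the sample line are hypothetical and are not part of Essentia or the demo data.

```shell
# Hypothetical helper: emit "referrer,pagecount" for each combined-format log line.
# Splitting on double quotes gives: $2 = request line, $4 = referrer, $6 = user agent.
# The constant 1 plays the role of the pagecount column that the pipeline adds
# before the records are imported and summed.
referrer_csv() {
  awk -F'"' '{ print $4 "," 1 }'
}

# Made-up sample line for illustration only.
sample='192.0.2.10 - - [30/Nov/2014:06:25:01 -0800] "GET /index.html HTTP/1.1" 200 5120 "http://example.com/start" "Mozilla/5.0"'

printf '%s\n' "$sample" | referrer_csv
# prints: http://example.com/start,1
```

Keeping only the referrer and a constant count of 1 per line means a later stage can get the per-referrer total simply by adding the counts together.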
ess instance local
ess spec reset
ess spec create database apache --ports=1
ess spec create vector vector1 s,pkey:referrer i,+add:pagecount
ess udbd restart
ess datastore select ./accesslog
ess datastore scan
ess datastore rule add "*125-access_log*" "125accesslogs" "YYYYMMDD"
ess datastore probe 125accesslogs --apply
ess task stream 125accesslogs "2014-11-30" "2014-12-07" \
"logcnv -f,eok - -d ip:ip sep:' ' s:rlog sep:' ' s:rusr sep:' [' \
i,tim:time sep:'] \"' s,clf,hl1:req_line1 sep:'\" ' i:res_status sep:' ' \
i:res_size sep:' \"' s,clf:referrer sep:'\" \"' s,clf:user_agent sep:'\"' X | \
aq_pp -f,qui,eok - -d X X X X X X X X X s:referrer X \
-evlc i:pagecount \"1\" -ddef -udb_imp apache:vector1"
ess task exec "aq_udb -exp apache:vector1 -sort pagecount -dec -top 25; \
aq_udb -cnt apache:vector1"
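The final aggregation can be approximated outside the UDB with standard tools. The sketch below is not Essentia code: it assumes input lines of the form "referrer,pagecount" with no commas inside the referrer, and the helper names top_referrers and referrer_count, like the sample records, are hypothetical. It sums the counts per referrer, lists the highest totals first, and reports how many distinct referrers were seen.

```shell
# Hypothetical helper: sum pagecount per referrer from "referrer,pagecount" lines,
# then print the top N (default 25) as "pagecount,referrer", highest count first.
top_referrers() {
  awk -F',' '{ count[$1] += $2 } END { for (r in count) print count[r] "," r }' |
    sort -t',' -k1,1nr -k2,2 |
    head -n "${1:-25}"
}

# Hypothetical helper: print the number of distinct referrers in the input.
referrer_count() {
  awk -F',' '!seen[$1]++ { n++ } END { print n + 0 }'
}

# Made-up records for illustration only.
records='http://example.com/a,1
http://example.com/b,1
http://example.com/a,1
-,1
http://example.com/a,1
http://example.com/b,1'

printf '%s\n' "$records" | top_referrers 2
printf '%s\n' "$records" | referrer_count
```

With the sample records, the first command lists http://example.com/a with a total of 3 and http://example.com/b with a total of 2, and the second reports 3 distinct referrers. Holding every key in an awk array is fine for a week of logs on one machine; it does not attempt to reproduce how the UDB stores or distributes its vectors.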
Line 5
Line 10
Line 19
Line 21