logcnv [-h] Global_Opt Input_Spec Output_Spec
Global_Opt:
[-test] [-verb] [-bz ReadBufSiz]
Input_Spec:
[-f[,AtrLst] File [File ...]] [-d ColSpec [ColSpec ...]]
Output_Spec:
[[-o[,AtrLst] File] [-c ColName [ColName ...]] [-notitle]]
logcnv is a stream-based log converter.
It processes input log files with a given column/separator spec and
outputs the same data in CSV or binary format.
Supported log format is:
[separator]data_col[separator[data_col ...]][\r]\n
With its stream-based design, logcnv can process an unlimited amount of
data using a constant amount of memory. The output can either be stored
in a file or piped into another data processing component such as aq_pp.
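The constant-memory claim can be illustrated with a small Python sketch. This is not logcnv's implementation; `iter_records` and its `buf_size` parameter are hypothetical names chosen for illustration. The reader holds only one buffer plus any partial record in memory, so usage stays bounded as long as individual records are bounded.

```python
import io

def strip_cr(rec):
    """Drop the single optional '\\r' that may precede the newline."""
    return rec[:-1] if rec.endswith(b"\r") else rec

def iter_records(stream, buf_size=65536):
    """Yield records one at a time from a binary stream.

    Only the current chunk and any incomplete record are kept in memory.
    """
    pending = b""
    while True:
        chunk = stream.read(buf_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last '\n' is complete; the rest is carried over.
        *complete, pending = pending.split(b"\n")
        for rec in complete:
            yield strip_cr(rec)
    if pending:
        # Final record that was not terminated by a newline.
        yield strip_cr(pending)

# A tiny buffer is used here only to show that records spanning chunks survive.
records = list(iter_records(io.BytesIO(b"a b\r\nc d\ne f"), buf_size=4))
```

In real use the stream would be a file or `sys.stdin.buffer`, and each record would be handed to the next processing stage instead of being collected into a list.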
-test
Test command line arguments and exit.
-verb
-bz ReadBufSiz
ReadBufSiz is a number in bytes.
-f[,AtrLst] File [File ...]
Set the input attributes and files.
If the data come from stdin, set File to '-' (a single dash).
Optional AtrLst is described under Input File Attributes.
If no -f option is specified, stdin is assumed.
Example:
$ logcnv ... -f,+1l,eok file1 -f file2 ...
-d ColSpec [ColSpec ...]
Define the data columns and separators of the input records from all -f specs. Supported record format is:
[separator]data_col[separator[data_col ...]][\r]\n
For a separator, ColSpec has the form SEP:SepStr where SEP
(case insensitive) is a keyword and SepStr is the literal separator
(1 to 31 bytes long).
For a data column, ColSpec has the form Type[,AtrLst]:ColName.
Supported Types are:
S - String.
F - Double precision floating point.
L - 64-bit unsigned integer.
LS - 64-bit signed integer.
I - 32-bit unsigned integer.
IS - 32-bit signed integer.
IP - v4/v6 address.
X[Type] - marks an unwanted input column.
Type is optional. It can be one of the above (default is S).
ColName is also optional. Such a name is simply discarded.
Up to 256 ColSpec can be defined (excluding X type columns).
Optional AtrLst is a comma separated list containing:
clf - Denotes that the input field uses the escape sequences of
Apache 2.0.46 and later.
esc - Denotes that the input field uses '\' as the escape character.
This differs from clf in that each '\' escapes only the single byte
that follows it.
hex - For numeric types. Denotes that the input field is in hexadecimal
notation. A leading 0x is optional. For example, 100 is
converted to 256 instead of 100.
tim - For I or IS type. Denotes that the input field is in
Apache default timestamp format (e.g., '14/Feb/2009:08:31:30 +0900').
The field will be converted to UNIX seconds (e.g., 1234567890).
hl1 - For S type. Denotes that the input field contains the
HTTP request line 1, as in:
GET /index.html?query HTTP/1.0
The field will be broken up into ColName_f1 ("GET"),
ColName_f2 ("/index.html?query") and ColName_f3 ("HTTP/1.0") on output.
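The hex and tim conversions described in the attribute list can be reproduced with a short Python sketch. It only mirrors the documented input/output pairs; `parse_hex` and `parse_tim` are hypothetical helper names, and how logcnv treats malformed fields is not covered here.

```python
from datetime import datetime, timedelta, timezone

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def parse_hex(field):
    """Read a field as hexadecimal; Python's int() accepts an optional 0x prefix."""
    return int(field, 16)

def parse_tim(field):
    """Convert 'dd/Mon/yyyy:HH:MM:SS +zzzz' to UNIX seconds."""
    stamp, offset = field.split(" ")
    day, mon, rest = stamp.split("/")
    year, hh, mm, ss = rest.split(":")
    # The offset has the form +HHMM or -HHMM.
    sign = -1 if offset[0] == "-" else 1
    tz = timezone(sign * timedelta(hours=int(offset[1:3]),
                                   minutes=int(offset[3:5])))
    dt = datetime(int(year), MONTHS.index(mon) + 1, int(day),
                  int(hh), int(mm), int(ss), tzinfo=tz)
    return int(dt.timestamp())

hex_plain = parse_hex("100")        # 256
hex_prefixed = parse_hex("0x100")   # 256
seconds = parse_tim("14/Feb/2009:08:31:30 +0900")  # 1234567890
```

The month names are looked up in a fixed table instead of relying on `strptime`, so the result does not depend on the process locale.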
ColName restrictions:
An input column with the hl1 attribute is split into 3
columns - ColName_f1, ColName_f2 and ColName_f3;
in this case, ColName must not exceed 28 bytes.
Example:
$ logcnv ... -d IP:h SEP:' ' S:l SEP:' ' S:u SEP:' [' I,tim:t SEP:'] "' S,clf,hl1:r SEP:'" ' I:s SEP:' ' I:b ...
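The hl1 naming rule can be sketched in Python as follows. `split_hl1` is a hypothetical helper, not part of logcnv, and padding missing parts with empty strings is an assumption of this sketch; the source does not say how a malformed request line is handled.

```python
def split_hl1(col_name, request_line):
    """Split an HTTP request line into ColName_f1, ColName_f2 and ColName_f3."""
    if len(col_name.encode("utf-8")) > 28:
        raise ValueError("ColName with hl1 must not exceed 28 bytes")
    parts = request_line.split(" ", 2)
    # Assumption: missing parts become empty strings.
    parts += [""] * (3 - len(parts))
    return {f"{col_name}_f{i}": part for i, part in enumerate(parts, 1)}

row = split_hl1("r", "GET /index.html?query HTTP/1.0")
# {'r_f1': 'GET', 'r_f2': '/index.html?query', 'r_f3': 'HTTP/1.0'}
```

With the column named r as in the example above, the output columns to select with -c would be r_f1, r_f2 and r_f3.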
[-o[,AtrLst] File] [-c ColName [ColName ...]] [-notitle]
Output data rows.
Optional "-o[,AtrLst] File" sets the output attributes and file.
If File is a '-' (a single dash), data will be written to stdout.
Optional AtrLst is described under Output File Attributes.
Optional "-c ColName [ColName ...]" selects the columns to output.
Recall that an input column with an hl1 attribute is split into 3
columns on output - ColName_f1, ColName_f2 and ColName_f3;
selection must be done on those 3 names individually.
Without -c, all columns are selected by default.
If -c is specified without a previous -o, output to stdout is
assumed.
Optional -notitle suppresses the column name label row from the output.
A label row is normally included by default.
Multiple sets of "-o ... -c ... -notitle" can be specified.
Example:
$ logcnv ... -d s:Col1 s:Col2 s:Col3 ... -o,esc,noq - -c Col2 Col1
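The effect of -c and -notitle on the example above can be pictured with a small Python sketch. It assumes that the output columns follow the order given to -c, which the example suggests but the text does not state outright; `select_columns` is a hypothetical helper that works on already-parsed rows.

```python
def select_columns(names, rows, selected=None, title=True):
    """Yield the label row (unless suppressed) followed by the selected columns."""
    # Without a selection, all columns are output.
    selected = list(selected) if selected else list(names)
    idx = [names.index(col) for col in selected]
    if title:
        yield selected
    for row in rows:
        yield [row[i] for i in idx]

names = ["Col1", "Col2", "Col3"]
rows = [["a", "b", "c"], ["d", "e", "f"]]

with_title = list(select_columns(names, rows, ["Col2", "Col1"]))
no_title = list(select_columns(names, rows, ["Col2", "Col1"], title=False))
```

Here `with_title` starts with the label row `["Col2", "Col1"]`, while `no_title` holds only the data rows.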
If successful, the program exits with status 0. Otherwise, the program exits with a non-zero status code along with error messages printed to stderr. Applicable exit codes are:
Each input file can have these comma separated attributes:
eok - Make errors non-fatal. If there is an input error, the program will
try to skip over bad/broken records. If there is a record processing error,
the program will just discard the record.
qui - Quiet; i.e., do not print any input/processing error message.
+Num[b|r|l] - Specifies the number of bytes (b suffix), records (r suffix)
or lines (no suffix or l suffix) to skip before processing.
Each output file can have these comma separated attributes:
app - Append to file; otherwise, file is overwritten by default.
bin - Output in binary format (default is CSV).
esc - Use '\' to escape ',', '"' and '\' (CSV).
noq - Do not quote string fields (CSV).
fmt_g - Use "%g" as print format for F type columns. Only use this
to aid data inspection (e.g., during integrity check or debugging).
By default, output is in CSV format. Use the esc and noq attributes to
set output characteristics as needed.
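A rough Python sketch of what esc, noq and fmt_g imply for a single field is given below. The helper names are hypothetical, and the sketch is an approximation: the source does not say how quotes inside a quoted field are written when esc is off, so that case is left unhandled.

```python
def esc_field(value):
    """Prefix ',', '"' and '\\' with a backslash, as the esc attribute describes."""
    out = []
    for ch in value:
        if ch in ',"\\':
            out.append("\\")
        out.append(ch)
    return "".join(out)

def format_string(value, esc=False, noq=False):
    """Format one string field; embedded quotes without esc are not handled."""
    text = esc_field(value) if esc else value
    return text if noq else f'"{text}"'

def format_float_g(value):
    """Format an F column the way a C-style "%g" would."""
    return "%g" % value

quoted = format_string("hello")                          # "hello" with quotes
raw = format_string('a,b"c\\d', esc=True, noq=True)      # a\,b\"c\\d
short = format_float_g(1234567.891)                      # 1.23457e+06
```

The last value shows why fmt_g suits inspection only: "%g" keeps six significant digits, so reading `short` back does not recover 1234567.891.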
The following table shows the corresponding logcnv column spec for some common format strings:
Separator specs must be added to complete the record description. For example, consider this Common Log Format spec string:
%h %l %u %t \"%r\" %>s %b
It can be represented by this column spec:
IP:h SEP:' ' S:l SEP:' ' S:u SEP:' [' I,tim:t SEP:'] "' S,clf,hl1:r SEP:'" ' I:s SEP:' ' I:b
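To see how such a spec carves up a record, here is a minimal Python sketch that walks a line using the same sequence of separators and column names. It is an illustration under simplifying assumptions, not logcnv's parser: `split_record` and the `("SEP", ...)` / `("COL", ...)` tuples are hypothetical, values are left as raw strings, and the clf, tim and hl1 attributes are ignored. The sample log line is made up.

```python
def split_record(line, spec):
    """Split one record according to a list of ("SEP", literal) / ("COL", name)."""
    # Strip the record terminator: an optional '\r' followed by '\n'.
    if line.endswith("\n"):
        line = line[:-1]
    if line.endswith("\r"):
        line = line[:-1]
    pos, row = 0, {}
    for i, (kind, value) in enumerate(spec):
        if kind == "SEP":
            if not line.startswith(value, pos):
                raise ValueError(f"expected separator {value!r} at offset {pos}")
            pos += len(value)
        else:
            # A column runs up to the next separator, or to the end of the line.
            if i + 1 < len(spec) and spec[i + 1][0] == "SEP":
                end = line.find(spec[i + 1][1], pos)
                if end < 0:
                    raise ValueError(f"separator after column {value!r} not found")
            else:
                end = len(line)
            row[value] = line[pos:end]
            pos = end
    return row

CLF_SPEC = [
    ("COL", "h"), ("SEP", " "), ("COL", "l"), ("SEP", " "), ("COL", "u"),
    ("SEP", " ["), ("COL", "t"), ("SEP", '] "'), ("COL", "r"),
    ("SEP", '" '), ("COL", "s"), ("SEP", " "), ("COL", "b"),
]

sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326\r\n')
parsed = split_record(sample, CLF_SPEC)
```

Because escapes are not interpreted, a request line containing an escaped quote followed by a space would be cut short by this sketch; that is the kind of case the clf attribute exists for.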