logcnv [-h] Global_Opt Input_Spec Output_Spec
Global_Opt:
[-test] [-verb] [-bz ReadBufSiz]
Input_Spec:
[-f[,AtrLst] File [File ...]] [-d ColSpec [ColSpec ...]]
Output_Spec:
[-o[,AtrLst] File] [-c ColName [ColName ...]]
logcnv is a stream-based log format converter.
It processes input log files with a given column/separator spec and
outputs the same data in CSV or binary format.
Supported log format is:
[separator]data_col[data_col ...][separator[data_col ...]][\r]\n
With its stream-based design, logcnv can process an unlimited amount of
data using a constant amount of memory. The output can either be stored
in a file or piped into another data processing component such as aq_pp.
-test
Test command line arguments and exit.
-verb
-bz ReadBufSiz
ReadBufSiz is a number in bytes.
-f[,AtrLst] File [File ...]
Set the input attributes and files.
If the data come from stdin, set File to '-' (a single dash).
Optional AtrLst is described under Input File Attributes.
If no -f option is specified, stdin is assumed.
Example:
$ logcnv ... -f,+1l,eok file1 -f file2 ...
-d ColSpec [ColSpec ...]
Define the data columns and separators of the input records from all -f specs. Supported record format is:
[separator]data_col[data_col ...][separator[data_col ...]][\r]\n
For a separator, ColSpec has the form SEP:SepStr where SEP (case insensitive) is a keyword and SepStr is the literal separator.
The separator string is taken as-is, no escape sequence is interpreted.
For a data column, ColSpec has the form Type[,AtrLst]:ColName.
Up to 256 ColSpec can be defined (excluding X type columns).
Supported Types are:
S - String.
F - Double precision floating point.
L - 64-bit unsigned integer.
LS - 64-bit signed integer.
I - 32-bit unsigned integer.
IS - 32-bit signed integer.
IP - v4/v6 address.
X[Type] - marks an unwanted input column.
Type is optional. It can be one of the above (default is S).
ColName is also optional. Such a name is simply discarded.
Optional AtrLst is a comma separated list containing:
clf - Denote that the input field uses Apache 2.0.46 and up escape sequences.
esc - Denote that the input field uses '\' as escape character.
This is different from clf in that each '\' only escapes the one following byte.
hex - For numeric types. Denote that the input field is in hexadecimal
notation. A leading 0x is optional. For example, 100 is
converted to 256 instead of 100.
trm - Trim leading/trailing spaces from input field value.
lo, up - For S type. Convert input field to lower/upper case.
tim - For I or IS type. Denote that the input field is in
Apache default timestamp format (e.g., '14/Feb/2009:08:31:30 +0900').
The field will be converted to UNIX seconds (e.g., 1234567890).
n=Len - Extract exactly Len bytes. Use this for a fixed length
data column. If a data column has a length spec, it can be followed by
another data column.
ColName restrictions:
Example:
$ logcnv ... -d IP:h SEP:' ' S:l SEP:' ' S:u SEP:' [' I,tim:t SEP:'] "' S,clf:r SEP:'" ' I:s SEP:' ' I:b ...
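The tim and hex attributes described above amount to simple field conversions. The following Python sketch illustrates the arithmetic only; it is not logcnv's implementation, and the function names apache_time_to_unix and hex_field_to_int are hypothetical helpers introduced for this illustration.

```python
from datetime import datetime


def apache_time_to_unix(field: str) -> int:
    """Convert an Apache default timestamp to UNIX seconds (the idea behind tim)."""
    # %z parses the "+0900" UTC offset, so the parsed value is timezone-aware
    # and timestamp() yields seconds since the epoch in UTC.
    parsed = datetime.strptime(field, "%d/%b/%Y:%H:%M:%S %z")
    return int(parsed.timestamp())


def hex_field_to_int(field: str) -> int:
    """Parse a hexadecimal field (the idea behind hex); a leading 0x is optional."""
    return int(field, 16)


print(apache_time_to_unix("14/Feb/2009:08:31:30 +0900"))  # 1234567890
print(hex_field_to_int("100"))    # 256
print(hex_field_to_int("0x100"))  # 256
```

Note that %b matches English month abbreviations only under an English or C locale; that is an assumption of this sketch.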
[-o[,AtrLst] File] [-c ColName [ColName ...]]
Output data rows.
Optional "-o[,AtrLst] File" sets the output attributes and file.
If File is a '-' (a single dash), data will be written to stdout.
Optional AtrLst is described under Output File Attributes.
Optional "-c ColName [ColName ...]" selects the columns to output.
Without -c, all columns are selected by default.
If -c is specified without a previous -o, output to stdout is assumed.
Multiple sets of "-o ... -c ..." can be specified.
Example:
$ logcnv ... -d s:Col1 s:Col2 s:Col3 ... -o,esc,noq - -c Col2 Col1
If successful, the program exits with status 0. Otherwise, the program exits with a non-zero status code along with error messages printed to stderr. Applicable exit codes are:
Each input file can have these comma separated attributes:
eok - Make errors non-fatal. If there is an input error, the program will
try to skip over bad/broken records. If there is a record processing error,
the program will just discard the record.
qui - Quiet; i.e., do not print any input/processing error message.
+Num[b|r|l] - Specifies the number of bytes (b suffix), records (r suffix)
or lines (no suffix or l suffix) to skip before processing.
Each output file can have these comma separated attributes:
app - Append to file; otherwise, file is overwritten by default.
bin - Output in binary format (default is CSV).
esc - Use '\' to escape ',', '"' and '\' (CSV).
noq - Do not quote string fields (CSV).
fmt_g - Use "%g" as print format for F type columns. Only use this
to aid data inspection (e.g., during integrity check or debugging).
notitle - Suppress the column name label row from the output.
A label row is normally included by default.
By default, output is in CSV format. Use the esc and noq attributes to
set output characteristics as needed.
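To make the effect of esc, noq and notitle more concrete, here is a rough Python sketch of a CSV writer with the same three switches. It is an illustration under stated assumptions, not logcnv's actual output routine: the names format_field and format_rows are hypothetical, the default quoting is assumed to double embedded quotes as in common CSV practice, the label row is assumed to be unquoted, and the text does not say how esc interacts with quoting (this sketch escapes in both cases).

```python
def format_field(value, is_string, esc=False, noq=False):
    """Render one output field as text."""
    text = str(value)
    if not is_string:
        return text  # numeric fields are written as-is
    if esc:
        # esc: put a backslash in front of ',', '"' and '\'.
        text = "".join("\\" + ch if ch in '\\,"' else ch for ch in text)
    elif not noq:
        # Assumption: without esc, embedded quotes are doubled inside quotes.
        text = text.replace('"', '""')
    # noq: leave string fields unquoted.
    return text if noq else '"' + text + '"'


def format_rows(names, rows, string_cols, esc=False, noq=False, notitle=False):
    """Return output lines; the first is a label row unless notitle is set."""
    lines = [] if notitle else [",".join(names)]
    for row in rows:
        fields = [
            format_field(row[name], name in string_cols, esc=esc, noq=noq)
            for name in names
        ]
        lines.append(",".join(fields))
    return lines


rows = [{"Col2": "a,b", "Col1": 7}]
for line in format_rows(["Col2", "Col1"], rows, {"Col2"}, esc=True, noq=True):
    print(line)
```

With esc and noq both set, the comma inside the string value is written as \, so that the row still splits into two fields.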
The following table shows the corresponding logcnv column spec for some common format strings:
Separator specs must be added to complete the record description. For example, consider this Common Log Format spec string:
%h %l %u %t \"%r\" %>s %b
It can be represented by these column specs:
IP:h SEP:' ' S:l SEP:' ' S:u SEP:' [' I,tim:t SEP:'] "' S,clf:r SEP:'" ' I:s SEP:' ' I:b
or
IP:h SEP:' ' S:l SEP:' ' S:u SEP:' [' I,tim:t SEP:'] "' S:r_method SEP:' ' S,clf:r_page SEP:' ' S:r_version SEP:'" ' I:s SEP:' ' I:b
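The way a column/separator spec carves up a record can be sketched in a few lines of Python. This is a simplified illustration, not logcnv's parser: it ignores types and attributes such as clf, tim and n=Len, it does not handle two adjacent data columns, and the names SPEC and split_record as well as the sample log line are assumptions made for the example.

```python
# The first Common Log Format spec above, reduced to names and separators.
SPEC = [
    ("col", "h"), ("sep", " "),
    ("col", "l"), ("sep", " "),
    ("col", "u"), ("sep", " ["),
    ("col", "t"), ("sep", '] "'),
    ("col", "r"), ("sep", '" '),
    ("col", "s"), ("sep", " "),
    ("col", "b"),
]


def split_record(line, spec):
    """Split one record into named fields by walking the spec left to right."""
    line = line.rstrip("\r\n")  # records end with an optional \r and a \n
    pos = 0
    fields = {}
    for i, (kind, value) in enumerate(spec):
        if kind == "sep":
            # A separator must appear literally at the current position.
            if not line.startswith(value, pos):
                raise ValueError(f"expected separator {value!r} at offset {pos}")
            pos += len(value)
            continue
        if i + 1 == len(spec):
            end = len(line)  # the last column runs to the end of the line
        elif spec[i + 1][0] == "sep":
            # A column ends where the next separator string begins.
            end = line.find(spec[i + 1][1], pos)
            if end < 0:
                raise ValueError(f"separator after column {value!r} not found")
        else:
            raise ValueError("adjacent columns need a length spec; not handled here")
        fields[value] = line[pos:end]
        pos = end
    return fields


sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326\n'
record = split_record(sample, SPEC)
print(record["h"], record["t"], record["r"], record["s"], record["b"])
```

Because each record is handled on its own, a loop over an open file or stdin can process input of any size while holding only one line in memory, in the spirit of the stream-based design described at the top.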