aq_tool input specifications
aq_command ...
...
-f[,AtrLst] File [File ...]
-d ColSpec [ColSpec ...] | -d [SepSpec] ColSpec [[SepSpec] ColSpec ...]
...
Most aq_tool commands require input data to operate. The input data is generally specified using two options, -f and -d, as shown in the synopsis above.
The syntax and usage of these options are the same in all commands that support them. They are described in detail below.
Note that certain aq_tools can take supplementary inputs. For example, aq_pp has a -cat option that takes the same input attributes and column specs as the -f and -d combination. The description below applies to those input specs as well.
-f[,AtrLst] File [File ...]
The -f option sets the input attributes (AtrLst) and sources (Files). If no -f is given, data will be obtained from the standard input.
Each File is a data source. It can be a regular file or a stream:
The path of a regular file as File.
A single dash (-) as File, to read from the standard input.
fifo@PipeName as File, where PipeName is the named pipe’s path. The program will create the pipe if it does not exist, or use the existing one if it does. If the named pipe is known to exist already, PipeName alone also works.
connect@DomainName:Port, connect@IP4:Port or connect@[IP6]:Port as File. DomainName/IP4/IP6 and Port are the address and port to connect to.
listen@DomainName:Port, listen@IP4:Port, listen@[IP6]:Port or listen@Port as File. DomainName/IP4/IP6 and Port are the address and port to listen at.
Optional AtrLst defines the input’s data format and handling characteristics. It is a list of comma separated attributes containing:
Input format selection:
These attributes are mutually exclusive, except for sep and csv, which can be used together. If no input format attribute is given, csv is assumed.
csv - Input is in CSV format. This is the default input format. Although CSV implies comma separated, sep=c can be used to select a different separator. This format uses the generic column specification.
sep=c or sep=\xHH - Input is in ‘c’ (single byte) separated value format. \xHH is a way to specify ‘c’ via its HEX value HH. This format uses the generic column specification.
fix - Input rows have the form “Column1Column2...” without any separator between column values. Instead, each column has a fixed byte length so that the columns can be extracted by byte positions. Individual column widths are defined with the n=Len attribute in the generic column specification.
div - Input rows have the form “[Separator1]Column1[Separator2]Column2...” where the separators vary from field to field. In this format, the separators are defined along with the columns in the column specification for arbitrary separators.
tab - Input is in HTML table format. Each row has the form “...<td>Column1</td>...<td>Column2</td>...</tr>”. In other words, a row begins at the first “<td ...>” tag and ends at a “</tr>” tag. This format uses the generic column specification.
jsn - Input is in JSON format. Each record must be an object or an array that contains objects. Columns are extracted from object members. The member specification is given as extended information in the column specification for key extraction.
xml - Input is in XML format. Each record comes from a repeated child (or subchild) under the document root. The child specification is given as extended information in the column specification for key extraction.
bin - Input is in aq_tool’s internal binary format. This format is designed to improve performance when the input data is also generated by an aq_tool. Note that the other aq_tool must output its data in bin format as well. This format uses the generic column specification.
aq - Input comes from another aq_tool outputting in aq format. This is a special format that contains an embedded column spec; no further column spec is needed (nor accepted).
Column spec attributes that apply to all columns:
esc - Interpret ‘\’ as an escape character in all input fields. Only applicable to sep, csv, fix and div formats.
Positioning the start of input:
+Num[b|r|l] - Specifies the number of bytes (b suffix), records (r suffix) or lines (l suffix) to skip before processing. Line is the default.
Error handling:
By default, all input related errors are fatal: the program will print an error message and exit.
nox - Reject records with more fields than the column spec. For sep, csv and tab formats only. By default, these formats silently ignore extra (trailing) fields in the input records.
eok[=Num[/Rows]] - Make recoverable input errors non-fatal. If there is an input parse error, the program will try to skip over the bad/broken data until the beginning of the next record. If there is an input data processing error, the program will just discard the offending record. Optional Num sets a finite number of errors per file to allow. Num/Rows allows Num errors every Rows rows.
qui[=Num] - Quiet. That is, suppress all input related error messages. Optional Num sets a non-zero number of error messages to print for each input file before becoming quiet. Typically used with eok.
Processing buffer:
bz=BufSize - Set the per-record buffer size to BufSize bytes. It must be big enough to hold the data of all the columns in a record. Default size is 64KB.
-d ColSpec [ColSpec ...]
Define the columns of an input in sep, csv, fix, tab or bin format.
ColSpec must be specified in the same order as the columns appear in the input.
Up to 2048 non-X type ColSpec can be defined.
ColSpec has the form Type[,AtrLst]:ColName.
Supported Types are:
S - String (65535 byte max).
F - Double precision floating point (±2.23×10^-308 to ±1.80×10^308).
L - 64-bit unsigned integer (0 to 18,446,744,073,709,551,615).
LS - 64-bit signed integer (−9,223,372,036,854,775,808 to 9,223,372,036,854,775,807).
I - 32-bit unsigned integer (0 to 4,294,967,295).
IS - 32-bit signed integer (−2,147,483,648 to 2,147,483,647).
IP - v4/v6 address.
X[Type] - Marks an unwanted input column. Type is required only for a bin input (optional otherwise). It can have one of the above values.
Optional AtrLst determines how column data are to be extracted from the input. It is a comma separated list containing:
n=Len - Extract exactly Len source bytes. Use this for a fixed length data column. Not applicable to tab and bin formats.
esc - Interpret ‘\’ as an escape character in the input data. Do not use this attribute if the data contain multibyte character sequences that use ‘\’ for encoding. Not applicable to tab and bin formats.
clf - Interpret common log format like encoding in the input data, such as \xHH where HH is the hex value of the byte. Not applicable to tab and bin formats.
hex - Interpret integers in hexadecimal notation. The default is base 10. A leading 0x is optional. For example, 100 or 0x100 is converted to 256 instead of 100. Not applicable to bin format.
trm - Trim leading/trailing spaces from the field value.
lo, up - Convert a string field value to lower or upper case.
ColName is the column name (case insensitive). It can contain up to 31 alphanumeric and ‘_’ characters. Its first character cannot be a digit. It is optional if the column has an X type.
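The ColSpec form described above can be illustrated with a small Python sketch. This is not part of aq_tool; parse_colspec and its return shape are hypothetical, and the sketch assumes that type letters are case insensitive (the examples in this document use both s and S) and that column names are normalized to lower case.

```python
import re

# Types listed in this document; X may optionally be followed by one of them.
TYPES = {"S", "F", "L", "LS", "I", "IS", "IP"}
# Up to 31 alphanumeric or '_' characters, first character not a digit.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,30}")


def parse_colspec(spec):
    """Split 'Type[,AtrLst]:ColName' into (type, attribute list, name)."""
    head, _, name = spec.partition(":")
    type_, *attrs = head.split(",")
    type_ = type_.upper()  # assumption: type letters are case insensitive
    is_x = type_ == "X" or (type_.startswith("X") and type_[1:] in TYPES)
    if not is_x and type_ not in TYPES:
        raise ValueError(f"unknown type in {spec!r}")
    if name:
        if not NAME_RE.fullmatch(name):
            raise ValueError(f"invalid column name in {spec!r}")
    elif not is_x:
        # The name may only be left out for an X (unwanted) column.
        raise ValueError(f"column name required in {spec!r}")
    return type_, attrs, name.lower()
```

For instance, parse_colspec("i,n=12,trm:Col2") yields the type I, the attributes n=12 and trm, and the name col2. Attribute values are left as plain strings; validating them per input format is outside the scope of this sketch.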
Example:
$ aq_pp ... -d s:Col1 i,trm:Col2 ...
The trm attribute removes blanks around the value before it is converted to an integer.
$ aq_pp -f,fix ... -d s,n=5:Col1 i,n=12,trm:Col2 ...
Input is in fix format. An n=Len attribute is needed in all column specs.
$ aq_pp ... -d s:Col1 i,trm:Col2 ... -o,bin - | aq_pp -f,bin - -d s:C1 i:C2 ...
Input is in bin format. Note that the input column types must match those of the other command’s output columns.
-d [SepSpec] ColSpec [[SepSpec] ColSpec ...]
Define the columns of an input in div format.
The specification is identical to the Generic Column Specification except for the added SepSpec. The individual SepSpec in this specification is designed for input data that have multibyte separators and/or separators that vary from field to field.
ColSpec and SepSpec must be specified in the same order as they appear in the input.
Up to 2048 non-X type ColSpec can be defined.
ColSpec has the form Type[,AtrLst]:ColName.
Supported Types are:
S - String (65535 byte max).
F - Double precision floating point (±2.23×10^-308 to ±1.80×10^308).
L - 64-bit unsigned integer (0 to 18,446,744,073,709,551,615).
LS - 64-bit signed integer (−9,223,372,036,854,775,808 to 9,223,372,036,854,775,807).
I - 32-bit unsigned integer (0 to 4,294,967,295).
IS - 32-bit signed integer (−2,147,483,648 to 2,147,483,647).
IP - v4/v6 address.
X[Type] - Marks an unwanted input column. Type is optional. It can have one of the above values.
Optional AtrLst determines how a column’s value is to be extracted from the input. It is a comma separated list containing:
n=Len - Extract exactly Len source bytes. Use this for a fixed length data column.
esc - Interpret ‘\’ as an escape character in the input data. Do not use this attribute if the data contain multibyte character sequences that use ‘\’ for encoding.
clf - Interpret common log format like encoding in the input data, such as \xHH where HH is the hex value of the byte.
hex - Interpret integers in hexadecimal notation. The default is base 10. A leading 0x is optional. For example, 100 or 0x100 is converted to 256 instead of 100.
trm - Trim leading/trailing spaces from the field value.
lo, up - Convert a string field value to lower or upper case.
ColName is the column name (case insensitive). It can contain up to 31 alphanumeric and ‘_’ characters. Its first character cannot be a digit. It is optional if the column has an X type.
SepSpec has the form SEP:SepStr where SEP (case insensitive) is a keyword and SepStr is a literal separator of one or more bytes. Note that SepStr is taken as-is; there is no special interpretation. A SepSpec is generally needed between two adjacent ColSpec unless the former column has an n=Len attribute.
Example:
$ aq_pp ... -d sep:' [' s:time_s sep:'] "' s,clf:url sep:'"' ...
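To make the example above more concrete, the following Python sketch mimics how a row could be split with per-field literal separators. It is only an illustration under stated assumptions: split_div and the tuple-based spec are hypothetical, the sketch works on characters rather than bytes, it assumes a column without n=Len ends at the next separator (or at the end of the line), and it does not perform clf decoding.

```python
def split_div(line, spec):
    """Split one row using a list of ("sep", literal) and
    ("col", name, length_or_None) items given in input order."""
    pos, row = 0, {}
    for i, item in enumerate(spec):
        if item[0] == "sep":
            literal = item[1]
            # The separator is taken as-is and must appear at this position.
            if not line.startswith(literal, pos):
                raise ValueError(f"expected {literal!r} at offset {pos}")
            pos += len(literal)
            continue
        _, name, length = item
        if length is not None:
            end = pos + length  # fixed length column (n=Len)
        else:
            nxt = spec[i + 1] if i + 1 < len(spec) else None
            if nxt is not None and nxt[0] == "sep":
                end = line.find(nxt[1], pos)  # assumption: stop at next separator
                if end < 0:
                    raise ValueError(f"separator {nxt[1]!r} not found")
            else:
                end = len(line)
        row[name] = line[pos:end]
        pos = end
    return row


# Mirrors: -d sep:' [' s:time_s sep:'] "' s,clf:url sep:'"'
EXAMPLE_SPEC = [
    ("sep", " ["), ("col", "time_s", None),
    ("sep", '] "'), ("col", "url", None),
    ("sep", '"'),
]
```

With this spec, a hypothetical input row such as ` [01/Jan/2020:10:00:00 +0000] "/index.html"` is split into time_s and url without the surrounding brackets and quotes.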
-d ColSpec [ColSpec ...]
Define the columns of an input in jsn or xml format. This spec differs from the other column specs in these ways:
Columns are located by their KeySpec and not their positions.
Up to 2048 non-X type ColSpec can be defined.
ColSpec has the form Type[,AtrLst]:ColName:KeySpec.
Supported Types are:
S - String (65535 byte max).
F - Double precision floating point (±2.23×10^-308 to ±1.80×10^308).
L - 64-bit unsigned integer (0 to 18,446,744,073,709,551,615).
LS - 64-bit signed integer (−9,223,372,036,854,775,808 to 9,223,372,036,854,775,807).
I - 32-bit unsigned integer (0 to 4,294,967,295).
IS - 32-bit signed integer (−2,147,483,648 to 2,147,483,647).
IP - v4/v6 address.
X[Type] - Marks an unwanted input column. Type is optional. It can have one of the above values. Note that an X type is generally not necessary; instead, only specify the columns needed.
Optional AtrLst determines how column data are to be extracted from the input. It is a comma separated list containing:
hex - Interpret integers in hexadecimal notation. The default is base 10. A leading 0x is optional. For example, 100 or 0x100 is converted to 256 instead of 100.
trm - Trim leading/trailing spaces from the field value.
lo, up - Convert a string field value to lower or upper case.
base=BaseSpec - Set an optional base for all the KeySpec. BaseSpec is a list of dot separated elements as in Element.Element.... Each Element has the form:
KeyName selects the value of an object member named KeyName (case insensitive).
[Num] selects the Num-th (zero-based) value in an array. If Num is *, all values will be selected (with certain key extraction limitations).
KeyName[Num] selects the Num-th (zero-based) value in the array belonging to an object member named KeyName (case insensitive). If Num is *, all values will be selected (with certain key extraction limitations).
ColName is the column name (case insensitive). It can contain up to 31 alphanumeric and ‘_’ characters. Its first character cannot be a digit.
KeySpec specifies which data field to extract for the column. It is a list of dot separated elements as in Element.Element.... Each Element has the form:
KeyName selects the value of an object member named KeyName (case insensitive).
[Num] selects the Num-th (zero-based) value in an array. If Num is *, all values will be selected (with certain key extraction limitations).
KeyName[Num] selects the Num-th (zero-based) value in the array belonging to an object member named KeyName (case insensitive). If Num is *, all values will be selected (with certain key extraction limitations).
If a BaseSpec attribute is given, KeySpec will be appended to BaseSpec (with a dot in between) to form the actual key.
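The way BaseSpec and KeySpec combine, and how the resulting key breaks down into elements, can be sketched in Python as follows. The helper names full_key and parse_key are hypothetical, and the sketch assumes that a blank BaseSpec or KeySpec simply contributes nothing to the joined key; it is an illustration, not the tool's parser.

```python
import re

# One element: KeyName, [Num], or KeyName[Num], where Num is a number or '*'.
ELEM_RE = re.compile(r"^([^\[\].]*)(?:\[(\d+|\*)\])?$")


def full_key(base, keyspec):
    """Append KeySpec to BaseSpec with a dot in between; blanks are skipped."""
    return ".".join(part for part in (base, keyspec) if part)


def parse_key(key):
    """Turn 'k1.k2[*]' into a list of ("member", name) / ("index", n) steps."""
    steps = []
    for elem in key.split("."):
        match = ELEM_RE.match(elem)
        if elem == "" or match is None:
            raise ValueError(f"bad element {elem!r} in {key!r}")
        name, index = match.groups()
        if name:
            steps.append(("member", name.lower()))  # key names are case insensitive
        if index is not None:
            steps.append(("index", "*" if index == "*" else int(index)))
    return steps
```

For example, full_key("k1.k2.k3", "k4") gives k1.k2.k3.k4, and parse_key("[*].k1.k2.k3[*]") yields an index step for the top array, three member steps, and a final index step for the k3 array.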
Example:
{ "Key1" : "Val1", "Key2" : { "Ary" : [ 0, 1, 2 ] } }
$ aq_pp -f,jsn ... -d S:Col1:key1 I:Col2:key2.ary[*] ...

<root> <Key1>Val1</Key1> <Key2> <Ary>0</Ary> <Ary>1</Ary> <Ary>2</Ary> </Key2> </root>
$ aq_pp -f,xml ... -d S:Col1:root.key1 I:Col2:root.key2.ary[*] ...

{ "k1" : { "k2" : { "k3" : { "k4" : "14", "k5" : "15" } } } }
{ "k1" : { "k2" : { "k3" : { "k4" : "24", "k5" : "25" } } } }
{ "k1" : { "k2" : { "k3" : { "k4" : "34", "k5" : "35" } } } }
$ aq_pp -f,jsn ... -d I:Col1:k1.k2.k3.k4 I:Col2:k1.k2.k3.k5 ...
$ aq_pp -f,jsn,base=k1.k2.k3 ... -d I:Col1:k4 I:Col2:k5 ...

<k1><k2><k3><k4>14</k4><k5>15</k5></k3></k2></k1>
<k1><k2><k3><k4>24</k4><k5>25</k5></k3></k2></k1>
<k1><k2><k3><k4>34</k4><k5>35</k5></k3></k2></k1>
$ aq_pp -f,xml ... -d I:Col1:k1.k2.k3.k4 I:Col2:k1.k2.k3.k5 ...
$ aq_pp -f,xml,base=k1.k2.k3 ... -d I:Col1:k4 I:Col2:k5 ...

[ { "k1" : { "k2" : { "k3" : { "k4" : "14", "k5" : "15" } } } },
  { "k1" : { "k2" : { "k3" : { "k4" : "24", "k5" : "25" } } } },
  { "k1" : { "k2" : { "k3" : { "k4" : "34", "k5" : "35" } } } } ]
$ aq_pp -f,jsn,base=[*].k1.k2.k3 ... -d I:Col1:k4 I:Col2:k5 ...
Note the “[*]” in base to address all the objects in the top array.

<k0>
<k1><k2><k3><k4>14</k4><k5>15</k5></k3></k2></k1>
<k1><k2><k3><k4>24</k4><k5>25</k5></k3></k2></k1>
<k1><k2><k3><k4>34</k4><k5>35</k5></k3></k2></k1>
</k0>
$ aq_pp -f,xml,base=k0.k1[*].k2.k3 ... -d I:Col1:k4 I:Col2:k5 ...
Note the “[*]” in base to address all the “k1” entries.

{ "k1" : { "k2" : { "k3" : [ { "k4" : "14", "k5" : "15" }, { "k4" : "24", "k5" : "25" } ] } } },
{ "k1" : { "k2" : { "k3" : [ { "k4" : "34", "k5" : "35" } ] } } }
$ aq_pp -f,jsn,base=k1.k2.k3[*] ... -d I:Col1:k4 I:Col2:k5 ...
Note the “[*]” in base to address all the objects in the “k3” array.

<k1><k2><k3><k4>14</k4><k5>15</k5></k3> <k3><k4>24</k4><k5>25</k5></k3></k2></k1>
<k1><k2><k3><k4>34</k4><k5>35</k5></k3></k2></k1>
$ aq_pp -f,xml,base=k1.k2.k3[*] ... -d I:Col1:k4 I:Col2:k5 ...
Note the “[*]” in base to address all the objects in the “k3” elements.

[ { "k1" : { "k2" : { "k3" : [ { "k4" : "14", "k5" : "15" }, { "k4" : "24", "k5" : "25" } ] } } },
  { "k1" : { "k2" : { "k3" : [ { "k4" : "34", "k5" : "35" } ] } } } ]
$ aq_pp -f,jsn,base=[*].k1.k2.k3[*] ... -d I:Col1:k4 I:Col2:k5 ...
Note the “[*]” in base to address all the objects in the top array and all the objects in the “k3” array.

<k0>
<k1><k2><k3><k4>14</k4><k5>15</k5></k3> <k3><k4>24</k4><k5>25</k5></k3></k2></k1>
<k1><k2><k3><k4>34</k4><k5>35</k5></k3></k2></k1>
</k0>
$ aq_pp -f,xml,base=k0.k1[*].k2.k3[*] ... -d I:Col1:k4 I:Col2:k5 ...
Note the “[*]” in base to address all the “k1” entries and all the “k3” entries.

[ 1,2 ]
[ 3,4 ]
$ aq_pp -f,jsn,base=[*] ... -d I:Col1: ...

[ [ 1,2 ], [ 3,4 ] ]
$ aq_pp -f,jsn,base=[*].[*] ... -d I:Col1: ...

{ "k1" : [ 1,2 ] }
{ "k1" : [ 3,4 ] }
$ aq_pp -f,jsn,base=k1[*] ... -d I:Col1: ...

<k1>1</k1> <k1>2</k1> <k1>3</k1> <k1>4</k1>
$ aq_pp -f,xml,base=k1 ... -d I:Col1: ...
KeySpec in a ColSpec can be blank if base is given.
The [*] extraction may not always give the expected result because of the stream based design of aq_tools. The outcome depends on the arrangement of the input data. To illustrate, consider:
{ "Key1" : "Val1", "Key2" : { "Ary" : [ 0, 1, 2 ] } }
$ aq_pp -f,jsn ... -d S:Col1:key1 I:Col2:key2.ary[*] ...
Extracting “key1” and “key2.ary” gives the expected result of “Val1,0”, “Val1,1” and “Val1,2”. However, if the input data is arranged differently, as in:
{ "Key2" : { "Ary" : [ 0, 1, 2 ] }, "Key1" : "Val1" }
$ aq_pp -f,jsn ... -d S:Col1:key1 I:Col2:key2.ary[*] ...
The same command only extracts “,0”, “,1” and “,2”; that is, the value of “key1” is missing. Due to its stream based design, aq_pp outputs one record for each value of the innermost array “key2.ary”. However, “key1” is not known when “key2.ary” is processed, so it is given an empty string value.
To illustrate further, consider:
{ "Key2" : { "Ary" : [ 0, 1, 2 ] }, "Key1" : "Val1", "Key3" : { "Ary" : [ 10, 11, 12 ] } }
$ aq_pp -f,jsn ... -d S:Col1:key1 I:Col2:key2.ary[*] I:Col3:key3.ary[*] ...
The result will be “,0,0”, “,1,0”, “,2,0”, “Val1,0,10”, “Val1,0,11” and “Val1,0,12”. There are two innermost arrays of interest in this case. The first 3 result rows come from “key2.ary”, where “key1” and “key3.ary” are not known. The other result rows come from “key3.ary”, where “key1” is known but “key2.ary” is no longer in context.
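The ordering effect described above can be reproduced with a toy Python model. This is a simplified sketch written for illustration only, not aq_pp's actual algorithm: the function extract and its column tuples are hypothetical, it handles only plain member paths with a trailing [*], and it assumes that a column falls back to its default ("" or 0) once its array is no longer in context.

```python
import json


def extract(record_text, cols):
    """cols: list of (name, key, default), key like 'key1' or 'key2.ary[*]'.
    Members are visited in document order; one row is emitted per value of
    each '[*]' array, using whatever other values are known at that moment."""
    current = {name: default for name, _, default in cols}
    defaults = dict(current)
    scalars = {key: name for name, key, _ in cols if not key.endswith("[*]")}
    arrays = {key[:-3]: name for name, key, _ in cols if key.endswith("[*]")}
    rows = []

    def emit():
        rows.append(tuple(current[name] for name, _, _ in cols))

    def walk(value, path):
        if isinstance(value, dict):
            for k, v in value.items():  # json.loads keeps document order
                walk(v, f"{path}.{k.lower()}" if path else k.lower())
        elif isinstance(value, list) and path in arrays:
            name = arrays[path]
            for item in value:
                current[name] = item
                emit()
            current[name] = defaults[name]  # array is no longer in context
        elif path in scalars:
            current[scalars[path]] = value

    walk(json.loads(record_text), "")
    if not rows:
        emit()  # a record without any '[*]' array still yields one row
    return rows
```

Running extract on the three sample records above, with columns ("Col1", "key1", ""), ("Col2", "key2.ary[*]", 0) and, for the last record, ("Col3", "key3.ary[*]", 0), gives the same rows as described in the text. Placing scalar members before the arrays in the input is what lets their values be known when the array rows are emitted.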