aq_ord [-h] Global_Opt Input_Spec Sort_Spec Output_Spec
Global_Opt:
[-test] [-verb] [-stat] [-bz ReadBufSiz]
Input_Spec:
[-f[,AtrLst] File [File ...]] [-d ColSpec [ColSpec ...]]
Sort_Spec:
-sort[,AtrLst] ColTyp:ColNum | -sort[,AtrLst] ColName [ColName ...]
Output_Spec:
[-o[,AtrLst] File] [-c ColName [ColName ...]]
aq_ord sorts input records according to the values of the sort columns.
Sort is done in memory, so it is fast.
However, the entire data set must fit into a single machine’s main memory.
The program offers two sort modes. One is fast and simple but less flexible.
The other requires more processing overhead but is more versatile.
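The trade-off between the two modes can be sketched in Python. This is only an illustration, not how aq_ord is implemented: the function names, the naive split on commas (no CSV quoting) and the use of `sys.intern` as a stand-in for the string hash table are all assumptions made for the example.

```python
import sys

def raw_sort(lines, col_num, convert=str):
    """Raw-style sort: keep each row as its original text and
    interpret only the sort column (col_num is 1-based)."""
    def key(line):
        return convert(line.rstrip("\n").split(",")[col_num - 1])
    return sorted(lines, key=key)

def parsed_sort(lines, converters, sort_idx):
    """Parsed-style sort: convert every column up front, then sort
    the converted rows on one or more column indexes (0-based)."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split(",")
        rows.append(tuple(conv(f) for conv, f in zip(converters, fields)))
    return sorted(rows, key=lambda r: tuple(r[i] for i in sort_idx))

lines = ["b,10\n", "a,9\n", "b,2\n"]

# Raw style: rows come back byte-for-byte unchanged, only reordered.
raw = raw_sort(lines, 2, int)

# Parsed style: strings are shared via interning, numbers become ints.
parsed = parsed_sort(lines, [sys.intern, int], [0, 1])
```

The raw variant does less work per row but can only sort on one column; the parsed variant pays for conversion but can sort on several columns and hand converted values to later steps.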
Raw sort mode
In this mode, raw input rows are stored in memory as-is. Column values are not interpreted except for the sort column. Advantages are:
Disadvantages are:
Parsed sort mode
In this mode, a column spec must be defined. Columns are converted before they are stored in memory: numeric and IP address types are stored in binary form, while strings are hashed and a pointer to the hash entry is stored. Advantages are:
Disadvantages are:
-test
Test command line arguments and exit.
-verb
-stat
Print a record count summary line to stderr at the end of processing. The line has the form:
aq_ord:TagLab rec=Count err=Count
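A script that captures stderr could pick the counts out of this summary line roughly as follows. This is a sketch under assumptions: the counts are taken to be decimal integers, TagLab is treated as an opaque token without spaces, and the sample line uses made-up values.

```python
import re

# Pattern for the documented form "aq_ord:TagLab rec=Count err=Count".
STAT_RE = re.compile(r"^aq_ord:(?P<tag>\S*) rec=(?P<rec>\d+) err=(?P<err>\d+)$")

def parse_stat_line(line):
    """Return the tag and counts from a -stat summary line, or None."""
    m = STAT_RE.match(line.strip())
    if m is None:
        return None
    return {"tag": m.group("tag"),
            "rec": int(m.group("rec")),
            "err": int(m.group("err"))}

# Hypothetical sample line; the values are invented for illustration.
stats = parse_stat_line("aq_ord:TagLab rec=1000 err=2")
```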
-bz ReadBufSiz
ReadBufSiz is a number in bytes.
-f[,AtrLst] File [File ...]
Set the input attributes and files.
If the data come from stdin, set File to ‘-’ (a single dash).
Optional AtrLst is described under Input File Attributes.
If this option is not given, stdin is assumed.
Example:
$ aq_ord ... -f,+1l,eok file1 -f file2 ...
-d ColSpec [ColSpec ...]
Define the columns of the input records from all -f specs.
Only needed in Parsed sort mode.
ColSpec has the form Type[,AtrLst]:ColName.
Up to 256 ColSpec can be defined (excluding X type columns).
Supported Types are:
S - String.
F - Double precision floating point.
L - 64-bit unsigned integer.
LS - 64-bit signed integer.
I - 32-bit unsigned integer.
IS - 32-bit signed integer.
IP - v4/v6 address.
X[Type] - marks an unwanted input column. Type is optional. It can be one of the above (default is S). ColName is also optional. Such a name is simply discarded.
Optional AtrLst is a comma separated list containing:
esc - Denote that the input field uses ‘\’ as escape character. Data exported from databases (e.g. MySQL) sometimes use this format. Be careful when dealing with multibyte character sets because ‘\’ can be part of a multibyte sequence.
noq - Denote that the input field is not quoted. Any quotes in or around the field are considered part of the field value.
hex - For numeric types. Denote that the input field is in hexadecimal notation. A leading 0x is optional. For example, 100 is converted to 256 instead of 100.
trm - Trim leading/trailing spaces from input field value.
lo, up - For S type. Convert input field to lower/upper case.
ColName restrictions:
Example:
$ aq_ord ... -d s:Col1 s,lo:Col2 i,trm:Col3 ...
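The effect of the hex, trm, lo and up attributes on a single field can be sketched in Python. The function `convert_field` is hypothetical and only mirrors the descriptions above; it does not check integer width or signedness and ignores the esc, noq and IP cases.

```python
def convert_field(raw, col_type, attrs=()):
    """Apply trm/lo/up/hex to one input field, then convert it
    according to a -d style type letter (S, F, I, IS, L, LS)."""
    col_type = col_type.upper()
    value = raw
    if "trm" in attrs:
        value = value.strip(" ")      # trim leading/trailing spaces
    if col_type == "S":
        if "lo" in attrs:
            value = value.lower()
        elif "up" in attrs:
            value = value.upper()
        return value
    if col_type in ("I", "IS", "L", "LS"):
        base = 16 if "hex" in attrs else 10
        return int(value, base)       # base 16 accepts an optional 0x prefix
    if col_type == "F":
        return float(value)
    raise ValueError(f"unsupported type: {col_type}")

hex_plain = convert_field("100", "I", ("hex",))
hex_prefixed = convert_field("0x100", "I", ("hex",))
lowered = convert_field("  MiXed ", "s", ("trm", "lo"))
```

Both hex calls yield 256, matching the note that a leading 0x is optional.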
In this -d example, the trm attribute on Col3 removes blanks around the value before it is converted to an internal number.
-sort[,AtrLst] ColTyp:ColNum
Define the Raw sort mode sort column.
ColTyp specifies the sort column’s data type. See -d for a list of types; X is not supported.
ColNum specifies the column number (position) of the sort column in each row. ColNum of the first column is 1.
Optional AtrLst is a comma separated list containing:
dec - Sort in descending order. Default order is ascending. Descending sort is done by inverting the ascending sort result.
Example:
$ aq_ord ... -sort s:2
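The note that a descending sort inverts the ascending result has a visible consequence for rows with equal keys, which the Python sketch below illustrates. It assumes a stable ascending sort and comma-separated rows without quoting; whether aq_ord's own ascending sort is stable is not stated here, and `raw_sort_column` is a hypothetical helper.

```python
def raw_sort_column(lines, col_num, descending=False):
    """Sort rows as strings on a 1-based column; for descending order,
    reverse the ascending result instead of sorting with a reversed key."""
    ordered = sorted(lines, key=lambda ln: ln.split(",")[col_num - 1])
    if descending:
        ordered.reverse()
    return ordered

rows = ["1,pear", "2,apple", "3,pear"]
asc = raw_sort_column(rows, 2)
desc = raw_sort_column(rows, 2, descending=True)
```

In `asc` the two pear rows keep their input order (1 before 3); in `desc` they appear as 3 before 1, because the whole ascending list is flipped.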
-sort[,AtrLst] ColName [ColName ...]
Define the Parsed sort mode sort columns.
ColNames must already be defined under -d.
Optional AtrLst is a comma separated list containing:
dec - Sort in descending order. Default order is ascending. Descending sort is done by inverting the ascending sort result.
Example:
$ aq_ord ... -d i:Col1 s:Col2 ... -sort Col2 Col1
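A multi-column sort like the one in this example can be sketched in Python as a sort on a tuple of keys. The sketch assumes that the first listed column is the primary key and later columns break ties, which the text above does not state explicitly; `sort_by_columns` and the sample rows are made up for illustration.

```python
def sort_by_columns(rows, col_names, sort_cols):
    """Sort already-converted rows on the named columns, in the order given."""
    # A name that was never defined raises ValueError, mirroring the
    # requirement that sort columns be defined under -d first.
    idx = [col_names.index(name) for name in sort_cols]
    return sorted(rows, key=lambda r: tuple(r[i] for i in idx))

col_names = ["Col1", "Col2"]            # as in: -d i:Col1 s:Col2
rows = [(3, "b"), (1, "b"), (2, "a")]   # values after conversion

result = sort_by_columns(rows, col_names, ["Col2", "Col1"])
```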
[-o[,AtrLst] File] [-c ColName [ColName ...]]
Output data rows.
Optional “-o[,AtrLst] File” sets the output attributes and file.
If File is a ‘-’ (a single dash), data will be written to stdout.
Optional AtrLst is described under Output File Attributes.
In the Raw sort mode, most output attributes have no effect since the records are not altered (only their order). The -c option is not applicable either.
In the Parsed sort mode, optional “-c ColName [ColName ...]” selects the columns to output. ColName refers to a column in the data set. Without -c, all columns are selected by default. If -c is specified without a previous -o, output to stdout is assumed.
Multiple sets of “-o ... -c ...” can be specified.
Example:
$ aq_ord ... -d s:Col1 s:Col2 s:Col3 ... -o,esc,noq - -c Col2 Col1
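The column selection in this example (output Col2 first, then Col1, dropping Col3) can be sketched in Python as follows. The helper `select_columns` is hypothetical, and Python's csv writer uses its own quoting rules, so the exact quoting of real aq_ord output may differ.

```python
import csv
import io

def select_columns(col_names, rows, out_cols):
    """Write a label row plus the chosen columns, in the order requested."""
    idx = [col_names.index(name) for name in out_cols]
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(out_cols)                 # column name label row
    for row in rows:
        writer.writerow([row[i] for i in idx])
    return buf.getvalue()

col_names = ["Col1", "Col2", "Col3"]
rows = [("a", "b", "c"), ("d", "e", "f")]
text = select_columns(col_names, rows, ["Col2", "Col1"])
```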
If successful, the program exits with status 0. Otherwise, the program exits with a non-zero status code along with error messages printed to stderr. Applicable exit codes are:
Each input file can have these comma separated attributes:
eok - Make errors non-fatal. If there is an input error, the program will try to skip over bad/broken records. If there is a record processing error, the program will just discard the record.
qui - Quiet; i.e., do not print any input/processing error message.
tsv - Input is in TSV format (default is CSV).
sep=c - Use ‘c’ (single byte) as the column separator.
bin - Input is in binary format (default is CSV).
esc - ‘\’ is an escape character in input fields (CSV or TSV).
noq - No quotes around fields (CSV).
+Num[b|r|l] - Specifies the number of bytes (b suffix), records (r suffix) or lines (no suffix or l suffix) to skip before processing.
By default, input files are assumed to be in formal CSV format. Use the tsv, esc and noq attributes to set input characteristics as needed.
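To make the attribute syntax concrete, here is a small Python sketch that splits an input AtrLst such as the `+1l,eok` used in the -f example into its parts. The function `parse_input_attrs` and its result layout are assumptions for illustration; it also assumes the separator given to sep= is a single ASCII character other than a comma.

```python
import re

FLAGS = {"eok", "qui", "tsv", "bin", "esc", "noq"}
SKIP_RE = re.compile(r"^\+(\d+)([brl]?)$")
UNITS = {"b": "bytes", "r": "records", "l": "lines", "": "lines"}

def parse_input_attrs(atr_lst):
    """Split a comma separated input attribute list into flags,
    an optional separator and an optional skip count."""
    attrs = {"flags": set(), "sep": None, "skip": None}
    for tok in atr_lst.split(","):
        skip = SKIP_RE.match(tok)
        if tok in FLAGS:
            attrs["flags"].add(tok)
        elif tok.startswith("sep=") and len(tok) == 5:
            attrs["sep"] = tok[4]
        elif skip:
            # No suffix means lines, the same as an explicit l suffix.
            attrs["skip"] = (int(skip.group(1)), UNITS[skip.group(2)])
        else:
            raise ValueError(f"unknown attribute: {tok!r}")
    return attrs

parsed = parse_input_attrs("+1l,eok")
```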
Each output file can have these comma separated attributes:
app - Append to file; otherwise, file is overwritten by default.
bin - Output in binary format (default is CSV).
esc - Use ‘\’ to escape ‘,’, ‘"’ and ‘\’ (CSV).
noq - Do not quote string fields (CSV).
fmt_g - Use “%g” as print format for F type columns. Only use this to aid data inspection (e.g., during integrity check or debugging).
notitle - Suppress the column name label row from the output. A label row is normally included by default.
By default, output is in CSV format. Use the esc and noq attributes to set output characteristics as needed.
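Two of these attributes are easy to illustrate in Python. The sketch below shows backslash escaping of the three characters named under esc, and why “%g” formatting is suited only to inspection: it keeps six significant digits by default. The helper `escape_field` is hypothetical and says nothing about how aq_ord combines escaping with quoting.

```python
def escape_field(text):
    """Prefix ',', '"' and '\\' with a backslash, as described for esc."""
    out = []
    for ch in text:
        if ch in ',"\\':
            out.append("\\")
        out.append(ch)
    return "".join(out)

escaped = escape_field('a,b "c" \\d')

# "%g" rounds to six significant digits, so precision is lost on output.
compact = "%g" % 1234567.891
short = "%g" % 0.5
```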