Record preprocessor
aq_pp [-h] Global_Opt Input_Spec Prep_Spec Process_Spec Output_Spec
Global_Opt:
[-verb] [-stat] [-test]
Input_Spec:
[-f[,AtrLst] File [File ...]] [-d ColSpec [ColSpec ...]] |
[-exp[,AtrLst]|-cnt[,AtrLst] DbName[:TabName] [ExpOpts ...] --]
[-cat[,AtrLst] File [File ...] ColSpec [ColSpec ...]]
Prep_Spec:
[-seed RandSeed]
[-rownum StartNum]
[-var ColSpec Val]
[-alias ColName AltName]
[-renam ColName NewName]
Process_Spec:
[-eval ColSpec|ColName Expr]
[-mapf[,AtrLst] ColName MapFrom] [-mapc ColSpec|ColName MapTo]
[-kenc ColSpec|ColName ColName [ColName ...]]
[-kdec ColName ColSpec|ColName[+] [ColSpec|ColName[+] ...]]
[-filt FilterSpec]
[-map[,AtrLst] ColName MapFrom MapTo]
[-sub[,AtrLst] ColName File [File ...] [ColSpec ...]]
[-grep[,AtrLst] ColName File [File ...] [ColSpec ...]]
[-cmb[,AtrLst] File [File ...] ColSpec [ColSpec ...]]
[-pmod ModSpec [ModSrc]]
Output_Spec:
[-o[,AtrLst] File] [-c ColName [ColName ...]]
[-ovar[,AtrLst] File [-c ColName [ColName ...]]]
[-imp[,AtrLst] DbName[:TabName] [-mod ModSpec [ModSrc]]]
aq_pp
is a stream-based record processing tool.
It loads and processes records one at a time through these simple steps:
Other characteristics of the tool include:
With its stream-based design, aq_pp
can process an unlimited amount of
data using a constant amount of memory.
For this reason, it is well suited for the pre-processing of large amounts of
raw data, where the extracted and transformed result is used to generate
higher level analytics.
-test
Test command line arguments and exit.
If specified twice (-test -test), a more thorough test will be
attempted. For example, the program will try to load lookup files and
connect to Udb in test mode.
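For instance, a hypothetical dry run (file and column names are illustrative) that validates the options and attempts to load the lookup file without processing any data:
$ aq_pp -test -test -f data.csv -d s:Col1 i:Col2 -sub Col1 lookup.csv -o out.csv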
-verb
Enable verbose mode.
-stat
Print a record count summary line to stderr at the end of processing. The line has the form:
aq_pp: rec=Count err=Count out=Count
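As an illustration (file and column names are hypothetical), the summary line is printed to stderr after the run completes:
$ aq_pp -stat -f data.csv -d s:Col1 i:Col2 -o out.csv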
-f[,AtrLst] File [File ...]
Set the input attributes and files. See the aq_tool input specifications manual for details.
Example:
$ aq_pp ... -f,+1l file1 file2 ...
-d ColSpec [ColSpec ...]
Define the input data columns.
See the aq_tool input specifications manual for details.
In general, ColSpec has the form Type[,AtrLst]:ColName.
Supported Types are:
S - String.
F - Double precision floating point.
L - 64-bit unsigned integer.
LS - 64-bit signed integer.
I - 32-bit unsigned integer.
IS - 32-bit signed integer.
IP - IPv4/v6 address.
Optional AtrLst is a comma separated list of column specific attributes.
ColName is the column name (case insensitive). It can contain up to 31 alphanumeric and ‘_’ characters. Its first character cannot be a digit.
Example:
$ aq_pp ... -d s:Col1 s,lo:Col2 i,trm:Col3 ...
In the example, the trm attribute removes blanks around the value before it is converted to an internal number.
-exp[,AtrLst]|-cnt[,AtrLst] DbName[:TabName] [ExpOpts ...] --
Get the input data from an Udb export or count operation.
This will set the data source as well as the column definitions,
so -f and -d are not needed.
DbName
is the database name (see Target Udb Database).
TabName
is a table/vector name in the database to export.
If TabName
is not given or if it is a ”.” (a dot), the primary keys
will be exported/counted.
Optional AtrLst
is a comma separated list containing:
spec=UdbSpec - Set the spec file directly (see Target Udb Database).
ExpOpts are the -exp or -cnt related options as described in aq_udb (except -o which is not applicable here).
A -- must be specified following the last ExpOpts. Options given after -- will be interpreted as aq_pp options.
Example:
$ aq_pp ... -exp mydb:Test -filt 'Col3 > 123456789' -- ...
$ aq_pp ... -exp mydb:Test -- -filt 'Col3 > 123456789' ...
In the first command, -filt is an export option applied by aq_udb; in the second, it is an option of aq_pp.
-cat[,AtrLst] File [File ...] ColSpec [ColSpec ...]
Add rows from Files
to the -f data set.
The file and column specifications are the same as in the -f and -d
options.
See the aq_tool input specifications manual for details.
Note that the columns need not be the same as those from -d (by name).
If they differ, a super set is constructed.
Multiple -cat
can be used such that the final data set will contain
unique columns from -d and all -cat.
Columns that do not exist in a data set will be set to zero or blank
when that data set is loaded.
Example:
$ aq_pp ... -d s:Col1 s:Col2 i:Col3 s:Col4 ... -cat more.csv i:Col3 s:Col1 s:Col5 s:Col6 ...
In the example, additional rows are loaded from “more.csv”. Columns Col3 and Col1 are common, so the resulting data set will have Col1, Col2, Col3, Col4, Col5 and Col6.
Since the main data set does not have Col5 and Col6, they are set to blank when it is loaded.
Similarly, since “more.csv” does not have Col2 and Col4, they are set to blank when it is loaded.
-seed RandSeed
Set the seed of the random number generator used by the $Random evaluation builtin variable.
Default seed is 1.
-rownum StartNum
Set the starting row number of the $RowNum evaluation builtin variable.
StartNum is the index of the first row. Default starting row index is 1.
-var ColSpec Val
Define a new variable and initialize its value to Val.
A variable stores a value that persists between rows over the entire run.
Recall that normal column values change from row to row.
ColSpec is the variable’s spec in the form Type:ColName, where Type is the data type and ColName is the variable’s name. See -d for details.
Note that a string Val must be quoted; see String Constant spec for details.
Example:
$ aq_pp ... -d i:Col1 ... -var 'i:Sum' 0 ... -eval 'Sum' 'Sum + Col1' ...
-alias ColName AltName
Create an alias for the previously defined column/variable ColName. AltName is the desired alias. An alias allows the same column to be addressed using multiple names.
If the original column is no longer needed, use -renam instead.
-renam ColName NewName
Rename the previously defined column/variable/alias ColName. NewName is the new name of the column/variable/alias.
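For example, a sketch (file and column names are illustrative) that renames one input column and aliases another, then selects the new names for output:
$ aq_pp -f data.csv -d s:usr_nm i:evt_cnt -renam usr_nm UserName -alias evt_cnt Count -o out.csv -c UserName Count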
-eval ColSpec|ColName Expr
Evaluate Expr
and save the result to a column. The column can be a new
column, an existing column/variable or null as explained below.
If a - (a single dash) is given as the target column, the result will not be saved anywhere. This is useful when calling a function that puts its result in designated columns by itself.
If ColSpec is given, a new column will be created using the spec. See -d for details. Note that the new column cannot participate in Expr.
If ColName is given, it must refer to a previously defined column/variable.
Expr is the expression to evaluate.
Data type of the evaluated result must be compatible with the data type of the target column. For example, string result for a string column and numeric result for a numeric column (there is no automatic type conversion; however, explicit conversion can be done using the To*() functions described below).
Operands in the expression can be the names of previously defined columns or variables, constants, builtin variables and functions.
Conversion functions: ToIP(), ToF(), ToI() and ToS().
Builtin variables: $Random, $RowNum.
Standard functions: See aq-emod for a list of supported functions.
Example:
$ aq_pp ... -d i:Col1 ... -eval l:Col_evl 'Col1 * 10' ...
$ aq_pp -rownum 101 ... -d i:Col1 ... -eval i:Seq '$RowNum' ...
$ aq_pp ... -d s:Col1 s:Col2 ... -eval is:Dt 'DateToTime(Col2, "Y.m.d.H.M.S.p") - DateToTime(Col1, "Y.m.d.H.M.S.p")' ...
-mapf[,AtrLst] ColName MapFrom
Extract data from a string column. This option should be used in
conjunction with -mapc.
ColName
is a previously defined column/variable to extract data from.
MapFrom
defines the extraction rule.
Optional AtrLst is a comma separated list containing:
ncas - Do case insensitive pattern match (default is case sensitive).
rx - Do Regular Expression matching.
rx_extended - Do Regular Expression matching. In addition, enable POSIX Extended Regular Expression syntax.
rx_newline - Do Regular Expression matching. In addition, apply certain newline matching restrictions.
If any of the Regular Expression related attributes are enabled, then MapFrom must use the RegEx MapFrom Syntax. Otherwise, it must use the RT MapFrom Syntax.
-mapc ColSpec|ColName MapTo
Render data extracted via previous -mapf into a new column or into an existing column/variable. The column must be of string type.
If ColSpec is given, a new column will be created using the spec. See -d for details.
If ColName is given, it must refer to a previously defined column/variable.
MapTo is the rendering spec. See MapTo Syntax for details.
Example:
$ aq_pp ... -d s:Col1 s:Col2 s:Col3 ... -mapf Col1 '%%v1_beg%%.%%v1_end%%' -mapf,rx Col2 '\(.*\)-\(.*\)' -mapf,rx Col3 '\(.*\)_\(.*\)' -mapc s:Col_beg '%%v1_beg%%,%%1%%,%%4%%' -mapc s:Col_end '%%v1_end%%,%%2%%,%%5%%' ...
In the example, the RegEx MapFrom expressions do not have named placeholders for the extracted data. Placeholders are interpreted implicitly from the expressions in this way:
%%0%% - Represents the entire match in the first -mapf,rx (not used in example).
%%1%% - Represents the 1st subpattern match in the first -mapf,rx.
%%2%% - Represents the 2nd subpattern match in the first -mapf,rx.
%%3%% - Represents the entire match in the second -mapf,rx (not used in example).
%%4%% - Represents the 1st subpattern match in the second -mapf,rx.
%%5%% - Represents the 2nd subpattern match in the second -mapf,rx.
-kenc ColSpec|ColName ColName [ColName ...]
Encode a key column from the given ColNames. The key column must be of string type. The encoded value it stores contains binary data.
If ColSpec is given, a new column will be created using the spec. See -d for details.
If ColName is given, it must refer to a previously defined column/variable.
The source ColNames must be previously defined.
They can have any data type.
Example:
$ aq_pp ... -d s:Col1 i:Col2 ip:Col3 ... -kenc s:Key1 Col1 Col2 Col3 ...
-kdec ColName ColSpec|ColName[+] [ColSpec|ColName[+] ...]
Decode a key column given by ColName
into one or more columns
given by ColSpec
(new column) or ColName
(existing column/variable).
The key ColName
must be an existing string column/variable.
For the decode-to columns, possible specs are:
Type:ColName[+]
ColName[+]
Type:[+]
Note that the decode-to column types must match those used in the original -kenc spec.
Example:
$ aq_pp ... -d s:Key1 ... -kdec Key1 s:Col1 i:Col2 ip:Col3 ...
$ aq_pp ... -d s:Key1 ... -kdec Key1 s: i:Col2 ip: ...
$ aq_pp ... -d s:Key1 ... -kdec Key1 s: i:Col2+ ip:+ -kdec Key1 i: ip:Col3 ...
-filt FilterSpec
Filter (or select) records based on FilterSpec
.
FilterSpec
is a logical expression that evaluates to either true or false
for each record - if true, the record is selected; otherwise, it is
discarded.
It has the basic form [!] LHS [<compare> RHS] where:
! - Negates the result of the comparison. It is recommended that !(...) be used to clarify the intended operation even though it is not required.
==, >, <, >=, <= - LHS and RHS comparison.
~==, ~>, ~<, ~>=, ~<= - LHS and RHS case insensitive comparison; string type only.
!=, !~= - Negation of the above equal operators.
&= - Perform a “(LHS & RHS) == RHS” check; numeric types only.
!&= - Negation of the above.
& - Perform a “(LHS & RHS) != 0” check; numeric types only.
!& - Negation of the above.
More complex expressions can be constructed by using (...) (grouping), ! (negation), || (or) and && (and).
For example:
LHS_1 == RHS_1 && !(LHS_2 == RHS_2 || LHS_3 == RHS_3)
Example:
$ aq_pp ... -d s:Col1 s:Col2 i:Col3 s:Col4 ... -filt 'Col1 == Col4 && Col2 != "" && Col3 >= 100' ...
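As a further illustration of the bit-test operators (column name hypothetical), the following selects rows whose Flags column has both of its two lowest bits set:
$ aq_pp ... -d i:Flags ... -filt 'Flags &= 3' ...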
-map[,AtrLst] ColName MapFrom MapTo
Remap (a.k.a., rewrite) a string column’s value.
ColName
is a previously defined column/variable.
MapFrom
defines the extraction rule.
MapTo
is the rendering spec. See MapTo Syntax for details.
Optional AtrLst is a comma separated list containing:
ncas - Do case insensitive pattern match (default is case sensitive).
rx - Do Regular Expression matching.
rx_extended - Do Regular Expression matching. In addition, enable POSIX Extended Regular Expression syntax.
rx_newline - Do Regular Expression matching. In addition, apply certain newline matching restrictions.
If any of the Regular Expression related attributes are enabled, then MapFrom must use the RegEx MapFrom Syntax. Otherwise, it must use the RT MapFrom Syntax.
Example:
$ aq_pp ... -d s:Col1 ... -map Col1 '%%v1_beg%%-%*' 'beg=%%v1_beg%%' ...
$ aq_pp ... -d s:Col1 ... -map,rx Col1 '\(.*\)-*' 'beg=%%1%%' ...
-sub[,AtrLst] ColName File [File ...] [ColSpec ...]
Replace the values of ColName
, a string column in the current data set,
with values from a lookup table loaded from Files
.
Optional AtrLst is a comma separated list containing:
ncas - Do case insensitive match (default is case sensitive).
pat - Support ‘?’ and ‘*’ wild cards in the “From” value. Literal ‘?’, ‘*’ and ‘\’ must be escaped by a ‘\’. Without this attribute, “From” value is assumed constant and no escape is necessary.
req - Discard records not matching any entry in the lookup table. Normally, column value will remain unchanged if there is no match.
all - Use all matches. Normally, only the first match is used. With this attribute, one row is produced for each match.
ColSpecs define the input columns as described in the aq_tool input specifications manual.
The spec is optional, default is “S:from S:to” (or just “from to”).
If a spec is defined, it must include these 2 columns (by name):
from - Marks the column used to match the value of ColName. It must have a string type.
to - Marks the column used as the new value of ColName. It must have a string type.
The from values are generally literals. Patterns can be used if the pat attribute described above is set. The to values are always literals.
Matches are carried out according to the order of the match values in the files. Match stops when the first match is found. If the files contain both exact values and patterns, then:
Example:
$ aq_pp ... -d s:Col1 ... -sub Col1 lookup.csv TO X FROM ...
In the example, the lookup file does not have the default “from to” format, so the column spec must be given. The X in the spec marks an unneeded column.
-grep[,AtrLst] ColName File [File ...] [ColSpec ...]
Filter by matching the value of ColName
, a string column in the current
data set, against the values loaded from Files
.
Optional AtrLst is a comma separated list containing:
ncas - Do case insensitive match (default is case sensitive).
pat - Support ‘?’ and ‘*’ wild cards in the “From” value. Literal ‘?’, ‘*’ and ‘\’ must be escaped by a ‘\’. Without this attribute, match value is assumed constant and no escape is necessary.
ColSpecs define the input columns as described in the aq_tool input specifications manual.
The spec is optional, default is “S:from” (or just “from”).
If a spec is defined, it must include 1 column (by name):
from - Marks the column used to match the value of ColName. It must have a string type.
The from values are generally literals. Patterns can be used if the pat attribute described above is set.
Matches are carried out according to the order of the match values in the files. Match stops when the first match is found. If the files contain both exact values and patterns, then:
Example:
$ aq_pp ... -d s:Col1 ... -grep,rev Col1 lookup.csv X X FROM ...
The X’s in the spec mark the unneeded columns.
-cmb[,AtrLst] File [File ...] ColSpec [ColSpec ...]
Combine data from Files
into the current data set by joining rows
from both data sets. The new data set will contain unique columns from
both sets. Common columns are automatically used as the join keys
(see ColSpec
description on how to customize join keys).
Optional AtrLst is a comma separated list containing:
ncas - Do case insensitive match (default is case sensitive).
req - Discard unmatched records.
all - Use all matches. Normally, only the first match is used. With this attribute, one row is produced for each match.
ColSpecs define the input columns as described in the aq_tool input specifications manual, with these column attribute extensions:
key - Marks a column as being a join key. It must be a common column. This is the default for a common column.
cmb - Marks a column to be combined into the current data set. This is the default for a non-common column. It is typically used to mark a common column as not a join key.
Example:
$ aq_pp ... -d s:Col1 s:Col2 i:Col3 s:Col4 ... -cmb lookup.csv i:Col3 s:Col1 s:Col5 s:Col6 ...
$ aq_pp ... -d s:Col1 s:Col2 i:Col3 s:Col4 ... -cmb lookup.csv i:Col3 s:Col1 s:Col5 s:Col6 s,cmb:Col2 ...
$ aq_pp ... -d s:Col1 s:Col2 i:Col3 s:Col4 ... -cmb lookup.csv i,key:Col3 s,key:Col1 s,cmb:Col5 s,cmb:Col6 s,cmb:Col2 ...
-pmod ModSpec [ModSrc]
Use the processing function in the given module to process the current record. The function is typically used to implement custom logic.
ModSpec
has the form ModName
or ModName("Arg1", "Arg2", ...)
where ModName
is the module name and Arg*
are module dependent
arguments. Note that the arguments must be string constants;
for this reason, they must be quoted according to the
string constant spec.
ModSrc is an optional module source file. It can be a module file with a .so extension.
Without ModSrc, aq_pp will look for a preinstalled module matching ModName. Standard modules:
unwrap_strv("From_Col", "From_Sep", "To_Col" [, "AtrLst"])
Unwrap a delimiter separated string column into zero or more values. The row will be replicated for each of the unwrapped values. This module requires 3 or 4 arguments:
From_Col - Column containing the string value to unwrap. It must have type S.
From_Sep - The single byte delimiter that separates individual values. The delimiter must be given as-is, no escape is recognized.
To_Col - Column to save each unwrapped value to. It must have type S. The To_Col can be the same as the From_Col - the module will remember the original From_Col value.
AtrLst - Optional. A comma separated attribute list containing:
relax - No trailing delimiter. One is expected by default.
noblank - Skip blank values. Blanks are kept by default.
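As an illustrative sketch (file, column names and delimiter are hypothetical), a row whose Tags value is “a;b;c” would be replicated into three rows, one per tag:
$ aq_pp -f data.csv -d s:Id s:Tags -pmod 'unwrap_strv("Tags", ";", "Tags", "relax,noblank")' -o out.csv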
[-o[,AtrLst] File] [-c ColName [ColName ...]]
Output data rows.
Optional “-o[,AtrLst] File
” sets the output attributes and file.
See the aq_tool output specifications manual for details.
Optional “-c ColName [ColName ...]
” selects the columns to output.
ColName
refers to a previously defined column/variable.
A ColName can be preceded with a ~ (or !) negation mark. This means that the column is to be excluded.
Without -c
, all columns are selected by default. Variables are not
automatically included though.
If -c
is specified without a previous -o
, output to stdout is
assumed.
In case a title line is desired but certain column names are not
appropriate, use -alias or -renam before the -o
to remap the
name of those columns manually.
With -alias, the alternate names must be explicitly selected with -c
.
Multiple sets of “-o ... -c ...
” can be specified.
Example:
$ aq_pp ... -d s:Col1 s:Col2 s:Col3 ... -o,esc,noq - -c Col2 Col1
-ovar[,AtrLst] File [-c ColName [ColName ...]]
Output the final values of all variables defined via the -var option.
Only a single data row is output.
“-ovar[,AtrLst] File
” sets the output attributes and file.
See the aq_tool output specifications manual for details.
Optional “-c ColName [ColName ...]
” selects the variables to output.
ColName
refers to a previously defined variable.
A ColName can be preceded with a ~ (or !) negation mark. This means that the variable is to be excluded.
Without -c
, all variables are selected by default.
In case a title line is desired but certain variable names are not
appropriate, use -alias or -renam before -ovar
to remap the
name of those variables manually.
With -alias, the alternate names must be explicitly selected with -c
.
Multiple sets of “-ovar ... -c ...
” can be specified.
Example:
$ aq_pp ... -d i:Col1 i:Col2 ... -var i:Sum1 0 -var i:Sum2 0 ... -eval Sum1 'Sum1 + Col1' -eval Sum2 'Sum2 + (Col2 * Col2)' ... -ovar - -c Sum1 Sum2
-imp[,AtrLst] DbName[:TabName] [-mod ModSpec [ModSrc]]
Output data to Udb (i.e., perform an Udb import).
DbName
is the database name (see Target Udb Database).
TabName
is a table/vector name in the database.
If TabName
is not given or if it is a ”.” (a dot), a primary key-only
import will be performed.
Columns (including variables) from the current data set matching
the column names of TabName
are automatically selected for import.
In case certain desired columns in the current data set are named
differently from the columns of TabName
, use -alias or -renam
to remap their names manually.
Optional AtrLst is a comma separated list containing:
spec=UdbSpec - Set the spec file directly (see Target Udb Database).
ddef - Allow missing target columns. Normally, it is an error when a target column is missing from the current data set. With this attribute, 0 or blank will be used as the missing columns’ value.
nodelay - Send records to Udb servers as soon as possible. Otherwise, up to 16KB of data may be buffered before an output occurs.
seg=N1[-N2]/N - Apply sampling by selecting segment N1 or segments N1 to N2 (inclusive) out of N segments of unique keys from the input data to import. Keys are segmented based on their hash values. For example, seg=2-4/10 will divide the keys into 10 segments and import segments 2, 3 and 4; segments 1 and 5-10 are discarded.
nobnk - Exclude records with a blank key from the import. This only applies when the primary key is made up of a single string column.
nonew - Tell the server not to create any new key during the import. In other words, records belonging to keys not yet in the DB are discarded.
noold - The opposite of nonew.
Optional “-mod ModSpec [ModSrc]” specifies a module to be loaded on the server side.
ModSpec
has the form ModName
or ModName(Arg1, Arg2, ...)
where ModName
is the module name and Arg*
are module dependent
arguments. Note that the arguments must be literals -
string constants (quoted), numbers or IP addresses.
ModSrc is an optional module source file. It can be a module file with a .so extension.
Without ModSrc, the server will look for a preinstalled module matching ModName.
Multiple sets of Udb import options can be specified.
Example:
$ aq_pp ... -d s:Col1 s:Col2 i:Col3 s:Col4 ... -imp mydb:Test
If successful, the program exits with status 0. Otherwise, the program exits with a non-zero status code along with error messages printed to stderr. Applicable exit codes are:
A string constant must be quoted between double or single quotes. With double quotes, special character sequences can be used to represent special characters. With single quotes, no special sequence is recognized; in other words, a single quote cannot occur between single quotes.
Character sequences recognized between double quotes are:
\\ - represents a literal backslash character.
\" - represents a literal double quote character.
\b - represents a literal backspace character.
\f - represents a literal form feed character.
\n - represents a literal new line character.
\r - represents a literal carriage return character.
\t - represents a literal horizontal tab character.
\v - represents a literal vertical tab character.
\0 - represents a NULL character.
\xHH - represents a character whose HEX value is HH.
\<newline> - represents a line continuation sequence; both the backslash and the newline will be removed.
Sequences that are not recognized will be kept as-is.
Two or more quoted strings can be used back to back to form a single string. For example,
'a "b" c'" d 'e' f" => a "b" c d 'e' f
RT style MapFrom is used in both -mapf and -map options. The MapFrom spec is used to match and/or extract data from a string column’s value. It has this general syntax:
literal_1%*literal_2%?literal_3 - %* matches any number of bytes and %? matches any 1 byte. This is like a pattern comparison.
%%my_var%% - Extract the value into a variable named my_var. my_var can later be used in the MapTo spec.
literal_1%%my_var_1%%literal_2%%my_var_2%% - A common way to extract specific data portions.
literal_1%=literal_2%=literal_3 - %= is used to toggle case sensitive/insensitive match. In the above case, if -mapf or -map does not have the ncas attribute, then literal_1’s match will be case sensitive, but literal_2’s will be case insensitive, and literal_3’s will be case sensitive again.
\%\%not_var\%\%%%my_var%%a_backslash\\others - If a ‘%’ is used in such a way that resembles an unintended MapFrom spec, the ‘%’ must be escaped. Literal ‘\’ must also be escaped. In summary, the following escape sequences are recognized:
\% - represents a literal percent character.
\\ - represents a literal backslash character.
\" - represents a literal double quote character.
\b - represents a literal backspace character.
\f - represents a literal form feed character.
\n - represents a literal new line character.
\r - represents a literal carriage return character.
\t - represents a literal horizontal tab character.
\v - represents a literal vertical tab character.
\0 - represents a NULL character.
\xHH - represents a character whose HEX value is HH.
\<newline> - represents a line continuation sequence; both the backslash and the newline will be removed.
Each %%var%% variable can have additional attributes. The general form of a variable spec is:
%%VarName[:@class][:[chars]][:min[-max]][,brks]%%
where
VarName
is the variable name which can be used in MapTo. VarName can be a
‘*’; in this case, the extracted data is not stored, but the extraction
attributes are still honored.
Note: Do not use numbers as a RT mapping variable name.
:@class restricts the extracted data to belong to a class of characters. class is a code with these values and meanings:
n - Characters 0-9.
a - Characters a-z.
b - Characters A-Z.
c - All printable ASCII characters.
x - The opposite of c above.
s - All whitespaces.
g - Characters in {}[]().
q - Single/double/back quotes.
Multiple classes can be used; e.g., %%my_var:@nab%% for all alphanumerics.
:[chars] ([] is part of the syntax) is similar to the character class described above except that the allowed characters are set explicitly.
Note that ranges are not supported; all characters must be specified. For example, %%my_var:[0123456789abcdefABCDEF]%% (same as %%my_var:@n:[abcdefABCDEF]%%) for hex digits. To include a ‘]’ as one of the characters, put it first, as in %%my_var:[]xyz]%%.
:min[-max]
is the min and optional max length (bytes, inclusive) to
extract. Without a max, the default is unlimited (actually ~64Kb).
,brks
defines a list of characters at which extraction of the variable
should stop. For example, %%my_var,,;:%%
will extract data into my_var
until one of ,;:
or end-of-string is encountered. This usage is often followed by a wild card, as in %%my_var,,;:%%%*.
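Putting these attributes together, a hypothetical rewrite (column name illustrative) that keeps only the leading run of digits, up to 8 of them, from a string column:
$ aq_pp ... -d s:Code ... -map Code '%%num:@n:1-8%%%*' '%%num%%' ...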
Regular expression style MapFrom
can be used in both -mapf and -map
options. MapFrom
defines what to match and/or extract from a string
value of a column.
Differences between RegEx mapping and RT mapping:
The match is not anchored by default; to match the entire value, use ^pattern$.
Extracted data do not have named placeholders; they are referenced by position as %%0%%, %%1%%, and so on. See -mapc for a usage example.
In addition to the escape sequences of Regular Expression itself (\\, \+, \*, etc), the following are also recognized:
\" - represents a literal double quote character.
\b - represents a literal backspace character.
\f - represents a literal form feed character.
\n - represents a literal new line character.
\r - represents a literal carriage return character.
\t - represents a literal horizontal tab character.
\v - represents a literal vertical tab character.
\0 - represents a NULL character.
\xHH - represents a character whose HEX value is HH.
\<newline> - represents a line continuation sequence; both the backslash and the newline will be removed.
Regular Expression is very powerful but also complex. Please consult the GNU RegEx manual for details.
MapTo is used in -mapc and -map. It renders the data extracted by MapFrom into a column. Both RT and RegEx MapTo share the same syntax:
%%my_var%% - Substitute the value of my_var.
literal_1%%my_var_1%%literal_2%%my_var_2%% - A common way to render extracted data.
\%\%not_var\%\%%%my_var%%a_backslash\\others - If a ‘%’ is used in such a way that resembles an unintended MapTo spec, the ‘%’ must be escaped. Literal ‘\’ must also be escaped. See RT MapFrom Syntax for all supported escape sequences.
Each %%var%% variable can have additional attributes. The general form of a variable spec is:
%%VarName[:cnv][:start[:length]][,brks]%%
where
VarName
is the variable to substitute in.
:cnv
sets a conversion method on the data in the variable. Note that the
data is first subjected to the length and break considerations before the
conversion. Supported conversions are:
b64 - Apply base64 decode.
url[Num] - Apply URL decode. Optional Num is a number between 1-99. It is the number of times to apply URL decode.
Normally, only use 1 conversion. If both are specified (in any order), URL decode is always done before base64 decode.
:start
is the starting byte position of the extracted data to substitute.
The first byte has position 0. Default is 0.
:length
is the number of bytes (from start
) to substitute. Default is
till the end.
,brks
defines a list of characters at which substitution of the variable’s
value should stop.
See -mapc for a usage example.
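For instance, a minimal sketch (column name hypothetical) that URL-decodes a string column in place by extracting the whole value and substituting it back with the url conversion:
$ aq_pp ... -d s:Query ... -map Query '%%raw%%' '%%raw:url%%' ...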
aq_pp
obtains information about the target Udb database from a spec file.
The spec file contains server IPs (or domain names) and table/vector
definitions. See udb.spec for details.
aq_pp finds the relevant spec file in several ways:
Via the spec=UdbSpec attribute of the -imp or -exp option.
Via the DbName parameters of the -imp or -exp option. This method sets the spec file to “.conf/DbName.spec” in the runtime directory of aq_pp.
Via a default spec file named “udb.spec” in the runtime directory of aq_pp.
Some of the data processing options can be placed in conditional groups such that different processing rules can be applied depending on the logical result of another rule. The basic form of a conditional group is:
-if[not] RuleToCheck RuleToRun ...
-elif[not] RuleToCheck RuleToRun ...
-else RuleToRun ...
-endif
Groups can be nested to form more complex conditions.
Supported RuleToCheck
and RuleToRun
are
-eval, -mapf, -mapc, -kenc, -kdec,
-filt, -map, -sub, -grep, -cmb, -pmod,
-o and -imp. Note that some of these rules may be responsible for the
initialization of dynamically created columns. If such rules get skipped
conditionally, numeric 0 or blank string will be assigned to the
uninitialized columns.
There are 2 special RuleToCheck:
-true - Evaluate to true.
-false - Evaluate to false.
In addition, there are 3 special RuleToRun for output record disposition control (they do not change any data):
-skip - Do not output current row.
-quit - Stop processing entirely.
-quitafter - Stop processing after the current input record.
Example:
$ aq_pp ... -d i:Col1 ... -if -filt 'Col1 == 1' -eval s:Col2 '"Is-1"' -elif -filt 'Col1 == 2' -false -else -eval Col2 '"Others"' -endif ...
$ aq_pp ... -d i:Col1 s:Col2 ... -if -filt 'Col1 == 1' -o Out1 -elif -filt 'Col1 == 2' -o Out2 -c Col2 -endif ...
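A further sketch (column name and sentinel value are hypothetical) using the record disposition rules: rows with a blank Col1 are dropped, and processing stops entirely once a sentinel value is seen:
$ aq_pp ... -d s:Col1 ... -if -filt 'Col1 == ""' -skip -elif -filt 'Col1 == "END"' -quit -endif ...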