Udb module script compiler
mcc.umod in_script [out.c|out.cpp] [out.so]
This is the Udb module script compiler. It converts a script written in C/C++ and module commands into a dynamic module for Udb.
This compiler is normally used internally by aq_udb -mod
for on-the-fly module generation. However, it can also be used to develop
modules manually.
Simply install the manually created module (the .so
file) in the
appropriate location and Udb will be able
to use it.
in_script
The input module script. Set in_script to ‘-’ (a single dash) to read the script from standard input.
out.c|out.cpp
Save the intermediate source to an output file. This is a C/C++ source file generated from the input module script. It closely resembles the original script except for some added support/interface code.
The output file must have a .c or .cpp extension. Only one of the two can be specified.
Saving this output is optional. Use it for debugging or to help with module development as needed.
out.so
The output dynamic module. It must have a .so extension.

Module commands abstract and hide most of the module API details.
They resemble C macros, as in COMMAND(parameters).
The commands consist of declaration statements,
processing function specifications and
module helpers.
They tell the module compiler what code to generate
before building the final dynamic module.
A module script is primarily a C/C++ source with certain embedded module commands. This is a sample script that normalizes a column:
DECL_LANG(C);
DECL_COLUMN_DYNAMIC(Tab.Col_in_out, F);
DECL_END;

MOD_INIT_FUNC()
{
    /* Expect one argument: the real column to normalize. */
    if (arg_n != 1)
        return 0;
    if (!MOD_COLUMN_BIND(Tab.Col_in_out, arg[0]))
        return 0;
    return 1;
}

MOD_KEY_FUNC()
{
    CDAT_F_T sum;

    sum = 0;
    MOD_TABLE_SCAN(Tab) {          /* pass 1: total for this key */
        sum += $Tab.Col_in_out;
    }
    if (sum != 0) {
        MOD_TABLE_SCAN(Tab) {      /* pass 2: divide by the total */
            $Tab.Col_in_out /= sum;
        }
    }
    return 1;
}
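The arithmetic of this sample can be checked outside Udb with plain C. In the minimal sketch below, a double array stands in for the rows of Tab under one key, and normalize_by_sum() is a hypothetical name; its two loops play the roles of the two MOD_TABLE_SCAN() passes.

```c
#include <stddef.h>

/* Scale values[0..n-1] so that they sum to 1, mirroring the two-pass
 * MOD_KEY_FUNC(): the first loop sums, the second divides.
 * Values are left untouched when the sum is 0, just as the script
 * guards its division. Returns the original sum. */
static double normalize_by_sum(double *values, size_t n)
{
    double sum = 0;
    size_t i;

    for (i = 0; i < n; i++)        /* first scan: accumulate */
        sum += values[i];

    if (sum != 0) {
        for (i = 0; i < n; i++)    /* second scan: rescale in place */
            values[i] /= sum;
    }
    return sum;
}
```

For the values 1 and 3, the sum is 4 and the values become 0.25 and 0.75.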
Columns are type specific. Column types are defined in the data spec. In the module script, a C/C++ variable of the appropriate type must be used when copying or manipulating column values. These are the Udb column types and their corresponding module types/typedefs:
Spec type  Program typedef  Module typedef  Description
S          HStr *           CDAT_S_T        A pointer to a hash string data structure. It represents a stored string value.
F          double           CDAT_F_T        A double precision floating point number.
L          u_int64_t        CDAT_L_T        An unsigned (non-negative) 64-bit integer.
LS         int64_t          CDAT_LS_T       A signed 64-bit integer.
I          u_int32_t        CDAT_I_T        An unsigned (non-negative) 32-bit integer.
IS         int32_t          CDAT_IS_T       A signed 32-bit integer.
IP         NetIp            CDAT_IP_T       An IP address data structure.
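One practical consequence of the unsigned types is that arithmetic on L and I values wraps around instead of going negative. The sketch below uses standard <stdint.h> types as stand-ins for the module typedefs; the SKETCH_* names and the two helpers are hypothetical, and a real script should use the CDAT_*_T typedefs directly.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the integer module typedefs, for testing
 * column arithmetic outside Udb. */
typedef uint64_t SKETCH_L_T;   /* like CDAT_L_T  (u_int64_t) */
typedef int64_t  SKETCH_LS_T;  /* like CDAT_LS_T (int64_t)   */
typedef uint32_t SKETCH_I_T;   /* like CDAT_I_T  (u_int32_t) */
typedef int32_t  SKETCH_IS_T;  /* like CDAT_IS_T (int32_t)   */

/* Decrement a value from an unsigned (I) column. Unsigned arithmetic
 * wraps modulo 2^32, so 0 - 1 becomes 4294967295, not -1. */
static SKETCH_I_T dec_i(SKETCH_I_T v)
{
    return v - 1;
}

/* The same operation on a signed (IS) value goes negative as expected
 * (v must not be INT32_MIN, which would overflow). */
static SKETCH_IS_T dec_is(SKETCH_IS_T v)
{
    return v - 1;
}
```

dec_i(0) yields 4294967295 while dec_is(0) yields -1, so check for zero before decrementing an unsigned column whenever a negative result is possible.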
Declaration statements are used to declare variables and options. The compiler interprets these declarations and determines what code to generate. For example, column declarations will result in column handling code, variable declarations will result in variable handling code, and so on.
DECL_LANG(Lang);
Tell the compiler what programming language is being used in the script.
Lang can be either C or CPP. The default is C.
Example:
DECL_LANG(C);
C is the default, so this declaration is not strictly necessary.

DECL_BUILD_OPT(Arguments);
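Supply custom command line arguments for the compiler. Use cases are:
- Add an include directory, e.g., -Imy_include_directory.
- Set a preprocessor define, e.g., -DMY_DEF=1.
- Link with a library archive, e.g., my_dir/my_lib.a.
- Link with a system library, e.g., -lm for the math library.

Example: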
DECL_BUILD_OPT(-DMY_VERSION_STRING='"1.1.1"' -lm);
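A define passed this way is consumed in the script like any other preprocessor macro. The sketch below assumes the DECL_BUILD_OPT() example above; module_version() and the "unknown" fallback are hypothetical, and the #ifndef guard keeps the script compilable when the option is left out. The single quotes around "1.1.1" presumably exist so that the macro expands to a C string literal.

```c
/* MY_VERSION_STRING is expected from DECL_BUILD_OPT(-DMY_VERSION_STRING=...).
 * The fallback value is an arbitrary placeholder for this sketch. */
#ifndef MY_VERSION_STRING
#define MY_VERSION_STRING "unknown"
#endif

/* Return the version string baked in at build time. */
static const char *module_version(void)
{
    return MY_VERSION_STRING;
}
```

A module could, for instance, pass module_version() to ModLog() during initialization.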
DECL_COLUMN(TabName.ColName, ColType);
Declare a column for use in the script.
TabName.ColName is a column in the database, and ColType is its spec type (see column datatypes).
The given name and type will be verified at run time during module initialization to ensure that the spec is valid.
To declare a column of the Var table, set TabName to Var.
To declare the primary key, set TabName to the special name PKEY (all uppercase).

Example:
DECL_COLUMN(TabName_1.ColName_1, I);
Here TabName_1 and ColName_1 are actual table and column names.
They are specified as-is, like a variable name (not as a string).

DECL_COLUMN_DYNAMIC(TabName.ColName, ColType);
Declare a column for the script just like DECL_COLUMN(), except that the actual target table and column names are not known until run time (hence, dynamic).
Example:
DECL_COLUMN_DYNAMIC(Tab.Col_in_out, F);

MOD_INIT_FUNC()
{
    /* Bind the placeholder name to a real column at run time. */
    if (!MOD_COLUMN_BIND(Tab.Col_in_out, "RealTable.RealColumn"))
        return 0;
    ...
}
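Because the real name of a dynamic column often arrives as a module argument, a script may want to reject malformed input before binding it. A minimal sketch, assuming the Table.Column form used in the examples; split_col_name() is a hypothetical helper, not part of the module API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: check that name has the form "Table.Column".
 * On success, copy the table part into tab (capacity tab_size),
 * point *col at the column part and return 1. Return 0 if the name
 * is NULL, has no dot, has an empty part, or the table part is too long. */
static int split_col_name(const char *name, char *tab, size_t tab_size,
                          const char **col)
{
    const char *dot;
    size_t tab_len;

    if (name == NULL)
        return 0;
    dot = strchr(name, '.');
    if (dot == NULL || dot == name || dot[1] == '\0')
        return 0;                   /* need non-empty table and column */
    tab_len = (size_t)(dot - name);
    if (tab_len >= tab_size)
        return 0;                   /* table part does not fit in tab */
    memcpy(tab, name, tab_len);
    tab[tab_len] = '\0';
    *col = dot + 1;
    return 1;
}
```

In MOD_INIT_FUNC(), such a check could run on arg[0] before it is passed to MOD_COLUMN_BIND().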
DECL_DATA(VarDecl);
Declare one or more variables as the module’s instance specific data. Unlike global variables, which are shared between concurrent instances of the same module, variables declared this way are instance specific (i.e., each instance has its own copies of the variables). This is the recommended way of managing module data.
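The difference between instance data and globals can be pictured in plain C. This is a conceptual sketch, not the code mcc generates: InstanceData, instance_new() and instance_row() are hypothetical, and calloc() stands in for the automatic zero initialization of DECL_DATA variables.

```c
#include <stdlib.h>

/* Hypothetical per-instance data, modelled on
 * DECL_DATA(int flag); DECL_DATA(int num1, num2); */
typedef struct {
    int flag;
    int num1, num2;
} InstanceData;

/* A global is shared by every instance loaded in the process. */
static int shared_count = 0;

/* calloc() zero-fills, so num1 and num2 start at 0.
 * Returns NULL on allocation failure. */
static InstanceData *instance_new(int flag)
{
    InstanceData *d = calloc(1, sizeof *d);
    if (d != NULL)
        d->flag = flag;
    return d;
}

/* Process one row: bump this instance's counter and the shared one. */
static void instance_row(InstanceData *d)
{
    if (d->flag == 1)
        d->num1 += 1;
    else if (d->flag == 2)
        d->num2 += 1;
    shared_count += 1;   /* every instance updates the same variable */
}
```

After two rows on one instance and one row on another, each instance holds only its own counts, while shared_count has seen all three. With truly concurrent instances, updates to such a global would also need synchronization.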
VarDecl is a variable declaration such as int num1, num2.
MOD_DATA(num1) and MOD_DATA(num2) then access the values of those integers.

Example:
DECL_DATA(int flag);
DECL_DATA(int num1, num2);

MOD_INIT_FUNC()
{
    if (...)
        MOD_DATA(flag) = 1;
    else
        MOD_DATA(flag) = 2;
    ...
}

MOD_ROW_FUNC(TabName_1)
{
    if (MOD_DATA(flag) == 1)
        MOD_DATA(num1) += 1;
    else if (MOD_DATA(flag) == 2)
        MOD_DATA(num2) += 1;
    ...
}
Here flag is conditionally set to 1 or 2 during module initialization. num1 and num2 are initialized to 0 automatically.

DECL_END;
Mark the end of the declaration statements.
The processing functions carry out the intended task of a module. There are several predefined module functions: one optional initialization function, one or more processing functions and one optional wrap-up function. If any of them are defined, the compiler generates code that calls them automatically.
A module function is defined like a C function:
PREDEFINED_FUNCTION_NAME(function_dependent_argument) { code_block ... }
Use the predefined function name (one of the MOD_*_FUNC() names) and the function dependent argument specification exactly as documented below (see also “etc/include/umod.h”).

MOD_INIT_FUNC()
Define a function for module initialization.
The function has these implicit variables:
ModCntx *mod - A module instance handle. Pass this to any support functions that use module helpers.
const char *const *arg, int arg_n - The parameters passed to the module on the command line are available here as a string array of arg_n elements. Use them to set up run time parameters as necessary.

Example:
MOD_INIT_FUNC()
{
    if (arg_n != 1)
        return 0;
    if (!MOD_COLUMN_BIND(Tab.Col_in_out, arg[0]))
        return 0;
    return 1;
}
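This binds Tab.Col_in_out to the column named by the single module argument. It returns 0 if there is not exactly one argument or if the bind fails (arg and arg_n are implicit variables in the function).

MOD_KEY_FUNC()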
Define a function for key-level processing during an Udb export/count/scan operation.
ModCntx *mod
- A module instance handle. Pass this to any support
functions that use module helpers.

Example:
MOD_KEY_FUNC()
{
    CDAT_F_T sum;

    sum = 0;
    MOD_TABLE_SCAN(Tab) {
        sum += $Tab.Col_in_out;
    }
    if (sum != 0) {
        MOD_TABLE_SCAN(Tab) {
            $Tab.Col_in_out /= sum;
        }
    }
    return 1;
}
This rescales Tab.Col_in_out so that its values within each key sum to 1.
Use $TabName.ColName (or MOD_CDAT()) to address a column’s value.

MOD_ROW_FUNC(TabName)
Define a function for row processing during an Udb export/count/scan operation on TabName.
The function is called for each row of TabName for each key.
The row iterator of TabName is automatically set to the relevant row. For this reason, do not use MOD_TABLE_SCAN() or MOD_TABLE_SET() on TabName. If a TabName scan is needed, use DECL_COLUMN_DYNAMIC() and MOD_COLUMN_BIND() to bind the same table to another name and scan using that name instead.
ModCntx *mod - A module instance handle. Pass this to any support functions that use module helpers.

Example:
MOD_ROW_FUNC(TabName_1)
{
    if ($TabName_1.ColName_1 >= 100 && $TabName_1.ColName_1 <= 199)
        return 1;
    return 0;
}
This keeps a row if ColName_1 is between 100 and 199 and discards it otherwise.
Use $TabName.ColName (or MOD_CDAT()) to address a column’s value.

MOD_VALUE_FUNC(TabName)
Define a function that checks whether to import the input values of a new row during an Udb import operation on TabName.
The function is called for each new row of TabName for a key.
Var table.
ModCntx *mod - A module instance handle. Pass this to any support functions that use module helpers.

Example:
MOD_VALUE_FUNC(TabName_1)
{
    if (MOD_IMP_CDAT(TabName_1.ColName_1) >= 100 &&
        MOD_IMP_CDAT(TabName_1.ColName_1) <= 199)
        return 1;
    return 0;
}
This imports a new row if its input value of ColName_1 is between 100 and 199 and discards it otherwise.

MOD_MERGE_FUNC(TabName)
Define a function that checks whether to merge the input values of a new row into an existing data row during an Udb import operation on TabName.
The function is called for each new row of TabName for a key.
The row iterator of TabName is automatically set to the existing row. For this reason, do not use MOD_TABLE_SCAN() or MOD_TABLE_SET() on TabName. If a TabName scan is needed, use DECL_COLUMN_DYNAMIC() and MOD_COLUMN_BIND() to bind the same table to another name and scan using that name instead.
ModCntx *mod - A module instance handle. Pass this to any support functions that use module helpers.

Example:
MOD_MERGE_FUNC(TabName_1)
{
    if (MOD_IMP_CDAT(TabName_1.ColName_1) == $TabName_1.ColName_1)
        return 1;
    return 0;
}
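This merges the new row if its input value of ColName_1 is the same as the existing one and discards it otherwise.
Use MOD_IMP_CDAT() to address a column’s input value and $TabName.ColName (or MOD_CDAT()) to address a column’s existing value.

MOD_DONE_FUNC()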
Define a function that performs module wrap-up tasks. It is called before Udb unloads the module.
ModCntx *mod
- A module instance handle. Pass this to any support
functions that use module helpers.

Example:
MOD_DONE_FUNC()
{
    ModLog("%s done\n", MOD_NAME);
}
These are helpers that are designed specifically for module processing tasks.
They can be used in any processing functions or subroutines called
from these functions (these subroutines must be given a ModCntx *mod
argument).
int MOD_COLUMN_BIND(TabName.ColName, const char *real_name)
Dynamic column setup function. It returns nonzero on success and 0 on failure.
TabName.ColName must be a column declared via DECL_COLUMN_DYNAMIC().
real_name is a C string containing the actual column name in Table.Column form (e.g., "RealTable.RealColumn").

MOD_TABLE_SCAN(TabName) { ... }
A macro that expands to a for
loop over all rows of the given table.
TabName must be a table declared via DECL_COLUMN() or DECL_COLUMN_DYNAMIC().

MOD_TABLE_SCAN_R(TabName, RowData *row) { ... }
A macro that expands to a for
loop over all rows of the given table.
The given row variable is used as the row iterator.
TabName must be a table declared via DECL_COLUMN() or DECL_COLUMN_DYNAMIC().

MOD_TABLE_SET(TabName)
A macro that sets the internal table specific row iterator of the given table to the first row of the table. No return value.
TabName must be a table declared via DECL_COLUMN() or DECL_COLUMN_DYNAMIC().
Use this when a scan (a for loop) is not necessary.

Example:
DECL_COLUMN(TabName_1.ColName_1, I);
DECL_COLUMN(VecName_2.ColName_1, I);

MOD_KEY_FUNC()
{
    CDAT_I_T sum;

    sum = 0;
    MOD_TABLE_SCAN(TabName_1) {
        sum += $TabName_1.ColName_1;
    }
    MOD_TABLE_SET(VecName_2);      /* first row of VecName_2, no loop */
    $VecName_2.ColName_1 = sum;
    ...
}
This stores the sum of TabName_1.ColName_1 over all rows of TabName_1 in vector column VecName_2.ColName_1.

MOD_TABLE_SET_R(TabName, RowData *row)
A macro that sets the given row
variable
to the first row of the table. No return value.
TabName must be a table declared via DECL_COLUMN() or DECL_COLUMN_DYNAMIC().
Use this when a scan (a for loop) is not necessary.

Example:
DECL_COLUMN(TabName_1.ColName_1, I);
DECL_COLUMN(VecName_2.ColName_1, I);

MOD_KEY_FUNC()
{
    CDAT_I_T sum;
    RowData *row;                  /* caller-owned row iterator */

    sum = 0;
    MOD_TABLE_SCAN_R(TabName_1, row) {
        sum += MOD_CDAT_R(TabName_1.ColName_1, row);
    }
    MOD_TABLE_SET_R(VecName_2, row);
    MOD_CDAT_R(VecName_2.ColName_1, row) = sum;
    ...
}
This is the MOD_TABLE_SET() example rewritten with the *_R() constructs.

RowData *MOD_ROW(TabName)
A macro that returns the internal row iterator of the given table.
TabName must be a table declared via DECL_COLUMN() or DECL_COLUMN_DYNAMIC().

Example:
DECL_COLUMN(TabName_1.ColName_1, I);

MOD_KEY_FUNC()
{
    MOD_TABLE_SET(TabName_1);
    if (!MOD_ROW(TabName_1))       /* no first row: the table is empty */
        return 0;
    ...
}
This returns 0 from the key function if TabName_1 is empty.

CDAT_*_T MOD_CDAT(TabName.ColName), CDAT_*_T $TabName.ColName
Use either form like a program variable to address the value of a column in the current row.
The value has the CDAT_*_T type (see column datatypes) derived from ColType in the declaration.

Example:
DECL_COLUMN(TabName_1.InNumColumn, I);
DECL_COLUMN_DYNAMIC(TabName_1.OutNumColumn, I);

MOD_INIT_FUNC()
{
    MOD_COLUMN_BIND(TabName_1.OutNumColumn, "TabName_1.RealColumn");
    ...
}

MOD_ROW_FUNC(TabName_1)
{
    if ($TabName_1.InNumColumn == 4321)
        $TabName_1.OutNumColumn += 1;
    ...
}
CDAT_*_T MOD_CDAT_R(TabName.ColName, RowData *row)
Use this like a program variable to address the value of a column in the given row.
The value has the CDAT_*_T type (see column datatypes) derived from ColType in the declaration.
The row variable must be one that has been prepared through MOD_TABLE_SET_R(), MOD_TABLE_SCAN_R() or equivalent.

Example:
#include <pthread.h>

DECL_COLUMN(Tab.Col, I);

/* Each thread sums Tab.Col with its own row iterator. */
static void *ParallelScan(void *ag)
{
    ModCntx *mod = (ModCntx *)ag;  /* helpers need "mod" in scope */
    RowData *row;                  /* private to this thread */
    CDAT_I_T sum = 0;              /* must start at 0 */

    MOD_TABLE_SCAN_R(Tab, row) {
        sum += MOD_CDAT_R(Tab.Col, row);
    }
    ModLog("%s: sum=%u\n", MOD_NAME, sum);
    return 0;
}

MOD_KEY_FUNC()
{
    pthread_t tid_1, tid_2;

    pthread_create(&tid_1, 0, ParallelScan, mod);
    pthread_create(&tid_2, 0, ParallelScan, mod);
    pthread_join(tid_1, 0);
    pthread_join(tid_2, 0);
    ...
}
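Each thread scans Tab with its own row variable. If the *_R() constructs were not used, both threads would share the table's internal row iterator, and the program would either crash or produce the wrong sums.

CDAT_*_T MOD_IMP_CDAT(TabName.ColName)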
Use this like a program variable to address the input value of a column.
The value has the CDAT_*_T type (see column datatypes) derived from ColType in the declaration.

Example:
MOD_VALUE_FUNC(TabName_1)
{
    if (MOD_IMP_CDAT(TabName_1.ColName_1) < 100)
        return 0;
    ...
}
int MOD_HAS_KEY
void MOD_CDAT_S_NSET(TabName.ColName, const char *b, unsigned int n)
Set the value of the given column in the current row to a hash string based on string buffer b and length n.
Example:
DECL_COLUMN(TabName_1.StrColumn_1, S);

MOD_ROW_FUNC(TabName_1)
{
    MOD_CDAT_S_NSET(TabName_1.StrColumn_1, "abc", 3);
    ...
}
void MOD_CDAT_S_NSET_R(TabName.ColName, const char *b, unsigned int n, RowData *row)
void MOD_CDAT_S_SET(TabName.ColName, CDAT_S_T hs)
Set the value of the given column in the current row to a copy of hash string hs.
hs is an existing hash string (e.g., the value of another string column).

Example:
DECL_COLUMN(TabName_1.StrColumn_1, S);
DECL_COLUMN(TabName_1.StrColumn_2, S);

MOD_ROW_FUNC(TabName_1)
{
    MOD_CDAT_S_SET(TabName_1.StrColumn_1, $TabName_1.StrColumn_2);
    ...
}
void MOD_CDAT_S_SET_R(TabName.ColName, CDAT_S_T hs, RowData *row)
void MOD_CDAT_S_DEL(TabName.ColName)
void MOD_CDAT_S_DEL_R(TabName.ColName, RowData *row)
const ColDefn *MOD_CDEF(TabName.ColName)
MOD_DATA(variable)
const char *MOD_NAME
MOD_LOG_ERR(const char *format, ...)
Print a message to the Udb server log. If it is called during module initialization, the same message will be returned to the client.
Example:
MOD_INIT_FUNC()
{
    if (arg_n != 1) {
        MOD_LOG_ERR("missing module argument");
        return 0;
    }
    ...
}
Generic programming support and convenience functions for module specific datatype handling.
Note that any memory allocated by the module must be deallocated with free() before the module is unloaded (see MOD_DONE_FUNC()).
int ModDifHStr(const CDAT_S_T hs1, const CDAT_S_T hs2, int dif_flag)
Compare the values of 2 hash strings.
Return 0 if they are equal, 1 if hs1 is greater, and -1 otherwise.
dif_flag is either 0 (case sensitive comparison) or DIF_A_NCAS (case insensitive comparison).

Example:
DECL_COLUMN(TabName_1.StrColumn_1, S);
DECL_COLUMN(TabName_1.StrColumn_2, S);

MOD_ROW_FUNC(TabName_1)
{
    if (ModDifHStr($TabName_1.StrColumn_1, $TabName_1.StrColumn_2, 0) == 0)
        ...
    ...
}
int ModDifHStrStr(const CDAT_S_T hs, const char *b, int n, int dif_flag)
Compare the value of hash string hs to string buffer b of length n.
Return 0 if they are equal, 1 if hs is greater, and -1 otherwise.
dif_flag is either 0 (case sensitive comparison) or DIF_A_NCAS (case insensitive comparison).

Example:
DECL_COLUMN(TabName_1.StrColumn_1, S);

MOD_ROW_FUNC(TabName_1)
{
    if (ModDifHStrStr($TabName_1.StrColumn_1, "abc", 3, 0) == 0)
        ...
    ...
}
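The 0 / 1 / -1 result convention can be illustrated with a plain C comparison over buffers. In this sketch, dif_buf() is hypothetical, nocase stands in for DIF_A_NCAS, and the ordering rules (bytewise comparison, a shorter common prefix sorts first) are assumptions rather than documented Udb behavior.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical three-way comparison of a[0..an) and b[0..bn).
 * Returns 0 if equal, 1 if a is greater, -1 otherwise.
 * nocase != 0 compares letters case insensitively. */
static int dif_buf(const char *a, size_t an, const char *b, size_t bn,
                   int nocase)
{
    size_t i, n = an < bn ? an : bn;

    for (i = 0; i < n; i++) {
        int ca = (unsigned char)a[i];
        int cb = (unsigned char)b[i];
        if (nocase) {
            ca = tolower(ca);
            cb = tolower(cb);
        }
        if (ca != cb)
            return ca > cb ? 1 : -1;
    }
    if (an == bn)
        return 0;
    return an > bn ? 1 : -1;   /* equal prefix: the longer buffer is greater */
}
```

Tests against the result should use == 0 for equality, as the examples above do, rather than relying on a particular nonzero value.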
int ModDifHStrPat(const CDAT_S_T hs, const char *pat, int n, int dif_flag)
Compare the value of hash string hs to pattern buffer pat of length n.
pat may contain ‘*’ (for any number of bytes) and ‘?’ (for any 1 byte). Use a ‘\’ to escape a literal ‘*’, ‘?’ or ‘\’ in the pattern. If the pattern is given as a string literal, any backslashes in it must be escaped once more for the C/C++ compiler (e.g., "a\\*c" for the pattern a\*c).
dif_flag can have these values:

Example:
DECL_COLUMN(TabName_1.StrColumn_1, S);

MOD_ROW_FUNC(TabName_1)
{
    if (ModDifHStrPat($TabName_1.StrColumn_1, "a*c", 3, 0) == 0)
        ...
    ...
}
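The pattern rules above can be sketched as a small matcher. This is an illustration under stated assumptions, not Udb's implementation: pat_match() is hypothetical, it is case sensitive only, and it returns 1 on a match (whereas the ModDifHStrPat() example treats 0 as a match).

```c
#include <stddef.h>

/* Hypothetical matcher: '*' matches any number of bytes (including none),
 * '?' matches exactly one byte, and '\' makes the next pattern byte literal.
 * Returns 1 if all of s[0..sn) matches all of p[0..pn), 0 otherwise. */
static int pat_match(const char *s, size_t sn, const char *p, size_t pn)
{
    size_t si = 0, pi = 0;
    size_t star_pi = (size_t)-1;   /* pattern position just after the last '*' */
    size_t star_si = 0;            /* input position where that '*' stopped */

    while (si < sn) {
        if (pi < pn && p[pi] == '*') {
            star_pi = ++pi;        /* first let '*' match zero bytes */
            star_si = si;
        } else if (pi < pn && p[pi] == '?') {
            si++;
            pi++;
        } else if (pi + 1 < pn && p[pi] == '\\' && p[pi + 1] == s[si]) {
            si++;                  /* escaped literal byte */
            pi += 2;
        } else if (pi < pn && p[pi] != '\\' && p[pi] == s[si]) {
            si++;
            pi++;
        } else if (star_pi != (size_t)-1) {
            pi = star_pi;          /* backtrack: '*' absorbs one more byte */
            si = ++star_si;
        } else {
            return 0;
        }
    }
    while (pi < pn && p[pi] == '*')   /* trailing stars match nothing */
        pi++;
    return pi == pn;
}
```

For example, pat_match("a*c", 3, "a\\*c", 4) matches only the literal string a*c, while pat_match("abc", 3, "a*c", 3) matches through the wildcard.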
int ModDifIp(const CDAT_IP_T *ip1, const CDAT_IP_T *ip2)
Compare the values of 2 IP addresses. Note that the arguments are pointers to IP address structures.
Return 0 if they are equal, 1 if ip1 is greater, and -1 otherwise.

Example:
DECL_COLUMN(TabName_1.IPColumn_1, IP);
DECL_COLUMN(TabName_1.IPColumn_2, IP);

MOD_ROW_FUNC(TabName_1)
{
    /* ModDifIp() takes pointers, hence the & operators. */
    if (ModDifIp(&$TabName_1.IPColumn_1, &$TabName_1.IPColumn_2) == 0)
        ...
    ...
}
void ModLog(const char *format, ...)
Print a message to the Udb server log.
Example:
MOD_INIT_FUNC()
{
    if (arg_n != 1) {
        ModLog("%s: missing module argument\n", MOD_NAME);
        return 0;
    }
    ...
}
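The allocation helpers that follow have two conventions worth seeing in code: ReAlloc() takes the address of the pointer it updates, and StrNDup() treats a negative length as "b is null terminated". The sketch below mimics both with standard C calls. sketch_realloc() and sketch_strndup() are hypothetical, and their return conventions (1 for success and 0 for failure, a null terminated copy) are assumptions.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the ReAlloc() convention: orig_mem is the
 * ADDRESS of the pointer to (re)allocate. realloc(NULL, n) behaves like
 * malloc(n), which covers the "original address is NULL" case. */
static int sketch_realloc(void *orig_mem, size_t new_size)
{
    void **pp = (void **)orig_mem;
    void *p = realloc(*pp, new_size);

    if (p == NULL)
        return 0;          /* on failure the original block stays valid */
    *pp = p;
    return 1;
}

/* Hypothetical stand-in for the StrNDup() length rules. The copy is
 * null terminated here and must be released with free(). */
static char *sketch_strndup(const char *b, int n)
{
    char *d;

    if (b == NULL)
        return NULL;                /* NULL in, NULL out, whatever n is */
    if (n < 0)
        n = (int)strlen(b);         /* b must be null terminated here */
    d = malloc((size_t)n + 1);
    if (d == NULL)
        return NULL;
    memcpy(d, b, (size_t)n);
    d[n] = '\0';
    return d;
}
```

Passing &ptr rather than ptr is what lets the helper replace the caller's pointer when the block moves.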
void *ZAlloc(size_t size)
Allocate size bytes of memory. This is the same as the C function malloc() except that the returned memory is initialized to zero.

Type *ZALLOC_TYPE(Type)
Allocate a zero-initialized object of type Type. This is a macro based on ZAlloc().

Type *ZALLOC_TYPE_N(Type, int num)
Allocate num zero-initialized objects of type Type. This is a macro based on ZAlloc().

int ReAlloc(void *orig_mem, size_t new_size)
This function works like a combination of the C functions malloc() and realloc(): it allocates new_size bytes if the original memory address is NULL, or reallocates the memory to new_size bytes otherwise.
orig_mem is the address of the original memory address (i.e., an address of an address).

char *StrNDup(const char *b, int n)
Duplicate a data buffer b of length n (i.e., allocate memory and copy the data).
If b is NULL, NULL is returned regardless of the value of n.
If n is greater than or equal to 0, b need not be null terminated.
If n is less than 0, b must be null terminated, and the string length of b is used as the data length.

BUF_INIT(BufData *buf)
Initialize a BufData structure.
This should be done on any uninitialized BufData structure before it is used for the first time.

BUF_CLEAR(BufData *buf)
Clear a BufData structure. Do this before destroying a BufData structure.

int BufNCat(BufData *buf, const char *b, int n)
Append data buffer b of length n to the buffer in BufData structure buf.
The buf->s string is null terminated.
If b is NULL, the size of buf->s will be increased by n (if necessary), but no data will be copied. In other words, buf->s and buf->z may change, but buf->n will not.
If n is greater than or equal to 0, b need not be null terminated.
If n is less than 0, b must be null terminated, and the string length of b is used as the data length.

void HStrNSet(const ColDefn *col, CDAT_S_T *hs, const char *b, unsigned int n)
Replace hash string hs with one based on string buffer b and length n.
hs must have a value on input: either a valid hash string or 0.
If hs is the value of a column, specify the relevant column definition as col. This is similar to what MOD_CDAT_S_NSET() does.
If hs is not the value of a column, set col to 0.

Example:
DECL_DATA(CDAT_S_T my_str);

MOD_INIT_FUNC()
{
    /* my_str starts as 0, a valid input value for HStrNSet(). */
    HStrNSet(0, &MOD_DATA(my_str), "abc", 3);
    ...
}
...
MOD_DONE_FUNC()
{
    HStrDel(0, &MOD_DATA(my_str));
    ...
}
void HStrSet(const ColDefn *col, CDAT_S_T *hs, CDAT_S_T s)
Replace hash string hs with a copy of s.
hs must have a value on input: either a valid hash string or 0.
If hs is the value of a column, specify the relevant column definition as col. This is similar to what MOD_CDAT_S_SET() does.
If hs is not the value of a column, set col to 0.

void HStrDel(const ColDefn *col, CDAT_S_T *hs)
Delete (dereference) hash string hs. hs will be set to a generic blank hash string on return.
hs must have a value on input: either a valid hash string or 0.
If hs is the value of a column, specify the relevant column definition as col. This is similar to what MOD_CDAT_S_DEL() does.
If hs is not the value of a column, set col to 0.

Additional resources can be found in the low level include file “etc/include/umod.h”.
The ability to address columns by their names is a key feature of
the module script API. Both TabName.ColName
and $TabName.ColName
are designed to address columns, but they differ in these ways:
- TabName.ColName (without the leading dollar sign) refers to an abstract column reference. It is only valid in module helpers.
- $TabName.ColName (with the leading dollar sign) is a shorthand for MOD_CDAT(TabName.ColName). It refers to a column’s value. It acts like a program variable of type CDAT_*_T (see column datatypes) and can be used anywhere program variables are appropriate.
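The distinction is similar to the one between a descriptor and a dereferencing macro in plain C. In this conceptual sketch, ColRef, Row, CDAT() and bump_if_match() are hypothetical and say nothing about how Udb actually stores rows. The point is that the value form is an lvalue that can be read, assigned or updated with +=, while the reference form only names a column.

```c
#include <stddef.h>

/* An abstract reference (like TabName.ColName): it names a column
 * but holds no value. */
typedef struct {
    size_t col;               /* column index within a row */
} ColRef;

/* A row of I-type values. */
typedef struct {
    unsigned int cells[4];
} Row;

/* The value form (like $TabName.ColName): reference + row -> lvalue. */
#define CDAT(ref, row) ((row)->cells[(ref).col])

/* Mirrors the MOD_CDAT() example: increment out when in holds 4321. */
static void bump_if_match(const ColRef in, const ColRef out, Row *row)
{
    if (CDAT(in, row) == 4321)    /* read through the value form */
        CDAT(out, row) += 1;      /* update through the value form */
}
```

A helper receives the reference form because it needs to know which column to work on; ordinary expressions use the value form because they need the data itself.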