 |
|
| |
AWKA(1) |
USER COMMANDS |
AWKA(1) |
awka - AWK language to ANSI C translator and library
awka [-c fn] [-X] [-x]
[-t] [-o filename] [-a args] [-w
args] [-I include-dir] [-i include-file]
[-L lib-dir] [-l lib-file] [-f
progname] [-d] [program] [--] [exe-args]
awka [-version] [-help]
Awka is two products - a translator of AWK language
programs to ANSI-C, and a library of essential functions against which the
translated code must be linked.
The AWK language is useful for maniplation of datafiles,
text retrieval and processing, and for prototyping and experimenting with
algorithms. Usually AWK is implemented as an interpretive language - there
are several good free interpreters available, notably gawk,
mawk and 'The One True Awk' maintained by Brian Kernighan.
This manpage does not explain how AWK works - refer to the SEE
ALSO section at the end of this page for references.
Awka is a new awk meaning it implements the AWK
language as defined in Aho, Kernighan and Weinberger, The AWK Programming
Language, Addison-Wesley Publishing 1988. Awka includes features
from the Posix 1003.2 (draft 11.3) definition of the AWK language, but does
not necessarily conform in entirety to Posix standards. Awka also
provides a number of extensions not found in other implementations of
AWK.
- -c fn
- Instead of producing a 'main' function, awka will instead generate
'fn' as a controlling function. This is useful where the compiled C
code is to be linked in with a larger application. The -c argument
is not compatible with the -X and -x arguments. See the
section USING awka -c below for more details on how to use this
option.
- -X
- awka will generate C code, which will then be compiled into an
executable, using the C compiler and intallation paths defined when
Awka was installed. The C code will be stored in 'awka_out.c' and
the executable in 'awka.out' or 'awka_out.exe'.
- -x
- The same as -X, except that the compiled program will also be
executed using arguments following the '--' option on the
command-line.
- -t
- To be used in conjunction with -x. The C file and the executable
will be removed following execution of the program.
- -o filename
- To be used in conjunction with -x and -X. The generated
executable will be called 'filename' rather than the default
'awka.out'.
- -a args
- This embeds executable command-line arguments within the translated code
itself. For example, awka -X -a "-We" file.awk will
create an awka.out that will already have -We in its command-line when it
is run. To see what arguments have been embedded in an executable, use
-showarg at runtime.
- -w args
- Prints various warnings to stderr, useful in debugging large, complex AWK
programs. None of these are errors - all are acceptable uses of the AWK
language. Depending on your programming style, however, they could be
useful in narrowing down where problems may be occuring. args can
contain the following characters:-
a - prints a list of all global variables.
b - warns about variables set to a value but not
referenced.
c - warns about variables referenced but not set to a
value.
d - reports use of global vars within a function.
e - reports use of global vars within just one
function.
f - requires declaration of global variables.
g - warns about assignments used as truth
expressions.
NOTE: As at version 0.5.8 only a, b and c are implemented.
- -I include-dir
- Specifies a directory in which include files required by awka, or defined
by the user, reside. You may use as many -I options as you like.
- -i include-file
- Specifies an include filename to be inserted in the translated code.
- -L lib-dir
- Specifies a directory containing libraries that may be required by awka,
or defined for linking by the user. See the awka-elm manpage for
more details.
- -l lib-file
- Specifies a library file to be linked to the translated code generated by
awka at compile time (this only really makes sense if using awka -x). The
lib-file is specified in the same way as C compilers, that is, the library
libmystuff.a would be referred to as "-l mystuff".
Again, see the awka-elm manpage for details on awka
extension libraries. Like the three previous options, you can use this
as often as you like on a commandline.
- -f progname
- Specifies the name of an AWK language program to be translated to C.
Multiple -f arguments may be specified.
- program
- An AWK language program on the command-line, usually surrounded by single
quotes (').
- --
- All arguments following this will be passed to the compiled executable
when it is executed. This argument only makes sense when -x has been
specified.
- exe-args
- Arguments to be passed directly to the executable when it is run.
- -h
- Prints a short summary of command-line options.
- -v
- Prints version information then quits.
An executable formed by compiling Awka-generated code against
libawka.a will also understand several command-line arguments.
- -help
- Prints a short summary of executable command-line options, then
exits.
- -We
- Following command-line arguments will be stored in the ARGV array, and not
parsed as options.
- -Wi
- Sets unbuffered writes to stdout and line buffered reads from stdin.
- -v var=value
- Sets variable 'var' to 'value'. 'var' must be a defined scalar variable
within the original AWK program else an error message will be
generated.
- -F value
- Sets FS to value.
- -showarg
- Displays any embedded command-line arguments, then exits.
- -awkaversion
- Shows which version of awka generated the .c code for the executable.
awka contains a number of builtin functions may or may not
presently be found in standard AWK implementations. The functions have been
added to extend functionality, or to provide a faster method of performing
tasks that AWK could otherwise undertake in an inefficient way.
The new functions are:-
- totitle(s)
- converts a string to Title or Proper case, with the first letter of each
word uppercased, the remainder lowercased.
- abort()
- Exits the AWK program immediately without running the END section.
Originally from TAWK, Gawk now supports abort() as well.
- alength(a)
- returns the number of elements stored in array variable a.
- asort(src
[,dest])
- The function introduced in Gawk 3.1.0. From Gawk's manpage, this
"returns the number of elements in the source array src. The
contents of src are sorted using awka's normal rules for
comparing values, and the indexes of the sorted values of src are
replaced with sequential integers starting with 1. If the optional
destination array dest is specified, then src is first
duplicated into dest, and then dest is sorted, leaving the
indexes of the source array src unchanged."
- ascii(s,n)
- Returns the ascii value of character n in string s. If
n is omitted, the value of the first character will be returned. If
n is longer than the string, the last character will be returned. A
Null string will result in a return value of zero.
- char(n)
- Returns the character associated with the ascii value of n. In
effect, this is the complement of the ascii function above.
- left(s,n)
- Returns the leftmost n characters of string s. This is more
efficient than a call to substr.
- right(s,n)
- Returns the rightmost n characters of string s.
- ltrim(s,
c)
- Returns a string with the preceding characters in c removed from
the left of s. For instance, ltrim(" hello", "h
") will return "ello". If c is not specified,
whitespace will be trimmed.
- rtrim(s,
c)
- Returns a string with the preceding characters in c removed from
the right of s. For instance, ltrim(" hello",
"ol") will return " he". If c is not specified,
whitespace will be trimmed.
- trim(s, c)
- Returns a string with the preceding characters in c removed from
each end of s. For instance, trim(" hello", "oh
") will return "ell". If c is not specified,
whitespace will be trimmed. The three trim functions are considerably more
efficient than calls to sub or gsub.
- min(x1,x2,...,xn)
- Returns the lowest number in the series x1 to xn. A minimum
of two and a maximum of 255 numbers may be passed as arguments to
Min.
- max(x1,x2,...,xn)
- Returns the highest number in the series x1 to xn. A minimum
of two and a maximum of 255 numbers may be passed as arguments to
Max.
- time(year,mon,day,hour,sec)
time()
- returns a number representing the date & time in seconds since the
Epoch, 00:00:00GMT 1 Jan 1970. The arguments allow specification of a
date/time, while no arguments will return the current time.
- systime()
- returns a number representing the current date & time in seconds since
the Epoch, 00:00:00 GMT 1 Jan 1970. This function was included to increase
compatibility with Gawk.
- strftime(format,
n)
- returns a string containing the time indicated by n formatted
according to format. See strftime(3) for more details on format
specification. This function was included to increase compatibility with
Gawk.
- gmtime(n)
gmtime()
- returns a string containing Greenwich Mean Time, in the form:-
Fri Jan 8 01:23:56 1999
n is a number specifying seconds since 1 Jan 1970,
while a call with no arguments will return a string containing the
current time.
- localtime(n)
localtime()
- returns a string containing the date & time adjusted for the local
timezone, including daylight savings. Output format & arguments are
the same as gmtime.
- mktime(str)
- The same as mktime() introduced in Gawk 3.1.0. See Gawk's manpage for a
detailed description of what this function does.
- and(y,x)
- Returns the output of 'y & x'.
- or(y,x)
- Returns the output of 'y | x'.
- xor(y,x)
- Returns the output of 'y ^ x'.
- compl(y)
- Returns the output of '~y'.
- lshift(y,x)
- Returns the output of 'y << x'.
- rshift(y,x)
- Returns the output of 'y >> x'.
- argcount()
- When called from within a function, returns the number of arguments that
were passed to that function.
- argval(n[, arg,
arg...])
- When called from within a function, returns the value of variable n
in the argument list. The optional arg parameters are index
elements used if variable n is an array. You may not specify values
for n that are larger than argcount().
- getawkvar(name[,
arg, arg...])
- Returns the value of global variable "name". The optional
arg parameters work in the same as for argval. The variable
specified by name must actually exist.
- gensub(r,s,f[,v])
- Implementation of Gawk's gensub function. It should perform exactly the
same as it does in Gawk. See Gawk's documentation for details on how to
use gensub.
The SORTTYPE variable controls if and how arrays are sorted
when accessed using 'for (i in j)'. The value of this variable is a bitmask,
which may be set to a combination of the following values:-
0 No Sorting
1 Alphabetical Sorting
2 Numeric Sorting
4 Reverse Order
A value for SORTTYPE of 5, therefore, indicates that the
array is to be sorted Alphabetically, in Reverse order.
Awka also supports the FIELDWIDTHS variable, which works
exactly as it does in Gawk.
If the FIELDWIDTHS variable is set to a space separated
list of positive numbers, each field is expected to have fixed width, and
awka will split up the record using the widths specified in
FIELDWIDTHS. The value of FS is ignored. Assigning a value to
FS overrides the use of FIELDWIDTHS, and restores the default
behaviour.
Awka also introduces the SAVEWIDTHS variable. This applies
when FIELDWIDTHS is in use, and $0 is being rebuilt following
a change to a $1..$n field variable.
If the SAVEWIDTHS variable is set to a space separated list
of positive numbers, each output field will be given a fixed width to match
these numbers. $n values shorter than their specified width will be
padded with spaces; if they are longer than their specified width they will
be truncated. Additional values to those specified in SAVEWIDTHS will
be separated using OFS.
Awka 0.7.5 supports the inet/coprocessing features introduced in
Gawk 3.1.0. See the documentation accompanying the Gawk source, or visit
http://home.vr-web.de/Juergen.Kahrs/gawk/gawkinet.html for details on
how these work.
The command-line arguments above provide a range of ways in which
awka may be used, from output of C code to stdout, through to an
automatic translation compile and execution of the AWK program.
(a) Producing C code:-
1. awka -f myprog.awk >myprog.c
2. awka -c main_one -f myprog.awk -f other.awk >myprog.c
(b) Producing C code and an executable:-
awka -X -f myprog.awk -f other.awk
(c) Producing the C and Executable, run the executable:-
awka -x -f myprog.awk -f other.awk -- input.txt
Afterwards, you could run the executable directly, as in:-
Running the same program using an interpreter such as mawk
would be done as follows:-
mawk -f myprog.awk -f other.awk input.txt
The following will run the program, passing it -v on the
command-line without it being interpreted as an 'option':-
awka.out -We -v input.txt, OR
awka -x -f myprog.awk -- -We -v input.txt
(d) Producing and running the executable, ensuring it
and the C program file are automatically removed:-
awka -x -t -f myprog.awk -f other.awk -- input.txt
(e) A simplistic example of how awka might be used in a
Makefile:-
myprog: myprog.o
gcc myprog.o -lawka -lm -o myprog
myprog.o: myprog.c
myprog.c: myprog.awk
awka -f myprog.awk >myprog.c
The C programs produced by awka call many functions in
libawka.a. This library needs to be linked with your program for a
workable executable to be produced.
Note that when using the -x and -X arguments this is
automatically taken care of for you, so linking is only an issue when you
use Awka to produce C code, which you then compile yourself. Many people
many only wish to use Awka in this way, and never use awka-generated
code as part of larger applications. If this is you, you needn't worry too
much about this section.
As well as linking to libawka.a, your program will also
need to be linked to your system's math library, typically libm.a or
libm.so.
Typical compiler commands to link an awka executable might
be as follows:-
gcc myprog.c -L/usr/local/lib -I/usr/local/include -lawka -lm -o myprog
OR
awka -c my_main -f myprog.awk >myprog.c
gcc -c myprog.c -I/usr/local/include -o myprog.o
gcc -c other.c -o other.o
gcc myprog.o other.o -L/usr/local/lib -lawka -lm -o myapp
If you are not sure of how your compiler works you should consult
the manpage for the compiler. In release 0.7.5 Awka introduced Gawk-3.1.0's
inet and coprocess features. On some platforms this may require you to link
to the socket and nsl libraries (-lsocket -lnsl). To check this, look at
config.h after running the configure script. The #define awka_SOCKET_LIBS
indicate what, if any, extra libraries are required on your system.
The -c option, as described previously, replaces the main()
function with a function name of your choosing. You may then link this code
to other C or C++ code, and thus add AWK functionality to a larger
application.
The command line "awka -c matrix 'BEGIN { print "what is
the matrix?" }'" will produce in its output the function "int
matrix(int argc, char *argv[])". Obviously, this replaces the main()
function, and the argc and argv variables are used the same way - they
handle what awka thinks are command-line arguments. Hence argv is an array
of pointers to char *'s, and argc is the number of elements in this array.
argv[0], from the command-line, holds the name of the running program. You
can populate as many argv[] elements as you like to pass as input to your
AWK program. Just remember this array is managed by your calling function,
not by awka.
That's just about it. You should be able to call your awka
function (eg matrix()) as many times as you like. It will grab a little bit
of memory for itself, but you should see no growing memory use with each
call, as I've taken quite some time to eliminate any potential memory leaks
from awka code.
Oh, one more thing, exit and abort statements in
your AWK program code will still exit your program altogether, so be careful
of where & how you use them.
Awka also allows you to create your own C functions and have them
accessible in your AWK programs as if they were built-in to the AWK
language. See the awka-elm and awka-elmref manpages for
details on how this is done.
libawka.a, libawka.so, awka,
libawka.h, libdfa.a, dfa.h
awk(1), mawk(1), gawk(1), awka-elm(5)
awka-elmref(5), cc(1), gcc(1)
Aho, Kernighan and Weinberger, The AWK Programming Language,
Addison-Wesley Publishing, 1988, (the AWK book), defines the language,
opening with a tutorial and advancing to many interesting programs that
delve into issues of software design and analysis relevant to programming in
any language.
The GAWK Manual, The Free Software Foundation, 1991, is a tutorial
and language reference that does not attempt the depth of the AWK book and
assumes the reader may be a novice programmer. The section on AWK arrays is
excellent. It also discusses Posix requirements for AWK.
Like you, I should probably buy & read these books some
day.
awka does not implement gawk's internal variable
IGNORECASE. Gawk's /dev/pid functions are also absent.
Nextfile and next may not be used within functions. This will
never be supported, unlike the previous features, which may be added to
awka over time. Well, so I thought. As of release 0.7.3 you _can_ use
these from within functions.
Andrew Sumner (andrewsumner@yahoo.com)
The awka homepage is at http://awka.sourceforge.net.
The latest version of awka, along with development 'snapshot'
releases, are available from this page. All major releases will be announced
in comp.lang.awk. If you would like to be notified of new releases, please
send me an email to that effect. Make sure you preface any email messages
with the word "awka" in the title so I know its not spam.
Visit the GSP FreeBSD Man Page Interface. Output converted with ManDoc.
|