void *create_viterbi27(int blocklen);
int init_viterbi27(void *vp,int starting_state);
int update_viterbi27(void *vp,unsigned char sym1,unsigned char sym2);
int chainback_viterbi27(void *vp, unsigned char *data,unsigned int nbits,unsigned int endstate);
void delete_viterbi27(void *vp);
extern char id_viterbi27[];
void *create_viterbi29(int blocklen);
int init_viterbi29(void *vp,int starting_state);
int update_viterbi29(void *vp,unsigned char sym1,unsigned char sym2);
int chainback_viterbi29(void *vp, unsigned char *data,unsigned int nbits,unsigned int endstate);
void delete_viterbi29(void *vp);
extern char id_viterbi29[];
These functions implement high-performance Viterbi decoders for two
convolutional codes: a rate 1/2 constraint length 7 (k=7) code
("viterbi27") and a rate 1/2 k=9 code ("viterbi29"). The decoders use
the Intel IA32 SIMD instruction sets, if available, to improve performance.
There are three different IA32 SIMD instruction sets. The most common
is MMX, first implemented on later Intel Pentiums and then on the
Intel Pentium II and most Intel clones (AMD K6, Transmeta Crusoe,
etc). SSE was introduced on the Pentium III and later implemented in
the AMD Athlon 4 (AMD calls it "3D Now! Professional"). Most recently,
SSE2 was introduced in the Intel Pentium 4. As of late 2001, there are
no other known implementations of SSE2.
Four separate static libraries implement the decoders: one for each of the
three SIMD instruction sets, plus a portable version. -lviterbi_port uses no
SIMD instructions; it is
intended for pre-MMX IA32 machines and for non-IA32 machines.
-lviterbi_mmx is for IA32 machines that support the MMX
instructions; -lviterbi_sse is for machines with the SSE
instructions, and -lviterbi_sse2 is for machines with SSE2
support. The function names and calling conventions are the same for
all four versions, although the sizes of certain internal data
structures differ.
A shared library, -lviterbi, is also provided; it is assumed to
refer to the correct version for the current machine.
Two versions of each function are provided, one for the k=7 code and
another for the k=9 code. In the following discussion the k=7 code
will be assumed. To use the k=9 code, simply change all references to
"viterbi27" to "viterbi29".
Before Viterbi decoding can begin, an instance must first be created with
create_viterbi27(). This function creates and returns a pointer to
an internal control structure containing the path metrics and the
branch decisions. create_viterbi27() takes one argument that gives the
length of the data block in bits. You must not attempt to
decode a block longer than the length given to create_viterbi27().
After a decoder instance is created, and before decoding a new frame,
init_viterbi27() must be called to reset the decoder state.
It accepts the instance pointer returned by
create_viterbi27() and the starting state of the
convolutional encoder (usually 0). If the starting state is unknown or
incorrect, the decoder will still function, but the decoded data may be
incorrect at the start of the block.
Each pair of received symbols is processed with a call to
update_viterbi27(). Each symbol is expected to range from 0
through 15, with 0 corresponding to a "strong 0" and 15 corresponding
to a "strong 1". The caller is responsible for determining the proper
pairing of input symbols (commonly known as decoder symbol phasing).
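As an illustration of the 0 through 15 symbol convention, the following
sketch quantizes a soft demodulator sample. The function name
quantize_symbol() and the input scaling (-1.0 for a clean 0, +1.0 for a
clean 1) are assumptions made for this example and are not part of the
library; adjust them to suit the demodulator actually in use.

```c
#include <assert.h>

/* Map a soft sample onto the 0..15 range expected by update_viterbi27().
 * Assumption (not from the library): the sample is scaled so that -1.0 is
 * a clean "0" and +1.0 is a clean "1". */
unsigned char quantize_symbol(double sample)
{
    double scaled;

    assert(sample == sample);        /* reject NaN input */

    /* Shift and scale [-1, +1] onto [0, 15]. */
    scaled = (sample + 1.0) * 7.5;

    /* Clip samples that exceed the nominal amplitude. */
    if (scaled < 0.0)
        scaled = 0.0;
    if (scaled > 15.0)
        scaled = 15.0;

    /* Round to the nearest integer level. */
    return (unsigned char)(scaled + 0.5);
}
```

With this scaling, a sample of 0.0 (no information either way) lands in the
middle of the range, and anything beyond the nominal amplitude saturates at
0 or 15.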
At the end of the block, the data is recovered with a call to
chainback_viterbi27(). The arguments are the pointer to the
decoder instance, a pointer to a user-supplied buffer into which the
decoded data is to be written, the number of data bits (not bytes)
that are to be decoded, and the terminal state of the convolutional
encoder at the end of the frame (usually 0). If the terminal state is
incorrect or unknown, the decoded data bits at the end of the frame
may be unreliable. The decoded data is written in big-endian order,
i.e., the first bit in the frame is written into the high order bit of
the first byte in the buffer. If the frame is not an integral number
of bytes long, the low order bits of the last byte in the frame will
be unused.
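The big-endian bit order described above can be sketched with two small
helpers. The names get_decoded_bit() and decoded_buffer_bytes() are
hypothetical and are not provided by the library; they only illustrate how a
caller might size the buffer and read individual bits back out of it.

```c
#include <assert.h>
#include <stddef.h>

/* Return bit number `index` (0 = first bit of the frame) from a buffer
 * packed with the first bit in the high order bit of the first byte. */
int get_decoded_bit(const unsigned char *data, size_t index)
{
    assert(data != NULL);
    return (data[index / 8] >> (7 - (index % 8))) & 1;
}

/* Number of bytes needed to hold `nbits` decoded bits, rounding up when
 * the frame is not a whole number of bytes. */
size_t decoded_buffer_bytes(size_t nbits)
{
    return (nbits + 7) / 8;
}
```

For a 1000-bit frame this gives a 125-byte buffer; a 1001-bit frame would
need 126 bytes, with only the high order bit of the last byte carrying data.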
Note that the decoders assume the use of a tail, i.e., the encoding
and transmission of a sufficient number of padding bits beyond the end
of the user data to force the convolutional encoder into the known
terminal state given to chainback_viterbi27(). The k=7 code
uses 6 tail bits (12 tail symbols) and the k=9 code uses 8 tail
bits (16 tail symbols).
The tail bits are not included in the length arguments to
create_viterbi27() and chainback_viterbi27(). For example, if
the block contains 1000 user bits, then 1000 is the length
parameter given to both create_viterbi27() and
chainback_viterbi27(), and update_viterbi27() is called
a total of 1006 times, the last 6 times with the 12 encoded symbols
representing the tail bits.
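To make the tail arithmetic concrete, here is a minimal sketch of a k=7
encoder that appends the 6 zero tail bits. It is only an illustration: the
generator polynomials (0x6d and 0x4f), the order of the two symbols in each
pair, the use of zero tail bits, and the function name
encode_k7_with_tail() are all assumptions that this page does not state, so
substitute whatever your transmitter really uses.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed generator polynomials; not stated in the text above. */
#define POLY_A 0x6d
#define POLY_B 0x4f
#define K7_TAIL_BITS 6

/* Parity (XOR of all bits) of x. */
static int parity(unsigned int x)
{
    int p = 0;
    while (x) {
        p ^= (int)(x & 1u);
        x >>= 1;
    }
    return p;
}

/* Encode nbits user bits (one bit per element of bits[]) followed by
 * K7_TAIL_BITS zero bits. Writes 2 * (nbits + K7_TAIL_BITS) hard-decision
 * symbols (0 or 15) to syms[] and returns the number of symbol pairs,
 * i.e. the number of update_viterbi27() calls the receiver would make. */
size_t encode_k7_with_tail(const unsigned char *bits, size_t nbits,
                           unsigned char *syms, unsigned int *final_state)
{
    unsigned int sr = 0;    /* shift register, starting state 0 */
    size_t i, pairs = 0;

    assert(bits != NULL && syms != NULL);
    for (i = 0; i < nbits + K7_TAIL_BITS; i++) {
        /* User bits first, then zeros for the tail. */
        unsigned int bit = (i < nbits) ? (bits[i] & 1u) : 0u;
        sr = ((sr << 1) | bit) & 0x7f;      /* keep 7 bits for k=7 */
        syms[2 * pairs]     = parity(sr & POLY_A) ? 15 : 0;
        syms[2 * pairs + 1] = parity(sr & POLY_B) ? 15 : 0;
        pairs++;
    }
    if (final_state != NULL)
        *final_state = sr & 0x3f;   /* 6-bit state left after the tail */
    return pairs;
}
```

For 1000 user bits the function produces 1006 symbol pairs (2012 symbols),
and the six zero tail bits leave the encoder in state 0 whatever the user
data was, which is the terminal state the decoder is then told to assume.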
After the call to chainback_viterbi27(), the decoder may be reset
with a call to init_viterbi27() and another block can be decoded.
Alternatively, delete_viterbi27() can be called to free all resources
used by the Viterbi decoder.
The MMX and SSE versions of the decoder use registers aliased
onto the Intel floating point registers, so you must call
emms_viterbi27() after calling update_viterbi27() and before any
subsequent floating point computation in your program. You need not
do this after every call if you perform floating point only after the
end of the frame; in that case you may defer the call to
emms_viterbi27() until after chainback_viterbi27() has been called.
emms_viterbi27() is a no-op in the portable and SSE2 versions of the
decoder, so you can safely call it regardless of library version.
(The SSE2 version uses the XMM registers, which do not interfere with
the x87 floating point stack, so emms calls are not necessary with
this version.)
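The complete call sequence for one block might look like the sketch below.
Several things here are assumptions rather than documented behavior: the
emms_viterbi27() prototype (this page names the function but does not show
its signature), the use of a NULL return from create_viterbi27() to signal
failure, and the HAVE_LIBVITERBI macro, which is hypothetical and exists only
so the example builds without the library. The helper name
decode_block_k7() is likewise invented for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Prototypes as listed in the synopsis; emms_viterbi27() is assumed. */
void *create_viterbi27(int blocklen);
int init_viterbi27(void *vp, int starting_state);
int update_viterbi27(void *vp, unsigned char sym1, unsigned char sym2);
int chainback_viterbi27(void *vp, unsigned char *data,
                        unsigned int nbits, unsigned int endstate);
void delete_viterbi27(void *vp);
void emms_viterbi27(void);

#ifndef HAVE_LIBVITERBI
/* Stand-in stubs so the sketch builds without the library. They decode
 * nothing and only count update calls. */
static unsigned int stub_updates;
void *create_viterbi27(int blocklen) { (void)blocklen; return malloc(1); }
int init_viterbi27(void *vp, int s)
{ (void)vp; (void)s; stub_updates = 0; return 0; }
int update_viterbi27(void *vp, unsigned char a, unsigned char b)
{ (void)vp; (void)a; (void)b; stub_updates++; return 0; }
int chainback_viterbi27(void *vp, unsigned char *data,
                        unsigned int nbits, unsigned int endstate)
{ (void)vp; (void)endstate; memset(data, 0, (nbits + 7) / 8); return 0; }
void delete_viterbi27(void *vp) { free(vp); }
void emms_viterbi27(void) { }
#endif

#define K7_TAIL_BITS 6u

/* Decode one block of nbits user bits. syms[] holds 2 * (nbits + 6)
 * symbols in the range 0..15; data[] needs (nbits + 7) / 8 bytes.
 * The meaning of the library's int return values is not documented
 * above, so this sketch ignores them.
 * Returns 0 on success, -1 if no decoder instance could be created. */
int decode_block_k7(const unsigned char *syms, unsigned int nbits,
                    unsigned char *data)
{
    void *vp;
    unsigned int i;

    assert(syms != NULL && data != NULL);

    vp = create_viterbi27((int)nbits);   /* length excludes the tail */
    if (vp == NULL)                      /* assumed failure indication */
        return -1;
    init_viterbi27(vp, 0);               /* encoder started in state 0 */

    /* One call per symbol pair, tail included. */
    for (i = 0; i < nbits + K7_TAIL_BITS; i++)
        update_viterbi27(vp, syms[2 * i], syms[2 * i + 1]);

    chainback_viterbi27(vp, data, nbits, 0);  /* tail forces state 0 */
    emms_viterbi27();                    /* before any floating point */
    delete_viterbi27(vp);
    return 0;
}
```

Building with HAVE_LIBVITERBI defined and linking one of the libraries
described above would replace the stubs with the real decoder. To decode
several blocks, the create and delete calls could be moved outside the
per-block code, with init_viterbi27() called before each new block. For the
k=9 code, the viterbi29 names and a tail of 8 bits would be substituted.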
The global character string id_viterbi27 identifies the decoder
version in use.