NAME

create_viterbi27, init_viterbi27, update_viterbi27, chainback_viterbi27, delete_viterbi27, create_viterbi29, init_viterbi29, update_viterbi29, chainback_viterbi29, delete_viterbi29 - IA32 SIMD-assisted Viterbi decoders

SYNOPSIS

#include "viterbi27.h"
void *create_viterbi27(int blocklen);
int init_viterbi27(void *vp,int starting_state);
int update_viterbi27(void *vp,unsigned char sym1,unsigned char sym2);
int chainback_viterbi27(void *vp, unsigned char *data,unsigned int nbits,unsigned int endstate);
void delete_viterbi27(void *vp);
void emms_viterbi27(void);
extern char id_viterbi27[];

#include "viterbi29.h"
void *create_viterbi29(int blocklen);
int init_viterbi29(void *vp,int starting_state);
int update_viterbi29(void *vp,unsigned char sym1,unsigned char sym2);
int chainback_viterbi29(void *vp, unsigned char *data,unsigned int nbits,unsigned int endstate);
void delete_viterbi29(void *vp);
void emms_viterbi29(void);
extern char id_viterbi29[];

DESCRIPTION

These functions implement high-performance Viterbi decoders for two convolutional codes: a rate 1/2 constraint length 7 (k=7) code ("viterbi27") and a rate 1/2 k=9 code ("viterbi29"). The decoders use the Intel IA32 SIMD instruction sets, if available, to improve performance.

There are three different IA32 SIMD instruction sets. The most common is MMX, first implemented on later Intel Pentiums and then on the Intel Pentium II and most Intel clones (AMD K6, Transmeta Crusoe, etc). SSE was introduced on the Pentium III and later implemented in the AMD Athlon 4 (AMD calls it "3D Now! Professional"). Most recently, SSE2 was introduced in the Intel Pentium 4. As of late 2001, there are no other known implementations of SSE2.

Four separate static libraries implement the decoders: one portable version and one for each of the three SIMD instruction sets. -lviterbi_port uses no SIMD instructions; it is intended for pre-MMX IA32 machines and for non-IA32 machines. -lviterbi_mmx is for IA32 machines that support the MMX instructions; -lviterbi_sse is for machines with the SSE instructions, and -lviterbi_sse2 is for machines with SSE2 support. The function names and calling conventions are the same for all four versions, although the sizes of certain internal data structures differ.

A shared library, -lviterbi, is also provided; it is assumed to refer to the correct version for the current machine.

USAGE

Two versions of each function are provided, one for the k=7 code and another for the k=9 code. In the following discussion the k=7 code will be assumed. To use the k=9 code, simply change all references to "viterbi27" to "viterbi29".

Before Viterbi decoding can begin, an instance must first be created with create_viterbi27(). This function creates and returns a pointer to an internal control structure containing the path metrics and the branch decisions. create_viterbi27() takes one argument that gives the length of the data block in bits. You must not attempt to decode a block longer than the length given to create_viterbi27().

After a decoder instance is created, and before decoding a new frame, init_viterbi27() must be called to reset the decoder state. It accepts the instance pointer returned by create_viterbi27() and the initial starting state of the convolutional encoder (usually 0). If the initial starting state is unknown or incorrect, the decoder will still function but the decoded data may be incorrect at the start of the block.

Each pair of received symbols is processed with a call to update_viterbi27(). Each symbol is expected to range from 0 through 15, with 0 corresponding to a "strong 0" and 15 corresponding to a "strong 1". The caller is responsible for determining the proper pairing of input symbols (commonly known as decoder symbol phasing).
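The 0 through 15 symbol range can be produced by a simple quantizer. The following is a minimal sketch, not part of the library: quantize_symbol() is a hypothetical helper, and it assumes the demodulator delivers a soft value that is nominally -1.0 for a clean 0 and +1.0 for a clean 1. A real receiver would scale to its own signal levels.

```c
#include <assert.h>

/* Hypothetical helper (not part of the library): map a soft value x,
 * assumed to be -1.0 for a clean 0 and +1.0 for a clean 1, onto the
 * 0..15 range expected by update_viterbi27(). */
unsigned char quantize_symbol(double x)
{
    unsigned int q;

    if (!(x > -1.0))        /* clamp low; this test also catches NaN */
        x = -1.0;
    if (x > 1.0)            /* clamp high */
        x = 1.0;
    q = (unsigned int)((x + 1.0) * 7.5 + 0.5);  /* round to nearest level */
    assert(q <= 15);
    return (unsigned char)q;
}
```

Values beyond the assumed range saturate at 0 or 15, so an unusually strong sample is treated the same as a nominally strong one.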

At the end of the block, the data is recovered with a call to chainback_viterbi27(). The arguments are the pointer to the decoder instance, a pointer to a user-supplied buffer into which the decoded data is to be written, the number of data bits (not bytes) that are to be decoded, and the terminal state of the convolutional encoder at the end of the frame (usually 0). If the terminal state is incorrect or unknown, the decoded data bits at the end of the frame may be unreliable. The decoded data is written in big-endian order, i.e., the first bit in the frame is written into the high order bit of the first byte in the buffer. If the frame is not an integral number of bytes long, the low order bits of the last byte in the frame will be unused.
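The packing described above can be illustrated with two small helpers. Both names are hypothetical and neither is part of the library; they only show how to size the output buffer and how to pull an individual bit back out of it.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to hold nbits decoded bits, rounding up to a whole byte. */
size_t decoded_bytes(size_t nbits)
{
    return (nbits + 7) / 8;
}

/* Return bit number i of the frame (0 is the first bit) from a buffer
 * packed with the first bit in the high order bit of the first byte. */
int decoded_bit(const unsigned char *data, size_t i)
{
    assert(data != NULL);
    return (data[i / 8] >> (7 - i % 8)) & 1;
}
```

For a 1000-bit frame, decoded_bytes(1000) gives 125; a 1001-bit frame needs 126 bytes, with the low order seven bits of the last byte unused.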

Note that the decoders assume the use of a tail, i.e., the encoding and transmission of a sufficient number of padding bits beyond the end of the user data to force the convolutional encoder into the known terminal state given to chainback_viterbi27(). The k=7 code uses 6 tail bits (12 tail symbols) and the k=9 code uses 8 tail bits (16 tail symbols).
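The effect of the tail can be seen in a small rate 1/2 k=7 encoder. This is a sketch for illustration only and is not part of the library: the function name is hypothetical, and the generator polynomials 0x6d and 0x4f, the symbol order and the symbol polarity are all assumptions that should be checked against the encoder actually in use. The point it shows does not depend on them: shifting in 6 zero bits always leaves the encoder in state 0.

```c
#include <assert.h>
#include <stddef.h>

#define K27     7       /* constraint length */
#define POLYA   0x6d    /* assumed generator polynomials; confirm them */
#define POLYB   0x4f    /* against the encoder actually in use */

/* Parity (XOR of all bits) of x. */
static int parity(unsigned int x)
{
    int p = 0;

    while (x) {
        p ^= (int)(x & 1u);
        x >>= 1;
    }
    return p;
}

/* Encode nbits user bits (one bit per byte, 0 or 1) followed by K27-1
 * zero tail bits. Writes 2*(nbits+K27-1) symbols to syms, using 0 for an
 * encoded 0 and 15 for an encoded 1, and stores the final encoder state
 * in *endstate. Returns the number of symbols written. */
size_t encode27_with_tail(const unsigned char *bits, size_t nbits,
                          unsigned char *syms, unsigned int *endstate)
{
    unsigned int sr = 0;    /* shift register; starting state 0 */
    size_t i, n = 0;

    assert(bits != NULL && syms != NULL && endstate != NULL);
    for (i = 0; i < nbits + K27 - 1; i++) {
        /* user bits first, then zeros for the tail */
        unsigned int bit = (i < nbits) ? (bits[i] & 1u) : 0u;

        sr = ((sr << 1) | bit) & ((1u << K27) - 1);
        syms[n++] = parity(sr & POLYA) ? 15 : 0;
        syms[n++] = parity(sr & POLYB) ? 15 : 0;
    }
    *endstate = sr & ((1u << (K27 - 1)) - 1);   /* 0 after the tail */
    return n;
}
```

Encoding 4 user bits this way yields 20 symbols, 12 of which carry the tail, and the reported end state is 0 whatever the user bits were.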

The tail bits are not included in the length arguments to create_viterbi27() and chainback_viterbi27(). For example, if the block contains 1000 user bits, then 1000 would be the length parameter given to both create_viterbi27() and chainback_viterbi27(), and update_viterbi27() would be called a total of 1006 times, the last 6 with the 12 encoded symbols representing the tail bits.
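The arithmetic in this example generalizes to both codes. The helpers below are hypothetical and not part of the library; they simply restate the rule that each frame carries k-1 tail bits and that each encoded bit produces two symbols.

```c
#include <assert.h>

/* Number of update calls for a frame of nbits user bits encoded with
 * constraint length k: the user bits plus k-1 tail bits. */
unsigned int update_calls(unsigned int nbits, unsigned int k)
{
    assert(k >= 1);
    return nbits + (k - 1);
}

/* Number of received symbols in the frame: two per encoded bit. */
unsigned int frame_symbols(unsigned int nbits, unsigned int k)
{
    return 2 * update_calls(nbits, k);
}
```

With 1000 user bits, the k=7 code needs 1006 update calls and 2012 symbols; the k=9 code needs 1008 calls and 2016 symbols.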

After the call to chainback_viterbi27(), the decoder may be reset with a call to init_viterbi27() and another block can be decoded. Alternatively, delete_viterbi27() can be called to free all resources used by the Viterbi decoder.

The MMX and SSE versions of the decoder use registers aliased onto the Intel floating point registers, so you must insert calls to emms_viterbi27() between calls to update_viterbi27() and any subsequent floating point computations in your program. You need not do this after every call to update_viterbi27() if you perform floating point computations only after the end of the frame. In this case you may defer the call to emms_viterbi27() until after chainback_viterbi27() has been called.

emms_viterbi27() is a no-op in the portable and SSE2 versions of the decoder, so you can safely call it regardless of library version. (The SSE2 version uses the XMM registers, which do not interfere with the X87 floating point stack. Hence emms calls are not necessary with this version.)
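The calling sequence described in this section can be sketched as follows. This is an illustration under stated assumptions, not code from the library. So that it compiles without the library, the decoder is reached through struct viterbi27_ops, a hypothetical table of function pointers with the signatures from the SYNOPSIS; a real program would include "viterbi27.h" and call the functions directly. The sketch also assumes that the create function returns NULL on failure, which this page does not state, and it ignores the undocumented return values of the init and chainback functions. The fake_ functions are counting stand-ins used only to exercise the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table of the k=7 entry points (signatures from SYNOPSIS). */
struct viterbi27_ops {
    void *(*create)(int blocklen);
    int (*init)(void *vp, int starting_state);
    int (*update)(void *vp, unsigned char sym1, unsigned char sym2);
    int (*chainback)(void *vp, unsigned char *data,
                     unsigned int nbits, unsigned int endstate);
    void (*destroy)(void *vp);
    void (*emms)(void);
};

/* Decode one frame of nbits user bits from 2*(nbits+6) symbols into data,
 * which must hold (nbits+7)/8 bytes. Returns 0 on success, or -1 if no
 * decoder was created (assumes create returns NULL on failure). */
int decode_frame27(const struct viterbi27_ops *ops, const unsigned char *syms,
                   unsigned int nbits, unsigned char *data)
{
    unsigned int i;
    void *vp;

    assert(ops != NULL && syms != NULL && data != NULL);
    vp = ops->create((int)nbits);           /* length excludes the tail */
    if (vp == NULL)
        return -1;
    ops->init(vp, 0);                       /* encoder started in state 0 */
    for (i = 0; i < nbits + 6; i++)         /* user bits plus 6 tail bits */
        ops->update(vp, syms[2 * i], syms[2 * i + 1]);
    ops->chainback(vp, data, nbits, 0);     /* tail forces end state 0 */
    ops->emms();                            /* before any floating point */
    ops->destroy(vp);
    return 0;
}

/* Counting stand-ins, present only so the sketch runs without the library. */
static int fake_state;
static unsigned int fake_updates, fake_emms_calls;

static void *fake_create(int blocklen) { (void)blocklen; return &fake_state; }
static int fake_init(void *vp, int s) { (void)vp; (void)s; fake_updates = 0; return 0; }
static int fake_update(void *vp, unsigned char a, unsigned char b)
{
    (void)vp; (void)a; (void)b;
    fake_updates++;
    return 0;
}
static int fake_chainback(void *vp, unsigned char *data,
                          unsigned int nbits, unsigned int endstate)
{
    unsigned int i;

    (void)vp; (void)endstate;
    for (i = 0; i < (nbits + 7) / 8; i++)
        data[i] = 0;                        /* pretend all bits decoded as 0 */
    return 0;
}
static void fake_destroy(void *vp) { (void)vp; }
static void fake_emms(void) { fake_emms_calls++; }

static const struct viterbi27_ops fake_ops = {
    fake_create, fake_init, fake_update, fake_chainback, fake_destroy, fake_emms
};
```

With the real library, the table would instead be filled with create_viterbi27, init_viterbi27, update_viterbi27, chainback_viterbi27, delete_viterbi27 and emms_viterbi27. The sketch creates and deletes an instance for every frame only for brevity; when decoding many blocks of the same length it is cheaper to keep one instance and call the init function before each block.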

The global character string id_viterbi27[] identifies the decoder version in use.

RETURN VALUES

create_viterbi27() returns a pointer to the structure containing the decoder state. update_viterbi27() returns the amount by which the decoder path metrics were normalized in the current step. Only the portable C, SSE and SSE2 versions perform normalization; the MMX version uses modulo arithmetic.

AUTHOR

Phil Karn, KA9Q (karn@ka9q.net)