Encoding Message Blocks

To use a code to send messages, we must define a mapping from a bit vector, s, of length K, representing a source message, to a codeword, x, of length N>K. We will consider only linear mappings, which can be written in the form x=G^Ts, where G is a generator matrix. For a code with parity check matrix H, whose codewords satisfy Hx=0, the generator matrix must satisfy HG^T=0. This software assumes that the number of rows in the parity check matrix, M, is equal to N-K, as would normally be the case.

This software deals only with systematic encodings, in which the K bits of s are copied unchanged to some subset of the N bits of x (the message bits), and the remaining M=N-K check bits of x are then set so as to make the result a codeword. For a linear code, a systematic encoding scheme always exists, for some choice of which bits of a codeword are message bits. It is conventional to rearrange the order of the bits in a codeword so that the message bits come first. The first K columns of the K by N generator matrix will then be the identity matrix.

However, this software does not assume that the message bits come first, since different encoding methods prefer different locations for the message bits. Instead, a vector of indexes of where each message bit is located within a codeword is recorded in a file along with a representation of the part of the generator matrix that produces the check bits. More than one such generator matrix file can be created for a single parity check file, in which the locations of the message bits may be different. Decoding of a received message into a codeword (with decode) does not depend on knowing which are the message bits, though this does need to be known in order to reconstruct the original message (with extract).

This software stores representations of generator matrices in files in a format that is not human-readable (except by using the print-gen program). However, these files are readable on a machine with a different architecture than they were written on.

Generator matrix representations

For simplicity of exposition, it will be assumed for the next few paragraphs that the message bits are located at the end of the codeword, so a codeword can be divided into M check bits, c, followed by K message bits, s.

On the above assumption, the parity check matrix, H, can be divided into an M by M matrix A occupying the first M columns of H and an M by K matrix B occupying the remaining columns of H. The requirement that a codeword, x, satisfy all parity checks (ie, that Hx=0) can then be written as

Ac + Bs = 0

Provided that A is non-singular, it follows that

c = A^-1Bs

A may be singular for some choices of which codeword bits are message bits, but a choice for which A is non-singular always exists if the rows of H are linearly independent. It is possible, however, that the rows of H are not linearly independent (ie, some rows are redundant). This is an exceptional and not particularly interesting case, which is mostly ignored in the descriptions below; see the discussion of linear dependence in parity check matrices for the details.

The equation c = A^-1Bs defines what the check bits should be, but actual computation of these check bits can be done in several ways, three of which are implemented in this software. Each method involves a different representation of the generator matrix. (Note that none of these methods involves the explicit representation of the matrix G mentioned above.)

In the dense representation, the M by K matrix A^-1B is computed, and stored in a dense format (see the dense modulo-2 matrix package). A message is encoded by multiplying the source bits, s, by this matrix to obtain the required check bits.

In the mixed representation, the M by M matrix A^-1 is computed and stored in a dense format, and the M by K matrix B, the right portion of the parity check matrix, is also stored, in a sparse format (see the sparse modulo-2 matrix package). To encode a message, the source vector s is first multiplied on the left by B, an operation which can be done very quickly if B is sparse (as it will be for LDPC codes). The result is then multiplied on the left by A^-1. If M<K, the total time may be less than when using the dense representation above.

The sparse representation goes further, and avoids explicitly computing A^-1, which tends to be dense even if H is sparse. Instead, a LU decomposition of A is found, consisting of a lower triangular matrix L and an upper triangular matrix U for which LU=A. The effect of multiplying Bs=z by A^-1 can then be obtained by

Solving Ly=z for y using forward substitution.
Solving Uc=y for c using backward substitution.

Both of these operations will be fast if L and U are sparse. Heuristics are used to try to achieve this, by rearranging the rows and columns of H in the process of selecting A and finding its LU decomposition.

make-gen: Make a generator matrix from a parity check matrix.

make-gen pchk-file gen-file method

where method is one of the following:

sparse [ first | mincol | minprod ] [ abandon-num abandon-when ]

dense [ other-gen-file  ]

mixed [ other-gen-file  ]

Finds a generator matrix for the code whose parity check matrix is in pchk-file, and writes a representation of this generator matrix to gen-file. The remaining arguments specify what representation of the generator matrix is to be used (see the description above), and the method to be used in finding it. A message regarding the density of 1s in the resulting representation is displayed on standard error. For a sparse representation, a smaller number of 1s will produce faster encoding.

All representations include a specification for how the columns of the parity check matrix should be re-ordered so that the message bits come last. References to columns of the parity check matrix below refer to their order after this reordering. For the dense and mixed representations, an other-gen-file may be specified, in which case the ordering of columns will be the same as the ordering stored in that file (which must produce a non-singular A matrix; redundant rows of H are not allowed with this option). Otherwise, make-gen decides on an appropriate ordering of columns itself. Note that the column rearrangement is recorded as part of the representation of the generator matrix; the parity check matrix as stored in its file is not altered.

The dense representation consists of a dense representation of the matrix A^-1B, where A is the matrix consisting of the first M columns (after reordering) of the parity check matrix, and B is the remaining columns. If H contains redundant rows, there is an additional reordering of columns of A in order create the same effect as if the redundant rows came last.

The mixed representation consists of a dense representation of the matrix A^-1, where A is the matrix consisting of the first M columns (after reordering) of the parity check matrix. The remaining columns of the parity check matrix, making up the matrix B, are also part of the representation, but are not written to gen-file, since they can be obtained from pchk-file. As for mixed representations, an additional reordering of columns of A may be needed if H has redundant rows.

A sparse representation consists of sparse representations of the L and U matrices, whose product is A, the first M columns of the parity check matrix (whose columns and rows may both have been reordered). The matrix B, consisting of the remaining columns of the parity check matrix, is also part of the representation, but it is not written to gen-file, since it can be obtained from pchk-file.

If a sparse representation is chosen, arguments after sparse specify what heuristic is used when reordering columns and rows in order to try to make L and U as sparse as possible. The default if no heuristic is specified is minprod. If the abandon-num and abandon-when options are given, some information is discarded in order to speed up the process of finding L and U, at a possible cost in terms of how good a result is obtained. For details on these heuristics, see the descriptions of sparse LU decomposition methods.

Example: A dense representation of a generator matrix for the Hamming code created by the example for make-pchk can be created as follows:

make-gen ham7.pchk ham7.gen dense
Number of 1s per check in Inv(A) X B is 3.0

print-gen: Print a representation of a generator matrix.

print-gen [ -d ] gen-file

Prints in human-readable form the representation of the generator matrix that is stored in gen-file. The -d option causes the matrices involved to be printed in a dense format, even if they are stored in the file in a sparse format. See the description above for details of generator matrix representations. Note that the L matrix for a sparse representation will be lower triangular only after the rows are rearranged, and the U matrix will be upper triangular only after the columns are rearranged. The matrix B that is part of the sparse and mixed representations is not printed, since it is not stored in the gen-file, but is rather a subset of columns of the parity check matrix.

Example: The generator matrix for the Hamming code created by the example for make-gen can be printed as follows:

print-gen ham7.gen

Generator matrix (dense representation):

Column order:

   0   1   2   3   4   5   6

Inv(A) X B:

 1 1 1 0
 1 1 0 1
 0 1 1 1

For this example, the columns did not need to be rearranged, and hence the message bits will be in positions 3, 4, 5, and 6 of a codeword.

encode: Encode message blocks as codewords

encode [ -f ] pchk-file gen-file source-file encoded-file

Encodes message blocks of length K, read from source-file, as codewords of length N, which are written to encoded-file, replacing any previous data in this file. Here, N is the number of columns in the parity check matrix in pchk-file, and K=N-M, where M is the number of rows in the parity check matrix. The generator matrix used, from gen-file, determines which bits of the codeword are set to the message bits, and how the remaining check bits are computed. The generator matrix is created from pchk-file using make-gen.

A newline is output at the end of each block written to encoded-file. Newlines in source-file are ignored.

If the -f option is given, output to encoded-file is flushed after each block. This allows one to use encode as a server, reading blocks to encode from a named pipe, and writing the encoded block to another named pipe.

Back to index for LDPC software