This software deals only with systematic encodings, in which the K bits of s are copied unchanged to some subset of the N bits of x (the message bits), and the remaining M=N-K check bits of x are then set so as to make the result a codeword. For a linear code, a systematic encoding scheme always exists, for some choice of which bits of a codeword are message bits. It is conventional to rearrange the order of the bits in a codeword so that the message bits come first. The first K columns of the K by N generator matrix will then be the identity matrix.
However, this software does not assume that the message bits come first, since different encoding methods prefer different locations for the message bits. Instead, a vector of indexes of where each message bit is located within a codeword is recorded in a file along with a representation of the part of the generator matrix that produces the check bits. More than one such generator matrix file can be created for a single parity check file, in which the locations of the message bits may be different. Decoding of a received message into a codeword (with decode) does not depend on knowing which are the message bits, though this does need to be known in order to reconstruct the original message (with extract).
This software stores representations of generator matrices in files
in a format that is not human-readable (except by using the print-gen program). However, these
files are readable on a machine with a different architecture
than they were written on.
Generator matrix representations
For simplicity of exposition, it will be assumed for the next few paragraphs that the message bits are located at the end of the codeword, so a codeword can be divided into M check bits, c, followed by K message bits, s.
On the above assumption, the parity check matrix, H, can be divided into an M by M matrix A occupying the first M columns of H and an M by K matrix B occupying the remaining columns of H. The requirement that a codeword, x, satisfy all parity checks (ie, that Hx=0) can then be written as
Ac + Bs = 0Provided that A is non-singular, it follows that
c = A-1BsA may be singular for some choices of which codeword bits are message bits, but a choice for which A is non-singular always exists if the rows of H are linearly independent. It is possible, however, that the rows of H are not linearly independent (ie, some rows are redundant). This is an exceptional and not particularly interesting case, which is mostly ignored in the descriptions below; see the discussion of linear dependence in parity check matrices for the details.
The equation c = A-1Bs defines what the check bits should be, but actual computation of these check bits can be done in several ways, three of which are implemented in this software. Each method involves a different representation of the generator matrix. (Note that none of these methods involves the explicit representation of the matrix G mentioned above.)
In the dense representation, the M by K matrix A-1B is computed, and stored in a dense format (see the dense modulo-2 matrix package). A message is encoded by multiplying the source bits, s, by this matrix to obtain the required check bits.
In the mixed representation, the M by M matrix A-1 is computed and stored in a dense format, and the M by K matrix B, the right portion of the parity check matrix, is also stored, in a sparse format (see the sparse modulo-2 matrix package). To encode a message, the source vector s is first multiplied on the left by B, an operation which can be done very quickly if B is sparse (as it will be for LDPC codes). The result is then multiplied on the left by A-1. If M<K, the total time may be less than when using the dense representation above.
The sparse representation goes further, and avoids explicitly computing A-1, which tends to be dense even if H is sparse. Instead, a LU decomposition of A is found, consisting of a lower triangular matrix L and an upper triangular matrix U for which LU=A. The effect of multiplying Bs=z by A-1 can then be obtained by
Solving Ly=z for y using forward substitution.Both of these operations will be fast if L and U are sparse. Heuristics are used to try to achieve this, by rearranging the rows and columns of H in the process of selecting A and finding its LU decomposition.
Solving Uc=y for c using backward substitution.
Finds a generator matrix for the code whose parity check matrix is
in pchk-file, and writes a representation of this
generator matrix to gen-file. The remaining arguments
specify what representation of the generator matrix is to be used (see
the description above), and the method to be
used in finding it. A message regarding the density of 1s in the
resulting representation is displayed on standard error. For a sparse
representation, a smaller number of 1s will produce faster encoding.
All representations include a specification for how the columns of
the parity check matrix should be re-ordered so that the message bits
come last. References to columns of the parity check matrix below
refer to their order after this reordering. For the dense and
mixed representations, an other-gen-file may be
specified, in which case the ordering of columns will be the same as
the ordering stored in that file (which must produce a non-singular
A matrix; redundant rows of H are not allowed with this
option). Otherwise, make-gen decides on an appropriate
ordering of columns itself. Note that the column rearrangement is
recorded as part of the representation of the generator matrix; the
parity check matrix as stored in its file is not altered.
The dense representation consists of a dense representation
of the matrix A-1B, where
A is the matrix consisting of the first M columns (after
reordering) of the parity check matrix, and B is the remaining
columns. If H contains redundant rows, there is an additional
reordering of columns of A in order create the same effect as
if the redundant rows came last.
The mixed representation consists of a dense representation
of the matrix A-1, where A is
the matrix consisting of the first M columns (after reordering)
of the parity check matrix. The remaining columns of the parity check
matrix, making up the matrix B, are also part of the
representation, but are not written to gen-file, since
they can be obtained from pchk-file. As for mixed
representations, an additional reordering of columns of A may
be needed if H has redundant rows.
A sparse representation consists of sparse representations
of the L and U matrices, whose product is A, the
first M columns of the parity check matrix (whose columns and
rows may both have been reordered). The matrix B, consisting
of the remaining columns of the parity check matrix, is also part of
the representation, but it is not written to gen-file,
since it can be obtained from pchk-file.
If a sparse representation is chosen, arguments after
sparse specify what heuristic is used when reordering columns
and rows in order to try to make L and U as sparse as
possible. The default if no heuristic is specified is
minprod. If the abandon-num and
abandon-when options are given, some information is
discarded in order to speed up the process of finding L and
U, at a possible cost in terms of how good a result is
obtained. For details on these heuristics, see the descriptions of sparse LU decomposition methods.
Example: A dense representation of a generator matrix for the
Hamming code created by the example for make-pchk can be created as follows:
Prints in human-readable form the representation of the generator
matrix that is stored in gen-file. The -d
option causes the matrices involved to be printed in a dense format,
even if they are stored in the file in a sparse format. See the description above for details of generator matrix
representations. Note that the L matrix for a sparse representation
will be lower triangular only after the rows are rearranged, and the U
matrix will be upper triangular only after the columns are rearranged.
The matrix B that is part of the sparse
and mixed representations is not printed, since it is not stored
in the gen-file, but is rather a subset of columns
of the parity check matrix.
Example: The generator matrix for the
Hamming code created by the example for make-gen can be printed as follows:
A newline is output at the end of each block written to
encoded-file. Newlines in source-file
are ignored.
If the -f option is given, output to encoded-file
is flushed after each block. This allows one to use encode as a server,
reading blocks to encode from a named pipe, and writing the encoded block
to another named pipe.
make-gen: Make a generator matrix from
a parity check matrix.
make-gen pchk-file gen-file method
where method is one of the following:
sparse [ first | mincol | minprod ] [ abandon-num abandon-when ]
dense [ other-gen-file ]
mixed [ other-gen-file ]
print-gen: Print a representation of a
generator matrix.
print-gen [ -d ] gen-file
encode: Encode message blocks as codewords
Encodes message blocks of length K, read from
source-file, as codewords of length N, which
are written to encoded-file, replacing any previous
data in this file. Here, N is the number of columns in the
parity check matrix in pchk-file, and
K=N-M, where M is the number of rows in the
parity check matrix. The generator matrix used, from
gen-file, determines which bits of the codeword are
set to the message bits, and how the remaining check bits are
computed. The generator matrix is created from
pchk-file using make-gen.
encode [ -f ] pchk-file gen-file source-file encoded-file
Back to index for LDPC software