The sparse modulo-2 matrix LU decomposition routine mod2sparse_decomp (which is used by the make-gen program when it is asked to create a sparse generator matrix) tries to find an sub-matrix of a matrix (for make-gen, the parity check matrix), and an ordering of rows and columns for this sub-matrix, that leads to the lower-triangular matrix L and the upper-triangular matrix U making up the LU decomposition being as sparse as possible. Finding an optimal solution is too difficult, so instead a heuristic strategy is used.
The overall algorithm finds L and U a column at a time, from left to right (as reordered, in the case of U). As this is done, a copy, B, of the original matrix is modified. To create column i of L and U, some element with value 1 in B whose row and column indexes, after reordering, are both greater than i is found. The row and column of this element are considered to come next in the reordering, and the contents of the column containing this element is copied to L and U (upper elements going to U, lower to L). The row containing this element is then added to some later rows so as to clear the lower part of this column to zeros.
At the first step of this process - selecting an element with value 1 from the later rows and columns - there will often be several possibilities. Different choices can lead to the final result being more or less sparse. The possible strategies for picking an element are identified by the constants Mod2sparse_first, Mod2sparse_mincol, and Mod2sparse_minprod. These strategies operate as follows:
Mod2sparse_first
Select the first element with value 1 that is encountered in a top to bottom, left to right search.
Mod2sparse_mincol
Select the first element with value 1 that is contained in a column of B that has the smallest number of 1s of any column.
Mod2sparse_minprod
Select an element with value 1 for which the product of the number of 1s in that row of B minus one times the number of 1s in that column of B minus one is as small as possible.
The abandon_number and abandon_when parameters can modify the basic strategy. If abandon_number is greater than zero, then after abandon_when columns have been selected, abandon_number of the remaining columns are abandoned as candidates for possible future selection, the abandoned columns being those with the greatest number of entries. Abandoning such columns saves space and time, but may make the final result less sparse than it would otherwise be, and can possibly result in the matrix appearing to have lower rank than it actually has.
The methods described here are fairly straightforward adaptations of standard methods for sparse square matrices of reals, as described, for example, in
I. S. Duff, A. M. Erisman, J. K. Reid (1986) Direct Methods for Sparse Matrices, Oxford: Clarendon Press.In the coding context, however, we are interested in matrices of modulo-2 elements, and it is enough to find a sparse LU decomposition of any square sub-matrix that can be obtained by selecting columns of the rectangular parity check matrix. I talked about the application of sparse matrix methods to encoding of LDPC codes at the 1999 IMA workshop on Codes, Systems and Graphical Models (see the references).