WSJT-X/lib/sfrsd2/sfrsd_paper/sfrsd.lyx
Steven Franke 125e8d8e12 Additions to sfrsd document.
git-svn-id: svn+ssh://svn.code.sf.net/p/wsjt/wsjt/branches/wsjtx@6200 ab8295b8-cf94-4d9e-aec4-7959e3be5d79
2015-11-28 23:31:01 +00:00


#LyX 2.1 created this file. For more info see http://www.lyx.org/
\lyxformat 474
\begin_document
\begin_header
\textclass paper
\use_default_options true
\maintain_unincluded_children false
\language english
\language_package default
\inputencoding auto
\fontencoding global
\font_roman default
\font_sans default
\font_typewriter default
\font_math auto
\font_default_family default
\use_non_tex_fonts false
\font_sc false
\font_osf false
\font_sf_scale 100
\font_tt_scale 100
\graphics default
\default_output_format default
\output_sync 0
\bibtex_command default
\index_command default
\paperfontsize default
\spacing single
\use_hyperref false
\papersize default
\use_geometry true
\use_package amsmath 1
\use_package amssymb 1
\use_package cancel 1
\use_package esint 1
\use_package mathdots 1
\use_package mathtools 1
\use_package mhchem 1
\use_package stackrel 1
\use_package stmaryrd 1
\use_package undertilde 1
\cite_engine basic
\cite_engine_type default
\biblio_style plain
\use_bibtopic false
\use_indices false
\paperorientation portrait
\suppress_date false
\justification true
\use_refstyle 1
\index Index
\shortcut idx
\color #008000
\end_index
\leftmargin 1in
\topmargin 1in
\rightmargin 1in
\bottommargin 1in
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\paragraph_indentation default
\quotes_language english
\papercolumns 1
\papersides 1
\paperpagestyle default
\tracking_changes false
\output_changes false
\html_math_output 0
\html_css_as_file 0
\html_be_strict false
\end_header
\begin_body
\begin_layout Title
A stochastic successive erasures soft-decision decoder for the JT65 (63,12)
Reed-Solomon code
\end_layout
\begin_layout Author
Steven J.
Franke, K9AN and Joseph H.
Taylor, K1JT
\end_layout
\begin_layout Abstract
The JT65 mode has revolutionized amateur-radio weak-signal communication
by enabling amateur radio operators with small antennas and relatively
low-power transmitters to communicate over propagation paths that could
not be utilized using traditional technologies.
One reason for the success and popularity of the JT65 mode is its use of
strong error-correction coding.
The JT65 code is a short-block-length, low-rate Reed-Solomon code based
on a 64-symbol alphabet.
Since 200?, decoders for the JT65 code have used the
\begin_inset Quotes eld
\end_inset
Koetter-Vardy
\begin_inset Quotes erd
\end_inset
(KV) algebraic soft-decision decoder.
The KV decoder is implemented in a closed-source program that is licensed
to K1JT for use in amateur applications.
This note describes a new open-source alternative to the KV decoder called
the SFRSD decoder.
The SFRSD decoding algorithm is shown to perform at least as well as the
KV decoder.
The SFRSD algorithm is conceptually simple and is built around the well-known
Berlekamp-Massey errors-and-erasures decoder.
\end_layout
\begin_layout Section
Introduction
\end_layout
\begin_layout Standard
JT65 message frames consist of a short compressed message that is encoded
for transmission using a Reed-Solomon code.
Reed-Solomon codes are block codes and, like all block codes, are characterized
by the length of their codewords,
\begin_inset Formula $n$
\end_inset
, the number of message symbols conveyed by the codeword,
\begin_inset Formula $k$
\end_inset
, and the number of possible values for each symbol in the codewords.
The codeword length and the number of message symbols are specified as
a tuple in the form
\begin_inset Formula $(n,k)$
\end_inset
.
JT65 uses a (63,12) Reed-Solomon code with 64 possible values for each
symbol, so each symbol represents
\begin_inset Formula $\log_{2}64=6$
\end_inset
message bits.
The source-encoded messages conveyed by a 63-symbol JT65 frame consist
of 72 bits, i.e., 12 message symbols of 6 bits each.
The JT65 code is systematic, which means that the 12 message symbols are
embedded in the codeword without modification and another 51 parity symbols
derived from the message symbols are added to form the codeword consisting
of 63 total symbols.
\end_layout
\begin_layout Standard
The concept of Hamming distance is used as a measure of
\begin_inset Quotes eld
\end_inset
distance
\begin_inset Quotes erd
\end_inset
between different codewords, or between a received word and a codeword.
Hamming distance is the number of symbol positions in which the two words
being compared differ.
Reed-Solomon codes have minimum Hamming distance
\begin_inset Formula $d$
\end_inset
, where
\begin_inset Formula
\begin{equation}
d=n-k+1.\label{eq:minimum_distance}
\end{equation}
\end_inset
The minimum Hamming distance of the JT65 code is
\begin_inset Formula $d=52$
\end_inset
, which means that any particular codeword differs from all other codewords
in at least 52 positions.
\end_layout
\begin_layout Standard
Given only a received word containing some incorrect symbols (errors), the
received word can be decoded into the correct codeword using a deterministic
algebraic algorithm provided that no more than
\begin_inset Formula $t$
\end_inset
symbols were received incorrectly, where
\begin_inset Formula
\begin{equation}
t=\left\lfloor \frac{n-k}{2}\right\rfloor .\label{eq:t}
\end{equation}
\end_inset
For the JT65 code,
\begin_inset Formula $t=25$
\end_inset
, which means that it is always possible to efficiently decode a received
word that contains no more than 25 symbol errors.
\end_layout
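\begin_layout Standard
The code parameters quoted so far follow from simple arithmetic.
The short Python sketch below is only an illustration (it is not taken from the JT65 or SFRSD software, and the function name rs_parameters is hypothetical); it computes the bits per symbol, the message length in bits, the minimum distance
\begin_inset Formula $d=n-k+1$
\end_inset
and the errors-only capability
\begin_inset Formula $t=\left\lfloor (n-k)/2\right\rfloor $
\end_inset
for the (63,12) code over a 64-symbol alphabet.
\end_layout
\begin_layout LyX-Code
```python
import math

def rs_parameters(n: int, k: int, q: int):
    """Return (bits per symbol, message bits, d, t) for an (n, k) RS code over a q-ary alphabet."""
    bits_per_symbol = int(math.log2(q))   # 64-ary symbols carry 6 bits
    message_bits = k * bits_per_symbol    # 12 message symbols x 6 bits
    d = n - k + 1                         # minimum Hamming distance
    t = (n - k) // 2                      # errors-only correction capability
    return bits_per_symbol, message_bits, d, t

bits, msg_bits, d, t = rs_parameters(63, 12, 64)
print(bits, msg_bits, d, t)   # 6 72 52 25
```
\end_layout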
\begin_layout Standard
There are a number of well-known algebraic algorithms that can decode a
received word that contains no more than
\begin_inset Formula $t$
\end_inset
errors.
One such algorithm is the Berlekamp-Massey (BM) decoding algorithm.
\end_layout
\begin_layout Standard
A decoder, such as BM, must carry out two tasks:
\end_layout
\begin_layout Enumerate
determine which symbols were received incorrectly
\end_layout
\begin_layout Enumerate
determine the correct value of the incorrect symbols
\end_layout
\begin_layout Standard
If it is somehow known that certain symbols are incorrect, such information
can be used in the decoding algorithm to reduce the amount of work required
in step 1 and to allow step 2 to correct more than
\begin_inset Formula $t$
\end_inset
errors.
In fact, in the unlikely event that the location of each and every error
is known and is provided to the BM decoder, and if no correct symbols are
accidentally labeled as errors, then the BM decoder can correct up to
\begin_inset Formula $d-1=51$
\end_inset
errors!
\end_layout
\begin_layout Standard
In the decoding algorithm described herein, a list of symbols that are known
or suspected to be incorrect is sent to the BM decoder.
Symbols in the received word that are flagged as being incorrect are called
\begin_inset Quotes eld
\end_inset
erasures
\begin_inset Quotes erd
\end_inset
.
Symbols that are not erased and that are incorrect will be called
\begin_inset Quotes eld
\end_inset
errors
\begin_inset Quotes erd
\end_inset
.
The BM decoder accepts erasure information in the form of a list of indices
corresponding to the incorrect, or suspected incorrect, symbols in the
received word.
As already noted, if the erasure information is perfect, then up to 51
errors will be corrected.
When the erasure information is imperfect, then some of the erased symbols
will actually be correct, and some of the unerased symbols will be in error.
If a total of
\begin_inset Formula $n_{e}$
\end_inset
symbols are erased and the remaining unerased symbols contain
\begin_inset Formula $x$
\end_inset
errors, then the BM algorithm can find the correct codeword as long as
\begin_inset Formula
\begin{equation}
n_{e}+2x\le d-1.\label{eq:erasures_and_errors}
\end{equation}
\end_inset
If
\begin_inset Formula $n_{e}=0$
\end_inset
, then the decoder is said to be an
\begin_inset Quotes eld
\end_inset
errors-only
\begin_inset Quotes erd
\end_inset
decoder and it can correct up to
\begin_inset Formula $t$
\end_inset
errors (
\begin_inset Formula $t=25$
\end_inset
for JT65).
If
\begin_inset Formula $0<n_{e}\le d-1$
\end_inset
(
\begin_inset Formula $d-1=51$
\end_inset
for JT65), then the decoder is said to be an
\begin_inset Quotes eld
\end_inset
errors-and-erasures
\begin_inset Quotes erd
\end_inset
decoder.
\end_layout
\begin_layout Standard
For the JT65 code, (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:erasures_and_errors"
\end_inset
) says that if
\begin_inset Formula $n_{e}$
\end_inset
symbols are declared to be erased, then the BM decoder will find the correct
codeword as long as the remaining unerased symbols contain no more than
\begin_inset Formula $\left\lfloor \frac{51-n_{e}}{2}\right\rfloor $
\end_inset
errors.
The errors-and-erasures capability of the BM decoder is a very powerful
feature that serves as the core of the new soft-decision decoder described
herein.
\end_layout
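\begin_layout Standard
The errors-and-erasures condition reduces to a one-line test.
The Python sketch below is a hedged illustration with hypothetical helper names; it checks the condition for a given number of erasures and unerased errors, and returns the largest number of unerased errors that can be tolerated for a given number of erasures, assuming the JT65 values n=63 and k=12 as defaults.
\end_layout
\begin_layout LyX-Code
```python
def bm_can_decode(n_erased: int, n_errors_unerased: int, n: int = 63, k: int = 12) -> bool:
    """Errors-and-erasures condition n_e + 2x <= d - 1, with d = n - k + 1."""
    d = n - k + 1
    return n_erased + 2 * n_errors_unerased <= d - 1

def max_unerased_errors(n_erased: int, n: int = 63, k: int = 12) -> int:
    """Largest number of errors among the unerased symbols that can still be corrected."""
    d = n - k + 1
    if not 0 <= n_erased <= d - 1:
        raise ValueError("number of erasures must lie in [0, d - 1]")
    return (d - 1 - n_erased) // 2

for n_e in (0, 20, 40, 51):
    print(n_e, max_unerased_errors(n_e))
# 0 25
# 20 15
# 40 5
# 51 0
```
\end_layout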
\begin_layout Standard
It will be helpful to have some understanding of the errors-and-erasures
tradeoff described by (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:erasures_and_errors"
\end_inset
) to appreciate how the new decoder algorithm works.
Section
\begin_inset CommandInset ref
LatexCommand ref
reference "sec:You've-got-to"
\end_inset
describes some examples that illustrate how the errors-and-erasures
capability can be combined with some information about the quality of the
received symbols to enable a decoding algorithm to reliably decode received
words that contain many more than 25 errors.
Section
\begin_inset CommandInset ref
LatexCommand ref
reference "sec:The-decoding-algorithm"
\end_inset
describes the SFRSD decoding algorithm.
\end_layout
\begin_layout Section
\begin_inset CommandInset label
LatexCommand label
name "sec:You've-got-to"
\end_inset
You've got to ask yourself.
Do I feel lucky?
\end_layout
\begin_layout Standard
Consider a particular received word that contains 40 incorrect symbols
and 23 correct symbols.
It is not known which 40 symbols are in error
\begin_inset Foot
status open
\begin_layout Plain Layout
In practice the number of errors will not be known either, but this is not
a serious problem.
\end_layout
\end_inset
.
Suppose that the decoder randomly chooses 40 symbols to erase (
\begin_inset Formula $n_{e}=40$
\end_inset
), leaving 23 unerased symbols.
According to (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:erasures_and_errors"
\end_inset
), the BM decoder can successfully decode this word as long as the number
of errors present in the 23 unerased symbols is 5 or less, since
\begin_inset Formula $40+2\cdot5=50\le51$
\end_inset
while
\begin_inset Formula $40+2\cdot6=52>51$
\end_inset
.
This means that the number of errors captured in the set of 40 erased symbols
must be at least 35.
\end_layout
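\begin_layout Standard
The same bookkeeping can be turned around to give the minimum number of errors that must be captured in the erased set.
In the sketch below (a hypothetical helper, assuming n=63 and k=12 as defaults), X is the total number of incorrect symbols in the received word; if x of them are erased, the X-x errors left among the unerased symbols must satisfy the errors-and-erasures condition.
\end_layout
\begin_layout LyX-Code
```python
def min_errors_to_capture(X: int, n_e: int, n: int = 63, k: int = 12) -> int:
    """Smallest number x of the X incorrect symbols that must be erased.

    The X - x errors left among the unerased symbols must satisfy
    n_e + 2*(X - x) <= n - k, the errors-and-erasures condition with d - 1 = n - k.
    """
    if not 0 <= n_e <= n - k:
        raise ValueError("n_e must lie between 0 and n - k")
    allowed = (n - k - n_e) // 2      # errors that may remain unerased
    return max(0, X - allowed)

print(min_errors_to_capture(40, 40))  # 35, as in the example above
print(min_errors_to_capture(40, 45))  # 37
```
\end_layout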
\begin_layout Standard
The probability of capturing a particular number of incorrect symbols in a
randomly selected subset of the codeword symbols is governed by the hypergeometric
probability distribution.
\end_layout
\begin_layout Standard
Define:
\end_layout
\begin_layout Itemize
\begin_inset Formula $n$
\end_inset
= number of symbols in a codeword (63 for JT65),
\end_layout
\begin_layout Itemize
\begin_inset Formula $X$
\end_inset
= number of incorrect symbols in a codeword,
\end_layout
\begin_layout Itemize
\begin_inset Formula $n_{e}$
\end_inset
= number of symbols erased for errors-and-erasures decoding,
\end_layout
\begin_layout Itemize
\begin_inset Formula $x$
\end_inset
= number of incorrect symbols in the set of erased symbols (the unerased
symbols then contain
\begin_inset Formula $X-x$
\end_inset
errors).
\end_layout
\begin_layout Standard
In an ensemble of received words,
\begin_inset Formula $X$
\end_inset
and
\begin_inset Formula $x$
\end_inset
will be random variables.
Let
\begin_inset Formula $P(x|(X,n_{e}))$
\end_inset
denote the conditional probability mass function for the number of incorrect
symbols in the erased set,
\begin_inset Formula $x$
\end_inset
, given that the number of incorrect symbols in the codeword is
\begin_inset Formula $X$
\end_inset
and the number of erased symbols is
\begin_inset Formula $n_{e}$
\end_inset
.
Then
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
P(x|(X,n_{e}))=\frac{\binom{X}{x}\binom{n-X}{n_{e}-x}}{\binom{n}{n_{e}}}\label{eq:hypergeometric_pdf}
\end{equation}
\end_inset
where
\begin_inset Formula $\binom{n}{k}=\frac{n!}{k!(n-k)!}$
\end_inset
is the binomial coefficient.
The binomial coefficient can be calculated using the
\begin_inset Quotes eld
\end_inset
nchoosek(
\begin_inset Formula $n,k$
\end_inset
)
\begin_inset Quotes erd
\end_inset
function in GNU Octave.
The hypergeometric probability mass function defined in (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:hypergeometric_pdf"
\end_inset
) is available in GNU Octave as function
\begin_inset Quotes eld
\end_inset
hygepdf(
\begin_inset Formula $x,n,X,n_{e}$
\end_inset
)
\begin_inset Quotes erd
\end_inset
.
\end_layout
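\begin_layout Standard
For readers without GNU Octave, the probability mass function can also be evaluated directly with Python's math.comb (Python 3.8 or later).
The sketch below assumes the same argument order as the Octave call hygepdf(x,n,X,n_e) quoted above; the function name hyge_pmf is hypothetical.
\end_layout
\begin_layout LyX-Code
```python
from math import comb

def hyge_pmf(x: int, n: int, X: int, n_e: int) -> float:
    """P(x | X, n_e): probability that exactly x of the n_e erased symbols are
    among the X incorrect symbols of an n-symbol word (hypergeometric pmf).
    Argument order mirrors the Octave call hygepdf(x, n, X, n_e)."""
    if x < 0 or x > X or x > n_e or n_e - x > n - X:
        return 0.0
    return comb(X, x) * comb(n - X, n_e - x) / comb(n, n_e)

# Sanity check: the pmf sums to one over all feasible x.
total = sum(hyge_pmf(x, 63, 40, 40) for x in range(0, 41))
print(total)  # approximately 1.0
```
\end_layout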
\begin_layout Paragraph
Case 1
\end_layout
\begin_layout Case
A codeword contains
\begin_inset Formula $X=40$
\end_inset
incorrect symbols.
In an attempt to decode using an errors-and-erasures decoder,
\begin_inset Formula $n_{e}=40$
\end_inset
symbols are randomly selected for erasure.
The probability that
\begin_inset Formula $35$
\end_inset
of the erased symbols are incorrect is:
\begin_inset Formula
\[
P(x=35)=\frac{\binom{40}{35}\binom{63-40}{40-35}}{\binom{63}{40}}=2.356\times10^{-7}.
\]
\end_inset
Similarly:
\begin_inset Formula
\[
P(x=36)=8.610\times10^{-9}.
\]
\end_inset
Since the probability of catching 36 errors is so much smaller than the
probability of catching 35 errors, it is safe to say that the probability
of randomly selecting an erasure vector that can decode the received word,
\begin_inset Formula $P(x\ge35)$
\end_inset
, is essentially equal to
\begin_inset Formula $P(x=35)\simeq2.4\times10^{-7}$
\end_inset
.
The odds of successfully decoding the word on the first try are about 1
in 4 million.
\end_layout
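\begin_layout Standard
The Case 1 numbers can be reproduced with a few lines of Python.
This sketch (the helper name is hypothetical) evaluates the hypergeometric probabilities for
\begin_inset Formula $n=63$
\end_inset
,
\begin_inset Formula $X=40$
\end_inset
,
\begin_inset Formula $n_{e}=40$
\end_inset
and sums them over all
\begin_inset Formula $x\ge35$
\end_inset
to confirm that the first term dominates.
\end_layout
\begin_layout LyX-Code
```python
from math import comb

def hyge_pmf(x, n, X, n_e):
    # Hypergeometric pmf: x of the n_e erased symbols are among the X errors
    return comb(X, x) * comb(n - X, n_e - x) / comb(n, n_e)

n, X, n_e = 63, 40, 40
p35 = hyge_pmf(35, n, X, n_e)
p36 = hyge_pmf(36, n, X, n_e)
# Decoding succeeds for any x >= 35; x can be at most min(X, n_e) = 40
p_success = sum(hyge_pmf(x, n, X, n_e) for x in range(35, min(X, n_e) + 1))
print(f"{p35:.3e} {p36:.3e} {p_success:.2e}")   # 2.356e-07 8.610e-09 2.44e-07
```
\end_layout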
\begin_layout Paragraph
Case 2
\end_layout
\begin_layout Case
It is interesting to work out the best choice for the number of symbols
that should be selected at random for erasure if the goal is to maximize
the probability of successfully decoding the word.
By exhaustive search, it turns out that if
\begin_inset Formula $X=40$
\end_inset
, then the best strategy is to erase
\begin_inset Formula $n_{e}=45$
\end_inset
symbols, in which case the word will be decoded if the set of erased symbols
contains at least 37 errors, so that at most 3 errors remain among the unerased
symbols.
With
\begin_inset Formula $n=63$
\end_inset
,
\begin_inset Formula $X=40$
\end_inset
,
\begin_inset Formula $n_{e}=45$
\end_inset
, then
\begin_inset Formula
\[
P(x\ge37)\simeq2\times10^{-6}.
\]
\end_inset
This probability is about 8 times higher than the probability of success
when only
\begin_inset Formula $40$
\end_inset
symbols were erased, and the odds of successfully decoding on the first
try are roughly 1 in 500,000.
\end_layout
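\begin_layout Standard
The exhaustive search mentioned in Case 2 is easy to sketch.
The Python code below is an illustration, not the authors' code: for each candidate number of erasures from 0 to 51 it computes the probability that a uniformly random erasure set leaves few enough errors among the unerased symbols for the BM decoder to succeed, and reports the maximizing value.
The helper p_decode and its pool argument (the number of symbols eligible for erasure, all 63 here) are hypothetical.
\end_layout
\begin_layout LyX-Code
```python
from math import comb

def p_decode(pool: int, X: int, n_e: int, code_n: int = 63, k: int = 12) -> float:
    """Probability that n_e symbols erased uniformly at random from `pool`
    candidates (containing all X errors) leave a word the BM decoder can fix.

    If x errors are erased, X - x remain unerased, and BM succeeds when
    n_e + 2*(X - x) <= code_n - k (= d - 1 = 51 for JT65).
    """
    allowed = (code_n - k - n_e) // 2        # errors tolerable among unerased symbols
    x_min = max(0, X - allowed)
    hits = sum(comb(X, x) * comb(pool - X, n_e - x)
               for x in range(x_min, min(X, n_e) + 1))
    return hits / comb(pool, n_e)

# Exhaustive search over n_e = 0, 1, ..., 51 for a word with X = 40 errors
best = max(range(52), key=lambda n_e: p_decode(63, 40, n_e))
print(best, f"{p_decode(63, 40, best):.2e}")   # 45 1.95e-06
```
\end_layout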
\begin_layout Paragraph
Case 3
\end_layout
\begin_layout Case
Cases 1 and 2 illustrate the fact that a strategy that tries to guess which
symbols to erase is not going to be very successful unless we are prepared
to wait all day for an answer.
Consider a slight modification to the strategy that can tip the odds in
our favor.
Suppose that the codeword contains
\begin_inset Formula $X=40$
\end_inset
incorrect symbols, as before.
In this case it is known that 10 of the symbols are much more reliable
than the other 53 symbols.
The 10 most reliable symbols are all correct and these 10 symbols are protected
from erasure, i.e.
the set of erasures is chosen from the smaller set of 53 less reliable
symbols.
If
\begin_inset Formula $n_{e}=45$
\end_inset
symbols are chosen randomly from the set of
\begin_inset Formula $n=53$
\end_inset
less reliable symbols, it is still necessary for the erased symbols to
include at least 37 errors (as in Case 2).
In this case, with
\begin_inset Formula $n=53$
\end_inset
,
\begin_inset Formula $X=40$
\end_inset
,
\begin_inset Formula $n_{e}=45$
\end_inset
,
\begin_inset Formula $P(x\ge37)=0.016$
\end_inset
, a dramatic improvement: the odds of decoding the word on the first try
are now approximately 1 in 62.5!
\end_layout
\begin_layout Standard
Even better odds are obtained with
\begin_inset Formula $n_{e}=47$
\end_inset
which requires
\begin_inset Formula $x\ge38$
\end_inset
.
With
\begin_inset Formula $n=53$
\end_inset
,
\begin_inset Formula $X=40$
\end_inset
,
\begin_inset Formula $n_{e}=47$
\end_inset
,
\begin_inset Formula $P(x\ge38)=0.0266$
\end_inset
, which makes the odds the best so far; about 1 in 38.
\end_layout
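\begin_layout Standard
The same calculation, with the candidate pool reduced to the 53 less reliable symbols, reproduces the Case 3 numbers; all 40 errors lie in this pool because the 10 protected symbols are correct.
The helper p_decode is a hypothetical name used only for this sketch.
Note that the decoding condition still uses the full code length,
\begin_inset Formula $n=63$
\end_inset
, because the protected symbols remain part of the received word.
\end_layout
\begin_layout LyX-Code
```python
from math import comb

def p_decode(pool: int, X: int, n_e: int, code_n: int = 63, k: int = 12) -> float:
    # Erase n_e symbols uniformly from `pool` candidates that hold all X errors;
    # BM succeeds if n_e + 2*(X - x) <= code_n - k, where x errors were erased.
    allowed = (code_n - k - n_e) // 2
    x_min = max(0, X - allowed)
    hits = sum(comb(X, x) * comb(pool - X, n_e - x)
               for x in range(x_min, min(X, n_e) + 1))
    return hits / comb(pool, n_e)

for n_e in (45, 47):
    p = p_decode(pool=53, X=40, n_e=n_e)
    print(n_e, round(p, 4), round(1 / p, 1))
# 45 0.0159 62.8
# 47 0.0266 37.6
```
\end_layout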
\begin_layout Section
\begin_inset CommandInset label
LatexCommand label
name "sec:The-decoding-algorithm"
\end_inset
The SFRSD decoding algorithm
\end_layout
\begin_layout Standard
Case 3 illustrates how, with the addition of some reliable information about
the quality of just 10 of the 63 symbols, it is possible to devise an algorithm
that can decode received words containing a relatively large number of
errors using only the BM errors-and-erasures decoder.
The key to improving the odds enough to make the strategy of
\begin_inset Quotes eld
\end_inset
guessing
\begin_inset Quotes erd
\end_inset
at the erasure vector useful for practical implementation is to use information
about the quality of the received symbols to decide which ones are most
likely to be in error.
In practice, because the number of errors in the received word is unknown,
rather than erase a fixed number of symbols, it is better to use a stochastic
algorithm that assigns a relatively high probability of erasure to the
lowest-quality symbols and a relatively low probability of erasure to the
highest-quality symbols.
As illustrated by Case 3, a good choice of the erasure probabilities can
increase the probability of a successful decode by many orders of magnitude
relative to a bad choice (compare Cases 1 and 3).
\end_layout
\begin_layout Standard
The SFRSD algorithm uses two quality indices available from the JT65 noncoherent
64-FSK demodulator to assign a variable probability of erasure to each
received symbol.
The demodulator identifies the most likely received symbol based on which
of 64 frequency bins contains the largest signal-plus-noise power.
The fractions of the total signal-plus-noise power in the two bins containing
the largest and second-largest powers (denoted by
\begin_inset Formula $p_{1}$
\end_inset
and
\begin_inset Formula $p_{2}$
\end_inset
, respectively) are passed to the decoder from the demodulator as
\begin_inset Quotes eld
\end_inset
soft-symbol
\begin_inset Quotes erd
\end_inset
information.
The decoder derives two metrics from
\begin_inset Formula $\{p_{1},p_{2}\}:$
\end_inset
\end_layout
\begin_layout Itemize
\begin_inset Formula $p_{1}$
\end_inset
-rank: the rank
\begin_inset Formula $\{1,2,\ldots,63\}$
\end_inset
of the symbol's power fraction,
\begin_inset Formula $p_{1}$
\end_inset
, in the sorted list of
\begin_inset Formula $p_{1}$
\end_inset
values.
High-ranking symbols have a larger signal-to-noise ratio than lower-ranked
symbols.
\end_layout
\begin_layout Itemize
\begin_inset Formula $p_{2}/p_{1}$
\end_inset
: when
\begin_inset Formula $p_{2}/p_{1}$
\end_inset
is not small compared to 1, the most likely symbol is not much better than
the second most likely symbol.
\end_layout
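\begin_layout Standard
The following Python sketch illustrates how
\begin_inset Formula $p_{1}$
\end_inset
,
\begin_inset Formula $p_{2}$
\end_inset
, the
\begin_inset Formula $p_{1}$
\end_inset
-rank, and
\begin_inset Formula $p_{2}/p_{1}$
\end_inset
might be computed.
It is an illustration rather than the JT65 demodulator: bin_powers stands for the 64 signal-plus-noise powers of one symbol (assumed positive), and, because the text does not fix the direction of the ranking, rank 1 is assumed to go to the smallest
\begin_inset Formula $p_{1}$
\end_inset
so that higher-ranked symbols have larger power fractions.
\end_layout
\begin_layout LyX-Code
```python
def soft_symbol_info(bin_powers):
    """Hard decision and (p1, p2) for one received symbol.

    bin_powers holds the 64 signal-plus-noise powers (assumed positive);
    p1 and p2 are the largest and second-largest powers as fractions of the total.
    """
    order = sorted(range(len(bin_powers)), key=lambda j: bin_powers[j], reverse=True)
    total = sum(bin_powers)
    return order[0], bin_powers[order[0]] / total, bin_powers[order[1]] / total

def decoder_metrics(p1, p2):
    """p1-rank (assumption: rank 1 = smallest p1) and p2/p1 for every symbol."""
    order = sorted(range(len(p1)), key=lambda i: p1[i])
    rank = [0] * len(p1)
    for r, i in enumerate(order, start=1):
        rank[i] = r
    ratio = [b / a for a, b in zip(p1, p2)]
    return rank, ratio

powers = [1.0] * 64
powers[7], powers[20] = 10.0, 4.0
print(soft_symbol_info(powers)[0])    # 7 (the most likely symbol value)

rank, ratio = decoder_metrics([0.5, 0.2, 0.9], [0.1, 0.18, 0.05])
print(rank)                           # [2, 1, 3]
print([round(q, 2) for q in ratio])   # [0.2, 0.9, 0.06]
```
\end_layout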
\begin_layout Standard
The decoder has a built-in table of symbol error probabilities derived from
a large dataset of received words that have been successfully decoded.
The table provides an estimate of the
\emph on
a priori
\emph default
probability of symbol error that is expected based on a given symbol's
\begin_inset Formula $p_{1}$
\end_inset
-rank and
\begin_inset Formula $p_{2}/p_{1}$
\end_inset
metrics.
These
\emph on
a priori
\emph default
symbol-error probabilities will be close to 1 for low-quality symbols
and close to 0 for high-quality symbols.
Recall, from Cases 2 and 3, that the best performance was obtained when
\begin_inset Formula $n_{e}>X$
\end_inset
.
Correspondingly, the SFRSD algorithm works best when the probability of
erasing a symbol is somewhat larger than the probability that the symbol
is incorrect.
Empirically, good performance of the SFRSD algorithm is obtained when the
symbol erasure probability is
\begin_inset Formula $1.3$
\end_inset
times the symbol error probability.
\end_layout
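\begin_layout Standard
A stochastic erasure vector of the kind described above could be drawn as in the following Python sketch.
The symbol-error probabilities p_err are taken as an input, standing in for the built-in table, and draw_erasures is a hypothetical name.
The factor 1.3 and the limit of
\begin_inset Formula $d-1=51$
\end_inset
erasures come from the text; clamping the erasure probability at 1 and visiting the symbols in random order when enforcing the 51-erasure limit are assumptions of this sketch.
\end_layout
\begin_layout LyX-Code
```python
import random

def draw_erasures(p_err, factor=1.3, max_erasures=51, rng=random):
    """Draw one stochastic erasure set (sorted list of erased positions).

    p_err[i] is the a priori error probability of symbol i (in SFRSD it comes
    from a built-in table; here it is simply an input).  Symbol i is erased
    with probability min(1, factor * p_err[i]).  Assumption: symbols are
    visited in random order and drawing stops once max_erasures are erased.
    """
    order = list(range(len(p_err)))
    rng.shuffle(order)
    erased = []
    for i in order:
        if len(erased) >= max_erasures:
            break
        if rng.random() < min(1.0, factor * p_err[i]):
            erased.append(i)
    return sorted(erased)

rng = random.Random(2015)
p_err = [0.6] * 30 + [0.05] * 33      # hypothetical error probabilities
print(len(draw_erasures(p_err, rng=rng)))
```
\end_layout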
\begin_layout Standard
The SFRSD algorithm successively tries to decode the received word using
educated guesses at the symbols that should be erased.
In each iteration, an independent stochastic erasure vector is generated
based on the symbol erasure probabilities.
The guessed erasure vector is provided to the BM decoder along with the
received word.
If the BM decoder finds a candidate codeword, then the codeword is assigned
a quality metric, defined to be the soft distance,
\begin_inset Formula $d_{s}$
\end_inset
, between the received word and the codeword, where
\begin_inset Formula
\begin{equation}
d_{s}=\sum_{i=1}^{n}(1+p_{1,i})\alpha_{i},\label{eq:soft_distance}
\end{equation}
\end_inset
and
\begin_inset Formula $p_{1,i}$
\end_inset
is the fractional power associated with the
\begin_inset Formula $i$
\end_inset
th received symbol,
\begin_inset Formula $\alpha_{i}=0$
\end_inset
if the
\begin_inset Formula $i$
\end_inset
th received symbol is the same as the corresponding symbol in the codeword,
and
\begin_inset Formula $\alpha_{i}=1$
\end_inset
if the
\begin_inset Formula $i$
\end_inset
th symbol in the received word and the codeword are different.
This soft distance can be written as two terms, the first of which is just
the Hamming distance between the received word and the codeword.
The second term ensures that if two candidate codewords have the same Hamming
distance from the received word, a smaller distance will be assigned to
the one where the different symbols occurred in lower quality symbols.
\end_layout
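\begin_layout Standard
The soft distance is straightforward to compute.
In the Python sketch below (function and variable names are illustrative), two candidate codewords each differ from the received word in one position; the candidate that differs at the lower-quality symbol (smaller
\begin_inset Formula $p_{1}$
\end_inset
) gets the smaller soft distance, as described above.
\end_layout
\begin_layout LyX-Code
```python
def soft_distance(received, codeword, p1):
    """d_s = sum_i (1 + p1[i]) * alpha_i, where alpha_i = 1 if the symbols differ."""
    return sum(1.0 + p for r, c, p in zip(received, codeword, p1) if r != c)

received = [5, 17, 42, 3]
p1 = [0.9, 0.2, 0.6, 0.1]
cand_a = [5, 17, 40, 3]   # differs at a higher-quality symbol (p1 = 0.6)
cand_b = [5, 17, 42, 9]   # differs at a lower-quality symbol (p1 = 0.1)
print(soft_distance(received, cand_a, p1), soft_distance(received, cand_b, p1))  # 1.6 1.1
```
\end_layout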
\begin_layout Standard
Technically, the algorithm is a list decoder, potentially generating a list
of candidate codewords.
Among the candidate codewords found by this stochastic search algorithm,
only the one with the smallest soft distance from the received word is
kept.
As with all such algorithms, a stopping criterion is necessary.
SFRSD accepts a codeword unconditionally if its soft distance is smaller
than an empirically determined acceptance threshold,
\begin_inset Formula $d_{a}$
\end_inset
.
A timeout is employed to limit the execution time of the algorithm in cases
where no codewords within soft distance
\begin_inset Formula $d_{a}$
\end_inset
of the received word are found in a reasonable number of trials.
\end_layout
\begin_layout Paragraph
Algorithm
\end_layout
\begin_layout Enumerate
For each symbol in the received word, define the erasure probability to
be 1.3 times the a priori symbol-error probability determined by the soft-symbol
information
\begin_inset Formula $\{p_{1}\textrm{-rank},p_{2}/p_{1}\}$
\end_inset
.
\end_layout
\begin_layout Enumerate
Make independent decisions about whether or not to erase each symbol in
the word using the symbol's erasure probability.
Allow a total of up to 51 symbols to be erased.
\end_layout
\begin_layout Enumerate
Attempt BM errors-and-erasures decoding with the set of erased symbols that
was determined in step 2.
If the BM decoder is successful, go to step 5.
\end_layout
\begin_layout Enumerate
If decoding is not successful, go to step 7.
\end_layout
\begin_layout Enumerate
Calculate the soft distance,
\begin_inset Formula $d_{s}$
\end_inset
, between the candidate codeword and the received word.
Set
\begin_inset Formula $d_{s,min}=d_{s}$
\end_inset
if the soft distance is the smallest one encountered so far.
\end_layout
\begin_layout Enumerate
If
\begin_inset Formula $d_{s,min}\le d_{a}$
\end_inset
, go to step 8.
\end_layout
\begin_layout Enumerate
If the number of trials is less than the maximum allowed number, go to step 2.
Otherwise, declare decoding failure and exit.
\end_layout
\begin_layout Enumerate
A codeword with
\begin_inset Formula $d_{s}\le d_{a}$
\end_inset
has been found.
Declare a successful decode.
Return the best codeword found so far.
\end_layout
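\begin_layout Standard
The steps above can be sketched in Python as follows.
This is a hedged illustration, not the SFRSD source code: no Berlekamp-Massey decoder is included, so bm_decode is a caller-supplied stand-in (a hypothetical interface that takes the received word and a list of erased positions and returns a codeword or None), and the per-symbol error probabilities p_err are again taken as an input.
A failed BM attempt falls through to the trial-count check, so the loop always terminates after at most max_trials attempts.
The toy decoder in the usage part is purely for demonstration.
\end_layout
\begin_layout LyX-Code
```python
import random
from typing import Callable, List, Optional, Sequence

Decoder = Callable[[Sequence[int], List[int]], Optional[List[int]]]

def sfrsd_search(received: Sequence[int], p1: Sequence[float], p_err: Sequence[float],
                 bm_decode: Decoder, d_accept: float, max_trials: int,
                 factor: float = 1.3, max_erasures: int = 51, seed: int = 0):
    """Stochastic successive-erasures search following steps 1-8 of the text.

    Returns (codeword, soft_distance, trials); codeword is None on failure.
    """
    rng = random.Random(seed)
    p_erase = [min(1.0, factor * p) for p in p_err]                 # step 1
    best, best_ds = None, float("inf")
    for trial in range(1, max_trials + 1):
        order = list(range(len(received)))                          # step 2
        rng.shuffle(order)
        erasures = []
        for i in order:
            if len(erasures) < max_erasures and rng.random() < p_erase[i]:
                erasures.append(i)
        cand = bm_decode(received, sorted(erasures))                # step 3
        if cand is None:
            continue                                                # step 4 -> step 7
        ds = sum(1.0 + p for r, c, p in zip(received, cand, p1) if r != c)  # step 5
        if ds < best_ds:
            best, best_ds = cand, ds
        if best_ds <= d_accept:                                     # step 6
            return best, best_ds, trial                             # step 8
    return None, None, max_trials                                   # step 7: give up

# Toy stand-in for BM (hypothetical): it "decodes" only when every corrupted
# position is erased, a stricter requirement than the real n_e + 2x <= 51 bound.
true_cw = [i % 64 for i in range(63)]
bad = set(range(0, 30, 3))                                 # 10 corrupted positions
received = [(c + 1) % 64 if i in bad else c for i, c in enumerate(true_cw)]
p1 = [0.1 if i in bad else 0.8 for i in range(63)]
p_err = [0.7 if i in bad else 0.02 for i in range(63)]

def toy_bm(rx, erasures):
    wrong = {i for i in range(len(rx)) if rx[i] != true_cw[i]}
    return list(true_cw) if wrong <= set(erasures) else None

cw, ds, trials = sfrsd_search(received, p1, p_err, toy_bm, d_accept=20.0, max_trials=100)
print(cw == true_cw, round(ds, 2), trials)
```
\end_layout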
\begin_layout Section
Results
\end_layout
\begin_layout Section
Summary
\end_layout
\end_body
\end_document