WSJT-X/lib/sfrsd2/sfrsd_paper/sfrsd.lyx
Joe Taylor 66946ca3dd Correct the numerical value for P(x>=37) = 1.9e-6. Fix a typo.
git-svn-id: svn+ssh://svn.code.sf.net/p/wsjt/wsjt/branches/wsjtx@6211 ab8295b8-cf94-4d9e-aec4-7959e3be5d79
2015-12-01 19:17:50 +00:00


#LyX 2.1 created this file. For more info see http://www.lyx.org/
\lyxformat 474
\begin_document
\begin_header
\textclass paper
\use_default_options true
\maintain_unincluded_children false
\language english
\language_package default
\inputencoding auto
\fontencoding global
\font_roman default
\font_sans default
\font_typewriter default
\font_math auto
\font_default_family default
\use_non_tex_fonts false
\font_sc false
\font_osf false
\font_sf_scale 100
\font_tt_scale 100
\graphics default
\default_output_format default
\output_sync 0
\bibtex_command default
\index_command default
\paperfontsize 12
\spacing onehalf
\use_hyperref false
\papersize default
\use_geometry true
\use_package amsmath 1
\use_package amssymb 1
\use_package cancel 1
\use_package esint 1
\use_package mathdots 1
\use_package mathtools 1
\use_package mhchem 1
\use_package stackrel 1
\use_package stmaryrd 1
\use_package undertilde 1
\cite_engine basic
\cite_engine_type default
\biblio_style plain
\use_bibtopic false
\use_indices false
\paperorientation portrait
\suppress_date false
\justification true
\use_refstyle 1
\index Index
\shortcut idx
\color #008000
\end_index
\leftmargin 1in
\topmargin 1in
\rightmargin 1in
\bottommargin 1in
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\paragraph_indentation default
\quotes_language english
\papercolumns 1
\papersides 1
\paperpagestyle default
\tracking_changes false
\output_changes false
\html_math_output 0
\html_css_as_file 0
\html_be_strict false
\end_header
\begin_body
\begin_layout Title
A stochastic successive erasures soft-decision decoder for the JT65 (63,12)
Reed-Solomon code
\end_layout
\begin_layout Author
Steven J.
Franke, K9AN and Joseph H.
Taylor, K1JT
\end_layout
\begin_layout Abstract
The JT65 mode has revolutionized amateur-radio weak-signal communication
by enabling amateur radio operators with small antennas and relatively
low-power transmitters to communicate over propagation paths not usable
with traditional technologies.
A major reason for the success and popularity of JT65 is its use of a strong
error-correction code: a short block-length, low-rate Reed-Solomon code
based on a 64-symbol alphabet.
Since 2004, most JT65 decoders have used the patented Koetter-Vardy (KV)
algebraic soft-decision decoder, licensed to K1JT and implemented in a
closed-source program for use in amateur radio applications.
We describe here a new open-source alternative called the FT algorithm.
It is conceptually simple, built around the well-known Berlekamp-Massey
errors-and-erasures algorithm, and performs at least as well as the KV
decoder.
\end_layout
\begin_layout Section
Introduction
\end_layout
\begin_layout Standard
JT65 message frames consist of a short compressed message encoded for transmission
with a Reed-Solomon code.
Reed-Solomon codes are block codes characterized by
\begin_inset Formula $n$
\end_inset
, the length of their codewords,
\begin_inset Formula $k$
\end_inset
, the number of message symbols conveyed by the codeword, and the number
of possible values for each symbol in the codewords.
The codeword length and the number of message symbols are specified with
the notation
\begin_inset Formula $(n,k)$
\end_inset
.
JT65 uses a (63,12) Reed-Solomon code with 64 possible values for each
symbol.
Each of the 12 message symbols represents
\begin_inset Formula $\log_{2}64=6$
\end_inset
message bits.
The source-encoded messages conveyed by a 63-symbol JT65 frame thus consist
of 72 bits.
The JT65 code is systematic, which means that the 12 message symbols are
embedded in the codeword without modification and another 51 parity symbols
derived from the message symbols are added to form a codeword of 63 symbols.
\end_layout
\begin_layout Standard
The concept of Hamming distance is used as a measure of
\begin_inset Quotes eld
\end_inset
distance
\begin_inset Quotes erd
\end_inset
between different codewords, or between a received word and a codeword.
Hamming distance is the number of code symbols that differ in the two words
being compared.
Reed-Solomon codes have minimum Hamming distance
\begin_inset Formula $d$
\end_inset
, where
\begin_inset Formula
\begin{equation}
d=n-k+1.\label{eq:minimum_distance}
\end{equation}
\end_inset
The minimum Hamming distance of the JT65 code is
\begin_inset Formula $d=52$
\end_inset
, which means that any particular codeword differs from all other codewords
in at least 52 symbol positions.
\end_layout
\begin_layout Standard
Given a received word containing some incorrect symbols (errors), the received
word can be decoded into the correct codeword using a deterministic, algebraic
algorithm provided that no more than
\begin_inset Formula $t$
\end_inset
symbols were received incorrectly, where
\begin_inset Formula
\begin{equation}
t=\left\lfloor \frac{n-k}{2}\right\rfloor .\label{eq:t}
\end{equation}
\end_inset
For the JT65 code,
\begin_inset Formula $t=25$
\end_inset
, so it is always possible to efficiently decode a received word having
no more than 25 symbol errors.
Any one of several well-known algebraic algorithms, such as the widely
used Berlekamp-Massey (BM) algorithm, can carry out the decoding.
Two steps are necessarily involved in this process, namely
\end_layout
\begin_layout Enumerate
Determine which symbols were received incorrectly.
\end_layout
\begin_layout Enumerate
Find the correct value of the incorrect symbols.
\end_layout
\begin_layout Standard
If we somehow know that certain symbols are incorrect, this information
can be used to reduce the work involved in step 1 and allow step 2 to correct
more than
\begin_inset Formula $t$
\end_inset
errors.
In the unlikely event that the location of every error is known and if
no correct symbols are accidentally labeled as errors, the BM algorithm
can correct up to
\begin_inset Formula $d-1$
\end_inset
errors.
\end_layout
\begin_layout Standard
The FT algorithm creates lists of symbols suspected of being incorrect and
sends them to the BM decoder.
Symbols flagged in this way are called
\begin_inset Quotes eld
\end_inset
erasures,
\begin_inset Quotes erd
\end_inset
while other incorrect symbols will be called
\begin_inset Quotes eld
\end_inset
errors.
\begin_inset Quotes erd
\end_inset
As already noted, with perfect erasure information up to 51 errors can
be corrected.
With imperfect erasure information, some erased symbols may actually be correct
and some unerased symbols may be in error.
If
\begin_inset Formula $s$
\end_inset
symbols are erased and the remaining
\begin_inset Formula $n-s$
\end_inset
symbols contain
\begin_inset Formula $e$
\end_inset
errors, the BM algorithm can find the correct codeword as long as
\begin_inset Formula
\begin{equation}
s+2e\le d-1.\label{eq:erasures_and_errors}
\end{equation}
\end_inset
If
\begin_inset Formula $s=0$
\end_inset
, the decoder is said to be an
\begin_inset Quotes eld
\end_inset
errors-only
\begin_inset Quotes erd
\end_inset
decoder.
If
\begin_inset Formula $0<s\le d-1$
\end_inset
(
\begin_inset Formula $d-1=51$
\end_inset
for JT65), the decoder is called an
\begin_inset Quotes eld
\end_inset
errors-and-erasures
\begin_inset Quotes erd
\end_inset
decoder.
The possibility of doing errors-and-erasures decoding lies at the heart
of the FT algorithm.
On that foundation we have built a capability for using
\begin_inset Quotes eld
\end_inset
soft
\begin_inset Quotes erd
\end_inset
information on symbol reliability, thereby producing a soft-decision decoder.
\end_layout
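\begin_layout Standard
As a worked illustration of Eq.
(
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:erasures_and_errors"
\end_inset
) for JT65, where
\begin_inset Formula $d-1=51$
\end_inset
, the largest number of unerased errors that can be tolerated for a given
number of erasures
\begin_inset Formula $s$
\end_inset
is
\begin_inset Formula 
\[
e_{\max}(s)=\left\lfloor \frac{51-s}{2}\right\rfloor ,\qquad e_{\max}(0)=25,\quad e_{\max}(40)=5,\quad e_{\max}(45)=3,\quad e_{\max}(51)=0.
\]
\end_inset
The symbol
\begin_inset Formula $e_{\max}$
\end_inset
is introduced here only for illustration.
Each additional pair of erasures lowers the tolerable number of unerased
errors by one.
\end_layout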
\begin_layout Section
\begin_inset CommandInset label
LatexCommand label
name "sec:You've-got-to"
\end_inset
Do I feel lucky?
\end_layout
\begin_layout Standard
The FT algorithm uses the estimated quality of received symbols to generate
lists of symbols considered likely to be in error, thus enabling reliable
decoding of received words with more than 25 errors.
As a specific example, consider a received JT65 word with 23 correct symbols
and 40 errors.
We do not know which symbols are in error.
Suppose that the decoder randomly selects
\begin_inset Formula $s=40$
\end_inset
symbols for erasure, leaving 23 unerased symbols.
According to Eq.
(
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:erasures_and_errors"
\end_inset
), the BM decoder can successfully decode this word as long as
\begin_inset Formula $e$
\end_inset
, the number of errors present in the 23 unerased symbols, is 5 or less.
The number of errors captured in the set of 40 erased symbols must therefore
be at least 35.
\end_layout
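\begin_layout Standard
The arithmetic behind this bound can be spelled out as follows, writing
\begin_inset Formula $x$
\end_inset
for the number of the 40 errors that fall among the
\begin_inset Formula $s$
\end_inset
erased symbols, so that
\begin_inset Formula $e=40-x$
\end_inset
:
\begin_inset Formula 
\[
s+2(40-x)\le51\quad\Longrightarrow\quad x\ge\frac{s+29}{2},\qquad\textrm{so for }s=40:\; x\ge34.5,\textrm{ i.e. }x\ge35.
\]
\end_inset
\end_layout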
\begin_layout Standard
The probability of selecting some particular number of incorrect symbols
in a randomly selected subset of received symbols is governed by the hypergeometric
probability distribution.
Let us define
\begin_inset Formula $N$
\end_inset
as the number of symbols from which erasures will be selected,
\begin_inset Formula $X$
\end_inset
as the number of incorrect symbols in the received set, and
\begin_inset Formula $x$
\end_inset
as the number of errors in the erased symbols.
In an ensemble of many received words,
\begin_inset Formula $X$
\end_inset
and
\begin_inset Formula $x$
\end_inset
will be random variables.
The conditional probability mass function for
\begin_inset Formula $x$
\end_inset
given stated values of
\begin_inset Formula $N$
\end_inset
,
\begin_inset Formula $X$
\end_inset
, and
\begin_inset Formula $s$
\end_inset
may be written as
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
P(x|N,X,s)=\frac{\binom{X}{x}\binom{N-X}{s-x}}{\binom{N}{s}}\label{eq:hypergeometric_pdf}
\end{equation}
\end_inset
where
\begin_inset Formula $\binom{n}{k}=\frac{n!}{k!(n-k)!}$
\end_inset
is the binomial coefficient.
The binomial coefficient can be calculated using the function
\begin_inset Quotes eld
\end_inset
nchoosek(
\begin_inset Formula $n,k$
\end_inset
)
\begin_inset Quotes erd
\end_inset
in the interpreted language GNU Octave.
The hypergeometric probability mass function defined in Eq.
(
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:hypergeometric_pdf"
\end_inset
) is available in GNU Octave as function
\begin_inset Quotes eld
\end_inset
hygepdf(
\begin_inset Formula $x,N,X,s$
\end_inset
)
\begin_inset Quotes erd
\end_inset
.
\end_layout
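\begin_layout Standard
The examples that follow also use the probability that the erasures capture
at least some minimum number of errors.
Assuming it is obtained simply by summing the mass function above, and writing
\begin_inset Formula $x_{0}$
\end_inset
(a symbol introduced here only for illustration) for that minimum number,
it may be written as
\begin_inset Formula 
\[
P(x\ge x_{0}|N,X,s)=\sum_{x=x_{0}}^{\min(X,s)}P(x|N,X,s).
\]
\end_inset
The upper limit reflects the fact that the erased set cannot contain more
errors than exist, nor more errors than erased symbols.
\end_layout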
\begin_layout Paragraph
Example 1:
\end_layout
\begin_layout Standard
Suppose a codeword contains
\begin_inset Formula $X=40$
\end_inset
incorrect symbols.
In an attempt to decode using an errors-and-erasures decoder,
\begin_inset Formula $s=40$
\end_inset
symbols are randomly selected for erasure from the full set of
\begin_inset Formula $N=n=63$
\end_inset
symbols.
The probability that
\begin_inset Formula $x=35$
\end_inset
of the erased symbols are actually incorrect is then
\begin_inset Formula
\[
P(x=35)=\frac{\binom{40}{35}\binom{63-40}{40-35}}{\binom{63}{40}}\simeq2.4\times10^{-7}.
\]
\end_inset
Similarly, the probability that
\begin_inset Formula $x=36$
\end_inset
of the erased symbols are incorrect is
\begin_inset Formula
\[
P(x=36)\simeq8.6\times10^{-9}.
\]
\end_inset
Since the probability of erasing 36 errors is so much smaller than the probability
of erasing 35 errors, we may safely conclude that the probability of
randomly choosing an erasure vector that can decode the received word is
approximately
\begin_inset Formula $P(x=35)\simeq2.4\times10^{-7}$
\end_inset
.
The odds of successfully decoding the word on the first try are very poor,
about 1 in 4 million.
\end_layout
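\begin_layout Standard
For readers who wish to check the first value by hand, the evaluation can
be sketched as follows.
The rounded intermediate values are supplied only as an illustration:
\begin_inset Formula 
\[
\binom{40}{35}=658008,\quad\binom{23}{5}=33649,\quad\binom{63}{40}\simeq9.4\times10^{16},\qquad P(x=35)\simeq\frac{2.21\times10^{10}}{9.4\times10^{16}}\simeq2.4\times10^{-7}.
\]
\end_inset
Adding the much smaller term
\begin_inset Formula $P(x=36)$
\end_inset
changes the result by only a few percent.
\end_layout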
\begin_layout Paragraph
Example 2:
\end_layout
\begin_layout Standard
How might we best choose the number of symbols to erase, in order to maximize
the probability of successful decoding? By exhaustive search over all possible
values up to
\begin_inset Formula $s=51$
\end_inset
, it turns out that for
\begin_inset Formula $X=40$
\end_inset
the best strategy is to erase
\begin_inset Formula $s=45$
\end_inset
symbols.
Decoding will then be assured if the set of erased symbols contains at
least 37 errors, and with
\begin_inset Formula $N=63$
\end_inset
,
\begin_inset Formula $X=40$
\end_inset
, and
\begin_inset Formula $s=45$
\end_inset
, the probability of successful decode in a single try is
\begin_inset Formula
\[
P(x\ge37)\simeq1.9\times10^{-6}.
\]
\end_inset
This probability is about 8 times higher than the probability of success
when only 40 symbols were erased.
Nevertheless, the odds of successfully decoding on the first try are still
only about 1 in 500,000.
\end_layout
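\begin_layout Standard
Written out, the probability quoted in Example 2 is a sum of four terms that
is dominated by the first.
The rounded term values below are supplied only as an illustration:
\begin_inset Formula 
\[
P(x\ge37)=\sum_{x=37}^{40}\frac{\binom{40}{x}\binom{23}{45-x}}{\binom{63}{45}}\simeq1.87\times10^{-6}+7.4\times10^{-8}+\ldots\simeq1.9\times10^{-6}.
\]
\end_inset
The improvement over Example 1 is the ratio
\begin_inset Formula $1.9\times10^{-6}/2.4\times10^{-7}\simeq8$
\end_inset
.
\end_layout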
\begin_layout Paragraph
Example 3:
\end_layout
\begin_layout Standard
Examples 1 and 2 show that a random strategy for selecting symbols to erase
is unlikely to be successful unless we are prepared to wait a long time
for an answer.
So let's modify the strategy to tip the odds in our favor.
Let the received word contain
\begin_inset Formula $X=40$
\end_inset
incorrect symbols, as before, but suppose we know that 10 symbols are significantly
more reliable than the other 53, so that all 40 errors may be assumed to lie among those 53.
We might therefore protect the 10 most reliable symbols from erasure, selecting
erasures from the smaller set of
\begin_inset Formula $N=53$
\end_inset
less reliable symbols.
If
\begin_inset Formula $s=45$
\end_inset
symbols are chosen randomly in this way, it is still necessary for the
erased symbols to include at least 37 errors, as in Example 2.
However, the probabilities are now much more favorable: with
\begin_inset Formula $N=53$
\end_inset
,
\begin_inset Formula $X=40$
\end_inset
, and
\begin_inset Formula $s=45$
\end_inset
, Eq.
(
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:hypergeometric_pdf"
\end_inset
) yields
\begin_inset Formula $P(x\ge37)=0.016$
\end_inset
.
Even better odds are obtained by choosing
\begin_inset Formula $s=47$
\end_inset
, which requires
\begin_inset Formula $x\ge38$
\end_inset
.
With
\begin_inset Formula $N=53$
\end_inset
,
\begin_inset Formula $X=40$
\end_inset
, and
\begin_inset Formula $s=47$
\end_inset
,
\begin_inset Formula $P(x\ge38)=0.0266$
\end_inset
.
The odds for successful decoding on the first try are now about 1 in 38.
A few hundred independently randomized tries would be enough to all-but-guarantee
production of a valid codeword by the BM decoder.
\end_layout
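\begin_layout Standard
The effect of repeated tries can be estimated with a simple calculation,
assuming that the tries are independent and that each succeeds with the
same probability
\begin_inset Formula $p$
\end_inset
.
The symbols
\begin_inset Formula $p$
\end_inset
and
\begin_inset Formula $m$
\end_inset
(the number of tries) are introduced here only for illustration.
The probability of at least one success is then
\begin_inset Formula 
\[
P_{\textrm{success}}(m)=1-(1-p)^{m},\qquad p=0.0266:\quad P_{\textrm{success}}(100)\simeq0.93,\quad P_{\textrm{success}}(300)\simeq0.9997.
\]
\end_inset
\end_layout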
\begin_layout Section
\begin_inset CommandInset label
LatexCommand label
name "sec:The-decoding-algorithm"
\end_inset
The FT decoding algorithm
\end_layout
\begin_layout Standard
Example 3 shows how reliable information about symbol quality should make
it possible to decode received frames having a large number of errors.
In practice the number of errors in the received word is unknown, so we
use a stochastic algorithm to assign high erasure probability to low-quality
symbols and relatively low probability to high-quality symbols.
As illustrated by Example 3, a good choice of these probabilities can increase
the chance of a successful decode by many orders of magnitude.
\end_layout
\begin_layout Standard
The FT algorithm uses two quality indices made available by a noncoherent
64-FSK demodulator.
The demodulator identifies the most likely value for each symbol based
on the largest signal-plus-noise power in 64 frequency bins.
The fractions of total power in the two bins containing the largest and
second-largest powers (denoted by
\begin_inset Formula $p_{1}$
\end_inset
and
\begin_inset Formula $p_{2}$
\end_inset
, respectively) are passed to the decoder from the demodulator as
\begin_inset Quotes eld
\end_inset
soft-symbol
\begin_inset Quotes erd
\end_inset
information.
The decoder derives two metrics from
\begin_inset Formula $\{p_{1},p_{2}\}:$
\end_inset
\end_layout
\begin_layout Itemize
\begin_inset Formula $p_{1}$
\end_inset
-rank: the rank
\begin_inset Formula $\{1,2,\ldots,63\}$
\end_inset
of the symbol's fractional power,
\begin_inset Formula $p_{1}$
\end_inset
in the sorted list of
\begin_inset Formula $p_{1}$
\end_inset
values.
High-ranking symbols have a larger signal-to-noise ratio than those with
lower rank.
\end_layout
\begin_layout Itemize
\begin_inset Formula $p_{2}/p_{1}$
\end_inset
: when
\begin_inset Formula $p_{2}/p_{1}$
\end_inset
is not small compared to 1, the most likely symbol value is only slightly
more reliable than the second most likely one.
\end_layout
\begin_layout Standard
The FT decoder uses a table of symbol error probabilities derived from a
large dataset of received words that have been successfully decoded.
The table provides an estimate of the
\emph on
a priori
\emph default
probability of symbol error based on a given symbol's
\begin_inset Formula $p_{1}$
\end_inset
-rank and
\begin_inset Formula $p_{2}/p_{1}$
\end_inset
metrics.
These probabilities are close to 1 for low-quality symbols and close to
0 for high-quality symbols.
Recall from Examples 2 and 3 that best performance was obtained with
\begin_inset Formula $s>X$
\end_inset
.
Correspondingly, the FT algorithm works best when the probability of erasing
a symbol is somewhat larger than the probability that the symbol is incorrect.
We found empirically that good decoding performance is obtained when the
symbol erasure probability is about 1.3 times the symbol error probability.
\end_layout
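\begin_layout Standard
The scaling rule just described can be written as a formula.
The symbols
\begin_inset Formula $q_{i}$
\end_inset
(erasure probability of symbol
\begin_inset Formula $i$
\end_inset
) and
\begin_inset Formula $p_{e,i}$
\end_inset
(tabulated error probability of symbol
\begin_inset Formula $i$
\end_inset
) are introduced here only for illustration, and the clipping at 1 is an
assumption made so that the result remains a valid probability:
\begin_inset Formula 
\[
q_{i}=\min\left(1,\,1.3\, p_{e,i}\right).
\]
\end_inset
For example,
\begin_inset Formula $p_{e,i}=0.5$
\end_inset
gives
\begin_inset Formula $q_{i}=0.65$
\end_inset
, while
\begin_inset Formula $p_{e,i}=0.9$
\end_inset
gives
\begin_inset Formula $q_{i}=1$
\end_inset
under this assumption.
\end_layout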
\begin_layout Standard
The FT algorithm tries successively to decode the received word using independent
\begin_inset Quotes eld
\end_inset
educated guesses
\begin_inset Quotes erd
\end_inset
to select symbols for erasure.
For each iteration a stochastic erasure vector is generated based on the
symbol erasure probabilities.
The erasure vector is sent to the BM decoder along with the full set of
63 received symbols.
When the BM decoder finds a candidate codeword it is assigned a quality
metric
\begin_inset Formula $d_{s}$
\end_inset
defined as the soft distance between the received word and the codeword,
where
\begin_inset Formula
\begin{equation}
d_{s}=\sum_{i=1}^{n}\alpha_{i}\,(1+p_{1,i}).\label{eq:soft_distance}
\end{equation}
\end_inset
Here
\begin_inset Formula $\alpha_{i}=0$
\end_inset
if received symbol
\begin_inset Formula $i$
\end_inset
is the same as the corresponding symbol in the codeword,
\begin_inset Formula $\alpha_{i}=1$
\end_inset
if the received symbol and codeword symbol are different, and
\begin_inset Formula $p_{1,i}$
\end_inset
is the fractional power associated with received symbol
\begin_inset Formula $i$
\end_inset
.
Think of the soft distance as made up of two terms: the first is the Hamming
distance between the received word and the codeword, and the second ensures
that if two candidate codewords have the same Hamming distance from the
received word, a smaller soft distance will be assigned to the one where
differences occur in symbols of lower estimated reliability.
\end_layout
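\begin_layout Standard
The two-term reading of Eq.
(
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:soft_distance"
\end_inset
) can be made explicit by expanding the sum, where
\begin_inset Formula $d_{H}$
\end_inset
is notation introduced here for the Hamming distance between the received
word and the candidate codeword:
\begin_inset Formula 
\[
d_{s}=\sum_{i=1}^{n}\alpha_{i}+\sum_{i=1}^{n}\alpha_{i}\, p_{1,i}=d_{H}+\sum_{i=1}^{n}\alpha_{i}\, p_{1,i}.
\]
\end_inset
Because each
\begin_inset Formula $p_{1,i}$
\end_inset
is a fraction of total power and so lies between 0 and 1, it follows that
\begin_inset Formula $d_{H}\le d_{s}\le2d_{H}$
\end_inset
.
\end_layout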
\begin_layout Standard
Technically the FT algorithm is a list decoder, potentially generating a
list of candidate codewords.
Among the list of candidate codewords found by the stochastic search algorithm,
only the one with the smallest soft distance from the received word is
retained.
As with all such algorithms, a stopping criterion is necessary.
FT accepts a codeword unconditionally if its soft distance is smaller than
an empirically determined acceptance threshold,
\begin_inset Formula $d_{a}$
\end_inset
.
A timeout is used to limit the algorithm's execution time if no codewords
within soft distance
\begin_inset Formula $d_{a}$
\end_inset
of the received word are found in a reasonable number of trials.
\end_layout
\begin_layout Paragraph
Algorithm pseudo-code:
\end_layout
\begin_layout Enumerate
For each received symbol, define the erasure probability as 1.3 times the
\emph on
a priori
\emph default
symbol-error probability determined from soft-symbol information
\begin_inset Formula $\{p_{1}\textrm{-rank},\,p_{2}/p_{1}\}$
\end_inset
.
\end_layout
\begin_layout Enumerate
Make independent stochastic decisions about whether to erase each symbol
by using the symbol's erasure probability, allowing a maximum of 51 erasures.
\end_layout
\begin_layout Enumerate
Attempt errors-and-erasures decoding by using the BM algorithm and the set
of erasures determined in step 2.
If the BM decoder is successful go to step 5.
\end_layout
\begin_layout Enumerate
If decoding is not successful, go to step 7.
\end_layout
\begin_layout Enumerate
Calculate the soft distance
\begin_inset Formula $d_{s}$
\end_inset
between the candidate codeword and the received symbols.
Set
\begin_inset Formula $d_{s,min}=d_{s}$
\end_inset
if the soft distance is the smallest one encountered so far.
\end_layout
\begin_layout Enumerate
If
\begin_inset Formula $d_{s,min}\le d_{a}$
\end_inset
, go to step 8.
\end_layout
\begin_layout Enumerate
If the number of trials is less than the maximum allowed number, go to step 2.
Otherwise, declare decoding failure and exit.
\end_layout
\begin_layout Enumerate
A
\begin_inset Quotes eld
\end_inset
best
\begin_inset Quotes erd
\end_inset
codeword with
\begin_inset Formula $d_{s,min}\le d_{a}$
\end_inset
has been found.
Declare a successful decode and return this codeword.
\end_layout
\begin_layout Section
Results and Comparison with KVASD
\end_layout
\begin_layout Standard
Possible figures:
\end_layout
\begin_layout Itemize
histogram of
\begin_inset Formula $s$
\end_inset
(number of erasures) for successful decodes with HF and EME data
\end_layout
\begin_layout Itemize
histogram of
\begin_inset Quotes eld
\end_inset
ntrials
\begin_inset Quotes erd
\end_inset
(or execution time)
\end_layout
\begin_layout Itemize
Number of decodes vs.
ntrials
\end_layout
\begin_layout Itemize
Probability of successful decode vs.
Es/No or S/N in 2500 Hz BW
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\align center
\begin_inset Graphics
filename fig_psuccess.pdf
lyxscale 120
scale 120
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption Standard
\begin_layout Plain Layout
Percentage of JT65 messages successfully decoded as a function of SNR in
2.5 kHz bandwidth.
Results are shown for the hard-decision Berlekamp-Massey (BM) and soft-decision
Franke-Taylor (FT) decoding algorithms.
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\end_layout
\end_inset
\end_layout
\begin_layout Itemize
other...
?
\end_layout
\begin_layout Section
Summary
\end_layout
\begin_layout Bibliography
\begin_inset CommandInset bibitem
LatexCommand bibitem
key "key-1"
\end_inset
\end_layout
\end_body
\end_document