WSJT-X/lib/ftrsd/ftrsd_paper/ftrsd.lyx
Steven Franke 509cb9efd6 Add ftdata-100 and ftdata-10 and update fig_wer2 and associated text.
git-svn-id: svn+ssh://svn.code.sf.net/p/wsjt/wsjt/branches/wsjtx@6322 ab8295b8-cf94-4d9e-aec4-7959e3be5d79
2015-12-28 21:20:10 +00:00


#LyX 2.1 created this file. For more info see http://www.lyx.org/
\lyxformat 474
\begin_document
\begin_header
\textclass paper
\use_default_options true
\maintain_unincluded_children false
\language english
\language_package default
\inputencoding auto
\fontencoding global
\font_roman default
\font_sans default
\font_typewriter default
\font_math auto
\font_default_family default
\use_non_tex_fonts false
\font_sc false
\font_osf false
\font_sf_scale 100
\font_tt_scale 100
\graphics default
\default_output_format default
\output_sync 0
\bibtex_command default
\index_command default
\float_placement H
\paperfontsize 12
\spacing onehalf
\use_hyperref false
\papersize default
\use_geometry true
\use_package amsmath 1
\use_package amssymb 1
\use_package cancel 1
\use_package esint 1
\use_package mathdots 1
\use_package mathtools 1
\use_package mhchem 1
\use_package stackrel 1
\use_package stmaryrd 1
\use_package undertilde 1
\cite_engine basic
\cite_engine_type default
\biblio_style plain
\use_bibtopic false
\use_indices false
\paperorientation portrait
\suppress_date false
\justification true
\use_refstyle 1
\index Index
\shortcut idx
\color #008000
\end_index
\leftmargin 1in
\topmargin 1in
\rightmargin 1in
\bottommargin 1in
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\paragraph_indentation default
\quotes_language english
\papercolumns 1
\papersides 1
\paperpagestyle default
\tracking_changes false
\output_changes false
\html_math_output 0
\html_css_as_file 0
\html_be_strict false
\end_header
\begin_body
\begin_layout Title
Open Source Soft-Decision Decoder for the JT65 (63,12) Reed-Solomon code
\end_layout
\begin_layout Author
Steven J.
Franke, K9AN and Joseph H.
Taylor, K1JT
\end_layout
\begin_layout Standard
\begin_inset CommandInset toc
LatexCommand tableofcontents
\end_inset
\end_layout
\begin_layout Abstract
The JT65 protocol has revolutionized amateur-radio weak-signal communication
by enabling amateur radio operators with small antennas and relatively
low-power transmitters to communicate over propagation paths not usable
with traditional technologies.
A major reason for the success and popularity of JT65 is its use of a strong
error-correction code: a short block-length, low-rate Reed-Solomon code
based on a 64-symbol alphabet.
Since 2004, most programs implementing JT65 have used the patented Koetter-Vardy
 (KV) algebraic soft-decision decoder, licensed to K1JT and implemented
 in a closed-source program for use in amateur radio applications.
We describe here a new open-source alternative called the Franke-Taylor
(FT, or K9AN-K1JT) algorithm.
It is conceptually simple, built around the well-known Berlekamp-Massey
errors-and-erasures algorithm, and in this application it performs even
better than the KV decoder.
\end_layout
\begin_layout Section
\begin_inset CommandInset label
LatexCommand label
name "sec:Introduction-and-Motivation"
\end_inset
Introduction and Motivation
\end_layout
\begin_layout Standard
The Franke-Taylor (FT) decoder is a probabilistic list-decoder that we have
developed for use in the short block-length, low-rate Reed-Solomon code
used in JT65.
JT65 provides a unique sandbox for playing with decoding algorithms.
Several seconds are available for decoding a single 63-symbol message.
This is a long time! The luxury of essentially unlimited time allows us
to experiment with decoders that have high computational complexity.
The payoff is that we can extend the decoding threshold by many dB over
the hard-decision, Berlekamp-Massey decoder on a typical fading channel,
and by a meaningful amount over the KV decoder, long considered to be the
best available soft-decision decoder.
In addition to its excellent performance, the FT algorithm has other desirable
properties, not the least of which is its conceptual simplicity.
Decoding performance and complexity scale in a useful way, providing steadily
increasing soft-decision decoding gain as a tunable computational complexity
parameter is increased over more than 5 orders of magnitude.
This means that appreciable gain should be available from our decoder even
on very simple (and slow) computers.
On the other hand, because the algorithm requires a large number of independent
decoding trials, it should be possible to obtain significant performance
gains through parallelization on high-performance computers.
\end_layout
\begin_layout Section
\begin_inset CommandInset label
LatexCommand label
name "sec:JT65-messages-and"
\end_inset
JT65 Messages and Reed-Solomon Codes
\end_layout
\begin_layout Standard
JT65 message frames consist of a short compressed message encoded for transmission
 with a Reed-Solomon code.
Reed-Solomon codes are block codes characterized by
\begin_inset Formula $n$
\end_inset
, the length of their codewords,
\begin_inset Formula $k$
\end_inset
, the number of message symbols conveyed by the codeword, and the number
of possible values for each symbol in the codewords.
The codeword length and the number of message symbols are specified with
the notation
\begin_inset Formula $(n,k)$
\end_inset
.
JT65 uses a (63,12) Reed-Solomon code with 64 possible values for each
symbol.
Each of the 12 message symbols represents
\begin_inset Formula $\log_{2}64=6$
\end_inset
message bits.
The source-encoded messages conveyed by a 63-symbol JT65 frame thus consist
of 72 information bits.
The JT65 code is systematic, which means that the 12 message symbols are
embedded in the codeword without modification and another 51 parity symbols
derived from the message symbols are added to form a codeword of 63 symbols.
\end_layout
\begin_layout Standard
The concept of Hamming distance is used as a measure of
\begin_inset Quotes eld
\end_inset
distance
\begin_inset Quotes erd
\end_inset
between different codewords, or between a received word and a codeword.
Hamming distance is the number of code symbols that differ in two words
being compared.
Reed-Solomon codes have minimum Hamming distance
\begin_inset Formula $d$
\end_inset
, where
\begin_inset Formula
\begin{equation}
d=n-k+1.\label{eq:minimum_distance}
\end{equation}
\end_inset
The minimum Hamming distance of the JT65 code is
\begin_inset Formula $d=52$
\end_inset
, which means that any particular codeword differs from all other codewords
in at least 52 symbol positions.
\end_layout
\begin_layout Standard
Given a received word containing some incorrect symbols (errors), the received
word can be decoded into the correct codeword using a deterministic, algebraic
algorithm provided that no more than
\begin_inset Formula $t$
\end_inset
symbols were received incorrectly, where
\begin_inset Formula
\begin{equation}
t=\left\lfloor \frac{n-k}{2}\right\rfloor .\label{eq:t}
\end{equation}
\end_inset
For the JT65 code
\begin_inset Formula $t=25$
\end_inset
, so it is always possible to decode a received word having 25 or fewer
symbol errors.
Any one of several well-known algebraic algorithms, such as the widely
used Berlekamp-Massey (BM) algorithm, can carry out the decoding.
Two steps are necessarily involved in this process.
We must (1) determine which symbols were received incorrectly, and (2)
find the correct value of the incorrect symbols.
If we somehow know that certain symbols are incorrect, that information
can be used to reduce the work involved in step 1 and allow step 2 to correct
more than
\begin_inset Formula $t$
\end_inset
errors.
In the unlikely event that the location of every error is known and if
no correct symbols are accidentally labeled as errors, the BM algorithm
can correct up to
\begin_inset Formula $d-1=n-k$
\end_inset
errors.
\end_layout
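The code parameters quoted above are easily checked numerically. The following short Python sketch (ours, purely illustrative) evaluates the two expressions above for the JT65 (63,12) code:

```python
# Parameters of the JT65 (63,12) Reed-Solomon code over GF(64).
n, k = 63, 12

# Minimum Hamming distance of a Reed-Solomon code: d = n - k + 1.
d = n - k + 1

# Errors-only correction radius: t = floor((n - k) / 2).
t = (n - k) // 2

print(d, t)  # 52 25
```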
\begin_layout Standard
The FT algorithm creates lists of symbols suspected of being incorrect and
sends them to the BM decoder.
Symbols flagged in this way are called
\begin_inset Quotes eld
\end_inset
erasures,
\begin_inset Quotes erd
\end_inset
while other incorrect symbols will be called
\begin_inset Quotes eld
\end_inset
errors.
\begin_inset Quotes erd
\end_inset
With perfect erasure information up to 51 incorrect symbols can be corrected
for the JT65 code.
Imperfect erasure information means that some erased symbols may actually
 be correct, while some unerased symbols may be in error.
If
\begin_inset Formula $s$
\end_inset
symbols are erased and the remaining
\begin_inset Formula $n-s$
\end_inset
symbols contain
\begin_inset Formula $e$
\end_inset
errors, the BM algorithm can find the correct codeword as long as
\begin_inset Formula
\begin{equation}
s+2e\le d-1.\label{eq:erasures_and_errors}
\end{equation}
\end_inset
If
\begin_inset Formula $s=0$
\end_inset
, the decoder is said to be an
\begin_inset Quotes eld
\end_inset
errors-only
\begin_inset Quotes erd
\end_inset
decoder.
If
\begin_inset Formula $0<s\le d-1$
\end_inset
, the decoder is called an
\begin_inset Quotes eld
\end_inset
errors-and-erasures
\begin_inset Quotes erd
\end_inset
decoder.
The possibility of doing errors-and-erasures decoding lies at the heart
of the FT algorithm.
On that foundation we have built a capability for using
\begin_inset Quotes eld
\end_inset
soft
\begin_inset Quotes erd
\end_inset
information on the reliability of individual symbols, thereby producing
a soft-decision decoder.
\end_layout
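The errors-and-erasures condition above reduces to a one-line predicate. As an illustrative sketch (the names are ours, not part of any decoder implementation):

```python
N_CODE, K_CODE = 63, 12
D_MIN = N_CODE - K_CODE + 1  # minimum distance, 52 for JT65

def bm_can_decode(s: int, e: int) -> bool:
    """True if errors-and-erasures BM decoding is guaranteed to succeed
    with s erased symbols and e errors among the unerased symbols."""
    return s + 2 * e <= D_MIN - 1

# Errors-only decoding (s = 0) handles up to t = 25 errors;
# with perfect erasure information, up to d - 1 = 51 incorrect symbols.
print(bm_can_decode(0, 25), bm_can_decode(0, 26), bm_can_decode(51, 0))
```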
\begin_layout Section
\begin_inset CommandInset label
LatexCommand label
name "sec:Statistical Framework"
\end_inset
Statistical Framework
\end_layout
\begin_layout Standard
The FT algorithm uses the estimated quality of received symbols to generate
lists of symbols considered likely to be in error, thus enabling decoding
of received words with more than 25 errors using the errors-and-erasures
capability of the BM decoder.
Algorithms of this type are generally called
\begin_inset Quotes eld
\end_inset
reliability based
\begin_inset Quotes erd
\end_inset
or
\begin_inset Quotes eld
\end_inset
probabilistic
\begin_inset Quotes erd
\end_inset
decoding methods
\begin_inset CommandInset citation
LatexCommand cite
after "Chapter 10"
key "key-1"
\end_inset
.
Such algorithms involve some amount of educated guessing about which received
symbols are in error or, alternatively, about which received symbols are
correct.
The guesses are informed by
\begin_inset Quotes eld
\end_inset
soft-symbol
\begin_inset Quotes erd
\end_inset
quality metrics associated with the received symbols.
To illustrate why it is absolutely essential to use such soft-symbol information
 in these algorithms, it helps to consider what would happen if we tried
 to use completely random guesses, ignoring any available soft-symbol information.
\end_layout
\begin_layout Standard
As a specific example, we will consider a received JT65 word with 23 correct
symbols and 40 errors.
We do not know which symbols are in error.
Suppose that the decoder randomly selects
\begin_inset Formula $s=40$
\end_inset
symbols for erasure, leaving 23 unerased symbols.
According to Eq.
(
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:erasures_and_errors"
\end_inset
), the BM decoder can successfully decode this word as long as
\begin_inset Formula $e$
\end_inset
, the number of errors present in the 23 unerased symbols, is 5 or less.
The number of errors captured in the set of 40 erased symbols must therefore
be at least 35.
\end_layout
\begin_layout Standard
The probability of selecting some particular number of incorrect symbols
 in a randomly selected subset of received symbols is governed by the hypergeometric
 probability distribution.
Let us define
\begin_inset Formula $N$
\end_inset
as the number of symbols from which erasures will be selected,
\begin_inset Formula $X$
\end_inset
as the number of incorrect symbols in the set of
\begin_inset Formula $N$
\end_inset
symbols, and
\begin_inset Formula $x$
\end_inset
as the number of errors in the symbols actually erased.
In an ensemble of many received words
\begin_inset Formula $X$
\end_inset
and
\begin_inset Formula $x$
\end_inset
will be random variables but for this example we will assume that
\begin_inset Formula $X$
\end_inset
is known and that only
\begin_inset Formula $x$
\end_inset
is random.
The conditional probability mass function for
\begin_inset Formula $x$
\end_inset
with stated values of
\begin_inset Formula $N$
\end_inset
,
\begin_inset Formula $X$
\end_inset
, and
\begin_inset Formula $s$
\end_inset
may be written as
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
P(x=\epsilon|N,X,s)=\frac{\binom{X}{\epsilon}\binom{N-X}{s-\epsilon}}{\binom{N}{s}}\label{eq:hypergeometric_pdf}
\end{equation}
\end_inset
where
\begin_inset Formula $\binom{n}{k}=\frac{n!}{k!(n-k)!}$
\end_inset
is the binomial coefficient.
The binomial coefficient can be calculated using the function
\begin_inset Quotes eld
\end_inset
nchoosek(
\begin_inset Formula $n,k$
\end_inset
)
\begin_inset Quotes erd
\end_inset
in the interpreted language GNU Octave, or with one of many free online
calculators.
The hypergeometric probability mass function defined in Eq.
(
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:hypergeometric_pdf"
\end_inset
) is available in GNU Octave as function
\begin_inset Quotes eld
\end_inset
hygepdf(
\begin_inset Formula $x,N,X,s$
\end_inset
)
\begin_inset Quotes erd
\end_inset
.
The cumulative probability that at least
\begin_inset Formula $\epsilon$
\end_inset
errors are captured in a subset of
\begin_inset Formula $s$
\end_inset
erased symbols selected from a group of
\begin_inset Formula $N$
\end_inset
symbols containing
\begin_inset Formula $X$
\end_inset
errors is
\begin_inset Formula
\begin{equation}
P(x\ge\epsilon|N,X,s)=\sum_{j=\epsilon}^{s}P(x=j|N,X,s).\label{eq:cumulative_prob}
\end{equation}
\end_inset
\end_layout
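Both the hypergeometric probability mass function and its cumulative tail can be evaluated with exact integer arithmetic. A minimal standard-library Python equivalent of Octave's hygepdf (an illustrative sketch; the argument order follows our notation above):

```python
from math import comb  # exact binomial coefficients, Python 3.8+

def hyge_pmf(x, N, X, s):
    """P(x errors among s symbols erased from N symbols containing X errors)."""
    if x < 0 or x > s or x > X or s - x > N - X:
        return 0.0  # outside the support of the distribution
    return comb(X, x) * comb(N - X, s - x) / comb(N, s)

def hyge_tail(eps, N, X, s):
    """Cumulative probability that at least eps errors are captured."""
    return sum(hyge_pmf(j, N, X, s) for j in range(eps, s + 1))
```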
\begin_layout Paragraph
Example 1:
\end_layout
\begin_layout Standard
Suppose a received word contains
\begin_inset Formula $X=40$
\end_inset
incorrect symbols.
In an attempt to decode using an errors-and-erasures decoder,
\begin_inset Formula $s=40$
\end_inset
symbols are randomly selected for erasure from the full set of
\begin_inset Formula $N=n=63$
\end_inset
symbols.
The probability that
\begin_inset Formula $x=35$
\end_inset
of the erased symbols are actually incorrect is then
\begin_inset Formula
\[
P(x=35)=\frac{\binom{40}{35}\binom{63-40}{40-35}}{\binom{63}{40}}\simeq2.4\times10^{-7}.
\]
\end_inset
Similarly, the probability that
\begin_inset Formula $x=36$
\end_inset
of the erased symbols are incorrect is
\begin_inset Formula
\[
P(x=36)\simeq8.6\times10^{-9}.
\]
\end_inset
Since the probability of erasing 36 errors is so much smaller than that
for erasing 35 errors, we may safely conclude that the probability of randomly
choosing an erasure vector that can decode the received word is approximately
\begin_inset Formula $P(x=35)\simeq2.4\times10^{-7}$
\end_inset
.
The odds of producing a valid codeword on the first try are very poor,
about 1 in 4 million.
\end_layout
\begin_layout Paragraph
Example 2:
\end_layout
\begin_layout Standard
How might we best choose the number of symbols to erase, in order to maximize
the probability of successful decoding? By exhaustive search over all possible
values up to
\begin_inset Formula $s=51$
\end_inset
, it turns out that for
\begin_inset Formula $X=40$
\end_inset
the best strategy is to erase
\begin_inset Formula $s=45$
\end_inset
symbols.
Decoding will then be assured if the set of erased symbols contains at
least 37 errors.
With
\begin_inset Formula $N=63$
\end_inset
,
\begin_inset Formula $X=40$
\end_inset
, and
\begin_inset Formula $s=45$
\end_inset
, the probability of successful decode in a single try is
\begin_inset Formula
\[
P(x\ge37)\simeq1.9\times10^{-6}.
\]
\end_inset
This probability is about 8 times higher than the probability of success
when only 40 symbols were erased.
Nevertheless, the odds of successfully decoding on the first try are still
only about 1 in 500,000.
\end_layout
\begin_layout Paragraph
Example 3:
\end_layout
\begin_layout Standard
Examples 1 and 2 show that a random strategy for selecting symbols to erase
is unlikely to be successful unless we are prepared to wait a long time
for an answer.
So let's modify the strategy to tip the odds in our favor.
Let the received word contain
\begin_inset Formula $X=40$
\end_inset
incorrect symbols, as before, but suppose we know that 10 received symbols
are significantly more reliable than the other 53.
We might therefore protect the 10 most reliable symbols from erasure, selecting
erasures from the smaller set of
\begin_inset Formula $N=53$
\end_inset
less reliable symbols.
If
\begin_inset Formula $s=45$
\end_inset
symbols are chosen randomly for erasure in this way, it is still necessary
for the erased symbols to include at least 37 errors, as in Example 2.
However, the probabilities are now much more favorable: with
\begin_inset Formula $N=53$
\end_inset
,
\begin_inset Formula $X=40$
\end_inset
, and
\begin_inset Formula $s=45$
\end_inset
, Eq.
(
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:hypergeometric_pdf"
\end_inset
) yields
\begin_inset Formula $P(x\ge37)=0.016$
\end_inset
.
Even better odds are obtained by choosing
\begin_inset Formula $s=47$
\end_inset
, which requires
\begin_inset Formula $x\ge38$
\end_inset
.
With
\begin_inset Formula $N=53$
\end_inset
,
\begin_inset Formula $X=40$
\end_inset
, and
\begin_inset Formula $s=47$
\end_inset
,
\begin_inset Formula $P(x\ge38)=0.027$
\end_inset
.
The odds for producing a codeword on the first try are now about 1 in 38.
A few hundred independently randomized tries would be enough to all but
 guarantee production of a valid codeword by the BM decoder.
\end_layout
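The probabilities quoted in Examples 1 through 3 can be reproduced directly from the hypergeometric distribution. A self-contained Python check (illustrative; comments give the approximate values stated in the text):

```python
from math import comb

def tail(eps, N, X, s):
    """P(x >= eps) when s of N symbols are erased and X are in error."""
    return sum(comb(X, j) * comb(N - X, s - j) / comb(N, s)
               for j in range(eps, min(s, X) + 1) if s - j <= N - X)

# Example 1: erase 40 of 63 symbols at random, X = 40 errors, need x = 35.
p1 = comb(40, 35) * comb(23, 5) / comb(63, 40)   # ~2.4e-7
# Example 2: erase s = 45 of 63; decoding requires x >= 37.
p2 = tail(37, 63, 40, 45)                        # ~1.9e-6
# Example 3: protect 10 reliable symbols, erase 45 of the remaining 53.
p3 = tail(37, 53, 40, 45)                        # ~0.016
# Example 3 again with s = 47, which requires x >= 38.
p4 = tail(38, 53, 40, 47)                        # ~0.027
```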
\begin_layout Section
\begin_inset CommandInset label
LatexCommand label
name "sec:The-decoding-algorithm"
\end_inset
The Franke-Taylor decoding algorithm
\end_layout
\begin_layout Standard
Example 3 shows how statistical information about symbol quality should
make it possible to decode received frames having a large number of errors.
In practice the number of errors in the received word is unknown, so we
use a stochastic algorithm to assign high erasure probability to low-quality
symbols and relatively low probability to high-quality symbols.
As illustrated by Example 3, a good choice of erasure probabilities can
increase by many orders of magnitude the chance of producing a codeword.
Note that at this stage we must treat any codeword obtained by errors-and-erasures
 decoding as no more than a
\emph on
candidate
\emph default
.
Our next task is to find a metric that can reliably select one of many
proffered candidates as the codeword actually transmitted.
\end_layout
\begin_layout Standard
The FT algorithm uses quality indices made available by a noncoherent 64-FSK
demodulator.
The demodulator computes the power spectrum
\begin_inset Formula $S(i,j)$
\end_inset
for each signalling interval; for the JT65 protocol
\begin_inset Formula $i=1,\ldots,64$
\end_inset
is the frequency index and
\begin_inset Formula $j=1,\ldots,63$
\end_inset
the symbol index.
The most likely value for symbol
\begin_inset Formula $j$
\end_inset
is taken as the frequency bin with largest signal-plus-noise power over
all values of
\begin_inset Formula $i$
\end_inset
.
The fractions of total power in the two bins containing the largest and
second-largest powers, denoted respectively by
\begin_inset Formula $p_{1}$
\end_inset
and
\begin_inset Formula $p_{2}$
\end_inset
, are passed from demodulator to decoder as soft-symbol information.
The FT decoder derives two metrics from
\begin_inset Formula $p_{1}$
\end_inset
and
\begin_inset Formula $p_{2}$
\end_inset
, namely
\end_layout
\begin_layout Itemize
\begin_inset Formula $p_{1}$
\end_inset
-rank: the rank
\begin_inset Formula $\{1,2,\ldots,63\}$
\end_inset
of the symbol's fractional power
\begin_inset Formula $p_{1,\, j}$
\end_inset
in a sorted list of
\begin_inset Formula $p_{1}$
\end_inset
values.
High ranking symbols have larger signal-to-noise ratio than those with
lower rank.
\end_layout
\begin_layout Itemize
\begin_inset Formula $p_{2}/p_{1}$
\end_inset
: when
\begin_inset Formula $p_{2}/p_{1}$
\end_inset
is not small compared to 1, the most likely symbol value is only slightly
more reliable than the second most likely one.
\end_layout
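As a concrete sketch of how the two metrics could be extracted from the demodulator's power spectra, consider the following Python fragment. The names and array layout are ours for illustration; the actual demodulator lives in the WSJT-X source code:

```python
import numpy as np

def symbol_metrics(S):
    """Given S of shape (64, 63) (power vs. frequency bin i and symbol
    index j), return hard-decision symbol values, p1, p2, and p1-rank.
    Illustrative sketch only, following the notation in the text."""
    hard = S.argmax(axis=0)              # most likely value for each symbol
    total = S.sum(axis=0)                # total power per signalling interval
    top2 = np.sort(S, axis=0)[-2:, :]    # two largest powers per symbol
    p2 = top2[0] / total                 # fraction in second-largest bin
    p1 = top2[1] / total                 # fraction in largest bin
    # p1-rank: 1 = lowest fractional power ... 63 = highest
    rank = np.empty(S.shape[1], dtype=int)
    rank[np.argsort(p1)] = np.arange(1, S.shape[1] + 1)
    return hard, p1, p2, rank
```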
\begin_layout Standard
We use an empirical table of symbol error probabilities derived from a large
dataset of received words that were successfully decoded.
The table provides an estimate of the
\emph on
a priori
\emph default
probability of symbol error based on the
\begin_inset Formula $p_{1}$
\end_inset
-rank and
\begin_inset Formula $p_{2}/p_{1}$
\end_inset
metrics.
These probabilities are close to 1 for low-quality symbols and close to
0 for high-quality symbols.
Recall from Examples 2 and 3 that candidate codewords are produced with
higher probability when
\begin_inset Formula $s>X$
\end_inset
.
Correspondingly, the FT algorithm works best when the probability of erasing
a symbol is somewhat larger than the probability that the symbol is incorrect.
We found empirically that good decoding performance is obtained when the
symbol erasure probability is about 1.3 times the symbol error probability.
\end_layout
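The factor-of-1.3 scaling must, of course, be capped at probability one. A one-line sketch (the empirical probability table itself is not reproduced here):

```python
def erasure_probability(p_error: float) -> float:
    """Map an a-priori symbol-error probability to an erasure probability.
    The factor 1.3 is the empirically determined value quoted in the text."""
    return min(1.3 * p_error, 1.0)
```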
\begin_layout Standard
The FT algorithm tries successively to decode the received word using independent
\begin_inset Quotes eld
\end_inset
educated guesses
\begin_inset Quotes erd
\end_inset
to select symbols for erasure.
For each iteration a stochastic erasure vector is generated based on the
symbol erasure probabilities.
The erasure vector is sent to the BM decoder along with the full set of
63 hard-decision symbol values.
When the BM decoder finds a candidate codeword it is assigned a quality
metric
\begin_inset Formula $d_{s}$
\end_inset
, the soft distance between the received word and the codeword:
\begin_inset Formula
\begin{equation}
d_{s}=\sum_{j=1}^{n}\alpha_{j}\,(1+p_{1,j}).\label{eq:soft_distance}
\end{equation}
\end_inset
Here
\begin_inset Formula $\alpha_{j}=0$
\end_inset
if received symbol
\begin_inset Formula $j$
\end_inset
is the same as the corresponding symbol in the codeword,
\begin_inset Formula $\alpha_{j}=1$
\end_inset
if the received symbol and codeword symbol are different, and
\begin_inset Formula $p_{1,j}$
\end_inset
is the fractional power associated with received symbol
\begin_inset Formula $j$
\end_inset
.
Think of the soft distance as made up of two terms: the first is the Hamming
distance between the received word and the codeword, and the second ensures
that if two candidate codewords have the same Hamming distance from the
received word, a smaller soft distance will be assigned to the one where
differences occur in symbols of lower estimated reliability.
\end_layout
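The soft-distance expression above translates directly into code. An illustrative sketch in Python:

```python
def soft_distance(received, codeword, p1):
    """Soft distance between a received word and a candidate codeword.
    p1[j] is the fractional power associated with received symbol j.
    Each differing symbol contributes 1 (Hamming term) plus p1[j]."""
    return sum(1 + p1[j] for j in range(len(received))
               if received[j] != codeword[j])
```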
\begin_layout Standard
In practice we find that
\begin_inset Formula $d_{s}$
\end_inset
can reliably identify the correct codeword if the signal-to-noise ratio
for individual symbols is greater than about 4 in power units, or
\begin_inset Formula $E_{s}/N_{0}\apprge6$
\end_inset
dB.
We also find that weaker signals frequently can be decoded by using soft-symbol
information beyond that contained in
\begin_inset Formula $p_{1}$
\end_inset
and
\begin_inset Formula $p_{2}$
\end_inset
.
To this end we define an additional metric
\begin_inset Formula $u$
\end_inset
, the average signal-plus-noise power over all symbols, evaluated at the
 candidate codeword's symbol values:
\end_layout
\begin_layout Standard
\begin_inset Formula
\[
u=\frac{1}{n}\sum_{j=1}^{n}S(c_{j},\, j).
\]
\end_inset
Here the
\begin_inset Formula $c_{j}$
\end_inset
's are the symbol values for the candidate codeword being tested.
\end_layout
\begin_layout Standard
The correct JT65 codeword produces a value for
\begin_inset Formula $u$
\end_inset
equal to the average of
\begin_inset Formula $n=63$
\end_inset
bins containing both signal and noise power.
Incorrect codewords have at most
\begin_inset Formula $k-1=11$
\end_inset
such bins and at least
\begin_inset Formula $n-k+1=52$
\end_inset
bins containing noise only.
Thus, if the spectral array
\begin_inset Formula $S(i,\, j)$
\end_inset
has been normalized so that its median value (essentially the average noise
level) is unity, the correct codeword is expected to yield the metric value
\end_layout
\begin_layout Standard
\begin_inset Formula
\[
u=(1\pm n^{-\frac{1}{2}})(1+y)\approx(1.0\pm0.13)(1+y),
\]
\end_inset
where
\begin_inset Formula $y$
\end_inset
is the signal-to-noise ratio (in linear power units) and the quoted one-standard-deviation
 uncertainty range assumes Gaussian statistics.
Incorrect codewords will yield metric values no larger than
\end_layout
\begin_layout Standard
\begin_inset Formula
\[
u=\frac{n-k+1\pm\sqrt{n-k+1}}{n}+\frac{k-1\pm\sqrt{k-1}}{n}(1+y).
\]
\end_inset
For JT65 this expression evaluates to
\end_layout
\begin_layout Standard
\begin_inset Formula
\[
u\approx1\pm0.11+(0.17\pm0.05)\, y.
\]
\end_inset
As a specific example, consider signal strength
\begin_inset Formula $y=4$
\end_inset
, corresponding to
\begin_inset Formula $E_{s}/N_{0}=6$
\end_inset
dB.
For JT65, the corresponding SNR in 2500 Hz bandwidth is
\begin_inset Formula $-23.7$
\end_inset
dB.
The correct codeword is then expected to yield
\begin_inset Formula $u\approx5.0\pm0.6$
\end_inset
, while incorrect codewords will give
\begin_inset Formula $u\approx1.7\pm0.3$
\end_inset
or less.
We find that a threshold set at
\begin_inset Formula $u_{0}=4.4$
\end_inset
(about 8 standard deviations above the expected maximum for incorrect codewords)
 reliably serves to distinguish correct codewords from all other candidates,
while ensuring a very small probability of false decodes.
\end_layout
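The metric and acceptance test can be sketched as follows, assuming as in the text that the spectral array has been normalized to unit median noise level (the array layout and names are ours):

```python
import numpy as np

U0 = 4.4  # acceptance threshold from the text

def u_metric(S, codeword):
    """Average signal-plus-noise power in the tone bins selected by a
    candidate codeword's symbol values; S has shape (64, 63)."""
    n = S.shape[1]
    return S[codeword, np.arange(n)].mean()

def accept(S, codeword):
    """Unconditional acceptance test for a candidate codeword."""
    return u_metric(S, codeword) > U0
```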
\begin_layout Standard
Technically the FT algorithm is a list decoder.
Among the list of candidate codewords found by the stochastic search algorithm,
only the one with the largest
\begin_inset Formula $u$
\end_inset
is retained.
As with all such algorithms, a stopping criterion is necessary.
FT accepts a codeword unconditionally if
\begin_inset Formula $u>u_{0}$
\end_inset
.
A timeout is used to limit the algorithm's execution time if no acceptable
codeword is found in a reasonable number of trials,
\begin_inset Formula $T$
\end_inset
.
Today's personal computers are fast enough that
\begin_inset Formula $T$
\end_inset
can be set as large as
\begin_inset Formula $10^{5},$
\end_inset
or even higher.
\end_layout
\begin_layout Paragraph
Algorithm pseudo-code:
\end_layout
\begin_layout Enumerate
For each received symbol, define the erasure probability as 1.3 times the
\emph on
a priori
\emph default
symbol-error probability determined from soft-symbol information
\begin_inset Formula $\{p_{1}\textrm{-rank},\, p_{2}/p_{1}\}$
\end_inset
.
\end_layout
\begin_layout Enumerate
Make independent stochastic decisions about whether to erase each symbol
by using the symbol's erasure probability, allowing a maximum of 51 erasures.
\end_layout
\begin_layout Enumerate
Attempt errors-and-erasures decoding by using the BM algorithm and the set
of erasures determined in step 2.
If the BM decoder produces a candidate codeword, go to step 5.
\begin_inset Foot
status open
\begin_layout Plain Layout
Our implementation of the FT-algorithm is based on the excellent open-source
BM decoder written by Phil Karn, KA9Q.
\end_layout
\end_inset
\end_layout
\begin_layout Enumerate
If BM decoding was not successful, go to step 2.
\end_layout
\begin_layout Enumerate
Calculate the hard-decision Hamming distance between the candidate codeword
and the received symbols, the corresponding soft distance
\begin_inset Formula $d_{s}$
\end_inset
, and the quality metric
\begin_inset Formula $u$
\end_inset
.
If
\begin_inset Formula $u$
\end_inset
is the largest one encountered so far, set
\begin_inset Formula $u_{max}=u$
\end_inset
.
\end_layout
\begin_layout Enumerate
If
\begin_inset Formula $u_{max}>u_{0}$
\end_inset
, go to step 8.
\end_layout
\begin_layout Enumerate
If the number of trials is less than the timeout limit
\begin_inset Formula $T,$
\end_inset
go to 2.
Otherwise, declare decoding failure and exit.
\end_layout
\begin_layout Enumerate
An acceptable codeword with
\begin_inset Formula $u_{max}>u_{0}$
\end_inset
has been found.
Declare a successful decode and return this codeword.
\end_layout
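The eight steps above can be sketched compactly in Python. The errors-and-erasures BM decoder is stubbed out as a callable argument (a real implementation, such as Phil Karn's, would be substituted), and the way the 51-erasure cap is enforced here is merely one simple choice; all names are ours:

```python
import random

U0 = 4.4           # acceptance threshold for the u metric
MAX_ERASURES = 51  # d - 1 for the (63,12) code

def ft_decode(received, erase_prob, S, bm_decode, T=10**5):
    """Stochastic FT search (illustrative sketch).
    erase_prob[j] is the erasure probability for symbol j (step 1);
    bm_decode(received, erasures) returns a candidate codeword or None;
    S[i][j] is the normalized power spectrum.
    Returns (codeword, u) on success or None on timeout."""
    best, u_max = None, -1.0
    for _ in range(T):                                   # step 7: timeout
        # Step 2: independent stochastic erasure decisions.
        erasures = [j for j, p in enumerate(erase_prob)
                    if random.random() < p]
        if len(erasures) > MAX_ERASURES:
            erasures = erasures[:MAX_ERASURES]           # enforce the cap
        # Step 3: errors-and-erasures BM attempt.
        candidate = bm_decode(received, erasures)
        if candidate is None:
            continue                                     # step 4: retry
        # Step 5: quality metric u for this candidate.
        n = len(candidate)
        u = sum(S[candidate[j]][j] for j in range(n)) / n
        if u > u_max:
            best, u_max = candidate, u
        if u_max > U0:                                   # steps 6 and 8
            return best, u_max
    return None                                          # decoding failure
```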
\begin_layout Standard
The inspiration for the FT decoding algorithm came from a number of sources,
particularly references
\begin_inset CommandInset citation
LatexCommand cite
key "key-2"
\end_inset
and
\begin_inset CommandInset citation
LatexCommand cite
key "key-3"
\end_inset
and the textbook by Lin and Costello
\begin_inset CommandInset citation
LatexCommand cite
key "key-1"
\end_inset
.
After developing this algorithm, we became aware that our approach is conceptually
 similar to a
\begin_inset Quotes eld
\end_inset
stochastic erasures-only list decoding algorithm
\begin_inset Quotes erd
\end_inset
, described in reference
\begin_inset CommandInset citation
LatexCommand cite
key "key-4"
\end_inset
.
The algorithm in
\begin_inset CommandInset citation
LatexCommand cite
key "key-4"
\end_inset
is applied to higher-rate Reed-Solomon codes on a binary-input channel
over which BPSK-modulated symbols are transmitted.
Our 64-ary input channel with 64-FSK modulation required us to develop
our own unique methods for assigning erasure probabilities and for defining
 an acceptance criterion to select the best codeword from the list of candidates.
\end_layout
\begin_layout Section
\begin_inset CommandInset label
LatexCommand label
name "sec:Hinted-Decoding"
\end_inset
Hinted Decoding
\end_layout
\begin_layout Standard
To be written...
\end_layout
\begin_layout Section
\begin_inset CommandInset label
LatexCommand label
name "sec:Implementation-in-WSJT-X"
\end_inset
Implementation in WSJT-X
\end_layout
\begin_layout Standard
To be written...
\end_layout
\begin_layout Section
\begin_inset CommandInset label
LatexCommand label
name "sec:Theory,-Simulation,-and"
\end_inset
Decoder Performance Evaluation
\end_layout
\begin_layout Subsection
Simulated results on the AWGN channel
\end_layout
\begin_layout Standard
Comparisons of decoding performance are usually presented in the professional
literature as plots of word error-rate versus
\begin_inset Formula $E_{b}/N_{0}$
\end_inset
, the ratio of the energy collected per information bit to the one-sided
noise power spectral density,
\begin_inset Formula $N_{0}$
\end_inset
.
In amateur radio circles performance is usually plotted as the probability
of successfully decoding a received word vs signal-to-noise ratio in a
2.5 kHz reference bandwidth,
\begin_inset Formula $\mathrm{SNR}{}_{2.5\,\mathrm{kHz}}$
\end_inset
.
The relationship between
\begin_inset Formula $E_{b}/N_{0}$
\end_inset
and
\begin_inset Formula $\mathrm{SNR}{}_{2.5\,\mathrm{kHz}}$
\end_inset
is described in Appendix
\begin_inset CommandInset ref
LatexCommand ref
reference "sec:Appendix:SNR"
\end_inset
.
\end_layout
\begin_layout Standard
Results of simulations using the BM, FT, and KV decoding algorithms on the
JT65 (63,12) code are presented in terms of word error-rate vs
\begin_inset Formula $E_{b}/N_{0}$
\end_inset
in Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:bodide"
\end_inset
.
For these tests we generated at least 1000 signals at each signal-to-noise
ratio, assuming the additive white Gaussian noise (AWGN) channel, and processed
the data using each algorithm.
For word error-rates less than 0.1 it was necessary to process 10,000 or
even 100,000 simulated signals in order to capture enough errors to make
the estimates of word-error-rate statistically meaningful.
As a test of the fidelity of our numerical simulations, Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:bodide"
\end_inset
also shows theoretical results (filled squares) for comparison with the
BM results.
The simulated BM results agree with theory to within about 0.1 dB.
This small difference is caused by occasional errors in the estimates of
time- and frequency-offset of the received signal in the simulations.
Such
\begin_inset Quotes eld
\end_inset
sync losses
\begin_inset Quotes erd
\end_inset
are not accounted for in the idealized theoretical results.
\end_layout
\begin_layout Standard
As expected, the soft-decision algorithms, FT and KV, are about 2 dB better
than the hard-decision BM algorithm.
In addition, FT has a slight edge (about 0.2 dB) over KV.
On the other hand, the execution time for FT with
\begin_inset Formula $T=10^{5}$
\end_inset
is longer than the execution time for the KV algorithm.
Nevertheless, the execution time required for the FT algorithm with
\begin_inset Formula $T=10^{5}$
\end_inset
is small enough to be practical on most computers.
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\align center
\begin_inset Graphics
filename fig_bodide.pdf
\end_inset
\begin_inset Caption Standard
\begin_layout Plain Layout
\begin_inset CommandInset label
LatexCommand label
name "fig:bodide"
\end_inset
Word error rates as a function of
\begin_inset Formula $E_{b}/N_{o},$
\end_inset
the signal-to-noise ratio per bit.
The single curve marked with filled squares shows a theoretical prediction
for the BM decoder.
Open squares illustrate simulation results for an AWGN channel with the
BM, FT (
\begin_inset Formula $T=10^{5}$
\end_inset
) and KV decoders used in program
\emph on
WSJT-X
\emph default
.
The KV results are for decoding complexity coefficient
\begin_inset Formula $\lambda=15$
\end_inset
, the most aggressive setting that has historically been used in earlier
versions of the WSJT programs.
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
Because of the importance of error-free transmission in commercial applications,
plots like that in Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:bodide"
\end_inset
often extend downward to much smaller error rates, say
\begin_inset Formula $10^{-6}$
\end_inset
or less.
The circumstances for minimal amateur-radio QSOs are very different, however.
Error rates of order 0.1 or higher may be acceptable.
In this case the essential information is better presented in a plot showing
the percentage of transmissions copied correctly as a function of signal-to-noise ratio.
Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:WER2"
\end_inset
shows the FT results for
\begin_inset Formula $T=10^{5}$
\end_inset
and the KV results that were shown in Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:bodide"
\end_inset
in this format along with additional FT results for
\begin_inset Formula $T=10^{4},10^{3},10^{2}$
\end_inset
and
\begin_inset Formula $10^{1}$
\end_inset
.
The KV results are plotted with open triangles.
It is apparent that the FT decoder produces more decodes than KV when
\begin_inset Formula $T=10^{4}$
\end_inset
or larger.
It also provides a very significant gain over the hard-decision BM decoder
even when limited to at most 10 trials.
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\align center
\begin_inset Graphics
filename fig_wer2.pdf
lyxscale 120
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption Standard
\begin_layout Plain Layout
\begin_inset CommandInset label
LatexCommand label
name "fig:WER2"
\end_inset
Percent of JT65 messages copied as a function of SNR in 2.5 kHz bandwidth.
Solid lines with filled circles show results from the FT decoder with
\begin_inset Formula $T=10^{5},10^{4},10^{3},10^{2}$
\end_inset
and
\begin_inset Formula $10$
\end_inset
, respectively, from left to right.
The dashed line with open triangles is the KV decoder with complexity coefficient
\begin_inset Formula $\lambda=15$
\end_inset
.
Results from the BM algorithm are also shown with filled triangles.
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
The timeout parameter
\begin_inset Formula $T$
\end_inset
employed in the FT algorithm is the maximum number of symbol-erasure trials
allowed for a particular attempt at decoding a received word.
Most successful decodes take only a small fraction of the maximum allowed
number of trials.
Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:N_vs_X"
\end_inset
shows the number of stochastic erasure trials required to find the correct
codeword versus the number of hard-decision errors in the received word
for a run with 1000 simulated transmissions at
\begin_inset Formula $\mathrm{SNR}=-24$
\end_inset
dB, just slightly above the decoding threshold.
The timeout parameter was
\begin_inset Formula $T=10^{5}$
\end_inset
for this run.
No points are shown for
\begin_inset Formula $X\le25$
\end_inset
because all such words were successfully decoded by the BM algorithm.
Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:N_vs_X"
\end_inset
shows that the FT algorithm decoded received words with as many as
\begin_inset Formula $X=43$
\end_inset
symbol errors.
The results also show that, on average, the number of trials increases
with the number of errors in the received word, and that the variability
of the decoding time grows dramatically as well.
These results provide insight into the mean and variance of the execution
time of the FT algorithm, since execution time is roughly proportional
to the number of required trials.
\end_layout
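The relationship between trial count and error count can be illustrated with a toy Python sketch. This is not the WSJT-X implementation: the algebraic errors-and-erasures Reed-Solomon decoder is replaced here by its guaranteed-correction condition, namely that decoding succeeds when twice the number of unerased symbol errors plus the number of erasures does not exceed $n-k=51$. The erasure probabilities (0.85 for unreliable symbols, 0.1 for reliable ones) are illustrative assumptions, not the values used by the FT algorithm.

```python
import random

# Toy model of stochastic erasure-trial decoding for the JT65 (63,12) code.
# The algebraic errors-and-erasures decoder is replaced by its guaranteed
# success condition: with s erasures and e errors among the unerased
# symbols, an (n,k) RS decoder succeeds when 2*e + s <= n - k.

N, K = 63, 12  # JT65 (63,12) code parameters

def trials_to_decode(error_pos, p_erase, T, rng):
    """Return the number of trials used, or None if no success within T.

    error_pos: set of symbol indices received incorrectly
    p_erase:   per-symbol erasure probability (high for unreliable symbols)
    """
    for trial in range(1, T + 1):
        erased = {i for i in range(N) if rng.random() < p_erase[i]}
        e = len(error_pos - erased)      # errors that survive erasure
        s = len(erased)
        if 2 * e + s <= N - K:           # errors-and-erasures bound
            return trial
    return None

rng = random.Random(1)
X = 35                                   # hard-decision symbol errors
error_pos = set(rng.sample(range(N), X))
# Assume the soft symbol metrics flag errored symbols as unreliable most
# of the time, so they are erased with high probability.
p_erase = [0.85 if i in error_pos else 0.1 for i in range(N)]
print(trials_to_decode(error_pos, p_erase, 10**5, rng))
```

With 35 errors the hard-decision bound ($2\times35>51$) is far exceeded, yet biased erasure sampling typically finds a decodable pattern in a handful of trials, consistent with the qualitative behavior in the figure above.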
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\align center
\begin_inset Graphics
filename fig_ntrials_vs_nhard.pdf
lyxscale 120
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption Standard
\begin_layout Plain Layout
\begin_inset CommandInset label
LatexCommand label
name "fig:N_vs_X"
\end_inset
Number of trials needed to decode a received word versus Hamming distance
between the received word and the decoded codeword, for 1000 simulated
frames on an AWGN channel with no fading.
The SNR in 2500 Hz bandwidth is -24 dB (
\begin_inset Formula $E_{b}/N_{o}=5.1$
\end_inset
dB).
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Subsection
Simulated results for hinted decoding and Rayleigh fading
\end_layout
\begin_layout Standard
Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:Psuccess"
\end_inset
presents the results of simulations for signal-to-noise ratios ranging
from
\begin_inset Formula $-18$
\end_inset
to
\begin_inset Formula $-30$
\end_inset
dB, again using 1000 simulated signals for each plotted point.
We include three curves for each decoding algorithm: one for the AWGN channel
and no fading, and two more for simulated Doppler spreads of 0.2 and 1.0
Hz.
For reference, we note that the JT65 symbol rate is about 2.69 Hz.
The simulated Doppler spreads are comparable to those encountered on HF
ionospheric paths and for EME at VHF and lower UHF bands.
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\align center
\begin_inset Graphics
filename fig_psuccess.pdf
lyxscale 90
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption Standard
\begin_layout Plain Layout
\begin_inset CommandInset label
LatexCommand label
name "fig:Psuccess"
\end_inset
Percentage of JT65 messages successfully decoded as a function of SNR in
2500 Hz bandwidth.
Results are shown for the hard-decision Berlekamp-Massey (BM) and soft-decision
Franke-Taylor (FT) decoding algorithms.
Curves labeled DS correspond to the hinted-decode (
\begin_inset Quotes eld
\end_inset
Deep Search
\begin_inset Quotes erd
\end_inset
) matched-filter algorithm.
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Section
Summary
\end_layout
\begin_layout Standard
...
Still to come ...
\end_layout
\begin_layout Bibliography
\begin_inset CommandInset bibitem
LatexCommand bibitem
label "1"
key "key-1"
\end_inset
Error Control Coding, 2nd edition, Shu Lin and Daniel J.
Costello, Pearson-Prentice Hall, 2004.
\end_layout
\begin_layout Bibliography
\begin_inset CommandInset bibitem
LatexCommand bibitem
label "2"
key "key-2"
\end_inset
"Stochastic Chase Decoding of Reed-Solomon Codes", Camille Leroux, Saied
Hemati, Shie Mannor, Warren J.
Gross, IEEE Communications Letters, Vol.
14, No.
9, September 2010.
\end_layout
\begin_layout Bibliography
\begin_inset CommandInset bibitem
LatexCommand bibitem
label "3"
key "key-3"
\end_inset
"Soft-Decision Decoding of Reed-Solomon Codes Using Successive Error-and-Erasure
Decoding," Soo-Woong Lee and B.
V.
K.
Vijaya Kumar, IEEE
\begin_inset Quotes eld
\end_inset
GLOBECOM
\begin_inset Quotes erd
\end_inset
2008 proceedings.
\end_layout
\begin_layout Bibliography
\begin_inset CommandInset bibitem
LatexCommand bibitem
label "4"
key "key-4"
\end_inset
\begin_inset Quotes eld
\end_inset
Stochastic Erasure-Only List Decoding Algorithms for Reed-Solomon Codes,
\begin_inset Quotes erd
\end_inset
Chang-Ming Lee and Yu T.
Su, IEEE Signal Processing Letters, Vol.
16, No.
8, August 2009.
\end_layout
\begin_layout Bibliography
\begin_inset CommandInset bibitem
LatexCommand bibitem
label "5"
key "key-5"
\end_inset
“Algebraic soft-decision decoding of Reed-Solomon codes,” R.
Koetter and A.
Vardy, IEEE Trans.
Inform.
Theory, Vol.
49, Nov.
2003.
\end_layout
\begin_layout Bibliography
\begin_inset CommandInset bibitem
LatexCommand bibitem
label "6"
key "key-6"
\end_inset
Berlekamp-Massey decoder written by Phil Karn, http://www.ka9q.net/code/fec/
\end_layout
\begin_layout Section
\start_of_appendix
\begin_inset CommandInset label
LatexCommand label
name "sec:Appendix:SNR"
\end_inset
Appendix: Signal-to-Noise Ratios
\end_layout
\begin_layout Standard
The signal-to-noise ratio in a bandwidth,
\begin_inset Formula $B$
\end_inset
, that is at least as large as the bandwidth occupied by the signal is:
\begin_inset Formula
\begin{equation}
\mathrm{SNR}_{B}=\frac{P_{s}}{N_{o}B}\label{eq:SNR}
\end{equation}
\end_inset
where
\begin_inset Formula $P_{s}$
\end_inset
is the signal power (W),
\begin_inset Formula $N_{o}$
\end_inset
is one-sided noise power spectral density (W/Hz), and
\begin_inset Formula $B$
\end_inset
is the bandwidth in Hz.
In amateur radio applications, digital modes are often compared based on
the SNR defined in a 2.5 kHz reference bandwidth,
\begin_inset Formula $\mathrm{SNR}_{2.5\,\mathrm{kHz}}$
\end_inset
.
\end_layout
\begin_layout Standard
In the professional literature, decoder performance is characterized in
terms of
\begin_inset Formula $E_{b}/N_{o}$
\end_inset
, the ratio of the energy collected per information bit,
\begin_inset Formula $E_{b}$
\end_inset
, to the one-sided noise power spectral density,
\begin_inset Formula $N_{o}$
\end_inset
.
Denote the duration of a channel symbol by
\begin_inset Formula $\tau_{s}$
\end_inset
(for JT65,
\begin_inset Formula $\tau_{s}=0.3715\,\mathrm{s}$
\end_inset
).
Signal power is related to the energy per symbol by
\begin_inset Formula
\begin{equation}
P_{s}=E_{s}/\tau_{s}.\label{eq:signal_power}
\end{equation}
\end_inset
The total energy in a received JT65 message consisting of
\begin_inset Formula $n=63$
\end_inset
channel symbols is
\begin_inset Formula $63E_{s}$
\end_inset
.
The energy collected for each of the 72 bits of information conveyed by
the message is then
\begin_inset Formula
\begin{equation}
E_{b}=\frac{63E_{s}}{72}=0.875E_{s}.\label{eq:Eb_Es}
\end{equation}
\end_inset
Using equations (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:SNR"
\end_inset
)-(
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:Eb_Es"
\end_inset
),
\begin_inset Formula $\mathrm{SNR}_{2.5\,\mathrm{kHz}}$
\end_inset
can be written in terms of
\begin_inset Formula $E_{b}/N_{o}$
\end_inset
:
\begin_inset Formula
\[
\mathrm{SNR}_{2.5\,\mathrm{kHz}}=1.23\times10^{-3}\frac{E_{b}}{N_{o}}.
\]
\end_inset
If all quantities are expressed in dB, then:
\end_layout
\begin_layout Standard
\begin_inset Formula
\[
\mathrm{SNR}_{2.5\,\mathrm{kHz}}=(E_{b}/N_{o})_{\mathrm{dB}}-29.1\,\mathrm{dB}.
\]
\end_inset
\end_layout
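The conversion derived above can be checked numerically. The following sketch uses only the JT65 parameters stated in the text (63 channel symbols, 72 information bits, symbol duration 0.3715 s, 2500 Hz reference bandwidth):

```python
import math

# Numerical check of the SNR_2.5kHz vs Eb/No conversion for JT65.
n, info_bits = 63, 72     # channel symbols per message, information bits
tau_s = 0.3715            # channel symbol duration, seconds
B = 2500.0                # reference bandwidth, Hz

# E_b = (63/72) E_s,  P_s = E_s / tau_s,  SNR_B = P_s / (N_o B)
# => SNR_B = (72/63) * (E_b/N_o) / (tau_s * B)
factor = info_bits / (n * tau_s * B)
offset_db = 10 * math.log10(factor)

print(factor)                 # ~1.23e-3, as in the text
print(round(offset_db, 1))    # -29.1 dB offset
```

The computed linear factor and dB offset agree with the values $1.23\times10^{-3}$ and $-29.1$ dB quoted above.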
\end_body
\end_document