WSJT-X/lib/ftrsd/ftrsd_paper/ftrsd.lyx
Steven Franke 5c08d03c0b Tweaks to accommodate overcite citation style.
git-svn-id: svn+ssh://svn.code.sf.net/p/wsjt/wsjt/branches/wsjtx@6448 ab8295b8-cf94-4d9e-aec4-7959e3be5d79
2016-01-28 12:06:33 +00:00


#LyX 2.1 created this file. For more info see http://www.lyx.org/
\lyxformat 474
\begin_document
\begin_header
\textclass paper
\begin_preamble
\usepackage{ragged2e}
\exhyphenpenalty=10000\hyphenpenalty=10000
\fancyhf{}
\fancyhead[L]{Franke \& Taylor: {\it Open Source Soft-Decision Decoder \ldots}}
\fancyhead[R]{\thepage}
\makeatletter
\let\ps@plain\ps@fancy % Plain page style = fancy page style
\makeatother
\usepackage{nomencl}
\usepackage{overcite}
\renewcommand{\nomname}{Sidebar: Glossary of Specialized Terms}
\end_preamble
\use_default_options true
\begin_modules
boxedfloat
\end_modules
\maintain_unincluded_children false
\language english
\language_package default
\inputencoding auto
\fontencoding global
\font_roman lmodern
\font_sans lmss
\font_typewriter lmtt
\font_math auto
\font_default_family default
\use_non_tex_fonts false
\font_sc false
\font_osf false
\font_sf_scale 100
\font_tt_scale 100
\graphics default
\default_output_format default
\output_sync 0
\bibtex_command default
\index_command default
\float_placement H
\paperfontsize 12
\spacing onehalf
\use_hyperref false
\papersize default
\use_geometry true
\use_package amsmath 1
\use_package amssymb 1
\use_package cancel 1
\use_package esint 1
\use_package mathdots 1
\use_package mathtools 1
\use_package mhchem 1
\use_package stackrel 1
\use_package stmaryrd 1
\use_package undertilde 1
\cite_engine basic
\cite_engine_type default
\biblio_style plain
\use_bibtopic false
\use_indices false
\paperorientation portrait
\suppress_date false
\justification false
\use_refstyle 1
\index Index
\shortcut idx
\color #008000
\end_index
\leftmargin 1in
\topmargin 1in
\rightmargin 1in
\bottommargin 1in
\secnumdepth 3
\tocdepth 3
\paragraph_separation skip
\defskip bigskip
\quotes_language english
\papercolumns 1
\papersides 1
\paperpagestyle fancy
\tracking_changes false
\output_changes false
\html_math_output 0
\html_css_as_file 0
\html_be_strict false
\end_header
\begin_body
\begin_layout Title
Open Source Soft-Decision Decoder for the JT65 (63,12) Reed-Solomon Code
\end_layout
\begin_layout SubTitle
\emph on
Under-the-hood description of the JT65 decoding procedure, including a wholly
new algorithm for its powerful error-correcting code.
\end_layout
\begin_layout Author
Steven J.
Franke, K9AN and Joseph H.
Taylor, K1JT
\end_layout
\begin_layout Section
\begin_inset CommandInset label
LatexCommand label
name "sec:Introduction-and-Motivation"
\end_inset
Background and Motivation
\end_layout
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
RaggedRight
\end_layout
\end_inset
The JT65 protocol has revolutionized amateur-radio weak-signal communication
by enabling operators with small or compromise antennas and relatively
low-power transmitters to communicate over propagation paths not usable
with traditional technologies.
The protocol was developed in 2003 for Earth-Moon-Earth (EME, or
\begin_inset Quotes eld
\end_inset
moonbounce
\begin_inset Quotes erd
\end_inset
) communication
\begin_inset CommandInset citation
LatexCommand cite
key "jt65_protocol"
\end_inset
, where the scattered return signals are always weak.
It was soon found that JT65 also enables worldwide communication on the
HF bands with low power, modest antennas, and efficient spectral usage.
Thousands of amateurs now use JT65 on a regular basis, making contacts
on all bands from 160 meters through microwaves.
\end_layout
\begin_layout Standard
JT65 uses timed transmitting and receiving sequences one minute long.
Messages are short and structured so as to streamline minimal exchanges
between two amateur operators over potentially difficult radio paths.
Most messages contain two callsigns and a grid locator, signal report,
acknowledgment, or sign-off; one of the tokens CQ, QRZ, or DE may be substituted
for the first callsign.
Alternatively, a message may contain up to 13 Latin characters of arbitrary
text.
All messages are efficiently compressed into exactly 72 bits of digital
information.
The JT65 protocol is intended for the basic purpose
of completing legitimate, documented two-way contacts, not for extended
conversations.
Full details of the message structure and encoding procedure were presented
in an earlier publication
\begin_inset CommandInset citation
LatexCommand cite
key "jt65_protocol"
\end_inset
.
For a concise description of the overall process of transmitting and receiving
a JT65 message, see the accompanying sidebar
\series bold
JT65 Message Processing
\series default
.
\end_layout
\begin_layout Standard
A major reason for the success and popularity of JT65 is its use of a strong
error-correction code.
Before transmission, each 72-bit message is divided into 12 six-bit
\emph on
symbols
\begin_inset CommandInset nomenclature
LatexCommand nomenclature
symbol "{\\bf Symbol: }"
description "The information carried in one signalling interval, usually an integral number of bits. JT65 uses 6-bit symbols."
\end_inset
\emph default
and augmented with 51 additional symbols of error-correcting information.
These 51
\emph on
parity symbols
\emph default
are computed according to information-theory rules that maximize the probability
of correctly decoding the message, even if many symbols are received incorrectly.
The JT65 code is properly described as a short block-length, low-rate Reed-Solomon
code based on a 64-symbol
\emph on
alphabet.
\emph default
\begin_inset CommandInset nomenclature
LatexCommand nomenclature
symbol "{\\bf Alphabet: }"
description "A sequence of possible symbol values used for signaling. JT65 uses a 64-character alphabet, values in the range 0 to 63."
\end_inset
Characters in this alphabet are mapped onto 64 different frequencies for
transmission.
\end_layout
\begin_layout Standard
Reed-Solomon codes are widely used to ensure reliability in data transmission
and storage.
In hardware implementations, decoding is generally accomplished with a
procedure such as the Berlekamp-Massey (BM) algorithm, based on
\emph on
hard decisions
\emph default
\begin_inset CommandInset nomenclature
LatexCommand nomenclature
symbol "{\\bf Hard decision: }"
description "Received symbols are assigned definite values by the demodulator."
\end_inset
for each of the symbol values received.
\emph on
Soft decisions
\begin_inset CommandInset nomenclature
LatexCommand nomenclature
symbol "{\\bf Soft decision: }"
description "Received symbols are assigned tentative values (most probable, second most probable, etc.) and quality indicators."
\end_inset
\emph default
are potentially more powerful, however.
For each received JT65 symbol we can estimate not only the value most likely
to be correct, but also the second, third, etc., most likely.
Most importantly, we can also estimate the probability that each of those
possible values is the correct one.
Decoders that make use of such information are called
\emph on
soft-decision decoders.
\end_layout
\begin_layout Standard
Until now, nearly all programs implementing JT65 have used the patented
Kötter-Vardy (KV) algebraic soft-decision decoder
\begin_inset CommandInset citation
LatexCommand cite
key "kv2001"
\end_inset
, licensed to and implemented by K1JT as a closed-source executable for
use only in amateur radio applications.
Since 2001 the KV decoder has been considered the best available soft-decision
decoder for Reed-Solomon codes.
\end_layout
\begin_layout Standard
We describe here a new open-source alternative called the Franke-Taylor
(FT, or K9AN-K1JT) soft-decision decoding algorithm.
It is conceptually simple, built on top of the BM hard-decision decoder,
and in this application it performs even better than the KV decoder.
The FT algorithm is implemented in the popular programs
\emph on
WSJT
\emph default
,
\emph on
MAP65
\emph default
, and
\emph on
WSJT-X
\emph default
, widely used for amateur weak-signal communication using JT65 and other
specialized digital protocols.
These programs are open-source, freely available
\begin_inset CommandInset citation
LatexCommand cite
key "wsjt"
\end_inset
, and licensed under the GNU General Public License.
\end_layout
\begin_layout Standard
The JT65 protocol specifies transmissions that start one second into a UTC
minute and last for 46.8 seconds.
Receiving software therefore has as much as ten seconds to decode a message
before the start of the next minute, when the operator will send a reply.
With today's personal computers, this relatively long time encourages
experimentation with decoders of high computational complexity.
With time to spare, the FT algorithm lowers the decoding threshold on a
typical fading channel by many dB over the hard-decision BM decoder, and
by a meaningful amount over the KV decoder.
In addition to its excellent performance, the new algorithm has other desirable
properties, not least of which is its conceptual simplicity.
Decoding performance and computational complexity scale in a convenient
way, providing steadily increasing soft-decision decoding gain as a tunable
parameter is increased over more than five orders of magnitude.
Appreciable gain is available from our decoder even on very simple (and
relatively slow) computers.
On the other hand, because the algorithm benefits from a large number of
independent decoding trials, further performance gains should be achievable
through parallelization on high-performance computers.
\end_layout
\begin_layout Standard
The remainder of this paper is organized as follows.
Section 2 presents a brief overview of the nature of Reed-Solomon codes
and their error-correcting capabilities.
Section 3 provides statistical motivation for the FT algorithm, and Section
4 describes the algorithm in full detail.
Material in these two sections is important because it documents our approach
and underlines its fundamental technical contributions.
These sections are heavier in formal mathematics than is common in
\emph on
QEX
\emph default
; for this reason, some readers may choose to skip or skim them and proceed
more quickly to the results.
Most readers will benefit by reviewing the original paper on the JT65 protocol
\begin_inset CommandInset citation
LatexCommand cite
key "jt65_protocol"
\end_inset
.
A procedure for
\emph on
hinted decoding
\emph default
--- determining which one, if any, of a list of likely messages matches
the one that was received --- is outlined in Section 5.
Finally, in Section 6 we present performance measurements of the FT and
hinted decoding algorithms and make explicit comparisons to the BM and
KV decoders familiar to users of older versions of
\emph on
WSJT
\emph default
,
\emph on
MAP65
\emph default
and
\emph on
WSJT-X
\emph default
.
Section 7 summarizes some on-the-air experiences with the new decoder.
Refer to the sidebar
\series bold
Glossary of Specialized Terms
\series default
for brief definitions of some potentially unfamiliar language.
\end_layout
\begin_layout Section
\begin_inset CommandInset label
LatexCommand label
name "sec:JT65-messages-and"
\end_inset
JT65 Messages and Reed-Solomon Codes
\end_layout
\begin_layout Standard
The JT65 message frame consists of a short, compressed 72-bit message encoded
for transmission with a Reed-Solomon code.
Reed-Solomon codes are
\emph on
block codes
\emph default
\begin_inset CommandInset nomenclature
LatexCommand nomenclature
symbol "{\\bf Block code: }"
description "An error-correcting code that treats data in blocks of fixed size."
\end_inset
characterized by
\begin_inset Formula $n$
\end_inset
, the length of their
\emph on
codewords
\emph default
;
\begin_inset CommandInset nomenclature
LatexCommand nomenclature
symbol "{\\bf Codeword:}"
description "For the JT65 code, a vector of 63 symbol values each in the range 0 to 63."
\end_inset
\begin_inset Formula $k$
\end_inset
, the number of message symbols conveyed by the codeword; and the transmission
alphabet, or number of possible values for each symbol in a codeword.
The codeword length and the number of message symbols are specified with
the notation
\begin_inset Formula $(n,k)$
\end_inset
.
JT65 uses a (63,12) Reed-Solomon code with an alphabet of 64 possible values
for each symbol.
Each of the 12 message symbols represents
\begin_inset Formula $\log_{2}64=6$
\end_inset
message bits.
The source-encoded
\begin_inset CommandInset nomenclature
LatexCommand nomenclature
symbol "{\\bf Source encoding: }"
description "Compression of a message to use a minimum number of bits. JT65 source-encodes all messages to 72 bits."
\end_inset
message conveyed by a 63-symbol JT65 frame thus consists of 72 information
bits.
The JT65 code is
\emph on
systematic
\emph default
, which means that the 12 message symbols are embedded in the codeword without
modification and another 51 parity symbols derived from the message symbols
are added to form a codeword of 63 symbols.
\end_layout
\begin_layout Standard
In coding theory the concept of
\emph on
Hamming distance
\emph default
\begin_inset CommandInset nomenclature
LatexCommand nomenclature
symbol "{\\bf Hamming distance: }"
description "The Hamming distance between two codewords, or between a received word and a codeword, is equal to the number of symbol positions in which they differ."
\end_inset
is used as a measure of disagreement between different codewords, or between
a received word
\begin_inset CommandInset nomenclature
LatexCommand nomenclature
symbol "{\\bf Received word: }"
description "A vector of symbol values, possibly accompanied by soft information on individual reliabilities."
\end_inset
and a codeword.
Hamming distance is the number of code symbols that differ in two words
being compared.
Reed-Solomon codes have guaranteed minimum Hamming distance
\begin_inset Formula $d$
\end_inset
, where
\begin_inset Formula
\begin{equation}
d=n-k+1.\label{eq:minimum_distance}
\end{equation}
\end_inset
With
\begin_inset Formula $n=63$
\end_inset
and
\begin_inset Formula $k=12$
\end_inset
the minimum Hamming distance of the JT65 code is
\begin_inset Formula $d=52$
\end_inset
.
With 72 information bits in each message, JT65 can transmit any one of
\begin_inset Formula $2^{72}\approx4.7\times10^{21}$
\end_inset
possible messages.
The codeword for any message differs from every other codeword in at least
52 of the 63 symbol positions.
\end_layout
\begin_layout Standard
A received word containing some
\emph on
errors
\emph default
(incorrect symbols) can be decoded into the correct codeword using a deterministic,
\begin_inset CommandInset nomenclature
LatexCommand nomenclature
symbol "{\\bf Deterministic algorithm: }"
description "A series of computational steps that for the same input always produces the same output."
\end_inset
algebraic algorithm provided that no more than
\begin_inset Formula $t$
\end_inset
symbols were received incorrectly, where
\begin_inset Formula
\begin{equation}
t=\left\lfloor \frac{n-k}{2}\right\rfloor .\label{eq:t}
\end{equation}
\end_inset
For the JT65 code
\begin_inset Formula $t=25$
\end_inset
, so it is always possible to decode a received word having 25 or fewer
symbol errors.
Any one of several well-known algebraic algorithms, such as the BM algorithm,
can carry out this hard-decision decoding.
Two steps are necessarily involved in this process.
We must (1) determine which symbols were received incorrectly, and (2)
find the correct value of the incorrect symbols.
If we somehow know that certain symbols are incorrect, that information
can be used to reduce the work involved in step 1 and allow step 2 to correct
more than
\begin_inset Formula $t$
\end_inset
errors.
In the unlikely event that the location of every error is known, and if
no correct symbols are accidentally labeled as errors, the BM algorithm
can correct up to
\begin_inset Formula $d-1=n-k$
\end_inset
errors.
\end_layout
\begin_layout Standard
The FT algorithm creates lists of symbols suspected of being incorrect and
sends them to the BM decoder.
Symbols flagged in this way are called
\emph on
erasures
\emph default
\begin_inset CommandInset nomenclature
LatexCommand nomenclature
symbol "{\\bf Erasure: }"
description "A received symbol may be ``erased'' when confidence in its value is so low that it is unlikely to provide useful information. "
\end_inset
.
With perfect erasure information up to
\begin_inset Formula $n-k=51$
\end_inset
incorrect symbols can be corrected for the JT65 code.
Imperfect erasure information means that some erased symbols may be correct,
and some unerased symbols may be in error.
If
\begin_inset Formula $s$
\end_inset
symbols are erased and the remaining
\begin_inset Formula $n-s$
\end_inset
symbols contain
\begin_inset Formula $e$
\end_inset
errors, the BM algorithm can find the correct codeword as long as
\begin_inset Formula
\begin{equation}
s+2e\le d-1.\label{eq:erasures_and_errors}
\end{equation}
\end_inset
If
\begin_inset Formula $s=0$
\end_inset
, the decoder is said to be an
\emph on
errors-only
\emph default
decoder.
If
\begin_inset Formula $0<s\le d-1$
\end_inset
, the decoder is called an
\emph on
errors-and-erasures
\emph default
decoder.
The possibility of doing errors-and-erasures decoding lies at the heart
of the FT algorithm.
On that foundation we have built a capability for using soft information
on the reliability of individual symbol values, thereby producing a
soft-decision decoder.
\end_layout
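\begin_layout Standard
The code parameters and the errors-and-erasures condition above can be checked
with a few lines of Python.
This is only an illustrative sketch: the function names are our own assumptions
and do not correspond to routines in any JT65 program.
\end_layout

```python
# Illustrative helpers for the JT65 (63,12) Reed-Solomon code parameters.
# All names here are hypothetical; they only restate the formulas in the text.

N_SYMBOLS = 63  # codeword length n
K_SYMBOLS = 12  # number of message symbols k


def min_distance(n=N_SYMBOLS, k=K_SYMBOLS):
    """Guaranteed minimum Hamming distance, d = n - k + 1."""
    return n - k + 1


def max_errors_only(n=N_SYMBOLS, k=K_SYMBOLS):
    """Largest number of errors correctable with no erasures, t = floor((n-k)/2)."""
    return (n - k) // 2


def bm_can_decode(s, e, n=N_SYMBOLS, k=K_SYMBOLS):
    """True when s erasures and e errors among the unerased symbols
    satisfy the condition s + 2e <= d - 1."""
    return s + 2 * e <= min_distance(n, k) - 1
```

\begin_layout Standard
For example, bm_can_decode(40, 5) is true while bm_can_decode(40, 6) is false,
which is the limit used in the random-erasure example of the next section.
\end_layout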
\begin_layout Section
\begin_inset CommandInset label
LatexCommand label
name "sec:Statistical Framework"
\end_inset
Statistical Framework
\end_layout
\begin_layout Standard
The FT algorithm uses the estimated quality of received symbols to generate
lists of symbols considered likely to be in error, thus enabling decoding
of received words with more than 25 errors.
Algorithms of this type are generally called
\emph on
reliability-based
\emph default
or
\emph on
probabilistic
\emph default
decoding methods
\begin_inset CommandInset citation
LatexCommand cite
key "lc2004"
\end_inset
.
Such algorithms involve some amount of educated guessing about which received
symbols are in error or, alternatively, about which received symbols are
correct.
The guesses are informed by quality metrics associated with the received
symbols.
To illustrate why it is essential to use such soft-symbol information
in these algorithms, it helps to consider what would happen if we tried
to use completely random guesses, ignoring any available soft-symbol information.
\end_layout
\begin_layout Standard
As a specific example, consider a received JT65 word with 23 correct symbols
and 40 errors.
We do not know which symbols are in error.
Suppose that the decoder randomly selects
\begin_inset Formula $s=40$
\end_inset
symbols for erasure, leaving 23 unerased symbols.
According to Eq.
(
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:erasures_and_errors"
\end_inset
), the BM decoder can successfully decode this word as long as
\begin_inset Formula $e$
\end_inset
, the number of errors present in the 23 unerased symbols, is 5 or less.
The number of errors captured in the set of 40 erased symbols must therefore
be at least 35.
\end_layout
\begin_layout Standard
The probability of selecting some particular number of incorrect symbols
in a randomly selected subset of received symbols is governed by the
hypergeometric probability distribution.
Let us define
\begin_inset Formula $N$
\end_inset
as the number of symbols from which erasures will be selected,
\begin_inset Formula $X$
\end_inset
as the number of incorrect symbols in the set of
\begin_inset Formula $N$
\end_inset
symbols, and
\begin_inset Formula $x$
\end_inset
as the number of errors in the symbols actually erased.
In an ensemble of many received words
\begin_inset Formula $X$
\end_inset
and
\begin_inset Formula $x$
\end_inset
will be random variables, but for this example we will assume that
\begin_inset Formula $X$
\end_inset
is known and that only
\begin_inset Formula $x$
\end_inset
is random.
The conditional probability mass function for
\begin_inset Formula $x$
\end_inset
with stated values of
\begin_inset Formula $N$
\end_inset
,
\begin_inset Formula $X$
\end_inset
, and
\begin_inset Formula $s$
\end_inset
may be written as
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
P(x=\epsilon|N,X,s)=\frac{\binom{X}{\epsilon}\binom{N-X}{s-\epsilon}}{\binom{N}{s}}\label{eq:hypergeometric_pdf}
\end{equation}
\end_inset
where
\begin_inset Formula $\binom{n}{k}=\frac{n!}{k!(n-k)!}$
\end_inset
is the binomial coefficient.
The binomial coefficient can be calculated using the function
\family typewriter
nchoosek(n,k)
\family default
in the numerical computing language
\emph on
GNU Octave
\emph default
, or with one of many free online calculators.
The hypergeometric probability mass function defined in Eq.
(
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:hypergeometric_pdf"
\end_inset
) is available in
\emph on
GNU Octave
\emph default
as function
\family typewriter
hygepdf(x,N,X,s)
\family default
.
The cumulative probability that at least
\begin_inset Formula $\epsilon$
\end_inset
errors are captured in a subset of
\begin_inset Formula $s$
\end_inset
erased symbols selected from a group of
\begin_inset Formula $N$
\end_inset
symbols containing
\begin_inset Formula $X$
\end_inset
errors is
\begin_inset Formula
\begin{equation}
P(x\ge\epsilon|N,X,s)=\sum_{j=\epsilon}^{s}P(x=j|N,X,s).\label{eq:cumulative_prob}
\end{equation}
\end_inset
\end_layout
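\begin_layout Standard
For readers without Octave, the probability mass function and the cumulative
probability defined above can be evaluated with Python's standard library.
The sketch below is a minimal illustration; the names hypergeom_pmf and
hypergeom_tail are our own assumptions, and the argument order simply mirrors
the Octave call hygepdf(x,N,X,s).
\end_layout

```python
from math import comb


def hypergeom_pmf(x, N, X, s):
    """P(x errors among s erased symbols | N candidate symbols, X of them wrong)."""
    # Outcomes that cannot occur have probability zero.
    if x < 0 or x > s or x > X or s - x > N - X:
        return 0.0
    return comb(X, x) * comb(N - X, s - x) / comb(N, s)


def hypergeom_tail(eps, N, X, s):
    """P(at least eps errors are captured among the s erased symbols)."""
    return sum(hypergeom_pmf(j, N, X, s) for j in range(max(eps, 0), s + 1))
```

\begin_layout Standard
Calling hypergeom_pmf(35, 63, 40, 40) evaluates the case of 40 random erasures
in a word containing 40 errors, and hypergeom_tail(37, 63, 40, 45) the case
of 45 random erasures.
\end_layout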
\begin_layout Paragraph
Example 1:
\end_layout
\begin_layout Standard
Suppose a received word contains
\begin_inset Formula $X=40$
\end_inset
incorrect symbols.
In an attempt to decode using an errors-and-erasures decoder,
\begin_inset Formula $s=40$
\end_inset
symbols are randomly selected for erasure from the full set of
\begin_inset Formula $N=n=63$
\end_inset
symbols.
The probability that
\begin_inset Formula $x=35$
\end_inset
of the erased symbols are actually incorrect is then
\begin_inset Formula
\[
P(x=35)=\frac{\binom{40}{35}\binom{63-40}{40-35}}{\binom{63}{40}}\simeq2.4\times10^{-7}.
\]
\end_inset
Similarly, the probability that
\begin_inset Formula $x=36$
\end_inset
of the erased symbols are incorrect is
\begin_inset Formula
\[
P(x=36)\simeq8.6\times10^{-9}.
\]
\end_inset
Since the probability of erasing 36 errors is so much smaller than that
for erasing 35 errors, we may safely conclude that the probability of randomly
choosing an erasure vector that can decode the received word is approximately
\begin_inset Formula $P(x=35)\simeq2.4\times10^{-7}$
\end_inset
.
The odds of producing a valid codeword on the first try are very poor,
about 1 in 4 million.
\end_layout
\begin_layout Paragraph
Example 2:
\end_layout
\begin_layout Standard
How might we best choose the number of symbols to erase, in order to maximize
the probability of successful decoding? By exhaustive search over all possible
values up to
\begin_inset Formula $s=51$
\end_inset
, it turns out that for
\begin_inset Formula $X=40$
\end_inset
the best strategy is to erase
\begin_inset Formula $s=45$
\end_inset
symbols.
According to Eq.
(
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:erasures_and_errors"
\end_inset
), with
\begin_inset Formula $s=45$
\end_inset
and
\begin_inset Formula $d=52$
\end_inset
then
\begin_inset Formula $e$
\end_inset
must be 3 or less.
Decoding will be assured if the set of erased symbols contains at least
\begin_inset Formula $40-3=37$
\end_inset
errors.
With
\begin_inset Formula $N=63$
\end_inset
,
\begin_inset Formula $X=40$
\end_inset
, and
\begin_inset Formula $s=45$
\end_inset
, the probability of a successful decode in a single try is
\begin_inset Formula
\[
P(x\ge37)\simeq1.9\times10^{-6}.
\]
\end_inset
This probability is about 8 times higher than the probability of success
when only 40 symbols were erased.
Nevertheless, the odds of successfully decoding on the first try are still
only about 1 in 500,000.
\end_layout
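\begin_layout Standard
The exhaustive search mentioned in Example 2 is easy to reproduce.
The following Python sketch, with function names of our own choosing, computes
the single-try success probability for every erasure count and picks the
largest.
It assumes, as in the example, that erasures are drawn uniformly from all
63 symbols and that the number of errors is known.
\end_layout

```python
from math import comb


def success_probability(s, X, n=63, k=12):
    """Chance that s uniformly random erasures let the BM decoder succeed
    on a received word containing X symbol errors."""
    e_max = (n - k - s) // 2          # largest e allowed by s + 2e <= n - k
    if e_max < 0:
        return 0.0                    # more erasures than the code can absorb
    need = max(X - e_max, 0)          # errors that must fall among the erasures
    hits = sum(comb(X, x) * comb(n - X, s - x)
               for x in range(need, min(s, X) + 1))
    return hits / comb(n, s)


def best_erasure_count(X, n=63, k=12):
    """Erasure count s in 0..n-k that maximizes the single-try success probability."""
    return max(range(n - k + 1), key=lambda s: success_probability(s, X, n, k))
```

\begin_layout Standard
With X = 40 the call best_erasure_count(40) returns 45, in agreement with
the value quoted in the example.
\end_layout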
\begin_layout Paragraph
Example 3:
\end_layout
\begin_layout Standard
Examples 1 and 2 show that a random strategy for selecting symbols to erase
is unlikely to be successful unless we are prepared to wait a long time
for an answer.
So let's modify the strategy to tip the odds in our favor.
Let the received word contain
\begin_inset Formula $X=40$
\end_inset
incorrect symbols, as before, but suppose we know that 10 received symbols
are significantly more reliable than the other 53.
We might therefore protect the 10 most reliable symbols and select erasures
from the smaller set of
\begin_inset Formula $N=53$
\end_inset
less reliable ones.
If
\begin_inset Formula $s=45$
\end_inset
symbols are chosen randomly for erasure in this way, it is still necessary
for the erased symbols to include at least 37 errors, as in Example 2.
However, the probabilities are now much more favorable: with
\begin_inset Formula $N=53$
\end_inset
,
\begin_inset Formula $X=40$
\end_inset
, and
\begin_inset Formula $s=45$
\end_inset
, Eq.
(
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:cumulative_prob"
\end_inset
) yields
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
linebreak
\end_layout
\end_inset
\begin_inset Formula $\mbox{\ensuremath{P(x\geq37)=0.016.}}$
\end_inset
Even better odds are obtained by choosing
\begin_inset Formula $s=47$
\end_inset
, which requires
\begin_inset Formula $x\ge38$
\end_inset
.
With
\begin_inset Formula $N=53$
\end_inset
,
\begin_inset Formula $X=40$
\end_inset
, and
\begin_inset Formula $s=47$
\end_inset
,
\begin_inset Formula $P(x\ge38)=0.027$
\end_inset
.
The odds for producing a codeword on the first try are now about 1 in 38.
A few hundred independently randomized tries would be enough to all but
guarantee production of a valid codeword by the BM decoder.
\end_layout
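\begin_layout Standard
The claim about a few hundred tries can be made concrete with a short
calculation.
The Python sketch below uses hypothetical function names and assumes that
successive tries are statistically independent, that the 10 protected symbols
are all correct, and that the number of errors is known; none of these
assumptions holds exactly for a real decoder.
\end_layout

```python
from math import ceil, comb, log


def tail_probability(eps, N, X, s):
    """P(at least eps of the X errors fall among s erasures drawn from N symbols)."""
    hits = sum(comb(X, x) * comb(N - X, s - x)
               for x in range(max(eps, 0), min(s, X) + 1))
    return hits / comb(N, s)


def tries_for_confidence(p_single, confidence):
    """Smallest number of independent tries T with 1 - (1 - p)^T >= confidence."""
    return ceil(log(1.0 - confidence) / log(1.0 - p_single))


# Example 3: erasures drawn only from the 53 less reliable symbols.
p_45 = tail_probability(37, 53, 40, 45)   # s = 45 requires x >= 37
p_47 = tail_probability(38, 53, 40, 47)   # s = 47 requires x >= 38
tries = tries_for_confidence(p_47, 0.999)
```

\begin_layout Standard
Under these assumptions the number of tries needed for a 99.9 percent chance
of producing a codeword comes out between 200 and 300.
\end_layout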
\begin_layout Section
\begin_inset CommandInset label
LatexCommand label
name "sec:The-decoding-algorithm"
\end_inset
The Franke-Taylor Decoding Algorithm
\end_layout
\begin_layout Standard
Example 3 shows how statistical information about symbol quality should
make it possible to decode received frames having a large number of errors.
In practice the number of errors in the received word is unknown, so our
algorithm simply assigns a high erasure probability to low-quality symbols
and relatively low probability to high-quality symbols.
As illustrated by Example 3, a good choice of erasure probabilities can
increase the chance of producing a codeword by many orders of magnitude.
Once erasure probabilities have been assigned to each of the 63 received
symbols, the FT algorithm uses a random number generator to decide whether
or not to erase each symbol, according to its assigned erasure probability.
The list of erased symbols is then submitted to the BM decoder, which produces
either a codeword or a flag indicating failure to decode.
\end_layout
\begin_layout Standard
The process of selecting the list of symbols to erase and calling the BM
decoder comprises one cycle of the FT algorithm.
The next cycle proceeds with a new selection of erased symbols.
At this stage we must treat any codeword obtained by errors-and-erasures
decoding as no more than a
\emph on
candidate
\emph default
.
Our next task is to find a metric that can reliably select one of many
proffered candidates as the codeword that was actually transmitted.
\end_layout
\begin_layout Standard
The FT algorithm uses quality indices made available by a noncoherent 64-FSK
demodulator (see the sidebar
\series bold
JT65 Message Processing
\series default
).
The demodulator computes binned power spectra for each signaling interval;
the result is a two-dimensional array
\begin_inset Formula $S(i,j)$
\end_inset
, where the frequency index
\begin_inset Formula $i$
\end_inset
assumes values 0 to 63 and the symbol index
\begin_inset Formula $j$
\end_inset
has values 1 to 63.
The most likely value for each symbol is taken as the frequency bin with
largest signal-plus-noise power over all values of
\begin_inset Formula $i$
\end_inset
.
The fractions of total power in the two bins containing the largest and
second-largest powers, denoted respectively by
\begin_inset Formula $p_{1}$
\end_inset
and
\begin_inset Formula $p_{2}$
\end_inset
, are computed for each symbol and passed from demodulator to decoder as
soft-symbol information.
The FT decoder derives two metrics from
\begin_inset Formula $p_{1}$
\end_inset
and
\begin_inset Formula $p_{2}$
\end_inset
, namely
\begin_inset Formula $p_{1}$
\end_inset
-rank (the rank
\begin_inset Formula $\{1,2,\ldots,63\}$
\end_inset
of the symbol's fractional power
\begin_inset Formula $p_{1,\, j}$
\end_inset
in a sorted list of
\begin_inset Formula $p_{1}$
\end_inset
values) and the ratio
\begin_inset Formula $p_{2}/p_{1}$
\end_inset
.
High-ranking symbols have larger signal-to-noise ratio than those with
lower rank.
When
\begin_inset Formula $p_{2}/p_{1}$
\end_inset
is close to 1, the most likely symbol value is only slightly more reliable
than the second most likely one.
\end_layout
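\begin_layout Standard
As a rough illustration of how these quantities follow from the spectral
array, the Python sketch below derives the hard-decision symbol values,
the fractional powers, the rank, and the ratio from a nested list S[i][j].
The function name, the zero-based indexing, and the convention that rank
1 denotes the symbol with the largest fractional power are our own assumptions,
not details taken from the actual demodulator.
\end_layout

```python
def soft_symbol_info(S):
    """Derive soft-symbol metrics from S[i][j], the power in frequency bin i
    (0..63) during symbol interval j (zero-based here)."""
    n_bins, n_sym = len(S), len(S[0])
    hard, p1, p2 = [], [], []
    for j in range(n_sym):
        column = [S[i][j] for i in range(n_bins)]
        total = sum(column)
        # Bin indices ordered from strongest to weakest for this symbol.
        order = sorted(range(n_bins), key=lambda i: column[i], reverse=True)
        hard.append(order[0])                 # most likely symbol value
        p1.append(column[order[0]] / total)   # fraction of power in best bin
        p2.append(column[order[1]] / total)   # fraction in second-best bin
    # Assumed convention: rank 1 is the symbol with the largest p1.
    by_strength = sorted(range(n_sym), key=lambda j: p1[j], reverse=True)
    rank = [0] * n_sym
    for position, j in enumerate(by_strength, start=1):
        rank[j] = position
    ratio = [second / first for first, second in zip(p1, p2)]
    return hard, p1, p2, rank, ratio
```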
\begin_layout Standard
We use 3-bit quantization of the metrics
\begin_inset Formula $p_{1}$
\end_inset
-rank and
\begin_inset Formula $p_{2}/p_{1}$
\end_inset
to index the entries in an
\begin_inset Formula $8\times8$
\end_inset
table of symbol error probabilities.
The probabilities were derived empirically from a large data set of received
words that were successfully decoded.
The table provides an estimate of the
\emph on
a priori
\emph default
probability of symbol error based on the metrics
\begin_inset Formula $p_{1}$
\end_inset
-rank and
\begin_inset Formula $p_{2}/p_{1}$
\end_inset
.
This table is a key element of the algorithm, as it determines which symbols
are effectively protected from erasure.
The
\emph on
a priori
\emph default
symbol error probabilities are close to 1 for low-quality symbols and close
to 0 for high-quality symbols.
Recall from Examples 2 and 3 that candidate codewords are produced with
higher probability when the number of erased symbols is larger than the
number of incorrect symbols.
Correspondingly, the FT algorithm works best when the probability of erasing
a symbol is somewhat larger than the probability that the symbol is incorrect.
For the JT65 code we found empirically that good decoding performance is
obtained when the symbol erasure probability is about 1.3 times the symbol
error probability.
\end_layout
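\begin_layout Standard
The step from symbol error probabilities to a random erasure pattern can
be sketched in Python as follows.
The function name and the list p_error are hypothetical: in the real decoder
the error probabilities come from the empirical table described above, whose
entries are not reproduced here.
The cap at 1 is our own assumption, needed because 1.3 times a probability
can exceed unity.
\end_layout

```python
import random

ERASURE_SCALE = 1.3  # erasure probability / error probability, as reported in the text


def draw_erasures(p_error, rng, scale=ERASURE_SCALE):
    """Return one stochastic erasure vector: 1 means 'erase this symbol'.

    p_error holds an estimated error probability for each received symbol;
    rng is a random.Random instance so that results are reproducible.
    """
    flags = []
    for p in p_error:
        p_erase = min(1.0, scale * p)   # assumed cap at probability 1
        flags.append(1 if rng.random() < p_erase else 0)
    return flags
```

\begin_layout Standard
Each call with the same probabilities produces a different erasure vector,
which is what allows repeated, independent decoding attempts.
\end_layout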
\begin_layout Standard
The FT algorithm tries successively to decode the received word using
independent educated guesses to select symbols for erasure.
For each iteration a stochastic erasure vector is generated based on the
symbol erasure probabilities.
The erasure vector is sent to the BM decoder along with the full set of
63 hard-decision symbol values.
When the BM decoder finds a candidate codeword it is assigned a quality
metric
\begin_inset Formula $d_{s}$
\end_inset
, the
\emph on
soft distance
\begin_inset CommandInset nomenclature
LatexCommand nomenclature
symbol "{\\bf Soft distance: }"
description "The soft distance between a received word and a codeword is a measure of how greatly they differ, taking into account available soft information on symbol values."
\end_inset
\emph default
between the received word and the codeword:
\begin_inset Formula
\begin{equation}
d_{s}=\sum_{j=1}^{n}\alpha_{j}\,(1+p_{1,\, j}).\label{eq:soft_distance}
\end{equation}
\end_inset
Here
\begin_inset Formula $\alpha_{j}=0$
\end_inset
if received symbol
\begin_inset Formula $j$
\end_inset
is the same as the corresponding symbol in the codeword,
\begin_inset Formula $\alpha_{j}=1$
\end_inset
if the received symbol and codeword symbol are different, and
\begin_inset Formula $p_{1,\, j}$
\end_inset
is the fractional power associated with received symbol
\begin_inset Formula $j$
\end_inset
.
Think of the soft distance as made up of two terms: the first is the Hamming
distance between the received word and the codeword, and the second ensures
that if two candidate codewords have the same Hamming distance from the
received word, a smaller soft distance will be assigned to the one where
differences occur in symbols of lower estimated reliability.
\end_layout
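\begin_layout Standard
As a purely illustrative example, with values chosen for convenience rather
than taken from measurements, note that Equation
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:soft_distance"
\end_inset
can be rewritten as
\begin_inset Formula $d_{s}=X+\sum_{j}\alpha_{j}\, p_{1,\, j}$
\end_inset
, where
\begin_inset Formula $X=\sum_{j}\alpha_{j}$
\end_inset
is the Hamming distance.
Suppose hypothetical candidates A and B each differ from the received word
in
\begin_inset Formula $X=2$
\end_inset
symbols, with
\begin_inset Formula $p_{1}$
\end_inset
values of 0.2 and 0.3 at the positions where A differs, and 0.6 and 0.7 at
the positions where B differs.
Then
\begin_inset Formula 
\[
d_{s}(\mathrm{A})=2+(0.2+0.3)=2.5,\qquad d_{s}(\mathrm{B})=2+(0.6+0.7)=3.3,
\]
\end_inset
so candidate A receives the smaller soft distance: its disagreements fall
on symbols with lower fractional power, which this metric treats as less
reliable.
\end_layout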
\begin_layout Standard
In practice we find that
\begin_inset Formula $d_{s}$
\end_inset
can reliably identify the correct codeword if the signal-to-noise ratio
for individual symbols is greater than about 4 in linear power units (about
6 dB).
We also find that significantly weaker signals can be decoded by using
soft-symbol information beyond that contained in
\begin_inset Formula $p_{1}$
\end_inset
and
\begin_inset Formula $p_{2}$
\end_inset
.
To this end we define an additional metric
\begin_inset Formula $u$
\end_inset
, the average signal-plus-noise power in all received symbols according
to a candidate codeword's symbol values:
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
u=\frac{1}{n}\sum_{j=1}^{n}S(c_{j},\, j).\label{eq:u-metric}
\end{equation}
\end_inset
Here the
\begin_inset Formula $c_{j}$
\end_inset
's are the symbol values for the candidate codeword being tested.
\end_layout
\begin_layout Standard
The correct JT65 codeword produces a value for
\begin_inset Formula $u$
\end_inset
equal to the average of
\begin_inset Formula $n=63$
\end_inset
bins containing both signal and noise power.
Incorrect codewords have at most
\begin_inset Formula $k-1=11$
\end_inset
such bins and at least
\begin_inset Formula $n-k+1=52$
\end_inset
bins containing noise only.
Thus, if the spectral array
\begin_inset Formula $S(i,\, j)$
\end_inset
has been normalized so that the average value of the noise-only bins is
unity,
\begin_inset Formula $u$
\end_inset
for the correct codeword has expectation value (average over many random
realizations) given by
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
\bar{u}_{c}=1+y,\label{eq:u1-exp}
\end{equation}
\end_inset
where
\begin_inset Formula $y$
\end_inset
is the signal-to-noise ratio in linear power units.
If we assume Gaussian statistics and a large number of trials, the standard
deviation of measured values of
\begin_inset Formula $u$
\end_inset
for the correct codeword is
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
\sigma_{c}=\left(\frac{1+2y}{n}\right)^{1/2}.\label{eq:sigma1}
\end{equation}
\end_inset
In contrast, the expected value and standard deviation of the
\begin_inset Formula $u$
\end_inset
-metric for an incorrect codeword (randomly selected from a population of
all
\begin_inset Quotes eld
\end_inset
worst case
\begin_inset Quotes erd
\end_inset
codewords,
\emph on
i.e.
\emph default
, those with
\begin_inset Formula $k-1$
\end_inset
symbols identical to corresponding ones in the correct word) are given
by
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
\bar{u}_{i}=1+\left(\frac{k-1}{n}\right)y,\label{eq:u2-exp}
\end{equation}
\end_inset
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
\sigma_{i}=\frac{1}{n}\left[n+2y(k-1)\right]^{1/2},\label{eq:sigma2}
\end{equation}
\end_inset
where the subscript
\begin_inset Formula $i$
\end_inset
is an abbreviation for
\begin_inset Quotes eld
\end_inset
incorrect
\begin_inset Quotes erd
\end_inset
.
\end_layout
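\begin_layout Standard
For a sense of scale, the expressions above can be evaluated for the JT65
parameters
\begin_inset Formula $n=63$
\end_inset
and
\begin_inset Formula $k=12$
\end_inset
.
The value
\begin_inset Formula $y=1$
\end_inset
used here is an assumption chosen only for convenience:
\begin_inset Formula 
\begin{align*}
\bar{u}_{c} & =1+1=2, & \sigma_{c} & =\left(\frac{3}{63}\right)^{1/2}\approx0.218,\\
\bar{u}_{i} & =1+\frac{11}{63}\approx1.175, & \sigma_{i} & =\frac{1}{63}\left(63+22\right)^{1/2}\approx0.146.
\end{align*}
\end_inset
The two means then differ by about 0.825, roughly 3.8
\begin_inset Formula $\sigma_{c}$
\end_inset
or 5.6
\begin_inset Formula $\sigma_{i}$
\end_inset
.
Both standard deviations are consistent with a per-bin variance of
\begin_inset Formula $1+2y$
\end_inset
for bins containing signal and 1 for noise-only bins.
Summing
\begin_inset Formula $k-1$
\end_inset
variances of the first kind and
\begin_inset Formula $n-k+1$
\end_inset
of the second gives
\begin_inset Formula 
\[
(k-1)(1+2y)+(n-k+1)=n+2y(k-1),
\]
\end_inset
and dividing by
\begin_inset Formula $n^{2}$
\end_inset
reproduces
\begin_inset Formula $\sigma_{i}^{2}$
\end_inset
.
\end_layout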
\begin_layout Standard
If
\begin_inset Formula $u$
\end_inset
is evaluated for a large number of candidate codewords, one of which is
correct, we should expect the largest value
\begin_inset Formula $u_{1}$
\end_inset
to be drawn from a population with statistics described by
\begin_inset Formula $\bar{u}_{c}$
\end_inset
and
\begin_inset Formula $\sigma_{c}.$
\end_inset
If no tested codeword is correct,
\begin_inset Formula $u_{1}$
\end_inset
is likely to come from the
\begin_inset Formula $(\bar{u}_{i},\,\sigma_{i})$
\end_inset
population and to be several standard deviations above the mean.
In either case the second-largest value,
\begin_inset Formula $u_{2},$
\end_inset
will likely come from the
\begin_inset Formula $(\bar{u}_{i},\,\sigma_{i})$
\end_inset
population, again several standard deviations above the mean.
If the signal-to-noise ratio
\begin_inset Formula $y$
\end_inset
is too small for decoding to be possible or the correct codeword is never
presented as a candidate, the ratio
\begin_inset Formula $r=u_{2}/u_{1}$
\end_inset
will likely be close to 1.
On the other hand, correctly identified codewords will produce
\begin_inset Formula $u_{1}$
\end_inset
significantly larger than
\begin_inset Formula $u_{2}$
\end_inset
and thus smaller values of
\begin_inset Formula $r$
\end_inset
.
We therefore apply a ratio threshold test, say
\begin_inset Formula $r<R_{1}$
\end_inset
, to identify codewords with high probability of being correct.
As described in Section
\begin_inset CommandInset ref
LatexCommand ref
reference "sec:Theory,-Simulation,-and"
\end_inset
, we use simulations to set an empirical acceptance threshold
\begin_inset Formula $R_{1}$
\end_inset
that maximizes the probability of correct decodes while ensuring a low
rate of false positives.
\end_layout
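\begin_layout Standard
A rough numerical sketch may help to fix ideas.
All values here are assumptions for illustration, not measured results.
With
\begin_inset Formula $y=1$
\end_inset
the formulas above give
\begin_inset Formula $\bar{u}_{c}=2$
\end_inset
,
\begin_inset Formula $\bar{u}_{i}\approx1.175$
\end_inset
, and
\begin_inset Formula $\sigma_{i}\approx0.146$
\end_inset
.
If the correct codeword is found with
\begin_inset Formula $u_{1}\approx\bar{u}_{c}$
\end_inset
and the runner-up lies three standard deviations above
\begin_inset Formula $\bar{u}_{i}$
\end_inset
, then
\begin_inset Formula 
\[
u_{2}\approx1.175+3\times0.146\approx1.61,\qquad r=\frac{u_{2}}{u_{1}}\approx\frac{1.61}{2}\approx0.81.
\]
\end_inset
If instead both leading candidates are incorrect, say at 3.5 and 3 standard
deviations above
\begin_inset Formula $\bar{u}_{i}$
\end_inset
, then
\begin_inset Formula $r\approx1.61/1.69\approx0.96$
\end_inset
, much closer to 1.
\end_layout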
\begin_layout Standard
As with all decoding algorithms that generate a list of possible codewords,
a stopping criterion is necessary.
FT accepts a codeword unconditionally if the Hamming distance
\begin_inset Formula $X$
\end_inset
and soft distance
\begin_inset Formula $d_{s}$
\end_inset
obey specified criteria
\begin_inset Formula $X<X_{0}$
\end_inset
and
\begin_inset Formula $d_{s}<D_{0}$
\end_inset
.
Secondary acceptance criteria
\begin_inset Formula $d_{s}<D_{1}$
\end_inset
and
\begin_inset Formula $r<R_{1}$
\end_inset
are used to validate additional codewords that fail the first test.
A timeout is used to limit execution time if no acceptable codeword is
found in a reasonable number of trials,
\begin_inset Formula $T$
\end_inset
.
Today's personal computers are fast enough that
\begin_inset Formula $T$
\end_inset
can be set as large as
\begin_inset Formula $10^{5},$
\end_inset
or even higher.
Pseudo-code for the FT algorithm is presented in an accompanying box,
\series bold
Algorithm 1
\series default
.
\end_layout
\begin_layout Standard
\begin_inset Float algorithm
wide false
sideways false
status open
\begin_layout Plain Layout
\begin_inset Caption Standard
\begin_layout Plain Layout
Pseudo-code for the FT algorithm.
\end_layout
\end_inset
\end_layout
\begin_layout Enumerate
For each received symbol, define the erasure probability as 1.3 times the
\emph on
a priori
\emph default
symbol-error probability determined from soft-symbol information
\begin_inset Formula $\{p_{1}\textrm{-rank},\, p_{2}/p_{1}\}$
\end_inset
.
\end_layout
\begin_layout Enumerate
Make independent stochastic
\begin_inset CommandInset nomenclature
LatexCommand nomenclature
symbol "{\\bf Stochastic algorithm: }"
description "An algorithm involving chance or probability in determining the series of computational steps to be taken."
\end_inset
decisions about whether to erase each symbol by using the symbol's erasure
probability, allowing a maximum of 51 erasures.
\end_layout
\begin_layout Enumerate
Attempt errors-and-erasures decoding using the BM algorithm and the set
of erasures determined in step 2.
If the BM decoder produces a candidate codeword, go to step 5.
\end_layout
\begin_layout Enumerate
If BM decoding was not successful, go to step 2.
\end_layout
\begin_layout Enumerate
Calculate the hard-decision Hamming distance
\begin_inset Formula $X$
\end_inset
between the candidate codeword and the received symbols, along with the
corresponding soft distance
\begin_inset Formula $d_{s}$
\end_inset
and the quality metric
\begin_inset Formula $u$
\end_inset
.
\end_layout
\begin_layout Enumerate
If
\begin_inset Formula $u$
\end_inset
is the largest one encountered so far, preserve any previous value of
\begin_inset Formula $u_{1}$
\end_inset
by setting
\begin_inset Formula $u_{2}=u_{1}.$
\end_inset
Then set
\begin_inset Formula $u_{1}=u,$
\end_inset
\begin_inset Formula $d_{1}=d_{s},$
\end_inset
\begin_inset Formula $X_{1}=X,$
\end_inset
and save the codeword.
\end_layout
\begin_layout Enumerate
If
\begin_inset Formula $X_{1}<X_{0}$
\end_inset
and
\begin_inset Formula $d_{1}<D_{0}$
\end_inset
, go to step 11.
\end_layout
\begin_layout Enumerate
If the number of trials is less than the timeout limit
\begin_inset Formula $T,$
\end_inset
go to step 2.
\end_layout
\begin_layout Enumerate
If
\begin_inset Formula $d_{1}<D_{1}$
\end_inset
and
\begin_inset Formula $r=u_{2}/u_{1}<R_{1},$
\end_inset
go to step 11.
\end_layout
\begin_layout Enumerate
Otherwise, declare decoding failure and exit.
\end_layout
\begin_layout Enumerate
An acceptable codeword has been found.
Declare a successful decode and return the saved codeword.
\end_layout
\end_inset
\end_layout
\begin_layout Standard
Inspiration for the FT decoding algorithm came from a number of sources.
\begin_inset CommandInset citation
LatexCommand cite
key "lc2004"
\end_inset
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
textsuperscript{,}
\end_layout
\end_inset
\begin_inset CommandInset citation
LatexCommand cite
key "lhmg2010"
\end_inset
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
textsuperscript{,}
\end_layout
\end_inset
\begin_inset CommandInset citation
LatexCommand cite
key "lk2008"
\end_inset
After developing this algorithm, we became aware that our approach is
conceptually similar to a stochastic, erasures-only list decoding algorithm described
in another reference
\begin_inset CommandInset citation
LatexCommand cite
key "ls2009"
\end_inset
.
That algorithm is applied to higher-rate Reed-Solomon codes on a symmetric
channel using binary phase-shift keying (BPSK).
Our 64-ary input channel with 64-FSK modulation required us to develop
unique methods for assigning erasure probabilities and for defining acceptance
criteria to select the best codeword from the list of tested candidates.
\end_layout
\begin_layout Section
\begin_inset CommandInset label
LatexCommand label
name "sec:Hinted-Decoding"
\end_inset
Hinted Decoding
\end_layout
\begin_layout Standard
The FT algorithm is completely general.
With equal sensitivity it can recover any one of the
\begin_inset Formula $2^{72}\approx4.7\times10^{21}$
\end_inset
different messages that can be transmitted with the JT65 protocol.
In some circumstances it's easy to imagine a
\emph on
much
\emph default
smaller list of messages (say, a few thousand messages or fewer) that would
be among the most likely ones to be received.
One such favorable situation exists when making short ham-radio contacts
that exchange minimal information including callsigns, signal reports,
perhaps Maidenhead locators, and acknowledgments.
On the EME path or a VHF or UHF band with limited geographical coverage,
the most common received messages frequently originate from callsigns that
have been decoded before.
Saving a list of previously decoded callsigns and associated locators makes
it easy to generate a list of hypothetical messages and their corresponding
codewords at very little computational expense.
The resulting candidate codewords can be tested in almost the same way
as those generated by the probabilistic method described in Section
\begin_inset CommandInset ref
LatexCommand ref
reference "sec:The-decoding-algorithm"
\end_inset
.
We call this approach
\begin_inset Quotes eld
\end_inset
hinted decoding;
\begin_inset Quotes erd
\end_inset
it is sometimes referred to as the
\emph on
Deep Search
\emph default
algorithm.
In certain limited situations it can provide enhanced sensitivity for the
principal task of any decoder, namely to determine precisely what message
was sent.
\end_layout
\begin_layout Standard
For hinted decoding we again invoke a ratio threshold test, but in this
case we use it to answer a more limited question.
Over the full list of messages considered likely, we want to know whether
a suitable metric can distinguish with confidence between the one correct
codeword and all others in the generated list --- or, alternatively, to
determine that the correct codeword is
\emph on
not
\emph default
contained in the list.
We again find that the most effective metric involves a comparison of
\begin_inset Formula $u_{1}$
\end_inset
and
\begin_inset Formula $u_{2},$
\end_inset
the largest and second-largest values of total signal-plus-noise power
among all the tested codewords.
The criterion for comparison is chosen empirically to maximize the number
of correct decodes while ensuring that false decodes are rare.
Because tested candidate codewords are drawn from a list typically no longer
than a few thousand entries, rather than
\begin_inset Formula $2^{72},$
\end_inset
the limit can be more relaxed than that used in the FT algorithm.
Thus, for the limited subset of messages suggested by previous experience
to be likely, hinted decodes can be obtained at lower signal levels than
required for the full universe of
\begin_inset Formula $2^{72}$
\end_inset
possible messages.
Pseudo-code for the hinted-decoding algorithm is presented as
\series bold
Algorithm 2
\series default
.
\end_layout
\begin_layout Standard
\begin_inset Float algorithm
wide false
sideways false
status open
\begin_layout Plain Layout
\end_layout
\begin_layout Plain Layout
\begin_inset Caption Standard
\begin_layout Plain Layout
Pseudo-code for hinted decoding.
\end_layout
\end_inset
\end_layout
\begin_layout Enumerate
Generate a list of
\begin_inset Formula $L$
\end_inset
codewords considered likely to be received.
Set a pointer to the start of this list.
\end_layout
\begin_layout Enumerate
Fetch the next candidate codeword and calculate its metric
\begin_inset Formula $u.$
\end_inset
\end_layout
\begin_layout Enumerate
If
\begin_inset Formula $u$
\end_inset
is the largest metric encountered so far, preserve any previous value of
\begin_inset Formula $u_{1}$
\end_inset
by setting
\begin_inset Formula $u_{2}=u_{1}.$
\end_inset
Then set
\begin_inset Formula $u_{1}=u$
\end_inset
and save the codeword.
\end_layout
\begin_layout Enumerate
If the number of tested codewords is less than
\begin_inset Formula $L,$
\end_inset
go to step 2.
\end_layout
\begin_layout Enumerate
If
\begin_inset Formula $r=u_{2}/u_{1}<R_{2},$
\end_inset
go to step 7.
\end_layout
\begin_layout Enumerate
Otherwise, declare decoding failure and exit.
\end_layout
\begin_layout Enumerate
An acceptable codeword has been found.
Declare a successful result and return the codeword and the value
\begin_inset Formula $q=100\,(u_{1}-bu_{2})$
\end_inset
as a confidence indicator.
(By default we use the value
\begin_inset Formula $b=1.12$
\end_inset
for submode JT65A.)
\end_layout
\end_inset
\end_layout
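\begin_layout Standard
The confidence indicator in the final step of Algorithm 2 can be related
to the ratio
\begin_inset Formula $r$
\end_inset
by simple algebra.
Since
\begin_inset Formula $u_{1}>0$
\end_inset
,
\begin_inset Formula 
\[
q=100\,(u_{1}-bu_{2})>0\quad\Longleftrightarrow\quad r=\frac{u_{2}}{u_{1}}<\frac{1}{b}\approx0.893\quad\textrm{for }b=1.12.
\]
\end_inset
As a hypothetical example,
\begin_inset Formula $u_{1}=2.0$
\end_inset
and
\begin_inset Formula $u_{2}=1.6$
\end_inset
give
\begin_inset Formula $r=0.8$
\end_inset
and
\begin_inset Formula $q=100\,(2.0-1.792)=20.8$
\end_inset
.
These input values are invented for illustration, and nothing here implies
that the threshold
\begin_inset Formula $R_{2}$
\end_inset
equals
\begin_inset Formula $1/b$
\end_inset
.
\end_layout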
\begin_layout Section
\begin_inset CommandInset label
LatexCommand label
name "sec:Theory,-Simulation,-and"
\end_inset
Decoder Performance Evaluation
\end_layout
\begin_layout Standard
Comparisons of decoding performance are usually presented in the professional
literature as plots of word error rate versus
\begin_inset Formula $E_{b}/N_{0}$
\end_inset
, the ratio of the energy collected per information bit to the one-sided
noise power spectral density.
For weak-signal amateur radio work, performance is more usefully presented
as the probability of successfully decoding a received word plotted against
\begin_inset Formula $\mathrm{SNR}{}_{2500}$
\end_inset
, the signal-to-noise ratio in a 2500 Hz reference bandwidth.
The relationship between
\begin_inset Formula $E_{b}/N_{0}$
\end_inset
and
\begin_inset Formula $\mathrm{SNR}{}_{2500}$
\end_inset
is described in Appendix
\begin_inset CommandInset ref
LatexCommand ref
reference "sec:Appendix:SNR"
\end_inset
.
Examples of both types of plot are included in the following discussion,
where we describe simulations carried out to compare performance of the
FT algorithm and hinted decoding with other algorithms and with theoretical
expectations.
We have also used simulations to establish suitable default values for
the acceptance parameters
\begin_inset Formula $X_{0},$
\end_inset
\begin_inset Formula $D_{0},$
\end_inset
\begin_inset Formula $D_{1},$
\end_inset
\begin_inset Formula $R_{1},$
\end_inset
and
\begin_inset Formula $R_{2}.$
\end_inset
\end_layout
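\begin_layout Standard
As a rough consistency check, and not a replacement for the Appendix, suppose
that signal energy is counted only over the 63 data symbols, each lasting
4096/11025 s, and that a message carries 72 information bits.
Under these assumptions
\begin_inset Formula 
\[
\frac{E_{b}}{N_{0}}\approx\mathrm{SNR}{}_{2500}\times\frac{2500\times63\times(4096/11025)}{72}\approx812.7\;\mathrm{SNR}{}_{2500},
\]
\end_inset
which amounts to adding about 29.1 dB to
\begin_inset Formula $\mathrm{SNR}{}_{2500}$
\end_inset
expressed in dB.
For example,
\begin_inset Formula $\mathrm{SNR}{}_{2500}=-24$
\end_inset
dB then corresponds to
\begin_inset Formula $E_{b}/N_{0}\approx5.1$
\end_inset
dB, the pair of values quoted in the caption of Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:N_vs_X"
\end_inset
.
\end_layout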
\begin_layout Subsection
Simulated results on the AWGN channel
\end_layout
\begin_layout Standard
Results of simulations using the BM, KV, and FT decoding algorithms on the
JT65 code are presented in terms of word error rate versus
\begin_inset Formula $E_{b}/N_{0}$
\end_inset
in Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:bodide"
\end_inset
.
For these tests we generated at least 1000 signals at each signal-to-noise
ratio, assuming the additive white Gaussian noise (AWGN) channel, and we
processed the data using each algorithm.
For word error rates less than 0.1 it was necessary to process 10,000 or
even 100,000 simulated signals in order to capture enough errors to make
the measurements statistically meaningful.
As a test of the fidelity of our numerical simulations, Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:bodide"
\end_inset
also shows results calculated from theoretical probability distributions
for comparison with the BM results.
The simulated BM results agree with theory to within about 0.1 dB.
The differences are caused by small errors in the estimates of time and
frequency offset of the received signal in the simulated data.
Such
\begin_inset Quotes eld
\end_inset
sync losses
\begin_inset Quotes erd
\end_inset
are not accounted for in the idealized theoretical results.
\end_layout
\begin_layout Standard
As expected, on the AWGN channel the soft-decision algorithms FT and KV
are about 2 dB better than the hard-decision BM algorithm.
In addition, FT has an edge over KV that increases from about 0.2 dB at
higher SNRs to nearly 0.5 dB at low SNR.
With timeout parameter
\begin_inset Formula $T=10^{5}$
\end_inset
execution time for FT is longer than that for the KV algorithm, but still
small enough to be fully practical on today's home computers.
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\align center
\begin_inset Graphics
filename fig_bodide.pdf
\end_inset
\begin_inset Caption Standard
\begin_layout Plain Layout
\begin_inset CommandInset label
LatexCommand label
name "fig:bodide"
\end_inset
Word error rates as a function of
\begin_inset Formula $E_{b}/N_{0},$
\end_inset
the signal-to-noise ratio per information bit.
The curve labeled
\emph on
Theory
\emph default
shows a theoretical prediction for the hard-decision BM decoder.
Remaining curves represent simulation results on an AWGN channel for the
BM, KV, and FT decoders.
The KV algorithm was executed with complexity coefficient
\begin_inset Formula $\lambda=15$
\end_inset
, the most aggressive setting historically used in the
\emph on
WSJT
\emph default
programs.
The FT algorithm used timeout setting
\begin_inset Formula $T=10^{5}.$
\end_inset
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
Error-free transmission is important in commercial applications, so plots
like Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:bodide"
\end_inset
are often extended downward to error rates of
\begin_inset Formula $10^{-6}$
\end_inset
or even less.
The circumstances for minimal amateur-radio contacts are very different,
however.
Decoding failure rates of order 0.1 or higher may be perfectly acceptable:
they simply require repeat transmissions.
In such circumstances the essential information is more usefully presented
in a plot showing the percentage of transmissions copied correctly as a
function of signal-to-noise ratio.
Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:WER2"
\end_inset
shows the FT and KV results from Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:bodide"
\end_inset
in this format, along with additional FT results for
\begin_inset Formula $T=10^{4},\:10^{3},\:10^{2}$
\end_inset
and
\begin_inset Formula $10$
\end_inset
.
It's easy to see that the FT decoder produces more decodes than KV when
\begin_inset Formula $T\gtrsim3000$
\end_inset
.
As already noted in connection with Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:bodide"
\end_inset
, FT with
\begin_inset Formula $T=10^{5}$
\end_inset
has approximately
\begin_inset Formula $0.5$
\end_inset
dB gain over KV at low SNR.
It also provides very significant gains over the hard-decision BM decoder,
even when limited to very small
\begin_inset Formula $T$
\end_inset
.
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\align center
\begin_inset Graphics
filename fig_wer2.pdf
lyxscale 120
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption Standard
\begin_layout Plain Layout
\begin_inset CommandInset label
LatexCommand label
name "fig:WER2"
\end_inset
Percent of JT65 messages copied as a function of
\begin_inset Formula $\mathrm{SNR}{}_{2500},$
\end_inset
assuming additive white Gaussian noise and no fading.
Numbers adjacent to curves specify values of timeout parameter
\begin_inset Formula $T$
\end_inset
for the FT decoder.
Open circles and dotted line show results for the KV decoder with complexity
coefficient
\begin_inset Formula $\lambda=15$
\end_inset
.
Results for the BM algorithm are plotted with crosses and dashed line.
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
Parameter
\begin_inset Formula $T$
\end_inset
in the FT algorithm is the maximum number of symbol-erasure trials allowed
for a particular attempt at decoding a received word.
Most successful decodes take only a small fraction of the maximum allowed
number of trials.
Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:N_vs_X"
\end_inset
shows the number of stochastic erasure trials required to find the correct
codeword plotted as a function of
\begin_inset Formula $X,$
\end_inset
the number of hard-decision errors in the received word.
This test run used 1000 simulated transmissions at
\begin_inset Formula $\mathrm{SNR}{}_{2500}=-24$
\end_inset
dB, just slightly above the decoding threshold, with timeout parameter
\begin_inset Formula $T=10^{5}.$
\end_inset
No points are shown for
\begin_inset Formula $X\le25$
\end_inset
because all such words are successfully decoded by a single run of the
errors-only BM algorithm.
Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:N_vs_X"
\end_inset
shows that the FT algorithm decodes received words with as many as
\begin_inset Formula $X=43$
\end_inset
symbol errors.
It also shows that the average number of trials increases with the number
of errors in the received word.
The variability of decoding time also increases dramatically with the number
of errors in the received word.
These results provide insight into the mean and variance of execution time
for the FT algorithm, since execution time is roughly proportional to the
number of required erasure trials.
\end_layout
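\begin_layout Standard
The boundary at
\begin_inset Formula $X=25$
\end_inset
agrees with the standard errors-and-erasures condition for a Reed-Solomon
code with minimum distance
\begin_inset Formula $n-k+1=52$
\end_inset
.
In notation introduced only for this sketch, let
\begin_inset Formula $s$
\end_inset
be the number of erased symbols,
\begin_inset Formula $t$
\end_inset
the number of those erasures that fall on incorrect symbols, and
\begin_inset Formula $e=X-t$
\end_inset
the number of errors left unerased.
Decoding is guaranteed when
\begin_inset Formula 
\[
2e+s\le n-k=51.
\]
\end_inset
With no erasures this gives
\begin_inset Formula $X\le25$
\end_inset
.
For
\begin_inset Formula $X=43$
\end_inset
the condition becomes
\begin_inset Formula $2t-s\ge35$
\end_inset
, so an erasure vector with
\begin_inset Formula $s=45$
\end_inset
must cover at least
\begin_inset Formula $t=40$
\end_inset
of the 43 errors, and one with the maximum
\begin_inset Formula $s=51$
\end_inset
must cover all 43.
Such favorable erasure patterns are rare, which is consistent with the
large number of trials seen at high
\begin_inset Formula $X$
\end_inset
.
\end_layout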
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\align center
\begin_inset Graphics
filename fig_ntrials_vs_nhard.pdf
lyxscale 120
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption Standard
\begin_layout Plain Layout
\begin_inset CommandInset label
LatexCommand label
name "fig:N_vs_X"
\end_inset
Number of trials needed to decode a received word
\emph on
vs.
\emph default
Hamming distance
\begin_inset Formula $X$
\end_inset
between received word and decoded codeword.
We used 1000 simulated transmissions on an AWGN channel with no fading.
The signal-to-noise ratio was
\begin_inset Formula $\mathrm{SNR}{}_{2500}=-24$
\end_inset
dB, or
\begin_inset Formula $E_{b}/N_{0}=5.1$
\end_inset
dB.
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Subsection
Simulated results for Rayleigh fading and hinted decoding
\end_layout
\begin_layout Standard
Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:Psuccess"
\end_inset
presents the results of simulations for signal-to-noise ratios ranging
from
\begin_inset Formula $-18$
\end_inset
to
\begin_inset Formula $-30$
\end_inset
dB, again using 1000 simulated signals for each plotted point.
We include three curves for each decoding algorithm: one for the AWGN channel
and no fading, and two more for simulated Doppler spreads of 0.2 and 1.0
Hz.
These simulated Doppler spreads are comparable to those encountered on
HF ionospheric paths and also for EME at VHF and the lower UHF bands.
For comparison we note that the JT65 symbol rate is about
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
linebreak
\end_layout
\end_inset
2.7 Hz.
It is interesting to note that while Rayleigh fading severely degrades
the success rate of the BM decoder, the penalties are much smaller with
both FT and Deep Search (DS) decoding.
Simulated Doppler spreads of 0.2 Hz actually increased the FT decoding rate
slightly at SNRs close to the decoding threshold, presumably because with
the low-rate JT65 code, signal peaks provide the information needed for
good copy.
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\align center
\begin_inset Graphics
filename fig_psuccess.pdf
lyxscale 90
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption Standard
\begin_layout Plain Layout
\begin_inset CommandInset label
LatexCommand label
name "fig:Psuccess"
\end_inset
Percentage of JT65 messages successfully decoded as a function of
\begin_inset Formula $\mathrm{SNR}{}_{2500}$
\end_inset
.
Results are shown for the hard-decision Berlekamp-Massey (BM) and soft-decision
Franke-Taylor (FT) decoding algorithms.
Curves labeled DS correspond to the hinted-decode (Deep Search) algorithm
with a codeword list of length
\begin_inset Formula $L\,$
\end_inset
= 5850.
Numbers adjacent to the curves are simulated Doppler spreads in Hz.
In the current version of
\emph on
WSJT-X
\emph default
the performance of the DS algorithm is limited by synchronization failures
when
\begin_inset Formula $\mathrm{SNR}{}_{2500}\lesssim-28\,\textrm{dB}$
\end_inset
.
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Section
On-the-air Experience
\end_layout
\begin_layout Standard
The JT65 protocol has proven remarkably versatile.
Today the mode is used by thousands of amateurs around the world, communicating
over terrestrial paths on the MF and HF bands and over terrestrial as well
as EME paths from 50 MHz through 10 GHz.
Three
\emph on
submodes
\emph default
are in use, together accommodating a wide range of Doppler spreads and
potential instrumental instabilities.
All three submodes transmit the 63 data symbols interspersed with 63
synchronization symbols at keying rate 11025/4096 = 2.69 baud.
Submode JT65A uses tone spacing equal to the symbol rate; its total occupied
bandwidth is
\begin_inset Formula $66\times2.69=177.6$
\end_inset
Hz.
Submodes B and C have tone spacings and occupied bandwidths 2 and 4 times
larger.
In practice JT65A is generally used at 50 MHz and below, JT65B on 144 through
432 MHz, and JT65C at 1296 MHz and above.
\end_layout
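\begin_layout Standard
The bandwidth figures follow from simple arithmetic on the numbers quoted
above.
Using the unrounded keying rate,
\begin_inset Formula 
\[
\frac{11025}{4096}\approx2.6917,\qquad66\times2.6917\approx177.6\;\textrm{Hz}.
\]
\end_inset
Doubling and quadrupling this value gives occupied bandwidths of about 355.3
Hz for JT65B and 710.6 Hz for JT65C.
\end_layout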
\begin_layout Standard
Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:JT65B_EME"
\end_inset
shows portions of the main window and spectrogram displays from program
\emph on
WSJT-X,
\emph default
illustrating replies to a CQ from K1JT on 144.118 MHz using submode JT65B
on the EME path.
Speckled vertical lines on the waterfall at 1494 and 1515 Hz are the
synchronizing tones of signals from DL7UAE and SP6GWB.
Other visible speckles (barely above the noise) up to about 1870 Hz are
some of the data tones from these two stations.
Two lines of decoded text show that the estimated average signal strengths
were
\begin_inset Formula $\mathrm{SNR}{}_{2500}=-23$
\end_inset
and
\begin_inset Formula $-24$
\end_inset
dB, respectively, just one or two dB above decoding threshold for the FT
decoder.
Note that the two signals overlap throughout more than 90% of their occupied
bandwidths, yet both are decoded cleanly and without errors.
Such behavior is typical of the JT65 protocol.
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\align center
\begin_inset Graphics
filename JT65B_EME.png
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption Standard
\begin_layout Plain Layout
\begin_inset CommandInset label
LatexCommand label
name "fig:JT65B_EME"
\end_inset
Examples of JT65B EME signals recorded at K1JT.
Numbers above the spectrogram are audio frequencies in Hz, and the
spectrogram's vertical span is one minute of time.
The horizontal green bar on the frequency axis indicates the bandwidth
occupied by the second decoded signal, a reply from SP6GWB.
See text for additional details.
\end_layout
\end_inset
\end_layout
\end_inset
\end_layout
\begin_layout Standard
As another example, Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:spectrogram"
\end_inset
shows activity in submode JT65A during a single minute on the 20 m amateur
band.
At this time the band was crowded with overlapping signals.
With care you can count at least 19 distinct synchronizing tones (the speckled
vertical lines in the figure), and can see as many as four signals overlapping
in some places.
After signal processing to demodulate the signals and produce soft-symbol
data for the FT decoder, program
\emph on
WSJT-X
\emph default
extracts and decodes 21 error-free messages from this recorded data segment.
This result is achieved with a relatively small timeout parameter,
\begin_inset Formula $T=1000.$
\end_inset
For these results the decoder uses two successive sweeps over the spectrum.
The strongest signals (12 in this example) are sequentially decoded and
subtracted from the raw data after the first pass.
Another 9 signals are decoded in the second pass.
For comparison, the hard-decision BM decoder decodes only 12 messages from
this recording (9 in the first pass and 3 more in a second).
\end_layout
\begin_layout Standard
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\align center
\begin_inset Graphics
filename fig_waterfall.png
scale 60
BoundingBox 0bp 0bp 1124bp 200bp
clip
\end_inset
\end_layout
\begin_layout Plain Layout
\begin_inset Caption Standard
\begin_layout Plain Layout
\begin_inset CommandInset label
LatexCommand label
name "fig:spectrogram"
\end_inset
Spectrogram from
\emph on
WSJT-X
\emph default
showing one minute of data collected under crowded band conditions on the
20 m band.
Numbers on the scale are frequencies (in Hz) above 14.076 MHz.
\end_layout
\end_inset
\end_layout
\begin_layout Plain Layout
\end_layout
\end_inset
\end_layout
\begin_layout Standard
Our implementation of the FT decoder, written in a combination of FORTRAN
and C, is freely available as open-source code
\begin_inset CommandInset citation
LatexCommand cite
key "wsjt_sourceforge"
\end_inset
.
For the Berlekamp-Massey part of the algorithm we use routines written
by Phil Karn, KA9Q
\begin_inset CommandInset citation
LatexCommand cite
key "karn"
\end_inset
, modified slightly so that the Reed-Solomon syndromes are computed only
once in our most time-consuming loop (steps 2 through 8,
\series bold
Algorithm 1
\series default
).
The FT algorithm has become an integral part of programs
\emph on
WSJT,
\emph default
\emph on
MAP65,
\emph default
and
\emph on
WSJT-X
\emph default
.
The improvement in sensitivity over the Kötter-Vardy decoder is small, only
a few tenths of a dB, but such small advantages can be very important,
especially on the EME path.
Perhaps even more important, programs in the
\emph on
WSJT
\emph default
family are now entirely open source.
We no longer need to use the patented KV algorithm or the specially licensed
executable program
\family typewriter
kvasd[.exe]
\family default
.
\end_layout
\begin_layout Section
Acknowledgments
\end_layout
\begin_layout Standard
We thank G3WDG, G4WJS, KD9DSW, PY2SDR, SM5BSZ, VK7MO, and W3SZ for helpful
comments on an earlier version of this paper.
\end_layout
\begin_layout Standard
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
newpage
\end_layout
\end_inset
\end_layout
\begin_layout Bibliography
\begin_inset CommandInset bibitem
LatexCommand bibitem
label "1"
key "jt65_protocol"
\end_inset
“The JT65 Communications Protocol,” J.
Taylor, K1JT,
\emph on
QEX
\emph default
, September-October 2005, pp.
3-12.
Available also at http://physics.princeton.edu/pulsar/K1JT/JT65.pdf.
\end_layout
\begin_layout Bibliography
\begin_inset CommandInset bibitem
LatexCommand bibitem
label "2"
key "kv2001"
\end_inset
“Algebraic soft-decision decoding of Reed-Solomon codes,” R.
Kötter and A.
Vardy,
\emph on
IEEE Transactions on Information Theory
\emph default
, Vol.
49, pp.
2809-2825, 2003.
\end_layout
\begin_layout Bibliography
\begin_inset CommandInset bibitem
LatexCommand bibitem
label "3"
key "wsjt"
\end_inset
\emph on
WSJT Home Page
\emph default
: http://www.physics.princeton.edu/pulsar/K1JT/.
\end_layout
\begin_layout Bibliography
\begin_inset CommandInset bibitem
LatexCommand bibitem
label "4"
key "lc2004"
\end_inset
\emph on
Error Control Coding, 2nd Edition
\emph default
, Shu Lin and Daniel J.
Costello, Pearson-Prentice Hall, 2004.
\end_layout
\begin_layout Bibliography
\begin_inset CommandInset bibitem
LatexCommand bibitem
label "5"
key "lhmg2010"
\end_inset
“Stochastic Chase Decoding of Reed-Solomon Codes,” Camille Leroux, Saied
Hemati, Shie Mannor, Warren J.
Gross,
\emph on
IEEE Communications Letters
\emph default
, Vol.
14, No.
9, pp.
863-865, 2010.
\end_layout
\begin_layout Bibliography
\begin_inset CommandInset bibitem
LatexCommand bibitem
label "6"
key "lk2008"
\end_inset
“Soft-Decision Decoding of Reed-Solomon Codes Using Successive Error-and-Erasure Decoding,” Soo-Woong Lee and B.
V.
K.
Vijaya Kumar,
\emph on
IEEE
\begin_inset Quotes eld
\end_inset
GLOBECOM
\begin_inset Quotes erd
\end_inset
Proceedings,
\emph default
2008.
\end_layout
\begin_layout Bibliography
\begin_inset CommandInset bibitem
LatexCommand bibitem
label "7"
key "ls2009"
\end_inset
“Stochastic Erasure-Only List Decoding Algorithms for Reed-Solomon Codes,”
Chang-Ming Lee and Yu T.
Su,
\emph on
IEEE Signal Processing Letters,
\emph default
Vol.
16, pp.
691-694, 2009.
\end_layout
\begin_layout Bibliography
\begin_inset CommandInset bibitem
LatexCommand bibitem
label "8"
key "wsjt_sourceforge"
\end_inset
Source code for all programs in the
\emph on
WSJT
\emph default
project is stored in a Subversion repository at SourceForge: https://sourceforge.net/projects/wsjt/
\end_layout
\begin_layout Bibliography
\begin_inset CommandInset bibitem
LatexCommand bibitem
label "9"
key "karn"
\end_inset
Errors-and-erasures decoder using the Berlekamp-Massey algorithm, written by
Phil Karn, KA9Q: http://www.ka9q.net/code/fec/
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
newpage
\end_layout
\end_inset
\end_layout
\begin_layout Section
\start_of_appendix
\begin_inset CommandInset label
LatexCommand label
name "sec:Appendix:SNR"
\end_inset
Appendix: Signal-to-Noise Ratios
\end_layout
\begin_layout Standard
The signal-to-noise ratio in a bandwidth,
\begin_inset Formula $B$
\end_inset
, that is at least as large as the bandwidth occupied by the signal is:
\begin_inset Formula
\begin{equation}
\mathrm{SNR}_{B}=\frac{P_{s}}{N_{0}B}\label{eq:SNR}
\end{equation}
\end_inset
where
\begin_inset Formula $P_{s}$
\end_inset
is the average signal power (W),
\begin_inset Formula $N_{0}$
\end_inset
is the one-sided noise power spectral density (W/Hz), and
\begin_inset Formula $B$
\end_inset
is the bandwidth in Hz.
In amateur radio applications, digital modes are often compared based on
the SNR defined in a 2.5 kHz reference bandwidth,
\begin_inset Formula $\mathrm{SNR}_{2500}$
\end_inset
.
\end_layout
\begin_layout Standard
In the professional literature, decoder performance is characterized in
terms of
\begin_inset Formula $E_{b}/N_{0}$
\end_inset
, the ratio of the energy collected per information bit,
\begin_inset Formula $E_{b}$
\end_inset
, to the one-sided noise power spectral density,
\begin_inset Formula $N_{0}$
\end_inset
.
Denote the duration of a channel symbol by
\begin_inset Formula $\tau_{s}$
\end_inset
(for JT65,
\begin_inset Formula $\tau_{s}=0.3715\,\mathrm{s}$
\end_inset
).
JT65 signals have constant envelope, so the average signal power is related
to the energy per symbol,
\begin_inset Formula $E_{s}$
\end_inset
, by
\begin_inset Formula
\begin{equation}
P_{s}=E_{s}/\tau_{s}.\label{eq:signal_power}
\end{equation}
\end_inset
The total energy in a received JT65 message consisting of
\begin_inset Formula $n=63$
\end_inset
channel symbols is
\begin_inset Formula $63E_{s}$
\end_inset
.
The energy collected for each of the 72 bits of information conveyed by
the message is then
\begin_inset Formula
\begin{equation}
E_{b}=\frac{63E_{s}}{72}=0.875E_{s}.\label{eq:Eb_Es}
\end{equation}
\end_inset
Using equations (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:SNR"
\end_inset
)-(
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:Eb_Es"
\end_inset
),
\begin_inset Formula $\mathrm{SNR}_{2500}$
\end_inset
can be written in terms of
\begin_inset Formula $E_{b}/N_{0}$
\end_inset
:
\begin_inset Formula
\begin{equation}
\mathrm{SNR}_{2500}=1.23\times10^{-3}\frac{E_{b}}{N_{0}}.\label{eq:SNR2500}
\end{equation}
\end_inset
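As a check on the constant, substitute
\begin_inset Formula $P_{s}=E_{s}/\tau_{s}$
\end_inset
and
\begin_inset Formula $E_{s}=E_{b}/0.875$
\end_inset
into the definition of SNR with
\begin_inset Formula $B=2500\,\mathrm{Hz}$
\end_inset
:
\begin_inset Formula
\[
\mathrm{SNR}_{2500}=\frac{E_{s}}{\tau_{s}N_{0}B}=\frac{1}{0.875\,\tau_{s}B}\,\frac{E_{b}}{N_{0}}=\frac{1}{0.875\times0.3715\times2500}\,\frac{E_{b}}{N_{0}}\approx1.23\times10^{-3}\,\frac{E_{b}}{N_{0}}.
\]
\end_inset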
If all quantities are expressed in dB, then:
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{equation}
(\mathrm{SNR}_{2500})_{\mathrm{dB}}=(E_{b}/N_{0})_{\mathrm{dB}}-29.1\,\mathrm{dB}=(E_{s}/N_{0})_{\mathrm{dB}}-29.7\,\mathrm{dB}.\label{eq:SNR_all_types}
\end{equation}
\end_inset
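The two offsets are the corresponding scale factors expressed in decibels:
\begin_inset Formula
\[
10\log_{10}\left(\frac{1}{0.875\,\tau_{s}B}\right)\approx-29.1\,\mathrm{dB},\qquad10\log_{10}\left(\frac{1}{\tau_{s}B}\right)\approx-29.7\,\mathrm{dB},
\]
\end_inset
with
\begin_inset Formula $\tau_{s}=0.3715\,\mathrm{s}$
\end_inset
and
\begin_inset Formula $B=2500\,\mathrm{Hz}$
\end_inset
.
As a purely illustrative example (the value is assumed, not taken from the measurements reported in this paper), a signal at
\begin_inset Formula $\mathrm{SNR}_{2500}=-25\,\mathrm{dB}$
\end_inset
corresponds to
\begin_inset Formula $E_{b}/N_{0}\approx4.1\,\mathrm{dB}$
\end_inset
and
\begin_inset Formula $E_{s}/N_{0}\approx4.7\,\mathrm{dB}$
\end_inset
.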
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
newpage
\end_layout
\end_inset
\begin_inset Box Doublebox
position "t"
hor_pos "c"
has_inner_box 1
inner_pos "t"
use_parbox 0
use_makebox 0
width "100col%"
special "none"
height "1in"
height_special "totalheight"
status open
\begin_layout Paragraph
\size large
Sidebar: JT65 Message Processing
\end_layout
\begin_layout Enumerate
User A enters or selects message consistent with formatting rules of JT65.
\end_layout
\begin_layout Enumerate
Transmitting software at A: compress message into 12 six-bit symbols, then
add 51 six-bit parity symbols.
\end_layout
\begin_layout Enumerate
Intersperse 63 synchronizing symbols among the 63 information-carrying symbols.
\end_layout
\begin_layout Enumerate
Start transmission 1 s into a UTC minute.
Transmit each symbol value at a distinct frequency.
\end_layout
\begin_layout Enumerate
Signal propagates from A to B, arriving much weaker and corrupted by noise,
fading, and Doppler spread.
\end_layout
\begin_layout Enumerate
Receiving software at B: remove impulsive noise; detect synchronizing signal,
measure its frequency and time offset.
\end_layout
\begin_layout Enumerate
Shift spectrum to put sync tone at zero frequency, correcting for any measured
drift.
\end_layout
\begin_layout Enumerate
Compute binned power spectra
\begin_inset Formula $S(i,j)$
\end_inset
for all information symbols.
(Index
\begin_inset Formula $i$
\end_inset
runs over 64 possible symbol values, index
\begin_inset Formula $j$
\end_inset
over 63 symbol numbers.)
\end_layout
\begin_layout Enumerate
Remove any possible spurs (signal appearing at same
\begin_inset Formula $i$
\end_inset
for all
\begin_inset Formula $j$
\end_inset
).
\end_layout
\begin_layout Enumerate
Apply Algorithm 1, the FT algorithm.
\end_layout
\begin_layout Enumerate
Optional: if FT decoding was unsuccessful apply Algorithm 2, hinted decoding.
\end_layout
\begin_layout Enumerate
Display decoded message for User B.
\end_layout
\end_inset
\begin_inset ERT
status open
\begin_layout Plain Layout
\backslash
newpage
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\begin_inset CommandInset nomencl_print
LatexCommand printnomenclature
set_width "auto"
\end_inset
\end_layout
\end_body
\end_document