WSJT-X/lib/sfrsd2/sfrsd_paper/sfrsd.lyx

#LyX 2.1 created this file. For more info see http://www.lyx.org/
\lyxformat 474
\begin_document
\begin_header
\textclass paper
\use_default_options true
\maintain_unincluded_children false
\language english
\language_package default
\inputencoding auto
\fontencoding global
\font_roman default
\font_sans default
\font_typewriter default
\font_math auto
\font_default_family default
\use_non_tex_fonts false
\font_sc false
\font_osf false
\font_sf_scale 100
\font_tt_scale 100
\graphics default
\default_output_format default
\output_sync 0
\bibtex_command default
\index_command default
\paperfontsize default
\spacing single
\use_hyperref false
\papersize default
\use_geometry true
\use_package amsmath 1
\use_package amssymb 1
\use_package cancel 1
\use_package esint 1
\use_package mathdots 1
\use_package mathtools 1
\use_package mhchem 1
\use_package stackrel 1
\use_package stmaryrd 1
\use_package undertilde 1
\cite_engine basic
\cite_engine_type default
\biblio_style plain
\use_bibtopic false
\use_indices false
\paperorientation portrait
\suppress_date false
\justification true
\use_refstyle 1
\index Index
\shortcut idx
\color #008000
\end_index
\leftmargin 1in
\topmargin 1in
\rightmargin 1in
\bottommargin 1in
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\paragraph_indentation default
\quotes_language english
\papercolumns 1
\papersides 1
\paperpagestyle default
\tracking_changes false
\output_changes false
\html_math_output 0
\html_css_as_file 0
\html_be_strict false
\end_header

\begin_body

\begin_layout Title
A stochastic successive erasures soft-decision decoder for the JT65 (63,12)
 Reed-Solomon code
\end_layout

\begin_layout Author
Steven J.
 Franke, K9AN and Joseph H.
 Taylor, K1JT
\end_layout

\begin_layout Abstract
The JT65 mode has revolutionized amateur-radio weak-signal communication
 by enabling amateur radio operators with small antennas and relatively
 low-power transmitters to communicate over propagation paths that could
 not be utilized using traditional technologies.
 One reason for the success and popularity of the JT65 mode is its use of
 strong error-correction coding.
 The JT65 code is a short block-length, low-rate, Reed-Solomon code based
 on a 64-symbol alphabet.
 Since 200?, decoders for the JT65 code have used the 
\begin_inset Quotes eld
\end_inset

Koetter-Vardy
\begin_inset Quotes erd
\end_inset

 (KV) algebraic soft-decision decoder.
 The KV decoder is implemented in a closed-source program that is licensed
 to K1JT for use in amateur applications.
 This note describes a new open-source alternative to the KV decoder called
 the SFRSD decoder.
 The SFRSD decoding algorithm is shown to perform at least as well as the
 KV decoder.
 The SFRSD algorithm is conceptually simple and is built around the well-known
 Berlekamp-Massey errors-and-erasures decoder.
 
\end_layout

\begin_layout Section
Introduction
\end_layout

\begin_layout Standard
JT65 message frames consist of a short, compressed, message that is encoded
 for transmission using a Reed-Solomon code.
 Reed-Solomon codes are block codes and, like all block codes, are characterized
 by the length of their codewords, 
\begin_inset Formula $n$
\end_inset

, the number of message symbols conveyed by the codeword, 
\begin_inset Formula $k$
\end_inset

, and the number of possible values for each symbol in the codewords.
 The codeword length and the number of message symbols are specified as
 a tuple in the form 
\begin_inset Formula $(n,k)$
\end_inset

.
 JT65 uses a (63,12) Reed-Solomon code with 64 possible values for each
 symbol, so each symbol represents 
\begin_inset Formula $\log_{2}64=6$
\end_inset

 message bits.
 The source-encoded messages conveyed by a 63-symbol JT65 frame consist
 of 72 bits.
 The JT65 code is systematic, which means that the 12 message symbols are
 embedded in the codeword without modification and another 51 parity symbols
 derived from the message symbols are added to form the codeword consisting
 of 63 total symbols.
 
\end_layout

\begin_layout Standard
The concept of Hamming distance is used as a measure of 
\begin_inset Quotes eld
\end_inset

distance
\begin_inset Quotes erd
\end_inset

 between different codewords, or between a received word and a codeword.
 Hamming distance is the number of code symbols that differ in the two words
 that are being compared.
 Reed-Solomon codes have minimum Hamming distance 
\begin_inset Formula $d$
\end_inset

, where 
\begin_inset Formula 
\begin{equation}
d=n-k+1.\label{eq:minimum_distance}
\end{equation}

\end_inset

The minimum Hamming distance of the JT65 code is 
\begin_inset Formula $d=52$
\end_inset

, which means that any particular codeword differs from all other codewords
 in at least 52 positions.
 
\end_layout

\begin_layout Standard
Given only a received word containing some incorrect symbols (errors), the
 received word can be decoded into the correct codeword using a deterministic,
 algebraic, algorithm provided that no more than 
\begin_inset Formula $t$
\end_inset

 symbols were received incorrectly, where
\begin_inset Formula 
\begin{equation}
t=\left\lfloor \frac{n-k}{2}\right\rfloor .\label{eq:t}
\end{equation}

\end_inset

For the JT65 code, 
\begin_inset Formula $t=25$
\end_inset

, which means that it is always possible to efficiently decode a received
 word that contains no more than 25 symbol errors.
 
\end_layout

\begin_layout Standard
There are a number of well-known algebraic algorithms that can carry out
 the process of decoding a received codeword that contains no more than
 
\begin_inset Formula $t$
\end_inset

 errors.
 One such algorithm is the Berlekamp-Massey (BM) decoding algorithm.
\end_layout

\begin_layout Standard
A decoder, such as BM, must carry out two tasks: 
\end_layout

\begin_layout Enumerate
determine which symbols were received incorrectly 
\end_layout

\begin_layout Enumerate
determine the correct value of the incorrect symbols 
\end_layout

\begin_layout Standard
If it is somehow known that certain symbols are incorrect, such information
 can be used in the decoding algorithm to reduce the amount of work required
 in step 1 and to allow step 2 to correct more than 
\begin_inset Formula $t$
\end_inset

 errors.
 In fact, in the unlikely event that the location of each and every error
 is known and is provided to the BM decoder, and if no correct symbols are
 accidentally labeled as errors, then the BM decoder can correct up to 
\begin_inset Formula $d$
\end_inset

 errors! 
\end_layout

\begin_layout Standard
In the decoding algorithm described herein, a list of symbols that are known
 or suspected to be incorrect is sent to the BM decoder.
 Symbols in the received word that are flagged as being incorrect are called
 
\begin_inset Quotes eld
\end_inset

erasures
\begin_inset Quotes erd
\end_inset

.
 Symbols that are not erased and that are incorrect will be called 
\begin_inset Quotes eld
\end_inset

errors
\begin_inset Quotes erd
\end_inset

.
 The BM decoder accepts erasure information in the form of a list of indices
 corresponding to the incorrect, or suspected incorrect, symbols in the
 received word.
 As already noted, if the erasure information is perfect, then up to 51
 errors will be corrected.
 When the erasure information is imperfect, then some of the erased symbols
 will actually be correct, and some of the unerased symbols will be in error.
 If a total of 
\begin_inset Formula $n_{e}$
\end_inset

 symbols are erased and the remaining unerased symbols contain 
\begin_inset Formula $x$
\end_inset

 errors, then the BM algorithm can find the correct codeword as long as
 
\begin_inset Formula 
\begin{equation}
n_{e}+2x\le d-1\label{eq:erasures_and_errors}
\end{equation}

\end_inset

If 
\begin_inset Formula $n_{e}=0$
\end_inset

, then the decoder is said to be an 
\begin_inset Quotes eld
\end_inset

errors-only
\begin_inset Quotes erd
\end_inset

 decoder and it can correct up to 
\begin_inset Formula $t$
\end_inset

 errors (
\begin_inset Formula $t$
\end_inset

=25 for JT65).
 If 
\begin_inset Formula $0<n_{e}\le d-1$
\end_inset

 (
\begin_inset Formula $d-1=51$
\end_inset

 for JT65), then the decoder is said to be an 
\begin_inset Quotes eld
\end_inset

errors-and-erasures
\begin_inset Quotes erd
\end_inset

 decoder.
 
\end_layout

\begin_layout Standard
For the JT65 code, (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:erasures_and_errors"

\end_inset

) says that if 
\begin_inset Formula $n_{e}$
\end_inset

 symbols are declared to be erased, then the BM decoder will find the correct
 codeword as long as the remaining un-erased symbols contain no more than
 
\begin_inset Formula $\left\lfloor \frac{51-n_{e}}{2}\right\rfloor $
\end_inset

 errors.
 The errors-and-erasures capability of the BM decoder is a very powerful
 feature that serves as the core of the new soft-decision decoder described
 herein.
 
\end_layout

\begin_layout Standard
It will be helpful to have some understanding of the errors and erasures
 tradeoff described by (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:erasures_and_errors"

\end_inset

) to appreciate how the new decoder algorithm works.
 Section NN describes some examples that illustrate ho w the errors-and-erasures
 capability can be combined with some information about the quality of the
 received symbols to enable a decoding algorithm to reliably decode received
 words that contain many more than 25 errors.
 Section NN describes the SFRSD decoding algorithm.
\end_layout

\begin_layout Section
\begin_inset CommandInset label
LatexCommand label
name "sec:You've-got-to"

\end_inset

You've got to ask yourself.
 Do I feel lucky?
\end_layout

\begin_layout Standard
Consider a particular received codeword that contains 40 incorrect symbols
 and 23 correct symbols.
 It is not known which 40 symbols are in error
\begin_inset Foot
status open

\begin_layout Plain Layout
In practice the number of errors will not be known either, but this is not
 a serious problem.
\end_layout

\end_inset

.
 Suppose that the decoder randomly chooses 40 symbols to erase (
\begin_inset Formula $n_{e}=40$
\end_inset

), leaving 23 unerased symbols.
 According to (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:erasures_and_errors"

\end_inset

), the BM decoder can successfully decode this word as long as the number
 of errors, 
\begin_inset Formula $x$
\end_inset

, present in the 23 unerased symbols is 5 or less.
 This means that the number of errors captured in the set of 40 erased symbols
 must be at least 35.
 
\end_layout

\begin_layout Standard
The probability of selecting some particular number of bad symbols in a
 randomly selected subset of the codeword symbols is governed by the hypergeomet
ric probability distribution.
\end_layout

\begin_layout Standard
Define:
\end_layout

\begin_layout Itemize
\begin_inset Formula $n$
\end_inset

= number of symbols in a codeword (63 for JT65),
\end_layout

\begin_layout Itemize
\begin_inset Formula $X$
\end_inset

= number of incorrect symbols in a codeword,
\end_layout

\begin_layout Itemize
\begin_inset Formula $n_{e}$
\end_inset

= number of symbols erased for errors-and-erasures decoding,
\end_layout

\begin_layout Itemize
\begin_inset Formula $x$
\end_inset

= number of incorrect symbols in the set of erased symbols.
\end_layout

\begin_layout Standard
In an ensemble of received words, 
\begin_inset Formula $X$
\end_inset

 and 
\begin_inset Formula $x$
\end_inset

 will be random variables.
 Let 
\begin_inset Formula $P(x|(X,n_{e}))$
\end_inset

 denote the conditional probability mass function for the number of incorrect
 symbols, 
\begin_inset Formula $x$
\end_inset

, given that the number of incorrect symbols in the codeword is X and the
 number of erased symbols is 
\begin_inset Formula $n_{e}$
\end_inset

.
 Then
\end_layout

\begin_layout Standard
\begin_inset Formula 
\begin{equation}
P(x|(X,n_{e}))=\frac{\binom{X}{x}\binom{n-X}{n_{e}-x}}{\binom{n}{n_{e}}}\label{eq:hypergeometric_pdf}
\end{equation}

\end_inset

where 
\begin_inset Formula $\binom{n}{k}=\frac{n!}{k!(n-k)!}$
\end_inset

 is the binomial coefficient.
 The binomial coefficient can be calculated using the 
\begin_inset Quotes eld
\end_inset

nchoosek(
\begin_inset Formula $n,k$
\end_inset

)
\begin_inset Quotes erd
\end_inset

 function in Gnu Octave.
 The hypergeometric probability mass function defined in (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:hypergeometric_pdf"

\end_inset

) is available in Gnu Octave as function 
\begin_inset Quotes eld
\end_inset

hygepdf(
\begin_inset Formula $x,n,X,n_{e}$
\end_inset

)
\begin_inset Quotes erd
\end_inset

.
 
\end_layout

\begin_layout Paragraph
Case 1
\end_layout

\begin_layout Case
A codeword contains 
\begin_inset Formula $X=40$
\end_inset

 incorrect symbols.
 In an attempt to decode using an errors-and-erasures decoder, 
\begin_inset Formula $n_{e}=40$
\end_inset

 symbols are randomly selected for erasure.
 The probability that 
\begin_inset Formula $35$
\end_inset

 of the erased symbols are incorrect is:
\begin_inset Formula 
\[
P(x=35)=\frac{\binom{40}{35}\binom{63-40}{40-35}}{\binom{63}{40}}=2.356\times10^{-7}.
\]

\end_inset

Similarly:
\begin_inset Formula 
\[
P(x=36)=8.610\times10^{-9}.
\]

\end_inset

Since the probability of catching 36 errors is so much smaller than the
 probability of catching 35 errors, it is safe to say that the probability
 of randomly selecting an erasure vector that can decode the received word
 is essentially equal to 
\begin_inset Formula $P(X=35)\simeq2.4\times10^{-7}$
\end_inset

.
 The odds of successfully decoding the word on the first try are about 1
 in 4 million.
\end_layout

\begin_layout Paragraph
Case 2
\end_layout

\begin_layout Case
It is interesting to work out the best choice for the number of symbols
 that should be selected at random for erasure if the goal is to maximize
 the probability of successfully decoding the word.
 By exhaustive search, it turns out that if 
\begin_inset Formula $X=40$
\end_inset

, then the best strategy is to erase 
\begin_inset Formula $n=45$
\end_inset

 symbols, in which case the word will be decoded if the set of erased symbols
 contains at least 37 errors.
 With 
\begin_inset Formula $n=63$
\end_inset

, 
\begin_inset Formula $X=40$
\end_inset

, 
\begin_inset Formula $n_{e}=45$
\end_inset

, then 
\begin_inset Formula 
\[
P(x\ge37)\simeq2\times10^{-6}.
\]

\end_inset

This probability is about 8 times higher than the probability of success
 when only 
\begin_inset Formula $40$
\end_inset

 symbols were erased, and the odds of successfully decoding on the first
 try are roughly 1 in 500,000.
 
\end_layout

\begin_layout Paragraph
Case 3
\end_layout

\begin_layout Case
Cases 1 and 2 illustrate the fact that a strategy that tries to guess which
 symbols to erase is not going to be very successful unless we are prepared
 to wait all day for an answer.
 Consider a slight modification to the strategy that can tip the odds in
 our favor.
 Suppose that the codeword contains 
\begin_inset Formula $X=40$
\end_inset

 incorrect symbols, as before.
 In this case it is known that 10 of the symbols are much more reliable
 than the other 53 symbols.
 The 10 most reliable symbols are all correct and these 10 symbols are protected
 from erasure, i.e.
 the set of erasures is chosen from the smaller set of 53 less reliable
 symbols.
 If 
\begin_inset Formula $n_{e}=45$
\end_inset

 symbols are chosen randomly from the set of 
\begin_inset Formula $n=53$
\end_inset

 least reliable symbols, it is still necessary for the erased symbols to
 include at least 37 errors (as in Case 2).
 In this case, with 
\begin_inset Formula $n=53$
\end_inset

, 
\begin_inset Formula $X=40$
\end_inset

, 
\begin_inset Formula $n_{e}=45$
\end_inset

, 
\begin_inset Formula $P(x\ge37)=0.016$
\end_inset

! Now, the situation is much better.
 The odds of decoding the word on the first try are approximately 1 in 62.5!
 
\end_layout

\begin_layout Standard
Even better odds are obtained with 
\begin_inset Formula $n_{e}=47$
\end_inset

 which requires 
\begin_inset Formula $x\ge38$
\end_inset

.
 With 
\begin_inset Formula $n=53$
\end_inset

, 
\begin_inset Formula $X=40$
\end_inset

, 
\begin_inset Formula $n_{e}=47$
\end_inset

, 
\begin_inset Formula $P(x\ge38)=0.0266$
\end_inset

, which makes the odds the best so far; about 1 in 38.
 
\end_layout

\begin_layout Section
\begin_inset CommandInset label
LatexCommand label
name "sec:The-decoding-algorithm"

\end_inset

The SFRSD decoding algorithm
\end_layout

\begin_layout Standard
Case 3 illustrates how, with the addition of some reliable information about
 the quality of just 10 of the 63 symbols, it is possible to devise an algorithm
 that can decode received words containing a relatively large number of
 errors using only the BM errors-and-erasures decoder.
 The key to improving the odds enough to make the strategy of 
\begin_inset Quotes eld
\end_inset

guessing
\begin_inset Quotes erd
\end_inset

 at the erasure vector useful for practical implementation is to use information
 about the quality of the received symbols to decide which ones are most
 likely to be in error.
 In practice, because the number of errors in the received word is unknown,
 rather than erase a fixed number of symbols, it is better use a stochastic
 algorithm which assigns a relatively high probability of erasure to the
 lowest quality symbols and a relatively low probability of erasure to the
 highest quality symbols.
 As illustrated by case 3, a good choice of the erasure probabilities can
 increase the probability of a successful decode by many orders of magnitude
 relative to a bad choice.
\end_layout

\begin_layout Standard
The SFRSD algorithm uses two quality indices available from the JT65 noncoherent
 64-FSK demodulator to assign a variable probability of erasure to each
 received symbol.
 The demodulator identifies the most likely received symbol based on which
 of 64 frequency bins contains the the largest signal plus noise power.
 The percentage of the total signal plus noise power in the two bins containing
 the largest and second largest powers (denoted by, 
\begin_inset Formula $p_{1}$
\end_inset

 and 
\begin_inset Formula $p_{2}$
\end_inset

, respectively) are passed to the decoder from the demodulator as 
\begin_inset Quotes eld
\end_inset

soft-symbol
\begin_inset Quotes erd
\end_inset

 information.
 The decoder derives two metrics from 
\begin_inset Formula $\{p_{1},p_{2}\}:$
\end_inset


\end_layout

\begin_layout Itemize
\begin_inset Formula $p_{1}$
\end_inset

-rank: the rank 
\begin_inset Formula $\{1,2,\ldots,63\}$
\end_inset

 of the symbol's power percentage, 
\begin_inset Formula $p_{1}$
\end_inset

 in the sorted list of 
\begin_inset Formula $p_{1}$
\end_inset

 values.
 High ranking symbols have larger signal to noise ratio than lower ranked
 symbols.
 
\end_layout

\begin_layout Itemize
\begin_inset Formula $p_{2}/p_{1}$
\end_inset

: when 
\begin_inset Formula $p_{2}/p_{1}$
\end_inset

 is not small compared to 1, the most likely symbol is not much better than
 the second most likely symbol
\end_layout

\begin_layout Standard
The decoder has a built-in table of symbol error probabilities derived from
 a large dataset of received words that have been successfully decoded.
 The table provides an estimate of the 
\emph on
a-priori
\emph default
 probability of symbol error that is expected based on a given symbol's
 
\begin_inset Formula $p_{1}$
\end_inset

-rank and 
\begin_inset Formula $p_{2}/p_{1}$
\end_inset

 metrics.
 These 
\emph on
a-priori
\emph default
 symbol error probabilities will be close to 1 for lower-quality symbols
 and closer to 0 for high-quality symbols.
 Recall, from Cases 2 and 3, that the best performance was obtained when
 
\begin_inset Formula $n_{e}>X$
\end_inset

.
 Correspondingly, the SFRSD algorithm works best when the probability of
 erasing a symbol is somewhat larger than the probability that the symbol
 is incorrect.
 Empirically, it was determined that good performance of the SFRSD algorithm
 is obtained when the symbol erasure probability is a factor of 
\begin_inset Formula $1.3$
\end_inset

 larger than the symbol error probability.
\end_layout

\begin_layout Standard
The SFRSD algorithm successively tries to decode the received word using
 educated guesses at the symbols that should be erased.
 In each iteration, an independent stochastic erasure vector is generated
 based on the symbol erasure probabilities.
 The guessed erasure vector is provided to the BM decoder along with the
 received word.
 If the BM decoder finds a candidate codeword, then the codeword is assigned
 a quality metric, defined to be the soft distance, 
\begin_inset Formula $d_{s}$
\end_inset

, between the received word and the codeword, where
\begin_inset Formula 
\begin{equation}
d_{s}=\sum_{i=1}^{n}(1+p_{1,i})\alpha_{i}.\label{eq:soft_distance}
\end{equation}

\end_inset

and 
\begin_inset Formula $p_{1,i}$
\end_inset

 is the fractional power associated with the i'th received symbol and 
\begin_inset Formula $\alpha_{i}=0$
\end_inset

 if the i'th received symbol is the same as the corresponding symbol in
 the codeword, and 
\begin_inset Formula $\alpha_{i}=1$
\end_inset

 if the i'th symbol in the received word and the codeword are different.
 This soft distance can be written as two terms, the first of which is just
 the Hamming distance between the received word and the codeword.
 The second term ensures that if two candidate codewords have the same Hamming
 distance from the received word, a smaller distance will be assigned to
 the one where the different symbols occurred in lower quality symbols.
 
\end_layout

\begin_layout Standard
Technically, the algorithm is a list-decoder, potentially generating a list
 of candidate codewords.
 Among the list of candidate codewords found by this stochastic search algorithm
, only the one with the smallest soft-distance from the received word is
 kept.
 As with all such algorithms, a stopping criterion is necessary.
 SFRSD accepts a codeword unconditionally if its soft distance is smaller
 than an empirically determined acceptance threshold, 
\begin_inset Formula $d_{a}$
\end_inset

.
 A timeout is employed to limit the execution time of the algorithm in cases
 where no codewords within soft distance 
\begin_inset Formula $d_{a}$
\end_inset

 of the received word are found in a reasonable number of trials.
\end_layout

\begin_layout Paragraph
Algorithm
\end_layout

\begin_layout Enumerate
For each symbol in the received word, define the erasure probability to
 be 1.3 times the a priori symbol-error probability determined by the soft-symbol
 information 
\begin_inset Formula $\{p_{1}\textrm{-rank},p_{2}/p_{1}\}$
\end_inset

.
 
\end_layout

\begin_layout Enumerate
Make independent decisions about whether or not to erase each symbol in
 the word using the symbol's erasure probability.
 Allow a total of up to 51 symbols to be erased.
 
\end_layout

\begin_layout Enumerate
Attempt BM errors-and-erasures decoding with the set of erased symbols that
 was determined in step 2.
 If the BM decoder is successful go to step 5.
\end_layout

\begin_layout Enumerate
If decoding is not successful, go to step 2.
\end_layout

\begin_layout Enumerate
Calculate the soft distance, 
\begin_inset Formula $d_{s}$
\end_inset

, between the candidate codeword and the received word.
 Set 
\begin_inset Formula $d_{s,min}=d_{s}$
\end_inset

 if the soft distance is the smallest one encountered so far.
\end_layout

\begin_layout Enumerate
If 
\begin_inset Formula $d_{s,min}\le d_{a}$
\end_inset

, go to 8.
 
\end_layout

\begin_layout Enumerate
If the number of trials is less than the maximum allowed number, go to 2.
 Otherwise, declare decoding failure and exit.
\end_layout

\begin_layout Enumerate
A codeword with 
\begin_inset Formula $d_{s}\le d_{a}$
\end_inset

 has been found.
 Declare a successful decode.
 Return the best codeword found so far.
\end_layout

\begin_layout Section
Results
\end_layout

\begin_layout Section
Summary 
\end_layout

\begin_layout Bibliography
\begin_inset CommandInset bibitem
LatexCommand bibitem
key "key-1"

\end_inset


\end_layout

\end_body
\end_document