#LyX 2.1 created this file. For more info see http://www.lyx.org/
\lyxformat 474
\begin_document
\begin_header
\textclass IEEEtran
\use_default_options true
\maintain_unincluded_children false
\language english
\language_package default
\inputencoding auto
\fontencoding global
\font_roman default
\font_sans default
\font_typewriter default
\font_math auto
\font_default_family default
\use_non_tex_fonts false
\font_sc false
\font_osf false
\font_sf_scale 100
\font_tt_scale 100
\graphics default
\default_output_format default
\output_sync 0
\bibtex_command default
\index_command default
\paperfontsize default
\spacing single
\use_hyperref false
\papersize default
\use_geometry false
\use_package amsmath 1
\use_package amssymb 1
\use_package cancel 1
\use_package esint 1
\use_package mathdots 1
\use_package mathtools 1
\use_package mhchem 1
\use_package stackrel 1
\use_package stmaryrd 1
\use_package undertilde 1
\cite_engine basic
\cite_engine_type default
\biblio_style plain
\use_bibtopic false
\use_indices false
\paperorientation portrait
\suppress_date false
\justification true
\use_refstyle 1
\index Index
\shortcut idx
\color #008000
\end_index
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\paragraph_indentation default
\quotes_language english
\papercolumns 1
\papersides 1
\paperpagestyle default
\tracking_changes false
\output_changes false
\html_math_output 0
\html_css_as_file 0
\html_be_strict false
\end_header
\begin_body
\begin_layout Title
A stochastic successive erasures soft-decision decoder for the JT65 (63,12)
Reed-Solomon code
\end_layout
\begin_layout Author
Steven J.
Franke, K9AN and Joseph H.
Taylor, K1JT
\end_layout
\begin_layout Abstract
The JT65 mode has revolutionized amateur-radio weak-signal communication
by enabling amateur radio operators with small antennas and relatively
low-power transmitters to communicate over propagation paths that could
not be utilized using traditional technologies.
One reason for the success and popularity of the JT65 mode is its use of
strong error-correction coding.
The JT65 code is a short-block-length, low-rate Reed-Solomon code based
on a 64-symbol alphabet.
Since 200?, decoders for the JT65 code have used the
\begin_inset Quotes eld
\end_inset
Koetter-Vardy
\begin_inset Quotes erd
\end_inset
(KV) algebraic soft-decision decoder.
The KV decoder is implemented in a closed-source program that is licensed
to K1JT for use in amateur applications.
This note describes a new open-source alternative to the KV decoder called
the SFRSD decoder.
The SFRSD decoding algorithm is shown to perform at least as well as the
KV decoder.
The SFRSD algorithm is conceptually simple and is built around the well-known
Berlekamp-Massey errors-and-erasures decoder.
\end_layout
\begin_layout Standard
JT65 message frames consist of a short, compressed message that is encoded
for transmission using a Reed-Solomon code.
Reed-Solomon codes are block codes and, like all block codes, are characterized
by the length of their codewords,
\begin_inset Formula $n$
\end_inset
, the number of message symbols conveyed by the codeword,
\begin_inset Formula $k$
\end_inset
, and the number of possible values for each symbol in the codewords.
The codeword length and the number of message symbols are specified as
a tuple in the form
\begin_inset Formula $(n,k)$
\end_inset
.
JT65 uses a (63,12) Reed-Solomon code with 64 possible values for each
symbol, so each symbol represents
\begin_inset Formula $\log_{2}64=6$
\end_inset
message bits.
The source-encoded messages conveyed by a 63-symbol JT65 frame consist
of 72 bits, carried by the 12 message symbols of 6 bits each.
The JT65 code is systematic, which means that the 12 message symbols are
embedded in the codeword without modification and another 51 parity symbols
derived from the message symbols are added to form the codeword consisting
of 63 total symbols.
\end_layout
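\begin_layout Standard
As a quick check of this arithmetic, the short Python sketch below recomputes
the number of bits per symbol, the number of message bits, and the number
of parity symbols from the code parameters given above.
Python is used here only as a calculator, and the variable names are our
own.
\end_layout
\begin_layout LyX-Code
```python
import math

n = 63   # codeword length (symbols)
k = 12   # message symbols per codeword
q = 64   # number of possible values per symbol

bits_per_symbol = int(math.log2(q))   # log2(64) = 6
message_bits = k * bits_per_symbol    # 12 symbols x 6 bits = 72 bits
parity_symbols = n - k                # 51 parity symbols

print(bits_per_symbol, message_bits, parity_symbols)   # prints: 6 72 51
```
\end_layout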
\begin_layout Standard
The concept of Hamming distance is used as a measure of
\begin_inset Quotes eld
\end_inset
distance
\begin_inset Quotes erd
\end_inset
between different codewords, or between a received word and a codeword.
Hamming distance is the number of code symbols that differ in the two words
that are being compared.
Reed-Solomon codes have minimum Hamming distance
\begin_inset Formula $d$
\end_inset
, where
\begin_inset Formula
\begin{equation}
d=n-k+1.\label{eq:minimum_distance}
\end{equation}
\end_inset
The minimum Hamming distance of the JT65 code is
\begin_inset Formula $d=52$
\end_inset
, which means that any particular codeword differs from all other codewords
in at least 52 positions.
\end_layout
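\begin_layout Standard
The definition of Hamming distance translates directly into code.
The sketch below uses a hypothetical helper, hamming_distance, which is
not taken from the JT65 sources; it counts the positions in which two equal-length
words differ and then evaluates the minimum-distance formula for the JT65
code.
\end_layout
\begin_layout LyX-Code
```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


n, k = 63, 12
d = n - k + 1   # minimum Hamming distance of an (n,k) Reed-Solomon code

# Two toy 6-symbol words over {0,...,63}; they differ at indices 1 and 3
print(hamming_distance([5, 17, 63, 0, 2, 9], [5, 18, 63, 1, 2, 9]))  # prints: 2
print(d)  # prints: 52
```
\end_layout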
\begin_layout Standard
Given only a received word containing some incorrect symbols (errors), the
received word can be decoded into the correct codeword using a deterministic
algebraic algorithm, provided that no more than
\begin_inset Formula $t$
\end_inset
symbols were received incorrectly, where
\begin_inset Formula
\begin{equation}
t=\left\lfloor \frac{n-k}{2}\right\rfloor .\label{eq:t}
\end{equation}
\end_inset
For the JT65 code,
\begin_inset Formula $t=25$
\end_inset
, which means that it is always possible to efficiently decode a received
word that contains no more than 25 symbol errors.
\end_layout
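\begin_layout Standard
The floor operation in the expression for
\begin_inset Formula $t$
\end_inset
 is ordinary integer floor division.
The Python lines below, used only as a calculator, confirm that
\begin_inset Formula $t=25$
\end_inset
 for JT65 and that, because
\begin_inset Formula $d-1=n-k$
\end_inset
 for a Reed-Solomon code,
\begin_inset Formula $t$
\end_inset
 also equals
\begin_inset Formula $\left\lfloor (d-1)/2\right\rfloor $
\end_inset
.
\end_layout
\begin_layout LyX-Code
```python
n, k = 63, 12
d = n - k + 1        # 52
t = (n - k) // 2     # floor((n - k) / 2)

# For Reed-Solomon codes d - 1 = n - k, so t is also floor((d - 1) / 2)
assert t == (d - 1) // 2
print(t)  # prints: 25
```
\end_layout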
\begin_layout Standard
There are a number of well-known algebraic algorithms that can decode
a received word that contains no more than
\begin_inset Formula $t$
\end_inset
errors.
One such algorithm is the Berlekamp-Massey (BM) decoding algorithm.
\end_layout
\begin_layout Standard
A decoder, such as BM, must carry out two tasks:
\end_layout
\begin_layout Enumerate
figure out which symbols were received incorrectly
\end_layout
\begin_layout Enumerate
figure out the correct value of the incorrect symbols
\end_layout
\begin_layout Standard
If it is somehow known that certain symbols are incorrect, such information
can be used in the decoding algorithm to reduce the amount of work required
in step 1 and to allow step 2 to correct more than
\begin_inset Formula $t$
\end_inset
errors.
In fact, in the unlikely event that the location of each and every error
is known and is provided to the BM decoder, and if no correct symbols are
accidentally labeled as errors, then the BM decoder can correct up to
\begin_inset Formula $d-1$
\end_inset
 errors (51 for JT65)!
\end_layout
\begin_layout Standard
In the decoding algorithm described herein, a list of symbols that are known
or suspected to be incorrect is sent to the BM decoder.
Symbols in the received word that are flagged as being incorrect are called
\begin_inset Quotes eld
\end_inset
erasures
\begin_inset Quotes erd
\end_inset
.
Symbols that are not erased and that are incorrect will be called
\begin_inset Quotes eld
\end_inset
errors
\begin_inset Quotes erd
\end_inset
.
The BM decoder accepts erasure information in the form of a list of indices
corresponding to the incorrect, or suspected incorrect, symbols in the
received word.
As already noted, if the erasure information is perfect, then up to 51
errors will be corrected.
When the erasure information is imperfect, then some of the erased symbols
will actually be correct, and some of the unerased symbols will be in error.
If a total of
\begin_inset Formula $n_{era}$
\end_inset
symbols are erased and the remaining unerased symbols contain
\begin_inset Formula $n_{err}$
\end_inset
errors, then the BM algorithm can find the correct codeword as long as
\begin_inset Formula
\begin{equation}
n_{era}+2n_{err}\le d-1\label{eq:erasures_and_errors}
\end{equation}
\end_inset
If
\begin_inset Formula $n_{era}=0$
\end_inset
, then the decoder is said to be an
\begin_inset Quotes eld
\end_inset
errors-only
\begin_inset Quotes erd
\end_inset
decoder and it can correct up to
\begin_inset Formula $t$
\end_inset
errors (
\begin_inset Formula $t=25$
\end_inset
 for JT65).
If
\begin_inset Formula $0<n_{era}\le d-1$
\end_inset
(
\begin_inset Formula $d-1=51$
\end_inset
for JT65), then the decoder is said to be an
\begin_inset Quotes eld
\end_inset
errors-and-erasures
\begin_inset Quotes erd
\end_inset
decoder.
\end_layout
\begin_layout Standard
For the JT65 code, (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:erasures_and_errors"
\end_inset
) says that if
\begin_inset Formula $n_{era}$
\end_inset
symbols are declared to be erased, then the BM decoder will find the correct
codeword as long as the remaining un-erased symbols contain no more than
\begin_inset Formula $\left\lfloor \frac{51-n_{era}}{2}\right\rfloor $
\end_inset
errors.
The errors-and-erasures capability of the BM decoder is a very powerful
feature that serves as the core of the new soft-decision decoder described
herein.
\end_layout
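\begin_layout Standard
The errors-and-erasures bound, and the error budget it leaves for the unerased
symbols, can be expressed as two small functions.
This is a sketch of the arithmetic only: the function names are hypothetical,
and nothing here implements the BM decoder itself.
\end_layout
\begin_layout LyX-Code
```python
D = 52  # minimum distance d of the JT65 (63,12) code


def within_bound(n_era, n_err, d=D):
    """True if n_era erasures plus n_err errors among the unerased symbols
    satisfy n_era + 2*n_err <= d - 1, so BM decoding is guaranteed."""
    return n_era + 2 * n_err <= d - 1


def max_unerased_errors(n_era, d=D):
    """Most errors the unerased symbols may contain when n_era symbols
    are erased: floor((d - 1 - n_era) / 2)."""
    if not 0 <= n_era <= d - 1:
        raise ValueError("n_era must lie between 0 and d - 1")
    return (d - 1 - n_era) // 2


for n_era in (0, 10, 40, 51):
    print(n_era, max_unerased_errors(n_era))
```
\end_layout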
\begin_layout Standard
It will be helpful to have some understanding of the errors and erasures
tradeoff described by (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:erasures_and_errors"
\end_inset
) to appreciate how the new decoder algorithm works.
Section
\begin_inset CommandInset ref
LatexCommand ref
reference "sec:Errors-and-erasures-decoding-exa"
\end_inset
describes some examples that should illustrate how the errors-and-erasures
capability can be combined with some information about the quality of the
received symbols to enable development of a decoding algorithm that can
reliably decode received words that contain many more than 25 errors.
The section after that describes the SFRSD decoding algorithm.
\end_layout
\begin_layout Section
\begin_inset CommandInset label
LatexCommand label
name "sec:Errors-and-erasures-decoding-exa"
\end_inset
You've got to ask yourself.
Do I feel lucky?
\end_layout
\begin_layout Standard
Consider a particular received word that contains 40 incorrect symbols
and 23 correct symbols.
It is not known which 40 symbols are in error.
\begin_inset Foot
status open
\begin_layout Plain Layout
In practice the number of errors will not be known either, but this is not
a serious problem.
\end_layout
\end_inset
Suppose that the decoder randomly chooses 40 symbols to erase (
\begin_inset Formula $n_{era}=40$
\end_inset
), leaving 23 unerased symbols.
According to (
\begin_inset CommandInset ref
LatexCommand ref
reference "eq:erasures_and_errors"
\end_inset
), the BM decoder can successfully decode this word as long as the number
of errors present in the 23 unerased symbols is 5 or less.
This means that the number of errors captured in the set of 40 erased symbols
must be at least 35.
\end_layout
\begin_layout Standard
The probability of selecting some particular number of bad symbols in a
randomly selected subset of the codeword symbols is governed by the hypergeometric
probability distribution.
\end_layout
\begin_layout Standard
Define:
\end_layout
\begin_layout Itemize
\begin_inset Formula $N$
\end_inset
= number of symbols in a codeword (63 for JT65),
\end_layout
\begin_layout Itemize
\begin_inset Formula $K$
\end_inset
= number of incorrect symbols in a codeword,
\end_layout
\begin_layout Itemize
\begin_inset Formula $n$
\end_inset
= number of symbols erased for errors-and-erasures decoding,
\end_layout
\begin_layout Itemize
\begin_inset Formula $k$
\end_inset
= number of incorrect symbols in the set of erased symbols.
\end_layout
\begin_layout Standard
Let
\begin_inset Formula $X$
\end_inset
be the number of incorrect symbols in a set of
\begin_inset Formula $n$
\end_inset
symbols chosen for erasure.
Then
\begin_inset Formula
\begin{equation}
P(X=k)=\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}\label{eq:hypergeometric_pdf-1}
\end{equation}
\end_inset
where
\begin_inset Formula $\binom{n}{m}=\frac{n!}{m!(n-m)!}$
\end_inset
is the binomial coefficient.
The binomial coefficient can be calculated using the
\begin_inset Quotes eld
\end_inset
nchoosek(n,k)
\begin_inset Quotes erd
\end_inset
function in GNU Octave.
The hypergeometric probability mass function is available in GNU Octave
as function
\begin_inset Quotes eld
\end_inset
hygepdf(k,N,K,n)
\begin_inset Quotes erd
\end_inset
.
\end_layout
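\begin_layout Standard
Python's standard library has no direct counterpart of hygepdf, but math.comb
(available since Python 3.8) computes the binomial coefficient, so the probability
mass function above can be written directly.
The helper name hyge_pmf is our own; its argument order is chosen to mirror
hygepdf(k,N,K,n).
\end_layout
\begin_layout LyX-Code
```python
from math import comb


def hyge_pmf(k, N, K, n):
    """P(X = k): probability that exactly k of n randomly erased symbols
    are incorrect when K of the N symbols in the word are incorrect."""
    if k < 0 or k > n or k > K or n - k > N - K:
        return 0.0   # impossible outcome
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Sanity check: the probabilities over all possible k sum to 1
total = sum(hyge_pmf(k, 63, 40, 40) for k in range(0, 41))
print(total)
```
\end_layout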
\begin_layout Case
A codeword contains
\begin_inset Formula $K=40$
\end_inset
incorrect symbols.
In an attempt to decode using an errors-and-erasures decoder,
\begin_inset Formula $n=40$
\end_inset
symbols are randomly selected for erasure.
The probability that
\begin_inset Formula $35$
\end_inset
of the erased symbols are incorrect is:
\begin_inset Formula
\[
P(X=35)=\frac{\binom{40}{35}\binom{63-40}{40-35}}{\binom{63}{40}}=2.356\times10^{-7}.
\]
\end_inset
Similarly:
\begin_inset Formula
\[
P(X=36)=8.610\times10^{-9}.
\]
\end_inset
Since the probability of catching 36 errors is so much smaller than the
probability of catching 35 errors, it is safe to say that the probability
of randomly selecting an erasure vector that can decode the received word
is essentially equal to
\begin_inset Formula $P(X=35)\simeq2.4\times10^{-7}$
\end_inset
.
The odds of successfully decoding the word on the first try are about 1
in 4 million.
\end_layout
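\begin_layout Standard
The Case 1 numbers can be reproduced with the same binomial-coefficient
formula.
The sketch below uses a hypothetical helper, hyge_pmf, equivalent to hygepdf.
It also sums
\begin_inset Formula $P(X\ge35)$
\end_inset
, the full probability that a random choice of 40 erasures leads to a successful
decode; as noted above, this sum is dominated by the
\begin_inset Formula $X=35$
\end_inset
 term.
\end_layout
\begin_layout LyX-Code
```python
from math import comb


def hyge_pmf(k, N, K, n):
    # Hypergeometric P(X = k); zero for impossible outcomes
    if k < 0 or k > n or k > K or n - k > N - K:
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


N, K, n = 63, 40, 40        # 40 errors, 40 random erasures
p35 = hyge_pmf(35, N, K, n)
p36 = hyge_pmf(36, N, K, n)
# The word decodes if at least 35 of the 40 errors are erased
p_success = sum(hyge_pmf(x, N, K, n) for x in range(35, K + 1))

print(f"P(X=35) = {p35:.3e}")
print(f"P(X=36) = {p36:.3e}")
print(f"P(X>=35) = {p_success:.3e}, about 1 in {1 / p_success:,.0f}")
```
\end_layout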
\begin_layout Case
A codeword contains
\begin_inset Formula $K=40$
\end_inset
incorrect symbols.
It is interesting to work out the best choice for the number of symbols
that should be selected at random for erasure if the goal is to maximize
the probability of successfully decoding the word.
By exhaustive search, it turns out that the best case is to erase
\begin_inset Formula $n=45$
\end_inset
symbols, in which case the word will be decoded if the set of erased symbols
contains at least 37 errors.
With
\begin_inset Formula $N=63$
\end_inset
,
\begin_inset Formula $K=40$
\end_inset
,
\begin_inset Formula $n=45$
\end_inset
, then
\begin_inset Formula
\[
P(X\ge37)\simeq2\times10^{-6}.
\]
\end_inset
This probability is about 8 times higher than the probability of success
when only
\begin_inset Formula $40$
\end_inset
symbols were erased, and the odds of successfully decoding on the first
try are roughly 1 in 500,000.
\end_layout
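\begin_layout Standard
The exhaustive search in Case 2 can be sketched as follows.
For each candidate number of erasures
\begin_inset Formula $n$
\end_inset
, the condition
\begin_inset Formula $n+2(K-X)\le d-1$
\end_inset
 gives the minimum number of errors
\begin_inset Formula $X$
\end_inset
 that the erasures must capture, and the success probability is the hypergeometric
tail above that minimum.
The helper names are hypothetical, and the search assumes that at most
\begin_inset Formula $d-1=51$
\end_inset
 symbols may be erased.
\end_layout
\begin_layout LyX-Code
```python
from math import comb


def hyge_pmf(k, N, K, n):
    # Hypergeometric P(X = k); zero for impossible outcomes
    if k < 0 or k > n or k > K or n - k > N - K:
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def p_decode(N, K, n, d=52):
    """Probability that n random erasures satisfy n + 2*(K - X) <= d - 1."""
    # Smallest X meeting the bound: ceil((n + 2K - (d - 1)) / 2)
    x_min = max(0, -(-(n + 2 * K - (d - 1)) // 2))
    return sum(hyge_pmf(x, N, K, n) for x in range(x_min, min(n, K) + 1))


N, K = 63, 40
best_n = max(range(0, 52), key=lambda n: p_decode(N, K, n))
print(best_n, f"{p_decode(N, K, best_n):.2e}")
```
\end_layout
\begin_layout Standard
With these inputs the search selects
\begin_inset Formula $n=45$
\end_inset
, with a success probability of roughly
\begin_inset Formula $2\times10^{-6}$
\end_inset
, in agreement with the value quoted above.
\end_layout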
\begin_layout Case
Cases 1 and 2 illustrate the fact that a strategy that tries to guess which
symbols to erase is not going to be very successful unless we are prepared
to wait all day for an answer.
Consider a slight modification to the strategy that can tip the odds in
our favor.
Suppose that the codeword contains
\begin_inset Formula $K=40$
\end_inset
incorrect symbols, as before.
In this case it is known that 10 of the symbols are much more reliable
than the other 53 symbols.
The 10 most reliable symbols are all correct and these 10 symbols are protected
from erasure, i.e.
the set of erasures is chosen from the smaller set of 53 less reliable
symbols.
If
\begin_inset Formula $n=40$
\end_inset
symbols are chosen randomly from the set of
\begin_inset Formula $N=53$
\end_inset
least reliable symbols, it is still necessary for the erased symbols to
include at least 35 errors (as in Case 1).
In this case, with
\begin_inset Formula $N=53$
\end_inset
,
\begin_inset Formula $K=40$
\end_inset
,
\begin_inset Formula $n=40$
\end_inset
,
\begin_inset Formula $P(X=35)=0.001$
\end_inset
! Now, the situation is much better.
The odds of decoding the word on the first try are approximately 1 in 1000.
The odds are even better if 41 symbols are erased (because
\begin_inset Formula $41+2\cdot5=51$
\end_inset
, catching 35 errors is still sufficient), in which case
\begin_inset Formula $P(X=35)=0.0042$
\end_inset
, giving odds of about 1 in 200!
\end_layout
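\begin_layout Standard
The Case 3 probabilities follow from the same formula with the population
reduced to the 53 least reliable symbols, which still contain all 40 errors.
In the sketch below (hyge_pmf is again a hypothetical helper), both
\begin_inset Formula $n=40$
\end_inset
 and
\begin_inset Formula $n=41$
\end_inset
 require at least 35 captured errors; the tail sum
\begin_inset Formula $P(X\ge35)$
\end_inset
 is printed alongside
\begin_inset Formula $P(X=35)$
\end_inset
 for comparison.
\end_layout
\begin_layout LyX-Code
```python
from math import comb


def hyge_pmf(k, N, K, n):
    # Hypergeometric P(X = k); zero for impossible outcomes
    if k < 0 or k > n or k > K or n - k > N - K:
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# 10 reliable (and correct) symbols are protected, so erasures are drawn
# from the N = 53 least reliable symbols, which hold all K = 40 errors
N, K = 53, 40
for n in (40, 41):
    # n + 2*(K - X) <= 51 requires X >= 35 for both n = 40 and n = 41
    p_exact = hyge_pmf(35, N, K, n)
    p_total = sum(hyge_pmf(x, N, K, n) for x in range(35, K + 1))
    print(n, f"P(X=35)={p_exact:.4f}", f"P(X>=35)={p_total:.4f}")
```
\end_layout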
\begin_layout Standard
Case 3 illustrates how, with the addition of some reliable information about
the quality of just 10 of the 63 symbols, it is possible to decode received
words containing a relatively large number of errors using only the BM
errors-and-erasures decoder.
The key to improving the odds enough to make the strategy of
\begin_inset Quotes eld
\end_inset
guessing
\begin_inset Quotes erd
\end_inset
at the erasure vector useful for practical implementation is to use information
about the quality of the received symbols to decide which ones are most
likely to be in error, and to assign a relatively high probability of erasure
to the lowest quality symbols and a relatively low probability of erasure
to the highest quality symbols.
It turns out that a good choice of the erasure probabilities can increase
the probability of a successful decode by several orders of magnitude relative
to a bad choice.
\end_layout
\begin_layout Standard
Rather than selecting a fixed number of symbols to erase, the SFRSD algorithm
uses information available from the demodulator to assign a variable probability
of erasure to each received symbol.
Symbols that are determined to be of low quality and thus likely to be
incorrect are assigned a high probability of erasure, and symbols that
are likely to be correct are assigned low erasure probabilities.
The erasure probability for a symbol is determined using two quality indices
that are derived from information provided by the demodulator.
\end_layout
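\begin_layout Standard
A minimal sketch of the variable-probability erasure idea follows, assuming
that an erasure probability is already available for every symbol.
Each symbol is erased independently with its own probability.
The text does not say how the limit of 51 erasures is enforced; here, as
an assumption, symbols are visited in random order and no further erasures
are added once the limit is reached.
The function name draw_erasures and the example probabilities are hypothetical.
\end_layout
\begin_layout LyX-Code
```python
import random


def draw_erasures(p_erase, max_erasures=51, rng=random):
    """Return sorted indices of the symbols chosen for erasure.

    Symbol i is erased independently with probability p_erase[i].
    Symbols are visited in random order so that the cap (an assumed
    mechanism) does not favor low indices.
    """
    order = list(range(len(p_erase)))
    rng.shuffle(order)
    erased = []
    for i in order:
        if len(erased) == max_erasures:
            break
        if rng.random() < p_erase[i]:
            erased.append(i)
    return sorted(erased)


# Hypothetical probabilities: 10 reliable symbols, 53 unreliable ones
probs = [0.02] * 10 + [0.8] * 53
print(draw_erasures(probs, rng=random.Random(1)))
```
\end_layout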
\begin_layout Section
The decoding algorithm
\end_layout
\begin_layout Standard
Preliminary setup: Using a large dataset of received words that have been
successfully decoded, estimate the probability of symbol error as a function
of the symbol's metrics P1-rank and P2/P1 (defined in step 1 below).
The resulting matrix is scaled by a factor of 1.3 and used as the erasure-probability
matrix in step 2.
\end_layout
\begin_layout Standard
For each received word:
\end_layout
\begin_layout Standard
1.
Determine symbol metrics for each symbol in the received word.
The metrics are the rank, in {1,2,...,63}, of the symbol's power percentage
(the percentage of power in its most likely value) among the 63 symbols
of the word, and the ratio of the power percentages of the second most
likely value and the most likely value.
Denote these metrics by P1-rank and P2/P1.
\end_layout
\begin_layout Standard
2.
Using the erasure probability of each symbol, make independent random decisions
about whether or not to erase each symbol in the word.
Allow at most 51 symbols to be erased.
\end_layout
\begin_layout Standard
3.
Attempt errors-and-erasures decoding with the erasure vector that was determined
in step 2.
If the decoder is successful, it returns a candidate codeword; go to step
5.
\end_layout
\begin_layout Standard
4.
If decoding is not successful, go to step 6 (a failed attempt still counts
as a trial).
\end_layout
\begin_layout Standard
5.
If a candidate codeword is returned by the decoder, calculate its soft
distance from the received word and save the codeword if the soft distance
is the smallest one encountered so far.
If the soft distance is smaller than the threshold dthresh, declare a successful
decode and return the codeword.
\end_layout
\begin_layout Standard
6.
If the number of trials is equal to the maximum allowed number, exit and
return the current best codeword.
Otherwise, go to step 2.
\end_layout
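\begin_layout Standard
Steps 2 through 6 can be summarized as a single loop.
The sketch below is hypothetical throughout: bm_decode stands in for a
Berlekamp-Massey errors-and-erasures decoder that returns a candidate codeword
or None, soft_distance stands in for the soft-distance measure (which the
text does not define), and the limit of 51 erasures is enforced by visiting
symbols in random order.
In this sketch a failed decoding attempt also counts as a trial, so the
loop always ends after max_trials attempts.
\end_layout
\begin_layout LyX-Code
```python
import random


def sfrsd_trials(p_erase, bm_decode, soft_distance, max_trials, d_thresh,
                 max_erasures=51, rng=random):
    """Return (best_codeword, best_distance, success_flag).

    p_erase[i]        erasure probability of symbol i (from the table)
    bm_decode(era)    hypothetical decoder: candidate codeword or None
    soft_distance(c)  hypothetical soft distance of c from the received word
    """
    best, best_dist = None, float("inf")
    for _ in range(max_trials):                 # step 6: trial limit
        # Step 2: independent erasure decisions, capped at max_erasures
        order = list(range(len(p_erase)))
        rng.shuffle(order)
        erasures = []
        for i in order:
            if len(erasures) < max_erasures and rng.random() < p_erase[i]:
                erasures.append(i)
        # Step 3: errors-and-erasures decoding
        candidate = bm_decode(sorted(erasures))
        if candidate is None:                   # step 4: try again
            continue
        # Step 5: keep the best candidate; stop early below the threshold
        dist = soft_distance(candidate)
        if dist < best_dist:
            best, best_dist = candidate, dist
        if dist < d_thresh:
            return candidate, dist, True
    return best, best_dist, False
```
\end_layout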
\end_body
\end_document