A copy of the license is included in the section entitled "GNU Free Documentation License". File-based compression is of course also supported. \end_layout \begin_layout Standard The Speex codec is designed to be very flexible and support a wide range of speech qualities and bit-rates. Support for very good quality speech also means that Speex can encode wideband speech (16 kHz sampling rate) in addition to narrowband speech (telephone quality, 8 kHz sampling rate). \end_layout \begin_layout Standard Designing for VoIP instead of mobile phones means that Speex is robust to lost packets, but not to corrupted ones. This is based on the assumption that in VoIP, packets either arrive unaltered or don't arrive at all. Because Speex is targeted at a wide range of devices, it has modest (adjustable) complexity and a small memory footprint. \end_layout \begin_layout Standard All the design goals led to the choice of CELP \begin_inset Index status collapsed \begin_layout Plain Layout CELP \end_layout \end_inset as the encoding technique. One of the main reasons is that CELP has long proved that it could work reliably and scale well to both low bit-rates (e.g. DoD CELP @ 4.8 kbps) and high bit-rates (e.g. G.728 @ 16 kbps). \end_layout \begin_layout Section Getting help \begin_inset CommandInset label LatexCommand label name "sec:Getting-help" \end_inset \end_layout \begin_layout Standard As with many open source projects, there are many ways to get help with Speex. These include: \end_layout \begin_layout Itemize This manual \end_layout \begin_layout Itemize Other documentation on the Speex website (http://www.speex.org/) \end_layout \begin_layout Itemize Mailing list: Discuss any Speex-related topic on speex-dev@xiph.org (not just for developers) \end_layout \begin_layout Itemize IRC: The main channel is #speex on irc.freenode.net. Note that due to time differences, it may take a while to get someone, so please be patient. 
\end_layout \begin_layout Itemize Email the author privately at jean-marc.valin@usherbrooke.ca \series bold only \series default for private/delicate topics you do not wish to discuss publicly. \end_layout \begin_layout Standard Before asking for help (mailing list or IRC), \series bold it is important to first read this manual \series default (OK, so if you made it here it's already a good sign). It is generally considered rude to ask on a mailing list about topics that are clearly detailed in the documentation. On the other hand, it's perfectly OK (and encouraged) to ask for clarifications about something covered in the manual. This manual does not (yet) cover everything about Speex, so everyone is encouraged to ask questions, send comments, feature requests, or just let us know how Speex is being used. \end_layout \begin_layout Standard Here are some additional guidelines related to the mailing list. Before reporting bugs in Speex to the list, it is strongly recommended (if possible) to first test whether these bugs can be reproduced using the speexenc and speexdec (see Section \begin_inset CommandInset ref LatexCommand ref reference "sec:Command-line-encoder/decoder" \end_inset ) command-line utilities. Bugs reported based on 3rd party code are both harder to find and far too often caused by errors that have nothing to do with Speex. \end_layout \begin_layout Section About this document \end_layout \begin_layout Standard This document is divided in the following way. Section \begin_inset CommandInset ref LatexCommand ref reference "sec:Feature-description" \end_inset describes the different Speex features and defines many basic terms that are used throughout this manual. Section \begin_inset CommandInset ref LatexCommand ref reference "sec:Command-line-encoder/decoder" \end_inset documents the standard command-line tools provided in the Speex distribution. 
Section \begin_inset CommandInset ref LatexCommand ref reference "sec:Programming-with-Speex" \end_inset includes detailed instructions about programming using the libspeex \begin_inset Index status collapsed \begin_layout Plain Layout libspeex \end_layout \end_inset API. Section \begin_inset CommandInset ref LatexCommand ref reference "sec:Formats-and-standards" \end_inset has some information related to Speex and standards. \end_layout \begin_layout Standard The last three sections describe the algorithms used in Speex. These sections require signal processing knowledge, but are not required for merely using Speex. They are intended for people who want to understand how Speex really works and/or want to do research based on Speex. Section \begin_inset CommandInset ref LatexCommand ref reference "sec:Introduction-to-CELP" \end_inset explains the general idea behind CELP, while sections \begin_inset CommandInset ref LatexCommand ref reference "sec:Speex-narrowband-mode" \end_inset and \begin_inset CommandInset ref LatexCommand ref reference "sec:Speex-wideband-mode" \end_inset are specific to Speex. \end_layout \begin_layout Standard \begin_inset Newpage newpage \end_inset \end_layout \begin_layout Chapter Codec description \begin_inset CommandInset label LatexCommand label name "sec:Feature-description" \end_inset \end_layout \begin_layout Standard This section describes Speex and its features in more detail. \end_layout \begin_layout Section Concepts \end_layout \begin_layout Standard Before introducing all the Speex features, here are some concepts in speech coding that help to better understand the rest of the manual. Although some are general concepts in speech/audio processing, others are specific to Speex. 
\end_layout \begin_layout Subsection* Sampling rate \begin_inset Index status collapsed \begin_layout Plain Layout sampling rate \end_layout \end_inset \end_layout \begin_layout Standard The sampling rate expressed in Hertz (Hz) is the number of samples taken from a signal per second. For a sampling rate of \begin_inset Formula $F_{s}$ \end_inset kHz, the highest frequency that can be represented is equal to \begin_inset Formula $F_{s}/2$ \end_inset kHz ( \begin_inset Formula $F_{s}/2$ \end_inset is known as the Nyquist frequency). This is a fundamental property in signal processing and is described by the sampling theorem. Speex is mainly designed for three different sampling rates: 8 kHz, 16 kHz, and 32 kHz. These are respectively referred to as narrowband \begin_inset Index status collapsed \begin_layout Plain Layout narrowband \end_layout \end_inset , wideband \begin_inset Index status collapsed \begin_layout Plain Layout wideband \end_layout \end_inset and ultra-wideband \begin_inset Index status collapsed \begin_layout Plain Layout ultra-wideband \end_layout \end_inset . \end_layout \begin_layout Subsection* Bit-rate \end_layout \begin_layout Standard When encoding a speech signal, the bit-rate is defined as the number of bits per unit of time required to encode the speech. It is measured in \emph on bits per second \emph default (bps), or generally \emph on kilobits per second \emph default . It is important to make the distinction between \emph on kilo \series bold bits \series default \emph default \emph on per second \emph default (k \series bold b \series default ps) and \emph on kilo \series bold bytes \series default \emph default \emph on per second \emph default (k \series bold B \series default ps). 
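\end_layout \begin_layout Standard To make the distinction concrete, here is a small worked conversion. The figures are illustrative only (they are not taken from the Speex bit-rate tables) and the 20 ms frame duration is an assumption made for the example. A stream at 8 kbps carries \begin_inset Formula $8000/8=1000$ \end_inset bytes per second, that is 1 kBps, and a 20 ms frame at that rate occupies \begin_inset Formula $8000\times0.020=160$ \end_inset bits, or 20 bytes. 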
\end_layout \begin_layout Subsection* Quality \begin_inset Index status collapsed \begin_layout Plain Layout quality \end_layout \end_inset (variable) \end_layout \begin_layout Standard Speex is a lossy codec, which means that it achieves compression at the expense of fidelity of the input speech signal. Unlike some other speech codecs, it is possible to control the trade-off made between quality and bit-rate. The Speex encoding process is controlled most of the time by a quality parameter that ranges from 0 to 10. In constant bit-rate \begin_inset Index status collapsed \begin_layout Plain Layout constant bit-rate \end_layout \end_inset (CBR) operation, the quality parameter is an integer, while for variable bit-rate (VBR), the parameter is a float. \end_layout \begin_layout Subsection* Complexity \begin_inset Index status collapsed \begin_layout Plain Layout complexity \end_layout \end_inset (variable) \end_layout \begin_layout Standard With Speex, it is possible to vary the complexity allowed for the encoder. This is done by controlling how the search is performed with an integer ranging from 1 to 10, in a way that's similar to the -1 to -9 options of the \emph on gzip \emph default and \emph on bzip2 \emph default compression utilities. For normal use, the noise level at complexity 1 is between 1 and 2 dB higher than at complexity 10, but the CPU requirements for complexity 10 are about 5 times higher than for complexity 1. In practice, the best trade-off is between complexity 2 and 4, though higher settings are often useful when encoding non-speech sounds like DTMF \begin_inset Index status collapsed \begin_layout Plain Layout DTMF \end_layout \end_inset tones. 
\end_layout \begin_layout Subsection* Variable Bit-Rate \begin_inset Index status collapsed \begin_layout Plain Layout variable bit-rate \end_layout \end_inset (VBR) \end_layout \begin_layout Standard Variable bit-rate (VBR) allows a codec to change its bit-rate dynamically to adapt to the \begin_inset Quotes eld \end_inset difficulty \begin_inset Quotes erd \end_inset of the audio being encoded. In the example of Speex, sounds like vowels and high-energy transients require a higher bit-rate to achieve good quality, while fricatives (e.g. s, f sounds) can be coded adequately with fewer bits. For this reason, VBR can achieve a lower bit-rate for the same quality, or a better quality for a certain bit-rate. Despite its advantages, VBR has two main drawbacks: first, by only specifying quality, there's no guarantee about the final average bit-rate. Second, for some real-time applications like voice over IP (VoIP), what counts is the maximum bit-rate, which must be low enough for the communication channel. \end_layout \begin_layout Subsection* Average Bit-Rate \begin_inset Index status collapsed \begin_layout Plain Layout average bit-rate \end_layout \end_inset (ABR) \end_layout \begin_layout Standard Average bit-rate solves one of the problems of VBR, as it dynamically adjusts VBR quality in order to meet a specific target bit-rate. Because the quality/bit-rate is adjusted in real-time (open-loop), the global quality will be slightly lower than that obtained by encoding in VBR with exactly the right quality setting to meet the target average bit-rate. \end_layout \begin_layout Subsection* Voice Activity Detection \begin_inset Index status collapsed \begin_layout Plain Layout voice activity detection \end_layout \end_inset (VAD) \end_layout \begin_layout Standard When enabled, voice activity detection detects whether the audio being encoded is speech or silence/background noise. 
VAD is always implicitly activated when encoding in VBR, so the option is only useful in non-VBR operation. In this case, Speex detects non-speech periods and encodes them with just enough bits to reproduce the background noise. This is called \begin_inset Quotes eld \end_inset comfort noise generation \begin_inset Quotes erd \end_inset (CNG). \end_layout \begin_layout Subsection* Discontinuous Transmission \begin_inset Index status collapsed \begin_layout Plain Layout discontinuous transmission \end_layout \end_inset (DTX) \end_layout \begin_layout Standard Discontinuous transmission is an addition to VAD/VBR operation that makes it possible to stop transmitting completely when the background noise is stationary. In file-based operation, since we cannot just stop writing to the file, only 5 bits are used for such frames (corresponding to 250 bps). \end_layout \begin_layout Subsection* Perceptual enhancement \begin_inset Index status collapsed \begin_layout Plain Layout perceptual enhancement \end_layout \end_inset \end_layout \begin_layout Standard Perceptual enhancement is a part of the decoder which, when turned on, attempts to reduce the perception of the noise/distortion produced by the encoding/decoding process. In most cases, perceptual enhancement brings the sound further from the original \emph on objectively \emph default (e.g. considering only SNR), but in the end it still \emph on sounds \emph default better (subjective improvement). \end_layout \begin_layout Subsection* Latency and algorithmic delay \begin_inset Index status collapsed \begin_layout Plain Layout algorithmic delay \end_layout \end_inset \end_layout \begin_layout Standard Every speech codec introduces a delay in the transmission. For Speex, this delay is equal to the frame size, plus some amount of \begin_inset Quotes eld \end_inset look-ahead \begin_inset Quotes erd \end_inset required to process each frame. 
In narrowband operation (8 kHz), the look-ahead is 10 ms; in wideband operation (16 kHz), it is 13.9 ms; and in ultra-wideband operation (32 kHz), it is 15.9 ms, resulting in algorithmic delays of 30 ms, 33.9 ms and 35.9 ms respectively. These values don't account for the CPU time it takes to encode or decode the frames. \end_layout \begin_layout Section Codec \end_layout \begin_layout Standard The main characteristics of Speex can be summarized as follows: \end_layout \begin_layout Itemize Free software/open-source \begin_inset Index status collapsed \begin_layout Plain Layout open-source \end_layout \end_inset , patent \begin_inset Index status collapsed \begin_layout Plain Layout patent \end_layout \end_inset and royalty-free \end_layout \begin_layout Itemize Integration of narrowband \begin_inset Index status collapsed \begin_layout Plain Layout narrowband \end_layout \end_inset and wideband \begin_inset Index status collapsed \begin_layout Plain Layout wideband \end_layout \end_inset using an embedded bit-stream \end_layout \begin_layout Itemize Wide range of bit-rates available (from 2.15 kbps to 44 kbps) \end_layout \begin_layout Itemize Dynamic bit-rate switching (AMR) and Variable Bit-Rate \begin_inset Index status collapsed \begin_layout Plain Layout variable bit-rate \end_layout \end_inset (VBR) operation \end_layout \begin_layout Itemize Voice Activity Detection \begin_inset Index status collapsed \begin_layout Plain Layout voice activity detection \end_layout \end_inset (VAD, integrated with VBR) and discontinuous transmission (DTX) \end_layout \begin_layout Itemize Variable complexity \begin_inset Index status collapsed \begin_layout Plain Layout complexity \end_layout \end_inset \end_layout \begin_layout Itemize Embedded wideband structure (scalable sampling rate) \end_layout \begin_layout Itemize Ultra-wideband sampling rate at 32 kHz \end_layout \begin_layout Itemize Intensity stereo encoding option \end_layout \begin_layout 
Itemize Fixed-point implementation \end_layout \begin_layout Section Preprocessor \end_layout \begin_layout Standard This part refers to the preprocessor module introduced in the 1.1.x branch. The preprocessor is designed to be used on the audio \emph on before \emph default running the encoder. The preprocessor provides three main functionalities: \end_layout \begin_layout Itemize noise suppression \end_layout \begin_layout Itemize automatic gain control (AGC) \end_layout \begin_layout Itemize voice activity detection (VAD) \end_layout \begin_layout Standard The denoiser can be used to reduce the amount of background noise present in the input signal. This provides higher quality speech whether or not the denoised signal is encoded with Speex (or at all). However, when using the denoised signal with the codec, there is an additional benefit. Speech codecs in general (Speex included) tend to perform poorly on noisy input, which tends to amplify the noise. The denoiser greatly reduces this effect. \end_layout \begin_layout Standard Automatic gain control (AGC) is a feature that deals with the fact that the recording volume may vary by a large amount between different setups. The AGC provides a way to adjust a signal to a reference volume. This is useful for voice over IP because it removes the need for manual adjustment of the microphone gain. A secondary advantage is that by setting the microphone gain to a conservative (low) level, it is easier to avoid clipping. \end_layout \begin_layout Standard The voice activity detector (VAD) provided by the preprocessor is more advanced than the one directly provided in the codec. \end_layout \begin_layout Section Adaptive Jitter Buffer \end_layout \begin_layout Standard When transmitting voice (or any content for that matter) over UDP or RTP, packets may be lost, arrive with different delays, or even arrive out of order. 
The purpose of a jitter buffer is to reorder packets and buffer them long enough (but no longer than necessary) so they can be sent to be decoded. \end_layout \begin_layout Section Acoustic Echo Canceller \end_layout \begin_layout Standard In any hands-free communication system (Fig. \begin_inset CommandInset ref LatexCommand ref reference "fig:Acoustic-echo-model" \end_inset ), speech from the remote end is played in the local loudspeaker, propagates in the room and is captured by the microphone. If the audio captured from the microphone is sent directly to the remote end, then the remote user hears an echo of their own voice. An acoustic echo canceller is designed to remove the acoustic echo before it is sent to the remote end. It is important to understand that the echo canceller is meant to improve the quality on the \series bold remote \series default end. For those who care a lot about mouth-to-ear delays, it should be noted that unlike the Speex codec, resampler and preprocessor, this acoustic echo canceller does not introduce any latency. \end_layout \begin_layout Standard \begin_inset Float figure wide false sideways false status open \begin_layout Plain Layout \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash begin{center} \end_layout \end_inset \begin_inset Graphics filename echo_path.eps width 10cm \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash end{center} \end_layout \end_inset \end_layout \begin_layout Plain Layout \begin_inset Caption \begin_layout Plain Layout Acoustic echo model \begin_inset CommandInset label LatexCommand label name "fig:Acoustic-echo-model" \end_inset \end_layout \end_inset \end_layout \end_inset \end_layout \begin_layout Section Resampler \end_layout \begin_layout Standard In some cases, it may be useful to convert audio from one sampling rate to another. There are many reasons for that. 
It can be for mixing streams that have different sampling rates, for supporting sampling rates that the soundcard doesn't support, for transcoding, etc. That's why there is now a resampler that is part of the Speex project. This resampler can be used to convert between any two arbitrary rates (the ratio must only be a rational number) and there is control over the quality/complexity tradeoff. Keep in mind that the resampler introduces some delay in the audio stream, the size of which depends on the resampler quality setting. Refer to the resampler API documentation to find out how to get exact delay values. \end_layout \begin_layout Section Integration \end_layout \begin_layout Standard Knowing \emph on how \emph default to use each of the components is not that useful unless we know \emph on where \emph default to use them. Figure \begin_inset CommandInset ref LatexCommand ref reference "fig:Integration-VoIP" \end_inset shows where each of the components would be used in a typical VoIP client. Components in dotted lines are optional, though they may be very useful in some circumstances. There are several important things to note from this figure. The AEC must be placed as close as possible to the playback and capture. Only the resampling may be closer. Also, it is very important to use the same clock for both mic capture and speaker/headphones playback. \end_layout \begin_layout Standard \begin_inset Float figure wide false sideways false status open \begin_layout Plain Layout \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash begin{center} \end_layout \end_inset \begin_inset Graphics filename components.eps width 80text% \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash end{center} \end_layout \end_inset \end_layout \begin_layout Plain Layout \begin_inset Caption \begin_layout Plain Layout Integration of all the components in a VoIP client. 
\begin_inset CommandInset label LatexCommand label name "fig:Integration-VoIP" \end_inset \end_layout \end_inset \end_layout \end_inset \end_layout \begin_layout Standard \begin_inset Newpage newpage \end_inset \end_layout \begin_layout Chapter Compiling and Porting \end_layout \begin_layout Standard Compiling Speex under UNIX/Linux or any other platform supported by autoconf (e.g. Win32/cygwin) is as easy as typing: \end_layout \begin_layout LyX-Code % ./configure [options] \end_layout \begin_layout LyX-Code % make \end_layout \begin_layout LyX-Code % make install \end_layout \begin_layout Standard The options supported by the Speex configure script are: \end_layout \begin_layout Description --prefix= Specifies the base path for installing Speex (e.g. /usr) \end_layout \begin_layout Description --enable-shared/--disable-shared Whether to compile shared libraries \end_layout \begin_layout Description --enable-static/--disable-static Whether to compile static libraries \end_layout \begin_layout Description --disable-wideband Disable the wideband part of Speex (typically to save space) \end_layout \begin_layout Description --enable-valgrind Enable extra hints for valgrind for debugging purposes (do not use by default) \end_layout \begin_layout Description --enable-sse Enable use of SSE instructions (x86/float only) \end_layout \begin_layout Description --enable-fixed-point \begin_inset Index status collapsed \begin_layout Plain Layout fixed-point \end_layout \end_inset Compile Speex for a processor that does not have a floating point unit (FPU) \end_layout \begin_layout Description --enable-arm4-asm Enable assembly specific to the ARMv4 architecture (gcc only) \end_layout \begin_layout Description --enable-arm5e-asm Enable assembly specific to the ARMv5E architecture (gcc only) \end_layout \begin_layout Description --enable-fixed-point-debug Use only for debugging the fixed-point \begin_inset Index status collapsed \begin_layout Plain Layout fixed-point \end_layout 
\end_inset code (very slow) \end_layout \begin_layout Description --enable-ti-c55x Enable support for the TI C55x family \end_layout \begin_layout Description --enable-blackfin-asm Enable assembly specific to the Blackfin DSP architecture (gcc only) \end_layout \begin_layout Section Platforms \end_layout \begin_layout Standard Speex is known to compile and work on a large number of architectures, both floating-point and fixed-point. In general, any architecture that can natively compute the multiplication of two signed 16-bit numbers (32-bit result) and runs at a sufficient clock rate (architecture-dependent) is capable of running Speex. Architectures on which Speex is \series bold known \series default to work (it probably works on many others) are: \end_layout \begin_layout Itemize x86 & x86-64 \end_layout \begin_layout Itemize Power \end_layout \begin_layout Itemize SPARC \end_layout \begin_layout Itemize ARM \end_layout \begin_layout Itemize Blackfin \end_layout \begin_layout Itemize Coldfire (68k family) \end_layout \begin_layout Itemize TI C54xx & C55xx \end_layout \begin_layout Itemize TI C6xxx \end_layout \begin_layout Itemize TriMedia (experimental) \end_layout \begin_layout Standard Operating systems on top of which Speex is known to work include (it probably works on many others): \end_layout \begin_layout Itemize Linux \end_layout \begin_layout Itemize \begin_inset Formula $\mu$ \end_inset Clinux \end_layout \begin_layout Itemize MacOS X \end_layout \begin_layout Itemize BSD \end_layout \begin_layout Itemize Other UNIX/POSIX variants \end_layout \begin_layout Itemize Symbian \end_layout \begin_layout Standard The source code directory includes additional information for compiling on certain architectures or operating systems in README.xxx files. \end_layout \begin_layout Section Porting and Optimising \end_layout \begin_layout Standard Here are a few things to consider when porting or optimising Speex for a new platform or an existing one. 
\end_layout \begin_layout Subsection CPU optimisation \end_layout \begin_layout Standard The single factor that will affect the CPU usage of Speex the most is whether it is compiled for floating-point or fixed-point. If your CPU/DSP does not have a floating-point unit (FPU), then compiling as fixed-point will be orders of magnitude faster. If there is an FPU present, then it is important to test which version is faster. On the x86 architecture, floating-point is \series bold generally \series default faster, but not always. To compile Speex as fixed-point, you need to pass --enable-fixed-point to the configure script or define the FIXED_POINT macro for the compiler. As of 1.2beta3, it is possible to disable the floating-point compatibility API, which means that your code can link without a float emulation library. To do that, configure with --disable-float-api or define the DISABLE_FLOAT_API macro. Until the VBR feature is ported to fixed-point, you will also need to configure with --disable-vbr or define DISABLE_VBR. 
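\end_layout \begin_layout Standard As a rough illustration of the options above, a fixed-point build without the float API and without VBR might be configured as follows. This is only a sketch: it assumes the autoconf-based build described in this chapter and uses the --enable-fixed-point spelling given in the list of configure options. \end_layout \begin_layout LyX-Code % ./configure --enable-fixed-point --disable-float-api --disable-vbr \end_layout \begin_layout LyX-Code % make \end_layout \begin_layout Standard On toolchains that do not use the configure script, defining the FIXED_POINT, DISABLE_FLOAT_API and DISABLE_VBR macros for the compiler is the equivalent route mentioned above. 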
\end_layout \begin_layout Standard Other important things to check on some DSP architectures are: \end_layout \begin_layout Itemize Make sure the cache is set to write-back mode \end_layout \begin_layout Itemize If the chip has SRAM instead of cache, make sure as much code and data as possible are in SRAM, rather than in RAM \end_layout \begin_layout Standard If you are going to be writing assembly, then the following functions are \series bold usually \series default the first ones you should consider optimising: \end_layout \begin_layout Itemize \begin_inset listings inline true status collapsed \begin_layout Plain Layout filter_mem16() \end_layout \end_inset \end_layout \begin_layout Itemize \begin_inset listings inline true status collapsed \begin_layout Plain Layout iir_mem16() \end_layout \end_inset \end_layout \begin_layout Itemize \begin_inset listings inline true status collapsed \begin_layout Plain Layout vq_nbest() \end_layout \end_inset \end_layout \begin_layout Itemize \begin_inset listings inline true status collapsed \begin_layout Plain Layout pitch_xcorr() \end_layout \end_inset \end_layout \begin_layout Itemize \begin_inset listings inline true status collapsed \begin_layout Plain Layout interp_pitch() \end_layout \end_inset \end_layout \begin_layout Standard The filtering functions \begin_inset listings inline true status collapsed \begin_layout Plain Layout filter_mem16() \end_layout \end_inset and \begin_inset listings inline true status collapsed \begin_layout Plain Layout iir_mem16() \end_layout \end_inset are implemented in the direct form II transposed (DF2T). However, for architectures based on multiply-accumulate (MAC), DF2T requires frequent reloading of the accumulator, which can make the code very slow. For these architectures (e.g. Blackfin and Coldfire), a better approach is to implement those functions as direct form I (DF1), which is easier to express in terms of MAC. 
When doing that, however, \series bold it is important to make sure that the DF1 implementation still behaves like the original DF2T implementation when it comes to memory values \series default . This is necessary because the filter is time-varying and must compute exactly the same value (not counting machine rounding) on any encoder or decoder. \end_layout \begin_layout Subsection Memory optimisation \end_layout \begin_layout Standard Memory optimisation is mainly something that should be considered for small embedded platforms. For PCs, Speex is already so tiny that it's just not worth doing any of the things suggested here. There are several ways to reduce the memory usage of Speex, both in terms of code size and data size. For optimising code size, the trick is to first remove features you do not need. Some examples of things that can easily be disabled \series bold if you don't need them \series default are: \end_layout \begin_layout Itemize Wideband support (--disable-wideband) \end_layout \begin_layout Itemize Support for stereo (removing stereo.c) \end_layout \begin_layout Itemize VBR support (--disable-vbr or DISABLE_VBR) \end_layout \begin_layout Itemize Static codebooks that are not needed for the bit-rates you are using (*_table.c files) \end_layout \begin_layout Standard Speex also has several methods for allocating temporary arrays. When using a compiler that supports C99 properly (as of 2007, Microsoft compilers don't, but gcc does), it is best to define VAR_ARRAYS. That makes use of the variable-size array feature of C99. The next best is to define USE_ALLOCA so that Speex can use alloca() to allocate the temporary arrays. Note that on many systems, alloca() is buggy so it may not work. If neither VAR_ARRAYS nor USE_ALLOCA is defined, then Speex falls back to allocating a large \begin_inset Quotes eld \end_inset scratch space \begin_inset Quotes erd \end_inset and doing its own internal allocation. 
The main disadvantage of this solution is that it is wasteful. It needs to allocate enough stack for the worst case scenario (worst bit-rate, highest complexity setting, ...) and by default, the memory isn't shared between multiple encoder/decoder states. Still, if the \begin_inset Quotes eld \end_inset manual \begin_inset Quotes erd \end_inset allocation is the only option left, there are a few things that can be improved. By overriding the speex_alloc_scratch() call in os_support.h, it is possible to always return the same memory area for all states \begin_inset Foot status collapsed \begin_layout Plain Layout In this case, one must be careful with threads \end_layout \end_inset . In addition to that, by redefining the NB_ENC_STACK and NB_DEC_STACK (or similar for wideband), it is possible to only allocate memory for a scenario that is known in advance. In this case, it is important to measure the amount of memory required for the specific sampling rate, bit-rate and complexity level being used. \end_layout \begin_layout Standard \begin_inset Newpage newpage \end_inset \end_layout \begin_layout Chapter Command-line encoder/decoder \begin_inset CommandInset label LatexCommand label name "sec:Command-line-encoder/decoder" \end_inset \end_layout \begin_layout Standard The base Speex distribution includes a command-line encoder ( \emph on speexenc \emph default ) and decoder ( \emph on speexdec \emph default ). Those tools produce and read Speex files encapsulated in the Ogg container. Although it is possible to encapsulate Speex in any container, Ogg is the recommended container for files. This section describes how to use the command line tools for Speex files in Ogg. \end_layout \begin_layout Section \emph on speexenc \begin_inset Index status collapsed \begin_layout Plain Layout speexenc \end_layout \end_inset \end_layout \begin_layout Standard The \emph on speexenc \emph default utility is used to create Speex files from raw PCM or wave files. 
It can be used by calling: \end_layout \begin_layout LyX-Code speexenc [options] input_file output_file \end_layout \begin_layout Standard The value '-' for input_file or output_file corresponds respectively to stdin and stdout. The valid options are: \end_layout \begin_layout Description --narrowband \begin_inset space ~ \end_inset (-n) Tell Speex to treat the input as narrowband (8 kHz). This is the default \end_layout \begin_layout Description --wideband \begin_inset space ~ \end_inset (-w) Tell Speex to treat the input as wideband (16 kHz) \end_layout \begin_layout Description --ultra-wideband \begin_inset space ~ \end_inset (-u) Tell Speex to treat the input as \begin_inset Quotes eld \end_inset ultra-wideband \begin_inset Quotes erd \end_inset (32 kHz) \end_layout \begin_layout Description --quality \begin_inset space ~ \end_inset n Set the encoding quality (0-10), default is 8 \end_layout \begin_layout Description --bitrate \begin_inset space ~ \end_inset n Encoding bit-rate (use bit-rate n or lower) \end_layout \begin_layout Description --vbr Enable VBR (Variable Bit-Rate), disabled by default \end_layout \begin_layout Description --abr \begin_inset space ~ \end_inset n Enable ABR (Average Bit-Rate) at n kbps, disabled by default \end_layout \begin_layout Description --vad Enable VAD (Voice Activity Detection), disabled by default \end_layout \begin_layout Description --dtx Enable DTX (Discontinuous Transmission), disabled by default \end_layout \begin_layout Description --nframes \begin_inset space ~ \end_inset n Pack n frames in each Ogg packet (this saves space at low bit-rates) \end_layout \begin_layout Description --comp \begin_inset space ~ \end_inset n Set encoding speed/quality tradeoff. 
The higher the value of n, the slower the encoding (default is 3) \end_layout \begin_layout Description -V Verbose operation, print bit-rate currently in use \end_layout \begin_layout Description --help \begin_inset space ~ \end_inset (-h) Print the help \end_layout \begin_layout Description --version \begin_inset space ~ \end_inset (-v) Print version information \end_layout \begin_layout Subsection* Speex comments \end_layout \begin_layout Description --comment Add the given string as an extra comment. This may be used multiple times. \end_layout \begin_layout Description --author Author of this track. \end_layout \begin_layout Description --title Title for this track. \end_layout \begin_layout Subsection* Raw input options \end_layout \begin_layout Description --rate \begin_inset space ~ \end_inset n Sampling rate for raw input \end_layout \begin_layout Description --stereo Consider raw input as stereo \end_layout \begin_layout Description --le Raw input is little-endian \end_layout \begin_layout Description --be Raw input is big-endian \end_layout \begin_layout Description --8bit Raw input is 8-bit unsigned \end_layout \begin_layout Description --16bit Raw input is 16-bit signed \end_layout \begin_layout Section \emph on speexdec \begin_inset Index status collapsed \begin_layout Plain Layout speexdec \end_layout \end_inset \end_layout \begin_layout Standard The \emph on speexdec \emph default utility decodes Speex files and is invoked as: \end_layout \begin_layout LyX-Code speexdec [options] speex_file [output_file] \end_layout \begin_layout Standard The value '-' for speex_file or output_file corresponds to stdin or stdout, respectively. Also, when no output_file is specified, the file is played to the soundcard.
The valid options are: \end_layout \begin_layout Description --enh enable post-filter (default) \end_layout \begin_layout Description --no-enh disable post-filter \end_layout \begin_layout Description --force-nb Force decoding in narrowband \end_layout \begin_layout Description --force-wb Force decoding in wideband \end_layout \begin_layout Description --force-uwb Force decoding in ultra-wideband \end_layout \begin_layout Description --mono Force decoding in mono \end_layout \begin_layout Description --stereo Force decoding in stereo \end_layout \begin_layout Description --rate \begin_inset space ~ \end_inset n Force decoding at n Hz sampling rate \end_layout \begin_layout Description --packet-loss \begin_inset space ~ \end_inset n Simulate n % random packet loss \end_layout \begin_layout Description -V Verbose operation, print bit-rate currently in use \end_layout \begin_layout Description --help \begin_inset space ~ \end_inset (-h) Print the help \end_layout \begin_layout Description --version \begin_inset space ~ \end_inset (-v) Print version information \end_layout \begin_layout Standard \begin_inset Newpage newpage \end_inset \end_layout \begin_layout Chapter Using the Speex Codec API ( \emph on libspeex \emph default \begin_inset Index status collapsed \begin_layout Plain Layout libspeex \end_layout \end_inset ) \begin_inset CommandInset label LatexCommand label name "sec:Programming-with-Speex" \end_inset \end_layout \begin_layout Standard The \emph on libspeex \emph default library contains all the functions for encoding and decoding speech with the Speex codec. When linking on a UNIX system, one must add \emph on -lspeex -lm \emph default to the compiler command line. One important thing to know is that \series bold libspeex calls are reentrant, but not thread-safe \series default . That means that it is fine to use calls from many threads, but \series bold calls using the same state from multiple threads must be protected by mutexes \series default . 
Code examples can be found in Appendix \begin_inset CommandInset ref LatexCommand ref reference "sec:Sample-code" \end_inset , and the complete API documentation is included in the Documentation section of the Speex website (http://www.speex.org/). \end_layout \begin_layout Section Encoding \begin_inset CommandInset label LatexCommand label name "sub:Encoding" \end_inset \end_layout \begin_layout Standard In order to encode speech using Speex, one first needs to: \end_layout \begin_layout Standard \begin_inset listings inline false status open \begin_layout Plain Layout #include <speex/speex.h> \end_layout \end_inset Then in the code, a Speex bit-packing struct must be declared, along with a Speex encoder state: \begin_inset listings inline false status open \begin_layout Plain Layout SpeexBits bits; \end_layout \begin_layout Plain Layout void *enc_state; \end_layout \end_inset The two are initialized by: \begin_inset listings inline false status open \begin_layout Plain Layout speex_bits_init(&bits); \end_layout \begin_layout Plain Layout enc_state = speex_encoder_init(&speex_nb_mode); \end_layout \end_inset \end_layout \begin_layout Standard For wideband coding, \emph on speex_nb_mode \emph default is replaced by \emph on speex_wb_mode \emph default . In most cases, you will need to know the frame size used at the sampling rate you are using. You can get that value in the \emph on frame_size \emph default variable (expressed in \series bold samples \series default , not bytes) with: \end_layout \begin_layout Standard \begin_inset listings inline false status open \begin_layout Plain Layout speex_encoder_ctl(enc_state, SPEEX_GET_FRAME_SIZE, &frame_size); \end_layout \end_inset \end_layout \begin_layout Standard In practice, \emph on frame_size \emph default will correspond to 20 ms at a sampling rate of 8, 16, or 32 kHz.
There are many parameters that can be set for the Speex encoder, but the most useful one is the quality parameter, which controls the quality vs bit-rate tradeoff. This is set by: \end_layout \begin_layout Standard \begin_inset listings inline false status open \begin_layout Plain Layout speex_encoder_ctl(enc_state, SPEEX_SET_QUALITY, &quality); \end_layout \end_inset where \emph on quality \emph default is an integer value ranging from 0 to 10 (inclusive). The mapping between quality and bit-rate is described in Fig. \begin_inset CommandInset ref LatexCommand ref reference "cap:quality_vs_bps" \end_inset for narrowband. \end_layout \begin_layout Standard Once the initialization is done, do the following for every input frame: \end_layout \begin_layout Standard \begin_inset listings inline false status open \begin_layout Plain Layout speex_bits_reset(&bits); \end_layout \begin_layout Plain Layout speex_encode_int(enc_state, input_frame, &bits); \end_layout \begin_layout Plain Layout nbBytes = speex_bits_write(&bits, byte_ptr, MAX_NB_BYTES); \end_layout \end_inset \end_layout \begin_layout Standard where \emph on input_frame \emph default is a \emph on (short *) \emph default pointing to the beginning of a speech frame, \emph on byte_ptr \emph default is a \emph on (char *) \emph default where the encoded frame will be written, \emph on MAX_NB_BYTES \emph default is the maximum number of bytes that can be written to \emph on byte_ptr \emph default without causing an overflow, and \emph on nbBytes \emph default is the number of bytes actually written to \emph on byte_ptr \emph default (the encoded size in bytes). Before calling speex_bits_write, it is possible to find the number of bytes that need to be written by calling \family typewriter speex_bits_nbytes(&bits) \family default , which returns that number of bytes.
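As a minimal sketch (not the reference implementation), the calls described so far can be assembled into a small program. It assumes narrowband mode, raw 16-bit mono PCM on stdin, and a made-up output framing in which each encoded frame is preceded by its byte count in native byte order. FRAME_MAX, the value 200 chosen for MAX_NB_BYTES, and that framing are illustrative assumptions and not part of Speex:
\begin_inset listings inline false status open
\begin_layout Plain Layout #include <stdio.h> \end_layout
\begin_layout Plain Layout #include <speex/speex.h> \end_layout
\begin_layout Plain Layout \end_layout
\begin_layout Plain Layout #define FRAME_MAX 640     /* assumed upper bound on samples per frame */ \end_layout
\begin_layout Plain Layout #define MAX_NB_BYTES 200  /* assumed size of the output buffer */ \end_layout
\begin_layout Plain Layout \end_layout
\begin_layout Plain Layout int main(void) \end_layout
\begin_layout Plain Layout { \end_layout
\begin_layout Plain Layout   spx_int16_t input_frame[FRAME_MAX]; \end_layout
\begin_layout Plain Layout   char byte_ptr[MAX_NB_BYTES]; \end_layout
\begin_layout Plain Layout   spx_int32_t frame_size, quality = 8; \end_layout
\begin_layout Plain Layout   int nbBytes; \end_layout
\begin_layout Plain Layout   SpeexBits bits; \end_layout
\begin_layout Plain Layout   void *enc_state; \end_layout
\begin_layout Plain Layout \end_layout
\begin_layout Plain Layout   /* Create the bit-packing struct and a narrowband encoder state */ \end_layout
\begin_layout Plain Layout   speex_bits_init(&bits); \end_layout
\begin_layout Plain Layout   enc_state = speex_encoder_init(&speex_nb_mode); \end_layout
\begin_layout Plain Layout   speex_encoder_ctl(enc_state, SPEEX_GET_FRAME_SIZE, &frame_size); \end_layout
\begin_layout Plain Layout   speex_encoder_ctl(enc_state, SPEEX_SET_QUALITY, &quality); \end_layout
\begin_layout Plain Layout \end_layout
\begin_layout Plain Layout   /* Encode one full frame at a time; a partial last frame is dropped */ \end_layout
\begin_layout Plain Layout   while (frame_size <= FRAME_MAX && \end_layout
\begin_layout Plain Layout          fread(input_frame, sizeof(spx_int16_t), frame_size, stdin) \end_layout
\begin_layout Plain Layout              == (size_t)frame_size) { \end_layout
\begin_layout Plain Layout     speex_bits_reset(&bits); \end_layout
\begin_layout Plain Layout     speex_encode_int(enc_state, input_frame, &bits); \end_layout
\begin_layout Plain Layout     nbBytes = speex_bits_write(&bits, byte_ptr, MAX_NB_BYTES); \end_layout
\begin_layout Plain Layout     /* Assumed framing: byte count first, then the encoded frame */ \end_layout
\begin_layout Plain Layout     fwrite(&nbBytes, sizeof(int), 1, stdout); \end_layout
\begin_layout Plain Layout     fwrite(byte_ptr, 1, nbBytes, stdout); \end_layout
\begin_layout Plain Layout   } \end_layout
\begin_layout Plain Layout \end_layout
\begin_layout Plain Layout   /* Release the bit-packing struct and the encoder state */ \end_layout
\begin_layout Plain Layout   speex_bits_destroy(&bits); \end_layout
\begin_layout Plain Layout   speex_encoder_destroy(enc_state); \end_layout
\begin_layout Plain Layout   return 0; \end_layout
\begin_layout Plain Layout } \end_layout
\end_inset
On a UNIX system, such a program would be linked with -lspeex -lm, as noted at the start of this chapter.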
\end_layout \begin_layout Standard It is still possible to use the \emph on speex_encode() \emph default function, which takes a \emph on (float *) \emph default for the audio. However, this would make an eventual port to an FPU-less platform (like ARM) more complicated. Internally, \emph on speex_encode() \emph default and \emph on speex_encode_int() \emph default are processed in the same way. Whether the encoder uses the fixed-point version is decided only by the compile-time flags, not at the API level. \end_layout \begin_layout Standard After you're done with the encoding, free all resources with: \end_layout \begin_layout Standard \begin_inset listings inline false status open \begin_layout Plain Layout speex_bits_destroy(&bits); \end_layout \begin_layout Plain Layout speex_encoder_destroy(enc_state); \end_layout \end_inset \end_layout \begin_layout Standard That's about it for the encoder. \end_layout \begin_layout Section Decoding \begin_inset CommandInset label LatexCommand label name "sub:Decoding" \end_inset \end_layout \begin_layout Standard In order to decode speech using Speex, you first need to: \begin_inset listings inline false status open \begin_layout Plain Layout #include <speex/speex.h> \end_layout \end_inset You also need to declare a Speex bit-packing struct \begin_inset listings inline false status open \begin_layout Plain Layout SpeexBits bits; \end_layout \end_inset and a Speex decoder state \begin_inset listings inline false status open \begin_layout Plain Layout void *dec_state; \end_layout \end_inset The two are initialized by: \begin_inset listings inline false status open \begin_layout Plain Layout speex_bits_init(&bits); \end_layout \begin_layout Plain Layout dec_state = speex_decoder_init(&speex_nb_mode); \end_layout \end_inset \end_layout \begin_layout Standard For wideband decoding, \emph on speex_nb_mode \emph default is replaced by \emph on speex_wb_mode \emph default .
If you need to obtain the size of the frames that will be used by the decoder, you can get that value in the \emph on frame_size \emph default variable (expressed in \series bold samples \series default , not bytes) with: \end_layout \begin_layout Standard \begin_inset listings inline false status open \begin_layout Plain Layout speex_decoder_ctl(dec_state, SPEEX_GET_FRAME_SIZE, &frame_size); \end_layout \end_inset \end_layout \begin_layout Standard There is also a parameter that can be set for the decoder: whether or not to use a perceptual enhancer. This can be set by: \end_layout \begin_layout Standard \begin_inset listings inline false status open \begin_layout Plain Layout speex_decoder_ctl(dec_state, SPEEX_SET_ENH, &enh); \end_layout \end_inset \end_layout \begin_layout Standard where \emph on enh \emph default is an int with value 0 to have the enhancer disabled and 1 to have it enabled. As of 1.2-beta1, the default is now to enable the enhancer. \end_layout \begin_layout Standard Again, once the decoder initialization is done, for every input frame: \end_layout \begin_layout Standard \begin_inset listings inline false status open \begin_layout Plain Layout speex_bits_read_from(&bits, input_bytes, nbBytes); \end_layout \begin_layout Plain Layout speex_decode_int(dec_state, &bits, output_frame); \end_layout \end_inset where input_bytes is a \emph on (char *) \emph default containing the bit-stream data received for a frame, \emph on nbBytes \emph default is the size (in bytes) of that bit-stream, and \emph on output_frame \emph default is a \emph on (short *) \emph default and points to the area where the decoded speech frame will be written. A NULL value as the second argument indicates that we don't have the bits for the current frame. When a frame is lost, the Speex decoder will do its best to "guess" the correct signal. 
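As a minimal sketch (not the reference implementation), the decoding steps above can be assembled into a small program. It assumes narrowband mode and a made-up input framing in which each encoded frame is preceded by its byte count in native byte order, with a count of zero standing for a lost frame. FRAME_MAX, MAX_NB_BYTES and that framing are illustrative assumptions and not part of Speex:
\begin_inset listings inline false status open
\begin_layout Plain Layout #include <stdio.h> \end_layout
\begin_layout Plain Layout #include <speex/speex.h> \end_layout
\begin_layout Plain Layout \end_layout
\begin_layout Plain Layout #define FRAME_MAX 640     /* assumed upper bound on samples per frame */ \end_layout
\begin_layout Plain Layout #define MAX_NB_BYTES 200  /* assumed size of the input buffer */ \end_layout
\begin_layout Plain Layout \end_layout
\begin_layout Plain Layout int main(void) \end_layout
\begin_layout Plain Layout { \end_layout
\begin_layout Plain Layout   spx_int16_t output_frame[FRAME_MAX]; \end_layout
\begin_layout Plain Layout   char input_bytes[MAX_NB_BYTES]; \end_layout
\begin_layout Plain Layout   spx_int32_t frame_size, enh = 1; \end_layout
\begin_layout Plain Layout   int nbBytes; \end_layout
\begin_layout Plain Layout   SpeexBits bits; \end_layout
\begin_layout Plain Layout   void *dec_state; \end_layout
\begin_layout Plain Layout \end_layout
\begin_layout Plain Layout   /* Create the bit-packing struct and a narrowband decoder state */ \end_layout
\begin_layout Plain Layout   speex_bits_init(&bits); \end_layout
\begin_layout Plain Layout   dec_state = speex_decoder_init(&speex_nb_mode); \end_layout
\begin_layout Plain Layout   speex_decoder_ctl(dec_state, SPEEX_GET_FRAME_SIZE, &frame_size); \end_layout
\begin_layout Plain Layout   speex_decoder_ctl(dec_state, SPEEX_SET_ENH, &enh); \end_layout
\begin_layout Plain Layout \end_layout
\begin_layout Plain Layout   while (frame_size <= FRAME_MAX && \end_layout
\begin_layout Plain Layout          fread(&nbBytes, sizeof(int), 1, stdin) == 1) { \end_layout
\begin_layout Plain Layout     if (nbBytes < 0 || nbBytes > MAX_NB_BYTES) \end_layout
\begin_layout Plain Layout       break;  /* malformed input for this sketch */ \end_layout
\begin_layout Plain Layout     if (nbBytes == 0) { \end_layout
\begin_layout Plain Layout       /* Assumed lost-frame marker: NULL lets the decoder guess the signal */ \end_layout
\begin_layout Plain Layout       speex_decode_int(dec_state, NULL, output_frame); \end_layout
\begin_layout Plain Layout     } else { \end_layout
\begin_layout Plain Layout       if (fread(input_bytes, 1, nbBytes, stdin) != (size_t)nbBytes) \end_layout
\begin_layout Plain Layout         break;  /* truncated input */ \end_layout
\begin_layout Plain Layout       speex_bits_read_from(&bits, input_bytes, nbBytes); \end_layout
\begin_layout Plain Layout       speex_decode_int(dec_state, &bits, output_frame); \end_layout
\begin_layout Plain Layout     } \end_layout
\begin_layout Plain Layout     /* Write the decoded 16-bit samples as raw PCM */ \end_layout
\begin_layout Plain Layout     fwrite(output_frame, sizeof(spx_int16_t), frame_size, stdout); \end_layout
\begin_layout Plain Layout   } \end_layout
\begin_layout Plain Layout \end_layout
\begin_layout Plain Layout   /* Release the bit-packing struct and the decoder state */ \end_layout
\begin_layout Plain Layout   speex_bits_destroy(&bits); \end_layout
\begin_layout Plain Layout   speex_decoder_destroy(dec_state); \end_layout
\begin_layout Plain Layout   return 0; \end_layout
\begin_layout Plain Layout } \end_layout
\end_inset
In a real application the packet boundaries and loss information would normally come from the transport or container rather than from a byte count in the stream.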
\end_layout \begin_layout Standard As for the encoder, the \emph on speex_decode() \emph default function can still be used, with a \emph on (float *) \emph default as the output for the audio. After you're done with the decoding, free all resources with: \end_layout \begin_layout Standard \begin_inset listings inline false status open \begin_layout Plain Layout speex_bits_destroy(&bits); \end_layout \begin_layout Plain Layout speex_decoder_destroy(dec_state); \end_layout \end_inset \end_layout \begin_layout Section Codec Options (speex_*_ctl) \begin_inset CommandInset label LatexCommand label name "sub:Codec-Options" \end_inset \end_layout \begin_layout Quote \align center \emph on Entities should not be multiplied beyond necessity -- William of Ockham. \end_layout \begin_layout Quote \align center \emph on Just because there's an option for it doesn't mean you have to turn it on -- me. \end_layout \begin_layout Standard The Speex encoder and decoder support many options and requests that can be accessed through the \emph on speex_encoder_ctl \emph default and \emph on speex_decoder_ctl \emph default functions. These functions are similar to the \emph on ioctl \emph default system call and their prototypes are: \end_layout \begin_layout Standard \begin_inset listings inline false status open \begin_layout Plain Layout void speex_encoder_ctl(void *encoder, int request, void *ptr); \end_layout \begin_layout Plain Layout void speex_decoder_ctl(void *decoder, int request, void *ptr); \end_layout \end_inset \end_layout \begin_layout Standard Although many options are available through these functions, the defaults are good for most applications and \series bold optional settings should only be used when one understands them and knows that they are needed \series default . A common error is to attempt to set many unnecessary settings. \end_layout \begin_layout Standard Here is a list of the values allowed for the requests. Some only apply to the encoder or the decoder.
Because the last argument is of type \begin_inset listings inline true status collapsed \begin_layout Plain Layout void * \end_layout \end_inset , the \begin_inset listings inline true status collapsed \begin_layout Plain Layout _ctl() \end_layout \end_inset functions are \series bold not type safe \series default , and should thus be used with care. The type \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset is the same as the C99 \begin_inset listings inline true status collapsed \begin_layout Plain Layout int32_t \end_layout \end_inset type. \end_layout \begin_layout Description SPEEX_SET_ENH \begin_inset Formula $\ddagger$ \end_inset Set perceptual enhancer \begin_inset Index status collapsed \begin_layout Plain Layout perceptual enhancement \end_layout \end_inset to on (1) or off (0) ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset , default is on) \end_layout \begin_layout Description SPEEX_GET_ENH \begin_inset Formula $\ddagger$ \end_inset Get perceptual enhancer status ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset ) \end_layout \begin_layout Description SPEEX_GET_FRAME_SIZE Get the number of samples per frame for the current mode ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset ) \end_layout \begin_layout Description SPEEX_SET_QUALITY \begin_inset Formula $\dagger$ \end_inset Set the encoder speech quality ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset from 0 to 10, default is 8) \end_layout \begin_layout Description SPEEX_GET_QUALITY \begin_inset Formula $\dagger$ \end_inset Get the current encoder speech quality ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset from 
0 to 10) \end_layout \begin_layout Description SPEEX_SET_MODE \begin_inset Formula $\dagger$ \end_inset Set the mode number, as specified in the RTP spec ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset ) \end_layout \begin_layout Description SPEEX_GET_MODE \begin_inset Formula $\dagger$ \end_inset Get the current mode number, as specified in the RTP spec ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset ) \end_layout \begin_layout Description SPEEX_SET_VBR \begin_inset Formula $\dagger$ \end_inset Set variable bit-rate (VBR) to on (1) or off (0) ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset , default is off) \end_layout \begin_layout Description SPEEX_GET_VBR \begin_inset Formula $\dagger$ \end_inset Get variable bit-rate \begin_inset Index status collapsed \begin_layout Plain Layout variable bit-rate \end_layout \end_inset (VBR) status ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset ) \end_layout \begin_layout Description SPEEX_SET_VBR_QUALITY \begin_inset Formula $\dagger$ \end_inset Set the encoder VBR speech quality (float 0.0 to 10.0, default is 8.0) \end_layout \begin_layout Description SPEEX_GET_VBR_QUALITY \begin_inset Formula $\dagger$ \end_inset Get the current encoder VBR speech quality (float 0 to 10) \end_layout \begin_layout Description SPEEX_SET_COMPLEXITY \begin_inset Formula $\dagger$ \end_inset Set the CPU resources allowed for the encoder ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset from 1 to 10, default is 2) \end_layout \begin_layout Description SPEEX_GET_COMPLEXITY \begin_inset Formula $\dagger$ \end_inset Get the CPU resources allowed for the encoder ( \begin_inset listings inline true status collapsed 
\begin_layout Plain Layout spx_int32_t \end_layout \end_inset from 1 to 10, default is 2) \end_layout \begin_layout Description SPEEX_SET_BITRATE \begin_inset Formula $\dagger$ \end_inset Set the bit-rate to use the closest value not exceeding the parameter ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset in bits per second) \end_layout \begin_layout Description SPEEX_GET_BITRATE Get the current bit-rate in use ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset in bits per second) \end_layout \begin_layout Description SPEEX_SET_SAMPLING_RATE Set real sampling rate ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset in Hz) \end_layout \begin_layout Description SPEEX_GET_SAMPLING_RATE Get real sampling rate ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset in Hz) \end_layout \begin_layout Description SPEEX_RESET_STATE Reset the encoder/decoder state to its original state, clearing all memories (no argument) \end_layout \begin_layout Description SPEEX_SET_VAD \begin_inset Formula $\dagger$ \end_inset Set voice activity detection \begin_inset Index status collapsed \begin_layout Plain Layout voice activity detection \end_layout \end_inset (VAD) to on (1) or off (0) ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset , default is off) \end_layout \begin_layout Description SPEEX_GET_VAD \begin_inset Formula $\dagger$ \end_inset Get voice activity detection (VAD) status ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset ) \end_layout \begin_layout Description SPEEX_SET_DTX \begin_inset Formula $\dagger$ \end_inset Set discontinuous transmission \begin_inset Index status collapsed \begin_layout 
Plain Layout discontinuous transmission \end_layout \end_inset (DTX) to on (1) or off (0) ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset , default is off) \end_layout \begin_layout Description SPEEX_GET_DTX \begin_inset Formula $\dagger$ \end_inset Get discontinuous transmission (DTX) status ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset ) \end_layout \begin_layout Description SPEEX_SET_ABR \begin_inset Formula $\dagger$ \end_inset Set average bit-rate \begin_inset Index status collapsed \begin_layout Plain Layout average bit-rate \end_layout \end_inset (ABR) to a value n in bits per second ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset in bits per second) \end_layout \begin_layout Description SPEEX_GET_ABR \begin_inset Formula $\dagger$ \end_inset Get average bit-rate (ABR) setting ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset in bits per second) \end_layout \begin_layout Description SPEEX_SET_PLC_TUNING \begin_inset Formula $\dagger$ \end_inset Tell the encoder to optimize encoding for a certain percentage of packet loss ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset in percent) \end_layout \begin_layout Description SPEEX_GET_PLC_TUNING \begin_inset Formula $\dagger$ \end_inset Get the current tuning of the encoder for PLC ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset in percent) \end_layout \begin_layout Description SPEEX_GET_LOOKAHEAD Returns the lookahead used by Speex separately for an encoder and a decoder. Sum encoder and decoder lookahead values to get the total codec lookahead. 
\end_layout \begin_layout Description SPEEX_SET_VBR_MAX_BITRATE \begin_inset Formula $\dagger$ \end_inset Set the maximum bit-rate allowed in VBR operation ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset in bits per second) \end_layout \begin_layout Description SPEEX_GET_VBR_MAX_BITRATE \begin_inset Formula $\dagger$ \end_inset Get the current maximum bit-rate allowed in VBR operation ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset in bits per second) \end_layout \begin_layout Description SPEEX_SET_HIGHPASS Set the high-pass filter on (1) or off (0) ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset , default is on) \end_layout \begin_layout Description SPEEX_GET_HIGHPASS Get the current high-pass filter status ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset ) \end_layout \begin_layout Description \begin_inset Formula $\dagger$ \end_inset applies only to the encoder \end_layout \begin_layout Description \begin_inset Formula $\ddagger$ \end_inset applies only to the decoder \end_layout \begin_layout Section Mode queries \begin_inset CommandInset label LatexCommand label name "sub:Mode-queries" \end_inset \end_layout \begin_layout Standard Speex modes have a query system similar to the speex_encoder_ctl and speex_decoder_ctl calls. Since modes are read-only, it is only possible to get information about a particular mode.
The function used to do that is: \begin_inset listings inline false status open \begin_layout Plain Layout void speex_mode_query(SpeexMode *mode, int request, void *ptr); \end_layout \end_inset The admissible values for request are (unless otherwise noted, the values are returned through \emph on ptr \emph default ): \end_layout \begin_layout Description SPEEX_MODE_FRAME_SIZE Get the frame size (in samples) for the mode \end_layout \begin_layout Description SPEEX_SUBMODE_BITRATE Get the bit-rate for a submode number specified through \emph on ptr \emph default (integer in bps). \end_layout \begin_layout Section Packing and in-band signalling \begin_inset Index status collapsed \begin_layout Plain Layout in-band signalling \end_layout \end_inset \end_layout \begin_layout Standard Sometimes it is desirable to pack more than one frame per packet (or other basic unit of storage). The proper way to do it is to call speex_encode \begin_inset Formula $N$ \end_inset times before writing the stream with speex_bits_write. In cases where the number of frames is not determined by an out-of-band mechanism, it is possible to include a terminator code. That terminator consists of the code 15 (decimal) encoded with 5 bits, as shown in Table \begin_inset CommandInset ref LatexCommand ref reference "cap:quality_vs_bps" \end_inset . Note that as of version 1.0.2, calling speex_bits_write automatically inserts the terminator so as to fill the last byte. This does not involve any overhead and makes sure Speex can always detect when there are no more frames in a packet. \end_layout \begin_layout Standard It is also possible to send in-band \begin_inset Quotes eld \end_inset messages \begin_inset Quotes erd \end_inset to the other side. All these messages are encoded as \begin_inset Quotes eld \end_inset pseudo-frames \begin_inset Quotes erd \end_inset of mode 14, which contain a 4-bit message type code, followed by the message.
Table \begin_inset CommandInset ref LatexCommand ref reference "cap:In-band-signalling-codes" \end_inset lists the available codes, their meaning and the size of the message that follows. Most of these messages are requests that are sent to the encoder or decoder on the other end, which is free to comply or ignore them. By default, all in-band messages are ignored. \end_layout \begin_layout Standard \begin_inset Float table placement htbp wide false sideways false status open \begin_layout Plain Layout \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash begin{center} \end_layout \end_inset \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout Code \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Size (bits) \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Content \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Asks decoder to set perceptual enhancement off (0) or on(1) \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Asks (if 1) the encoder to be less \begin_inset Quotes eld \end_inset aggressive \begin_inset Quotes erd \end_inset due to high packet loss \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 2 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Asks encoder to switch to mode N \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 3 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Asks encoder to switch to mode N for low-band \end_layout \end_inset \begin_inset Text 
\begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Asks encoder to switch to mode N for high-band \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 5 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Asks encoder to switch to quality N for VBR \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 6 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Request acknowledge (0=no, 1=all, 2=only for in-band data) \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 7 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Asks encoder to set CBR (0), VAD(1), DTX(3), VBR(5), VBR+DTX(7) \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 8 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 8 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Transmit (8-bit) character to the other end \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 9 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 8 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Intensity stereo information \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 10 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 16 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Announce maximum bit-rate acceptable (N in bytes/second) \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 11 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 16 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout reserved \end_layout 
\end_inset \begin_inset Text \begin_layout Plain Layout 12 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 32 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Acknowledge receiving packet N \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 13 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 32 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout reserved \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 14 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 64 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout reserved \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 15 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 64 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout reserved \end_layout \end_inset \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash end{center} \end_layout \end_inset \end_layout \begin_layout Plain Layout \begin_inset Caption \begin_layout Plain Layout In-band signalling codes \begin_inset CommandInset label LatexCommand label name "cap:In-band-signalling-codes" \end_inset \end_layout \end_inset \end_layout \end_inset \end_layout \begin_layout Standard Finally, applications may define custom in-band messages using mode 13. The size of the message in bytes is encoded with 5 bits, so that the decoder can skip it if it doesn't know how to interpret it. \begin_inset Newpage newpage \end_inset \end_layout \begin_layout Chapter Speech Processing API ( \emph on libspeexdsp \emph default ) \end_layout \begin_layout Standard As of version 1.2beta3, the non-codec parts of the Speex package are now in a separate library called \emph on libspeexdsp \emph default . This library includes the preprocessor, the acoustic echo canceller, the jitter buffer, and the resampler. 
In a UNIX environment, it can be linked into a program by adding \emph on -lspeexdsp -lm \emph default to the compiler command line. Just as with libspeex, \series bold libspeexdsp calls are reentrant, but not thread-safe \series default . That means that it is fine to make calls from many threads, but \series bold calls using the same state from multiple threads must be protected by mutexes \series default . \end_layout \begin_layout Section Preprocessor \begin_inset CommandInset label LatexCommand label name "sub:Preprocessor" \end_inset \end_layout \begin_layout Standard \noindent In order to use the Speex preprocessor \begin_inset Index status collapsed \begin_layout Plain Layout preprocessor \end_layout \end_inset , you first need to: \begin_inset listings inline false status open \begin_layout Plain Layout #include <speex/speex_preprocess.h> \end_layout \end_inset \end_layout \begin_layout Standard \noindent Then, a preprocessor state can be created as: \begin_inset listings inline false status open \begin_layout Plain Layout SpeexPreprocessState *preprocess_state = speex_preprocess_state_init(frame_size, sampling_rate); \end_layout \end_inset \end_layout \begin_layout Standard \noindent It is recommended to use the same value for \family typewriter frame_size \family default as is used by the encoder (20 \emph on ms \emph default ). \end_layout \begin_layout Standard For each input frame, you need to call: \end_layout \begin_layout Standard \begin_inset listings inline false status open \begin_layout Plain Layout speex_preprocess_run(preprocess_state, audio_frame); \end_layout \end_inset \end_layout \begin_layout Standard \noindent where \family typewriter audio_frame \family default is used both as input and output.
In cases where the output audio is not useful for a certain frame, it is possible to use instead: \end_layout \begin_layout Standard \begin_inset listings inline false status open \begin_layout Plain Layout speex_preprocess_estimate_update(preprocess_state, audio_frame); \end_layout \end_inset \end_layout \begin_layout Standard \noindent This call will update all the preprocessor internal state variables without computing the output audio, thus saving some CPU cycles. \end_layout \begin_layout Standard The behaviour of the preprocessor can be changed using: \end_layout \begin_layout Standard \begin_inset listings inline false status open \begin_layout Plain Layout speex_preprocess_ctl(preprocess_state, request, ptr); \end_layout \end_inset \end_layout \begin_layout Standard \noindent which is used in the same way as the encoder and decoder equivalent. Options are listed in Section \begin_inset CommandInset ref LatexCommand ref reference "sub:Preprocessor-options" \end_inset . \end_layout \begin_layout Standard The preprocessor state can be destroyed using: \end_layout \begin_layout Standard \begin_inset listings inline false status open \begin_layout Plain Layout speex_preprocess_state_destroy(preprocess_state); \end_layout \end_inset \end_layout \begin_layout Subsection Preprocessor options \begin_inset CommandInset label LatexCommand label name "sub:Preprocessor-options" \end_inset \end_layout \begin_layout Standard As with the codec, the preprocessor also has options that can be controlled using an ioctl()-like call. 
The available options are: \end_layout \begin_layout Description SPEEX_PREPROCESS_SET_DENOISE Turns denoising on(1) or off(0) ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset ) \end_layout \begin_layout Description SPEEX_PREPROCESS_GET_DENOISE Get denoising status ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset ) \end_layout \begin_layout Description SPEEX_PREPROCESS_SET_AGC Turns automatic gain control (AGC) on(1) or off(0) ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset ) \end_layout \begin_layout Description SPEEX_PREPROCESS_GET_AGC Get AGC status ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset ) \end_layout \begin_layout Description SPEEX_PREPROCESS_SET_VAD Turns voice activity detector (VAD) on(1) or off(0) ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset ) \end_layout \begin_layout Description SPEEX_PREPROCESS_GET_VAD Get VAD status ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset ) \end_layout \begin_layout Description SPEEX_PREPROCESS_SET_AGC_LEVEL \end_layout \begin_layout Description SPEEX_PREPROCESS_GET_AGC_LEVEL \end_layout \begin_layout Description SPEEX_PREPROCESS_SET_DEREVERB Turns reverberation removal on(1) or off(0) ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset ) \end_layout \begin_layout Description SPEEX_PREPROCESS_GET_DEREVERB Get reverberation removal status ( \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset ) \end_layout \begin_layout Description SPEEX_PREPROCESS_SET_DEREVERB_LEVEL Not working yet, do not use \end_layout 
\begin_layout Description SPEEX_PREPROCESS_GET_DEREVERB_LEVEL Not working yet, do not use \end_layout \begin_layout Description SPEEX_PREPROCESS_SET_DEREVERB_DECAY Not working yet, do not use \end_layout \begin_layout Description SPEEX_PREPROCESS_GET_DEREVERB_DECAY Not working yet, do not use \end_layout \begin_layout Description SPEEX_PREPROCESS_SET_PROB_START \end_layout \begin_layout Description SPEEX_PREPROCESS_GET_PROB_START \end_layout \begin_layout Description SPEEX_PREPROCESS_SET_PROB_CONTINUE \end_layout \begin_layout Description SPEEX_PREPROCESS_GET_PROB_CONTINUE \end_layout \begin_layout Description SPEEX_PREPROCESS_SET_NOISE_SUPPRESS Set maximum attenuation of the noise in dB (negative \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset ) \end_layout \begin_layout Description SPEEX_PREPROCESS_GET_NOISE_SUPPRESS Get maximum attenuation of the noise in dB (negative \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset ) \end_layout \begin_layout Description SPEEX_PREPROCESS_SET_ECHO_SUPPRESS Set maximum attenuation of the residual echo in dB (negative \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset ) \end_layout \begin_layout Description SPEEX_PREPROCESS_GET_ECHO_SUPPRESS Get maximum attenuation of the residual echo in dB (negative \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset ) \end_layout \begin_layout Description SPEEX_PREPROCESS_SET_ECHO_SUPPRESS_ACTIVE Set maximum attenuation of the echo in dB when near end is active (negative \begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset ) \end_layout \begin_layout Description SPEEX_PREPROCESS_GET_ECHO_SUPPRESS_ACTIVE Get maximum attenuation of the echo in dB when near end is active (negative 
\begin_inset listings inline true status collapsed \begin_layout Plain Layout spx_int32_t \end_layout \end_inset ) \end_layout \begin_layout Description SPEEX_PREPROCESS_SET_ECHO_STATE Set the associated echo canceller for residual echo suppression (pointer or NULL for no residual echo suppression) \end_layout \begin_layout Description SPEEX_PREPROCESS_GET_ECHO_STATE Get the associated echo canceller (pointer) \end_layout \begin_layout Section Echo Cancellation \begin_inset CommandInset label LatexCommand label name "sub:Echo-Cancellation" \end_inset \end_layout \begin_layout Standard The Speex library now includes an echo cancellation \begin_inset Index status collapsed \begin_layout Plain Layout echo cancellation \end_layout \end_inset algorithm suitable for Acoustic Echo Cancellation \begin_inset Index status collapsed \begin_layout Plain Layout acoustic echo cancellation \end_layout \end_inset (AEC). In order to use the echo canceller, you first need to: \end_layout \begin_layout Standard \begin_inset listings inline false status open \begin_layout Plain Layout #include <speex/speex_echo.h> \end_layout \end_inset \end_layout \begin_layout Standard Then, an echo canceller state can be created by: \end_layout \begin_layout Standard \begin_inset listings inline false status open \begin_layout Plain Layout SpeexEchoState *echo_state = speex_echo_state_init(frame_size, filter_length); \end_layout \end_inset \end_layout \begin_layout Standard where \family typewriter frame_size \family default is the amount of data (in samples) you want to process at once and \family typewriter filter_length \family default is the length (in samples) of the echo cancelling filter you want to use (also known as \shape italic tail length \shape default \begin_inset Index status collapsed \begin_layout Plain Layout tail length \end_layout \end_inset ).
It is recommended to use a frame size on the order of 20 ms (or equal to the codec frame size) and make sure it is easy to perform an FFT of that size (powers of two are better than prime sizes). The recommended tail length is approximately one third of the room reverberation time. For example, in a small room, reverberation time is on the order of 300 ms, so a tail length of 100 ms is a good choice (800 samples at 8000 Hz sampling rate). \end_layout \begin_layout Standard Once the echo canceller state is created, audio can be processed by: \end_layout \begin_layout Standard \begin_inset listings inline false status open \begin_layout Plain Layout speex_echo_cancellation(echo_state, input_frame, echo_frame, output_frame); \end_layout \end_inset \end_layout \begin_layout Standard where \family typewriter input_frame \family default is the audio as captured by the microphone, \family typewriter echo_frame \family default is the signal that was played in the speaker (and needs to be removed) and \family typewriter output_frame \family default is the signal with echo removed. \end_layout \begin_layout Standard One important thing to keep in mind is the relationship between \family typewriter input_frame \family default and \family typewriter echo_frame \family default . It is important that, at any time, any echo that is present in the input has already been sent to the echo canceller as \family typewriter echo_frame \family default . In other words, the echo canceller cannot remove a signal that it hasn't yet received. On the other hand, the delay between the input signal and the echo signal must be small, because otherwise part of the echo cancellation filter is wasted.
In the ideal case, your code would look like: \begin_inset listings lstparams "breaklines=true" inline false status open \begin_layout Plain Layout write_to_soundcard(echo_frame, frame_size); \end_layout \begin_layout Plain Layout read_from_soundcard(input_frame, frame_size); \end_layout \begin_layout Plain Layout speex_echo_cancellation(echo_state, input_frame, echo_frame, output_frame); \end_layout \end_inset \end_layout \begin_layout Standard If you wish to further reduce the echo present in the signal, you can do so by associating the echo canceller with the preprocessor (see Section \begin_inset CommandInset ref LatexCommand ref reference "sub:Preprocessor" \end_inset ). This is done by calling: \begin_inset listings lstparams "breaklines=true" inline false status open \begin_layout Plain Layout speex_preprocess_ctl(preprocess_state, SPEEX_PREPROCESS_SET_ECHO_STATE, echo_state); \end_layout \end_inset during initialisation. \end_layout \begin_layout Standard As of version 1.2-beta2, there is an alternative, simpler API that can be used instead of \emph on speex_echo_cancellation() \emph default . When audio capture and playback are handled asynchronously (e.g. in different threads or using the \emph on poll() \emph default or \emph on select() \emph default system call), it can be difficult to keep track of which input_frame goes with which echo_frame. Instead, the playback context/thread can simply call: \end_layout \begin_layout Standard \begin_inset listings inline false status open \begin_layout Plain Layout speex_echo_playback(echo_state, echo_frame); \end_layout \end_inset \end_layout \begin_layout Standard every time an audio frame is played. Then, the capture context/thread calls: \end_layout \begin_layout Standard \begin_inset listings inline false status open \begin_layout Plain Layout speex_echo_capture(echo_state, input_frame, output_frame); \end_layout \end_inset \end_layout \begin_layout Standard for every frame captured.
Internally, \emph on speex_echo_playback() \emph default simply buffers the playback frame so it can be used by \emph on speex_echo_capture() \emph default to call \emph on speex_echo_cancellation() \emph default . A side effect of using this alternate API is that the playback audio is delayed by two frames, which is the normal delay caused by the soundcard. When capture and playback are already synchronised, \emph on speex_echo_cancellation() \emph default is preferable since it gives better control over the exact input/echo timing. \end_layout \begin_layout Standard The echo cancellation state can be destroyed with: \end_layout \begin_layout Standard \begin_inset listings inline false status open \begin_layout Plain Layout speex_echo_state_destroy(echo_state); \end_layout \end_inset \end_layout \begin_layout Standard It is also possible to reset the state of the echo canceller, so that it can be reused without creating another state, with: \end_layout \begin_layout Standard \begin_inset listings inline false status open \begin_layout Plain Layout speex_echo_state_reset(echo_state); \end_layout \end_inset \end_layout \begin_layout Subsection Troubleshooting \end_layout \begin_layout Standard There are several things that may prevent the echo canceller from working properly. One of them is a bug (or something suboptimal) in the code, but there are many others you should consider first: \end_layout \begin_layout Itemize Using a different soundcard to do the capture and playback will \series bold not \series default work, regardless of what you may think. The only exception to that is if the two cards can be made to have their sampling clock \begin_inset Quotes eld \end_inset locked \begin_inset Quotes erd \end_inset to the same clock source. If not, the clocks will always have a small amount of drift, which will prevent the echo canceller from adapting. \end_layout \begin_layout Itemize The delay between the record and playback signals must be minimal.
Any signal played has to \begin_inset Quotes eld \end_inset appear \begin_inset Quotes erd \end_inset on the playback (far end) signal slightly before the echo canceller \begin_inset Quotes eld \end_inset sees \begin_inset Quotes erd \end_inset it in the near end signal, but excessive delay means that part of the filter length is wasted. In the worst case, the delay is longer than the filter length, and no echo can be cancelled. \end_layout \begin_layout Itemize When it comes to echo tail length (filter length), longer is \series bold not \series default better. Actually, the longer the tail length, the longer it takes for the filter to adapt. Of course, a tail length that is too short will not cancel enough echo, but the most common problem seen is that people set a very long tail length and then wonder why no echo is being cancelled. \end_layout \begin_layout Itemize Non-linear distortion cannot (by definition) be modeled by the linear adaptive filter used in the echo canceller and thus cannot be cancelled. Use good audio gear and avoid saturation/clipping. \end_layout \begin_layout Standard It is also useful to read \emph on Echo Cancellation Demystified \emph default by Alexey Frunze \begin_inset Foot status collapsed \begin_layout Plain Layout http://www.embeddedstar.com/articles/2003/7/article20030720-1.html \end_layout \end_inset , which explains the fundamental principles of echo cancellation. The details of the algorithm described in the article are different, but the general ideas of echo cancellation through adaptive filters are the same. \end_layout \begin_layout Standard As of version 1.2beta2, a new \family typewriter echo_diagnostic.m \family default tool is included in the source distribution. The first step is to define DUMP_ECHO_CANCEL_DATA during the build. This causes the echo canceller to automatically save the near-end, far-end and output signals to files (aec_rec.sw, aec_play.sw and aec_out.sw).
These are exactly what the AEC receives and outputs. From there, it is necessary to start Octave and type: \end_layout \begin_layout Standard \begin_inset listings lstparams "language=Matlab" inline false status open \begin_layout Plain Layout echo_diagnostic('aec_rec.sw', 'aec_play.sw', 'aec_diagnostic.sw', 1024); \end_layout \end_inset \end_layout \begin_layout Standard The value of 1024 is the filter length and can be changed. Some (hopefully) useful messages will be printed and the echo-cancelled audio will be saved to aec_diagnostic.sw . If even that output is bad (almost no cancellation) then there is probably a problem with the playback or recording process. \end_layout \begin_layout Section Jitter Buffer \end_layout \begin_layout Standard The jitter buffer can be enabled by including: \begin_inset listings lstparams "breaklines=true" inline false status open \begin_layout Plain Layout #include <speex/speex_jitter.h> \end_layout \end_inset and a new jitter buffer state can be initialised by: \end_layout \begin_layout Standard \begin_inset listings lstparams "breaklines=true" inline false status open \begin_layout Plain Layout JitterBuffer *state = jitter_buffer_init(step); \end_layout \end_inset \end_layout \begin_layout Standard where the \begin_inset listings inline true status collapsed \begin_layout Plain Layout step \end_layout \end_inset argument is the default time step (in timestamp units) used for adjusting the delay and doing concealment. A value of 1 is always correct, but higher values may be more convenient sometimes. For example, if you are only able to do concealment on 20 ms frames, there is no point in the jitter buffer asking you to do it on one sample. Another example is that for video, it makes no sense to adjust the delay by less than a full frame. The value provided can always be changed at a later time.
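\end_layout \begin_layout Standard As a rough illustration of picking the step for the 20 ms concealment case above, the sketch below converts a frame duration into timestamp units. It assumes that one timestamp unit corresponds to one audio sample (as is the case for RTP audio timestamps); the helper name ms_to_timestamp_units is hypothetical and is not part of libspeexdsp.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a libspeexdsp function): convert a frame
 * duration in milliseconds into timestamp units, ASSUMING one timestamp
 * unit per audio sample. */
static uint32_t ms_to_timestamp_units(uint32_t sampling_rate_hz, uint32_t frame_ms)
{
    assert(sampling_rate_hz > 0);
    /* A 64-bit intermediate avoids overflow for large rates or durations. */
    return (uint32_t)(((uint64_t)sampling_rate_hz * frame_ms) / 1000u);
}
```

Under that assumption, 20 ms at an 8000 Hz sampling rate gives 160 units, which could then be passed as the step argument of jitter_buffer_init().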
\end_layout \begin_layout Standard The jitter buffer API is based on the \begin_inset listings inline true status open \begin_layout Plain Layout JitterBufferPacket \end_layout \end_inset type, which is defined as: \begin_inset listings inline false status open \begin_layout Plain Layout typedef struct { \end_layout \begin_layout Plain Layout char *data; /* Data bytes contained in the packet */ \end_layout \begin_layout Plain Layout spx_uint32_t len; /* Length of the packet in bytes */ \end_layout \begin_layout Plain Layout spx_uint32_t timestamp; /* Timestamp for the packet */ \end_layout \begin_layout Plain Layout spx_uint32_t span; /* Time covered by the packet (timestamp units) */ \end_layout \begin_layout Plain Layout } JitterBufferPacket; \end_layout \end_inset \end_layout \begin_layout Standard As an example, for audio the timestamp field would be what is obtained from the RTP timestamp field and the span would be the number of samples that are encoded in the packet. For Speex narrowband, span would be 160 if only one frame is included in the packet. 
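\end_layout \begin_layout Standard The field mapping described above can be sketched as follows. This is only an illustration: ExamplePacket is a local mirror of the JitterBufferPacket fields (declared here so that the snippet compiles without the Speex headers), and make_narrowband_packet is a hypothetical helper, not a library function. Real code would fill in a JitterBufferPacket directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the JitterBufferPacket fields shown above (assumption:
 * spx_uint32_t is a 32-bit unsigned integer). */
typedef struct {
    char *data;          /* Data bytes contained in the packet */
    uint32_t len;        /* Length of the packet in bytes */
    uint32_t timestamp;  /* Timestamp for the packet */
    uint32_t span;       /* Time covered by the packet (timestamp units) */
} ExamplePacket;

/* One Speex narrowband frame covers 160 samples, as stated in the text. */
enum { NB_FRAME_SAMPLES = 160 };

/* Hypothetical helper: describe a payload carrying nb_frames narrowband
 * frames, using the RTP timestamp as the packet timestamp. */
static ExamplePacket make_narrowband_packet(char *payload, uint32_t payload_len,
                                            uint32_t rtp_timestamp, uint32_t nb_frames)
{
    ExamplePacket p;
    assert(payload != NULL && nb_frames > 0);
    p.data = payload;
    p.len = payload_len;
    p.timestamp = rtp_timestamp;
    p.span = nb_frames * NB_FRAME_SAMPLES;  /* samples encoded in the packet */
    return p;
}
```

With two frames per packet, the span becomes 320 and consecutive packets would be expected to have timestamps 320 units apart.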
\end_layout \begin_layout Standard When a packet arrives, it needs to be inserted into the jitter buffer with: \begin_inset listings inline false status open \begin_layout Plain Layout JitterBufferPacket packet; \end_layout \begin_layout Plain Layout /* Fill in each field in the packet struct */ \end_layout \begin_layout Plain Layout jitter_buffer_put(state, &packet); \end_layout \end_inset \end_layout \begin_layout Standard When the decoder is ready to decode a packet, the packet to be decoded can be obtained with: \begin_inset listings inline false status open \begin_layout Plain Layout int start_offset; \end_layout \begin_layout Plain Layout err = jitter_buffer_get(state, &packet, desired_span, &start_offset); \end_layout \end_inset \end_layout \begin_layout Standard If \begin_inset listings inline true status open \begin_layout Plain Layout jitter_buffer_put() \end_layout \end_inset and \begin_inset listings inline true status collapsed \begin_layout Plain Layout jitter_buffer_get() \end_layout \end_inset are called from different threads, then \series bold you need to protect the jitter buffer state with a mutex \series default . \end_layout \begin_layout Standard Because the jitter buffer is designed not to use an explicit timer, it needs to be told about the time explicitly. This is done by calling: \begin_inset listings inline false status open \begin_layout Plain Layout jitter_buffer_tick(state); \end_layout \end_inset \end_layout \begin_layout Standard This needs to be done periodically in the playing thread. This will be the last jitter buffer call before going to sleep (until more data is played back). In some cases, it may be preferable to use: \begin_inset listings inline false status open \begin_layout Plain Layout jitter_buffer_remaining_span(state, remaining); \end_layout \end_inset \end_layout \begin_layout Standard The second argument is used to specify that we are still holding data that has not been written to the playback device.
For instance, if 256 samples were needed by the soundcard (specified by \begin_inset listings inline true status collapsed \begin_layout Plain Layout desired_span \end_layout \end_inset ), but \begin_inset listings inline true status collapsed \begin_layout Plain Layout jitter_buffer_get() \end_layout \end_inset returned 320 samples, we would have \begin_inset listings inline true status open \begin_layout Plain Layout remaining=64 \end_layout \end_inset . \end_layout \begin_layout Section Resampler \end_layout \begin_layout Standard Speex includes a resampling module. To make use of the resampler, it is necessary to include its header file: \end_layout \begin_layout Standard \begin_inset listings inline false status open \begin_layout Plain Layout #include <speex/speex_resampler.h> \end_layout \end_inset \end_layout \begin_layout Standard For each stream that is to be resampled, it is necessary to create a resampler state with: \end_layout \begin_layout Standard \begin_inset listings inline false status open \begin_layout Plain Layout SpeexResamplerState *resampler; \end_layout \begin_layout Plain Layout resampler = speex_resampler_init(nb_channels, input_rate, output_rate, quality, &err); \end_layout \end_inset \end_layout \begin_layout Standard where \begin_inset listings inline true status collapsed \begin_layout Plain Layout nb_channels \end_layout \end_inset is the number of channels that will be used (either interleaved or non-interleaved), \begin_inset listings inline true status collapsed \begin_layout Plain Layout input_rate \end_layout \end_inset is the sampling rate of the input stream, \begin_inset listings inline true status collapsed \begin_layout Plain Layout output_rate \end_layout \end_inset is the sampling rate of the output stream and \begin_inset listings inline true status collapsed \begin_layout Plain Layout quality \end_layout \end_inset is the requested quality setting (0 to 10).
The quality parameter is useful for controlling the quality/complexity/latency tradeoff. Using a higher quality setting means less noise/aliasing, higher complexity and higher latency. Usually, a quality of 3 is acceptable for most desktop uses and quality 10 is mostly recommended for pro audio work. Quality 0 usually has a decent sound (certainly better than using linear interpolation resampling), but artifacts may be heard. \end_layout \begin_layout Standard The actual resampling is performed using: \end_layout \begin_layout Standard \begin_inset listings inline false status open \begin_layout Plain Layout err = speex_resampler_process_int(resampler, channelID, in, &in_length, out, &out_length); \end_layout \end_inset where \begin_inset listings inline true status collapsed \begin_layout Plain Layout channelID \end_layout \end_inset is the ID of the channel to be processed. For a mono stream, use 0. The \emph on in \emph default pointer points to the first sample of the input buffer for the selected channel and \begin_inset listings inline true status collapsed \begin_layout Plain Layout out \end_layout \end_inset points to the first sample of the output. The sizes of the input and output buffers are specified by \begin_inset listings inline true status collapsed \begin_layout Plain Layout in_length \end_layout \end_inset and \begin_inset listings inline true status collapsed \begin_layout Plain Layout out_length \end_layout \end_inset respectively. Upon completion, these values are replaced by the number of samples read and written by the resampler. Unless an error occurs, either all input samples will be read or all output samples will be written to (or both). For floating-point samples, the function \begin_inset listings inline true status open \begin_layout Plain Layout speex_resampler_process_float() \end_layout \end_inset behaves similarly. \end_layout \begin_layout Standard It is also possible to process multiple channels at once.
To do that, you can use speex_resampler_process_interleaved_int() or \begin_inset listings inline true status open \begin_layout Plain Layout speex_resampler_process_interleaved_float() \end_layout \end_inset . The arguments are the same except that there is no \begin_inset listings inline true status collapsed \begin_layout Plain Layout channelID \end_layout \end_inset argument. Note that the \series bold length parameters are per-channel \series default . So if you have 1024 samples for each of 4 channels, you pass 1024 and not 4096. \end_layout \begin_layout Standard The resampler allows changing the quality and input/output sampling frequencies on the fly without glitches. This can be done with calls such as \begin_inset listings inline true status open \begin_layout Plain Layout speex_resampler_set_quality() \end_layout \end_inset and \begin_inset listings inline true status open \begin_layout Plain Layout speex_resampler_set_rate() \end_layout \end_inset . The only side effect is that a new filter will have to be computed, consuming many CPU cycles. \end_layout \begin_layout Standard When resampling a file, it is often desirable to have the output file perfectly synchronised with the input. To do that, you need to call \begin_inset listings inline true status open \begin_layout Plain Layout speex_resampler_skip_zeros() \end_layout \end_inset \series bold before \series default you start processing any samples. For real-time applications (e.g. VoIP), it is not recommended to do that as the first processed frame will be shorter to compensate for the delay (the skipped zeros). Instead, in real-time applications you may want to know how much delay is introduced by the resampler.
This can be done at run-time with the \begin_inset listings inline true status open \begin_layout Plain Layout speex_resampler_get_input_latency() \end_layout \end_inset and \begin_inset listings inline true status open \begin_layout Plain Layout speex_resampler_get_output_latency() \end_layout \end_inset functions. The first function returns the delay measured in samples at the input sampling rate, while the second returns the delay measured in samples at the output sampling rate. \end_layout \begin_layout Standard To destroy a resampler state, just call \begin_inset listings inline true status open \begin_layout Plain Layout speex_resampler_destroy() \end_layout \end_inset . \end_layout \begin_layout Section Ring Buffer \end_layout \begin_layout Standard In some cases, it is necessary to interface components that use different block sizes. For example, it is possible that the soundcard does not support reading/writing in blocks of 20 \begin_inset space ~ \end_inset ms or, sometimes, complicated resampling ratios mean that the blocks don't always have the same duration. In those cases, it is often necessary to buffer a bit of audio using a ring buffer. \end_layout \begin_layout Standard \begin_inset Newpage newpage \end_inset \end_layout \begin_layout Chapter Formats and standards \begin_inset Index status collapsed \begin_layout Plain Layout standards \end_layout \end_inset \begin_inset CommandInset label LatexCommand label name "sec:Formats-and-standards" \end_inset \end_layout \begin_layout Standard Speex can encode speech in both narrowband and wideband and provides different bit-rates. However, not all features need to be supported by a given implementation or device. In order to be called \begin_inset Quotes eld \end_inset Speex compatible \begin_inset Quotes erd \end_inset (whatever that means), an implementation must implement at least a basic set of features. \end_layout \begin_layout Standard At the minimum, all narrowband modes of operation MUST be supported at the decoder.
This includes the decoding of a wideband bit-stream by the narrowband decoder \begin_inset Foot status collapsed \begin_layout Plain Layout The wideband bit-stream contains an embedded narrowband bit-stream which can be decoded alone \end_layout \end_inset . If present, a wideband decoder MUST be able to decode a narrowband stream, and MAY either be able to decode all wideband modes or be able to decode the embedded narrowband part of all modes (which includes ignoring the high-band bits). \end_layout \begin_layout Standard For encoders, at least one narrowband or wideband mode MUST be supported. The main reason why all encoding modes do not have to be supported is that some platforms may not be able to handle the complexity of encoding in some modes. \end_layout \begin_layout Section RTP \begin_inset Index status collapsed \begin_layout Plain Layout RTP \end_layout \end_inset Payload Format \end_layout \begin_layout Standard The RTP payload draft is included in appendix \begin_inset CommandInset ref LatexCommand ref reference "sec:IETF-draft" \end_inset and the latest version is available at \begin_inset Flex URL status collapsed \begin_layout Plain Layout http://www.speex.org/drafts/latest \end_layout \end_inset . This draft has been sent (2003/02/26) to the Internet Engineering Task Force (IETF) and will be discussed at the March 18th meeting in San Francisco. \end_layout \begin_layout Section MIME Type \end_layout \begin_layout Standard For now, you should use the MIME type audio/x-speex for Speex-in-Ogg. We will apply for type \family typewriter audio/speex \family default in the near future. \end_layout \begin_layout Section Ogg \begin_inset Index status collapsed \begin_layout Plain Layout Ogg \end_layout \end_inset file format \end_layout \begin_layout Standard Speex bit-streams can be stored in Ogg files. 
In this case, the first packet of the Ogg file contains the Speex header described in table \begin_inset CommandInset ref LatexCommand ref reference "cap:ogg_speex_header" \end_inset . All integer fields in the headers are stored as little-endian. The \family typewriter speex_string \family default field must contain the string \begin_inset Quotes eld \end_inset \family typewriter Speex \family default \begin_inset space ~ \end_inset \begin_inset space ~ \end_inset \begin_inset space ~ \end_inset \begin_inset Quotes erd \end_inset (with 3 trailing spaces), which identifies the bit-stream. The next field, \family typewriter speex_version \family default , contains the version of Speex that encoded the file. For now, refer to speex_header.[ch] for more info. The \emph on beginning of stream \emph default ( \family typewriter b_o_s \family default ) flag is set to 1 for the header. The header packet has \family typewriter packetno=0 \family default and \family typewriter granulepos=0 \family default . \end_layout \begin_layout Standard The second packet contains the Speex comment header. The format used is the Vorbis comment format described here: http://www.xiph.org/ogg/vorbis/doc/v-comment.html . This packet has \family typewriter packetno=1 \family default and \family typewriter granulepos=0 \family default . \end_layout \begin_layout Standard The third and subsequent packets each contain one or more (number found in header) Speex frames. These are identified with \family typewriter packetno \family default starting from 2 and the \family typewriter granulepos \family default is the number of the last sample encoded in that packet. The last of these packets has the \emph on end of stream \emph default ( \family typewriter e_o_s \family default ) flag set to 1.
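\end_layout \begin_layout Standard To make the header layout concrete, here is a minimal sketch of reading the first packet, assuming the field order and sizes listed in the header table (8 + 20 + 13 x 4 = 80 bytes) and little-endian integers. The type ExampleSpeexHeader, the FIELD_* indices and the function parse_speex_header are hypothetical names used only for illustration; the library's own definitions live in speex_header.[ch].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sizes taken from the header table: 8 + 20 + 13 * 4 = 80 bytes. */
enum {
    SPEEX_STRING_BYTES  = 8,
    SPEEX_VERSION_BYTES = 20,
    SPEEX_INT_FIELDS    = 13,
    SPEEX_HEADER_BYTES  = 80
};

/* Indices into fields[], following the table order (illustrative names). */
enum {
    FIELD_VERSION_ID = 0, FIELD_HEADER_SIZE = 1, FIELD_RATE = 2,
    FIELD_MODE = 3, FIELD_NB_CHANNELS = 5, FIELD_FRAME_SIZE = 7,
    FIELD_FRAMES_PER_PACKET = 9
};

typedef struct {
    char    speex_string[SPEEX_STRING_BYTES];
    char    speex_version[SPEEX_VERSION_BYTES];
    int32_t fields[SPEEX_INT_FIELDS];  /* speex_version_id ... reserved2 */
} ExampleSpeexHeader;

/* Read a 32-bit little-endian integer regardless of host byte order. */
static int32_t read_le32(const unsigned char *p)
{
    uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return (int32_t)v;
}

/* Returns 0 on success, -1 if the packet is too short or the identifying
 * string is not "Speex" followed by 3 spaces. */
static int parse_speex_header(const unsigned char *buf, size_t len,
                              ExampleSpeexHeader *out)
{
    size_t i;
    assert(buf != NULL && out != NULL);
    if (len < (size_t)SPEEX_HEADER_BYTES)
        return -1;
    if (memcmp(buf, "Speex   ", SPEEX_STRING_BYTES) != 0)
        return -1;
    memcpy(out->speex_string, buf, SPEEX_STRING_BYTES);
    memcpy(out->speex_version, buf + SPEEX_STRING_BYTES, SPEEX_VERSION_BYTES);
    for (i = 0; i < (size_t)SPEEX_INT_FIELDS; i++)
        out->fields[i] = read_le32(buf + SPEEX_STRING_BYTES +
                                   SPEEX_VERSION_BYTES + 4 * i);
    return 0;
}
```

Reading the integers byte by byte, rather than casting the buffer to a struct, keeps the sketch independent of host endianness and struct padding.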
\end_layout \begin_layout Standard \begin_inset Float table placement htbp wide true sideways false status open \begin_layout Plain Layout \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash begin{center} \end_layout \end_inset \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout Field \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Type \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Size \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout speex_string \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout char[] \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 8 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout speex_version \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout char[] \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 20 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout speex_version_id \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout int \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout header_size \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout int \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout rate \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout int \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout mode \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout int \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout mode_bitstream_version \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout int \end_layout \end_inset \begin_inset 
Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout nb_channels \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout int \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout bitrate \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout int \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout frame_size \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout int \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout vbr \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout int \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout frames_per_packet \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout int \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout extra_headers \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout int \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout reserved1 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout int \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout reserved2 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout int \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash end{center} \end_layout \end_inset \end_layout \begin_layout Plain Layout \begin_inset 
Caption \begin_layout Plain Layout Ogg/Speex header packet \begin_inset CommandInset label LatexCommand label name "cap:ogg_speex_header" \end_inset \end_layout \end_inset \end_layout \end_inset \end_layout \begin_layout Standard \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash clearpage \end_layout \end_inset \end_layout \begin_layout Chapter Introduction to CELP Coding \begin_inset Index status collapsed \begin_layout Plain Layout CELP \end_layout \end_inset \begin_inset CommandInset label LatexCommand label name "sec:Introduction-to-CELP" \end_inset \end_layout \begin_layout Quote \align center \emph on Do not meddle in the affairs of poles, for they are subtle and quick to leave the unit circle. \end_layout \begin_layout Standard Speex is based on CELP, which stands for Code Excited Linear Prediction. This section attempts to introduce the principles behind CELP, so if you are already familiar with CELP, you can safely skip to section \begin_inset CommandInset ref LatexCommand ref reference "sec:Speex-narrowband-mode" \end_inset . The CELP technique is based on three ideas: \end_layout \begin_layout Enumerate The use of a linear prediction (LP) model to model the vocal tract \end_layout \begin_layout Enumerate The use of (adaptive and fixed) codebook entries as input (excitation) of the LP model \end_layout \begin_layout Enumerate The search performed in closed-loop in a \begin_inset Quotes eld \end_inset perceptually weighted domain \begin_inset Quotes erd \end_inset \end_layout \begin_layout Standard This chapter is still a work in progress. \end_layout \begin_layout Section Source-Filter Model of Speech Production \end_layout \begin_layout Standard The source-filter model of speech production assumes that the vocal cords are the source of spectrally flat sound (the excitation signal), and that the vocal tract acts as a filter to spectrally shape the various sounds of speech. 
While still an approximation, the model is widely used in speech coding because of its simplicity. Its use is also the reason why most speech codecs (Speex included) perform badly on music signals. The different phonemes can be distinguished by their excitation (source) and spectral shape (filter). Voiced sounds (e.g. vowels) have an excitation signal that is periodic and that can be approximated by an impulse train in the time domain or by regularly-spaced harmonics in the frequency domain. On the other hand, fricatives (such as the "s", "sh" and "f" sounds) have an excitation signal that is similar to white Gaussian noise. So-called voiced fricatives (such as "z" and "v") have an excitation signal composed of a harmonic part and a noisy part. \end_layout \begin_layout Standard The source-filter model is usually tied to the use of linear prediction. The CELP model is based on the source-filter model, as can be seen from the CELP decoder illustrated in Figure \begin_inset CommandInset ref LatexCommand ref reference "fig:The-CELP-model" \end_inset . 
\end_layout \begin_layout Standard \begin_inset Float figure wide false sideways false status open \begin_layout Plain Layout \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash begin{center} \end_layout \end_inset \begin_inset Graphics filename celp_decoder.eps width 45page% keepAspectRatio \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash end{center} \end_layout \end_inset \end_layout \begin_layout Plain Layout \begin_inset Caption \begin_layout Plain Layout The CELP model of speech synthesis (decoder) \begin_inset CommandInset label LatexCommand label name "fig:The-CELP-model" \end_inset \end_layout \end_inset \end_layout \end_inset \end_layout \begin_layout Section Linear Prediction Coefficients (LPC) \begin_inset Index status collapsed \begin_layout Plain Layout linear prediction \end_layout \end_inset \end_layout \begin_layout Standard Linear prediction is at the base of many speech coding techniques, including CELP. The idea behind it is to predict the signal \begin_inset Formula $x[n]$ \end_inset using a linear combination of its past samples: \end_layout \begin_layout Standard \begin_inset Formula \[ y[n]=\sum_{i=1}^{N}a_{i}x[n-i]\] \end_inset where \begin_inset Formula $y[n]$ \end_inset is the linear prediction of \begin_inset Formula $x[n]$ \end_inset . 
The prediction error is thus given by: \begin_inset Formula \[ e[n]=x[n]-y[n]=x[n]-\sum_{i=1}^{N}a_{i}x[n-i]\] \end_inset \end_layout \begin_layout Standard The goal of the LPC analysis is to find the best prediction coefficients \begin_inset Formula $a_{i}$ \end_inset that minimize the quadratic error function: \begin_inset Formula \[ E=\sum_{n=0}^{L-1}\left[e[n]\right]^{2}=\sum_{n=0}^{L-1}\left[x[n]-\sum_{i=1}^{N}a_{i}x[n-i]\right]^{2}\] \end_inset That can be done by making all derivatives \begin_inset Formula $\frac{\partial E}{\partial a_{i}}$ \end_inset equal to zero: \begin_inset Formula \[ \frac{\partial E}{\partial a_{i}}=\frac{\partial}{\partial a_{i}}\sum_{n=0}^{L-1}\left[x[n]-\sum_{k=1}^{N}a_{k}x[n-k]\right]^{2}=0\] \end_inset \end_layout \begin_layout Standard For an order \begin_inset Formula $N$ \end_inset filter, the filter coefficients \begin_inset Formula $a_{i}$ \end_inset are found by solving the \begin_inset Formula $N\times N$ \end_inset linear system \begin_inset Formula $\mathbf{Ra}=\mathbf{r}$ \end_inset , where \begin_inset Formula \[ \mathbf{R}=\left[\begin{array}{cccc} R(0) & R(1) & \cdots & R(N-1)\\ R(1) & R(0) & \cdots & R(N-2)\\ \vdots & \vdots & \ddots & \vdots\\ R(N-1) & R(N-2) & \cdots & R(0)\end{array}\right]\] \end_inset \begin_inset Formula \[ \mathbf{r}=\left[\begin{array}{c} R(1)\\ R(2)\\ \vdots\\ R(N)\end{array}\right]\] \end_inset with \begin_inset Formula $R(m)$ \end_inset the auto-correlation \begin_inset Index status collapsed \begin_layout Plain Layout auto-correlation \end_layout \end_inset of the signal \begin_inset Formula $x[n]$ \end_inset , computed over the \begin_inset Formula $L$ \end_inset samples of the analysis frame (samples outside the frame are taken as zero): \end_layout \begin_layout Standard \begin_inset Formula \[ R(m)=\sum_{i=0}^{L-1}x[i]x[i-m]\] \end_inset \end_layout \begin_layout Standard Because \begin_inset Formula $\mathbf{R}$ \end_inset is Hermitian Toeplitz, the Levinson-Durbin \begin_inset Index status collapsed \begin_layout Plain Layout Levinson-Durbin \end_layout \end_inset algorithm can be used, 
making the solution to the problem \begin_inset Formula $\mathcal{O}\left(N^{2}\right)$ \end_inset instead of \begin_inset Formula $\mathcal{O}\left(N^{3}\right)$ \end_inset . Also, it can be proven that all the roots of \begin_inset Formula $A(z)$ \end_inset are within the unit circle, which means that \begin_inset Formula $1/A(z)$ \end_inset is always stable. This is in theory; in practice because of finite precision, there are two commonly used techniques to make sure we have a stable filter. First, we multiply \begin_inset Formula $R(0)$ \end_inset by a number slightly above one (such as 1.0001), which is equivalent to adding noise to the signal. Also, we can apply a window to the auto-correlation, which is equivalent to filtering in the frequency domain, reducing sharp resonances. \end_layout \begin_layout Section Pitch Prediction \begin_inset Index status collapsed \begin_layout Plain Layout pitch \end_layout \end_inset \end_layout \begin_layout Standard During voiced segments, the speech signal is periodic, so it is possible to take advantage of that property by approximating the excitation signal \begin_inset Formula $e[n]$ \end_inset by a gain times the past of the excitation: \end_layout \begin_layout Standard \begin_inset Formula \[ e[n]\simeq p[n]=\beta e[n-T]\ ,\] \end_inset where \begin_inset Formula $T$ \end_inset is the pitch period, \begin_inset Formula $\beta$ \end_inset is the pitch gain. We call that long-term prediction since the excitation is predicted from \begin_inset Formula $e[n-T]$ \end_inset with \begin_inset Formula $T\gg N$ \end_inset . \end_layout \begin_layout Section Innovation Codebook \end_layout \begin_layout Standard The final excitation \begin_inset Formula $e[n]$ \end_inset will be the sum of the pitch prediction and an \emph on innovation \emph default signal \begin_inset Formula $c[n]$ \end_inset taken from a fixed codebook, hence the name \emph on Code \emph default Excited Linear Prediction. 
The final excitation is given by \end_layout \begin_layout Standard \begin_inset Formula \[ e[n]=p[n]+c[n]=\beta e[n-T]+c[n]\ .\] \end_inset The quantization of \begin_inset Formula $c[n]$ \end_inset is where most of the bits in a CELP codec are allocated. It represents the information that couldn't be obtained either from linear prediction or pitch prediction. In the \emph on z \emph default -domain we can represent the final signal \begin_inset Formula $X(z)$ \end_inset as \begin_inset Formula \[ X(z)=\frac{C(z)}{A(z)\left(1-\beta z^{-T}\right)}\] \end_inset \end_layout \begin_layout Section Noise Weighting \begin_inset Index status collapsed \begin_layout Plain Layout error weighting \end_layout \end_inset \begin_inset Index status collapsed \begin_layout Plain Layout analysis-by-synthesis \end_layout \end_inset \end_layout \begin_layout Standard Most (if not all) modern audio codecs attempt to \begin_inset Quotes eld \end_inset shape \begin_inset Quotes erd \end_inset the noise so that it appears mostly in the frequency regions where the ear cannot detect it. For example, the ear is more tolerant to noise in parts of the spectrum that are louder and \emph on vice versa \emph default . In order to maximize speech quality, CELP codecs minimize the mean square of the error (noise) in the perceptually weighted domain. This means that a perceptual noise weighting filter \begin_inset Formula $W(z)$ \end_inset is applied to the error signal in the encoder. In most CELP codecs, \begin_inset Formula $W(z)$ \end_inset is a pole-zero weighting filter derived from the linear prediction coefficients (LPC), generally using bandwidth expansion. 
With the spectral envelope represented by the synthesis filter \begin_inset Formula $1/A(z)$ \end_inset , CELP codecs typically derive the noise weighting filter as \begin_inset Formula \begin{equation} W(z)=\frac{A(z/\gamma_{1})}{A(z/\gamma_{2})}\ ,\label{eq:gamma-weighting}\end{equation} \end_inset where \begin_inset Formula $\gamma_{1}=0.9$ \end_inset and \begin_inset Formula $\gamma_{2}=0.6$ \end_inset in the Speex reference implementation. If a filter \begin_inset Formula $A(z)$ \end_inset has (complex) poles at \begin_inset Formula $p_{i}$ \end_inset in the \begin_inset Formula $z$ \end_inset -plane, the filter \begin_inset Formula $A(z/\gamma)$ \end_inset will have its poles at \begin_inset Formula $p'_{i}=\gamma p_{i}$ \end_inset , making it a flatter version of \begin_inset Formula $A(z)$ \end_inset . \end_layout \begin_layout Standard The weighting filter is applied to the error signal used to optimize the codebook search through analysis-by-synthesis (AbS). This results in a spectral shape of the noise that tends towards \begin_inset Formula $1/W(z)$ \end_inset . While the simplicity of the model has been an important reason for the success of CELP, it remains that \begin_inset Formula $W(z)$ \end_inset is a very rough approximation for the perceptually optimal noise weighting function. Fig. \begin_inset CommandInset ref LatexCommand ref reference "cap:Standard-noise-shaping" \end_inset illustrates the noise shaping that results from Eq. \begin_inset CommandInset ref LatexCommand ref reference "eq:gamma-weighting" \end_inset . Throughout this manual, we refer to \begin_inset Formula $W(z)$ \end_inset as the noise weighting filter and to \begin_inset Formula $1/W(z)$ \end_inset as the noise shaping filter (or curve). 
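\end_layout \begin_layout Standard As a rough illustration of the bandwidth expansion used in Eq. \begin_inset CommandInset ref LatexCommand ref reference "eq:gamma-weighting" \end_inset , assume the convention \begin_inset Formula $A(z)=1-\sum_{i=1}^{N}a_{i}z^{-i}$ \end_inset implied by the prediction error defined earlier. Replacing \begin_inset Formula $z$ \end_inset by \begin_inset Formula $z/\gamma$ \end_inset then scales each coefficient \begin_inset Formula $a_{i}$ \end_inset by \begin_inset Formula $\gamma^{i}$ \end_inset . The sketch below computes the numerator and denominator coefficients of \begin_inset Formula $W(z)$ \end_inset this way. The function names are hypothetical and are not taken from the Speex sources.

```c
#include <stddef.h>

/* a[0..order-1] holds a_1..a_N, with A(z) = 1 - sum_i a_i z^-i (assumed
   convention). out[i] receives the matching coefficient of A(z/gamma),
   that is a_i * gamma^i. */
static void bw_expand_sketch(const float *a, float *out,
                             size_t order, float gamma)
{
    float g = gamma;            /* gamma^1 for the first coefficient */
    size_t i;
    for (i = 0; i < order; i++) {
        out[i] = a[i] * g;
        g *= gamma;             /* gamma^(i+2) for the next coefficient */
    }
}

/* Coefficients of W(z) = A(z/gamma1) / A(z/gamma2), using the values
   quoted in the text for the reference implementation. */
static void weighting_coefs_sketch(const float *a, size_t order,
                                   float *num, float *den)
{
    bw_expand_sketch(a, num, order, 0.9f);  /* gamma1 */
    bw_expand_sketch(a, den, order, 0.6f);  /* gamma2 */
}
```

Because \begin_inset Formula $\gamma^{i}$ \end_inset decays with \begin_inset Formula $i$ \end_inset , the higher-order coefficients shrink the most, which is what flattens the resonances of the filter. 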
\end_layout \begin_layout Standard \begin_inset Float figure wide false sideways false status open \begin_layout Plain Layout \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash begin{center} \end_layout \end_inset \begin_inset Graphics filename ref_shaping.eps width 45page% keepAspectRatio \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash end{center} \end_layout \end_inset \end_layout \begin_layout Plain Layout \begin_inset Caption \begin_layout Plain Layout Standard noise shaping in CELP. Arbitrary y-axis offset. \begin_inset CommandInset label LatexCommand label name "cap:Standard-noise-shaping" \end_inset \end_layout \end_inset \end_layout \end_inset \end_layout \begin_layout Section Analysis-by-Synthesis \end_layout \begin_layout Standard One of the main principles behind CELP is called Analysis-by-Synthesis (AbS), meaning that the encoding (analysis) is performed by perceptually optimising the decoded (synthesis) signal in a closed loop. In theory, the best CELP stream would be produced by trying all possible bit combinations and selecting the one that produces the best-sounding decoded signal. This is obviously not possible in practice for two reasons: the required complexity is beyond any currently available hardware and the \begin_inset Quotes eld \end_inset best sounding \begin_inset Quotes erd \end_inset selection criterion implies a human listener. \end_layout \begin_layout Standard In order to achieve real-time encoding using limited computing resources, the CELP optimisation is broken down into smaller, more manageable, sequential searches using the perceptual weighting function described earlier. 
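\end_layout \begin_layout Standard Each candidate tested in the closed loop has to be run through the decoder model described in the previous sections. As a minimal sketch of that synthesis step, the code below builds the excitation \begin_inset Formula $e[n]=\beta e[n-T]+c[n]$ \end_inset and passes it through the synthesis filter \begin_inset Formula $1/A(z)$ \end_inset . It uses a single-tap pitch predictor, treats all samples before the start of the buffer as zero, and assumes \begin_inset Formula $T\geq1$ \end_inset ; the function name is hypothetical and a real encoder would also apply the perceptual weighting before comparing candidates.

```c
#include <stddef.h>

/* c:     innovation signal, len samples
   beta:  pitch gain, T: pitch period in samples (T >= 1)
   a:     a[0..order-1] holds the LP coefficients a_1..a_N
   e, x:  output buffers (excitation and synthesized signal), len samples */
static void celp_synth_sketch(const float *c, size_t len,
                              float beta, size_t T,
                              const float *a, size_t order,
                              float *e, float *x)
{
    size_t n, i;
    for (n = 0; n < len; n++) {
        /* e[n] = beta * e[n-T] + c[n]; history before the buffer is 0 */
        e[n] = c[n] + (n >= T ? beta * e[n - T] : 0.0f);

        /* x[n] = e[n] + sum_{i=1..N} a_i x[n-i], i.e. filtering by 1/A(z) */
        x[n] = e[n];
        for (i = 1; i <= order && i <= n; i++)
            x[n] += a[i - 1] * x[n - i];
    }
}
```

An analysis-by-synthesis search would call a routine of this kind for each candidate parameter set and keep the one with the smallest weighted error against the input. 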
\end_layout \begin_layout Standard \begin_inset Newpage newpage \end_inset \end_layout \begin_layout Chapter The Speex Decoder Specification \end_layout \begin_layout Section Narrowband decoder \end_layout \begin_layout Standard \end_layout \begin_layout Subsection Narrowband modes \end_layout \begin_layout Standard There are 9 narrowband modes defined for Speex (mode IDs 0 to 8), with bit-rates ranging from 250 bps to 24.6 kbps, although the modes below 5.9 kbps should not be used for speech. The bit-allocation for each mode is detailed in table \begin_inset CommandInset ref LatexCommand ref reference "cap:bits-narrowband" \end_inset . Each frame starts with a 1-bit wideband flag followed by the mode ID, encoded with 4 bits, which allows a range from 0 to 15, though only the first 9 values are used (the others are reserved). The parameters are listed in the table in the order they are packed in the bit-stream. All frame-based parameters are packed before sub-frame parameters. The parameters for a certain sub-frame are all packed before the following sub-frame is packed. The \begin_inset Quotes eld \end_inset OL \begin_inset Quotes erd \end_inset in the parameter description means that the parameter is an open loop estimation based on the whole frame. 
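\end_layout \begin_layout Standard The bit-rate of each mode follows directly from the total number of bits per frame in table \begin_inset CommandInset ref LatexCommand ref reference "cap:bits-narrowband" \end_inset : a narrowband frame lasts 20 ms, so there are 50 frames per second. The short sketch below works through that arithmetic for mode IDs 0 to 8; the array and function names are hypothetical.

```c
/* "Total" row of the narrowband bit-allocation table, by mode ID 0..8. */
static const int nb_frame_bits_sketch[9] = {
    5, 43, 119, 160, 220, 300, 364, 492, 79
};

/* Bit-rate in bits per second for a narrowband mode, or -1 if the mode ID
   is outside the range covered by the table. 20 ms frames give 50 frames
   per second. */
static int nb_mode_bitrate_sketch(int mode)
{
    if (mode < 0 || mode > 8)
        return -1;
    return nb_frame_bits_sketch[mode] * 50;
}
```

For example, mode 0 gives 5 x 50 = 250 bps and mode 7 gives 492 x 50 = 24600 bps, which matches the range quoted in the text. 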
\end_layout \begin_layout Standard \begin_inset Float table placement h wide true sideways false status open \begin_layout Plain Layout \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash begin{center} \end_layout \end_inset \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout Parameter \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Update rate \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 2 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 3 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 5 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 6 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 7 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 8 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Wideband bit \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout frame \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Mode ID \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout frame \end_layout \end_inset 
\begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout LSP \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout frame \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 18 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 18 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 18 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 18 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 30 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 30 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 30 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 18 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout OL pitch \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout frame \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 7 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 7 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset 
Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 7 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout OL pitch gain \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout frame \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout OL Exc gain \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout frame \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 5 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 5 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 5 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 5 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 5 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 5 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 5 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 5 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Fine pitch \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout sub-frame \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset 
\begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 7 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 7 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 7 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 7 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 7 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Pitch gain \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout sub-frame \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 5 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 5 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 5 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 7 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 7 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 7 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Innovation gain \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout sub-frame \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 3 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 3 \end_layout \end_inset 
\begin_inset Text \begin_layout Plain Layout 3 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Innovation VQ \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout sub-frame \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 16 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 20 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 35 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 48 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 64 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 96 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 10 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Total \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout frame \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 5 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 43 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 119 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 160 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 220 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 300 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 364 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 492 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 79 \end_layout \end_inset \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash end{center} \end_layout \end_inset \end_layout \begin_layout Plain Layout \begin_inset Caption \begin_layout Plain Layout Bit allocation for narrowband modes \begin_inset CommandInset 
label LatexCommand label name "cap:bits-narrowband" \end_inset \end_layout \end_inset \end_layout \end_inset \end_layout \begin_layout Subsection LSP decoding \end_layout \begin_layout Standard Depending on the mode, LSP parameters are encoded using either 18 bits or 30 bits. \end_layout \begin_layout Standard Interpolation \end_layout \begin_layout Standard Safe margin \end_layout \begin_layout Subsection Adaptive codebook \end_layout \begin_layout Standard For rates of 8 kbit/s and above, the pitch period is encoded for each sub-frame. The real period is \begin_inset Formula $T=p_{i}+17$ \end_inset where \begin_inset Formula $p_{i}$ \end_inset is a value encoded with 7 bits and 17 corresponds to the minimum pitch. The maximum period is therefore 144 (17 + 127). At 5.95 kbit/s (mode 2), the pitch period is similarly encoded, but only once for the frame. Each sub-frame then has a 2-bit offset that is added to the pitch value of the frame. In that case, the pitch for each sub-frame is equal to \begin_inset Formula $T-1+offset$ \end_inset . For rates below 5.95 kbit/s, only the per-frame pitch is used and the pitch is constant for all sub-frames. \end_layout \begin_layout Standard Speex uses a 3-tap predictor for rates of 5.95 kbit/s and above. The three gain values are obtained from a 5-bit or a 7-bit codebook, depending on the mode. \end_layout \begin_layout Subsection Innovation codebook \end_layout \begin_layout Standard Split codebook, size and entries depend on bit-rate \end_layout \begin_layout Standard A 5-bit gain is encoded on a per-frame basis \end_layout \begin_layout Standard Depending on the mode, higher resolution per sub-frame \end_layout \begin_layout Standard Innovation sub-vectors concatenated, gain applied \end_layout \begin_layout Subsection Perceptual enhancement \end_layout \begin_layout Standard Optional, implementation-defined. 
\end_layout \begin_layout Subsection Bit-stream definition \end_layout \begin_layout Standard This section defines the bit-stream that is transmitted on the wire. One Speex frame consists of 1 frame header and 4 sub-frames: \end_layout \begin_layout Standard \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout Frame Header \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Subframe 1 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Subframe 2 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Subframe 3 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Subframe 4 \end_layout \end_inset \end_inset \end_layout \begin_layout Standard The frame header is variable length, depending on decoding mode and submode. The narrowband frame header is defined as follows: \end_layout \begin_layout Standard \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout wb-bit \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout modeid \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout LSP \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout OL-pitch \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout OL-pitchgain \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout OL-ExcGain \end_layout \end_inset \end_inset \end_layout \begin_layout Standard wb-bit: Wideband bit (1 bit) 0=narrowband, 1=wideband \end_layout \begin_layout Standard modeid: Mode identifier (4 bits) \end_layout \begin_layout Standard LSP: Line Spectral Pairs (0, 18 or 30 bits) \end_layout \begin_layout Standard OL-pitch: Open Loop Pitch (0 or 7 bits) \end_layout \begin_layout Standard OL-pitchgain: Open Loop Pitch Gain (0 or 4 bits) \end_layout \begin_layout Standard OL-ExcGain: Open Loop Excitation Gain (0 or 5 bits) \end_layout \begin_layout Standard ... 
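\end_layout \begin_layout Standard The frame header fields above can be unpacked with a small bit reader, as in the sketch below. The field widths per mode are copied from table \begin_inset CommandInset ref LatexCommand ref reference "cap:bits-narrowband" \end_inset . Two points are assumptions of this sketch rather than statements of the specification: bits are taken most significant bit first within each byte, and the LSP bits are returned as one raw integer instead of being split into codebook indices. All type and function names are hypothetical and are not the libspeex API.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const unsigned char *buf;  /* packed frame data */
    size_t nbits;              /* number of valid bits in buf */
    size_t pos;                /* current bit position */
} bit_reader_sketch;

/* Read n (0..32) bits, most significant bit first (assumed order).
   Bits past the end of the buffer read as 0. */
static uint32_t br_read(bit_reader_sketch *br, int n)
{
    uint32_t v = 0;
    while (n-- > 0) {
        uint32_t bit = 0;
        if (br->pos < br->nbits)
            bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1u;
        br->pos++;
        v = (v << 1) | bit;
    }
    return v;
}

/* Frame-level field widths for mode IDs 0..8, from the allocation table. */
static const int lsp_bits[9]      = {0, 18, 18, 18, 18, 30, 30, 30, 18};
static const int ol_pitch_bits[9] = {0,  7,  7,  0,  0,  0,  0,  0,  7};
static const int ol_pgain_bits[9] = {0,  4,  0,  0,  0,  0,  0,  0,  4};
static const int ol_egain_bits[9] = {0,  5,  5,  5,  5,  5,  5,  5,  5};

typedef struct {
    unsigned wideband;       /* wb-bit */
    unsigned mode;           /* modeid */
    uint32_t lsp;            /* raw LSP bits */
    uint32_t ol_pitch;       /* OL-pitch */
    uint32_t ol_pitch_gain;  /* OL-pitchgain */
    uint32_t ol_exc_gain;    /* OL-ExcGain */
} nb_frame_header_sketch;

/* Returns 0 on success, -1 if the mode ID is not one of 0..8. */
static int read_nb_frame_header(bit_reader_sketch *br,
                                nb_frame_header_sketch *h)
{
    h->wideband = br_read(br, 1);
    h->mode = br_read(br, 4);
    if (h->mode > 8)
        return -1;
    h->lsp           = br_read(br, lsp_bits[h->mode]);
    h->ol_pitch      = br_read(br, ol_pitch_bits[h->mode]);
    h->ol_pitch_gain = br_read(br, ol_pgain_bits[h->mode]);
    h->ol_exc_gain   = br_read(br, ol_egain_bits[h->mode]);
    return 0;
}
```

For mode 3, for instance, the header read by this sketch is 1 + 4 + 18 + 5 = 28 bits long, after which the four sub-frames follow. 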
\end_layout \begin_layout Standard Each subframe is defined as follows: \end_layout \begin_layout Standard \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout FinePitch \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout PitchGain \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout InnovationGain \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Innovation VQ \end_layout \end_inset \end_inset \end_layout \begin_layout Standard FinePitch: (0 or 7 bits) \end_layout \begin_layout Standard PitchGain: (0, 5 or 7 bits) \end_layout \begin_layout Standard InnovationGain: (0, 1 or 3 bits) \end_layout \begin_layout Standard Innovation VQ: (0-96 bits) \end_layout \begin_layout Standard ... \end_layout \begin_layout Subsection Sample decoder \end_layout \begin_layout Standard This section contains some sample source code, showing how a basic Speex decoder can be implemented. The sample decoder supports narrowband submode 3 only, with no advanced features such as enhancement or VBR. \end_layout \begin_layout Standard ... \end_layout \begin_layout Standard \begin_inset CommandInset include LatexCommand lstinputlisting filename "nb_celp.c" lstparams "caption={Sample Decoder}" \end_inset \end_layout \begin_layout Subsection Lookup tables \end_layout \begin_layout Standard The Speex decoder includes a set of lookup tables and codebooks, which are used to convert between values of different domains. 
This includes: \end_layout \begin_layout Standard - Excitation 10x16 (3200 bps) \end_layout \begin_layout Standard - Excitation 10x32 (4000 bps) \end_layout \begin_layout Standard - Excitation 20x32 (2000 bps) \end_layout \begin_layout Standard - Excitation 5x256 (12800 bps) \end_layout \begin_layout Standard - Excitation 5x64 (9600 bps) \end_layout \begin_layout Standard - Excitation 8x128 (7000 bps) \end_layout \begin_layout Standard - Codebook for 3-tap pitch prediction gain (Normal and Low Bitrate) \end_layout \begin_layout Standard - Codebook for LSPs in narrowband CELP mode \end_layout \begin_layout Standard ... \end_layout \begin_layout Standard The exact lookup tables are included here for reference. \end_layout \begin_layout Standard \begin_inset CommandInset include LatexCommand lstinputlisting filename "../libspeex/exc_5_64_table.c" \end_inset \end_layout \begin_layout Standard \begin_inset CommandInset include LatexCommand lstinputlisting filename "../libspeex/exc_5_256_table.c" \end_inset \end_layout \begin_layout Standard \begin_inset CommandInset include LatexCommand lstinputlisting filename "../libspeex/exc_8_128_table.c" \end_inset \end_layout \begin_layout Standard \begin_inset CommandInset include LatexCommand lstinputlisting filename "../libspeex/exc_10_16_table.c" \end_inset \end_layout \begin_layout Standard \begin_inset CommandInset include LatexCommand lstinputlisting filename "../libspeex/exc_10_32_table.c" \end_inset \end_layout \begin_layout Standard \begin_inset CommandInset include LatexCommand lstinputlisting filename "../libspeex/exc_20_32_table.c" \end_inset \end_layout \begin_layout Standard \begin_inset CommandInset include LatexCommand lstinputlisting filename "../libspeex/gain_table.c" \end_inset \end_layout \begin_layout Standard \begin_inset CommandInset include LatexCommand lstinputlisting filename "../libspeex/gain_table_lbr.c" \end_inset \end_layout \begin_layout Standard \begin_inset CommandInset include LatexCommand 
lstinputlisting filename "../libspeex/lsp_tables_nb.c" \end_inset \end_layout \begin_layout Section Wideband embedded decoder \end_layout \begin_layout Standard A QMF filter is used. The narrowband signal is decoded using the narrowband decoder. \end_layout \begin_layout Standard For the high band, the decoder is similar to the narrowband decoder, with the main difference being that there is no adaptive codebook. \end_layout \begin_layout Standard The gain is per sub-frame. \end_layout \begin_layout Chapter Speex narrowband mode \begin_inset CommandInset label LatexCommand label name "sec:Speex-narrowband-mode" \end_inset \begin_inset Index status collapsed \begin_layout Plain Layout narrowband \end_layout \end_inset \end_layout \begin_layout Standard This chapter looks at how Speex works for narrowband ( \begin_inset Formula $8\:\mathrm{kHz}$ \end_inset sampling rate) operation. The frame size for this mode is \begin_inset Formula $20\:\mathrm{ms}$ \end_inset , corresponding to 160 samples. Each frame is also subdivided into 4 sub-frames of 40 samples each. \end_layout \begin_layout Standard Many design decisions were also based on the original goals and assumptions: \end_layout \begin_layout Itemize Minimizing the amount of information extracted from past frames (for robustness to packet loss) \end_layout \begin_layout Itemize Dynamically-selectable codebooks (LSP, pitch and innovation) \end_layout \begin_layout Itemize Sub-vector fixed (innovation) codebooks \end_layout \begin_layout Section Whole-Frame Analysis \begin_inset Index status collapsed \begin_layout Plain Layout linear prediction \end_layout \end_inset \end_layout \begin_layout Standard In narrowband, Speex frames are 20 ms long (160 samples) and are subdivided into 4 sub-frames of 5 ms each (40 samples). For most narrowband bit-rates (8 kbps and above), the only parameters encoded at the frame level are the Line Spectral Pairs (LSP) and a global excitation gain \begin_inset Formula $g_{frame}$ \end_inset , as shown in Fig.
\begin_inset CommandInset ref LatexCommand ref reference "cap:Frame-open-loop-analysis" \end_inset . All other parameters are encoded at the sub-frame level. \end_layout \begin_layout Standard Linear prediction analysis is performed once per frame using an asymmetric Hamming window centered on the fourth sub-frame. Because linear prediction coefficients (LPC) are not robust to quantization, they are first converted to line spectral pairs (LSP) \begin_inset Index status collapsed \begin_layout Plain Layout line spectral pair \end_layout \end_inset . The LSPs are considered to be associated with the \begin_inset Formula $4^{th}$ \end_inset sub-frame and the LSPs associated with the first 3 sub-frames are linearly interpolated using the current and previous LSP coefficients. The LSP coefficients are then converted back to the LPC filter \begin_inset Formula $\hat{A}(z)$ \end_inset . The non-quantized interpolated filter is denoted \begin_inset Formula $A(z)$ \end_inset and can be used for the weighting filter \begin_inset Formula $W(z)$ \end_inset because it does not need to be available to the decoder. \end_layout \begin_layout Standard To make Speex more robust to packet loss, no prediction is applied on the LSP coefficients prior to quantization. The LSPs are encoded using vector quantization (VQ) with 30 bits for higher quality modes and 18 bits for lower quality.
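\end_layout \begin_layout Standard The linear interpolation of the LSPs over the four sub-frames described above can be sketched in C as follows. This is only an illustration and not the libspeex code: the function name \family typewriter lsp_interpolate_sketch \family default , the LSP order of 10 and the interpolation weight (sub+1)/nb_sub are assumptions chosen so that the last sub-frame receives exactly the LSPs of the current frame. \end_layout

```c
#include <assert.h>

#define LPC_ORDER    10  /* assumed narrowband LPC/LSP order */
#define NB_SUBFRAMES 4   /* four 5 ms sub-frames per 20 ms frame */

/* Interpolate the LSPs for sub-frame `sub` (0-based).
 * The LSPs analysed for the current frame belong to the last sub-frame;
 * earlier sub-frames blend the previous and current frame linearly.
 * The weight (sub + 1) / nb_sub is an assumption made for this sketch. */
static void lsp_interpolate_sketch(const float *prev_lsp, const float *cur_lsp,
                                   float *out_lsp, int order,
                                   int sub, int nb_sub)
{
    assert(nb_sub > 0 && sub >= 0 && sub < nb_sub);

    /* Weight given to the current frame for this sub-frame. */
    float w = (float)(sub + 1) / (float)nb_sub;

    for (int i = 0; i < order; i++)
        out_lsp[i] = (1.0f - w) * prev_lsp[i] + w * cur_lsp[i];
}
```

\begin_layout Standard Under these assumptions, calling the function with sub = 3 and nb_sub = 4 returns the current LSPs unchanged, while sub = 1 returns the midpoint between the previous and current LSPs.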
\end_layout \begin_layout Standard \begin_inset Float figure wide false sideways false status open \begin_layout Plain Layout \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash begin{center} \end_layout \end_inset \begin_inset Graphics filename speex_analysis.eps width 35page% \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash end{center} \end_layout \end_inset \end_layout \begin_layout Plain Layout \begin_inset Caption \begin_layout Plain Layout Frame open-loop analysis \begin_inset CommandInset label LatexCommand label name "cap:Frame-open-loop-analysis" \end_inset \end_layout \end_inset \end_layout \end_inset \end_layout \begin_layout Section Sub-Frame Analysis-by-Synthesis \end_layout \begin_layout Standard \begin_inset Float figure wide false sideways false status open \begin_layout Plain Layout \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash begin{center} \end_layout \end_inset \begin_inset Graphics filename speex_abs.eps lyxscale 75 width 40page% \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash end{center} \end_layout \end_inset \end_layout \begin_layout Plain Layout \begin_inset Caption \begin_layout Plain Layout Analysis-by-synthesis closed-loop optimization on a sub-frame. \begin_inset CommandInset label LatexCommand label name "cap:Sub-frame-AbS" \end_inset \end_layout \end_inset \end_layout \end_inset \end_layout \begin_layout Standard The analysis-by-synthesis (AbS) encoder loop is described in Fig. \begin_inset CommandInset ref LatexCommand ref reference "cap:Sub-frame-AbS" \end_inset . There are three main aspects where Speex significantly differs from most other CELP codecs. First, while most recent CELP codecs make use of fractional pitch estimation with a single gain, Speex uses an integer to encode the pitch period, but uses a 3-tap predictor (3 gains). 
The adaptive codebook contribution \begin_inset Formula $e_{a}[n]$ \end_inset can thus be expressed as: \begin_inset Formula \begin{equation} e_{a}[n]=g_{0}e[n-T-1]+g_{1}e[n-T]+g_{2}e[n-T+1]\label{eq:adaptive-3tap}\end{equation} \end_inset where \begin_inset Formula $g_{0}$ \end_inset , \begin_inset Formula $g_{1}$ \end_inset and \begin_inset Formula $g_{2}$ \end_inset are the jointly quantized pitch gains and \begin_inset Formula $e[n]$ \end_inset is the codec excitation memory. It is worth noting that when the pitch is smaller than the sub-frame size, we repeat the excitation at a period \begin_inset Formula $T$ \end_inset . For example, when \begin_inset Formula $n-T+1\geq0$ \end_inset , we use \begin_inset Formula $n-2T+1$ \end_inset instead. In most modes, the pitch period is encoded with 7 bits in the \begin_inset Formula $\left[17,144\right]$ \end_inset range and the \begin_inset Formula $g_{i}$ \end_inset gains are vector-quantized using 7 bits at higher bit-rates (15 kbps narrowband and above) and 5 bits at lower bit-rates (11 kbps narrowband and below). \end_layout \begin_layout Standard Many current CELP codecs use moving average (MA) prediction to encode the fixed codebook gain. This provides slightly better coding at the expense of introducing a dependency on previously encoded frames. A second difference is that Speex encodes the fixed codebook gain as the product of the global excitation gain \begin_inset Formula $g_{frame}$ \end_inset with a sub-frame gain correction \begin_inset Formula $g_{subf}$ \end_inset . This increases robustness to packet loss by eliminating the inter-frame dependency. The sub-frame gain correction is encoded before the fixed codebook is searched (not closed-loop optimized) and uses between 0 and 3 bits per sub-frame, depending on the bit-rate.
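\end_layout \begin_layout Standard The 3-tap adaptive codebook contribution given earlier in this section can be sketched in C as follows. This is a minimal illustration and not the libspeex implementation: the function name \family typewriter pitch_3tap_sketch \family default and the buffer layout (a pointer to the start of the current sub-frame, preceded by at least T+1 past excitation samples) are assumptions. The periodic repetition is generalized here to a loop that steps back by T until the index falls into the past excitation, which is also an assumption for the case where more than one period fits in a sub-frame. \end_layout

```c
#include <assert.h>

/* Compute the adaptive codebook contribution
 *     ea[n] = g[0]*e[n-T-1] + g[1]*e[n-T] + g[2]*e[n-T+1]
 * for n = 0 .. nsf-1.
 *
 * exc points at sample 0 of the current sub-frame; exc[-1], exc[-2], ...
 * hold the past excitation (at least T+1 samples must be available).
 * Only past samples are read: an index that would land inside the
 * current sub-frame is moved back by whole pitch periods. */
static void pitch_3tap_sketch(const float *exc, float *ea,
                              int T, const float g[3], int nsf)
{
    assert(T > 0);

    for (int n = 0; n < nsf; n++) {
        float acc = 0.0f;
        for (int k = 0; k < 3; k++) {
            int idx = n - T - 1 + k;  /* k=0: n-T-1, k=1: n-T, k=2: n-T+1 */
            while (idx >= 0)
                idx -= T;             /* repeat the excitation at period T */
            acc += g[k] * exc[idx];
        }
        ea[n] = acc;
    }
}
```

\begin_layout Standard As a small check of the repetition rule, with T = 2 and gains (0, 1, 0) a sub-frame of four samples simply repeats the last two past excitation samples twice.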
\end_layout \begin_layout Standard The third difference is that Speex uses sub-vector quantization of the innovation (fixed codebook) signal instead of an algebraic codebook. Each sub-frame is divided into sub-vectors of lengths ranging between 5 and 20 samples. Each sub-vector is chosen from a bitrate-dependent codebook and all sub-vectors are concatenated to form a sub-frame. As an example, the 3.95 kbps mode uses a sub-vector size of 20 samples with 32 entries in the codebook (5 bits). This means that the innovation is encoded with 10 bits per sub-frame, or 2000 bps. On the other hand, the 18.2 kbps mode uses a sub-vector size of 5 samples with 256 entries in the codebook (8 bits), so the innovation uses 64 bits per sub-frame, or 12800 bps. \end_layout \begin_layout Section Bit-rates \end_layout \begin_layout Standard So far, no MOS (Mean Opinion Score \begin_inset Index status collapsed \begin_layout Plain Layout mean opinion score \end_layout \end_inset ) subjective evaluation has been performed for Speex. In order to give an idea of the quality achievable with it, table \begin_inset CommandInset ref LatexCommand ref reference "cap:quality_vs_bps" \end_inset presents my own subjective opinion on it. It should be noted that different people will perceive the quality differently and that the person who designed the codec often has a bias (one way or another) when it comes to subjective evaluation. Finally, for most codecs (including Speex), encoding quality sometimes varies depending on the input. Note that the complexity is only approximate (within 0.5 mflops and using the lowest complexity setting). Decoding requires approximately 0.5 mflops \begin_inset Index status collapsed \begin_layout Plain Layout complexity \end_layout \end_inset in most modes (1 mflops with perceptual enhancement).
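\end_layout \begin_layout Standard The innovation bit-rates quoted in the section on sub-frame analysis-by-synthesis follow from simple arithmetic on the sub-frame layout (40 samples per sub-frame, 4 sub-frames per 20 ms frame, hence 200 sub-frames per second). The C sketch below reproduces that arithmetic. The function and macro names are hypothetical and are not part of the Speex API, and the codebook size is assumed to be a power of two. \end_layout

```c
#include <assert.h>

#define SUBFRAME_SIZE      40   /* samples per narrowband sub-frame */
#define SUBFRAMES_PER_SEC  200  /* 4 sub-frames per 20 ms frame */

/* Number of bits needed to index a codebook with `entries` entries
 * (entries is assumed to be a power of two). */
static int codebook_index_bits(int entries)
{
    int bits = 0;
    assert(entries > 0 && (entries & (entries - 1)) == 0);
    while ((1 << bits) < entries)
        bits++;
    return bits;
}

/* Bits spent on the innovation in one sub-frame: one codebook index
 * per sub-vector, with SUBFRAME_SIZE / subvect_size sub-vectors. */
static int innovation_bits_per_subframe(int subvect_size, int entries)
{
    assert(subvect_size > 0 && SUBFRAME_SIZE % subvect_size == 0);
    return (SUBFRAME_SIZE / subvect_size) * codebook_index_bits(entries);
}

/* Innovation bit-rate in bits per second. */
static int innovation_bps(int subvect_size, int entries)
{
    return innovation_bits_per_subframe(subvect_size, entries)
           * SUBFRAMES_PER_SEC;
}
```

\begin_layout Standard For the 3.95 kbps mode (sub-vectors of 20 samples, 32 entries) this gives 10 bits per sub-frame and 2000 bps, and for the 18.2 kbps mode (sub-vectors of 5 samples, 256 entries) it gives 64 bits per sub-frame and 12800 bps, matching the figures quoted earlier.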
\end_layout \begin_layout Standard \begin_inset Float table placement h wide true sideways false status open \begin_layout Plain Layout \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash begin{center} \end_layout \end_inset \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout Mode \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Quality \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Bit-rate \begin_inset Index status collapsed \begin_layout Plain Layout bit-rate \end_layout \end_inset (bps) \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout mflops \begin_inset Index status collapsed \begin_layout Plain Layout complexity \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Quality/description \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout - \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 250 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout No transmission (DTX) \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 2,150 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 6 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Vocoder (mostly for comfort noise) \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 2 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 2 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 5,950 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 9 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Very noticeable artifacts/noise, good intelligibility 
\end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 3 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 3-4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 8,000 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 10 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Artifacts/noise sometimes noticeable \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 5-6 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 11,000 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 14 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Artifacts usually noticeable only with headphones \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 5 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 7-8 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 15,000 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 11 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Need good headphones to tell the difference \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 6 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 9 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 18,200 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 17.5 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Hard to tell the difference even with good headphones \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 7 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 10 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 24,600 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 14.5 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Completely transparent for voice, 
good quality music \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 8 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 3,950 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 10.5 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Very noticeable artifacts/noise, good intelligibility \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 9 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout - \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout - \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout - \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout reserved \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 10 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout - \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout - \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout - \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout reserved \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 11 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout - \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout - \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout - \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout reserved \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 12 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout - \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout - \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout - \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout reserved \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 13 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout - 
\end_layout \end_inset \begin_inset Text \begin_layout Plain Layout - \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout - \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Application-defined, interpreted by callback or skipped \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 14 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout - \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout - \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout - \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Speex in-band signaling \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 15 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout - \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout - \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout - \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Terminator code \end_layout \end_inset \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash end{center} \end_layout \end_inset \end_layout \begin_layout Plain Layout \begin_inset Caption \begin_layout Plain Layout Quality versus bit-rate \begin_inset CommandInset label LatexCommand label name "cap:quality_vs_bps" \end_inset \end_layout \end_inset \end_layout \end_inset \end_layout \begin_layout Section Perceptual enhancement \begin_inset Index status collapsed \begin_layout Plain Layout perceptual enhancement \end_layout \end_inset \end_layout \begin_layout Standard \series bold This section was only valid for version 1.1.12 and earlier. It does not apply to version 1.2-beta1 (and later), for which the new perceptual enhancement is not yet documented. \end_layout \begin_layout Standard This part of the codec only applies to the decoder and can even be changed without affecting inter-operability. 
For that reason, the implementation provided and described here should only be considered as a reference implementation. The enhancement system is divided into two parts. First, the synthesis filter \begin_inset Formula $S(z)=1/A(z)$ \end_inset is replaced by an enhanced filter: \begin_inset Formula \[ S'(z)=\frac{A\left(z/a_{2}\right)A\left(z/a_{3}\right)}{A\left(z\right)A\left(z/a_{1}\right)}\] \end_inset where \begin_inset Formula $a_{1}$ \end_inset and \begin_inset Formula $a_{2}$ \end_inset depend on the mode in use and \begin_inset Formula $a_{3}=\frac{1}{r}\left(1-\frac{1-ra_{1}}{1-ra_{2}}\right)$ \end_inset with \begin_inset Formula $r=.9$ \end_inset . The second part of the enhancement consists of using a comb filter to enhance the pitch in the excitation domain. \end_layout \begin_layout Standard \begin_inset Newpage newpage \end_inset \end_layout \begin_layout Chapter Speex wideband mode (sub-band CELP) \begin_inset Index status collapsed \begin_layout Plain Layout wideband \end_layout \end_inset \begin_inset CommandInset label LatexCommand label name "sec:Speex-wideband-mode" \end_inset \end_layout \begin_layout Standard For wideband, the Speex approach uses a \emph on q \emph default uadrature \emph on m \emph default irror \emph on f \emph default ilter \begin_inset Index status collapsed \begin_layout Plain Layout quadrature mirror filter \end_layout \end_inset (QMF) to split the band in two. The 16 kHz signal is thus divided into two 8 kHz signals, one representing the low band (0-4 kHz), the other the high band (4-8 kHz). The low band is encoded with the narrowband mode described in section \begin_inset CommandInset ref LatexCommand ref reference "sec:Speex-narrowband-mode" \end_inset in such a way that the resulting \begin_inset Quotes eld \end_inset embedded narrowband bit-stream \begin_inset Quotes erd \end_inset can also be decoded with the narrowband decoder. 
Since the low band encoding has already been described, only the high band encoding is described in this chapter. \end_layout \begin_layout Section Linear Prediction \end_layout \begin_layout Standard The linear prediction part used for the high-band is very similar to what is done for narrowband. The only difference is that we use only 12 bits to encode the high-band LSPs using a multi-stage vector quantizer (MSVQ). The first stage quantizes the 10 coefficients with 6 bits and the resulting error is then quantized with another 6 bits. \end_layout \begin_layout Section Pitch Prediction \end_layout \begin_layout Standard That part is easy: there's no pitch prediction for the high-band. There are two reasons for that. First, there is usually little harmonic structure in this band (above 4 kHz). Second, it would be very hard to implement since the QMF folds the 4-8 kHz band into 4-0 kHz (reversing the frequency axis), which means that the location of the harmonics is no longer at multiples of the fundamental (pitch). \end_layout \begin_layout Section Excitation Quantization \end_layout \begin_layout Standard The high-band excitation is coded in the same way as for narrowband. \end_layout \begin_layout Section Bit allocation \end_layout \begin_layout Standard For the wideband mode, the entire narrowband frame is packed before the high-band is encoded. The narrowband part of the bit-stream is as defined in table \begin_inset CommandInset ref LatexCommand ref reference "cap:bits-narrowband" \end_inset . The high-band follows, as described in table \begin_inset CommandInset ref LatexCommand ref reference "cap:bits-wideband" \end_inset . For wideband, the mode ID is the same as the Speex quality setting and is defined in table \begin_inset CommandInset ref LatexCommand ref reference "tab:wideband-quality" \end_inset .
This also means that a wideband frame may be correctly decoded by a narrowband decoder with the only caveat that if more than one frame is packed in the same packet, the decoder will need to skip the high-band parts in order to sync with the bit-stream. \end_layout \begin_layout Standard \begin_inset Float table placement h wide true sideways false status open \begin_layout Plain Layout \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash begin{center} \end_layout \end_inset \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout Parameter \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Update rate \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 2 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 3 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Wideband bit \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout frame \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Mode ID \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout frame \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 3 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 3 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 3 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 3 \end_layout \end_inset \begin_inset Text \begin_layout 
Plain Layout 3 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout LSP \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout frame \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 12 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 12 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 12 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 12 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Excitation gain \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout sub-frame \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 5 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Excitation VQ \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout sub-frame \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 20 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 40 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 80 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Total \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout frame \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 36 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 112 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 192 \end_layout \end_inset 
\begin_inset Text \begin_layout Plain Layout 352 \end_layout \end_inset \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash end{center} \end_layout \end_inset \end_layout \begin_layout Plain Layout \begin_inset Caption \begin_layout Plain Layout Bit allocation for high-band in wideband mode \begin_inset CommandInset label LatexCommand label name "cap:bits-wideband" \end_inset \end_layout \end_inset \end_layout \end_inset \end_layout \begin_layout Standard \begin_inset Float table placement h wide true sideways false status open \begin_layout Plain Layout \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash begin{center} \end_layout \end_inset \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout Mode/Quality \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Bit-rate \begin_inset Index status collapsed \begin_layout Plain Layout bit-rate \end_layout \end_inset (bps) \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Quality/description \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 3,950 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Barely intelligible (mostly for comfort noise) \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 1 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 5,750 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Very noticeable artifacts/noise, poor intelligibility \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 2 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 7,750 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Very noticeable artifacts/noise, good intelligibility \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 3 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 9,800 \end_layout 
\end_inset \begin_inset Text \begin_layout Plain Layout Artifacts/noise sometimes annoying \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 4 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 12,800 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Artifacts/noise usually noticeable \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 5 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 16,800 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Artifacts/noise sometimes noticeable \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 6 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 20,600 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Need good headphones to tell the difference \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 7 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 23,800 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Need good headphones to tell the difference \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 8 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 27,800 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Hard to tell the difference even with good headphones \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 9 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 34,200 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Hard to tell the difference even with good headphones \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 10 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout 42,200 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Completely transparent for voice, good quality music \end_layout \end_inset \end_inset \begin_inset ERT status collapsed \begin_layout Plain Layout 
\backslash end{center} \end_layout \end_inset \end_layout \begin_layout Plain Layout \begin_inset Caption \begin_layout Plain Layout Quality versus bit-rate for the wideband encoder \begin_inset CommandInset label LatexCommand label name "tab:wideband-quality" \end_inset \end_layout \end_inset \end_layout \end_inset \end_layout \begin_layout Standard \begin_inset ERT status open \begin_layout Plain Layout \backslash clearpage \end_layout \end_inset \end_layout \begin_layout Standard \begin_inset ERT status collapsed \begin_layout Plain Layout \backslash clearpage \end_layout \end_inset \end_layout \begin_layout Chapter \start_of_appendix Sample code \begin_inset CommandInset label LatexCommand label name "sec:Sample-code" \end_inset \end_layout \begin_layout Standard This section shows sample code for encoding and decoding speech using the Speex API. The commands can be used to encode and decode a file by calling: \family typewriter \begin_inset Newline newline \end_inset % sampleenc in_file.sw | sampledec out_file.sw \family default \begin_inset Newline newline \end_inset where both files are raw (no header) files encoded at 16 bits per sample (in the machine natural endianness). \end_layout \begin_layout Section sampleenc.c \end_layout \begin_layout Standard sampleenc takes a raw 16 bits/sample file, encodes it and outputs a Speex stream to stdout. Note that the packing used is \series bold not \series default compatible with that of speexenc/speexdec. \end_layout \begin_layout Standard \begin_inset CommandInset include LatexCommand lstinputlisting filename "sampleenc.c" lstparams "caption={Source code for sampleenc},label={sampleenc-source-code},numbers=left,numberstyle={\\footnotesize}" \end_inset \end_layout \begin_layout Section sampledec.c \end_layout \begin_layout Standard sampledec reads a Speex stream from stdin, decodes it and outputs it to a raw 16 bits/sample file. 