|  | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> | ||
|  | <html> | ||
|  |     <head> | ||
|  |         <link rel="stylesheet" type="text/css" href="opus_in_isobmff.css"/> | ||
|  |         <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> | ||
|  |         <title>Encapsulation of Opus in ISO Base Media File Format</title> | ||
|  |     </head> | ||
|  |     <body bgcolor="0x333333" text="#60B0C0"> | ||
|  |         <b><u>Encapsulation of Opus in ISO Base Media File Format</u></b><br> | ||
|  |         <font size="2">last updated: August 28, 2018</font><br> | ||
|  |         <br> | ||
|  |         <div class="normal_link pre frame_box"> | ||
|  | 
 | ||
|  |                                 Encapsulation of Opus in ISO Base Media File Format | ||
|  |                                         Version 0.8.1 (incomplete) | ||
|  | 
 | ||
|  | 
 | ||
|  | Table of Contents | ||
|  | <a href="#1">1</a> Scope | ||
|  | <a href="#2">2</a> Normative References | ||
|  | <a href="#3">3</a> Terms and Definitions | ||
|  | <a href="#4">4</a> Design Rules of Encapsulation | ||
|  |     <a href="#4.1">4.1</a> File Type Identification | ||
|  |     <a href="#4.2">4.2</a> Overview of Track Structure | ||
|  |     <a href="#4.3">4.3</a> Definitions of Opus sample | ||
|  |         <a href="#4.3.1">4.3.1</a> Sample entry format | ||
|  |         <a href="#4.3.2">4.3.2</a> Opus Specific Box | ||
|  |         <a href="#4.3.3">4.3.3</a> Sample format | ||
|  |         <a href="#4.3.4">4.3.4</a> Duration of Opus sample | ||
|  |         <a href="#4.3.5">4.3.5</a> Sub-sample | ||
|  |         <a href="#4.3.6">4.3.6</a> Random Access | ||
|  |             <a href="#4.3.6.1">4.3.6.1</a> Random Access Point | ||
|  |             <a href="#4.3.6.2">4.3.6.2</a> Pre-roll | ||
|  |     <a href="#4.4">4.4</a> Trimming of Actual Duration | ||
|  |     <a href="#4.5">4.5</a> Channel Mapping | ||
|  |         <a href="#4.5.1">4.5.1</a> ISO Base Media native Channel Mapping | ||
|  |         <a href="#4.5.2">4.5.2</a> Composition on all active tracks (informative) | ||
|  |     <a href="#4.6">4.6</a> Basic Structure (informative) | ||
|  |         <a href="#4.6.1">4.6.1</a> Initial Movie | ||
|  |         <a href="#4.6.2">4.6.2</a> Movie Fragments | ||
|  |     <a href="#4.7">4.7</a> Example of Encapsulation (informative) | ||
|  | <a href="#5">5</a> Author's Address | ||
|  | 
 | ||
|  | <a name="1"></a> | ||
|  | 1 Scope | ||
|  |     This specification defines how Opus coded bitstreams are encapsulated in the ISO Base Media File Format and | ||
|  |     its derivatives. The encapsulation of Opus coded bitstreams in the QuickTime file format is outside the scope | ||
|  |     of this specification. | ||
|  | 
 | ||
|  | <a name="2"></a> | ||
|  | 2 Normative References | ||
|  |     [1] ISO/IEC 14496-12:2015 Corrected version | ||
|  |         Information technology — Coding of audio-visual objects — Part 12: ISO base media file format | ||
|  | 
 | ||
|  |     [2] RFC 6716 | ||
|  |         Definition of the Opus Audio Codec | ||
|  | 
 | ||
|  |     [3] RFC 7845 | ||
|  |         Ogg Encapsulation for the Opus Audio Codec | ||
|  | 
 | ||
|  | <a name="3"></a> | ||
|  | 3 Terms and Definitions | ||
|  |     3.1 active track | ||
|  |         enabled track from the non-alternate group or selected track from alternate group | ||
|  | 
 | ||
|  |     3.2 actual duration | ||
|  |         duration constructed from valid samples | ||
|  | 
 | ||
|  |     3.3 edit | ||
|  |         entry in the Edit List Box | ||
|  | 
 | ||
|  |     3.4 padded samples | ||
|  |         PCM samples obtained by decoding Opus sample(s) that are not valid samples. | ||
|  |         An Opus bitstream always contains some of them at the beginning and may contain some at the end, unless | ||
|  |         they have already been physically removed from the beginning and/or the end. | ||
|  | 
 | ||
|  |     3.5 priming samples | ||
|  |         padded samples at the beginning of the Opus bitstream | ||
|  | 
 | ||
|  |     3.6 sample-accurate | ||
|  |         for any PCM sample, a timestamp exactly matching its sampling timestamp is present in the media timeline. | ||
|  | 
 | ||
|  |     3.7 valid samples | ||
|  |         PCM samples after decoding Opus sample(s) corresponding to input PCM samples | ||
|  | 
 | ||
|  | <a name="4"></a> | ||
|  | 4 Design Rules of Encapsulation | ||
|  |     4.1 File Type Identification<a name="4.1"></a> | ||
|  |         This specification defines the brand 'Opus' to declare that files are conformant to this specification. | ||
|  |         Additionally, files conformant to this specification shall contain, in the compatible brands list of the File | ||
|  |         Type Box, at least one brand whose requirements can be satisfied together with those described in this clause | ||
|  |         without contradiction. As an example, the minimal support of the encapsulation of Opus bitstreams in ISO Base | ||
|  |         Media file format requires the 'iso2' brand in the compatible brands list since support of roll groups is | ||
|  |         required. | ||
|  | <a name="4.2"></a> | ||
|  |     4.2 Overview of Track Structure | ||
|  |         This clause summarizes requirements of the encapsulation of Opus coded bitstream as media data in audio tracks | ||
|  |         in file formats compliant with the ISO Base Media File Format. The details are described in clauses after this | ||
|  |         clause. | ||
|  |             + The handler_type field in the Handler Reference Box shall be set to 'soun'. | ||
|  |             + The Media Information Box shall contain the Sound Media Header Box. | ||
|  |             + The codingname of the sample entry is 'Opus'. | ||
|  |                 This specification does not define any encapsulation using MP4AudioSampleEntry with objectTypeIndication | ||
|  |                 specified by the MPEG-4 Registration Authority (http://www.mp4ra.org/). | ||
|  |                 See 4.3.1 Sample entry format to get the details about the sample entry. | ||
|  |             + The 'dOps' box is added to the sample entry to convey initializing information for the decoder. | ||
|  |                 See 4.3.2 Opus Specific Box to get the details. | ||
|  |             + An Opus sample is exactly one Opus packet for each of different Opus bitstreams. | ||
|  |                 See 4.3.3 Sample format to get the details. | ||
|  |             + Every Opus sample is a sync sample but requires pre-roll for every random access to get correct output. | ||
|  |                 See 4.3.6 Random Access to get the details. | ||
|  | <a name="4.3"></a> | ||
|  |     4.3 Definitions of Opus sample | ||
|  |         4.3.1 Sample entry format<a name="4.3.1"></a> | ||
|  |             For any track containing Opus bitstreams, at least one sample entry describing corresponding Opus bitstream | ||
|  |             shall be present inside the Sample Table Box. This version of the specification defines only one sample | ||
|  |             entry format named OpusSampleEntry whose codingname is 'Opus'. This sample entry includes exactly one Opus | ||
|  |             Specific Box defined in 4.3.2 as a mandatory box and indicates that Opus samples described by this sample | ||
|  |             entry are stored by the sample format described in 4.3.3. | ||
|  | 
 | ||
|  |             The syntax and semantics of the OpusSampleEntry is shown as follows. | ||
|  | 
 | ||
|  |             class OpusSampleEntry() extends AudioSampleEntry ('Opus') { | ||
|  |                 OpusSpecificBox(); | ||
|  |             } | ||
|  | 
 | ||
|  |             + channelcount: | ||
|  |                 The channelcount field indicates the number of output channels and shall be set to the same value of | ||
|  |                 the OutputChannelCount in the OpusDecoderConfigurationRecord. The value of this field may be used in | ||
|  |                 the ChannelLayout if any as described in 4.5.1. | ||
|  |             + samplesize: | ||
|  |                 The samplesize field shall be set to 16. | ||
|  |             + samplerate: | ||
|  |                 The samplerate field shall be set to 48000<<16. | ||
|  |             + OpusSpecificBox | ||
|  |                 This box contains initializing information for the decoder as defined in 4.3.2. | ||
|  | 
 | ||
|  |         4.3.2 Opus Specific Box<a name="4.3.2"></a> | ||
|  |             Exactly one Opus Specific Box shall be present in each OpusSampleEntry. | ||
|  |             The Opus Specific Box contains an OpusDecoderConfigurationRecord, which begins with the Version field; | ||
|  |             this specification defines version 0 of this record. If incompatible changes occur in the fields after | ||
|  |             the Version field within the OpusDecoderConfigurationRecord in future versions of this specification, | ||
|  |             another version will be defined. | ||
|  |             This box follows Ogg Opus [3] in many respects, but all data are stored in big-endian format. | ||
|  | 
 | ||
|  |             The syntax and semantics of the Opus Specific Box is shown as follows. | ||
|  | 
 | ||
|  |             class ChannelMappingTable (unsigned int(8) OutputChannelCount) { | ||
|  |                 unsigned int(8) StreamCount; | ||
|  |                 unsigned int(8) CoupledCount; | ||
|  |                 unsigned int(8 * OutputChannelCount) ChannelMapping; | ||
|  |             } | ||
|  | 
 | ||
|  |             aligned(8) class OpusDecoderConfigurationRecord { | ||
|  |                 unsigned int(8) Version; | ||
|  |                 unsigned int(8) OutputChannelCount; | ||
|  |                 unsigned int(16) PreSkip; | ||
|  |                 unsigned int(32) InputSampleRate; | ||
|  |                 signed int(16) OutputGain; | ||
|  |                 unsigned int(8) ChannelMappingFamily; | ||
|  |                 if (ChannelMappingFamily != 0) { | ||
|  |                     ChannelMappingTable(OutputChannelCount); | ||
|  |                 } | ||
|  |             } | ||
|  | 
 | ||
|  |             class OpusSpecificBox extends Box('dOps') { | ||
|  |                 OpusDecoderConfigurationRecord() OpusConfig; | ||
|  |             } | ||
|  | 
 | ||
|  |             + Version: | ||
|  |                 The Version field shall be set to 0. | ||
|  |                 In the future versions of this specification, this field may be set to other values. And without support | ||
|  |                 of those values, the reader shall not read the fields after this within the OpusSpecificBox. | ||
|  |             + OutputChannelCount: | ||
|  |                 The OutputChannelCount field shall be set to the same value as the *Output Channel Count* field in the | ||
|  |                 identification header defined in Ogg Opus [3]. | ||
|  |             + PreSkip: | ||
|  |                 The PreSkip field indicates the number of the priming samples, that is, the number of samples at 48000 Hz | ||
|  |                 to discard from the decoder output when starting playback. The value of the PreSkip field shall be at least | ||
|  |                 80 milliseconds' worth of PCM samples even when removing any number of Opus samples which may or may not | ||
|  |                 contain the priming samples. The PreSkip field is not used for discarding the priming samples at the whole | ||
|  |                 playback at all since it is informative only, and that task falls on the Edit List Box. | ||
|  |             + InputSampleRate: | ||
|  |                 The InputSampleRate field shall be set to the same value as the *Input Sample Rate* field in the | ||
|  |                 identification header defined in Ogg Opus [3]. | ||
|  |             + OutputGain: | ||
|  |                 The OutputGain field shall be set to the same value as the *Output Gain* field in the identification | ||
|  |                 header defined in Ogg Opus [3]. Note that the value is stored as 8.8 fixed-point. | ||
|  |             + ChannelMappingFamily: | ||
|  |                 The ChannelMappingFamily field shall be set to the same value as the *Channel Mapping Family* field in | ||
|  |                 the identification header defined in Ogg Opus [3]. Note that the value 255 may be used for an alternative | ||
|  |                 to map channels by ISO Base Media native mapping. The details are described in 4.5.1. | ||
|  |             + StreamCount: | ||
|  |                 The StreamCount field shall be set to the same value as the *Stream Count* field in the identification | ||
|  |                 header defined in Ogg Opus [3]. | ||
|  |             + CoupledCount: | ||
|  |                 The CoupledCount field shall be set to the same value as the *Coupled Count* field in the identification | ||
|  |                 header defined in Ogg Opus [3]. | ||
|  |             + ChannelMapping: | ||
|  |                 The ChannelMapping field shall be set to the same octet string as the *Channel Mapping* field in the | ||
|  |                 identification header defined in Ogg Opus [3]. | ||
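
The record layout above can be read with a few fixed-width big-endian fields. The following is a minimal sketch in Python, not a reference implementation: the names OpusConfig and parse_opus_config are hypothetical, and the input is assumed to be the body of the 'dOps' box with the box header already removed.

```python
import struct
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class OpusConfig:  # hypothetical container mirroring OpusDecoderConfigurationRecord
    version: int
    output_channel_count: int
    pre_skip: int                 # priming samples at 48000 Hz
    input_sample_rate: int
    output_gain: int              # raw signed 8.8 fixed-point value
    channel_mapping_family: int
    stream_count: Optional[int] = None
    coupled_count: Optional[int] = None
    channel_mapping: Optional[Tuple[int, ...]] = None


def parse_opus_config(payload: bytes) -> OpusConfig:
    """Parse the body of a 'dOps' box (assumed: box header already stripped)."""
    if len(payload) < 11:
        raise ValueError("payload shorter than the fixed part of the record")
    version = payload[0]
    if version != 0:
        # Fields after Version are not read for unsupported versions.
        raise ValueError(f"unsupported Version {version}")
    # '>' selects big-endian with no padding: u8, u16, u32, s16, u8.
    channels, pre_skip, rate, gain, family = struct.unpack_from(">BHIhB", payload, 1)
    config = OpusConfig(version, channels, pre_skip, rate, gain, family)
    if family != 0:
        # ChannelMappingTable: StreamCount, CoupledCount, then one octet per output channel.
        if len(payload) < 13 + channels:
            raise ValueError("payload too short for ChannelMappingTable")
        config.stream_count = payload[11]
        config.coupled_count = payload[12]
        config.channel_mapping = tuple(payload[13:13 + channels])
    return config
```

Rejecting unknown versions before touching the remaining bytes follows the rule stated for the Version field; a real reader might instead skip the box and report the track as unsupported.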
|  | 
 | ||
|  |         4.3.3 Sample format<a name="4.3.3"></a> | ||
|  |             An Opus sample is exactly one Opus packet for each of different Opus bitstreams. To support more than | ||
|  |             two channels, an Opus sample can contain frames from multiple Opus bitstreams, but all Opus packets in | ||
|  |             a single Opus sample shall have the same total of frame sizes. An Opus packet from each of the Opus | ||
|  |             bitstreams is packed into a single Opus sample as described in Appendix B of RFC 6716 [2]. | ||
|  |             Endianness is irrelevant to Opus samples since every Opus packet is processed byte-by-byte. | ||
|  |             In this specification, 'sample' means 'Opus sample' except in 'padded samples', 'priming samples', 'valid | ||
|  |             samples' and 'sample-accurate', i.e. 'sample' is 'sample' in the term defined in ISO/IEC 14496-12 [1]. | ||
|  | 
 | ||
|  |                 +-----------------------------------------+-------------------------------------+ | ||
|  |                 | Opus packet 0 (self-delimiting framing) | Opus packet 1 (undelimited framing) | | ||
|  |                 +-----------------------------------------+-------------------------------------+ | ||
|  |                 |<---------------------------- the size of Opus sample ------------------------>| | ||
|  | 
 | ||
|  |                     Figure 1 - Example structure of an Opus sample containing two Opus bitstreams | ||
|  | 
 | ||
|  |         4.3.4 Duration of Opus sample<a name="4.3.4"></a> | ||
|  |             The duration of an Opus sample is given by multiplying the total of frame sizes for a single Opus | ||
|  |             bitstream, expressed in seconds, by the value of the timescale field in the Media Header Box. | ||
|  |             For example, suppose an Opus sample consists of two Opus bitstreams, where the frame size of one bitstream | ||
|  |             is 40 milliseconds and the frame size of the other is 60 milliseconds, and the timescale field in the | ||
|  |             Media Header Box is set to 48000. The duration of that Opus sample shall then be 120 milliseconds, the | ||
|  |             maximum duration of an Opus packet, since it shall contain three 40 millisecond frames and two 60 | ||
|  |             millisecond frames. This duration is 5760 in the timescale indicated in the Media Header Box. | ||
|  | 
 | ||
|  |             To indicate the valid samples excluding the padded samples at the end of Opus bitstream, the duration of | ||
|  |             the last Opus sample of an Opus bitstream is given by multiplying the number of the valid samples by the | ||
|  |             value produced by dividing the value of the timescale field in the Media Header Box by 48000. | ||
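
The two duration rules above reduce to the same conversion from 48 kHz PCM samples to media timescale units. A minimal sketch in Python follows; the helper name to_media_timescale and the figure of 1000 valid samples are illustrative assumptions, not part of this specification.

```python
from fractions import Fraction

OPUS_CLOCK = 48000  # Opus frame sizes are counted on a 48 kHz clock


def to_media_timescale(pcm_samples_48k: int, timescale: int) -> int:
    """Convert a count of 48 kHz PCM samples to units of the media timescale."""
    ticks = Fraction(pcm_samples_48k * timescale, OPUS_CLOCK)
    if ticks.denominator != 1:
        # Not sample-accurate in this timescale; the caller has to choose a rounding.
        raise ValueError("duration is not exactly representable in this timescale")
    return ticks.numerator


# Three 40 ms frames and two 60 ms frames both total 120 ms of PCM.
stream_a = 3 * 40 * OPUS_CLOCK // 1000
stream_b = 2 * 60 * OPUS_CLOCK // 1000
assert stream_a == stream_b == 5760

full_sample = to_media_timescale(stream_a, 48000)   # ordinary Opus sample
last_sample = to_media_timescale(1000, 48000)       # last sample with 1000 valid PCM samples
```

With the timescale set to 48000 the conversion is the identity, which is one reason that value is convenient for sample-accurate durations.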
|  | 
 | ||
|  |         4.3.5 Sub-sample<a name="4.3.5"></a> | ||
|  |             The last Opus packet in an Opus sample is framed differently from the others in the same Opus sample, | ||
|  |             and the others, taken alone, are not valid as an Opus sample because they use self-delimiting framing. | ||
|  |             To avoid complexities, sub-sample is not defined for Opus sample in this specification. | ||
|  | 
 | ||
|  |         4.3.6 Random Access<a name="4.3.6"></a> | ||
|  |             This subclause describes the nature of the random access of Opus sample. | ||
|  | 
 | ||
|  |             4.3.6.1 Random Access Point<a name="4.3.6.1"></a> | ||
|  |                 All Opus samples can be independently decoded i.e. every Opus sample is a sync sample. Therefore, the | ||
|  |                 Sync Sample Box shall not be present as long as there are no samples other than Opus samples in the same | ||
|  |                 track. And the sample_is_non_sync_sample field for Opus samples shall be set to 0. | ||
|  | 
 | ||
|  |             4.3.6.2 Pre-roll<a name="4.3.6.2"></a> | ||
|  |                 Opus bitstream requires at least 80 millisecond pre-roll after each random access to get correct output. | ||
|  |                 Pre-roll is indicated by the roll_distance field in AudioRollRecoveryEntry. AudioPreRollEntry shall not | ||
|  |                 be used since every Opus sample is a sync sample in Opus bitstream. Note that roll_distance is expressed | ||
|  |                 in sample units in a term of ISO Base Media File Format, and always takes negative values. | ||
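
As an illustration, a writer could derive roll_distance from the duration of the Opus samples in the track. The sketch below, in Python, assumes every Opus sample in the group has the same duration and rounds up so that at least 80 milliseconds are covered; both the function name and that rounding choice are assumptions rather than requirements stated here.

```python
PRE_ROLL_48K = 3840  # 80 ms expressed in PCM samples at 48000 Hz


def roll_distance(sample_duration_48k: int, pre_roll_48k: int = PRE_ROLL_48K) -> int:
    """Return a negative roll_distance, counted in Opus samples.

    sample_duration_48k is the duration of one Opus sample in 48 kHz PCM
    samples (assumed constant for the samples mapped to this group entry).
    """
    if sample_duration_48k <= 0:
        raise ValueError("sample duration must be positive")
    # Ceiling division: the number of whole Opus samples that cover the pre-roll.
    needed = -(-pre_roll_48k // sample_duration_48k)
    return -needed
```

For 20 millisecond Opus samples (960 PCM samples each) this gives -4, and for 120 millisecond Opus samples it gives -1.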
|  | 
 | ||
|  |                 For any track containing Opus bitstreams, at least one Sample Group Description Box and at least one | ||
|  |                 Sample to Group Box within the Sample Table Box shall be present and these have the grouping_type field | ||
|  |                 set to 'roll'. If any Opus sample is contained in a track fragment, the Sample to Group Box with the | ||
|  |                 grouping_type field set to 'roll' shall be present for that track fragment. | ||
|  | 
 | ||
|  |                 For the requirement of AudioRollRecoveryEntry, the compatible_brands field in the File Type Box shall | ||
|  |                 contain at least one brand which requires support for roll groups. | ||
|  | <a name="4.4"></a> | ||
|  |     4.4 Trimming of Actual Duration | ||
|  |         Due to the priming samples (or the padding at the beginning) derived from the pre-roll for the startup and the | ||
|  |         padded samples at the end, the media needs to be trimmed to get the actual duration. An edit in the Edit List | ||
|  |         Box can achieve this, and the Edit Box and the Edit List Box shall be present. | ||
|  | 
 | ||
|  |         For sample-accurate trimming, an appropriate value should be set in the timescale field of the Movie Header | ||
|  |         Box and of the Media Header Box inside the Track Box(es) for the Opus bitstream. The timescale field in the | ||
|  |         Media Header Box is typically set to 48000. If the timescales in all of the Media Header Boxes are set to the | ||
|  |         same value, it is recommended that the timescale field in the Movie Header Box be set to that value as well, | ||
|  |         in order to avoid rounding problems when specifying the duration of an edit. | ||
|  | 
 | ||
|  |         For example, to indicate the actual duration of an Opus bitstream in a track with the timescale fields of both | ||
|  |         the Movie Header Box and the Media Header Box set to 48000, we would use the following edit: | ||
|  |             segment_duration = the number of the valid samples | ||
|  |             media_time = the number of the priming samples | ||
|  |             media_rate = 1 << 16 | ||
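
The edit above can be computed for other timescales as well. The following Python sketch is illustrative only: the function name make_opus_edit and the dictionary layout are hypothetical, and it assumes segment_duration is expressed in the Movie Header Box timescale while media_time is expressed in the Media Header Box timescale, as in ISO/IEC 14496-12 [1].

```python
from fractions import Fraction


def make_opus_edit(valid_samples: int, priming_samples: int,
                   movie_timescale: int = 48000, media_timescale: int = 48000) -> dict:
    """Build the fields of one edit from counts of 48 kHz PCM samples."""

    def scale(pcm_48k: int, timescale: int) -> int:
        ticks = Fraction(pcm_48k * timescale, 48000)
        if ticks.denominator != 1:
            # This is the rounding problem mentioned above: not sample-accurate.
            raise ValueError("value is not exactly representable in this timescale")
        return ticks.numerator

    return {
        "segment_duration": scale(valid_samples, movie_timescale),
        "media_time": scale(priming_samples, media_timescale),
        "media_rate": 1 << 16,  # normal playback rate, as written in the example above
    }
```

For instance, make_opus_edit(480000, 312) describes ten seconds of valid samples preceded by 312 priming samples; with a movie timescale of 1000, a valid sample count that is not a multiple of 48 raises the error instead of silently rounding.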
|  | 
 | ||
|  |         The Edit List Box applies to the whole movie including all movie fragments. Therefore, it is impossible to | ||
|  |         signal the actual duration in advance when movie fragments are produced on the fly, as in live streaming. In | ||
|  |         such cases, the duration of the last Opus sample may help if the segment_duration field is set to zero, since | ||
|  |         the value 0 represents an implicit duration equal to the sum of the durations of all samples. | ||
|  | <a name="4.5"></a> | ||
|  |     4.5 Channel Mapping | ||
|  |         4.5.1 ISO Base Media native Channel Mapping<a name="4.5.1"></a> | ||
|  |             ISO Base Media File Format, that is ISO/IEC 14496-12 [1], defines an extension ChannelLayout to the | ||
|  |             AudioSampleEntry, which conveys information of mapping channels to loudspeaker positions. The ChannelLayout | ||
|  |             makes it possible to specify the channel layout more flexibly than the predefined layouts of the | ||
|  |             ChannelMappingFamily. | ||
|  | 
 | ||
|  |             To utilize the ChannelLayout for OpusSampleEntry, the ChannelMappingFamily field should be set to 255. | ||
|  |             Even when the ChannelMappingFamily field is set to another value, the assignment of each output channel to | ||
|  |             loudspeaker position specified by the ChannelMappingFamily would be changed as specified by the ChannelLayout. | ||
|  |             The procedure of the assignment is the following. | ||
|  | 
 | ||
|  |                 1. Decoded channels are mapped to output channels according to the ChannelMappingTable. | ||
|  |                 2. Output channels are mapped to loudspeaker positions according to the ChannelLayout. | ||
|  | 
 | ||
|  |             In this way, the parameters of the Opus Specific Box are processed before the ChannelLayout, and the | ||
|  |             ChannelLayout shall follow the Opus Specific Box. | ||
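
The two-step procedure can be sketched as follows in Python. This is a simplified model, not a decoder: decoded channels are represented by arbitrary objects, the function names are hypothetical, and the loudspeaker labels stand in for whatever the ChannelLayout actually signals. Index 255 is treated as silence, as in Ogg Opus [3].

```python
from typing import Dict, List, Optional, Sequence, TypeVar

T = TypeVar("T")
SILENT = 255  # channel mapping index meaning silence in Ogg Opus [3]


def map_to_output_channels(decoded: Sequence[T], stream_count: int, coupled_count: int,
                           channel_mapping: Sequence[int]) -> List[Optional[T]]:
    """Step 1: map decoded channels to output channels via the ChannelMappingTable."""
    decoded_count = stream_count + coupled_count  # each coupled stream yields two channels
    if len(decoded) != decoded_count:
        raise ValueError("unexpected number of decoded channels")
    output: List[Optional[T]] = []
    for index in channel_mapping:
        if index == SILENT:
            output.append(None)  # None stands for a silent output channel
        elif index < decoded_count:
            output.append(decoded[index])
        else:
            raise ValueError(f"invalid channel mapping index {index}")
    return output


def assign_speakers(output_channels: Sequence[Optional[T]],
                    speaker_positions: Sequence[str]) -> Dict[str, Optional[T]]:
    """Step 2: map output channels to loudspeaker positions (labels are placeholders)."""
    if len(output_channels) != len(speaker_positions):
        raise ValueError("one loudspeaker position is needed per output channel")
    return dict(zip(speaker_positions, output_channels))


decoded = ["c0L", "c0R", "c1L", "c1R", "m0", "m1"]  # two coupled streams, two mono streams
outputs = map_to_output_channels(decoded, 4, 2, [0, 4, 1, 2, 3, 5])
layout = assign_speakers(outputs, ["FL", "FC", "FR", "RL", "RR", "LFE"])
```

Keeping the two steps as separate functions mirrors the processing order described above: the Opus Specific Box parameters are applied first, and the ChannelLayout only ever sees output channels.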
|  | 
 | ||
|  |         4.5.2 Composition on all active tracks (informative)<a name="4.5.2"></a> | ||
|  |             Through the alternate_group field in the Track Header Box, all audio channels in all active tracks, | ||
|  |             whether from the non-alternate group or from mutually different alternate groups, are composited into the | ||
|  |             presentation. If an Opus sample consists of multiple Opus bitstreams, it can be split into individual Opus | ||
|  |             bitstreams and reconstructed into new Opus samples as long as every Opus bitstream has the same total | ||
|  |             duration in each Opus sample. This property can be used to encapsulate a single Opus bitstream in each | ||
|  |             track without breaking the original channel layout. | ||
|  | 
 | ||
|  |             As an example, consider the following track: | ||
|  |                 OutputChannelCount = 6; | ||
|  |                 StreamCount        = 4; | ||
|  |                 CoupledCount       = 2; | ||
|  |                 ChannelMapping     = {0, 4, 1, 2, 3, 5}; // front left, front center, front right, | ||
|  |                                                          // rear left, rear right, LFE | ||
|  |             Here, to couple the front left and front right channels into the first stream, and the rear left and rear | ||
|  |             right channels into the second stream, reordering is needed since coupled streams must precede any | ||
|  |             non-coupled stream. Suppose the four Opus bitstreams are extracted from this track, two of them are | ||
|  |             encapsulated into one track, and the other two into another track. The former track is as follows. | ||
|  |                 OutputChannelCount = 6; | ||
|  |                 StreamCount        = 2; | ||
|  |                 CoupledCount       = 2; | ||
|  |                 ChannelMapping     = {0, 255, 1, 2, 3, 255}; // front left, front center, front right, | ||
|  |                                                              // rear left, rear right, LFE | ||
|  |             And the latter track is as follows. | ||
|  |                 OutputChannelCount = 6; | ||
|  |                 StreamCount        = 2; | ||
|  |                 CoupledCount       = 0; | ||
|  |                 ChannelMapping     = {255, 0, 255, 255, 255, 1}; // front left, front center, front right, | ||
|  |                                                                  // rear left, rear right, LFE | ||
|  |             In addition, the value of the alternate_group field in both tracks is set to 0. As a result, the player | ||
|  |             may play as if channels with 255 are not present, and play the presentation constructed from both tracks | ||
|  |             in the same channel layout as that of the original track. Keep in mind that the method of composition, i.e. | ||
|  |             the mixing for playback, is not defined here, and results may differ from the original, apart from the | ||
|  |             channel layout, depending on the implementation or the definition of a derived file format. | ||
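
A quick consistency check on the split shown above is that the two ChannelMapping arrays never feed the same output channel and together cover all six. The Python sketch below performs only that check; the helper name active_outputs is hypothetical, and nothing here models the actual mixing.

```python
SILENT = 255  # output channel not fed by this track


def active_outputs(channel_mapping):
    """Return the set of output channel positions that a track actually feeds."""
    return {pos for pos, index in enumerate(channel_mapping) if index != SILENT}


former = [0, 255, 1, 2, 3, 255]        # the two coupled streams
latter = [255, 0, 255, 255, 255, 1]    # the two mono streams

former_outputs = active_outputs(former)
latter_outputs = active_outputs(latter)

# No output channel is fed by both tracks, and together they cover all six.
no_overlap = former_outputs.isdisjoint(latter_outputs)
full_cover = (former_outputs | latter_outputs) == set(range(6))
```

Here former_outputs is the set of front left, front right, rear left and rear right positions, and latter_outputs is the set of front center and LFE positions.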
|  | 
 | ||
|  |             Note that some derived file formats may specify the restriction to ignore alternate grouping. In the context | ||
|  |             of such file formats, this application is not available. This unavailability does not mean incompatibilities | ||
|  |             among file formats unless the restriction to the value of the alternate_group field is specified and brings | ||
|  |             about any conflict among their definitions. | ||
|  | <a name="4.6"></a> | ||
|  |     4.6 Basic Structure (informative) | ||
|  |         4.6.1 Initial Movie<a name="4.6.1"></a> | ||
|  |             This subclause shows a basic structure of the Movie Box as follows: | ||
|  | 
 | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |moov|    |    |    |    |    |    |    | Movie Box                    | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |    |mvhd|    |    |    |    |    |    | Movie Header Box             | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |    |trak|    |    |    |    |    |    | Track Box                    | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |    |    |tkhd|    |    |    |    |    | Track Header Box             | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |    |    |edts|    |    |    |    |    | Edit Box                     | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |    |    |    |elst|    |    |    |    | Edit List Box                | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |    |    |mdia|    |    |    |    |    | Media Box                    | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |    |    |    |mdhd|    |    |    |    | Media Header Box             | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |    |    |    |hdlr|    |    |    |    | Handler Reference Box        | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |    |    |    |minf|    |    |    |    | Media Information Box        | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |    |    |    |    |smhd|    |    |    | Sound Media Header Box       | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |    |    |    |    |dinf|    |    |    | Data Information Box         | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |    |    |    |    |    |dref|    |    | Data Reference Box           | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |    |    |    |    |    |    |url |    | DataEntryUrlBox              | | ||
|  |             +----+----+----+----+----+----+ or +----+------------------------------+ | ||
|  |             |    |    |    |    |    |    |urn |    | DataEntryUrnBox              | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |    |    |    |    |stbl|    |    |    | Sample Table Box             | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |    |    |    |    |    |stsd|    |    | Sample Description Box       | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |    |    |    |    |    |    |Opus|    | OpusSampleEntry              | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |    |    |    |    |    |    |    |dOps| Opus Specific Box            | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |    |    |    |    |    |stts|    |    | Decoding Time to Sample Box  | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |    |    |    |    |    |stsc|    |    | Sample To Chunk Box          | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |    |    |    |    |    |stsz|    |    | Sample Size Box              | | ||
|  |             +----+----+----+----+----+ or +----+----+------------------------------+ | ||
|  |             |    |    |    |    |    |stz2|    |    | Compact Sample Size Box      | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |    |    |    |    |    |stco|    |    | Chunk Offset Box             | | ||
|  |             +----+----+----+----+----+ or +----+----+------------------------------+ | ||
|  |             |    |    |    |    |    |co64|    |    | Chunk Large Offset Box       | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |    |    |    |    |    |sgpd|    |    | Sample Group Description Box | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |    |    |    |    |    |sbgp|    |    | Sample to Group Box          | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |    |mvex|*   |    |    |    |    |    | Movie Extends Box            | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |    |    |trex|*   |    |    |    |    | Track Extends Box            | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  | 
 | ||
|  |                     Figure 2 - Basic structure of Movie Box | ||
|  | 
 | ||
|  |             It is strongly recommended that the order of boxes should follow the above structure. | ||
|  |             Boxes marked with an asterisk (*) may be present. | ||
|  |             For most boxes listed above, the definition is as given in ISO/IEC 14496-12 [1]. The additional boxes, | ||
|  |             and the additional requirements, restrictions and recommendations for the other boxes, are described in | ||
|  |             this specification. | ||
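
            (informative) Since the boxes above follow the generic box layout of ISO/IEC 14496-12 [1], a reader
            can walk them with a very small loop. The following is a minimal sketch in Python, not part of this
            specification: the names iter_boxes and make_box are hypothetical, and the byte string used in the
            demonstration is synthetic (its boxes have empty payloads), not a valid file.

```python
import struct

def iter_boxes(buf, start=0, end=None):
    """Yield (type, payload_start, box_end) for each box found in buf[start:end]."""
    end = len(buf) if end is None else end
    pos = start
    while end - pos >= 8:
        # Every box starts with a 32-bit big-endian size and a four-character type.
        size, box_type = struct.unpack_from("!I4s", buf, pos)
        header = 8
        if size == 1:
            # size == 1 means a 64-bit largesize follows the type field.
            (size,) = struct.unpack_from("!Q", buf, pos + 8)
            header = 16
        elif size == 0:
            # size == 0 means the box extends to the end of the enclosing range.
            size = end - pos
        if header > size or size > end - pos:
            raise ValueError("bad box size at offset %d" % pos)
        yield box_type.decode("latin-1"), pos + header, pos + size
        pos += size

def make_box(box_type, payload=b""):
    """Build a synthetic box for the demonstration (hypothetical helper)."""
    return struct.pack("!I4s", 8 + len(payload), box_type) + payload

# Synthetic input: an empty 'ftyp' followed by a 'moov' holding two empty children.
data = make_box(b"ftyp") + make_box(b"moov", make_box(b"mvhd") + make_box(b"trak"))

top_level = [t for t, _, _ in iter_boxes(data)]
moov_children = []
for box_type, payload_start, box_end in iter_boxes(data):
    if box_type == "moov":
        # Container boxes are walked by recursing into their payload range.
        moov_children = [t for t, _, _ in iter_boxes(data, payload_start, box_end)]

print(top_level)
print(moov_children)
```

            Full boxes such as 'mvhd' carry an additional version and flags field at the start of their payload,
            which this sketch does not interpret.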
|  | 
 | ||
|  |         4.6.2 Movie Fragments<a name="4.6.2"></a> | ||
|  |             This subclause shows a basic structure of the Movie Fragment Box as follows: | ||
|  | 
 | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |moof|    |    |    |    |    |    |    | Movie Fragment Box           | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |    |mfhd|    |    |    |    |    |    | Movie Fragment Header Box    | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |    |traf|    |    |    |    |    |    | Track Fragment Box           | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |    |    |tfhd|    |    |    |    |    | Track Fragment Header Box    | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |    |    |trun|    |    |    |    |    | Track Fragment Run Box       | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |    |    |sgpd|*   |    |    |    |    | Sample Group Description Box | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  |             |    |    |sbgp|    |    |    |    |    | Sample to Group Box          | | ||
|  |             +----+----+----+----+----+----+----+----+------------------------------+ | ||
|  | 
 | ||
|  |                     Figure 3 - Basic structure of Movie Fragment Box | ||
|  | 
 | ||
|  |             It is strongly recommended that the Movie Fragment Header Box and the Track Fragment Header Box be | ||
|  |             placed first in their container. | ||
|  |             Boxes marked with an asterisk (*) may be present. | ||
|  |             For the boxes listed above, the definition is as given in ISO/IEC 14496-12 [1]. | ||
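
            (informative) The placement recommendation above can be checked mechanically once the box types of a
            movie fragment are known. The sketch below is an illustration only: check_fragment_order is a
            hypothetical name, and the input is assumed to be an already parsed list of (type, children) tuples
            in file order.

```python
def check_fragment_order(moof_children):
    """Return a list of findings for one 'moof'; an empty list means no finding.

    moof_children is a list of (type, children) tuples in file order,
    where children uses the same shape (assumed input format).
    """
    problems = []
    # The Movie Fragment Header Box is recommended to be first in 'moof'.
    if not moof_children or moof_children[0][0] != "mfhd":
        problems.append("mfhd is not the first box in moof")
    for box_type, children in moof_children:
        # The Track Fragment Header Box is recommended to be first in each 'traf'.
        if box_type == "traf" and (not children or children[0][0] != "tfhd"):
            problems.append("tfhd is not the first box in traf")
    return problems

# Layout that follows Figure 3.
recommended = [
    ("mfhd", []),
    ("traf", [("tfhd", []), ("trun", []), ("sgpd", []), ("sbgp", [])]),
]
# Layout that ignores the recommendation in both containers.
unusual = [
    ("traf", [("trun", []), ("tfhd", [])]),
    ("mfhd", []),
]

print(check_fragment_order(recommended))
print(check_fragment_order(unusual))
```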
|  | <a name="4.7"></a> | ||
|  |     4.7 Example of Encapsulation (informative) | ||
|  |         [File] | ||
|  |             size = 17757 | ||
|  |             [ftyp: File Type Box] | ||
|  |                 position = 0 | ||
|  |                 size = 24 | ||
|  |                 major_brand = Opus : Opus audio coding | ||
|  |                 minor_version = 0 | ||
|  |                 compatible_brands | ||
|  |                     brand[0] = Opus : Opus audio coding | ||
|  |                     brand[1] = iso2 : ISO Base Media file format version 2 | ||
|  |             [moov: Movie Box] | ||
|  |                 position = 24 | ||
|  |                 size = 724 | ||
|  |                 [mvhd: Movie Header Box] | ||
|  |                     position = 32 | ||
|  |                     size = 108 | ||
|  |                     version = 0 | ||
|  |                     flags = 0x000000 | ||
|  |                     creation_time = UTC 2014/12/12, 18:41:19 | ||
|  |                     modification_time = UTC 2014/12/12, 18:41:19 | ||
|  |                     timescale = 48000 | ||
|  |                     duration = 33600 (00:00:00.700) | ||
|  |                     rate = 1.000000 | ||
|  |                     volume = 1.000000 | ||
|  |                     reserved = 0x0000 | ||
|  |                     reserved = 0x00000000 | ||
|  |                     reserved = 0x00000000 | ||
|  |                     transformation matrix | ||
|  |                         | a, b, u |   | 1.000000, 0.000000, 0.000000 | | ||
|  |                         | c, d, v | = | 0.000000, 1.000000, 0.000000 | | ||
|  |                         | x, y, w |   | 0.000000, 0.000000, 1.000000 | | ||
|  |                     pre_defined = 0x00000000 | ||
|  |                     pre_defined = 0x00000000 | ||
|  |                     pre_defined = 0x00000000 | ||
|  |                     pre_defined = 0x00000000 | ||
|  |                     pre_defined = 0x00000000 | ||
|  |                     pre_defined = 0x00000000 | ||
|  |                     next_track_ID = 2 | ||
|  |                 [trak: Track Box] | ||
|  |                     position = 140 | ||
|  |                     size = 608 | ||
|  |                     [tkhd: Track Header Box] | ||
|  |                         position = 148 | ||
|  |                         size = 92 | ||
|  |                         version = 0 | ||
|  |                         flags = 0x000007 | ||
|  |                             Track enabled | ||
|  |                             Track in movie | ||
|  |                             Track in preview | ||
|  |                         creation_time = UTC 2014/12/12, 18:41:19 | ||
|  |                         modification_time = UTC 2014/12/12, 18:41:19 | ||
|  |                         track_ID = 1 | ||
|  |                         reserved = 0x00000000 | ||
|  |                         duration = 33600 (00:00:00.700) | ||
|  |                         reserved = 0x00000000 | ||
|  |                         reserved = 0x00000000 | ||
|  |                         layer = 0 | ||
|  |                         alternate_group = 0 | ||
|  |                         volume = 1.000000 | ||
|  |                         reserved = 0x0000 | ||
|  |                         transformation matrix | ||
|  |                             | a, b, u |   | 1.000000, 0.000000, 0.000000 | | ||
|  |                             | c, d, v | = | 0.000000, 1.000000, 0.000000 | | ||
|  |                             | x, y, w |   | 0.000000, 0.000000, 1.000000 | | ||
|  |                         width = 0.000000 | ||
|  |                         height = 0.000000 | ||
|  |                     [edts: Edit Box] | ||
|  |                         position = 240 | ||
|  |                         size = 36 | ||
|  |                         [elst: Edit List Box] | ||
|  |                             position = 248 | ||
|  |                             size = 28 | ||
|  |                             version = 0 | ||
|  |                             flags = 0x000000 | ||
|  |                             entry_count = 1 | ||
|  |                             entry[0] | ||
|  |                                 segment_duration = 33600 | ||
|  |                                 media_time = 312 | ||
|  |                                 media_rate = 1.000000 | ||
|  |                     [mdia: Media Box] | ||
|  |                         position = 276 | ||
|  |                         size = 472 | ||
|  |                         [mdhd: Media Header Box] | ||
|  |                             position = 284 | ||
|  |                             size = 32 | ||
|  |                             version = 0 | ||
|  |                             flags = 0x000000 | ||
|  |                             creation_time = UTC 2014/12/12, 18:41:19 | ||
|  |                             modification_time = UTC 2014/12/12, 18:41:19 | ||
|  |                             timescale = 48000 | ||
|  |                             duration = 34560 (00:00:00.720) | ||
|  |                             language = und | ||
|  |                             pre_defined = 0x0000 | ||
|  |                         [hdlr: Handler Reference Box] | ||
|  |                             position = 316 | ||
|  |                             size = 51 | ||
|  |                             version = 0 | ||
|  |                             flags = 0x000000 | ||
|  |                             pre_defined = 0x00000000 | ||
|  |                             handler_type = soun | ||
|  |                             reserved = 0x00000000 | ||
|  |                             reserved = 0x00000000 | ||
|  |                             reserved = 0x00000000 | ||
|  |                             name = Xiph Audio Handler | ||
|  |                         [minf: Media Information Box] | ||
|  |                             position = 367 | ||
|  |                             size = 381 | ||
|  |                             [smhd: Sound Media Header Box] | ||
|  |                                 position = 375 | ||
|  |                                 size = 16 | ||
|  |                                 version = 0 | ||
|  |                                 flags = 0x000000 | ||
|  |                                 balance = 0.000000 | ||
|  |                                 reserved = 0x0000 | ||
|  |                             [dinf: Data Information Box] | ||
|  |                                 position = 391 | ||
|  |                                 size = 36 | ||
|  |                                 [dref: Data Reference Box] | ||
|  |                                     position = 399 | ||
|  |                                     size = 28 | ||
|  |                                     version = 0 | ||
|  |                                     flags = 0x000000 | ||
|  |                                     entry_count = 1 | ||
|  |                                     [url : Data Entry Url Box] | ||
|  |                                         position = 415 | ||
|  |                                         size = 12 | ||
|  |                                         version = 0 | ||
|  |                                         flags = 0x000001 | ||
|  |                                         location = in the same file | ||
|  |                             [stbl: Sample Table Box] | ||
|  |                                 position = 427 | ||
|  |                                 size = 321 | ||
|  |                                 [stsd: Sample Description Box] | ||
|  |                                     position = 435 | ||
|  |                                     size = 79 | ||
|  |                                     version = 0 | ||
|  |                                     flags = 0x000000 | ||
|  |                                     entry_count = 1 | ||
|  |                                     [Opus: Audio Description] | ||
|  |                                         position = 451 | ||
|  |                                         size = 63 | ||
|  |                                         reserved = 0x000000000000 | ||
|  |                                         data_reference_index = 1 | ||
|  |                                         reserved = 0x0000 | ||
|  |                                         reserved = 0x0000 | ||
|  |                                         reserved = 0x00000000 | ||
|  |                                         channelcount = 6 | ||
|  |                                         samplesize = 16 | ||
|  |                                         pre_defined = 0 | ||
|  |                                         reserved = 0 | ||
|  |                                         samplerate = 48000.000000 | ||
|  |                                         [dOps: Opus Specific Box] | ||
|  |                                             position = 487 | ||
|  |                                             size = 27 | ||
|  |                                             Version = 0 | ||
|  |                                             OutputChannelCount = 6 | ||
|  |                                             PreSkip = 312 | ||
|  |                                             InputSampleRate = 48000 | ||
|  |                                             OutputGain = 0 | ||
|  |                                             ChannelMappingFamily = 1 | ||
|  |                                             StreamCount = 4 | ||
|  |                                             CoupledCount = 2 | ||
|  |                                             ChannelMapping | ||
|  |                                                 0 -> 0: front left | ||
|  |                                                 1 -> 4: front center | ||
|  |                                                 2 -> 1: front right | ||
|  |                                                 3 -> 2: rear left | ||
|  |                                                 4 -> 3: rear right | ||
|  |                                                 5 -> 5: LFE | ||
|  |                                 [stts: Decoding Time to Sample Box] | ||
|  |                                     position = 514 | ||
|  |                                     size = 24 | ||
|  |                                     version = 0 | ||
|  |                                     flags = 0x000000 | ||
|  |                                     entry_count = 1 | ||
|  |                                     entry[0] | ||
|  |                                         sample_count = 18 | ||
|  |                                         sample_delta = 1920 | ||
|  |                                 [stsc: Sample To Chunk Box] | ||
|  |                                     position = 538 | ||
|  |                                     size = 40 | ||
|  |                                     version = 0 | ||
|  |                                     flags = 0x000000 | ||
|  |                                     entry_count = 2 | ||
|  |                                     entry[0] | ||
|  |                                         first_chunk = 1 | ||
|  |                                         samples_per_chunk = 13 | ||
|  |                                         sample_description_index = 1 | ||
|  |                                     entry[1] | ||
|  |                                         first_chunk = 2 | ||
|  |                                         samples_per_chunk = 5 | ||
|  |                                         sample_description_index = 1 | ||
|  |                                 [stsz: Sample Size Box] | ||
|  |                                     position = 578 | ||
|  |                                     size = 92 | ||
|  |                                     version = 0 | ||
|  |                                     flags = 0x000000 | ||
|  |                                     sample_size = 0 (variable) | ||
|  |                                     sample_count = 18 | ||
|  |                                     entry_size[0] = 977 | ||
|  |                                     entry_size[1] = 938 | ||
|  |                                     entry_size[2] = 939 | ||
|  |                                     entry_size[3] = 938 | ||
|  |                                     entry_size[4] = 934 | ||
|  |                                     entry_size[5] = 945 | ||
|  |                                     entry_size[6] = 948 | ||
|  |                                     entry_size[7] = 956 | ||
|  |                                     entry_size[8] = 955 | ||
|  |                                     entry_size[9] = 930 | ||
|  |                                     entry_size[10] = 933 | ||
|  |                                     entry_size[11] = 934 | ||
|  |                                     entry_size[12] = 972 | ||
|  |                                     entry_size[13] = 977 | ||
|  |                                     entry_size[14] = 958 | ||
|  |                                     entry_size[15] = 949 | ||
|  |                                     entry_size[16] = 962 | ||
|  |                                     entry_size[17] = 848 | ||
|  |                                 [stco: Chunk Offset Box] | ||
|  |                                     position = 670 | ||
|  |                                     size = 24 | ||
|  |                                     version = 0 | ||
|  |                                     flags = 0x000000 | ||
|  |                                     entry_count = 2 | ||
|  |                                     chunk_offset[0] = 764 | ||
|  |                                     chunk_offset[1] = 13063 | ||
|  |                                 [sgpd: Sample Group Description Box] | ||
|  |                                     position = 694 | ||
|  |                                     size = 26 | ||
|  |                                     version = 1 | ||
|  |                                     flags = 0x000000 | ||
|  |                                     grouping_type = roll | ||
|  |                                     default_length = 2 (constant) | ||
|  |                                     entry_count = 1 | ||
|  |                                     roll_distance[0] = -2 | ||
|  |                                 [sbgp: Sample to Group Box] | ||
|  |                                     position = 720 | ||
|  |                                     size = 28 | ||
|  |                                     version = 0 | ||
|  |                                     flags = 0x000000 | ||
|  |                                     grouping_type = roll | ||
|  |                                     entry_count = 1 | ||
|  |                                     entry[0] | ||
|  |                                         sample_count = 18 | ||
|  |                                         group_description_index = 1 | ||
|  |             [free: Free Space Box] | ||
|  |                 position = 748 | ||
|  |                 size = 8 | ||
|  |             [mdat: Media Data Box] | ||
|  |                 position = 756 | ||
|  |                 size = 17001 | ||
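
        (informative) Several values in the dump above can be cross-checked against each other. The following
        Python sketch copies the numbers from the dump and derives the media duration, the pre-roll implied by
        the 'roll' sample group, the span left outside the edit, and the total sample payload. The variable
        names are chosen for this illustration, and the 8-byte Media Data Box header is an assumption based on
        the 32-bit box size form.

```python
# Values copied from the dump above.
TIMESCALE = 48000          # mdhd timescale (the mvhd timescale is the same here)
SAMPLE_COUNT = 18          # stts entry[0] sample_count
SAMPLE_DELTA = 1920        # stts entry[0] sample_delta
PRE_SKIP = 312             # dOps PreSkip, equal to elst media_time
SEGMENT_DURATION = 33600   # elst segment_duration
ROLL_DISTANCE = -2         # sgpd roll_distance[0]
MDAT_SIZE = 17001          # mdat box size, header included
ENTRY_SIZES = [977, 938, 939, 938, 934, 945, 948, 956, 955,
               930, 933, 934, 972, 977, 958, 949, 962, 848]

# Media duration is the sum of all sample durations.
media_duration = SAMPLE_COUNT * SAMPLE_DELTA
media_duration_ms = media_duration * 1000 // TIMESCALE

# The edit starts at PRE_SKIP and lasts SEGMENT_DURATION; what remains after
# the edit is not presented, i.e. it is trimmed from the end.
trailing_trim = media_duration - PRE_SKIP - SEGMENT_DURATION

# A roll_distance of -2 asks the decoder to start two samples early.
pre_roll = -ROLL_DISTANCE * SAMPLE_DELTA
pre_roll_ms = pre_roll * 1000 // TIMESCALE

# All sample data should fit exactly in the mdat payload (assumed 8-byte header).
payload_bytes = sum(ENTRY_SIZES)

print("media duration:", media_duration, "units,", media_duration_ms, "ms")
print("trailing trim :", trailing_trim, "units")
print("pre-roll      :", pre_roll, "units,", pre_roll_ms, "ms")
print("payload bytes :", payload_bytes, "of", MDAT_SIZE - 8)
```

        The derived media duration matches the duration field of the Media Header Box (34560), and the sample
        payload matches the Media Data Box size minus its header.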
|  | <a name="5"></a> | ||
|  | 5 Authors' Address | ||
|  |     Yusuke Nakamura | ||
|  |         Email: muken.the.vfrmaniac |at| gmail.com | ||
|  |         </div> | ||
|  |     </body> | ||
|  | </html> |