BLOG: How to work with transport stream and HEVC as mezzanine file format?

15 June 2021



Defined in the 1990s with the emergence of digital video broadcasting, the transport stream format has proven itself and is still widely used, both in contribution and in broadcasting. Its specifications have evolved over time, and the format can now carry a large number of video and audio codecs, as well as a wide range of metadata. In this article, we look at how this long-established format can be used as a mezzanine (or pivot) format in a broadcast processing chain, and how to take advantage of the spread of more efficient codecs such as HEVC, AV1 or VVC.


What is transport stream?

Let’s start by recalling the specifics of the transport stream (TS), which, as its name suggests, is not a file format but a stream format. A TS is a synchronous stream of 188-byte packets (or 204-byte packets when error-correction bytes are appended), each containing a 4-byte header, an optional adaptation field, and a payload. A TS is a multiplex of multiple elementary streams, each carried in the sequence of TS packets that share the same PID (Packet Identifier) value in their header.
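
As an illustration of the packet layout described above, here is a minimal Python sketch that decodes the 4-byte header of a 188-byte packet. The names `TsHeader` and `parse_ts_header` are hypothetical, chosen for this example; the field layout and the 0x47 sync byte follow ISO/IEC 13818-1.

```python
from dataclasses import dataclass

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47  # every TS packet starts with this value


@dataclass
class TsHeader:
    transport_error: bool
    payload_unit_start: bool
    pid: int
    scrambling_control: int
    has_adaptation_field: bool
    has_payload: bool
    continuity_counter: int


def parse_ts_header(packet: bytes) -> TsHeader:
    """Decode the 4-byte header of a single 188-byte TS packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("expected a 188-byte packet")
    if packet[0] != SYNC_BYTE:
        raise ValueError("sync byte 0x47 not found")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return TsHeader(
        transport_error=bool(b1 & 0x80),
        payload_unit_start=bool(b1 & 0x40),
        pid=((b1 & 0x1F) << 8) | b2,          # 13-bit Packet Identifier
        scrambling_control=(b3 >> 6) & 0x03,
        has_adaptation_field=bool(b3 & 0x20),
        has_payload=bool(b3 & 0x10),
        continuity_counter=b3 & 0x0F,          # 4-bit counter per PID
    )


# Example: a packet on PID 0x100 that starts a new payload unit.
example = bytes([0x47, 0x41, 0x00, 0x1A]) + bytes(184)
header = parse_ts_header(example)
```

Demultiplexing then amounts to grouping packets by `pid` and checking that `continuity_counter` increments without gaps on each PID.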

Each elementary data stream (video, audio, metadata, etc.) is an ES (Elementary Stream). These elementary streams are then packetized to form the PES (Packetized Elementary Stream) layer, and each PES is assigned an identifier called its PID.

To demultiplex the different data correctly and select the desired services, Program-Specific Information (PSI) metadata is generated for each channel.

The PSI data as defined by ISO/IEC 13818-1 (MPEG-2 Part 1: Systems) includes four tables:

  • PAT (Program Association Table)
  • PMT (Program Map Table)
  • CAT (Conditional Access Table – Optional)
  • NIT (Network Information Table – Optional)
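
To show how these tables are used in practice, the sketch below reads one PAT section and returns the PID on which each program's PMT is carried. It is a simplified example: the function name `parse_pat_section` is hypothetical, the section is assumed to be already reassembled from TS packets on PID 0, and the CRC_32 is not verified.

```python
def parse_pat_section(section: bytes) -> dict[int, int]:
    """Map program_number -> PID from one PAT section (CRC not verified)."""
    if section[0] != 0x00:
        raise ValueError("not a PAT section (table_id must be 0x00)")
    # section_length counts the bytes that follow the length field itself.
    section_length = ((section[1] & 0x0F) << 8) | section[2]
    # The program loop starts after the 8-byte header and ends before the 4-byte CRC_32.
    end = 3 + section_length - 4
    programs = {}
    for i in range(8, end, 4):
        program_number = (section[i] << 8) | section[i + 1]
        pid = ((section[i + 2] & 0x1F) << 8) | section[i + 3]
        # program_number 0 designates the network PID (NIT), not a PMT.
        programs[program_number] = pid
    return programs


# Example section: program 0 -> PID 0x0010, program 1 -> PMT on PID 0x0100.
example_pat = bytes([
    0x00, 0xB0, 0x11,        # table_id, flags + section_length (17)
    0x00, 0x01,              # transport_stream_id
    0xC1, 0x00, 0x00,        # version/current_next, section_number, last_section_number
    0x00, 0x00, 0xE0, 0x10,  # program 0 -> network PID
    0x00, 0x01, 0xE1, 0x00,  # program 1 -> PMT PID
    0x00, 0x00, 0x00, 0x00,  # CRC_32 placeholder (ignored by this sketch)
])
pat = parse_pat_section(example_pat)
```

A receiver would then read the PMT found on each returned PID to discover the video, audio and data PIDs of that program.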

Depending on the region and application, extensions of the TS specifications have been defined: DVB in Europe, ATSC in the USA, and ISDB-T in Japan and LATAM, for example.

Depending on the broadcast medium, other types of tables may be present in the streams, such as the PSIP (Program and System Information Protocol) tables defined by ATSC, which carry additional information about programs (System Time Table (STT), Event Information Table (EIT)…). Some of these tables are also used and defined in other standards such as DVB or ISDB-T. SDT (Service Description Tables) are also widely used in DVB, ATSC and ISDB-T distributions.

Extra information, such as the SCTE-35 markers used to signal ad insertion and splicing points, can also be transmitted over transport streams on specific PIDs.

The transport stream also contains synchronization information such as the PCR (Program Clock Reference, used to regenerate the proper clock on the receiver side) and PTS/DTS (Presentation and Decoding Time Stamps, used for audio and video decoding and buffer management).
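
For readers who want to see what these timing fields look like on the wire, here is a short sketch that decodes a PCR and a PTS. The function names are hypothetical; the bit layouts follow ISO/IEC 13818-1, where the PCR counts a 27 MHz clock (a 33-bit base at 90 kHz plus a 9-bit extension) and PTS/DTS count a 90 kHz clock. Marker bits are not checked here.

```python
def decode_pcr(field: bytes) -> int:
    """Return the PCR in 27 MHz ticks from the 6-byte PCR field of an adaptation field."""
    base = (
        (field[0] << 25)
        | (field[1] << 17)
        | (field[2] << 9)
        | (field[3] << 1)
        | (field[4] >> 7)
    )
    extension = ((field[4] & 0x01) << 8) | field[5]
    return base * 300 + extension


def decode_pts(field: bytes) -> int:
    """Return a PTS (or DTS) in 90 kHz ticks from its 5-byte field in the PES header."""
    return (
        (((field[0] >> 1) & 0x07) << 30)   # bits 32..30
        | (field[1] << 22)                 # bits 29..22
        | ((field[2] >> 1) << 15)          # bits 21..15
        | (field[3] << 7)                  # bits 14..7
        | (field[4] >> 1)                  # bits 6..0
    )


# Both examples encode exactly one second.
pcr_ticks = decode_pcr(bytes([0x00, 0x00, 0xAF, 0xC8, 0x7E, 0x00]))
pts_ticks = decode_pts(bytes([0x21, 0x00, 0x05, 0xBF, 0x21]))
pcr_seconds = pcr_ticks / 27_000_000
pts_seconds = pts_ticks / 90_000
```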

So if we compare the transport stream to a classic media file (MXF or equivalent), it actually contains more information: PSI can be seen as the equivalent of the CPL (Composition Playlist) available in some advanced media file formats, such as IMF.
On the other hand, PES and TS packetization add a small bandwidth overhead, but it remains negligible, especially at high bit rates.
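
A rough estimate of that overhead can be sketched as follows. The model is deliberately simplified and rests on assumptions that are not from the article: one PES packet per video frame, a 19-byte PES header carrying both PTS and DTS, 184 payload bytes per TS packet, and no accounting for PSI tables, PCR fields or null packets.

```python
TS_PACKET_SIZE = 188
TS_PAYLOAD_SIZE = 184   # 188 bytes minus the 4-byte header
PES_HEADER_SIZE = 19    # assumed: 9 fixed bytes + 5-byte PTS + 5-byte DTS


def packetization_overhead(frame_bytes: int) -> float:
    """Fraction of extra bytes added when one frame is wrapped in PES and TS packets."""
    pes_bytes = frame_bytes + PES_HEADER_SIZE
    # The last packet is padded with stuffing, so round up to whole packets.
    packets = -(-pes_bytes // TS_PAYLOAD_SIZE)
    total_bytes = packets * TS_PACKET_SIZE
    return (total_bytes - frame_bytes) / frame_bytes


small_frame = packetization_overhead(5_000)     # low bit rate, small frames
large_frame = packetization_overhead(200_000)   # high bit rate, large frames
```

Under these assumptions the overhead is about 5.3 % for a 5,000-byte frame and about 2.3 % for a 200,000-byte frame, approaching the 4/188 floor set by the TS header alone.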
Because the transport stream is by definition a transport format, it can carry different codecs and is by nature future-proof. Initially defined for MPEG-2, it later made it possible to transport H.264 and then HEVC video, and now even JPEG 2000 (VSF TR-01:2018). HEVC is now widely adopted in the contribution field, whether in Ultra HD or HD:

– 10-40 Mbit/s, HEVC Main 10 4:2:0/4:2:2 for HD contribution (up to 1080p).

– 50-100 Mbit/s, HEVC Main 10 4:2:0/4:2:2 for Ultra HD contribution.


Standard workflow for live, ingest and playout

Before looking at the use of TS as a mezzanine format, let’s quickly recall what a mezzanine format is and where it is used in the creation of a TV channel. The content can come from different sources: live (sports, events…), replay of recorded live events, content from post-production (movies, advertisements…) or other content owners. To guarantee the quality and conformity of the content, TV channels generally define a mezzanine format that delivered media files must respect. Each service provider or TV channel has its own mezzanine format specifications according to its needs in terms of video quality, audio formats, subtitles and other metadata required by media asset management systems. Live streams are also recorded in the appropriate mezzanine format (or recorded and then transcoded) so that they can be replayed by the playout systems. It is commonly considered that the video stream should be encoded at 5 to 10 times the original bitrate to avoid quality loss, which has a definite impact on storage requirements.


Schematic diagram of an ingest / live / playout solution with mezzanine format

Switching to a mezzanine format has the advantage of standardizing the workflow and simplifying the management of the playout, which then needs to support only one file format. However, this solution requires an otherwise unnecessary decoding and re-encoding step for live recordings, and drastically increases storage requirements.

Historically, mezzanine broadcast formats have been confined to simple codecs such as MPEG-2 or H.264, in I-Only mode. This was justified by the simplicity of decoding (in particular for editing and post-production of this content) and by the ability to access any image of the stream instantly (I-Only). Nevertheless, the constant improvements in CPU power, as well as integrated GPU acceleration, now allow the use of more efficient codecs (mainly HEVC), whether in I-Only or Long GOP. For example, new generations of Intel consumer processors, such as Ice Lake, now support GPU hardware acceleration for UHD HEVC encoding and decoding, up to 4:2:2 10-bit.


Let’s do it with Transport Stream and HEVC?

Keeping the transport stream format as a mezzanine format makes it possible to avoid a decoding + encoding step, which is optimal for preserving audio and video quality. 

This method also makes it possible to reduce storage requirements, since the contribution streams no longer need to be re-encoded at a higher bitrate. Less processing, less processing time, and less storage.


Schematic diagram of a Transport Stream based ingest / live / playout solution


If we look at the case of a UHD chain in which the XAVC Class 300 mezzanine format (600 Mbit/s at 59.94 fps) is replaced by a 100 Mbit/s HEVC transport stream (a high-quality UHD contribution feed), the storage and network bandwidth requirements are divided by 6, while the video and audio quality of the received feed is fully preserved.
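
The arithmetic behind this factor of 6 can be checked with a few lines of Python. This is a back-of-the-envelope sketch that assumes constant bitrates and decimal gigabytes, and ignores audio, metadata and container overhead; the function name is hypothetical.

```python
def storage_gb_per_hour(bitrate_mbps: float) -> float:
    """Storage needed for one hour of content at a constant bitrate, in decimal GB."""
    bits = bitrate_mbps * 1_000_000 * 3600
    return bits / 8 / 1_000_000_000


xavc_gb = storage_gb_per_hour(600)   # XAVC Class 300 mezzanine
hevc_gb = storage_gb_per_hour(100)   # HEVC contribution transport stream
ratio = xavc_gb / hevc_gb
```

With these figures, one hour of content takes 270 GB in the XAVC mezzanine format against 45 GB when the HEVC contribution stream is kept as is.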

Another significant benefit, mainly in UHD, is the preservation of HDR metadata (static or dynamic), which is usually stored within the HEVC stream (in SEI messages). Most critical of all, with the emergence of object-based audio, the traditional decode/re-encode approach is becoming more complex. Since decoding is tied to a specific rendering mode, it is not always feasible to decode object-based audio and then re-encode it for distribution. Keeping the native compressed format is a good solution for such object-based audio use cases.

Admittedly, using TS in post-production can seem unnatural. An intermediate solution is nevertheless possible, for example with an MXF or MP4 wrapper instead of the transport stream; the essential point is to use a codec like HEVC to maximize quality and minimize storage and bandwidth requirements. Modern playout solutions, such as BBright’s UHD-Channel, can handle heterogeneous codecs or wrappers in their playlists. It is therefore entirely conceivable to use different formats for linear content coming from post-production and for content received via contribution and recorded natively in transport stream format.

Since the BBright UHD-Channel playout also integrates the ability to directly receive contribution streams (UHD HEVC up to 120 Mbit/s), the concept can be further optimized by being coupled with the BBright UHD-TSI ingest solution.


Schematic diagram of a BBright optimised Transport Stream based ingest / live / playout solution

Another use case: let’s get rid of live encoders

Let’s push this reasoning about the transport stream to its logical conclusion: why use a real-time encoder to compress the output of the playout into the distribution format if all the content is already available in transport stream format?

Keeping the video workflow simple is usually the best way to save time and money and to lower the risk of failures. Pre-encoding any video offline into an HEVC or AVC transport stream is nowadays a straightforward task, which can run on dedicated appliances, on multiple private virtual machine instances, or even on encoders available in the public cloud.

Offline encoders also offer the possibility to burn in channel logos and graphics and to add the proper HDR signaling, multiple closed captions and DVB subtitles, and different audio tracks/languages. As a consequence, the same TS-processed playout pivot file can be used for distribution to different countries, with individual, dedicated PIDs for each subtitle stream and language.

Preparing in advance the channel branding and applying it during the offline encoding stage is the only operational constraint related to TS-based playout.


Schematic diagram of a BBright optimized native Transport Stream playout solution

This solution is particularly relevant for the deployment of UHD or even 8K thematic linear channels. There is no need to deploy a live encoder: each piece of content is encoded only once, offline, which also allows lower distribution bitrates to be achieved through non-real-time two-pass or three-pass encoding.


Schematic diagram of a BBright optimized native Transport Stream for UHD linear channel

In order to be compatible with the transmitters at the head of the network, most of the tables (PAT, PMT, SDT, …) have to be dynamically regenerated according to the content / TS files.
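
To give an idea of what regenerating a table involves, the sketch below rebuilds a PAT section and appends its CRC_32. It is an illustrative example only and does not describe BBright’s implementation: the function names are hypothetical, the PAT is assumed to fit in a single section, and wrapping the section into TS packets on PID 0 is left out.

```python
def crc32_mpeg2(data: bytes) -> int:
    """CRC-32/MPEG-2: polynomial 0x04C11DB7, initial value 0xFFFFFFFF, no reflection, no final XOR."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def build_pat_section(transport_stream_id: int, programs: dict[int, int], version: int = 0) -> bytes:
    """Build one PAT section mapping program_number -> PMT PID, CRC_32 included."""
    body = bytearray()
    body += transport_stream_id.to_bytes(2, "big")
    # Two reserved bits, 5-bit version_number, current_next_indicator set to 1.
    body.append(0xC0 | ((version & 0x1F) << 1) | 0x01)
    body += bytes([0x00, 0x00])  # section_number, last_section_number
    for program_number, pid in sorted(programs.items()):
        body += program_number.to_bytes(2, "big")
        body += (0xE000 | (pid & 0x1FFF)).to_bytes(2, "big")  # 3 reserved bits + 13-bit PID
    section_length = len(body) + 4  # the length field also covers the CRC_32
    header = bytes([0x00, 0xB0 | (section_length >> 8), section_length & 0xFF])
    section = header + bytes(body)
    return section + crc32_mpeg2(section).to_bytes(4, "big")


# Regenerate a PAT announcing a single program whose PMT is on PID 0x100.
new_pat = build_pat_section(transport_stream_id=1, programs={1: 0x100})
```

A handy property of this CRC is that running it over the complete section, CRC_32 included, yields zero, which is how receivers validate a table.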


Schematic diagram of a BBright optimized native Transport Stream with dynamic table regeneration

This solution also makes it possible to reach unmatched densities, with up to 50 playout channels on a single server. Channel synchronization, a new feature of the BBright Multi-TS solution, now makes it possible to feed OTT packagers and thus dispense with an OTT live encoder for demo or thematic channels.


Schematic diagram of a BBright native Transport Stream for high density and OTT applications



We have seen that the good old transport stream format still has undeniable advantages. With the emergence of a new generation of codecs (HEVC, AV1, VVC…), this format makes it possible to avoid unnecessary transcoding. It preserves metadata in its native formats, and minimizes network bandwidth and storage requirements.

Hopefully, the major post-production and NLE software vendors will quickly adopt these new codecs, as recent developments in CPU and GPU technologies make their use much easier, at least for HEVC. Ultimately, preparing content directly in its distribution format will make more and more sense. Maintaining the quality and simplicity of the workflow is essential.