Page 11 - EngineerIT October 2022
ICT VIDEO STREAMING
Telecommunications Standardization Sector
for Telecommunications and Information
Communication Technology) body defines
these media encoding standards, or
strictly speaking, decoding standards as
recommendations. We only refer to the
parts that are useful for our understanding
here. For reference, we will use the ITU-T
H.264 Recommendation, Advanced Video
Coding for Generic Audiovisual Services,
in which each HTTP video file segment is
encoded as a group of pictures (GOP).
The first picture/frame of the GOP is
always an IDR-frame (Instantaneous
Decoding Refresh frame), which serves as
the primary reference of the video segment.
The remaining frames in the video are
encoded as predictions from other
frames, starting from the IDR-frame.
P-frames are past frame predictions and
B-frames are bi-directional predictions from
past and future frames within the GOP.
The key difference is that the IDR-frames
require significantly more data than the
P-frames and B-frames, typically between
5 and 10 times more.
A high-level representation of the HLS/
DASH encoded structure using H.264 GOP
terminology with 8 second segments is
shown in Figure A.
Figure A: Segment Structure of HLS and DASH
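As a rough illustration of the segment structure described above, the following minimal sketch models one segment as a single IDR-frame followed by predicted (P/B) frames and estimates how much of the segment's data the IDR-frame accounts for. Only the 8 second segment length and the 5 to 10 times range come from the text; the frame rate (25 fps), the P/B-frame size (10 kB) and the chosen ratio of 8 are assumptions made purely for illustration, and GopModel is a hypothetical name, not part of any standard or library.

```python
from dataclasses import dataclass


@dataclass
class GopModel:
    """Simplified segment: one IDR-frame followed by predicted (P/B) frames."""
    segment_seconds: float      # segment length, e.g. 8 s as in Figure A
    frames_per_second: int      # assumed frame rate (hypothetical value)
    predicted_frame_bytes: int  # assumed average P/B-frame size (hypothetical)
    idr_ratio: float            # IDR size relative to a P/B-frame (text: 5-10x)

    @property
    def frame_count(self) -> int:
        return round(self.segment_seconds * self.frames_per_second)

    @property
    def idr_bytes(self) -> float:
        return self.idr_ratio * self.predicted_frame_bytes

    @property
    def segment_bytes(self) -> float:
        # One IDR-frame plus (frame_count - 1) predicted frames.
        return self.idr_bytes + (self.frame_count - 1) * self.predicted_frame_bytes

    @property
    def idr_share(self) -> float:
        # Fraction of the segment's data carried by the IDR-frame alone.
        return self.idr_bytes / self.segment_bytes


gop = GopModel(segment_seconds=8, frames_per_second=25,
               predicted_frame_bytes=10_000, idr_ratio=8)
print(f"frames per segment: {gop.frame_count}")
print(f"segment size: {gop.segment_bytes:,.0f} bytes")
print(f"IDR share of segment data: {gop.idr_share:.1%}")
```

Under these assumed numbers the single IDR-frame is a small share of a long segment; the share grows as the segment is shortened, because every segment still needs its own IDR-frame.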
Within each segment, the GOP is shown with the high data bits for the IDR-frame and the lower data bits for the P/B-frames. The example further shows how multiple sequences with differing average data rates would be generated for the data rate adaptation process.
Apple Inc. recommends 10 second segment lengths, but lengths between 2 and 12 seconds are typical for streaming applications. The starting frame must be an IDR-frame because, when the video player/decoder requests a segment at a different data rate, it must reset its prediction to the new sequence of frames for correct decoding.
The length of the segment determines how fast a decoder can adapt to changing network conditions. Once an 8 second segment has been requested and is in transit, the decoder must wait until it is fully received before it can start receiving a lower/higher data rate segment. Therefore, the worst-case switching time is 8 seconds in this example. For mobile networks, an adaptation transition must typically take between 1 and 2 seconds to prevent stalling or buffering the stream. Slow adaptation of this kind is the primary cause of buffering in streaming applications.
The switching time may be improved by generating shorter segments, but this comes at the cost of an increasing average bitrate. The cause of this is illustrated in Figure B.
Figure B: Increasing Data Rate with Decreasing Segment Length
As the segment length is decreased, the higher data rate IDR-frames of consecutive segments begin to dominate the average bitrate of the stream. The impact is an increasing data requirement that results in the decoder having to switch to lower quality segments than would be the case for longer segments.
This adaptation time vs data rate trade-off is the core of the challenge for network conditions where data rate is constrained and/or where congestion occurs. Shorter segments are required for a smooth viewer experience, but they come at a quality and data rate cost.

Final Remarks
There is no doubt that innovations in the related international standards and implementation techniques will continue to improve the quality of service for streaming applications into the long-term future. But it is the context in which they are used that determines their effectiveness. In stressed conditions, as found in emerging economies, the limitations of the design are exposed. In economies with high (mobile) data costs, limited Internet infrastructure or access (typical of rural regions) and network congestion, there are further challenging research questions to be explored. Making streaming services more inclusive of wider populations and communities requires new solutions that can overcome some of the limitations discussed in this article.
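
The adaptation time vs data rate trade-off discussed in this article can be made concrete with a small calculation. The sketch below is a simplified model under stated assumptions, not a description of any real player: it assumes a 25 fps stream, one IDR-frame per segment that is 8 times the size of a P/B-frame (within the 5 to 10 times range quoted earlier), a hypothetical ladder of renditions whose nominal rates exclude the IDR overhead, and a hypothetical 2100 kbit/s link. The worst-case switching time is taken to be one segment length, as in the 8 second example. The function names are illustrative only.

```python
FRAMES_PER_SECOND = 25   # assumed frame rate (hypothetical)
IDR_RATIO = 8.0          # IDR-frame size relative to a P/B-frame (text: 5-10x)


def idr_overhead_factor(segment_seconds: float) -> float:
    """Average bitrate relative to a stream made only of P/B-frames.

    Each segment holds one IDR-frame plus (frames - 1) predicted frames.
    """
    frames = round(segment_seconds * FRAMES_PER_SECOND)
    return (IDR_RATIO + frames - 1) / frames


def pick_rendition(ladder_kbps, throughput_kbps, segment_seconds):
    """Highest rendition whose bitrate, IDR overhead included, fits the link.

    Returns None when no rendition fits.
    """
    factor = idr_overhead_factor(segment_seconds)
    fitting = [rate for rate in ladder_kbps if rate * factor <= throughput_kbps]
    return max(fitting, default=None)


LADDER_KBPS = [500, 1000, 2000, 4000]  # hypothetical nominal renditions
THROUGHPUT_KBPS = 2100                 # hypothetical constrained link

for seconds in (12, 8, 4, 2, 1):
    factor = idr_overhead_factor(seconds)
    choice = pick_rendition(LADDER_KBPS, THROUGHPUT_KBPS, seconds)
    # Worst-case switching time equals the segment length, because a
    # segment in transit must be fully received before switching.
    print(f"{seconds:>2} s segments: worst-case switch {seconds} s, "
          f"bitrate x{factor:.3f}, rendition {choice} kbit/s")
```

With these assumed values, the longer segments still fit the 2000 kbit/s rendition on the link, while the shorter segments push the player down to the 1000 kbit/s rendition: faster switching is bought with lower quality, which is the trade-off the article describes.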