
How HLS Works

Over the past few weeks, I’ve been building out server-side short video support for Bluesky.

The major aim of this feature is to support short (90 second max) video streaming at a quality that doesn’t cost an arm and a leg for us to provide for free.

In order to stay within these constraints, we’re considering making use of a video CDN that can bear the brunt of the bandwidth required to support Video-on-Demand streaming.

While the CDN is a pretty fully-featured product, we want to avoid too much vendor lock-in and provide some enhancements to our streaming platform, which requires extending their offering and getting creative with video streaming protocols.

Some of the things we’d like to be able to do that don’t work out-of-the-box are:

  • Track view counts, viewer sessions, and duration viewed to provide better feedback for video performance.
  • Provide dynamic closed-caption support with the flexibility to automate them in the future.
  • Store a transcoded version of source files somewhere durable to provide a “source of truth” for videos when needed.
  • Append a “trailer” to the end of video streams for some branding in a TikTok-esque 3-second snippet.

In this post I’ll be focusing on the HLS-related features above, namely view/duration accounting, closed captions, and trailers.

HLS is Just a Bunch of Text files

HTTP Live Streaming (HLS) is a standard established by Apple in 2009 that allows for adaptive-bitrate live and Video-on-Demand (VOD) streaming.

For the purposes of this blog post, I’ll restrict my explanations to how HLS VOD streaming works.

A player that implements the HLS protocol is capable of dynamically adjusting the quality of a streamed video based on network conditions. Additionally, a server that implements the HLS protocol should provide one or more variants of a media stream which accommodate varying network qualities to allow for graceful degradation of stream quality without stopping playback.

A diagram showing HLS Master and Media playlists

HLS implements this by producing a series of plaintext (.m3u8) “playlist” files that tell the player what bitrates and resolutions the server provides so that the player can decide which variant it should stream.

HLS differentiates between two kinds of “playlist” files: Master Playlists, and Media Playlists.

Master Playlists

A Master Playlist is the first file fetched by your video player. It contains a series of variants which point to child Media Playlists. It also describes the approximate bitrate of the variant sources and the codecs and resolutions used by those sources.

$ curl https://my.video.host.com/video_15/playlist.m3u8

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-STREAM-INF:PROGRAM-ID=0,BANDWIDTH=688540,CODECS="avc1.64001e,mp4a.40.2",RESOLUTION=640x360
360p/video.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=0,BANDWIDTH=1921217,CODECS="avc1.64001f,mp4a.40.2",RESOLUTION=1280x720
720p/video.m3u8

In the above file, the key things to notice are the RESOLUTION parameters and the {res}/video.m3u8 links.

Your media player will generally start with the lowest resolution version before jumping up to higher resolutions once the network speed between you and the server is dialed in.

The links in this file are pointers to Media Playlists, generally as relative paths from the Master Playlist such that, if we wanted to grab the 720p Media Playlist, we’d navigate to: https://my.video.host.com/video_15/720p/video.m3u8.
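If you're building tooling that walks these playlists yourself, Go's standard net/url package handles that relative resolution for you; a minimal sketch:

package main

import (
	"fmt"
	"net/url"
)

func main() {
	// The URL the Master Playlist was fetched from.
	master, _ := url.Parse("https://my.video.host.com/video_15/playlist.m3u8")

	// A relative variant path pulled from the Master Playlist.
	variant, _ := url.Parse("720p/video.m3u8")

	// Resolve the variant path against the Master Playlist URL.
	fmt.Println(master.ResolveReference(variant))
	// https://my.video.host.com/video_15/720p/video.m3u8
}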

A Master Playlist can also contain multi-track audio directives and directives for closed captions, but for now let's move on to the Media Playlist.

Media Playlists

A Media Playlist is yet another plaintext file that provides your video player with two key bits of data: a list of media Segments (encoded as .ts video files) and headers for each Segment that tell the player the runtime of the media.

$ curl https://my.video.host.com/video_15/720p/video.m3u8

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-TARGETDURATION:4
#EXTINF:4.000,
video0.ts
#EXTINF:4.000,
video1.ts
#EXTINF:4.000,
video2.ts
#EXTINF:4.000,
video3.ts
#EXTINF:4.000,
video4.ts
#EXTINF:2.800,
video5.ts

This Media Playlist describes a video that’s 22.8 seconds long (5 x 4-second Segments + 1 x 2.8-second Segment).

The EXT-X-PLAYLIST-TYPE:VOD tag tells the player this is Video-on-Demand media, meaning we know this playlist contains the entirety of the media the player needs.

The TARGETDURATION tells us the maximum length of each Segment so the player knows how many Segments to buffer ahead of time. During live streaming, that also lets the player know how frequently to refresh the playlist file to discover new Segments.

Finally, the EXTINF header before each Segment indicates the duration of the .ts Segment file that follows, and the relative video#.ts paths tell the player where to load the actual media files from.
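Because these playlists are plaintext, computing a video's runtime is just a matter of summing the EXTINF values. A rough Go sketch of that idea (totalDuration is a made-up helper, not part of any HLS library):

package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
	"time"
)

// totalDuration sums the EXTINF durations in a Media Playlist to get
// the total runtime of the media it describes.
func totalDuration(playlist string) (time.Duration, error) {
	var total float64
	scanner := bufio.NewScanner(strings.NewReader(playlist))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if !strings.HasPrefix(line, "#EXTINF:") {
			continue
		}
		// An EXTINF line looks like "#EXTINF:4.000," - the duration sits
		// between the colon and the comma.
		val, _, _ := strings.Cut(strings.TrimPrefix(line, "#EXTINF:"), ",")
		secs, err := strconv.ParseFloat(val, 64)
		if err != nil {
			return 0, err
		}
		total += secs
	}
	return time.Duration(total * float64(time.Second)), scanner.Err()
}

func main() {
	playlist := "#EXTM3U\n#EXTINF:4.000,\nvideo0.ts\n#EXTINF:2.800,\nvideo1.ts\n"
	d, _ := totalDuration(playlist)
	fmt.Println(d) // 6.8s
}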

Where’s the Actual Media?

At this point, the video player has loaded two .m3u8 playlist files and gathered plenty of metadata about how to play the video, but it hasn't actually loaded any media files.

The .ts files referenced in the Media Playlist are where the real media is, so if we wanted to control the playlists but let the CDN handle serving actual media, we can just redirect those video#.ts requests to our CDN.

.ts files are short MPEG-2 Transport Stream media files that can contain video, audio, or both.
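As a sketch of that redirect idea, assuming a hypothetical cdn.example.com origin that mirrors our paths, the handler could be as small as:

package main

import (
	"log"
	"net/http"
	"strings"
)

// Hypothetical CDN origin that mirrors our /video_{id}/... paths.
const cdnBase = "https://cdn.example.com"

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Serve playlists ourselves, but bounce Segment requests to the CDN.
		if strings.HasSuffix(r.URL.Path, ".ts") {
			http.Redirect(w, r, cdnBase+r.URL.Path, http.StatusFound) // 302
			return
		}
		// ... serve the .m3u8 playlists from our own storage here ...
		http.NotFound(w, r)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}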

Tracking Views

To track views of our HLS streams, we can leverage the fact that every video player must first load the Master Playlist.

When a user requests the Master Playlist, we can modify the response dynamically to include a SessionID, letting us track the viewing session without cookies or headers:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-STREAM-INF:PROGRAM-ID=0,BANDWIDTH=688540,CODECS="avc1.64001e,mp4a.40.2",RESOLUTION=640x360
360p/video.m3u8?session_id=12345
#EXT-X-STREAM-INF:PROGRAM-ID=0,BANDWIDTH=1921217,CODECS="avc1.64001f,mp4a.40.2",RESOLUTION=1280x720
720p/video.m3u8?session_id=12345
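A rough sketch of that rewrite in Go (addSessionID is our own hypothetical helper; this ignores the subtitle URIs we'll add later, which would need the same treatment):

package main

import (
	"fmt"
	"net/url"
	"strings"
)

// addSessionID tags every variant URI in a Master Playlist with a
// session_id query parameter so follow-up requests can be correlated.
func addSessionID(playlist, sessionID string) string {
	lines := strings.Split(playlist, "\n")
	for i, line := range lines {
		// Tag lines (starting with '#') and blank lines pass through
		// untouched; anything else is a variant URI we can tag.
		if line != "" && !strings.HasPrefix(line, "#") {
			lines[i] = line + "?session_id=" + url.QueryEscape(sessionID)
		}
	}
	return strings.Join(lines, "\n")
}

func main() {
	master := "#EXTM3U\n#EXT-X-VERSION:3\n360p/video.m3u8\n720p/video.m3u8"
	fmt.Println(addSessionID(master, "12345"))
}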

Now when their video player fetches the Media Playlists, it’ll include a query-string that we can use to identify the streaming session, ensuring we don’t double-count views on the video and can track which Segments of video were loaded in the session.

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-TARGETDURATION:4
#EXTINF:4.000,
video0.ts?session_id=12345&duration=4
#EXTINF:4.000,
video1.ts?session_id=12345&duration=4
#EXTINF:4.000,
video2.ts?session_id=12345&duration=4
#EXTINF:4.000,
video3.ts?session_id=12345&duration=4
#EXTINF:4.000,
video4.ts?session_id=12345&duration=4
#EXTINF:2.800,
video5.ts?session_id=12345&duration=2.8

Finally, when the video player fetches the media Segment files, we can record the Segment view before redirecting to our CDN with a 302, telling us how many video-seconds were loaded in the session and which Segments were loaded.

This method has limitations, namely that a media player loading a segment doesn’t necessarily mean it showed that segment to the viewer, but it’s the best we can do without an instrumented media player.
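The accounting behind that can be as simple as keeping a per-session set of Segments seen, so a player re-fetching a Segment after a seek doesn't inflate the numbers. A rough in-memory sketch of the idea (SessionTracker is a made-up type; a real implementation would persist this somewhere durable):

package main

import (
	"fmt"
	"sync"
)

// SessionTracker accumulates which Segments each session has loaded and
// how many video-seconds that adds up to.
type SessionTracker struct {
	mu       sync.Mutex
	sessions map[string]map[string]float64 // session_id -> segment -> seconds
}

func NewSessionTracker() *SessionTracker {
	return &SessionTracker{sessions: make(map[string]map[string]float64)}
}

// RecordSegment notes that a session loaded a Segment of the given
// duration; repeat loads of the same Segment are only counted once.
func (t *SessionTracker) RecordSegment(sessionID, segment string, seconds float64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.sessions[sessionID] == nil {
		t.sessions[sessionID] = make(map[string]float64)
	}
	t.sessions[sessionID][segment] = seconds
}

// SecondsWatched sums the durations of the unique Segments a session
// loaded - our best proxy for watch time.
func (t *SessionTracker) SecondsWatched(sessionID string) float64 {
	t.mu.Lock()
	defer t.mu.Unlock()
	var total float64
	for _, secs := range t.sessions[sessionID] {
		total += secs
	}
	return total
}

func main() {
	t := NewSessionTracker()
	t.RecordSegment("12345", "video0.ts", 4)
	t.RecordSegment("12345", "video0.ts", 4) // a seek re-fetches the Segment
	t.RecordSegment("12345", "video5.ts", 2.8)
	fmt.Println(t.SecondsWatched("12345")) // 6.8
}

The .ts handler described above would call something like RecordSegment with the session_id and duration query parameters before issuing its 302.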

Adding Subtitles

Subtitles are included in the Master Playlist as a variant and then are referenced in each of the video variants to let the player know where to load subs from.

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="en_subtitle",DEFAULT=NO,AUTOSELECT=yes,LANGUAGE="en",FORCED="no",CHARACTERISTICS="public.accessibility.transcribes-spoken-dialog",URI="subtitles/en.m3u8"
#EXT-X-STREAM-INF:PROGRAM-ID=0,BANDWIDTH=688540,CODECS="avc1.64001e,mp4a.40.2",RESOLUTION=640x360,SUBTITLES="subs"
360p/video.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=0,BANDWIDTH=1921217,CODECS="avc1.64001f,mp4a.40.2",RESOLUTION=1280x720,SUBTITLES="subs"
720p/video.m3u8

Just like with the video Media Playlists, we need a Media Playlist file for the subtitle track as well so that the player knows where to load the source files from and what duration of the stream they cover.

$ curl https://my.video.host.com/video_15/subtitles/en.m3u8

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-TARGETDURATION:22.8
#EXTINF:22.800,
en.vtt

In this case, since we’re only serving a short video, we can just provide a single Segment that points at a WebVTT subtitle file encompassing the entire duration of the video.
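Since we know the video's duration at transcode time, generating this playlist is simple string templating; a minimal sketch (subtitlePlaylist is our own hypothetical helper, mirroring the example above):

package main

import "fmt"

// subtitlePlaylist renders a single-Segment subtitle Media Playlist
// covering a video of the given duration (in seconds).
func subtitlePlaylist(vttPath string, duration float64) string {
	return fmt.Sprintf(`#EXTM3U
#EXT-X-VERSION:3
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-TARGETDURATION:%.1f
#EXTINF:%.3f,
%s
`, duration, duration, vttPath)
}

func main() {
	fmt.Print(subtitlePlaylist("en.vtt", 22.8))
}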

If you crack open the en.vtt file you’ll see something like:

$ curl https://my.video.host.com/video_15/subtitles/en.vtt

WEBVTT

00:00.000 --> 00:02.000
According to all known laws
of aviation,

00:02.000 --> 00:04.000
there is no way a bee
should be able to fly.

00:04.000 --> 00:06.000
Its wings are too small to get
its fat little body off the ground.

...

The media player is capable of reading WebVTT and presenting the subtitles at the right time to the viewer.

For longer videos you may want to break up your VTT files into more Segments and update the subtitle Media Playlist accordingly.

To provide multiple languages and versions of subtitles, just add more EXT-X-MEDIA:TYPE=SUBTITLES lines to the Master Playlist and tweak the NAME, LANGUAGE (if different), and URI of the additional subtitle variant definitions.

#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="en_subtitle",DEFAULT=NO,AUTOSELECT=yes,LANGUAGE="en",FORCED="no",CHARACTERISTICS="public.accessibility.transcribes-spoken-dialog",URI="subtitles/en.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="fr_subtitle",DEFAULT=NO,AUTOSELECT=yes,LANGUAGE="fr",FORCED="no",CHARACTERISTICS="public.accessibility.transcribes-spoken-dialog",URI="subtitles/fr.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="ja_subtitle",DEFAULT=NO,AUTOSELECT=yes,LANGUAGE="ja",FORCED="no",CHARACTERISTICS="public.accessibility.transcribes-spoken-dialog",URI="subtitles/ja.m3u8"

Appending a Trailer

For branding purposes (and in other applications, for advertising purposes), it can be helpful to insert Segments of video into a playlist to change the content of the video without having to append the new content to the source file and re-encode it.

Thankfully, HLS allows us to easily insert Segments into the Media Playlist using this one neat trick:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-TARGETDURATION:4
#EXTINF:4.000,
video0.ts
#EXTINF:4.000,
video1.ts
#EXTINF:4.000,
video2.ts
#EXTINF:4.000,
video3.ts
#EXTINF:4.000,
video4.ts
#EXTINF:2.800,
video5.ts
#EXT-X-DISCONTINUITY
#EXTINF:3.337,
trailer0.ts
#EXTINF:1.201,
trailer1.ts
#EXTINF:1.301,
trailer2.ts
#EXT-X-ENDLIST

In this Media Playlist we use HLS’s EXT-X-DISCONTINUITY header to let the video player know that the following Segments may be in a different bitrate, resolution, and aspect-ratio than the preceding content.

Once we’ve provided the discontinuity header, we can add more Segments just like normal that point at a different media source broken up into .ts files.

Remember, HLS allows us to use relative or absolute paths here, so we could provide a full URL for these trailer#.ts files, or virtually route them so they can retain the path context of the currently viewed video.

Note that the discontinuity header isn't strictly required here, and we could even name the trailer files something like video{6-8}.ts if we wanted to, but for clarity and correct player behavior it's best to include the discontinuity header whenever the trailer content doesn't match the bitrate and resolution of the other video Segments.

When the video player goes to play this media, it will continue from video5.ts to trailer0.ts without missing a beat, making it appear as if the trailer is part of the original video.

This approach allows us to dynamically change the contents of the trailer for all videos, heavily cache the trailer .ts Segment files for performance, and avoid having to encode the trailer onto the end of every video source file.
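Putting that together, appending the trailer is just more text manipulation on the Media Playlist before we serve it. A rough sketch, using the Segment names and durations from the example above (appendTrailer is our own hypothetical helper):

package main

import (
	"fmt"
	"strings"
)

// The trailer Segments and their durations, as in the example above.
var trailerSegments = []struct {
	name     string
	duration float64
}{
	{"trailer0.ts", 3.337},
	{"trailer1.ts", 1.201},
	{"trailer2.ts", 1.301},
}

// appendTrailer adds a discontinuity, the trailer Segments, and an
// ENDLIST tag to the end of a Media Playlist.
func appendTrailer(playlist string) string {
	var b strings.Builder
	// Drop any existing ENDLIST tag so the trailer isn't appended after it.
	b.WriteString(strings.TrimSpace(strings.ReplaceAll(playlist, "#EXT-X-ENDLIST", "")))
	b.WriteString("\n#EXT-X-DISCONTINUITY\n")
	for _, seg := range trailerSegments {
		fmt.Fprintf(&b, "#EXTINF:%.3f,\n%s\n", seg.duration, seg.name)
	}
	b.WriteString("#EXT-X-ENDLIST\n")
	return b.String()
}

func main() {
	fmt.Print(appendTrailer("#EXTM3U\n#EXTINF:2.800,\nvideo5.ts\n"))
}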

Conclusion

At the end of the day, we've now got a video streaming service capable of tracking views and watch-session durations, serving dynamic closed captions, and appending branded trailers to help grow the platform.

HLS is not a terribly complex protocol. The vast majority of it is human-readable plaintext files, and it's easy to inspect in the wild to see how it's used in production.

When I started this project, I knew next to nothing about the protocol, but was able to download some .m3u8 files and get digging to discover how the protocol worked, then build my own implementation of an HLS server to accommodate the video streaming needs of Bluesky.

To learn more about HLS, you can check out the official RFC (RFC 8216), which describes all the features discussed above and more.

I hope this post encourages you to go explore other protocols you use every day by poking at them in the wild, downloading the files your browser interprets for you, and figuring out how simple some of these apparently “complex” systems are.

If you’re interested in solving problems like these, take a look at our open Job Recs.

If you have any questions about HLS, Bluesky, or other distributed, @scale social media infrastructure, you can find me on Bluesky here and you can discuss this post here.