
Burned-in Subtitles:
A Journey to Quality

It’s tragic. We burn thousands of subs every single day — and yet so many of us remain oblivious to the fact that, in the process, we often ruin our videos’ image quality. How come? And to what extent? — No one seems to care. Only a few brave warriors continue to defend the bastion of visual fidelity, but the battle is a losing one.

Well, I am here to stop this genocide of bits and pixels! It’s time to wake up from the encoding slumber and to rise against the cursed image degradation! So, join me on a journey of discovering the truth behind bitrates, codecs, quality metrics and compression — and witness a grand battle royale of the most popular burner tools. Let’s go!

 

Image Artifacts (The Nemesis)

 

To start our adventure, let’s define the villain. What we’re combating here is a visible loss of video quality, which creates odd-looking distortions (called “artifacts”) in the picture. I’m sure you’ve seen them before when watching a movie or a YouTube clip — things like colour banding:

[Image: colour banding example]

Source: GinaMilicia.com

 

Blockiness:

[Image: blockiness example]

Source: MKBHD

 

Visual noise:

[Image: visual noise example]

Source: biamp.com

 

And so on. If you’re not careful enough when burning in your subs, you can introduce such artifacts into the video without even knowing it! They won’t be as pronounced, of course, but they might still be visible to the audience, affecting their viewing experience — and that is something we want to avoid.

 

Codecs and Compression (A Detour)

“So... why do we lose video quality when embedding subtitles? Aren’t we just copy-pasting pixels, while adding our text on top?” — Alas, not quite. This wouldn’t really work. To understand why, let’s take a small detour.

 

When capturing raw footage with a camera, retaining high graphical fidelity requires a massive amount of data. This amount will depend on the chosen resolution, framerate and colour depth, but to give you a quick example, one of the cameras used for shooting Joker, the ARRI ALEXA 65, generates as much as 645 GB of pixel information per hour of filming in 4K at 24 fps. That’s a whopping 1.3 terabytes for a single two-hour movie!
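For a sense of where such numbers come from, you can estimate the raw data rate yourself as width × height × bits per pixel × frames per second. This is only a sketch: real camera raw formats use different sensor dimensions and pack bits differently, so it won’t reproduce ARRI’s exact figure, but it shows the order of magnitude.

```python
def raw_gb_per_hour(width, height, bits_per_pixel, fps):
    """Uncompressed video data per hour of filming, in gigabytes."""
    bytes_per_second = width * height * bits_per_pixel * fps / 8
    return bytes_per_second * 3600 / 1e9

# 4K DCI (4096x2160), 12-bit raw, 24 fps: over a terabyte per hour
print(round(raw_gb_per_hour(4096, 2160, 12, 24)))  # → 1147
```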

[Image: still from Joker]

Source: ArriRental.com

 

Such huge file sizes would be very difficult to manage in terms of data storage, broadcast bandwidth, playback, and upload/download. To make handling videos easier, engineers have come up with a way to decrease the size considerably by sacrificing a little bit of quality. This is done via intelligent algorithms, called codecs, which compress a file when you need to store or transfer it and decompress it when you want to play the video, similar to how we squeeze all our holiday clothes into a small suitcase and unpack them once in the hotel. The “packing” part is known as coding (or encoding), while the “unpacking” is decoding — hence the name: co-dec. And just like our travel preparation leaves the shirts wrinkled, this “imperfect” (lossy) compression affects the picture.

Now, there are many different codecs out there that serve various purposes:

  • Some make it possible to fit a movie on a disc or tape, like MPEG-1 for Video CD and MPEG-2 for DVD.

  • Some are employed in post-production to provide virtually lossless yet efficient encoding for video editing, colour grading and visual effects (e.g. Apple’s ProRes and Avid’s DNxHD).

  • Some were created for end-user viewing to save your internet bandwidth, such as VP9 used by YouTube and AV1 used by Netflix.

  • Some work in real time on capture devices to help with storage limitations — like HEVC on the iPhone 13 and ARRIRAW-HDE on the aforementioned ALEXA 65 camera.

  • And so on.

 

How exactly these algorithms work is a fairly complex and technical topic, so I won’t go into much detail here. Basically, they analyse each individual frame in terms of its content and its relation to the surrounding frames, then use a lot of mathematical trickery to cram this visual information into fewer bits of data by trading off a certain amount of fidelity — and the stronger the compression, the more fidelity gets dropped.

So, going back to the original question: we can’t just copy-and-paste the source pixels when burning our subs, because storing the raw pixel data would essentially produce an uncompressed file hundreds of gigabytes in size, which would be entirely unmanageable. Hence, we must use a video codec and accept a lossy result.

But is that a bad thing? — No, not at all, because our aim isn’t to retain perfect quality, but rather to minimize the loss to such an extent as to make it unnoticeable to the human eye.

Choosing the Right Codec (A Treacherous Path)

 

Okay, now we know what video codecs are and what they do — but how does one decide which one to pick, from this myriad of options? Well, generally, there are three main factors to consider:

1. Bitrate and compression efficiency

 

A video’s bitrate is the amount of data transferred per second during its playback. It’s usually measured in megabits per second, and if you multiply it by the duration of the clip (and divide by eight to convert bits into bytes), you get the video’s file size. This attribute is quite useful, because it correlates with image quality — the higher the bitrate, the better the picture.
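As a quick sketch of that formula (ignoring audio and container overhead, which add a little on top):

```python
def file_size_mb(bitrate_mbps, duration_seconds):
    """Approximate video file size: bitrate x duration, converted bits -> bytes."""
    return bitrate_mbps * duration_seconds / 8

# A 10-minute clip encoded at 5 Mbps comes out to roughly 375 MB
print(file_size_mb(5, 10 * 60))  # → 375.0
```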

Here’s Tom Scott demonstrating the difference:

[Images: Tom Scott’s low-bitrate vs high-bitrate comparison]

 

(If you are not sure which bitrate will be best for your video, check out this article.)

So, a more efficient codec is one that achieves the same level of quality at a smaller bitrate. For instance, VP9 produces files that are 40% smaller in size compared to VP8, and the picture is as good — if not better! However, this improvement comes at the cost of...

2. Encoding complexity

 

This parameter reflects how long it will take your computer to encode the video (that is, to burn in the subtitles) with a particular codec. More often than not, the higher a codec’s complexity, the better its overall efficiency. To give you an example, H.266 takes twenty-six times longer to work its magic than H.265, but it saves around 50% of the bitrate. So, for the same video, you will get roughly equal quality at half the file size...

Which, you know, is great, but is it worth throttling your computer for a whole day instead of just one hour? Depends on your client’s requirements and budget, I guess.
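Plugging those figures into a quick back-of-the-envelope calculation makes the trade-off plain. The 26× and ~50% numbers come from the paragraph above; the one-hour, 10 GB starting point is purely illustrative:

```python
def h266_tradeoff(encode_hours_h265, file_gb_h265):
    """Rough H.266 estimate from H.265 figures, using the
    '26x slower, ~50% smaller' rule of thumb quoted above."""
    return encode_hours_h265 * 26, file_gb_h265 * 0.5

# A 10 GB file that took 1 hour in H.265: about 26 hours for a 5 GB file
hours, gigabytes = h266_tradeoff(1, 10)
print(hours, gigabytes)  # → 26 5.0
```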

 

3. Compatibility

 

You can use the most efficient codec and give it all the time in the world, but if it isn’t supported by whatever software/hardware your client is going to use, then it’s all for nothing. H.266 is cool, but most media players don’t even know how to open it!

 

✏️ Quick note

Unless you know exactly what you’re doing or you’ve been given specific instructions from the client, I recommend using H.264 (also known as AVC, or MPEG-4 Part 10). This is by far the most versatile, compatible, and safe-to-use codec out there, and it has quite decent complexity and efficiency. As for the container, MP4 should do the trick.
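For the command-line inclined, FFmpeg (which several of the tools tested below rely on under the hood) can do the burn-in with H.264 directly. This is a minimal sketch: the file names are placeholders, -crf 18 is just a common near-transparent starting point rather than a universal rule, and the subtitles filter requires an FFmpeg build with libass.

```python
import subprocess

def burn_in_cmd(video, subs, output, crf=18):
    """Build an FFmpeg command that burns `subs` into `video` using H.264."""
    return [
        "ffmpeg", "-i", video,
        "-vf", f"subtitles={subs}",        # render the subtitle file onto each frame
        "-c:v", "libx264", "-crf", str(crf),  # H.264 in constant-quality mode
        "-c:a", "copy",                    # leave the audio stream untouched
        output,
    ]

cmd = burn_in_cmd("input.mp4", "subs.srt", "output.mp4")
# subprocess.run(cmd, check=True)  # uncomment to actually encode
```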

If you want to learn more about the ins and outs of video codecs, so that you feel more confident working with them, head over here.


Quality Metrics (A Deus Ex Machina)

 

We’ve been mentioning quality a fair bit up to this point, but how do we actually measure it? Well...

Analysing video quality is no easy task. In theory, we could spend countless hours putting every single frame under a magnifying glass and documenting each little imperfection, but that would not be very practical — and also somewhat subjective. So, a more automated and structured approach is required.

Luckily, we don’t need to invent anything, as quality estimation algorithms already exist; they seek to quantify image degradation as perceived by an average viewer, using precise numbers. Just like codecs, these metrics are numerous, and they work in varied ways:

  • Peak signal-to-noise ratio (PSNR) measures how far the encoded picture deviates from the original, relative to the maximum possible pixel value;

  • Structural Similarity Index (SSIM) looks at how different the original and encoded videos’ pixels are;

  • Motion-based Video Integrity Evaluation (MOVIE) includes an analysis of how the image changes over time;

  • Video Multimethod Assessment Fusion (VMAF) combines several techniques for improved results;

  • Etc.
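To make this less abstract, here’s a toy implementation of PSNR, the simplest metric of the bunch. It works on flat lists of pixel values; real implementations operate per-channel on full frames, but the formula is the same:

```python
import math

def psnr(original, encoded, max_value=255):
    """Peak signal-to-noise ratio in decibels between two equal-sized
    frames, given as flat lists of pixel values (higher = closer match)."""
    mse = sum((o - e) ** 2 for o, e in zip(original, encoded)) / len(original)
    if mse == 0:
        return float("inf")  # identical frames: no degradation at all
    return 10 * math.log10(max_value ** 2 / mse)

# A frame that is off by 1 everywhere scores about 48 dB;
# off by 10 everywhere, about 28 dB: a much more visible error
print(round(psnr([100] * 16, [101] * 16), 2))  # → 48.13
```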

Among them, my favourite is VMAF, created by Netflix in cooperation with the University of Southern California. Why? — Well, it has several clear advantages for a subtitler:

  1. It adequately reflects the level of video quality and outperforms many other similar metrics.

  2. It offers a well-defined, easy-to-use grading system from 0 to 100, where 0 is horrendous and 100 is pristine.

  3. It includes a concrete target threshold (a score of 94) at which the quality loss becomes undetectable to 50% of viewers.

  4. It takes into account the average viewing distance, which affects our subjective quality perception.

  5. It has a separate mode for mobile videos.

  6. There is a free Windows tool for estimating the VMAF score: https://github.com/fifonik/FFMetrics

[Graph: per-frame VMAF scores]

This graph shows the VMAF score for each individual frame (1400 frames total)
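FFMetrics is Windows-only, but if you’re on another platform (or simply prefer the command line), FFmpeg itself can calculate VMAF, as long as your build was compiled with libvmaf. A sketch of the command, wrapped in Python; the file names are placeholders:

```python
import subprocess

def vmaf_cmd(encoded, reference, log="vmaf.json"):
    """Build an FFmpeg command that scores `encoded` against `reference`
    with libvmaf and writes per-frame results to a JSON log."""
    return [
        "ffmpeg", "-i", encoded, "-i", reference,
        "-lavfi", f"libvmaf=log_fmt=json:log_path={log}",
        "-f", "null", "-",  # discard the decoded output; we only want the scores
    ]

cmd = vmaf_cmd("burned.mp4", "source.mp4")
# subprocess.run(cmd, check=True)  # uncomment to run the analysis
```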

 

Now let’s use this metric to compare the performance of some of the most popular tools for embedding subs!

Subtitle Burners (A Showdown! 🔥)

 

Ladies and gentlemen, onto our battle royale! ⚔️

 

Contenders

Test Videos

 

The two clips I’ll be using for my evaluations include several visual elements known to be very difficult to encode without a considerable quality loss, such as smoke, confetti, a wide range of lights/shadows, etc.

  • SMOKING CLIP — 2048×1080, 50 fps, 4.44 Mbps, H.264 HD (1-1-1)

  • CONFETTI CLIP — 2048×1080, 25 fps, 5.58 Mbps, H.264 HD (1-1-1)
     

Encoding Specs

Same resolution, framerate, bitrate and codec as the source. For testing, I will be burning in an empty subtitle file — otherwise, the text itself would introduce a huge change to the pixel information, so it’d become a comparison between apples and oranges.

VMAF Scoring System

The exact score to aim for will generally depend on the video’s purpose (e.g. personal use, uploading to YouTube, broadcast, post-production, etc.), but from what I could find from reading web articles, exploring internet forums and doing my own testing, for us subtitlers:

  • Under 90 points on average — unacceptable to deliver;

  • 90–92 — tolerable;

  • 93–95 — ideal;

  • Over 95 — the viewer won’t notice the difference, so at this point you’re just wasting bandwidth by making the file larger than needed.

Furthermore, there should be as few segments dipping below 90 as possible, because those drops in quality will most likely be perceptible to the viewer.
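Those thresholds can be expressed as a tiny helper function (the cut-offs are my own reading of articles, forums and testing, not an official standard):

```python
def rate_vmaf(average_score):
    """Map an average VMAF score to a delivery verdict for subtitlers."""
    if average_score < 90:
        return "unacceptable"
    if average_score <= 92:
        return "tolerable"
    if average_score <= 95:
        return "ideal"
    return "overkill"  # imperceptible gain, wasted bandwidth

print(rate_vmaf(94))  # → ideal
```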

Results (as of July 2023)
 


Conclusions

The results are interesting, to say the least. Happy Scribe surprised me with its excellent level of quality, which, for a completely free, browser-based service, is unheard of. Yes, it did cheat by auto-increasing the output bitrate a little bit (since you can’t adjust it yourself), but still, the performance is commendable.

Handbrake made me scratch my head with the overwhelming number of settings, and I did make a couple of mistakes here and there, which forced me to restart the encoding and waste a lot of computing time. Namely, I forgot to switch from Peak Framerate to Constant Framerate and to turn off the automatic resolution. If you decide to use this tool, make sure to watch/read the available tutorials, because it isn’t very newbie-friendly.

OOONA provided good quality, but it crashed on me a few times, and some of the settings didn’t quite work on occasion. Also, it uses an older version of FFmpeg (maybe that’s why?). On the bright side, the developers told me that there will be a new and improved version of the tool, so I’m looking forward to that.

Subtitle Edit clearly underperformed. It uses the same FFmpeg libraries as most of the other tools I checked, but somehow delivers worse results with identical settings. Oh well. I really like SE, so I hope they enhance the burning module and bring it up to par.

SubtitleNEXT lags behind in the “paid tools” department. Almost no settings and suboptimal quality. Well, some work to be done here. At least the rest of the software is great!

Spot did very well, and if you already own a license, rest assured the burning tool produces quality results. At the same time, some adjustments feel somewhat manual and need to be automated — like, for instance, creating a separate font folder.

Finally, DaVinci Resolve did as expected — decent quality, and if you select “automatic bitrate”, you will get superb results. But just like Handbrake, it’s quite technical settings-wise, and since it’s a video editing tool rather than a subtitling one, there’s a learning curve to it.

So, is there a clear winner in this battle royale? — No, not at all! Some of the burners excel in one aspect, some — in another, and which tool to pick will ultimately depend on your personal needs and finances. But at least now you should be well-equipped to make a good, informed choice and slay our monster — the cursed image degradation!

 

 

All right, this is it for now. I hope you found this article useful. As always, if you have any questions, thoughts or remarks, feel free to write them in a comment below.

Cheers!
