This is part one of (probably) two. It’ll be a pretty rambly story, mostly about why I developed three versions of a terminal video player. The second part will go deeper into the technical side (like which ffmpeg APIs to call and why, explained in plain English).


So I’ve written a video player that works right in the terminal. Three times, in fact. Insofar as there’s been any motivation beyond my own curiosity, the thing I’d point to would be the telnet Star Wars experience. That is a very cool and undoubtedly amazing project, but it most certainly is not a universal movie-viewing tool; it’s a one-off piece of art that likely took hundreds of hours to complete.

I vaguely remember having seen a universal terminal video player before embarking on this project, and there are a few tools that, for example, show images right in your terminal when you ls a folder. What I found at the time, though, were mainly things like iTerm2’s "inline images protocol": while able to display images at resolutions that far exceed the surrounding terminal, it relies on special escape codes that are unlikely to work in other terminal emulators, and it likely isn’t made for video either, so its performance is a big question mark. That’s not the kind of shortcut I wanted to take when watching videos in my terminal.

So then, without further research into what tools might already exist to satisfy my curiosity, as I often do when my urge to experiment and make something myself outweighs my need for a polished ready-made tool, I set out on this journey with ffmpeg in one hand and golang in the other.

The idea I settled on was to use the unicode "full block" character as pixels, or rather two of them per pixel to make for a more square appearance (██). These would then be colored by ANSI escape sequences, specifically those for 24-bit color: ESC[38;2;⟨r⟩;⟨g⟩;⟨b⟩m.
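As a concrete sketch of that idea (not code from watch itself; the helper name `pixel` is mine), emitting one such double-width "pixel" in Rust could look roughly like this:

```rust
// Hypothetical sketch of the "two full blocks per pixel" technique.
// ESC[38;2;<r>;<g>;<b>m sets a 24-bit foreground color,
// and ESC[0m resets all attributes afterwards.
fn pixel(r: u8, g: u8, b: u8) -> String {
    format!("\x1b[38;2;{r};{g};{b}m\u{2588}\u{2588}\x1b[0m")
}

fn main() {
    // Print a short red-to-blue gradient row.
    for i in 0..8u32 {
        let t = (i * 255 / 7) as u8;
        print!("{}", pixel(255 - t, 0, t));
    }
    println!();
}
```

On any terminal emulator with truecolor support, this prints eight colored square-ish "pixels" in a row.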

watch

The first program I made between uni lectures (and sometimes during, but don’t tell the lecturers), called watch (on GitLab), was about as crude as it gets. It called ffmpeg as a subprocess, using stdout and stderr to pipe video and audio respectively back to the main process, thus implicitly relying on either the ffmpeg process or the golang os/exec package to buffer those streams. It also had horrible frame timing, mainly because I had no idea what I was really doing, and you’re lucky if you can get it to stay in sync with the audio for more than about five seconds. Perhaps its single saving grace, and something that might make this approach worth revisiting, is that the ffmpeg CLI seems to be more stable than its API: when its successor, watch2, broke due to unmaintained API bindings, watch kept on working. Barely, mind you, but working.

This first version was far from perfect, but it satisfied my curiosity enough at the time to let my attention wander to newer and shinier ideas. What brought me back to it was that a semester or two later, in our operating systems lecture, we had the option to gather bonus points through a variety of small projects. One of those projects, the last one on the list, happened to be making a video player from scratch using the ffmpeg API. The suggestion was to use C, construct a GUI window, and have ffmpeg draw to that. But I already had a somewhat working DIY video player on hand, so I asked the lecturer whether it would count if I just converted that one to use the ffmpeg API instead of the CLI, and the answer happened to be yes. So I set out on my journey once again, with ffmpeg in one hand and golang in the other; or rather the ffmpeg API docs and ffmpeg API bindings for golang, respectively.

watch2

This next version, called watch2 (on GitLab), was mainly an exercise in reading API documentation and figuring out what steps are needed to get video data fed into my program frame by frame, in a format it could understand. With this rewrite also came significant improvements to the way I handled frame timing, so watch2 was in fact able to keep audio and video in sync for a lot longer than its predecessor.

The fatal problem with watch2 ended up being that the golang ffmpeg API bindings I used were no longer actively maintained, even before I started the project. I also couldn’t find any alternatives, so as new ffmpeg major versions came out, watch2 fell into disrepair and wouldn’t work anymore.

A few years later came another wave of curious friends and coworkers who of course wanted to see the terminal video player in action when I told them about it. Together with the desire to try out Rust on something more than a "hello world" app, as well as the discovery that Rust has well-maintained ffmpeg API bindings, came the decision to rewrite watch one more time. So then, with Rust in one hand this time, and better ffmpeg API bindings in the other, I set out on my journey for the final time (for now, anyway).

watch2r

The final version, called watch2r (on GitLab), was perhaps not the easiest first project one could have chosen for learning a new language, but it was far from difficult, thanks to Rust’s helpful compiler errors; in most cases it was harder to figure out how to deal with breaking changes to the ffmpeg API. That doesn’t necessarily mean the code will win any beauty contests: to anyone more familiar with Rust than I am, it probably looks like a Go programmer’s attempt to convert her code to Rust (which, frankly, is exactly what it is, so they wouldn’t be wrong).

watch2r does however come with numerous improvements, both in performance and in user experience. First, it uses the unicode "upper half block" character and can thus fit four pixels into the space previously occupied by just one, by using the background and foreground colors to put two pixels in each character, rather than needing two characters per pixel. Another improvement is an algorithm that reduces the number of color escape sequences emitted when two consecutive pixels (within the same row) are sufficiently similar in color; this doesn’t seem to improve performance significantly, but every little bit helps. Apart from those two major things, there are a few minor improvements, like more accurate frame skipping and improved CLI parameters (& documentation).
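The half-block trick and the escape-sequence reduction can be sketched roughly like this (a simplified Rust illustration, not watch2r’s actual code; for simplicity, "sufficiently similar" is reduced to exact equality here):

```rust
// Each character cell shows two vertically stacked pixels: the top pixel
// as the foreground color of an "upper half block" (▀), the bottom pixel
// as its background color. A color escape is only emitted when the color
// differs from the previous cell's.
type Rgb = (u8, u8, u8);

fn render_row_pair(top: &[Rgb], bottom: &[Rgb]) -> String {
    let mut out = String::new();
    let mut last_fg: Option<Rgb> = None;
    let mut last_bg: Option<Rgb> = None;
    for (&t, &b) in top.iter().zip(bottom) {
        if last_fg != Some(t) {
            // 38;2 = 24-bit foreground (top pixel)
            out.push_str(&format!("\x1b[38;2;{};{};{}m", t.0, t.1, t.2));
            last_fg = Some(t);
        }
        if last_bg != Some(b) {
            // 48;2 = 24-bit background (bottom pixel)
            out.push_str(&format!("\x1b[48;2;{};{};{}m", b.0, b.1, b.2));
            last_bg = Some(b);
        }
        out.push('\u{2580}'); // ▀ upper half block
    }
    out.push_str("\x1b[0m"); // reset colors at the end of the row
    out
}

fn main() {
    // Two rows of pixels rendered as one row of characters.
    let top = [(255, 0, 0), (255, 0, 0), (0, 255, 0)];
    let bottom = [(0, 0, 255), (0, 0, 255), (0, 0, 255)];
    println!("{}", render_row_pair(&top, &bottom));
}
```

In the example, the repeated red top pixels and the constant blue bottom row each need only one escape sequence instead of three, which is exactly the kind of output reduction described above.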

Another thing I did with this final version was to try running it in a different terminal emulator. Up until then, I had always tested in iTerm2, which is my daily driver for terminal needs on my Mac. On a recommendation from my girlfriend, I then also tried out Alacritty, a GPU-accelerated terminal emulator. This yielded a drastic improvement in performance: instead of a 64x36 resolution at 15 fps, video playback could now run at 304x171 at 25 fps without major issues (i.e. more than a 4-fold increase in side length, and a more than 22-fold increase in total pixels).

Conclusion

Over the course of this project, especially during the two later implementations, I learned quite a bit about video (container) formats, since the ffmpeg API works closely with those structures. I’m also personally really happy with the results; watch2r in particular is quite a usable piece of software. That is, if you don’t need features like pausing, seeking, or adjusting the volume. Maybe I’ll add those in the future, but I think they’ll need even more AV-programming know-how than just playing a video straight through from front to back without interruptions.


After finishing development of watch2r, I was informed of the existence of mpv, a media player that can also render video in the terminal. mpv has a variety of output modes, and one of them (mpv --vo=tct, to be precise) yields an in-terminal rendering using unicode block characters, similar to what my program does. mpv also has cool features like being able to play YouTube videos and Twitch streams, as well as media player controls. One thing I’m proud to say it doesn’t have (at least as far as I’ve been able to test) is better performance ✨


Thank you for reading my first full blog post!

If you have any feedback or comments, you can leave them under this fedi post here.