DNG image support for FFmpeg - GSoC 2019

21/08/2019 #gsoc #ffmpeg

Introduction

It’s been two years since I’ve had the chance to apply to GSoC and I was quite excited for it.

When looking at project ideas for FFmpeg, there were a few that caught my attention, but DNG decoding seemed like the most approachable for someone new to the multimedia world like myself.

The Digital Negative (DNG) format is Adobe’s image standard that was created to store image data in a generic, highly-compatible format, unlike RAW files that have specific formats based on manufacturer and camera type.¹

Adobe maintains a list of native DNG compatible cameras here and there’s another list here. The selection consists mainly of Casio, Leica, Pentax, Ricoh and several Samsung cameras.

I promptly started reading the DNG & TIFF Specifications and realized that I understood more than I expected. Both were pretty self-contained and TIFF is such a well known and used format that I could find additional resources on it pretty easily.

Coming from the emulator development scene, I often don’t have nice, official specs to base my understanding and code on, so that was a nice change of pace!

After looking at the specs I decided I wanted to give this a try. I started lurking on IRC, set up a development environment and dove into the code.

FFmpeg is pretty large, but it’s written in C, which is itself pretty simple, and its file structure is clear, so browsing the code wasn’t too hard. Initially I mostly looked at TIFF decoding, alongside reading the specs.

Qualification Patch

I realized that a bit of code (relating to IFD parsing) could use some work and that it lacked support for multi-page TIFFs, so I pushed a patch for it:

avcodec/tiff: Multi-page support (merged)
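
For readers unfamiliar with the format: a multi-page TIFF is a linked list of Image File Directories (IFDs), where each IFD ends with the file offset of the next one and an offset of 0 ends the chain. The following is a minimal Python sketch of that traversal, not the code from the patch. The helper names `build_ifd` and `walk_ifds` are made up for illustration, and the entry value field is treated as a raw 32-bit number, which is a simplification.

```python
import struct

def build_ifd(entries, next_offset, endian="<"):
    """Serialize one IFD: entry count, 12-byte entries, next-IFD offset."""
    data = struct.pack(endian + "H", len(entries))
    for tag, field_type, count, value in entries:
        data += struct.pack(endian + "HHII", tag, field_type, count, value)
    return data + struct.pack(endian + "I", next_offset)

def walk_ifds(buf):
    """Yield (offset, entries) for every IFD in the top-level chain."""
    # "II" means little-endian, "MM" means big-endian.
    endian = {b"II": "<", b"MM": ">"}[bytes(buf[:2])]
    magic, offset = struct.unpack_from(endian + "HI", buf, 2)
    if magic != 42:
        raise ValueError("not a TIFF file")
    seen = set()
    while offset != 0:
        if offset in seen:  # guard against malformed files that loop
            raise ValueError("IFD loop detected")
        seen.add(offset)
        (count,) = struct.unpack_from(endian + "H", buf, offset)
        entries = [
            struct.unpack_from(endian + "HHII", buf, offset + 2 + 12 * i)
            for i in range(count)
        ]
        yield offset, entries
        # The 4 bytes after the last entry point to the next IFD (0 = end).
        (offset,) = struct.unpack_from(endian + "I", buf, offset + 2 + 12 * count)

# Synthetic two-page file: header (8 bytes), IFD0 at offset 8, IFD1 at offset 26.
# Each IFD holds a single ImageWidth entry (tag 256, type LONG = 4, count 1).
header = b"II" + struct.pack("<HI", 42, 8)
sample = header + build_ifd([(256, 4, 1, 640)], 26) + build_ifd([(256, 4, 1, 320)], 0)

for ifd_offset, ifd_entries in walk_ifds(sample):
    print(ifd_offset, ifd_entries)
```

Note that this only follows the top-level chain; IFDs referenced through the SubIFDs tag, which DNG relies on, need a separate lookup.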

I also started work on DNG and sent the following:

avcodec/tiff: Add support for recognizing DNG files (not merged in this form)

which resulted in a regression and didn’t work correctly with CinemaDNG images, so it was not merged as it is above.

Proposal

While getting the above patch upstreamed I wrote up the proposal: DNG Raw Image Format Support

Thankfully FFmpeg’s guidelines for the proposal are pretty lenient, which allowed me to spend less time on that and more time on code and the project itself.

Coding Period

Initial Patchset

A couple of days after the coding period started I pushed three commits, listed here with short descriptions:

avcodec/tiff: Option to decode embedded thumbnail
Added the ability to decode thumbnails — particularly useful for DNG images that have SubIFDs that are not necessarily thumbnails.

libavcodec/tiff: Process SubIFDs tag with multiple entries
Improved (Sub)IFD iteration logic and removed some related assumptions.

avcodec/tiff: Recognize DNG/CinemaDNG images
Added some preliminary identification & tag parsing functionality for DNGs.

More details are of course in each commit’s description.

Bulk of the work

After this, I scoured the internet for more DNG samples and started analyzing them. After inspecting files and implementing a few things here and there, it became apparent that almost all DNGs embed Lossless JPEGs in tiles for their main image data (which is also Bayer-filtered), so I would need to handle this somehow. Embedded thumbnails (which most DNGs have) are either uncompressed or, more commonly, baseline JPEGs, which we could already decode (and, after the -thumbnail commit, could do so at will).

So, I knew I would have to invoke FFmpeg’s JPEG decoder at some point. I asked my mentor and he suggested that I look at how the TDSC decoder does it, which proved to be quite helpful.

Prior to that I had implemented support for tiled images. For lower-resolution images, the standard TIFF method of breaking the image into strips is adequate. However, high-resolution images can be accessed more efficiently (and compression tends to work better) if the image is broken into roughly square tiles instead of wide, short strips. Since DNG images tend to be quite large, that’s what most of them use: tiles of Lossless JPEGs.
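
To make the tile layout concrete, here is a small Python sketch (not FFmpeg’s code) that computes the tile grid for an image. It assumes the usual TIFF convention that tiles are stored left to right, top to bottom, and that edge tiles are padded in the file, so the decoder only keeps the part that falls inside the image. The function name `tile_grid` is made up for illustration.

```python
def tile_grid(image_w, image_h, tile_w, tile_h):
    """Return (tiles_across, tiles_down, rects) for a tiled image.

    Each rect is (x, y, width, height) of the visible part of a tile,
    in storage order (left to right, then top to bottom).
    """
    # Round up: a partially covered column or row still needs a tile.
    tiles_across = (image_w + tile_w - 1) // tile_w
    tiles_down = (image_h + tile_h - 1) // tile_h
    rects = []
    for row in range(tiles_down):
        for col in range(tiles_across):
            x, y = col * tile_w, row * tile_h
            # Crop tiles on the right and bottom edges to the image bounds.
            w = min(tile_w, image_w - x)
            h = min(tile_h, image_h - y)
            rects.append((x, y, w, h))
    return tiles_across, tiles_down, rects

across, down, rects = tile_grid(1000, 600, 256, 256)
print(across, down, len(rects), rects[-1])
```

Each of those rectangles corresponds to one independently compressed chunk, which in most DNGs is a small Lossless JPEG.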

Later on I implemented DNG color scaling: the mapping of stored raw sensor values into linear reference values. It consists of 4 steps: Linearization (LUT lookup, optional), Black Subtraction, Rescaling and Clipping (to a 0.0-1.0 range).
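
The four steps can be sketched for a single sample as follows. This is a simplified Python illustration, not the FFmpeg implementation: it assumes one scalar black level and one white level, whereas real DNGs can carry per-channel or per-pattern values. The name `scale_sample` is hypothetical, and clamping out-of-range raw values to the last table entry is my reading of the spec.

```python
def scale_sample(raw, black_level, white_level, lut=None):
    """Map a stored raw sensor value to a linear reference value in [0.0, 1.0]."""
    if white_level <= black_level:
        raise ValueError("white level must be greater than black level")
    value = raw
    # 1. Linearization (optional): look the raw value up in a table.
    #    Values past the end of the table use the last entry (assumption).
    if lut is not None:
        value = lut[min(raw, len(lut) - 1)]
    # 2. Black subtraction: remove the sensor's zero-light offset.
    value = value - black_level
    # 3. Rescaling: map the black..white range onto 0..1.
    value = value / (white_level - black_level)
    # 4. Clipping: force the result into the 0.0-1.0 range.
    return min(max(value, 0.0), 1.0)

# A 10-bit sensor with a black level of 64 and a white level of 1023.
for raw in (0, 64, 543, 1023):
    print(raw, scale_sample(raw, 64, 1023))
```

Values below the black level clip to 0.0 and values above the white level clip to 1.0, which is why both levels have to be read correctly from the file before any pixel looks right.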

After this, the pixels are usually Bayer-filtered, so we have to de-bayer the image, but that was already implemented so I didn’t have to do much there.

Invoking the JPEG decoder, tiles and color scaling were all included in this commit:

lavc/tiff: Decode embedded JPEGs in DNG images

However, the Lossless JPEG decoding code in FFmpeg couldn’t handle the embedded files, so I had to modify it. Figuring this out probably took longer than the three things above combined! Here’s the commit:

lavc/mjpegdec: Decode Huffman-coded lossless JPEGs embedded in DNGs

Studying dcraw’s code, comparing it with FFmpeg’s in debugging sessions and diff-ing values passing through the inner loops was instrumental in figuring this out.

Last Month

After implementing proper color scaling, most images looked dark and I realized that I needed to convert them from linear sensor values to the sRGB color space:

lavc/tiff: Convert DNGs to sRGB color space
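
To show why linear values look dark, here is the standard sRGB encoding curve as a short Python sketch. It illustrates the general technique and is not necessarily what the commit does internally; the function name `linear_to_srgb` is made up.

```python
def linear_to_srgb(c):
    """Encode a linear value in [0.0, 1.0] with the standard sRGB transfer curve."""
    c = min(max(c, 0.0), 1.0)
    if c <= 0.0031308:
        # Near black, the curve is a straight line.
        return 12.92 * c
    # Elsewhere it is a power curve that brightens dark and mid tones.
    return 1.055 * c ** (1 / 2.4) - 0.055

# 18% linear gray ends up at roughly 46% after encoding.
for linear in (0.0, 0.01, 0.18, 0.5, 1.0):
    print(linear, round(linear_to_srgb(linear), 4))
```

Displaying the linear values directly skips this brightening of the mid tones, which matches the dark output I was seeing.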

I also fixed an issue that I had come across in the beginning, where I was getting a “mjpeg_decode_dc: bad vlc: 0:0” error. It turns out that DNG JPEGs from that specific encoder had Huffman codes that contained a bad mapping.

The issue was fixed here:

lavc/jpegtables: Handle multiple mappings to the same value

August was generally a whole lot of bug fixing and additions to improve compatibility with DNGs I could find. There were too many changes to elaborate on all of them here and some of them I don’t even remember because they were squashed with other commits.

In summary, I:

Improved support for uncompressed DNGs (like applying proper color scaling)
Added support for decoding of DNGs with single-component JPEGs ([1], [2])
Added support for 10-bit and 14-bit DNG images
Added support for DNGs with striped (non-tiled) JPEG images
Added support for decoding of LinearRaw images (non-bayer)
Fixed various bugs ([1], [2], [3], [4], [5], [6])

Merged Work

All my merged commits from before and during GSoC can be viewed here: https://github.com/FFmpeg/FFmpeg/commits?author=VelocityRa&until=2019-09-03

Most commit links in the rest of the post point to as-yet-unmerged revisions (from my own fork) and may differ slightly, so please use the link here for the final merged revisions.

I’m happy to say that all DNGs that I have found (tens of them, encoded with unique encoders or having unique specs) — save for 2 that I didn’t have time to fix yet — can be decoded.

Future Work

Mapping Camera Color Space to CIE XYZ Space

The DNG spec defines a processing model for mapping between the camera color space coordinates (linear reference values) and CIE XYZ color space (with a D50 white point). I didn’t have enough time to implement this, so some images’ colors look off.

After looking into it, I realized implementing this is much more involved than the color scaling mentioned above and I couldn’t simply sacrifice a bugfix or two for it. I’d probably need weeks and it would add a lot of complexity to the code.

Refer to Chapter 6 of the DNG specification for details on the color space mapping process.

Post-processing

After color scaling and colorspace transformations are applied, we could do some post-processing, like dcraw does. Some examples of what could be done:

White-balancing
Better demosaicing with e.g. Adaptive Homogeneity-Directed (AHD) interpolation
Median filter
Wavelet denoising

Missing DNG Features

There are some DNG features that are defined in the spec but aren’t used by most images (encoders). Nonetheless, they would be welcome additions for completeness.

The features I can think of are support for:

Masked Pixels (“MaskedAreas” DNG tag)
Floating Point Image Data
Opcode Lists
Proxy DNG Files

The DNG spec is huge and I knew I wouldn’t have time for all the features it allows for, but I bet on most actual images being decodable with just some basic decoder features and thankfully it proved to be the case.

Tools 🛠️

010 Editor

Since before getting my first patch merged, I’ve been using 010 Editor with a custom Binary Template for DNGs. I extended its built-in TIFF template by adding SubIFD iteration and DNG/CinemaDNG tags. It proved to be an extremely useful tool to have throughout the project; I used it almost every day.

I’ve used it in a bunch of reverse engineering projects (mostly relating to emulation work) so it was neat finding a use for it in here as well - just goes to show how robust it is. I wish it was OSS but you can’t have everything. :)

ExifTool

I found out that ExifTool’s -htmlDump option does something very similar (better in some respects), so that’s a nice OSS alternative too.

dcraw

dcraw is an old ANSI C program that could decode pretty much any raw image from any digital camera. It’s pretty well known - it or forks of it are used in many projects that need such functionality. I used it as a reference in some parts of the project.

gdbgui

I’ve been using WSL for building ffmpeg, so my debugger options were pretty limited. Initially I used gdb, but after finding out about gdbgui my productivity definitely increased. I had used it a few years ago, and it’s gotten much better since.

JPEGsnoop

JPEGsnoop helped me understand how Huffman coding information - stored in the DHT marker - is encoded in JPEGs and it was useful in debugging a related issue with FFmpeg code (the one fixed in this commit).
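
As a rough illustration of what is stored there: a DHT table consists of a class/ID byte, 16 counts (how many codes exist for each bit length from 1 to 16) and then the symbols in code order. The codes themselves are not stored; they are rebuilt with the canonical assignment from the JPEG standard. Below is a minimal Python sketch of that reconstruction, not FFmpeg’s implementation, with a made-up function name `parse_dht_table` and a payload that I believe matches the standard’s example luminance DC table.

```python
def parse_dht_table(payload):
    """Parse one Huffman table from a DHT payload (the bytes after the length field).

    Returns (table_class, table_id, codes), where codes maps
    (bit_length, code_value) to the decoded symbol.
    """
    table_class, table_id = payload[0] >> 4, payload[0] & 0x0F
    counts = payload[1:17]                   # number of codes per bit length 1..16
    symbols = payload[17:17 + sum(counts)]   # symbols, ordered by increasing code
    codes = {}
    code = 0
    index = 0
    for length in range(1, 17):
        for _ in range(counts[length - 1]):
            codes[(length, code)] = symbols[index]
            code += 1
            index += 1
        # Moving to the next bit length appends a zero bit to the running code.
        code <<= 1
    return table_class, table_id, codes

# Class 0 (DC), table 0: one 2-bit code, five 3-bit codes, then one code
# for each length from 4 to 9 bits, mapping to symbols 0..11.
payload = bytes([0x00]) + bytes([0, 1, 5, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]) + bytes(range(12))
table_class, table_id, codes = parse_dht_table(payload)
for (length, value), symbol in codes.items():
    print(format(value, "0{}b".format(length)), "->", symbol)
```

Dumping the table this way makes it easy to spot entries where two codes decode to the same symbol.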

Beyond Compare

I used Beyond Compare to diff Huffman-decoded values and a few other things against the ones output by dcraw. When I was writing my PS1 emulator, I tried a lot of diffing tools and this was by far the best one. It can handle 30GB+ log files while being quite responsive and the GUI is simple and convenient. Unfortunately it’s paid and closed-source, just like 010 Editor.

Acknowledgements

I’d like to thank:

My mentor, Paul B Mahol (durandal_1707 on IRC) for helping me throughout the project, by answering my questions and guiding me when I wasn’t sure how to proceed.
The entire FFmpeg team for maintaining and constantly improving FFmpeg.
- Open Source is largely a thankless job so it’s important to show appreciation. Not to mention that FFmpeg is without a doubt the best suite of libraries for multimedia and the world is better for it.
Google for GSoC, a program that helps introduce so many students to open source, working with non-trivial projects and collaborating with other people.
- I’m not sure if it’s the case for every university, but the kind of complexity you normally see in a Computer/Software Engineering degree is a lot lower than a real-world project such as this, so it’s a very valuable experience for students that haven’t been exposed to it.

Closing thoughts

As someone who has been working on side projects and open source for years as a hobby anyway, being directly funded for it is amazing.

Other than that, it gave me a chance to work on a domain that I didn’t have much experience in (I still don’t, but less so!), that I maybe wouldn’t have gone out of my way to do otherwise.

It’s unfortunate I can’t have this experience again as a student, but I will probably pursue being a mentor in the following years; I’ve been an unofficial mentor this year for a Kodi project and it’s also been quite fun.

I hope I’ll be motivated to write more on here in the future, but until then, cheers and thank you for reading!

¹ Here’s a comparison with RAW formats: https://photographylife.com/dng-vs-raw