[FFmpeg-devel] [RFC] A file format to store generic raw image/video files

Chema Gonzalez chema at berkeley.edu
Wed Oct 18 01:11:38 EEST 2023


# RFC: A file format to store generic raw image/video files

Context: Developers and researchers often want to work with raw
image/video files, as they avoid the effects of encoding. Raw files,
however, are headerless: the metadata describing the content has to be
carried separately, which is an operational pain point. Raw
video/image metadata includes (at least) pixel format (aka `pix_fmt`),
width, height, and framerate.

For example, in order to read a specific rgba file that I got from a
shader, I need to write:
```
$ ffmpeg -y -f rawvideo -video_size 1920x1080 -pix_fmt rgba \
    -i post_shader.1920x1080.rgba -vf scale="out_color_matrix=bt601" out.png
```


# Y4M: A YUV, Planar Raw File Format

This problem has been partially addressed by
[y4m](https://wiki.multimedia.cx/index.php/YUV4MPEG2). y4m is a raw
(uncompressed), annotated image/video format for YUV content. It
supports both images and video (a set of consecutive frames). It also
supports framerates, interlaced/progressive content, aspect ratio,
location of subsampled chroma planes, and color ranges.
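
For reference, a y4m file consists of a single plain-text stream
header, followed by a `FRAME` marker line plus the raw planar payload
for each frame, e.g. (the `<...>` line stands for the binary plane
data):
```
YUV4MPEG2 W1920 H1080 F30000:1001 Ip A1:1 C420jpeg
FRAME
<1920*1080*3/2 bytes of planar 4:2:0 data>
FRAME
...
```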

The main limitation of y4m is that it only supports a small set of
planar, 8-bit YUV formats: yuv420p, yuv422p, and yuv444p.


# R4M: A Generic Raw File Format

We propose a new raw image/video format, R4M, that extends Y4M to support:
* (a) every `pix_fmt` that ffmpeg supports. The implementation would
support generic `AVPixFmtDescriptor`s, so any new pixel format
described by one would be automatically supported (see the sketch
after this list). This includes planar/packed/semiplanar,
yuv/rgb/bayer, and others.
* (b) different bit sizes (this is part of `pix_fmt`),
* (c) strides (pixel, row, plane), to allow devices to just dump their
buffers directly, and
* (d) flexible color information (h273, ICC, etc.).
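
To illustrate (a) and (b), here is a minimal sketch of the kind of
generic code path the implementation could use: everything R4M needs
to know about an arbitrary `pix_fmt` (layout, per-component depth,
chroma subsampling, endianness, float formats) is already exposed by
the existing libavutil `AVPixFmtDescriptor` API. The printed field
names are illustrative, not a proposed R4M syntax.
```c
#include <stdio.h>
#include <libavutil/pixdesc.h>

/* Print a generic, human-readable description of a pixel format. A
 * single code path covers every pix_fmt ffmpeg knows about:
 * planar/packed/semiplanar, yuv/rgb/bayer, 8..16-bit, float, LE/BE. */
static void describe_pix_fmt(enum AVPixelFormat pix_fmt)
{
    const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(pix_fmt);
    if (!desc)
        return;

    printf("name=%s components=%d log2_chroma_w=%d log2_chroma_h=%d "
           "planar=%d rgb=%d bayer=%d float=%d big_endian=%d\n",
           desc->name, desc->nb_components,
           desc->log2_chroma_w, desc->log2_chroma_h,
           !!(desc->flags & AV_PIX_FMT_FLAG_PLANAR),
           !!(desc->flags & AV_PIX_FMT_FLAG_RGB),
           !!(desc->flags & AV_PIX_FMT_FLAG_BAYER),
           !!(desc->flags & AV_PIX_FMT_FLAG_FLOAT),
           !!(desc->flags & AV_PIX_FMT_FLAG_BE));

    /* per-component layout: plane index, byte step and offset within
     * the plane, bit shift, and bit depth */
    for (int i = 0; i < desc->nb_components; i++) {
        const AVComponentDescriptor *c = &desc->comp[i];
        printf("  comp[%d]: plane=%d step=%d offset=%d shift=%d depth=%d\n",
               i, c->plane, c->step, c->offset, c->shift, c->depth);
    }
}

int main(void)
{
    describe_pix_fmt(AV_PIX_FMT_YUV420P10LE);  /* planar, 10-bit yuv */
    describe_pix_fmt(AV_PIX_FMT_NV12);         /* semiplanar yuv */
    describe_pix_fmt(AV_PIX_FMT_GBRPF32LE);    /* planar float rgb */
    return 0;
}
```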


# Details

R4M should support any raw image/video format that can be used by ffmpeg.

R4M would keep the current Y4M features:
* (1) support all y4m image settings
  * width
  * height
  * progressive/interlaced (top/bottom field first)/mixed mode
  * aspect ratio
  * chroma subsampling [implicit in `pix_fmt`]
  * chroma location
  * XYSCSS=420JPEG [this is the same as chroma subsampling + chroma location]
  * color range [implicit in color info]
* (2) support all y4m video settings
  * framerates (as ratio)

In terms of implementation, R4M would keep the Y4M format structure:
readable, text-based per-file and per-frame headers, followed by
explicitly-sized binary dumps of each frame. We would keep the
text-based header format (with a different header ID). As for the
field list, we would add a human-readable dump of the
`AVPixFmtDescriptor` struct, and remove the fields that become
implicit in that dump (e.g. the "C" (colorspace) item).

R4M would add the following new features:
* (3) support every `pix_fmt` in ffmpeg (including
planar/packed/semiplanar, yuv/rgb/bayer, etc.), and in general any
`pix_fmt` representable using an `AVPixFmtDescriptor` struct.
  * planar/packed/semiplanar
  * yuv/rgb/bayer

* (4) support different bit sizes [implicit in `pix_fmt`]
  * 8/9/10/12/14/16-bit color/other
  * little-endian/big-endian
  * float formats

* (5) support image strides [implicit in `pix_fmt`]
  * pixel stride
  * row stride
  * plane stride
  * different values for different planes (e.g. Y, U, V, UV)
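
For (5), existing libavutil helpers already handle arbitrary row
strides, so honoring a header-declared stride is mostly a matter of
passing it through. A minimal sketch for the luma plane of an 8-bit
yuv420p frame (the function name and the `hdr_row_stride` parameter
are hypothetical, standing in for a value read from the R4M header):
```c
#include <stdint.h>
#include <libavutil/imgutils.h>

/* Import the luma plane of an 8-bit yuv420p device dump whose rows are
 * padded to hdr_row_stride bytes into a tightly-packed buffer. */
void import_luma_plane(uint8_t *dst, const uint8_t *src,
                       int width, int height, int hdr_row_stride)
{
    int packed_linesize[4];

    /* default (unpadded) per-plane row strides for this pix_fmt/width;
     * for yuv420p: {width, ceil(width/2), ceil(width/2), 0} */
    av_image_fill_linesizes(packed_linesize, AV_PIX_FMT_YUV420P, width);

    /* copy row by row, honoring the padded source stride */
    av_image_copy_plane(dst, packed_linesize[0], src, hdr_row_stride,
                        width /* bytes per luma row, 8-bit */, height);
}
```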

* (6) support for flexible color information (a set of new fields).
  * h273: cp, tc, mc, range
  * ICC profile (restricted/unrestricted)
  * nclx (jpeg-xr color definition, ISO/IEC 29199-2/T.832)
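
For (6), ffmpeg already models the h273 fields as `AVColorPrimaries`,
`AVColorTransferCharacteristic`, `AVColorSpace`, and `AVColorRange`,
whose numeric values are defined to match the h273 code points, so the
header could carry either the codes or their names. A minimal sketch
using the existing name helpers (the key names are illustrative only):
```c
#include <stdio.h>
#include <libavutil/pixdesc.h>

/* Print color metadata as h273-style key/value pairs; the enum values
 * can be stored numerically, the names are just for readability. */
static void print_color_info(enum AVColorPrimaries cp,
                             enum AVColorTransferCharacteristic tc,
                             enum AVColorSpace mc,
                             enum AVColorRange range)
{
    printf("cp=%d (%s) tc=%d (%s) mc=%d (%s) range=%s\n",
           cp, av_color_primaries_name(cp),
           tc, av_color_transfer_name(tc),
           mc, av_color_space_name(mc),
           av_color_range_name(range));
}
```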

* (7) dynamic per-frame values
  * the goal is to support videos where, e.g., the resolution changes mid-stream
  * support per-FRAME settings

In terms of implementation, the suggestion is to write a thin library
inside ffmpeg that can read from R4M files into ffmpeg buffers, and
write from ffmpeg buffers to R4M files. An alternative is doing it as
an external library. The rationale for the in-ffmpeg library is that
it makes the code easier to land and better integrated with ffmpeg. In
fact, this is how ffmpeg supports y4m right now
(libavformat/yuv4mpeg*, rather than any of the external y4m libraries
around). There is some value in an external library as well (e.g. I'd
be interested in adding imlib2 support at some point, as I use it with
feh to quickly view images), but I expect the source of truth to be
the ffmpeg implementation. This is IMO already the case for y4m.
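
As a rough idea of the intended library surface (every name below is
hypothetical; nothing like this exists yet, it only illustrates the
read/write-to-ffmpeg-buffers scope):
```c
#include <libavutil/frame.h>

/* Hypothetical R4M API sketch: read and write whole AVFrames, nothing
 * more. Names and signatures are illustrative only. */
typedef struct R4MContext R4MContext;

R4MContext *r4m_open_read(const char *filename);
R4MContext *r4m_open_write(const char *filename);

/* Read the next frame (per-frame header + payload) into an AVFrame;
 * returns 0 on success, a negative AVERROR on failure or EOF. */
int r4m_read_frame(R4MContext *ctx, AVFrame *frame);

/* Write one AVFrame as a per-frame header followed by its payload. */
int r4m_write_frame(R4MContext *ctx, const AVFrame *frame);

void r4m_close(R4MContext **ctx);
```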

# Q&A

Q1: Would this new format be acceptable in ffmpeg? If so, I could
write a more detailed proposal for the R4M format, following the
guidelines suggested here.

Q2: Does anybody have suggestions for other features that should be
added to a raw image/video file format?

Q3: Is anybody interested in collaborating on the format? I would be
more than happy to use a GitHub repo for the format description.

Thanks,
-Chema

