[FFmpeg-devel] [RFC/PATCH] MV-HEVC decoding

Mon Sep 9 20:45:57 EEST 2024

Thanks for the patch set!  Ran some initial tests and here are some
findings:

1. Tested with bitstreams encoded by the HTM reference software: the
HTM-encoded 2-layer MV-HEVC elementary streams are decoded by both the HTM
decoder and ffmpeg, and both produce the same decoded pixels for both views.
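
A minimal sketch of such a pixel comparison, assuming both decoders dump one
view as raw 8-bit yuv420p; the function names and the skip_ref parameter are
hypothetical and only illustrate the idea:

```python
from typing import List


def frame_size_yuv420p8(width: int, height: int) -> int:
    """Bytes per frame for 8-bit 4:2:0 video (assumes even dimensions)."""
    # One full-resolution luma plane plus two chroma planes at quarter size.
    return width * height * 3 // 2


def mismatched_frames(ref: bytes, test: bytes, width: int, height: int,
                      skip_ref: int = 0) -> List[int]:
    """Return indices (relative to `test`) of frames that differ.

    skip_ref drops that many leading frames from `ref`, for the case where
    the two decoders do not start output at the same frame.
    """
    size = frame_size_yuv420p8(width, height)
    ref = ref[skip_ref * size:]
    count = min(len(ref), len(test)) // size
    return [
        i for i in range(count)
        if ref[i * size:(i + 1) * size] != test[i * size:(i + 1) * size]
    ]
```

An empty list means every compared frame is bit-identical; trailing frames
present in only one of the two dumps are not compared.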

2. Tested with the iPhone captured sample videos (some samples are
available from
https://blog.frame.io/2024/02/01/how-to-capture-and-view-vision-pro-spatial-video/).
To decode using the HTM decoder, converted the .mov file to .hevc
elementary stream (using "ffmpeg -view_ids -1 -i sample.mov -vcodec copy
sample.hevc").  The iPhone-captured MV-HEVC seems to be non-conformant,
since the very first AU contains the following NAL units: prefix SEI (layer
ID 0), CRA (layer ID 0), prefix SEI (layer ID 0), CRA (layer ID 1).  Based on
Section 7.4.2.4.4 of the HEVC spec, a prefix SEI with layer ID 0 indicates
the start of a new AU, so the bitstream is not conformant for the 2-layer
case.  The
reference HTM decoder seems to be tolerant to this and correctly decodes
the "non-conformant" elementary stream, but ffmpeg does not.  However,
ffmpeg does decode the .mov container file correctly as each AU is
correctly passed to the decoder as a video sample.  Aside from this
non-conformance issue, both the HTM decoder and ffmpeg seem to produce
pixel-exact views.  For the view-by-view comparison, I had to manually skip
some initial frames output by the HTM decoder, as ffmpeg did not output
some of the first few views/frames.
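
The NAL unit sequence described above can be listed with a short script.
This is only a sketch: it assumes an Annex B elementary stream, reads just
the two-byte HEVC NAL unit header, and applies a deliberately simplified AU
split (only the "prefix SEI with layer ID 0 after a VCL NAL unit" condition
as read above, not the full rule set of Section 7.4.2.4.4).  All function
names are hypothetical.

```python
from typing import Iterator, List, Tuple

PREFIX_SEI_NUT = 39  # nal_unit_type of a prefix SEI NAL unit
CRA_NUT = 21         # nal_unit_type of a CRA picture


def iter_nal_units(data: bytes) -> Iterator[bytes]:
    """Yield NAL units (header included) from an Annex B byte stream."""
    starts = []
    pos = data.find(b"\x00\x00\x01")
    while pos != -1:
        starts.append(pos + 3)
        pos = data.find(b"\x00\x00\x01", pos + 3)
    for n, start in enumerate(starts):
        end = starts[n + 1] - 3 if n + 1 < len(starts) else len(data)
        # Drop the zero bytes that belong to a following 4-byte start code.
        nal = data[start:end].rstrip(b"\x00")
        if len(nal) >= 2:
            yield nal


def parse_header(nal: bytes) -> Tuple[int, int]:
    """Return (nal_unit_type, nuh_layer_id) from the 2-byte NAL header."""
    nal_unit_type = (nal[0] >> 1) & 0x3F
    nuh_layer_id = ((nal[0] & 0x01) << 5) | (nal[1] >> 3)
    return nal_unit_type, nuh_layer_id


def split_access_units(headers: List[Tuple[int, int]]) -> List[List[Tuple[int, int]]]:
    """Simplified split: a layer-0 prefix SEI after a VCL NAL starts a new AU."""
    aus: List[List[Tuple[int, int]]] = []
    seen_vcl = False
    for nut, layer in headers:
        starts_new = seen_vcl and nut == PREFIX_SEI_NUT and layer == 0
        if not aus or starts_new:
            aus.append([])
            seen_vcl = False
        aus[-1].append((nut, layer))
        if nut < 32:  # nal_unit_type 0..31 are VCL NAL units in HEVC
            seen_vcl = True
    return aus
```

Under this simplified reading, the sequence prefix SEI (0), CRA (0), prefix
SEI (0), CRA (1) is split into two AUs, with the layer 1 picture ending up in
an AU of its own.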

3.  When ffmpeg skips a decoded view for output, I think it would make
sense to skip both views of that AU (when both views are being output).
Currently one view of an AU can be skipped while the other view of the same
AU is still output.  To keep views from being skipped, I had to use the
"-vsync 0" option for my tests.  With this option, I see the following
warning: "Application provided invalid, non monotonically increasing dts to
muxer in stream 0: 1 >= 1".  This seems to be due to both views of an AU
having the same DTS, which is expected.
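
To illustrate why identical timestamps trigger that warning, here is a
minimal sketch of a strict monotonicity check over a list of DTS values.  It
is not ffmpeg's code; the function name is hypothetical, and it only reports
the offending positions without changing any timestamp.

```python
from typing import List, Optional, Tuple


def find_dts_violations(dts_values: List[int]) -> List[Tuple[int, int, int]]:
    """Return (index, previous_dts, current_dts) for each non-increasing DTS."""
    violations: List[Tuple[int, int, int]] = []
    last: Optional[int] = None
    for index, dts in enumerate(dts_values):
        # A strictly increasing sequence requires last < dts, so two
        # packets sharing one DTS are reported as "last >= dts".
        if last is not None and last >= dts:
            violations.append((index, last, dts))
        last = dts
    return violations
```

With two views per AU sharing one DTS, e.g. the sequence 0, 0, 1, 1, every
second packet is reported, which matches the "1 >= 1" form of the warning.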

4. Ran ffmpeg with the "-vf showinfo" option and the side data seems to
indicate correct info for the views.  I tested both cases where the base
layer corresponds to either left or right view, and both cases seem to
result in correct side data.

BR,
Danny