[FFmpeg-devel] [PATCH 10/14] swscale/range_convert: fix mpeg ranges in yuv range conversion for non-8-bit pixel formats
Ramiro Polla
ramiro.polla at gmail.com
Sat Sep 28 19:10:00 EEST 2024
Hi Christophe,
On Sat, Sep 28, 2024 at 3:21 PM Christophe Gisquet
<christophe.gisquet at gmail.com> wrote:
> Le lun. 23 sept. 2024 à 16:19, Ramiro Polla <ramiro.polla at gmail.com> a écrit :
> > For bit depths <= 14, amax is 16-bit and offset is 32-bit.
> > For bit depths > 14, amax is 32-bit and offset is 64-bit.
>
> [...]
>
> > -static void chrRangeToJpeg_c(int16_t *dstU, int16_t *dstV, int width)
> > +static void chrRangeToJpeg_c(int16_t *dstU, int16_t *dstV, int width,
> > + int amax, int coeff, int64_t _offset)
> > {
> > + int offset = _offset;
> > int i;
> > for (i = 0; i < width; i++) {
> > - dstU[i] = (FFMIN(dstU[i], 30775) * 4663 - 9289992) >> 12; // -264
> > - dstV[i] = (FFMIN(dstV[i], 30775) * 4663 - 9289992) >> 12; // -264
> > + dstU[i] = (FFMIN(dstU[i], amax) * coeff + offset) >> 14;
> > + dstV[i] = (FFMIN(dstV[i], amax) * coeff + offset) >> 14;
> > }
> > }
> >
> > -static void chrRangeFromJpeg_c(int16_t *dstU, int16_t *dstV, int width)
> > +static void chrRangeFromJpeg_c(int16_t *dstU, int16_t *dstV, int width,
> > + int amax, int coeff, int64_t _offset)
> > {
> > + int offset = _offset;
> > int i;
> > for (i = 0; i < width; i++) {
> > - dstU[i] = (dstU[i] * 1799 + 4081085) >> 11; // 1469
> > - dstV[i] = (dstV[i] * 1799 + 4081085) >> 11; // 1469
> > + dstU[i] = (dstU[i] * coeff + offset) >> 14;
> > + dstV[i] = (dstV[i] * coeff + offset) >> 14;
> > }
> > }
> >
> > -static void lumRangeToJpeg_c(int16_t *dst, int width)
> > +static void lumRangeToJpeg_c(int16_t *dst, int width,
> > + int amax, int coeff, int64_t _offset)
> > {
> > + int offset = _offset;
> > int i;
> > for (i = 0; i < width; i++)
> > - dst[i] = (FFMIN(dst[i], 30189) * 19077 - 39057361) >> 14;
> > + dst[i] = (FFMIN(dst[i], amax) * coeff + offset) >> 14;
> > }
>
> I'm a bit surprised by some of these formulas and the range you assert
> above. Somehow you make it clear by casting the offset parameters to
> narrower types, so all of the following is a non-issue.
>
> So, maybe some cases are special and so on, and you are working within
> some API constrain, but the code excerpts bother me a bit:
> * If dst as input is 16 bits, I don't see why amax would really be an
> int (except native integer etc)
> * If dst[i] = anything >> 14, with dst[i] 16 bits, you want "anything"
> to be at most of 30 bits of dynamics. offset doesn't sound like it
> could be of a wider range, and not 64 bits.
> * Any version that does treat things as 64 bits computations, will
> essentially halve the likely throughput of the SIMD, besides
> potentially using way slower computations and restricting these to
> fewer archs.
>
> I expect you know that, but I'd have expected an intermediate type of
> functions with parameters eg '(int16_t *dst, int width, int16_t amax,
> int16_t coeff, int32_t _offset)'
> Maybe to avoid issues with the weird Win ABI where the MSBs can be
> garbage, that can indeed be '(int16_t *dst, ptrdiff_t width, int amax,
> int16_t coeff, int32_t _offset)'.
I think you missed that the internal representation (the contents of
dst) is 15 bits for bit_depth <= 14, and 19 bits for bit_depth > 14.
It is confusing, but when I say bit depths > 14, I mean input bit
depths > 14, which have an internal representation of 19 bits (and
therefore the contents of dst are int32_t). Look at chrRangeToJpeg16_c
for example.
For bit depth > 14, I had to raise the precision of the calculation in
order to respect the mpeg range. In this case, the shift is 18 bits
instead of 14, and the offset becomes greater than 32 bits.
These are the actual values calculated for each case (hopefully my
mail client doesn't mangle them):
lum from 8 coeff 14071 (0x0036f7) offset 33553280 ( 0x0001fffb80)
chr from 8 coeff 14393 (0x003839) offset 33528960 ( 0x0001ff9c80)
lum to 8 coeff 19078 (0x004a86) offset -39092480 (-0x0002548100)
chr to 8 coeff 18652 (0x0048dc) offset -38215680 (-0x0002472000)
lum from 10 coeff 14030 (0x0036ce) offset 33544640 ( 0x0001ffd9c0)
chr from 10 coeff 14350 (0x00380e) offset 33554880 ( 0x00020001c0)
lum to 10 coeff 19134 (0x004abe) offset -39204096 (-0x0002563500)
chr to 10 coeff 18707 (0x004913) offset -38332416 (-0x000248e800)
lum from 12 coeff 14020 (0x0036c4) offset 33535520 ( 0x0001ffb620)
chr from 12 coeff 14340 (0x003804) offset 33538080 ( 0x0001ffc020)
lum to 12 coeff 19148 (0x004acc) offset -39232000 (-0x000256a200)
chr to 12 coeff 18720 (0x004920) offset -38338560 (-0x0002490000)
lum from 14 coeff 14017 (0x0036c1) offset 33549698 ( 0x0001ffed82)
chr from 14 coeff 14337 (0x003801) offset 33550338 ( 0x0001fff002)
lum to 14 coeff 19151 (0x004acf) offset -39223936 (-0x0002568280)
chr to 14 coeff 18723 (0x004923) offset -38332416 (-0x000248e800)
lum from 16 coeff 224260 (0x036c04) offset 8589631520 ( 0x01fffb6020)
chr from 16 coeff 229380 (0x038004) offset 8589672480 ( 0x01fffc0020)
lum to 16 coeff 306429 (0x04acfd) offset -10041292800 (-0x025681f800)
chr to 16 coeff 299589 (0x049245) offset -9817128960 (-0x0249258000)
Therefore, for bit depths <= 14, we have:
- shift: 14
- dst: 15 bits
- coeff: 15 bits unsigned
- offset: 27 bits signed
And for bit_depths > 14, we have:
- shift: 18
- dst: 19 bits
- coeff: 19 bits unsigned
- offset: 35 bits signed
The API is indeed a bit confusing because it is shared for all bit
depths. It should be:
(int16_t *dst, int width, uint16_t coeff, int32_t offset) for bit_depth <= 14
and
(int32_t *dst, int width, uint32_t coeff, int64_t offset) for bit_depth > 14
In the code we cast dst from int16_t * to int32_t * in the 16-bit
functions, and cast offset from int64_t to int (it could be int32_t)
in the non-16-bit functions. Suggestions on how to make this clearer
without added complexity or performance hits are welcome.
By the way, the v2 patchset removed the need for amax by saturating
the output instead of limiting the input, but your points are still
valid.
Ramiro
More information about the ffmpeg-devel
mailing list