[FFmpeg-devel] [PATCH] Support > 8 bit input in yuv2rgb.

Sat Nov 9 13:05:53 CET 2013

On 09.11.2013, at 12:37, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Fri, Nov 08, 2013 at 10:47:20PM +0100, Reimar Döffinger wrote:
>> Significantly faster than the default path (which defaults to
>> bicubic scaling even if no real scaling happens), though
>> the templating is kind of ugly and increases code size a bit.
>> 
>> Signed-off-by: Reimar Döffinger <Reimar.Doeffinger at gmx.de>
>> ---
>> libswscale/swscale_unscaled.c |   3 +
>> libswscale/yuv2rgb.c          | 550 ++++++------------------------------------
>> libswscale/yuv2rgb_template.c | 458 +++++++++++++++++++++++++++++++++++
>> 3 files changed, 537 insertions(+), 474 deletions(-)
>> create mode 100644 libswscale/yuv2rgb_template.c
>> 
>> diff --git a/libswscale/swscale_unscaled.c b/libswscale/swscale_unscaled.c
>> index 83086f7..8842f35 100644
>> --- a/libswscale/swscale_unscaled.c
>> +++ b/libswscale/swscale_unscaled.c
>> @@ -1217,6 +1217,9 @@ void ff_get_unscaled_swscale(SwsContext *c)
>>     }
>>     /* yuv2bgr */
>>     if ((srcFormat == AV_PIX_FMT_YUV420P || srcFormat == AV_PIX_FMT_YUV422P ||
>> +         srcFormat == AV_PIX_FMT_YUV420P9 || srcFormat == AV_PIX_FMT_YUV422P9 ||
>> +         srcFormat == AV_PIX_FMT_YUV420P10 || srcFormat == AV_PIX_FMT_YUV422P10 ||
>> +         srcFormat == AV_PIX_FMT_YUV420P16 || srcFormat == AV_PIX_FMT_YUV422P16 ||
>>          srcFormat == AV_PIX_FMT_YUVA420P) && isAnyRGB(dstFormat) &&
>>         !(flags & SWS_ACCURATE_RND) && (c->dither == SWS_DITHER_BAYER || c->dither == SWS_DITHER_AUTO) && !(dstH & 1)) {
>>         c->swscale = ff_yuv2rgb_get_func_ptr(c);
>> diff --git a/libswscale/yuv2rgb.c b/libswscale/yuv2rgb.c
>> index 77c56a9..28de37e 100644
>> --- a/libswscale/yuv2rgb.c
>> +++ b/libswscale/yuv2rgb.c
>> @@ -54,72 +54,72 @@ const int *sws_getCoefficients(int colorspace)
>> }
>> 
>> #define LOADCHROMA(i)                               \
>> -    U = pu[i];                                      \
>> -    V = pv[i];                                      \
>> +    U = pu[i] >> shift;                             \
>> +    V = pv[i] >> shift;                             \
>>     r = (void *)c->table_rV[V+YUVRGB_TABLE_HEADROOM];                     \
>>     g = (void *)(c->table_gU[U+YUVRGB_TABLE_HEADROOM] + c->table_gV[V+YUVRGB_TABLE_HEADROOM]);  \
>>     b = (void *)c->table_bU[U+YUVRGB_TABLE_HEADROOM];
> 
> are the shifts faster than bigger tables ?
> (it would be slightly more accurate with bigger tables)

I haven't tested. But note that I also added 16-bit support, so we are talking about a 256 times larger table.
Even when keeping the shift for Y, that would still be around 48 MB, if I calculated right.
There could be a "compromise": make the tables for Y, U and V 9 bit and shift only for > 9 bit input, to get better precision. That would only increase their size 4x, I believe.
I guess even making the tables 10 bit might still be reasonable...
However, in both cases I think that will mean further changes, since the HEADROOM stuff would also need to be increased.
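
To make the trade-off concrete, here is a rough sketch of the index arithmetic only. It is not code from the patch: the helper names (table_entries, table_index_bits, index_shift, table_index) are made up for illustration, and the YUVRGB_TABLE_HEADROOM offset is left out entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: entries in a table indexed directly by a sample of
 * index_bits bits (headroom entries ignored for simplicity). */
static uint32_t table_entries(int index_bits)
{
    assert(index_bits >= 1 && index_bits <= 16);
    return (uint32_t)1 << index_bits;
}

/* Hypothetical helper: bits used to index the table when it is capped at
 * max_table_bits (8 matches the patch, 9 or 10 would be the "compromise"). */
static int table_index_bits(int input_bits, int max_table_bits)
{
    return input_bits < max_table_bits ? input_bits : max_table_bits;
}

/* Right shift applied to a sample before the lookup; 0 means the sample
 * indexes the table directly and no precision is discarded. */
static int index_shift(int input_bits, int max_table_bits)
{
    return input_bits - table_index_bits(input_bits, max_table_bits);
}

/* Index computed from a raw sample, in the spirit of the patched LOADCHROMA
 * (without the headroom offset). */
static unsigned table_index(uint16_t sample, int shift)
{
    return (unsigned)sample >> shift;
}
```

With a cap of 8 bits, 16-bit input is shifted by 8, as in the patch. With a cap of 9 bits, 9-bit input is looked up unshifted and 10-bit input loses only one bit. A table indexed by the full 16 bits would have 65536 entries against 256, which is the 256x factor mentioned above.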