[FFmpeg-devel] [PATCH] improve yuv422p to rgb in libswscale
Baptiste Coudurier
baptiste.coudurier
Wed Dec 1 04:26:42 CET 2010
On 11/30/2010 07:13 PM, Michael Niedermayer wrote:
> On Tue, Nov 30, 2010 at 07:05:05PM -0800, Baptiste Coudurier wrote:
>> On 11/30/2010 06:53 PM, Michael Niedermayer wrote:
>>> On Tue, Nov 30, 2010 at 06:29:32PM -0800, Baptiste Coudurier wrote:
>>>> On 11/30/2010 06:15 PM, Michael Niedermayer wrote:
>>>>> On Tue, Nov 30, 2010 at 05:20:42PM -0800, Baptiste Coudurier wrote:
>>>>>> On 11/30/2010 05:13 PM, Michael Niedermayer wrote:
>>>>>>> On Tue, Nov 30, 2010 at 04:07:20AM -0800, Baptiste Coudurier wrote:
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> $subject, use full vertical data when convert 422p, improve quality a lot.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Baptiste COUDURIER
>>>>>>>> Key fingerprint 8D77134D20CC9220201FC5DB0AC9325C5C1ABAAA
>>>>>>>> FFmpeg maintainer http://www.ffmpeg.org
>>>>>>>
>>>>>>>> x86/yuv2rgb_template.c | 25 ++++-------
>>>>>>>> yuv2rgb.c | 109 ++++++-------------------------------------------
>>>>>>>> 2 files changed, 26 insertions(+), 108 deletions(-)
>>>>>>>> 16f384c9b114c76572a539511c42267ce2942c67 yuv422p_to_rgb.patch
>>>>>>>
>>>>>>> this looks like it would make 420p->rgb quite a bit slower.
>>>>>>
>>>>>> Do you think changing>>1 to>>vshift would make is quite a bit slower ?
>>>>>
>>>>> no but that stuff:
>>>>> @@ -152,134 +144,102 @@
>>>>> YUV2RGBFUNC(yuv2rgb_c_48, uint8_t, 0)
>>>>> LOADCHROMA(0);
>>>>> PUTRGB48(dst_1,py_1,0);
>>>>> - PUTRGB48(dst_2,py_2,0);
>>>>>
>>>>> LOADCHROMA(1);
>>>>> - PUTRGB48(dst_2,py_2,1);
>>>>> PUTRGB48(dst_1,py_1,1);
>>>>>
>>>>> LOADCHROMA(2);
>>>>> PUTRGB48(dst_1,py_1,2);
>>>>> - PUTRGB48(dst_2,py_2,2);
>>>>>
>>>>> LOADCHROMA(3);
>>>>> - PUTRGB48(dst_2,py_2,3);
>>>>> PUTRGB48(dst_1,py_1,3);
>>>>> ENDYUV2RGBLINE(48)
>>>>> LOADCHROMA(0);
>>>>> PUTRGB48(dst_1,py_1,0);
>>>>> - PUTRGB48(dst_2,py_2,0);
>>>>>
>>>>> LOADCHROMA(1);
>>>>> - PUTRGB48(dst_2,py_2,1);
>>>>> PUTRGB48(dst_1,py_1,1);
>>>>> ENDYUV2RGBFUNC()
>>>>>
>>>>> and then running the code twice
>>>>
>>>> This is the C code, the mmx routine for yuv420 to rgb24 is already present.
>>>>
>>>> I don't understand running the code twice, can you please clarify ?
>>>
>>> yes, iam seeing this:
>>> - for (y=0; y<srcSliceH; y+=2) {\
>>> + for (y=0; y<srcSliceH; y++) {\
>>>
>>> and thus iam thinking it runs the code twice after the patch,
>>> do i miss something?
>>
>> The old code was processing 2 lines at once, see py_1 and py_2, that's
>> why a lot of code is removed in the macros.
>
> yes but i think its slower after the patch
I guess it is a bit slower because of LOADCHROMA.
>>
>> Do you agree that the mmx change looks trivial ?
>
> yes mmx looks good
Well, problem now is that mmx routine will have different behaviour, and
will be better.
I mean, the C routine is not supposed to be that fast is it ?
Also adding a check for some user parameters would require adding tests
in a lot of places I fear.
--
Baptiste COUDURIER
Key fingerprint 8D77134D20CC9220201FC5DB0AC9325C5C1ABAAA
FFmpeg maintainer http://www.ffmpeg.org
More information about the ffmpeg-devel
mailing list