[FFmpeg-devel] [PATCH] make building swscale rgb template conditional

Sun Sep 5 08:22:06 CEST 2010

On Thu, Aug 26, 2010 at 4:35 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Thu, Aug 26, 2010 at 06:24:37AM +0200, Luca Barbato wrote:
>> On 08/25/2010 04:22 PM, Michael Niedermayer wrote:
>> > On Wed, Aug 25, 2010 at 04:14:21PM +0200, Luca Barbato wrote:
>> >> On 08/25/2010 10:18 AM, Michael Niedermayer wrote:
[...]
>> >>> SSE2 is always disabled after this patch
>> >>> i dont think it was always disabled before, because the code likely was
>> >>> tested when it was written
>> >>
>> >> Apparently not and that was the question I was waiting for an answer...
>> >
>> > it did not work (in _mplayer_) when it was commited?
>>
>> From a quick look, mplayer has HAVE_SSE2, ffmpeg has HAVE_SSE, so
>> in mplayer interleaveBytes gets SSE2 instead of mmx no matter what, on
>> ffmpeg the opposite...
>
> hmm so it was working in mplayer (without runtime detect) but not ffmpeg ...
> that should be fixed

No cpu acceleration has worked for rgb2rgb in FFmpeg for quite some
time. Since rgb2rgb uses global function pointers, and
sws_getContext() initializes them under "if (!rgb15to16)
sws_rgb2rgb_init(flags);", they only get initialized once, which in
ffmpeg.c is in:
sws_opts = sws_getContext(16,16,0, 16,16,0, sws_flags, NULL,NULL,NULL);
where sws_flags is SWS_BICUBIC and specifies no cpu caps.

Using global function pointers is obviously a bad idea. I'd suggest we
drop exposing rgb2rgb but I faintly remember Michael being against
that because some programs already use that interface and we'd have to
bump major if we want to remove it, and also because it's useful to
have a simpler alternative to get rgb2rgb functions than going through
the swscale interface. Also I'd like to point out that on shared
builds these global function pointers no longer work since only
swscale_*, sws_*, and ff_* stuff is exported.

If we must keep the rgb2rgb interface (not the global pointers, these
should definitely go away at next major bump), I suggest we either:
1- create a simpler function like sws_getRGBContext that only takes
arguments relative to rgb2rgb (like src and dst pixel format), and
that can be used with sws_scale()
2- create an rgb2rgb context which the user can initialize and then
use rgb->rgb24tobgr32() and such.

Michael, how do you suggest we proceed regarding the rgb2rgb interface?

And another point, even if those optimizations did work, sse2
currently code only occurs in interleaveBytes, and on that function
both mmx2 and sse2 are *slower* than the mmx code. I asked Jason why
that could happen and he said it's because of movntq, and I should be
careful when benchmarking. I tried many kinds of tests:
- converting the same input to the same output many times and not
using the output after that
- converting the same input to the same output many times and running
crc over output
- converting different input to same output many times and running crc
over output
- converting different input to different output many times and not
using the output after that
- converting different input to different output many times and
running crc over output
none of those made mmx2 or sse2 faster than mmx. I also tried with
many widths/heights. What scenario should I test to see a benefit in
using sse2 here? (btw this was all done using a core2duo)

Going back to my original patch (0003), it did not change
functionality (since sse2 didn't work on ffmpeg anyways). Is it ok to
apply it before working on the other issues?