[Ffmpeg-devel] Snow slicing support
Guillaume Poirier
docmaintainerwannabe
Tue Apr 11 09:57:04 CEST 2006
hi,
Oded Shimon wrote:
> On Thu, Apr 06, 2006 at 05:19:57PM +0200, Michael Niedermayer wrote:
>>On Mon, Apr 03, 2006 at 09:47:58PM +0300, Oded Shimon wrote:
>>>Just thought this patch might be of general interest. What I find
>>>interesting about it is that it's not the sliced output that helps at
>>>all, but the rearranging of how the data is handled: unpacking the
>>>coeffs separately from decoding the image. It makes a surprisingly huge
>>>difference on my cpu, almost 20% faster in some cases. This code trades
>>>off code switches against data switches, and even in my high res video
>>>(944x544), code switches proved to be far more expensive...
>>>
>>>I don't really expect this patch to go in CVS, but I am interested in any
>>>comments if anyone has any...
>>
>>this needs testing with different resolutions, bitrates and cpus
>>(320x240 720x576 p4 athlon ...)
>>
>>is this speed difference also there with other gcc versions
>>and most interesting, is it there too at lower -O
>>
>>if it's consistently faster (or at least not slower) then this should be
>>applied
>
>
> Do you have any suggestions for how to test this efficiently? cache
> performance is hard to benchmark, especially in high level code. :/
>
> using mplayer -benchmark several times gave me wild results:
>
> without patch:
> BENCHMARKs: VC: 108.872s VO: 17.123s A: 1.205s Sys: 32.865s = 160.065s
> BENCHMARKs: VC: 102.149s VO: 15.351s A: 1.198s Sys: 33.220s = 151.918s
> BENCHMARKs: VC: 99.299s VO: 15.920s A: 1.517s Sys: 34.233s = 150.970s
> BENCHMARKs: VC: 101.674s VO: 16.263s A: 1.284s Sys: 32.215s = 151.436s
>
> with patch:
> BENCHMARKs: VC: 97.398s VO: 15.675s A: 1.299s Sys: 36.363s = 150.734s
> BENCHMARKs: VC: 95.429s VO: 15.321s A: 1.174s Sys: 38.613s = 150.536s
> BENCHMARKs: VC: 96.610s VO: 15.528s A: 1.181s Sys: 37.275s = 150.594s
> BENCHMARKs: VC: 95.816s VO: 15.297s A: 1.197s Sys: 38.248s = 150.558s
>
> (these are old benchmarks, and on that single file)
>
> In this case the difference was still obvious, but the results are very
> inaccurate. is there a better way for this? maybe START_TIMER around the
> whole decode() function?
That's what I'd do. Since a single measurement of the whole decode()
function is likely to be inaccurate due to interruptions, I suggest you
run it a number of times (let's say, 100 times) and take both the mean
value and the shortest value.
This should take care of the measurement fuzziness.
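For illustration, here is a minimal sketch of that approach in plain C. The
names BenchStats, bench_repeat and dummy_decode are made up for this example
and are not ffmpeg API; it assumes a POSIX system with clock_gettime(), and
dummy_decode merely stands in for the real decode() call.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime() under strict -std modes */
#endif
#include <stdint.h>
#include <time.h>

/* Result of a repeated timing run (hypothetical helper, not part of ffmpeg). */
typedef struct BenchStats {
    uint64_t min_ns;   /* shortest single run, least disturbed by interruptions */
    double   mean_ns;  /* arithmetic mean over all runs */
    int      runs;     /* number of runs actually performed */
} BenchStats;

/* Monotonic wall-clock time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Call fn(arg) 'runs' times, timing each call separately. */
static BenchStats bench_repeat(void (*fn)(void *), void *arg, int runs)
{
    BenchStats s = { UINT64_MAX, 0.0, 0 };
    uint64_t total = 0;

    for (int i = 0; i < runs; i++) {
        uint64_t t0 = now_ns();
        fn(arg);
        uint64_t dt = now_ns() - t0;

        if (dt < s.min_ns)
            s.min_ns = dt;
        total += dt;
        s.runs++;
    }

    if (s.runs > 0)
        s.mean_ns = (double)total / s.runs;
    else
        s.min_ns = 0;   /* nothing was measured */
    return s;
}

/* Stand-in workload (assumption): counts its calls and burns a few cycles.
 * In a real test this would wrap the decode() call on one frame. */
static void dummy_decode(void *arg)
{
    volatile unsigned acc = 0;
    for (unsigned i = 0; i < 10000; i++)
        acc += i;
    (*(int *)arg)++;
}
```

Calling bench_repeat(dummy_decode, &counter, 100) and comparing min_ns and
mean_ns with and without the patch gives two numbers per build. The shortest
value is the one least affected by interruptions, while a large gap between it
and the mean hints at how noisy the run was.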
Guillaume