[FFmpeg-devel] [PATCH] Optimization of original IFF codec
Sebastian Vater
cdgs.basty
Mon Apr 26 17:50:37 CEST 2010
Hi Michael, here is a new patch which also optimizes 24bpp decoding.
Again, I ran some benchmarks ;)
Michael Niedermayer wrote:
> On Mon, Apr 26, 2010 at 12:19:28AM +0200, Sebastian Vater wrote:
>
>> Hi Michael!
>>
>> Michael Niedermayer wrote:
>>
>>> On Sun, Apr 25, 2010 at 01:49:54PM +0200, Sebastian Vater wrote:
>>>
>>>
>>>>> that loop can then be unrolled by a factor of 4, and its body for the
>>>>> uint8_t type case can be implemented like:
>>>>> v= lut[get_bits(&gb, 4)];
>>>>> AV_WN32A(dst+b, AV_RN32A(dst+b) | v);
>>>>>
>>>>>
>>>>>
>>>> The thing is that type can be either uint8_t or uint32_t. It's a #define
>>>> macro that gets the type (uint8_t or uint32_t) passed in.
>>>>
>>>> So this is not fixed yet, because I'm unsure whether those two lines also
>>>> work when dst is uint32_t.
>>>>
>>>>
>>> they can, and it will speed the uint8 case up significantly
>>>
>>>
>> If I understand you correctly, I have to create a lookup table in the
>> following way.
>> For each group of 4 bits read:
>> {0000 = 0, 0001 = 1 << plane, 0010 = 0x100 << plane, 0011 = (1 << plane)
>> | (0x100 << plane), 0100 = (0x10000 << plane), ...}
>>
>> Is that correct?
>>
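The table asked about above can be sketched in C as follows. This is only an illustration under stated assumptions, not the code from the attached patch: build_plane_lut and or_plane_row are hypothetical names, plain memcpy stands in for FFmpeg's AV_RN32A/AV_WN32A, and the nibble is taken directly from the source byte instead of via get_bits. Each entry is filled byte by byte so that the leftmost pixel (the most significant bit of the nibble, as in ILBM bitplanes) always lands at the lowest address. On a little-endian CPU the numeric values therefore come out byte-swapped relative to the 0001 = 1 << plane listing.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill the 16-entry table for one bitplane.
 * Byte k of each entry holds the bit for pixel k, shifted to `plane`. */
static void build_plane_lut(uint32_t lut[16], int plane)
{
    for (unsigned nibble = 0; nibble < 16; nibble++) {
        uint8_t px[4];
        for (int k = 0; k < 4; k++)            /* k = 0 is the leftmost pixel */
            px[k] = (uint8_t)(((nibble >> (3 - k)) & 1u) << plane);
        memcpy(&lut[nibble], px, sizeof(px));  /* byte-order independent */
    }
}

/* Hypothetical helper: OR one bitplane row into 8-bit chunky pixels.
 * Every source byte covers 8 pixels, handled as two groups of 4. */
static void or_plane_row(uint8_t *dst, const uint8_t *src, int src_bytes,
                         const uint32_t lut[16])
{
    for (int i = 0; i < src_bytes; i++) {
        uint32_t w;

        memcpy(&w, dst + 8 * i, 4);            /* pixels 0..3: high nibble */
        w |= lut[src[i] >> 4];
        memcpy(dst + 8 * i, &w, 4);

        memcpy(&w, dst + 8 * i + 4, 4);        /* pixels 4..7: low nibble */
        w |= lut[src[i] & 0x0F];
        memcpy(dst + 8 * i + 4, &w, 4);
    }
}
```

Calling build_plane_lut once per plane and or_plane_row once per plane and row replaces eight single-bit tests per source byte with two table lookups and two 32-bit ORs.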
Benchmarking the original code (my latest patch, without the LUT):
basty at cdgs-basty:~/src/ffmpeg/build$ ./ffplay ../patches/Ooze.iff
FFplay version git-36b1b3c, Copyright (c) 2003-2010 the FFmpeg developers
built on Apr 26 2010 00:00:19 with gcc 4.2.4 (Ubuntu 4.2.4-1ubuntu4)
configuration:
libavutil 50.14. 0 / 50.14. 0
libavcodec 52.66. 0 / 52.66. 0
libavformat 52.61. 0 / 52.61. 0
libavdevice 52. 2. 0 / 52. 2. 0
libswscale 0.10. 0 / 0.10. 0
[IFF @ 0x8b32790]Estimating duration from bitrate, this may be inaccurate
Input #0, IFF, from '../patches/Ooze.iff':
Duration: N/A, bitrate: N/A
Stream #0.0: Video: iff_byterun1, rgba, 666x536, PAR 1:1 DAR 333:268, 90k tbr, 90k tbn, 90k tbc
55480 dezicycles in decodeplane32, 1 runs, 0 skips
54105 dezicycles in decodeplane32, 2 runs, 0 skips
53517 dezicycles in decodeplane32, 4 runs, 0 skips
53095 dezicycles in decodeplane32, 8 runs, 0 skips
52895 dezicycles in decodeplane32, 16 runs, 0 skips
52772 dezicycles in decodeplane32, 32 runs, 0 skips
52663 dezicycles in decodeplane32, 64 runs, 0 skips
52584 dezicycles in decodeplane32, 128 runs, 0 skips
52938 dezicycles in decodeplane32, 256 runs, 0 skips
52717 dezicycles in decodeplane32, 512 runs, 0 skips
52682 dezicycles in decodeplane32, 1023 runs, 1 skips
52675 dezicycles in decodeplane32, 2045 runs, 3 skips
52710 dezicycles in decodeplane32, 4088 runs, 8 skips
52810 dezicycles in decodeplane32, 8165 runs, 27 skips
0.39 A-V: 0.000 s:0.0 aq= 0KB vq= 0KB sq= 0B f=0/0 0/0
Benchmarking my new implementation of your LUT idea, without the inline statement:
FFplay version git-36b1b3c, Copyright (c) 2003-2010 the FFmpeg developers
built on Apr 26 2010 00:00:19 with gcc 4.2.4 (Ubuntu 4.2.4-1ubuntu4)
configuration:
libavutil 50.14. 0 / 50.14. 0
libavcodec 52.66. 0 / 52.66. 0
libavformat 52.61. 0 / 52.61. 0
libavdevice 52. 2. 0 / 52. 2. 0
libswscale 0.10. 0 / 0.10. 0
[IFF @ 0x8b32790]Estimating duration from bitrate, this may be inaccurate
Input #0, IFF, from '../patches/Ooze.iff':
Duration: N/A, bitrate: N/A
Stream #0.0: Video: iff_byterun1, rgba, 666x536, PAR 1:1 DAR 333:268, 90k tbr, 90k tbn, 90k tbc
41950 dezicycles in decodeplane32, 1 runs, 0 skips
31175 dezicycles in decodeplane32, 2 runs, 0 skips
51882 dezicycles in decodeplane32, 4 runs, 0 skips
35820 dezicycles in decodeplane32, 8 runs, 0 skips
27796 dezicycles in decodeplane32, 16 runs, 0 skips
23752 dezicycles in decodeplane32, 32 runs, 0 skips
21754 dezicycles in decodeplane32, 64 runs, 0 skips
20713 dezicycles in decodeplane32, 128 runs, 0 skips
20193 dezicycles in decodeplane32, 256 runs, 0 skips
19934 dezicycles in decodeplane32, 512 runs, 0 skips
19814 dezicycles in decodeplane32, 1023 runs, 1 skips
19756 dezicycles in decodeplane32, 2047 runs, 1 skips
19752 dezicycles in decodeplane32, 4092 runs, 4 skips
19724 dezicycles in decodeplane32, 8184 runs, 8 skips
2.35 A-V: 0.000 s:0.0 aq= 0KB vq= 0KB sq= 0B f=0/0 0/0
Benchmarking my new implementation of your LUT idea, with the inline statement:
basty at cdgs-basty:~/src/ffmpeg/build$ ./ffplay ../patches/Ooze.iff
FFplay version git-36b1b3c, Copyright (c) 2003-2010 the FFmpeg developers
built on Apr 26 2010 00:00:19 with gcc 4.2.4 (Ubuntu 4.2.4-1ubuntu4)
configuration:
libavutil 50.14. 0 / 50.14. 0
libavcodec 52.66. 0 / 52.66. 0
libavformat 52.61. 0 / 52.61. 0
libavdevice 52. 2. 0 / 52. 2. 0
libswscale 0.10. 0 / 0.10. 0
[IFF @ 0x8b32790]Estimating duration from bitrate, this may be inaccurate
Input #0, IFF, from '../patches/Ooze.iff':
Duration: N/A, bitrate: N/A
Stream #0.0: Video: iff_byterun1, rgba, 666x536, PAR 1:1 DAR 333:268, 90k tbr, 90k tbn, 90k tbc
48350 dezicycles in decodeplane32, 1 runs, 0 skips
38375 dezicycles in decodeplane32, 2 runs, 0 skips
32605 dezicycles in decodeplane32, 4 runs, 0 skips
29513 dezicycles in decodeplane32, 8 runs, 0 skips
27933 dezicycles in decodeplane32, 16 runs, 0 skips
30328 dezicycles in decodeplane32, 32 runs, 0 skips
29113 dezicycles in decodeplane32, 64 runs, 0 skips
27651 dezicycles in decodeplane32, 128 runs, 0 skips
26902 dezicycles in decodeplane32, 256 runs, 0 skips
26522 dezicycles in decodeplane32, 512 runs, 0 skips
26341 dezicycles in decodeplane32, 1023 runs, 1 skips
26305 dezicycles in decodeplane32, 2046 runs, 2 skips
26296 dezicycles in decodeplane32, 4092 runs, 4 skips
26239 dezicycles in decodeplane32, 8182 runs, 10 skips
1.30 A-V: 0.000 s:0.0 aq= 0KB vq= 0KB sq= 0B f=0/0 0/0
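So, unlike decodeplane8, where adding the inline statement makes it much faster, decodeplane32 shows just the opposite: the LUT version settles at about 19,700 dezicycles without inline but at about 26,200 with it (the original code needs about 52,800).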
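To put the three runs side by side, the last reading of each log can be turned into a ratio. The snippet below is just that arithmetic; the function and constant names are made up for illustration, and the only inputs are the final dezicycle figures printed above.

```c
#include <stdio.h>

/* Final readings copied from the three logs above (dezicycles). */
enum {
    ORIGINAL      = 52810,  /* latest patch without LUT */
    LUT_NO_INLINE = 19724,  /* LUT, no inline statement */
    LUT_INLINE    = 26239   /* LUT, with inline statement */
};

/* Ratio of two timer readings; a result above 1.0 means `after` is faster. */
static double speedup(unsigned before, unsigned after)
{
    return (double)before / (double)after;
}

/* Print the three comparisons discussed in the text. */
static void print_speedups(void)
{
    printf("LUT without inline vs original: %.2fx\n",
           speedup(ORIGINAL, LUT_NO_INLINE));
    printf("LUT with inline vs original:    %.2fx\n",
           speedup(ORIGINAL, LUT_INLINE));
    printf("no inline vs inline:            %.2fx\n",
           speedup(LUT_INLINE, LUT_NO_INLINE));
}
```

By these figures the LUT version without inline is roughly 2.7 times as fast as the original, and the inlined variant roughly 2 times.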
If you look at this patch, you'll notice that I commented out two AV_WN64A lines. I did this because they made everything much slower.
That might not be the case on a 64-bit CPU, though. Since I don't have one, could someone check this?
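For anyone who wants to try this, here is a rough sketch of the two variants being compared. It is not the code from the attached patch (which is not reproduced in this message): or_pixels_32 and or_pixels_64 are hypothetical names, and memcpy stands in for the AV_RN64A/AV_WN64A macros. Both functions OR two 32-bit values into two adjacent 32-bit pixels; the second does it with a single 64-bit read-modify-write.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-bit variant: two separate read-modify-write steps. */
static void or_pixels_32(uint32_t *dst, uint32_t a, uint32_t b)
{
    dst[0] |= a;
    dst[1] |= b;
}

/* Hypothetical 64-bit variant: one 8-byte load, one OR, one 8-byte store.
 * Packing through memcpy keeps a in the first pixel on any byte order. */
static void or_pixels_64(uint32_t *dst, uint32_t a, uint32_t b)
{
    const uint32_t pair[2] = { a, b };
    uint64_t add, cur;

    memcpy(&add, pair, sizeof(add));
    memcpy(&cur, dst, sizeof(cur));
    cur |= add;
    memcpy(dst, &cur, sizeof(cur));
}
```

On a 32-bit CPU the compiler has to split the 64-bit operation back into two halves, which may be part of why it was slower here; whether it helps on a 64-bit CPU is exactly what needs measuring.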
--
Best regards,
:-) Basty/CDGS (-:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: iff-optimize-lut.patch
Type: text/x-patch
Size: 6215 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100426/aa130de4/attachment.bin>