[FFmpeg-devel] [PATCH] Fix non-rounding up to next 16-bit aligned bug in IFF decoder
Sebastian Vater
cdgs.basty
Thu Apr 29 15:36:07 CEST 2010
Sebastian Vater wrote:
> Måns Rullgård wrote:
>
>> Sebastian Vater <cdgs.basty at googlemail.com> writes:
>>
>>
>>
>>> Just got the idea, we can get rid of the GetBitContext
>>> completely...Instead of reading 4 bits, we simply read a byte:
>>> const uint8_t lut_offsets = *buf++; // instead of get_bits(gb,4);
>>>
>>>
>> That's a separate thing.
>>
>>
>
> Separate in what way? What did you mean exactly?
>
>
>>> Then we unroll the loop by 8 and do two accesses to the lut, one with >> 4
>>> and one with & 0x0F, or we even get rid of this and create a lut
>>> with 256 entries using AV_WN64A / AV_RN64A ;-)
>>>
>>> The advantage here is that on a 64-bit CPU we get another nice speed
>>> improvement ;-)
>>> If we avoid calculations with AV_RN64A etc.
>>>
>>>
>> Those macros don't do any calculations. All they do is some magic to
>> avoid type aliasing errors.
>>
>>
>
> Yes, I know, but I meant stuff like (lut0[...] << 32ULL) | lut1[...];
>
> But this isn't necessary if we use an 8-bit table storing uint64_t's...
>
>
>>
>>
>>> gcc just should use 2 registers on 32-bit CPU and that's it.
>>>
>>>
>> Should, but doesn't.
>>
>>
>
> With the way I meant above, it should... I'll test that now, though
> without a completed table, and tell you what it does.
>
>
Damn, that's fucking amazing!!!!
Just ran 2 benchmarks: one with the old patch in 32-bit mode, and one
that gets rid of GetBitContext and uses AV_RN64A, etc.
Please note that I'm in my office, so these benchmarks are not for the
Athlon XP 2100+, but for a Pentium 4 processor (which we have here).
basty at euler:~/src/ffmpeg/build$ ./ffplay ../patches/MRLake.iff
FFplay version git-5b9f10d, Copyright (c) 2003-2010 the FFmpeg developers
built on Apr 29 2010 15:19:11 with gcc 4.2.4 (Ubuntu 4.2.4-1ubuntu4)
configuration:
libavutil 50.14. 0 / 50.14. 0
libavcodec 52.66. 0 / 52.66. 0
libavformat 52.61. 0 / 52.61. 0
libavdevice 52. 2. 0 / 52. 2. 0
libswscale 0.10. 0 / 0.10. 0
[IFF @ 0x8b2feb0]Estimating duration from bitrate, this may be inaccurate
Input #0, IFF, from '../patches/MRLake.iff':
Duration: N/A, bitrate: N/A
Stream #0.0: Video: iff_byterun1, pal8, 737x595, PAR 17:20 DAR
737:700, 90k tbr, 90k tbn, 90k tbc
25720 dezicycles in decodeplane8, 1 runs, 0 skips
21520 dezicycles in decodeplane8, 2 runs, 0 skips
19260 dezicycles in decodeplane8, 4 runs, 0 skips
17675 dezicycles in decodeplane8, 8 runs, 0 skips
16652 dezicycles in decodeplane8, 16 runs, 0 skips
16006 dezicycles in decodeplane8, 32 runs, 0 skips
15623 dezicycles in decodeplane8, 64 runs, 0 skips
15503 dezicycles in decodeplane8, 128 runs, 0 skips
15573 dezicycles in decodeplane8, 256 runs, 0 skips
15440 dezicycles in decodeplane8, 512 runs, 0 skips
15496 dezicycles in decodeplane8, 1024 runs, 0 skips
15422 dezicycles in decodeplane8, 2047 runs, 1 skips
15395 dezicycles in decodeplane8, 4095 runs, 1 skips
2.44 A-V: 0.000 s:0.0 aq= 0KB vq= 0KB sq= 0B f=0/0 0/0
Benchmark for 64-bit patch using the following code:

static void decodeplane8(uint8_t *dst,
                         const uint8_t *buf,
                         const unsigned buf_size,
                         const unsigned bps,
                         const unsigned plane)
{
    START_TIMER;
    /* every source byte expands to 8 destination pixels */
    const uint8_t *end = dst + (buf_size * 8);
    /* 256 precomputed 64-bit words for this bitplane */
    const uint64_t *lut = plane8_lut[plane];
    for (; dst < end; dst += 8) {
        /* OR this plane's bit into 8 pixels at once; dst must be 8-byte aligned */
        const uint64_t v = AV_RN64A(dst) | lut[*buf++];
        AV_WN64A(dst, v);
    }
    STOP_TIMER("decodeplane8");
}

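
The plane8_lut table itself is not shown in this mail. Here is a minimal sketch of how such a table could be filled at runtime and checked against a bit-by-bit decoder. It assumes the usual ILBM convention that the most significant bit of each bitplane byte is the leftmost pixel. The names build_plane8_lut, decodeplane8_lut, decodeplane8_ref and plane8_lut_selfcheck are hypothetical, and memcpy stands in for AV_RN64A / AV_WN64A so the example compiles without FFmpeg headers:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the plane8_lut used above: one 256-entry
 * table of 64-bit words per bitplane. */
static uint64_t plane8_lut[8][256];

/* Fill the table: for source byte b, pixel i (0 = leftmost) gets bit
 * `plane` set if bit (7 - i) of b is set. Building each entry through a
 * byte array keeps the layout independent of host endianness. */
static void build_plane8_lut(void)
{
    for (unsigned plane = 0; plane < 8; plane++) {
        for (unsigned b = 0; b < 256; b++) {
            uint8_t px[8];
            for (unsigned i = 0; i < 8; i++)
                px[i] = (uint8_t)(((b >> (7 - i)) & 1) << plane);
            memcpy(&plane8_lut[plane][b], px, sizeof(px));
        }
    }
}

/* Table-driven decode: OR eight pixels per source byte into dst. */
static void decodeplane8_lut(uint8_t *dst, const uint8_t *buf,
                             unsigned buf_size, unsigned plane)
{
    assert(plane < 8);
    const uint64_t *lut = plane8_lut[plane];
    for (unsigned n = 0; n < buf_size; n++, dst += 8) {
        uint64_t v;
        memcpy(&v, dst, sizeof(v));   /* portable replacement for AV_RN64A */
        v |= lut[buf[n]];
        memcpy(dst, &v, sizeof(v));   /* portable replacement for AV_WN64A */
    }
}

/* Bit-by-bit reference, used only to check the table. */
static void decodeplane8_ref(uint8_t *dst, const uint8_t *buf,
                             unsigned buf_size, unsigned plane)
{
    for (unsigned n = 0; n < buf_size; n++)
        for (unsigned i = 0; i < 8; i++)
            dst[n * 8 + i] |= (uint8_t)(((buf[n] >> (7 - i)) & 1) << plane);
}

/* Decode the same bytes on all 8 planes both ways; returns 1 on a match. */
static int plane8_lut_selfcheck(void)
{
    const uint8_t src[4] = { 0x00, 0xFF, 0xA5, 0x3C };
    uint8_t a[32] = { 0 }, b[32] = { 0 };

    build_plane8_lut();
    for (unsigned plane = 0; plane < 8; plane++) {
        decodeplane8_lut(a, src, sizeof(src), plane);
        decodeplane8_ref(b, src, sizeof(src), plane);
    }
    return memcmp(a, b, sizeof(a)) == 0;
}
```

A static initializer would avoid the runtime setup; the runtime fill is used here only because it is easier to verify by reading.
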
basty at euler:~/src/ffmpeg/build$ ./ffplay ../patches/MRLake.iff
FFplay version git-5b9f10d, Copyright (c) 2003-2010 the FFmpeg developers
built on Apr 29 2010 15:19:11 with gcc 4.2.4 (Ubuntu 4.2.4-1ubuntu4)
configuration:
libavutil 50.14. 0 / 50.14. 0
libavcodec 52.66. 0 / 52.66. 0
libavformat 52.61. 0 / 52.61. 0
libavdevice 52. 2. 0 / 52. 2. 0
libswscale 0.10. 0 / 0.10. 0
[IFF @ 0x8b30eb0]Estimating duration from bitrate, this may be inaccurate
Input #0, IFF, from '../patches/MRLake.iff':
Duration: N/A, bitrate: N/A
Stream #0.0: Video: iff_byterun1, pal8, 737x595, PAR 17:20 DAR
737:700, 90k tbr, 90k tbn, 90k tbc
65280 dezicycles in decodeplane8, 1 runs, 0 skips
40420 dezicycles in decodeplane8, 2 runs, 0 skips
26430 dezicycles in decodeplane8, 4 runs, 0 skips
19030 dezicycles in decodeplane8, 8 runs, 0 skips
13897 dezicycles in decodeplane8, 16 runs, 0 skips
11361 dezicycles in decodeplane8, 32 runs, 0 skips
10090 dezicycles in decodeplane8, 64 runs, 0 skips
9600 dezicycles in decodeplane8, 128 runs, 0 skips
9390 dezicycles in decodeplane8, 256 runs, 0 skips
9114 dezicycles in decodeplane8, 512 runs, 0 skips
9063 dezicycles in decodeplane8, 1024 runs, 0 skips
9081 dezicycles in decodeplane8, 2048 runs, 0 skips
9176 dezicycles in decodeplane8, 4096 runs, 0 skips
2.72 A-V: 0.000 s:0.0 aq= 0KB vq= 0KB sq= 0B f=0/0 0/0
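
To put a number on the difference, the last STOP_TIMER line of each run can be compared directly: 15395 dezicycles for the old patch against 9176 for the 64-bit table version. A small sketch of that arithmetic (speedup and saved_fraction are hypothetical helper names; this is a single file on one Pentium 4, so treat the result as a rough indication only):

```c
#include <assert.h>
#include <stdio.h>

/* Final averages reported by STOP_TIMER in the two runs above. */
static const double old_dezicycles = 15395.0; /* old patch, 4095 runs   */
static const double new_dezicycles = 9176.0;  /* 64-bit LUT, 4096 runs  */

/* How many times faster the new code is. */
static double speedup(double before, double after)
{
    return before / after;
}

/* Fraction of the original time that was saved. */
static double saved_fraction(double before, double after)
{
    return (before - after) / before;
}

static void print_summary(void)
{
    printf("speedup: %.2fx, time saved: %.1f%%\n",
           speedup(old_dezicycles, new_dezicycles),
           100.0 * saved_fraction(old_dezicycles, new_dezicycles));
}
```

That works out to roughly 1.7 times faster, or about 40% fewer cycles per call to decodeplane8.
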
Disassembly of inlined decodeplane8 in 64 bit patch on x86_32:
532: 8b 54 24 40 mov 0x40(%esp),%edx
536: c7 44 24 4c 00 00 00 movl $0x0,0x4c(%esp)
53d: 00
53e: 8b 8a cc 00 00 00 mov 0xcc(%edx),%ecx
544: 0f 31 rdtsc
546: 89 54 24 60 mov %edx,0x60(%esp)
54a: 8b 5c 24 60 mov 0x60(%esp),%ebx
54e: 31 d2 xor %edx,%edx
550: c7 44 24 64 00 00 00 movl $0x0,0x64(%esp)
557: 00
558: 8b 74 24 64 mov 0x64(%esp),%esi
55c: 89 de mov %ebx,%esi
55e: bb 00 00 00 00 mov $0x0,%ebx
563: 89 5c 24 60 mov %ebx,0x60(%esp)
567: 01 44 24 60 add %eax,0x60(%esp)
56b: 8b 44 24 44 mov 0x44(%esp),%eax
56f: 89 74 24 64 mov %esi,0x64(%esp)
573: 11 54 24 64 adc %edx,0x64(%esp)
577: 2b 84 24 9c 00 00 00 sub 0x9c(%esp),%eax
57e: 39 c8 cmp %ecx,%eax
580: 76 02 jbe 584 <decode_frame_ilbm+0x2d4>
582: 89 c8 mov %ecx,%eax
584: 8b 74 24 50 mov 0x50(%esp),%esi
588: 8d 04 c6 lea (%esi,%eax,8),%eax
58b: 89 44 24 5c mov %eax,0x5c(%esp)
58f: 8b 44 24 4c mov 0x4c(%esp),%eax
593: c1 e0 07 shl $0x7,%eax
596: 05 00 00 00 00 add $0x0,%eax
59b: 89 44 24 58 mov %eax,0x58(%esp)
59f: 8b 44 24 5c mov 0x5c(%esp),%eax
5a3: 39 c6 cmp %eax,%esi
5a5: 73 30 jae 5d7 <decode_frame_ilbm+0x327>
5a7: 8b ac 24 9c 00 00 00 mov 0x9c(%esp),%ebp
5ae: 89 f7 mov %esi,%edi
5b0: 0f b6 75 00 movzbl 0x0(%ebp),%esi
5b4: 83 c5 01 add $0x1,%ebp
5b7: 8b 4c 24 58 mov 0x58(%esp),%ecx
5bb: 8b 5f 04 mov 0x4(%edi),%ebx
5be: 8b 07 mov (%edi),%eax
5c0: 8b 54 f1 04 mov 0x4(%ecx,%esi,8),%edx
5c4: 0b 04 f1 or (%ecx,%esi,8),%eax
5c7: 09 da or %ebx,%edx
5c9: 89 07 mov %eax,(%edi)
5cb: 89 57 04 mov %edx,0x4(%edi)
5ce: 83 c7 08 add $0x8,%edi
5d1: 39 7c 24 5c cmp %edi,0x5c(%esp)
5d5: 77 d9 ja 5b0 <decode_frame_ilbm+0x300>
5d7: 0f 31 rdtsc
5d9: 8b 1d 04 00 00 00 mov 0x4,%ebx
5df: 89 d7 mov %edx,%edi
5e1: 31 ed xor %ebp,%ebp
5e3: 89 fd mov %edi,%ebp
5e5: bf 00 00 00 00 mov $0x0,%edi
5ea: 31 d2 xor %edx,%edx
5ec: 01 c7 add %eax,%edi
5ee: 11 d5 adc %edx,%ebp
5f0: 89 5c 24 38 mov %ebx,0x38(%esp)
5f4: 83 eb 01 sub $0x1,%ebx
5f7: 0f 8e d9 00 00 00 jle 6d6 <decode_frame_ilbm+0x426>
5fd: 89 f8 mov %edi,%eax
5ff: 8b 1d 08 00 00 00 mov 0x8,%ebx
605: 89 ea mov %ebp,%edx
607: 2b 44 24 60 sub 0x60(%esp),%eax
60b: 8b 35 0c 00 00 00 mov 0xc,%esi
611: 1b 54 24 64 sbb 0x64(%esp),%edx
615: 89 44 24 68 mov %eax,0x68(%esp)
619: 8b 44 24 38 mov 0x38(%esp),%eax
61d: 89 54 24 6c mov %edx,0x6c(%esp)
621: 89 f1 mov %esi,%ecx
623: 89 da mov %ebx,%edx
625: 0f a4 d1 03 shld $0x3,%edx,%ecx
629: c1 e2 03 shl $0x3,%edx
62c: 89 14 24 mov %edx,(%esp)
62f: 89 c2 mov %eax,%edx
631: c1 fa 1f sar $0x1f,%edx
634: 89 4c 24 04 mov %ecx,0x4(%esp)
638: 89 44 24 08 mov %eax,0x8(%esp)
63c: 89 54 24 0c mov %edx,0xc(%esp)
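
For anyone who does not want to trace the listing: the inner loop at 5b0..5d5 performs the 64-bit OR as two independent 32-bit halves, with no shifting or merging in between. A rough C equivalent of that loop body, as a sketch (or64_as_two_halves, or64_direct and or64_forms_agree are hypothetical names, not FFmpeg functions):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* What the x86_32 loop does per source byte: load the two 32-bit halves
 * of the destination and of the table entry, OR them pairwise, store. */
static void or64_as_two_halves(uint8_t *dst, const uint64_t *entry)
{
    uint32_t d[2], l[2];
    memcpy(d, dst, sizeof(d));
    memcpy(l, entry, sizeof(l));
    d[0] |= l[0];   /* or (%ecx,%esi,8),%eax */
    d[1] |= l[1];   /* or %ebx,%edx          */
    memcpy(dst, d, sizeof(d));
}

/* The same operation written as one 64-bit OR, as in decodeplane8. */
static void or64_direct(uint8_t *dst, const uint64_t *entry)
{
    uint64_t v;
    memcpy(&v, dst, sizeof(v));
    v |= *entry;
    memcpy(dst, &v, sizeof(v));
}

/* Returns 1 if both forms leave identical bytes in the destination. */
static int or64_forms_agree(void)
{
    const uint64_t entry = UINT64_C(0x0100010001000100);
    uint8_t a[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    uint8_t b[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };

    or64_as_two_halves(a, &entry);
    or64_direct(b, &entry);
    return memcmp(a, b, sizeof(a)) == 0;
}
```

Because a bitwise OR never carries between bits, splitting it into halves costs nothing extra, which is why the 64-bit table works well even on a 32-bit CPU.
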
--
Best regards,
:-) Basty/CDGS (-: