[FFmpeg-devel] [PATCH 1/6] avcodec/h264: mmx2, sse2, avx 10-bit h chroma deblock/loop filter
James Darnley
jdarnley at obe.tv
Fri Dec 2 01:49:23 EET 2016
On 2016-12-01 23:16, Michael Niedermayer wrote:
> On Thu, Dec 01, 2016 at 05:57:44PM +0100, James Darnley wrote:
>> Yorkfield:
>> - mmx2: 2.44x faster (278 vs. 114 cycles)
>> - sse2: 3.35x faster (278 vs. 83 cycles)
>>
>> Skylake:
>> - mmx2: 1.69x faster (169 vs. 100 cycles)
>> - sse2: 2.34x faster (169 vs. 72 cycles)
>> - avx: 2.32x faster (169 vs. 73 cycles)
>> ---
>> libavcodec/x86/h264_deblock_10bit.asm | 118 ++++++++++++++++++++++++++++++++++
>> libavcodec/x86/h264dsp_init.c | 9 +++
>> 2 files changed, 127 insertions(+)
>
> breaks build on linux x86-32
>
> YASM libavcodec/x86/h264_deblock_10bit.o
> src/libavcodec/x86/h264_deblock_10bit.asm:1039: warning: `bpl' is a register in 64-bit mode
> src/libavcodec/x86/h264_deblock_10bit.asm:1039: error: undefined symbol `bpl' (first use)
> src/libavcodec/x86/h264_deblock_10bit.asm:1039: error: (Each undefined symbol is reported only once.)
> src/libavcodec/x86/h264_deblock_10bit.asm:1039: warning: `bpl' is a register in 64-bit mode
Ah. I shouldn't do clever things like trying to use the byte-sized
registers. It isn't needed and causes problems like this. Changed
locally. Also changed in the 4:2:0 chroma intra patch.
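
For readers following along, a minimal sketch of the kind of change this
implies. It is a hypothetical illustration in plain NASM/YASM syntax, not the
code from the patch: the choice of eax/ebp and the masking step are
assumptions. The point is only that bpl (like spl, sil and dil) needs a REX
prefix and so exists only in 64-bit mode, while the full-width register is
available on both targets.

```nasm
; Hypothetical example, not taken from h264_deblock_10bit.asm.
BITS 32
SECTION .text

low_byte_of_ebp:
    ; movzx eax, bpl      ; rejected for 32-bit targets: bpl is 64-bit only
    mov   eax, ebp        ; copy the full 32-bit register instead
    and   eax, 0xff       ; keep only the low byte, if that is really needed
    ret
```

When the upper bits are already known to be zero, the masking step can be
dropped and the full register used directly.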