[FFmpeg-devel] [WIP] [PATCH 0/6] sse2/xmm version of 8-bit simple_idct
Ronald S. Bultje
rsbultje at gmail.com
Tue Jun 6 15:48:54 EEST 2017
Hi,
On Mon, Jun 5, 2017 at 8:02 AM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> On Mon, Jun 5, 2017 at 7:23 AM, James Darnley <jdarnley at obe.tv> wrote:
>
>> I forgot to mention in my cover letter that although the dct test
>> passes, fate does not. As I mentioned on IRC, changing them causes
>> errors elsewhere in fate. I am currently looking into this problem and
>> I'm sure I will speak to you or others about it.
>
>
> I'll have a look at this.
>
This makes the output of dct-test exact:
diff --git a/libavcodec/x86/simple_idct10.asm b/libavcodec/x86/simple_idct10.asm
index ae848b7..0dd1ae5 100644
--- a/libavcodec/x86/simple_idct10.asm
+++ b/libavcodec/x86/simple_idct10.asm
@@ -52,6 +52,9 @@ times 4 dw %2, %3
%define W6sh2 8867 ; W6 = 35468 = 8867<<2
%define W7sh2 4520 ; W7 = 18081 = 4520<<2 + 1
+pw_round_20_div_w4: times 8 dw ((1 << (20 - 1)) / W4sh2)
+
+
CONST_DEC w4_plus_w2, W4sh2, +W2sh2
CONST_DEC w4_min_w2, W4sh2, -W2sh2
CONST_DEC w4_plus_w6, W4sh2, +W6sh2
@@ -71,7 +74,7 @@ SECTION .text
%macro idct_fn 0
cglobal simple_idct8, 1, 1, 16, block
- IDCT_FN "", 11, "", 20
+ IDCT_FN "", 11, pw_round_20_div_w4, 20
RET
cglobal simple_idct10, 1, 1, 16, block
How the final patch should look (i.e. change the coefficients only for the
mpeg idct and not for the prores idct, to keep fate happy, or change the C
code for prores so the coefficients are identical) is up to you; I don't
have a preference. Michael might have an opinion on that.
Ronald