[FFmpeg-devel] [PATCH 5/9] x86: simple_idct10_template: fix overflow in pass

Tue Oct 13 09:01:44 CEST 2015

Hi,

2015-10-13 2:26 GMT+02:00 Michael Niedermayer <michael at niedermayer.cc>:
> On Mon, Oct 12, 2015 at 07:37:46PM +0200, Christophe Gisquet wrote:
>> When the input of a pass has 15 or 16 bits of precision (in particular
>> the column pass), the addition of a bias to W4 may lead to overflows
>> in the input to pmaddwd.
>>
>> This requires postponing the addition of the bias until after the first
>> butterfly. To do so, the fact that m15 is unused (although zeroed) is
>> exploited. If the pass is safe, an address can be used directly, and
>> the number of xmm regs can be decreased. Otherwise, the 32-bit bias
>> is loaded into it.
>> ---
>>  libavcodec/x86/proresdsp.asm              |  8 ++++----
>>  libavcodec/x86/simple_idct10_template.asm | 13 ++++++++++++-
>>  2 files changed, 16 insertions(+), 5 deletions(-)
>
> how can i reproduce these overflows ?

Generate the vsynth3-dnxhd-1080i-10bit.mov file added by another patch
in this series.

First decode it using faani as the reference (otherwise you could miss
the error).

Now, for the parameters that fail. As you know,
(1<<(%pass_bitdepth-1))/W4 is added to the first butterfly. The macro
lets you pass either the matching pw_ constant (essentially times 4 dw
1<<(%pass_bitdepth-1-14)) or "", in which case it expects to find a
pd_round_%pass_bitdepth constant (essentially times 4 dd
1<<(%pass_bitdepth-1)). This is noted in the template's comments:
"Adding 1<<(%2-1) for >=15 bits values".

Compare:
"", 13, pw_8, 18, 0, pw_1023 => stddev:    1.33 PSNR: 45.61 MAXDIFF:  255
"", 12, pw_16, 19, 0, pw_1023 => stddev:    0.33 PSNR: 57.61 MAXDIFF:  255
with the result of the current parameters (no difference).

The same input doesn't cause issues for prores, probably because the
mean DC (through times 4 dw 0x2008) is added in the last pass.

-- 
Christophe