[FFmpeg-devel] [PATCH 5/9] x86: simple_idct10_template: fix overflow in pass
Christophe Gisquet
christophe.gisquet at gmail.com
Tue Oct 13 09:01:44 CEST 2015
Hi,
2015-10-13 2:26 GMT+02:00 Michael Niedermayer <michael at niedermayer.cc>:
> On Mon, Oct 12, 2015 at 07:37:46PM +0200, Christophe Gisquet wrote:
>> When the input of a pass has 15 or 16 bits of precision (in particular
>> the column pass), the addition of a bias to W4 may lead to overflows
>> in the input to pmaddwd.
>>
>> This requires postponing the adding of the bias to after the first
>> butterfly. To do so, the fact that m15, unused although zeroed, is
>> exploited. In case the pass is safe, an address can be directly used,
>> and the number of xmm regs can be decreased. Otherwise, the 32bits bias
>> is loaded into it.
>> ---
>> libavcodec/x86/proresdsp.asm | 8 ++++----
>> libavcodec/x86/simple_idct10_template.asm | 13 ++++++++++++-
>> 2 files changed, 16 insertions(+), 5 deletions(-)
>
> how can i reproduce these overflows ?
Generate the vsynth3-dnxhd-1080i-10bit.mov added after another patch.
Decode it first using faani (you could miss the error).
Now, for the parameters that fail: you know how
(1<<(%pass_bitdepth-1))/W4 is added to the first butterfly. The macro
lets you either pass the matching pw_ constant (essentially times 4 dw
1<<(%pass_bitdepth-1-14)), or pass "", in which case it expects to find
a pd_round_%pass_bitdepth (essentially times 4 dd
1<<(%pass_bitdepth-1)). This is indicated in the comments of the
template: "Adding 1<<(%2-1) for >=15 bits values".
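To illustrate the difference between the two bias forms, here is a
minimal Python sketch (not the actual asm). It assumes W4 = 1<<14, as
the "-14" in the pw_ expression suggests, and it models the 16-bit add
as a wrapping add; the function names are made up for illustration.

```python
# Sketch only: models the DC term of the first butterfly in plain integers.
W4 = 1 << 14  # assumption, inferred from 1<<(%pass_bitdepth-1-14)


def wrap16(v):
    """Wrap an integer to a signed 16-bit word (assumed wrapping add)."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def word_bias(shift):
    """The pw_ form: bias added to the 16-bit input before the multiply."""
    return 1 << (shift - 1 - 14)


def dword_bias(shift):
    """The pd_round_ form: bias added to the 32-bit product afterwards."""
    return 1 << (shift - 1)


def dc_term_word(dc, shift):
    # Bias is added in 16 bits, so dc + bias can leave the int16 range.
    return wrap16(dc + word_bias(shift)) * W4


def dc_term_dword(dc, shift):
    # Bias is added after the multiply, in 32 bits.
    return dc * W4 + dword_bias(shift)


if __name__ == "__main__":
    for dc in (100, 32767):
        print(dc, dc_term_word(dc, 18), dc_term_dword(dc, 18))
```

For small inputs both forms give the same value, since
word_bias(shift) * W4 == dword_bias(shift). Once the input is close to
the 16-bit limit, only the 32-bit form stays correct.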
Contrast:
"", 13, pw_8, 18, 0, pw_1023 => stddev: 1.33 PSNR: 45.61 MAXDIFF: 255
"", 12, pw_16, 19, 0, pw_1023 => stddev: 0.33 PSNR: 57.61 MAXDIFF: 255
to the result of the current parameters (no difference)
The same input doesn't cause an issue for prores, for some reason,
probably because the mean DC (through times 4 dw 0x2008) is added in
the last pass.
--
Christophe