[FFmpeg-devel] [PATCH] avcodec/proresenc_anatoliy: change quantization scaling to floating point to utilize vectorization
David Murmann
david.murmann at btf.de
Tue Feb 27 23:22:09 EET 2018
On 2/27/2018 9:58 PM, Hendrik Leppkes wrote:
> On Tue, Feb 27, 2018 at 9:35 PM, David Murmann <david.murmann at btf.de> wrote:
>> Quantization scaling seems to be a slight bottleneck,
>> this change allows the compiler to more easily vectorize
>> the loop. This improves total encoding performance in my
>> tests by about 10-20%.
>>
>> Signed-off-by: David Murmann <david at btf.de>
>> ---
>> libavcodec/proresenc_anatoliy.c | 12 ++++++++----
>> 1 file changed, 8 insertions(+), 4 deletions(-)
>>
[...]
>> +    for (j = 0; j < blocks_per_slice; j++) {
>> +        for (i = 0; i < 64; i++) {
>> +            block[i] = (float)in[(j << 6) + i] / (float)qmat[i];
>> +        }
>> +
>> +        for (i = 1; i < 64; i++) {
>> +            int val = block[progressive_scan[i]];
>>              if (val) {
>>                  encode_codeword(pb, run, run_to_cb[FFMIN(prev_run, 15)]);
>
> Usually, using float is best avoided. Did you test re-factoring the
> loop structure without changing it to float?
Yes. As far as I know, the vector instruction sets have no integer
division, so the compiler just generates a scalar loop of idivs. That is
quite a bit slower than converting to float, dividing, and converting
back, provided the compiler uses vector instructions for the float
version. In the general case this would not be exact, but since the
input values are int16, they convert to float32 losslessly. On platforms
where auto-vectorization fails, the float version might actually be
quite a bit slower, but I have not seen that in my tests (though I have
only tested on x86_64).
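
To illustrate the exactness argument, here is a minimal standalone sketch
(not part of the patch) that compares truncating integer division against
the convert, divide, convert-back path. The function names quantize_int,
quantize_float and count_mismatches are hypothetical, and the divisor
range passed to count_mismatches is an assumption for illustration, not
the actual range of qmat in the encoder. The intuition is that a
non-integer quotient v/q lies at least 1/q away from the nearest integer,
which is far more than the float32 rounding error for int16 inputs, so
truncation should give the same result.

```c
#include <stdint.h>

/* Reference path: integer division, which truncates toward zero in C.
 * Assumes every qmat[i] is nonzero. */
static void quantize_int(const int16_t *in, const int *qmat, int *out)
{
    for (int i = 0; i < 64; i++)
        out[i] = in[i] / qmat[i];
}

/* Float path: convert both operands, divide, convert back.
 * The float-to-int conversion also truncates toward zero.
 * A loop of this shape is one a compiler may be able to vectorize. */
static void quantize_float(const int16_t *in, const int *qmat, int *out)
{
    for (int i = 0; i < 64; i++)
        out[i] = (int)((float)in[i] / (float)qmat[i]);
}

/* Count (value, divisor) pairs where the two paths disagree, over every
 * int16 value and every divisor in 1..max_q (max_q is a hypothetical
 * bound chosen by the caller). */
static long count_mismatches(int max_q)
{
    long bad = 0;
    for (int q = 1; q <= max_q; q++)
        for (int v = INT16_MIN; v <= INT16_MAX; v++)
            if (v / q != (int)((float)v / (float)q))
                bad++;
    return bad;
}
```

Calling count_mismatches with a bound that covers the quantizer values
in use is a cheap way to check the equivalence on a given platform before
relying on it; whether the float loop is actually vectorized still has to
be confirmed from the compiler output.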
--
David Murmann
david at btf.de
Telefon +49 (0) 221 82008710
Fax +49 (0) 221 82008799
http://btf.de/
--
btf GmbH | Leyendeckerstr. 27, 50825 Köln | +49 (0) 221 82 00 87 10
Geschäftsführer: Philipp Käßbohrer & Matthias Murmann | HR Köln | HRB 74707