[FFmpeg-devel] [PATCH 2/3] x86/float_dsp: unroll loop in vector_fmac_scalar
Christophe Gisquet
christophe.gisquet at gmail.com
Wed Apr 16 18:35:56 CEST 2014
On 16 Apr 2014 18:12, "James Almer" <jamrial at gmail.com> wrote:
> Athlon 64 7750+ mingw-w64. Went from 274 cycles to 257 when I benched with
> the dts-es sample I uploaded for the fate test.
OK.
> Also, does aac even use vector_fmac_scalar? A grep on libavcodec shows
> results only in dcadec.c.
I must have mixed up which batch I modified which code in. So what I am
remembering must have been for something else, then.
> The difference in the resulting code is in the order of instructions thanks
> to the unrolling of the loop. The mulps now have enough room to finish before
> the addps are executed, and so do the addps before the mova to memory.
I would have expected this to be handled by out-of-order execution. But I
guess the mulps latency is too long for the dependency to be hidden. I can't
help benchmark this atm but there should be no harm in your changes then.
OK from my side then.
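
For reference, a rough C sketch of the reordering being discussed. This is
not the actual x86 assembly from the patch: the function names are
hypothetical, the unroll factor of 4 is an assumption, and len is assumed
to be a multiple of 4. The operation itself (dst[i] += src[i] * mul) is
what vector_fmac_scalar computes.

```c
/* Straightforward loop: each multiply is followed at once by the add
 * and the store that depend on it. */
static void fmac_scalar_ref(float *dst, const float *src, float mul, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] += src[i] * mul;
}

/* Unrolled by 4 (hypothetical factor; len must be a multiple of 4).
 * All multiplies are issued first, then all adds, then all stores, so
 * each result has several independent operations between its producer
 * and its consumer. */
static void fmac_scalar_unrolled(float *dst, const float *src, float mul, int len)
{
    for (int i = 0; i < len; i += 4) {
        /* multiplies (the mulps stage) */
        float p0 = src[i + 0] * mul;
        float p1 = src[i + 1] * mul;
        float p2 = src[i + 2] * mul;
        float p3 = src[i + 3] * mul;

        /* adds (the addps stage) */
        float s0 = dst[i + 0] + p0;
        float s1 = dst[i + 1] + p1;
        float s2 = dst[i + 2] + p2;
        float s3 = dst[i + 3] + p3;

        /* stores (the mova stage) */
        dst[i + 0] = s0;
        dst[i + 1] = s1;
        dst[i + 2] = s2;
        dst[i + 3] = s3;
    }
}
```

A C compiler is free to reschedule this, so the sketch only illustrates the
instruction order that the hand-written assembly fixes explicitly.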
Best regards,
Christophe