[FFmpeg-devel] [PATCH 2/2] tta/x86: add ff_ttafilter_process_dec_{ssse3, sse4}

Paul B Mahol onemda at gmail.com
Tue Feb 11 12:35:43 CET 2014


On 2/11/14, Christophe Gisquet <christophe.gisquet at gmail.com> wrote:
> Hi,
>
> 2014-02-11 6:02 GMT+01:00 James Almer <jamrial at gmail.com>:
>>> What did, however, affect speed negatively was calling the asm functions
>>> using all seven elements from TTAFilter as arguments, as I mentioned I'd
>>> do in my previous email. I lost about 10 cycles on Win64 and 38 on Win32
>>> just by doing that.
>>> I assume this is because of the prologue code in x86inc.
>>>
>>> I'll send an updated patch soon. If you find any dependencies, please
>>> tell me so.
>>>
>>
>> New patchset sent. Kinda bummed at the loss of performance for using seven
>> general purpose registers for the arguments, but if it's safer then it
>> can't be helped.
>
> Well, I don't feel confident, but it makes sense that it works. I don't
> know what opinion other people have, nor a way to mitigate a potential
> issue. I fear that leaving a comment next to the declaration of the TTA*
> struct about the need for a total size that is a multiple of 16, and
> making sure the table addresses in TTAFilter remain aligned, might help
> but is not failproof at all.

What about all those ALIGNED macros that are applied to various structs in the C code?
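
To illustrate what I mean, here is a rough sketch in plain C11. It uses
alignas and _Static_assert rather than any FFmpeg macro, and DemoFilter,
DEMO_ORDER and demo_filter_check are made-up names for illustration; this
is not the real TTAFilter layout. The idea is only that the compiler can
enforce both the table alignment and the "size is a multiple of 16" rule,
instead of relying on a comment:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ORDER 8 /* hypothetical filter order, not taken from the TTA code */

/* Hypothetical stand-in for a filter context; NOT the actual TTAFilter. */
typedef struct DemoFilter {
    int32_t shift, round, error, mode;  /* scalar state */
    alignas(16) int32_t qm[DEMO_ORDER]; /* tables the SIMD code would load */
    alignas(16) int32_t dx[DEMO_ORDER];
    alignas(16) int32_t dl[DEMO_ORDER];
} DemoFilter;

/* Compile-time guards: the build fails if someone breaks the layout later. */
_Static_assert(sizeof(DemoFilter) % 16 == 0,
               "DemoFilter size must be a multiple of 16");
_Static_assert(offsetof(DemoFilter, qm) % 16 == 0, "qm must be 16-byte aligned");
_Static_assert(offsetof(DemoFilter, dx) % 16 == 0, "dx must be 16-byte aligned");
_Static_assert(offsetof(DemoFilter, dl) % 16 == 0, "dl must be 16-byte aligned");

/* Returns 1 if all three tables of this instance sit on 16-byte boundaries. */
static int demo_filter_is_aligned(const DemoFilter *f)
{
    return (uintptr_t)f->qm % 16 == 0 &&
           (uintptr_t)f->dx % 16 == 0 &&
           (uintptr_t)f->dl % 16 == 0;
}

/* Runtime check, e.g. for instances whose storage came from an allocator. */
static void demo_filter_check(const DemoFilter *f)
{
    assert(demo_filter_is_aligned(f));
}
```

The static asserts cover the struct layout, so every element of an array of
such structs stays aligned as long as the array itself does. The runtime
check is only there for heap storage, where the alignment depends on the
allocator that was used.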

>
> Regarding the register dependency, it's minor compared to what I
> missed, but it may depend on how good the CPU you test on is at
> out-of-order execution, so it's always good to keep that in mind.
>
> Thanks for the reordering, it's much cleaner now. Also, you can use
> SWAP to "virtually" rename registers so as to keep consistency between
> code paths.
>
> --
> Christophe

