[FFmpeg-devel] [PATCH] lavu/x86/lls: add fma3 optimizations for update_lls
Ganesh Ajjanagadde
gajjanag at mit.edu
Thu Jan 14 17:26:50 CET 2016
On Thu, Jan 14, 2016 at 11:16 AM, James Almer <jamrial at gmail.com> wrote:
> On 1/14/2016 11:12 AM, Ganesh Ajjanagadde wrote:
>> On Thu, Jan 14, 2016 at 5:02 AM, Henrik Gramner <henrik at gramner.com> wrote:
>>> Use the x86inc syntax for FMA instructions (basically FMA4 syntax that
>>> gets assembled as FMA3) since normal FMA3 opcodes are horrible to
>>> read, nobody ever remembers the ordering of operands.
>>
>> 1. It is very easy to remember: take fmadd231pd x, y, z for instance.
>> This means 2*3 + 1, so x = y*z+x. How the macro is more readable is
>> beyond me; especially with some side cases that are undocumented, see
>> below.
>
> fmaddps dst, src1, src2, src3 is always going to be easier to read for anyone
> without having to think about what number belongs to what operation and what
> operand. And it will output either FMA4 or FMA3 depending on the value passed
> to INIT_[XY]MM.
Targeting both FMA3 and FMA4 from one source is the only benefit. Even
that is generally not a big deal; AMD quickly started supporting FMA3.
>
>> 2. If anything, the macro is harder, since it is not Intel supported,
>
> Of course it won't be there, it's not defined by them. Non-destructive four
> operand fma is defined by AMD.
Of course I know this.
>
>> I can't look it up at
>> https://www-ssl.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf.
>
> Neither are any of the dozens of other compat macros in x86utils. And many of
> them are also undocumented within x86utils. This point is absurd.
How is it absurd? You expect me to use something that lacks clear
documentation, and claim that it is "more readable". What other macros
have/lack is irrelevant to the point.
>
>> 3. The macro does not seem to take care of the mov's (if any), still
>> requiring explicit thought on the part of the programmer.
>
> Yes, and? It's not an emulation macro like the uppercase ones that become
> several instructions. It translates a single FMA4-like instruction into
> either an FMA4 or FMA3 one.
>
> fmaddps xmm0, xmm0, xmm1, xmm2
>
> becomes
>
> vfmaddps xmm0, xmm0, xmm1, xmm2 if FMA4
> vfmadd132ps xmm0, xmm2, xmm1 if FMA3
>
> If you try to use it with four different operands, it will work with FMA4
> but not FMA3, since as I said it's not trying to emulate anything.
Thanks for mentioning the convention, but it is an important one and
AFAIK is not mentioned in any documentation within FFmpeg.
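
For reference, a rough model of the two conventions discussed here, written
in Python purely for illustration. It is not the x86inc macro. The names
fma3, fma4 and to_fma3 are made up for this sketch, and only the
dst == src1 case is confirmed by the example quoted above; the dst == src2
and dst == src3 branches are my assumption of how the same idea would
extend.

```python
def fma3(order, ops):
    """Model vfmadd<order>: the digits are 1-based operand indices.

    The result is op[d1] * op[d2] + op[d3], always written to operand 1.
    """
    a, b, c = (ops[int(d) - 1] for d in order)
    return f"{ops[0]} = {a}*{b} + {c}"


def fma4(dst, src1, src2, src3):
    """Model the non-destructive four-operand form: dst = src1*src2 + src3."""
    return f"{dst} = {src1}*{src2} + {src3}"


def to_fma3(dst, src1, src2, src3):
    """Hypothetical helper: pick an FMA3 ordering for a four-operand request.

    FMA3 overwrites its first operand, so dst must alias one of the sources.
    Nothing is emulated: four distinct operands are rejected.
    """
    if dst == src1:
        return "132", (dst, src3, src2)   # case shown in the thread
    if dst == src2:
        return "213", (dst, src1, src3)   # assumption, not from the thread
    if dst == src3:
        return "231", (dst, src1, src2)   # assumption, not from the thread
    raise ValueError("dst must alias a source operand for FMA3")


if __name__ == "__main__":
    print(fma3("231", ("x", "y", "z")))
    order, ops = to_fma3("xmm0", "xmm0", "xmm1", "xmm2")
    print(f"vfmadd{order}ps " + ", ".join(ops))
```

Feeding the result of to_fma3 back into fma3 gives the same expression as
fma4 for each of the three aliasing cases, which is the property the macro
relies on.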
>
>> 4. The macro lacks documentation. In particular, it is not a thorough
>> fma4 emulation in the spirit of
>> https://gist.github.com/rygorous/22180ced9c7a00bd68dd.
>>
>> Or put in other words, IMO not good.
>
> No, it's good and what's done in every other asm file precisely for being
> more flexible and readable.
Flexibility, yes, readability still no.
> Especially since it allows one to write both
> FMA4 and FMA3 functions without duplicating code.
Fine.