[FFmpeg-devel] What new instructions would you like?

Lauri Kasanen cand at gmx.com
Sat Feb 1 19:44:30 EET 2020


On Sat, 1 Feb 2020 12:53:28 +0100
James Darnley <james.darnley at gmail.com> wrote:

> On 30/12/2019, Lauri Kasanen <cand at gmx.com> wrote:
> > For the Libre RISC-V project, I'm going to research the popular codecs
> > and design new instructions to help speed them up. With ffmpeg being
> > home to lots of asm folks for many platforms, I also want to ask your
> > opinion.
> >
> > What new instructions would you like? Anything particular you find
> > missing in existing ISAs, slow, or cumbersome?
>
> Do you mean SIMD instructions?  I have no idea what exists in RISC-V
> already or what capabilities or limitations it has, and I am going to
> use x86 language and terms such as byte, word, dword, qword.
>
> Things I have found missing in old(er) x86 instruction sets are word
> size and signed/unsigned variants of existing operations.  Some
> operations may have byte and word variants but dword and qword might
> be missing, or there might be a signed version but not an unsigned
> version (and vice versa).  A couple of things I had to emulate:
> * packed absolute value of dwords
> * packed maximum unsigned words
> * packed max and min signed dwords (I might have really wanted
> unsigned for this)
> * arithmetic right shift of qwords
> * pack dwords to words with unsigned saturation
>
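
For reference, a minimal scalar sketch in C of what a few of the
operations listed above compute per element. The function names are
made up for illustration and are not taken from ffmpeg or any ISA
manual. As far as I know, later x86 extensions did add some of these
(pabsd, pmaxuw, packusdw, and vpsraq in AVX-512), so this only models
the semantics.

```c
#include <stdint.h>

/* Hypothetical helper names; each models one SIMD element. */

/* Absolute value of a signed dword. INT32_MIN maps to 0x80000000. */
static inline uint32_t abs_dword(int32_t v)
{
    return v < 0 ? 0u - (uint32_t)v : (uint32_t)v;
}

/* Maximum of two unsigned words. */
static inline uint16_t max_uword(uint16_t a, uint16_t b)
{
    return a > b ? a : b;
}

/* Arithmetic right shift of a signed qword, n in 0..63.
 * Written without shifting a negative value, since that is
 * implementation-defined in C. */
static inline int64_t sar_qword(int64_t v, unsigned n)
{
    return v < 0 ? ~(~v >> n) : v >> n;
}

/* Pack a signed dword to an unsigned word with saturation. */
static inline uint16_t pack_usat16(int32_t v)
{
    if (v < 0)
        return 0;
    if (v > 0xffff)
        return 0xffff;
    return (uint16_t)v;
}
```

A SIMD version would apply the same function to every element of the
vector in one instruction, which is what had to be emulated with
several instructions when the variant was missing.
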
> Shuffle instructions.  pshufb is very useful and I think I read on IRC
> that arm/aarch64/neon does not have an equivalent.  (Or was that other
> shuffles?)  It allows for arbitrary reordering of bytes and setting
> bytes to 0.  On x86 it takes the shuffle pattern from another SIMD
> register but I usually use it with a constant pattern that gets loaded
> from memory.  An interesting improvement would be if you can encode 17
> * 16 (or however long your vectors might be) values in an immediate
> value so it doesn't require another register.
>
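
To make the pshufb behaviour described above concrete, here is a small
scalar model in C for a 16-byte vector. The name pshufb_ref is
hypothetical. Each control byte either selects a source byte with its
low 4 bits or, if its top bit is set, writes 0. That gives 17 possible
outcomes for each of the 16 output bytes.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of a 16-byte pshufb (hypothetical reference helper).
 * ctrl[i] bit 7 set  -> dst[i] = 0
 * otherwise          -> dst[i] = src[ctrl[i] & 15] */
static void pshufb_ref(uint8_t dst[16], const uint8_t src[16],
                       const uint8_t ctrl[16])
{
    uint8_t tmp[16]; /* temporary so dst may alias src */

    for (int i = 0; i < 16; i++)
        tmp[i] = (ctrl[i] & 0x80) ? 0 : src[ctrl[i] & 0x0f];

    memcpy(dst, tmp, sizeof(tmp));
}
```

In asm the control vector is usually a constant loaded from memory, as
the quoted text says; here it is just an array argument.
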
> Good documentation.  The Intel instruction manual has pretty good
> explanations of what the instructions do.  The old instructions from
> around the time of MMX and SSE had excellent diagrams, these might
> have been mostly for shuffle operations.  I need to look and jog my
> memory.  I think punpcklbw is an example of what I mean.  The entry in
> the manual for it has a good diagram IMO.  (At least the version I am
> currently looking at)
>
> No stupid lane stuff.  AVX2 brought us a SIMD vector length extension
> from 16 to 32 bytes.  Good, except for the stupid lanes they were
> split into, which make it hard to "mix" data from the low 0-15 bytes
> and the high 16-31 bytes.
>
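
A rough scalar model of the lane problem, assuming the 32-byte AVX2
byte shuffle (vpshufb) as the example. The name vpshufb256_ref is
hypothetical. The index in each control byte is only 4 bits wide and is
applied within the 16-byte half that the output byte belongs to, so an
output byte in the low half can never pick a source byte from the high
half, and vice versa.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of a 32-byte in-lane byte shuffle (hypothetical helper).
 * Output byte i can only read from its own 16-byte lane. */
static void vpshufb256_ref(uint8_t dst[32], const uint8_t src[32],
                           const uint8_t ctrl[32])
{
    uint8_t tmp[32]; /* temporary so dst may alias src */

    for (int i = 0; i < 32; i++) {
        int lane = i & 16; /* 0 for bytes 0-15, 16 for bytes 16-31 */
        tmp[i] = (ctrl[i] & 0x80) ? 0 : src[lane + (ctrl[i] & 0x0f)];
    }

    memcpy(dst, tmp, sizeof(tmp));
}
```

With an all-zero control vector, bytes 0-15 of the result are copies of
src[0] and bytes 16-31 are copies of src[16], rather than all 32 being
src[0].
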
> I forgot about this email for a month.  Sorry about that.  Seeing
> RISC-V in the schedule at FOSDEM reminded me about this.

Thanks for your thoughts. The project scope is both SIMD and scalar.
If, for example, there's a particular bit packing that's slow and
unparallelizable, it might benefit from a dedicated instruction.

- Lauri

