[FFmpeg-devel] [PATCH 0/5] RISC-V: Improve H264 decoding performance using RVV intrinsic

Wed May 10 11:46:57 EEST 2023

Hi Lynne

I fully respect the policy and understand the disadvantages of intrinsic
code.
That said, given that RISC-V is an open ISA with many hardware variants,
intrinsic code still has a better chance than hand-written assembly of
being optimized by the compiler for each variant.

At the moment, the intrinsic implementation is the only one available.
Rewriting it in assembly would take a significant amount of time because of
the large number of functions.

I was wondering if we could treat the intrinsic code as an initial version
for the RISC-V port, with the following modification:
    - Add an option --enable-rvv-intrinsic to EXPLICITLY enable the
intrinsic optimization, which is disabled by default.
      Given the current state of vector support in GCC, and the dislike
and limitations of intrinsics, disabling it by default seems reasonable.
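
To illustrate the opt-in, here is a minimal sketch of how the code could be
gated at compile time. It is not taken from the patch set:
HAVE_RVV_INTRINSIC is a hypothetical macro that configure would set to 1
only when --enable-rvv-intrinsic is passed, and add_bytes() is a made-up
helper used purely as an example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: configure would define this to 1 only when
 * --enable-rvv-intrinsic is given. It is off by default. */
#ifndef HAVE_RVV_INTRINSIC
#define HAVE_RVV_INTRINSIC 0
#endif

/* Use intrinsics only if they were requested AND the compiler targets V. */
#if HAVE_RVV_INTRINSIC && defined(__riscv_vector)
#include <riscv_vector.h>
#define USE_RVV_INTRINSIC 1
#else
#define USE_RVV_INTRINSIC 0
#endif

/* Made-up example helper: dst[i] += src[i], wrapping modulo 256. */
static void add_bytes(uint8_t *dst, const uint8_t *src, size_t n)
{
#if USE_RVV_INTRINSIC
    while (n > 0) {
        /* The "m1" suffix is the group multiplier (LMUL = 1); it is
         * part of every intrinsic name and vector type used here. */
        size_t vl = __riscv_vsetvl_e8m1(n);
        vuint8m1_t a = __riscv_vle8_v_u8m1(dst, vl);
        vuint8m1_t b = __riscv_vle8_v_u8m1(src, vl);
        __riscv_vse8_v_u8m1(dst, __riscv_vadd_vv_u8m1(a, b, vl), vl);
        dst += vl;
        src += vl;
        n   -= vl;
    }
#else
    /* Default path: plain C, used whenever the option is not enabled. */
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
#endif
}
```

With the option left at its default, only the plain C branch is compiled, so
a toolchain without RVV intrinsic support never sees <riscv_vector.h>.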

Those who want to be involved in optimizing the H.264 decoder on RISC-V
can then work on the assembly and decide whether to refer to the intrinsic
code.
I believe this would be a good starting point for future optimization.
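
Regarding run-time CPU detection (point 2 in the quoted message below), the
pattern in question can be sketched roughly as follows. All names here
(CPU_FLAG_RVV, detect_cpu_flags, select_add_bytes, add_bytes_rvv) are
hypothetical and are not FFmpeg's actual API. The vector function is only a
placeholder that forwards to the C version so the sketch runs anywhere. The
HWCAP check assumes a Linux kernel that reports the V extension in AT_HWCAP.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__linux__) && defined(__riscv)
#include <sys/auxv.h>
#endif

#define CPU_FLAG_RVV 1u /* hypothetical flag bit for this sketch */

typedef void (*add_bytes_fn)(uint8_t *dst, const uint8_t *src, size_t n);

/* Portable C version, always available. */
static void add_bytes_c(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
}

/* Placeholder for a vector version. In a real build this would be
 * assembly, or a separate file built with V enabled; here it simply
 * forwards to the C version. */
static void add_bytes_rvv(uint8_t *dst, const uint8_t *src, size_t n)
{
    add_bytes_c(dst, src, n);
}

/* Assumption: on Linux/RISC-V, single-letter ISA extensions are
 * reported as bits in AT_HWCAP, with 'V' at bit ('V' - 'A'). */
static unsigned detect_cpu_flags(void)
{
#if defined(__linux__) && defined(__riscv)
    if (getauxval(AT_HWCAP) & (1UL << ('V' - 'A')))
        return CPU_FLAG_RVV;
#endif
    return 0;
}

/* Pick the implementation once, at init time, from the CPU flags. */
static add_bytes_fn select_add_bytes(unsigned flags)
{
    return (flags & CPU_FLAG_RVV) ? add_bytes_rvv : add_bytes_c;
}
```

A caller would run select_add_bytes(detect_cpu_flags()) once and keep the
returned pointer. The rest of the binary is still built for the baseline
ISA, which is what lets one distribution build run on processors without a
vector unit.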

On Wed, May 10, 2023 at 12:51 AM Rémi Denis-Courmont <remi at remlab.net>
wrote:

>         Hi,
>
> On Tuesday, 9 May 2023, 12:50:25 EEST, Arnie Chang wrote:
> > We are submitting a set of patches that significantly improve H.264
> > decoding performance by utilizing RVV intrinsic code.
>
> I believe that there is a general dislike of compiler intrinsics for
> vector optimisations in FFmpeg, for a plurality of reasons. FWIW, that
> dislike is not limited to FFmpeg:
> https://www.reddit.com/r/RISCV/comments/131hlgq/comment/ji1ie3l/
> Indeed, in my personal opinion, RISC-V V intrinsics specifically are
> painful to read/write compared to assembler.
>
> On top of that, in this particular case, intrinsics have at least three,
> possibly four, additional and more objective challenges as compared to the
> existing RVV assembler:
>
> 1) They are less portable, requiring the most bleeding-edge versions of
> compilers. Case in point: our FATE GCC instance does not support them as
> of today (because Debian Unstable does not).
>
> 2) They do not work with run-time CPU detection, at least not currently.
> This is going to be a major stumbling point for Linux distributions, which
> need to build code that runs on processors without a vector unit.
>
> 3) V intrinsics require specifying the group multiplier at every
> instruction. In most cases, this is just very inconvenient. But in those
> algorithms that require a fixed vector size (e.g. Opus DSP already now),
> this simply does _not_ work.
>
> Essentially, this is the downside of relying on the compiler to do the
> register allocation.
>
> 4) (Unsure) Intrinsics are notorious for missing some code points.
>
>
> The first two points may be addressed eventually. But the third point is
> intrinsic to intrinsics (hohoho). So unless there is a case for why
> intrinsics would be all but _required_, please avoid them.
>
> Now I do realise that that means some of the code won't be
> XLEN-independent.
> Well, we can cross that bridge with macros if/when somebody actually cares
> about FFmpeg vector optimisations on RV32I.
>
> Br,
>
> --
> 雷米‧德尼-库尔蒙
> http://www.remlab.net/
>