[FFmpeg-devel] [PATCH 1/4] lavc/aarch64: new optimization for 8-bit hevc_epel_uni_v

Martin Storsjö martin at martin.st
Sun Sep 17 00:46:07 EEST 2023


On Thu, 14 Sep 2023, Logan.Lyu wrote:

> Hi Martin,
>
> You can try the attached patchset. If that doesn't work, My code branch 
> address is https://github.com/myais2023/FFmpeg/tree/hevc-aarch64

Thanks for the patches. Functionally, they seem to work, and the issues i 
saw in the code are relatively minor. Unfortunately, some of the issues 
are issues that we've been through in many earlier patches, so I would 
hope that you would pay attention to them in the future before posting 
more patches.


In patch 1, you've got a bunch of sxtw instructions for src/dst stride 
parameters that have the type ptrdiff_t - that shouldn't be necessary?

In patch 2, you're moving the macros calc_epelh, calc_epelh2, 
load_epel_filterh - can you split out the move into a separate commit? 
(This isn't strictly necessary but would make things even clearer.)

In patch 2, you're storing below the stack, then decrementing it 
afterwards - e.g. like this:

> +        stp             x0, x30, [sp, #-16]
> +        stp             x1, x2, [sp, #-32]
> +        stp             x3, x4, [sp, #-48]
> +        stp             x5, x6, [sp, #-64]!

Please change that so that you're first predecrementing the whole area, 
then storing the other elements above that stack pointer, e.g. like this:

stp x0, x30, [sp, #-64]!
stp x1, x2, [sp, #16]
stp x3, x4, [sp, #32]

etc.

The same issue also appears in variouos places within functions like this:

> +        stp             x0, x1, [sp, #-16]
> +        stp             x4, x6, [sp, #-32]
> +        stp             xzr, x30, [sp, #-48]!

Please fix all of these cases - you can search through your patches for 
anything related to storing on the stack. Also, storing xzr here seems 
superfluous - if you've got an odd number of registers to store, just make 
one instruction str instead of stp (but keep the stack aligned).

Then in patch 4, you've got yet another pattern for doing these stores, 
where you have superfluous consecutive stack decrements like this:

> +        stp             x6, x30, [sp, #-16]!
> +        mov             x7, #16
> +        stp             x0, x1, [sp, #-16]!
> +        stp             x2, x3, [sp, #-16]!
> +        stp             x4, x5, [sp, #-16]!

Please just do one stack decrement covering all the stack space you need.

I believe these issues have been raised in earlier reviews as well.

// Martin



More information about the ffmpeg-devel mailing list