[FFmpeg-devel] [aarch64] yuv2planeX - unroll outer loop by 4 to increase performance by 6.3%

Thu Sep 3 18:51:58 EEST 2020

On Wed, Aug 19, 2020 at 6:55 AM Michael Niedermayer <michael at niedermayer.cc> wrote:

> faster is better, obviously, so if it's tested with odd sizes and ARM
> developers had a chance to comment, it should be ok

Hi, I'm looking for feedback from the ARM maintainers on the attached patch.
Is it OK to commit?

Thanks,
Sebastian

On Wed, Aug 19, 2020 at 1:37 PM Sebastian Pop <sebpop at gmail.com> wrote:

> Thanks Michael for your feedback.
>
> On Wed, Aug 19, 2020 at 6:55 AM Michael Niedermayer <michael at niedermayer.cc> wrote:
>
>> faster is better, obviously, so if it's tested with odd sizes and ARM
>> developers had a chance to comment, it should be ok
>>
>>
> The current patch was tested with `make check` on an Arm64 (AWS Graviton2) machine.
> I have also tested randomly selected rescale factors, for example:
> ./ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1023x42,bench=stop -f null -
>
>
>> one potential improvement is to use the unrolled code for odd widths
>> too, and use the non-unrolled code for the end
>>
>
> Done.  Please see the amended patch.
>
> Thanks,
> Sebastian
>
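The patch itself was scrubbed from this archive. The following scalar C sketch illustrates the two ideas discussed above. First, the outer per-pixel loop is unrolled by 4, so that each filter coefficient is loaded once and reused for four output pixels. Second, the leftover dstW % 4 pixels are handled by a non-unrolled tail loop. The sketch assumes the loop shape of swscale's generic 8-bit C vertical filter: an outer loop over output pixels, an inner loop over filter taps, an 8-entry dither table, and a result that is shifted right by 19 and clamped. This is not the AArch64 NEON code from the patch, and the names planex_ref, planex_unroll4 and clip_u8 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp to [0, 255]; a local stand-in for FFmpeg's av_clip_uint8. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Reference (hypothetical name): one output pixel per outer iteration. */
static void planex_ref(const int16_t *filter, int filterSize,
                       const int16_t **src, uint8_t *dest, int dstW,
                       const uint8_t *dither, int offset)
{
    assert(filterSize >= 1);
    for (int i = 0; i < dstW; i++) {
        int val = dither[(i + offset) & 7] << 12;
        for (int j = 0; j < filterSize; j++)
            val += src[j][i] * filter[j];
        dest[i] = clip_u8(val >> 19);
    }
}

/* Outer loop unrolled by 4, with a non-unrolled tail (hypothetical name). */
static void planex_unroll4(const int16_t *filter, int filterSize,
                           const int16_t **src, uint8_t *dest, int dstW,
                           const uint8_t *dither, int offset)
{
    assert(filterSize >= 1);
    int i = 0;

    /* Main loop: 4 pixels per iteration; each filter[j] is read once
     * and applied to all four accumulators. */
    for (; i + 4 <= dstW; i += 4) {
        int v0 = dither[(i + 0 + offset) & 7] << 12;
        int v1 = dither[(i + 1 + offset) & 7] << 12;
        int v2 = dither[(i + 2 + offset) & 7] << 12;
        int v3 = dither[(i + 3 + offset) & 7] << 12;
        for (int j = 0; j < filterSize; j++) {
            const int f = filter[j];
            const int16_t *s = src[j] + i;
            v0 += s[0] * f;
            v1 += s[1] * f;
            v2 += s[2] * f;
            v3 += s[3] * f;
        }
        dest[i + 0] = clip_u8(v0 >> 19);
        dest[i + 1] = clip_u8(v1 >> 19);
        dest[i + 2] = clip_u8(v2 >> 19);
        dest[i + 3] = clip_u8(v3 >> 19);
    }

    /* Tail: the remaining dstW % 4 pixels (0 to 3), one at a time. */
    for (; i < dstW; i++) {
        int val = dither[(i + offset) & 7] << 12;
        for (int j = 0; j < filterSize; j++)
            val += src[j][i] * filter[j];
        dest[i] = clip_u8(val >> 19);
    }
}
```

Take the width 1023 from the test command above. The unrolled loop covers 1020 pixels and the tail loop covers the last 3, so both paths are exercised. Comparing the two functions at every width from 0 upward catches tail-handling mistakes of the kind the "odd sizes" remark is aimed at.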
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-aarch64-yuv2planeX-unroll-outer-loop-by-4-increases-.patch
Type: application/octet-stream
Size: 7488 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20200903/1e703b72/attachment.obj>