[FFmpeg-devel] [aarch64] yuv2planeX - unroll outer loop by 4 to increase performance by 6.3%

Sebastian Pop sebpop at gmail.com
Wed Aug 19 21:37:00 EEST 2020


Thanks Michael for your feedback.

On Wed, Aug 19, 2020 at 6:55 AM Michael Niedermayer <michael at niedermayer.cc>
wrote:

> faster is better obviously, so if it's tested with odd sizes and arm
> developers had a chance to comment, it should be ok
>

The current patch was tested with `make check` on an Arm64 Graviton2.
I have also tested randomly selected rescale factors, for example:

./ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1023x42,bench=stop -f null -


> one potential improvement is to use the unrolled code for odd width
> too and use the non unrolled for the end
>

Done.  Please see the amended patch.
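
For readers following along, the structure suggested above (unrolled code for
the bulk of the width, non-unrolled code for the leftover pixels) can be
sketched in C roughly as follows. This is only an illustration, not the NEON
assembly from the attached patch: the function names, the simplified vertical
filter without dithering, and the choice to unroll over single pixels are all
assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical helper: clamp an int to the 8-bit range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Simplified vertical filter for one output pixel.
 * Assumption: dithering is left out to keep the sketch short. */
static uint8_t filter_one(const int16_t *filter, int filter_size,
                          const int16_t **src, int i)
{
    int val = 0;
    for (int j = 0; j < filter_size; j++)
        val += src[j][i] * filter[j];
    return clip_u8(val >> 19);
}

/* Reference version: one pixel per outer-loop iteration. */
static void plane_ref(const int16_t *filter, int filter_size,
                      const int16_t **src, uint8_t *dest, int dst_w)
{
    for (int i = 0; i < dst_w; i++)
        dest[i] = filter_one(filter, filter_size, src, i);
}

/* Outer loop unrolled by 4, with a non-unrolled tail so that widths
 * that are not a multiple of 4 still use the unrolled body for most
 * of the row. */
static void plane_unroll4(const int16_t *filter, int filter_size,
                          const int16_t **src, uint8_t *dest, int dst_w)
{
    int i = 0;

    /* Main loop: runs while at least 4 pixels remain. */
    for (; i + 4 <= dst_w; i += 4) {
        dest[i + 0] = filter_one(filter, filter_size, src, i + 0);
        dest[i + 1] = filter_one(filter, filter_size, src, i + 1);
        dest[i + 2] = filter_one(filter, filter_size, src, i + 2);
        dest[i + 3] = filter_one(filter, filter_size, src, i + 3);
    }

    /* Tail: at most 3 leftover pixels, handled one at a time. */
    for (; i < dst_w; i++)
        dest[i] = filter_one(filter, filter_size, src, i);
}
```

Both functions should produce identical output for every width; only the
loop structure differs.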

Thanks,
Sebastian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-aarch64-yuv2planeX-unroll-outer-loop-by-4-increases-.patch
Type: application/octet-stream
Size: 7488 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20200819/73e0cd3a/attachment.obj>

