[FFmpeg-devel] [aarch64] yuv2planeX - unroll outer loop by 4 to increase performance by 6.3%
Sebastian Pop
sebpop at gmail.com
Wed Aug 19 21:37:00 EEST 2020
Thanks, Michael, for your feedback.
On Wed, Aug 19, 2020 at 6:55 AM Michael Niedermayer <michael at niedermayer.cc>
wrote:
> faster is better obviously, so if it's tested with odd sizes and arm
> developers had a chance to comment, it should be ok
>
The current patch was tested with `make check` on an Arm64 Graviton2 machine.
I have also tested randomly selected rescale factors, for example:
./ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1023x42,bench=stop -f null -
> one potential improvement is to use the unrolled code for odd width
> too and use the non unrolled for the end
>
Done. Please see the amended patch.
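
For readers without the attachment at hand, the general idea (run the unrolled loop over as much of the width as possible, then finish with the non-unrolled loop) can be sketched in plain C as below. This is only an illustration, not the attached aarch64 assembly: the names `plane_x_ref`, `plane_x_unroll4`, `clip_u8` and `outputs_match`, the test data, and the shift amounts are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_W 1024
#define TAPS  4

/* Hypothetical helper: clamp an int to the 0..255 range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Reference version: one output pixel per outer-loop iteration. */
static void plane_x_ref(const int16_t *filter, int filter_size,
                        const int16_t *const *src, uint8_t *dest, int dst_w,
                        const uint8_t *dither, int offset)
{
    for (int i = 0; i < dst_w; i++) {
        int val = dither[(i + offset) & 7] << 12;   /* assumed scaling */
        for (int j = 0; j < filter_size; j++)
            val += src[j][i] * filter[j];
        dest[i] = clip_u8(val >> 19);
    }
}

/* Outer loop unrolled by 4, with a non-unrolled loop for the tail. */
static void plane_x_unroll4(const int16_t *filter, int filter_size,
                            const int16_t *const *src, uint8_t *dest, int dst_w,
                            const uint8_t *dither, int offset)
{
    int i = 0;

    /* Main loop: four output pixels share one pass over the filter taps. */
    for (; i + 4 <= dst_w; i += 4) {
        int v[4];
        for (int k = 0; k < 4; k++)
            v[k] = dither[(i + k + offset) & 7] << 12;
        for (int j = 0; j < filter_size; j++) {
            const int16_t *s = src[j] + i;
            const int f = filter[j];
            v[0] += s[0] * f;
            v[1] += s[1] * f;
            v[2] += s[2] * f;
            v[3] += s[3] * f;
        }
        for (int k = 0; k < 4; k++)
            dest[i + k] = clip_u8(v[k] >> 19);
    }

    /* Tail: at most three remaining pixels, handled one at a time. */
    for (; i < dst_w; i++) {
        int val = dither[(i + offset) & 7] << 12;
        for (int j = 0; j < filter_size; j++)
            val += src[j][i] * filter[j];
        dest[i] = clip_u8(val >> 19);
    }
}

/* Returns 1 if both versions produce identical output for width dst_w. */
static int outputs_match(int dst_w)
{
    static int16_t rows[TAPS][MAX_W];
    static uint8_t a[MAX_W], b[MAX_W];
    static const int16_t filter[TAPS] = { 512, 1536, 1536, 512 };
    static const uint8_t dither[8] = { 0, 32, 8, 40, 2, 34, 10, 42 };
    const int16_t *src[TAPS];

    assert(dst_w >= 0 && dst_w <= MAX_W);
    for (int j = 0; j < TAPS; j++) {
        for (int i = 0; i < MAX_W; i++)
            rows[j][i] = (int16_t)((i * 37 + j * 101) % 16384); /* made-up data */
        src[j] = rows[j];
    }
    memset(a, 0, sizeof(a));
    memset(b, 0, sizeof(b));
    plane_x_ref(filter, TAPS, src, a, dst_w, dither, 0);
    plane_x_unroll4(filter, TAPS, src, b, dst_w, dither, 0);
    /* Comparing the whole buffers also catches writes past dst_w. */
    return memcmp(a, b, sizeof(a)) == 0;
}
```

Calling `outputs_match(1023)` compares the two versions at a width that is not a multiple of 4, so the tail loop handles the last three pixels.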
Thanks,
Sebastian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-aarch64-yuv2planeX-unroll-outer-loop-by-4-increases-.patch
Type: application/octet-stream
Size: 7488 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20200819/73e0cd3a/attachment.obj>