[FFmpeg-devel] [PATCH] PPC64: Add versions of functions in libswscale/input.c optimized for POWER8 VSX SIMD.
Dan Parrot
dan.parrot at mail.com
Tue Jul 5 05:29:46 EEST 2016
On Mon, 2016-07-04 at 16:30 +0000, Carl Eugen Hoyos wrote:
> Dan Parrot <dan.parrot <at> mail.com> writes:
>
> > > Did you test if using ffmpeg -benchmark -f rawvideo -i /dev/zero...
> > > showed different results?
> > > I believe this should be both easier and faster to test.
> >
> > Sorry, I don't understand what that command line just above
> > is trying to achieve. Could you elaborate?
>
> Instead of running the whole fate suite that takes long and
> does not test libswscale for most commands, just test an
> ffmpeg command line that only tests libswscale:
> $ ffmpeg -benchmark -f rawvideo -pix_fmt rgb24
> -i /dev/zero -pix_fmt yuv420p -f null -vframes 10000 -
$ ./ffmpeg -benchmark -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero \
    -pix_fmt yuv420p -f null -vframes 1000 -
frame= 1000 fps= 16 q=-0.0 Lsize=N/A time=00:00:40.00 bitrate=N/A speed=0.632x
video:477kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
bench: utime=62.794s
bench: maxrss=21184kB
> vs
>
> $ ffmpeg -cpuflags 0 -benchmark -f rawvideo -pix_fmt rgb24
> -i /dev/zero -pix_fmt yuv420p -f null -vframes 10000 -
$ ./ffmpeg -cpuflags 0 -benchmark -f rawvideo -pix_fmt rgb24 -s hd1080 \
    -i /dev/zero -pix_fmt yuv420p -f null -vframes 1000 -
frame= 1000 fps= 12 q=-0.0 Lsize=N/A time=00:00:40.00 bitrate=N/A speed=0.479x
video:477kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
bench: utime=82.918s
bench: maxrss=21120kB
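
To put the two runs side by side, here is a small sketch that computes the
ratio from the utime figures printed above. The variable names are only
illustrative, and a single run of each command is not a rigorous measurement;
it only shows that the default build is about 1.3x faster in user time than
the -cpuflags 0 build on this input.

```python
# utime values copied from the two benchmark runs above
# (1000 frames, rgb24 -> yuv420p, hd1080, input from /dev/zero).
utime_default_s = 62.794     # default CPU flags
utime_cpuflags0_s = 82.918   # -cpuflags 0 (CPU-specific optimizations disabled)

# Speedup: how many times faster the default build is in user time.
utime_speedup = utime_cpuflags0_s / utime_default_s

# Fraction of user time saved relative to the -cpuflags 0 run.
utime_saved = 1.0 - utime_default_s / utime_cpuflags0_s

print(f"utime speedup: {utime_speedup:.2f}x")
print(f"user time saved: {utime_saved:.1%}")
```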
> [...]
>
> > Surprisingly, gcc is producing some badly suboptimal assembly.
>
> Just to make sure I don't misunderstand:
> Does this mean intrinsics are suboptimal to write assembly
> code?
So, the latest version of GCC does produce more efficient assembly.
To recap: GCC 5.3.1 produces assembly that does not take full advantage
of PPC64 POWER8 SIMD instructions. GCC 6.1.1 is much better and produces
shorter sequences that do use SIMD assembly instructions.
> > > Can you confirm with START_TIMER / STOP_TIMER that there is no
> > > gain?
> >
> > SystemTap probes provide identical functionality by measuring
> > deltas between function entry and function return.
>
> Sorry, I don't understand:
> Did you test with both methods to verify that they provide
> the same results?
> Note that if it turns out that START_TIMER / STOP_TIMER
> cannot be used on ppc64 (le) this would be important
> information for us.
These start/stop macros are the last issue I have outstanding. I hope to
be done in a few hours.
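
For readers unfamiliar with the approach, both START_TIMER / STOP_TIMER and
the SystemTap probes mentioned above come down to recording a timestamp when a
region is entered, another when it is left, and collecting the differences.
The sketch below shows that idea in Python only. It is not FFmpeg's macros and
not SystemTap; `timed`, `samples` and `convert_row` are hypothetical names
made up for the illustration.

```python
import time
from contextlib import contextmanager

samples = {}  # label -> list of elapsed times in nanoseconds

@contextmanager
def timed(label):
    """Record the delta between entering and leaving the with-block."""
    start = time.perf_counter_ns()
    try:
        yield
    finally:
        elapsed = time.perf_counter_ns() - start
        samples.setdefault(label, []).append(elapsed)

def convert_row(width):
    # Hypothetical stand-in for the function being measured.
    return [(3 * x) >> 2 for x in range(width)]

# Measure many calls so that a single noisy sample does not dominate.
for _ in range(100):
    with timed("convert_row"):
        convert_row(1920)

deltas = samples["convert_row"]
print(f"{len(deltas)} runs, mean {sum(deltas) / len(deltas):.0f} ns")
```

Whether two such methods agree on ppc64 (le) is exactly the open question in
this thread; the sketch makes no claim about that.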