[FFmpeg-devel] [PATCH] Some ARM VFP optimizations (vector_fmul, vector_fmul_reverse, float_to_int16)
Siarhei Siamashka
siarhei.siamashka
Sun Apr 20 17:41:04 CEST 2008
Hello,
Here is a patch which adds some initial optimizations for ARM VFP (the
floating-point coprocessor available in some ARM11 cores).
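For reference, here is a minimal scalar sketch of what the three functions named in the subject compute. It assumes the dsputil semantics of the time (in-place multiply for vector_fmul, second operand read backwards for vector_fmul_reverse). The `_ref` names are hypothetical, and the rounding and scaling in float_to_int16_ref are assumptions for illustration, not necessarily what the FFmpeg C code or the attached patch does.

```c
#include <stdint.h>

/* dst[i] = dst[i] * src[i] (in-place element-wise multiply) */
static void vector_fmul_ref(float *dst, const float *src, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] *= src[i];
}

/* dst[i] = src0[i] * src1[len - 1 - i] (src1 is read backwards) */
static void vector_fmul_reverse_ref(float *dst, const float *src0,
                                    const float *src1, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src0[i] * src1[len - 1 - i];
}

/*
 * Convert floats to signed 16-bit samples.
 * Assumptions: input is already scaled to the int16 range, values are
 * saturated, and rounding is half away from zero. NaN is not handled.
 */
static void float_to_int16_ref(int16_t *dst, const float *src, int len)
{
    for (int i = 0; i < len; i++) {
        float v = src[i];
        if (v > 32767.0f)
            v = 32767.0f;
        if (v < -32768.0f)
            v = -32768.0f;
        dst[i] = (int16_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
    }
}
```

Scalar versions like these are convenient as a baseline when checking an optimized routine for identical output on the same input buffers.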
The standard FFmpeg regression tests pass (switching to
ALT_BITSTREAM_READER is needed for that though, because
A32_BITSTREAM_READER does not work with the flashsv decoder; that is not
an ARM-specific problem and can be reproduced on x86 too).
Also my additional test program 'test-vfp.c' from
https://garage.maemo.org/plugins/scmsvn/viewcvs.php/trunk/libavcodec/tests/?root=mplayer
runs successfully and checks performance, correctness, and the absence of
any memory accesses outside the buffers.
Right now I'm more interested in getting ARM VFP support into the FFmpeg build
infrastructure (configure script, etc.). More optimizations will follow
(vector_fmul_add_add, vorbis_inverse_coupling, imdct/fft, ...).
Also there are many data cache misses in the vorbis decoding code. Reducing
memory use (if that is possible, of course) may improve performance. Adding
prefetch instructions to the ARM VFP optimized functions might also help, but
the PLD instruction has no effect in the OS2008 firmware (I'm currently
researching this particular problem and I think I already know what causes it).
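To illustrate the prefetch idea in C rather than assembly, here is a minimal sketch using GCC's __builtin_prefetch, which GCC typically lowers to PLD on ARM targets that have it. The function name, the 32-byte cache line size and the prefetch distance are all assumptions for illustration and would need tuning on real hardware; this is not taken from the attached patch.

```c
/* Assumption: 32-byte data cache lines, i.e. 8 floats per line. */
#define CACHE_LINE_FLOATS 8
/* Assumption: prefetch four cache lines ahead; needs tuning. */
#define PREFETCH_AHEAD (4 * CACHE_LINE_FLOATS)

/* Hypothetical variant of vector_fmul (dst[i] *= src[i]) with
 * software prefetch hints issued once per cache line. */
static void vector_fmul_prefetch(float *dst, const float *src, int len)
{
    for (int i = 0; i < len; i++) {
        /* Only hint at addresses that are still inside the buffers. */
        if (i % CACHE_LINE_FLOATS == 0 && i + PREFETCH_AHEAD < len) {
            __builtin_prefetch(&src[i + PREFETCH_AHEAD], 0, 3); /* read  */
            __builtin_prefetch(&dst[i + PREFETCH_AHEAD], 1, 3); /* write */
        }
        dst[i] *= src[i];
    }
}
```

The hints do not change the result, so any benefit has to be measured on the target; if the prefetch instruction is effectively a no-op on the platform, timings should be unchanged.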
Current benchmark results (Nokia N810, OS2008 firmware, ARM11 400MHz):
64-kbit ogg vorbis file ('ffmpeg -benchmark -i test64.ogg -f null /dev/null')
sample file decoding time before patch, in seconds:
16.227 16.289 16.242 (average 16.253, stddev 0.032)
sample file decoding time after patch, in seconds:
14.406 14.336 14.281 (average 14.341, stddev 0.063)
that's ~11.8% less decoding time overall (about a 1.13x speedup)
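The averages and the relative change can be recomputed from the three runs listed for each case. The sketch below does that; the helper names are hypothetical, and "improvement" is expressed in two ways (reduction in time and speedup factor) since the two give different percentages.

```c
#include <stddef.h>

/* Timings of the three runs, copied from the benchmark above. */
static const double runs_before[3] = { 16.227, 16.289, 16.242 };
static const double runs_after[3]  = { 14.406, 14.336, 14.281 };

/* Arithmetic mean of n values. */
static double mean(const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum / (double)n;
}

/* Percentage of decoding time saved: 100 * (before - after) / before. */
static double time_reduction_percent(double before, double after)
{
    return 100.0 * (before - after) / before;
}

/* Speedup factor: before / after. */
static double speedup(double before, double after)
{
    return before / after;
}
```

For example, time_reduction_percent(mean(runs_before, 3), mean(runs_after, 3)) gives the share of decoding time saved by the patch.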
Report from oprofile before patch:
samples  %        image name   symbol name
61766    18.9862  ffmpeg_g     ff_imdct_calc
55035    16.9171  ffmpeg_g     ff_fft_calc_c
33879    10.4140  ffmpeg_g     vorbis_decode_frame
31120    9.5659   ffmpeg_g     ff_vector_fmul_add_add_c
21592    6.6371   ffmpeg_g     vorbis_inverse_coupling
20999    6.4549   ffmpeg_g     vector_fmul_c
18154    5.5803   ffmpeg_g     vector_fmul_reverse_c
17366    5.3381   ffmpeg_g     pcm_encode_frame
13632    4.1903   ffmpeg_g     ff_float_to_int16_c
11375    3.4965   ffmpeg_g     ff_vorbis_floor1_render_list
6839     2.1022   libc-2.5.so  memset
5367     1.6498   ffmpeg_g     vorbis_floor1_decode
4193     1.2889   libc-2.5.so  memcpy
2350     0.7224   ffmpeg_g     output_packet
2216     0.6812   ffmpeg_g     main
1423     0.4374   libc-2.5.so  _int_malloc
975      0.2997   ffmpeg_g     __aeabi_idiv
960      0.2951   ffmpeg_g     __udivsi3
951      0.2923   ffmpeg_g     __divdi3
935      0.2874   ffmpeg_g     compute_pkt_fields
909      0.2794   libc-2.5.so  memalign
866      0.2662   libc-2.5.so  malloc_consolidate
824      0.2533   ffmpeg_g     av_rescale_rnd
782      0.2404   libc-2.5.so  _int_free
717      0.2204   ffmpeg_g     compute_pkt_fields2
663      0.2038   ffmpeg_g     av_interleaved_write_frame
637      0.1958   libc-2.5.so  _int_memalign
588      0.1807   ffmpeg_g     av_read_frame_internal
553      0.1700   ffmpeg_g     build_table
505      0.1552   ffmpeg_g     av_interleave_packet_per_dts
432      0.1328   ffmpeg_g     ogg_packet
385      0.1183   ffmpeg_g     .plt
381      0.1171   ffmpeg_g     ogg_read_packet
352      0.1082   ffmpeg_g     __gnu_ldivmod_helper
282      0.0867   ffmpeg_g     avcodec_decode_audio2
254      0.0781   libc-2.5.so  select
237      0.0729   libc-2.5.so  free
Report from oprofile after patch:
samples  %        image name    symbol name
59798    20.6286  ffmpeg_g.vfp  ff_imdct_calc
54855    18.9234  ffmpeg_g.vfp  ff_fft_calc_c
33664    11.6131  ffmpeg_g.vfp  vorbis_decode_frame
32138    11.0867  ffmpeg_g.vfp  ff_vector_fmul_add_add_c
21674    7.4769   ffmpeg_g.vfp  vorbis_inverse_coupling
17204    5.9349   ffmpeg_g.vfp  pcm_encode_frame
11785    4.0655   ffmpeg_g.vfp  ff_vorbis_floor1_render_list
7472     2.5776   ffmpeg_g.vfp  float_to_int16_vfp
6731     2.3220   libc-2.5.so   memset
6678     2.3037   ffmpeg_g.vfp  vector_fmul_vfp
5284     1.8228   ffmpeg_g.vfp  vorbis_floor1_decode
4820     1.6628   ffmpeg_g.vfp  vector_fmul_reverse_vfp
3975     1.3713   libc-2.5.so   memcpy
2358     0.8134   ffmpeg_g.vfp  output_packet
2239     0.7724   ffmpeg_g.vfp  main
1461     0.5040   libc-2.5.so   _int_malloc
1247     0.4302   ffmpeg_g.vfp  __divdi3
1078     0.3719   ffmpeg_g.vfp  __udivsi3
1059     0.3653   ffmpeg_g.vfp  __aeabi_idiv
881      0.3039   ffmpeg_g.vfp  compute_pkt_fields
805      0.2777   libc-2.5.so   malloc_consolidate
744      0.2567   libc-2.5.so   memalign
714      0.2463   libc-2.5.so   _int_free
679      0.2342   ffmpeg_g.vfp  compute_pkt_fields2
616      0.2125   libc-2.5.so   _int_memalign
589      0.2032   ffmpeg_g.vfp  av_interleaved_write_frame
565      0.1949   ffmpeg_g.vfp  av_rescale_rnd
550      0.1897   ffmpeg_g.vfp  build_table
537      0.1852   ffmpeg_g.vfp  av_interleave_packet_per_dts
504      0.1739   ffmpeg_g.vfp  av_read_frame_internal
501      0.1728   ffmpeg_g.vfp  ogg_packet
382      0.1318   ffmpeg_g.vfp  avcodec_decode_audio2
339      0.1169   ffmpeg_g.vfp  .plt
321      0.1107   ffmpeg_g.vfp  av_get_bits_per_sample
293      0.1011   ffmpeg_g.vfp  ogg_read_packet
278      0.0959   ffmpeg_g.vfp  __aeabi_uidivmod
275      0.0949   libc-2.5.so   select
272      0.0938   ffmpeg_g.vfp  __gnu_ldivmod_helper
243      0.0838   libm-2.5.so   lrintf
236      0.0814   libc-2.5.so   free
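The two reports can be compared per function by looking at the raw sample counts of the three routines that were replaced. The sketch below does that with the counts copied from the reports; the struct and helper names are hypothetical, and the comparison assumes both profiles were taken with the same oprofile sampling setup on the same input, which is not stated explicitly here.

```c
#include <stddef.h>

/* Sample counts for the three replaced functions (C version before
 * the patch, VFP version after), copied from the oprofile reports. */
struct profile_row {
    const char *name;
    long samples_before;
    long samples_after;
};

static const struct profile_row optimized_rows[] = {
    { "vector_fmul",         20999, 6678 },
    { "vector_fmul_reverse", 18154, 4820 },
    { "float_to_int16",      13632, 7472 },
};

#define N_ROWS (sizeof(optimized_rows) / sizeof(optimized_rows[0]))

/* Ratio of samples before/after for one function; a rough per-function
 * speedup estimate under the same-sampling assumption. */
static double sample_ratio(const struct profile_row *r)
{
    return (double)r->samples_before / (double)r->samples_after;
}

/* Sum of samples over the three functions; pass 0 for the "before"
 * column and non-zero for the "after" column. */
static long total_samples(int after)
{
    long sum = 0;
    for (size_t i = 0; i < N_ROWS; i++)
        sum += after ? optimized_rows[i].samples_after
                     : optimized_rows[i].samples_before;
    return sum;
}
```

The remaining top entries (ff_imdct_calc, ff_fft_calc_c, ff_vector_fmul_add_add_c, vorbis_inverse_coupling) have nearly unchanged sample counts in both reports, consistent with those functions not being touched by this patch.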
--
Best regards,
Siarhei Siamashka
-------------- next part --------------
A non-text attachment was scrubbed...
Name: armvfp.diff
Type: text/x-diff
Size: 11320 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080420/f1777ecb/attachment.diff>