[FFmpeg-devel] [PATCHv2] avutil/lls: speed up performance of solve_lls
Michael Niedermayer
michaelni at gmx.at
Wed Nov 25 12:29:07 CET 2015
On Tue, Nov 24, 2015 at 10:13:22PM -0500, Ganesh Ajjanagadde wrote:
> This is a trivial rewrite of the loops that results in better
> prefetching and associated cache efficiency. Essentially, the problem is
> that modern prefetching logic is based on finite-state Markov memory, a reasonable
> assumption that is also used elsewhere in CPUs, for instance in branch
> predictors.
>
> Surrounding loops all iterate forward through the array, making the
> predictor think of prefetching in the forward direction, but the
> intermediate loop is unnecessarily in the backward direction.
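As a rough illustration of the kind of rewrite described above, here is a minimal sketch in C. It is not the code from libavutil/lls.c: the function names `reduce_backward` and `reduce_forward`, the row arguments, and the loop body are all assumptions made for the example. Both variants compute the same reduction and differ only in the direction in which memory is visited.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from libavutil/lls.c.
 * Subtracts the dot product of the first n entries of two rows from an
 * initial value, walking the rows from the highest index down to 0. */
static double reduce_backward(double init, const double *row_i,
                              const double *row_j, size_t n)
{
    double sum = init;
    for (size_t k = n; k-- > 0; )
        sum -= row_i[k] * row_j[k];
    return sum;
}

/* Same reduction, walking the rows from index 0 upward, which matches the
 * access direction of surrounding loops that also iterate forward. */
static double reduce_forward(double init, const double *row_i,
                             const double *row_j, size_t n)
{
    double sum = init;
    for (size_t k = 0; k < n; k++)
        sum -= row_i[k] * row_j[k];
    return sum;
}
```

Because floating-point addition is not associative, the two orders can differ in the last bits for general inputs; they agree exactly when every intermediate value is exactly representable.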
>
> Speedup is nontrivial, roughly 3% to 7% fewer decicycles per call in the
> runs below. Benchmarks were obtained with 10^6 iterations within solve_lls,
> using START/STOP_TIMER. File is tests/data/fate/flac-16-lpc-cholesky.err.
> Hardware: x86-64, Haswell, GNU/Linux.
>
> new:
> 17291 decicycles in solve_lls, 2096706 runs, 446 skips
> 17255 decicycles in solve_lls, 4193657 runs, 647 skips
> 17231 decicycles in solve_lls, 8384997 runs, 3611 skips
> 17189 decicycles in solve_lls,16771010 runs, 6206 skips
> 17132 decicycles in solve_lls,33544757 runs, 9675 skips
> 17092 decicycles in solve_lls,67092404 runs, 16460 skips
> 17058 decicycles in solve_lls,134188213 runs, 29515 skips
>
> old:
> 18009 decicycles in solve_lls, 2096665 runs, 487 skips
> 17805 decicycles in solve_lls, 4193320 runs, 984 skips
> 17779 decicycles in solve_lls, 8386855 runs, 1753 skips
> 18289 decicycles in solve_lls,16774280 runs, 2936 skips
> 18158 decicycles in solve_lls,33548104 runs, 6328 skips
> 18420 decicycles in solve_lls,67091793 runs, 17071 skips
> 18310 decicycles in solve_lls,134187219 runs, 30509 skips
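To put the two tables side by side, the relative saving can be computed row by row. The helper below is a small sketch; the name `percent_reduction` is hypothetical and is not part of FFmpeg.

```c
/* Hypothetical helper: percentage by which new_cost is lower than old_cost.
 * Both arguments are average decicycle counts as reported by the timer. */
static double percent_reduction(double old_cost, double new_cost)
{
    return 100.0 * (old_cost - new_cost) / old_cost;
}
```

For the last row of each table, `percent_reduction(18310, 17058)` comes out at about 6.8%; for the first row, `percent_reduction(18009, 17291)` is about 4.0%.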
>
> Reviewed-by: Michael Niedermayer <michael at niedermayer.cc>
> Signed-off-by: Ganesh Ajjanagadde <gajjanagadde at gmail.com>
> ---
> libavutil/lls.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
LGTM
thx
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Those who are best at talking, realize last or never when they are wrong.