[FFmpeg-devel] [PATCH] H264 MC8 SSSE3 minor speedups
Michael Niedermayer
michaelni
Sat Dec 18 03:50:01 CET 2010
On Fri, Dec 17, 2010 at 08:28:55PM -0500, Ronald S. Bultje wrote:
> Hi,
>
> On Sat, Aug 21, 2010 at 1:18 PM, Loren Merritt <lorenm at u.washington.edu> wrote:
> > On Sat, 21 Aug 2010, Ronald S. Bultje wrote:
> >
> >> 604 dezicycles in w=8, 65535 runs, 1 skips
> >> 603 dezicycles in w=8, 131067 runs, 5 skips
> >> 606 dezicycles in w=8, 262137 runs, 7 skips
> >> 606 dezicycles in w=8, 524275 runs, 13 skips
> >> 605 dezicycles in w=8, 1048552 runs, 24 skips
> >
> > Bad benchmark technique. You should report only the last dezicycle line
> > (i.e. the one with the highest # of runs, which includes all the previous
> > data). But run the whole program multiple times, and report the last line
> > from each.
>
> Late...
>
> first change (movq+mohlhps -> movdqa), before:
> 532 dezicycles in mc8, 524271 runs, 17 skips
> 532 dezicycles in mc8, 524273 runs, 15 skips
> 539 dezicycles in mc8, 524267 runs, 21 skips
> 537 dezicycles in mc8, 524272 runs, 16 skips
> 532 dezicycles in mc8, 524274 runs, 14 skips
> 538 dezicycles in mc8, 524274 runs, 14 skips
> after:
> 533 dezicycles in mc8, 524278 runs, 10 skips
> 528 dezicycles in mc8, 524267 runs, 21 skips
> 527 dezicycles in mc8, 524272 runs, 16 skips
> 525 dezicycles in mc8, 524269 runs, 19 skips
> 525 dezicycles in mc8, 524274 runs, 14 skips
> 530 dezicycles in mc8, 524276 runs, 12 skips
>
> So a little (~1 cycle) faster.
>
> Then the other change (remove movdqa), before (with above change included):
> 1004 dezicycles in mc8, 131070 runs, 2 skips
> 1008 dezicycles in mc8, 131066 runs, 6 skips
> 996 dezicycles in mc8, 131068 runs, 4 skips
> 1000 dezicycles in mc8, 131068 runs, 4 skips
> 1055 dezicycles in mc8, 131065 runs, 7 skips
> 1006 dezicycles in mc8, 131069 runs, 3 skips
> after:
> 1007 dezicycles in mc8, 131070 runs, 2 skips
> 1005 dezicycles in mc8, 131067 runs, 5 skips
> 1017 dezicycles in mc8, 131068 runs, 4 skips
> 1008 dezicycles in mc8, 131064 runs, 8 skips
> 990 dezicycles in mc8, 131070 runs, 2 skips
> 1014 dezicycles in mc8, 131067 runs, 5 skips
>
> So confusingly, the 2nd change appears to not be faster. Also binary
> size is the same (probably b/c of alignment further down).
> What to do?
random ideas:
1. find something else to optimize
2. change it or leave it as it is
3. commit seppuku
4. benchmark on a different cpu
5. download pr0n
6. join the salvation army
7. tell me to shut up with the bad off topic jokes
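
For idea 4, or just to compare the before/after runs quoted above, here is a
minimal Python sketch (not part of the patch) that follows Loren's advice:
take the final timer line from each separate run and summarize the set. The
helper names dezicycles() and summarize() are made up for illustration, and
the sample strings are the figures quoted above for the second change. A
dezicycle is a tenth of a cycle.

```python
import re
import statistics

# Matches timer output such as "532 dezicycles in mc8, 524271 runs, 17 skips",
# with or without a leading "> " quote marker.
TIMER_LINE = re.compile(r"(\d+) dezicycles in [^,]+, (\d+) runs, (\d+) skips")


def dezicycles(text):
    """Return the dezicycle figure from every timer line found in text."""
    return [int(m.group(1)) for m in TIMER_LINE.finditer(text)]


def summarize(samples):
    """Return (mean, median) of a list of dezicycle figures."""
    return statistics.mean(samples), statistics.median(samples)


# Final timer line from each of six runs, second change, as quoted above.
SECOND_BEFORE = """
1004 dezicycles in mc8, 131070 runs, 2 skips
1008 dezicycles in mc8, 131066 runs, 6 skips
996 dezicycles in mc8, 131068 runs, 4 skips
1000 dezicycles in mc8, 131068 runs, 4 skips
1055 dezicycles in mc8, 131065 runs, 7 skips
1006 dezicycles in mc8, 131069 runs, 3 skips
"""

SECOND_AFTER = """
1007 dezicycles in mc8, 131070 runs, 2 skips
1005 dezicycles in mc8, 131067 runs, 5 skips
1017 dezicycles in mc8, 131068 runs, 4 skips
1008 dezicycles in mc8, 131064 runs, 8 skips
990 dezicycles in mc8, 131070 runs, 2 skips
1014 dezicycles in mc8, 131067 runs, 5 skips
"""

for label, text in (("before", SECOND_BEFORE), ("after", SECOND_AFTER)):
    mean, median = summarize(dezicycles(text))
    # Divide by 10 to convert dezicycles to cycles.
    print(f"{label}: mean {mean / 10:.2f} cycles, median {median / 10:.2f} cycles")
```

On these figures the mean drops from 1011.5 to about 1006.8 dezicycles, but
that is mostly the single 1055 outlier in the "before" set; the median goes
from 1005 to 1007.5, so the second change is within the noise. The same
helpers applied to the first change give means of 535 and 528 dezicycles,
i.e. about 0.7 cycles faster.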
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
I have often repented speaking, but never of holding my tongue.
-- Xenocrates