[FFmpeg-devel] [PATCH 1/2] swresample: Refactor resample asm and port it to yasm
Michael Niedermayer
michaelni at gmx.at
Thu Mar 20 04:04:02 CET 2014
On Wed, Mar 19, 2014 at 10:16:17PM -0300, James Almer wrote:
> On 19/03/14 9:08 PM, Michael Niedermayer wrote:
> > On Wed, Mar 19, 2014 at 06:45:03PM -0300, James Almer wrote:
> >> This reduces code duplication and makes it easier to implement new asm
> >> functions in the future
> >>
> >> Signed-off-by: James Almer <jamrial at gmail.com>
> >> ---
> >> libswresample/resample.c | 96 ++++++++++---------------------------
> >> libswresample/resample_template.c | 49 +++++++------------
> >> libswresample/swresample_internal.h | 24 ++++++++++
> >> libswresample/x86/Makefile | 1 +
> >> libswresample/x86/resample.asm | 64 +++++++++++++++++++++++++
> >> libswresample/x86/resample_mmx.h | 74 ----------------------------
> >> libswresample/x86/swresample_x86.c | 16 +++++++
> >> 7 files changed, 148 insertions(+), 176 deletions(-)
> >> create mode 100644 libswresample/x86/resample.asm
> >> delete mode 100644 libswresample/x86/resample_mmx.h
> >
> > benchmark:
> >
> > before: 253482 decicycles in resample, 1024 runs, 0 skips
> > after:  356545 decicycles in resample, 1024 runs, 0 skips
> >
> > tested using ffplay HAYLEY\ WESTENRA-WHISPERS\ IN\ A\ DREAM.webm -af aformat=s32,aresample=48000,aformat=s32
> >
> >
>
> Where did you put the timer.h macros? I put them at the beginning and end of
> the swri_resample_<sampleformat> function/macro in resample_template.c
I had them in the loop in multiple_resample()
> And what about 16-bit 44100 Hz to 16-bit 22050 Hz (using the sse2 code), which
> is the one I tried and where I noticed a boost?
I didn't try that one
>
> Testing a 16-bit 44100 Hz file and using the command you mention above (but with
> ffmpeg) I get
>
> before: 2606446 decicycles in resample, 65522 runs, 14 skips
> after: 2642538 decicycles in resample, 65497 runs, 39 skips
interesting
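
To put the two sets of figures side by side: taking the decicycle counts quoted in this thread at face value, the first pair works out to roughly a 41% slowdown and the second to about 1.4%. A minimal sketch of that arithmetic follows; slowdown_percent() is a hypothetical helper written for illustration, not part of FFmpeg or timer.h.

```c
/* Relative change between two timer readings, in percent.
 * Hypothetical helper, for illustration only. */
static double slowdown_percent(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Figures as quoted in this thread (decicycles per run). */
static const double s32_before = 253482.0,  s32_after = 356545.0;   /* timer in the loop in multiple_resample() */
static const double s16_before = 2606446.0, s16_after = 2642538.0;  /* timer around swri_resample_<sampleformat> */
```

Note that the two measurements wrap different regions of code and use different input files, so the percentages are not directly comparable to each other.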
>
> Which is indeed slower but not nearly as bad as in your test. Though without
> testing the same files, I doubt we could get a proper picture.
>
> Nonetheless, we can drop this patch if it really affects performance that much
> in some scenarios. I mainly wrote it to reduce the considerable code duplication
> that exists and that will increase with each asm version added, and to remove
> arch-specific code that was outside the respective folders.
>
> I can port the float sse version to inline in that case.
>
> >> +%if mmsize == 8
> >> + emms
> >> +%endif
> >
> > this is not ok
> > emms is slow and does not belong in the inner loop
>
> This is a problem. Not sure how to make sure to run emms_c() from outside the
> loop only when an mmx version of scalarproduct is used.
Well, it was outside the loop before the patch
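
One possible shape for this, as a rough sketch only and not what the patch or the old code actually does: let the init code record whether the selected scalarproduct variant touches MMX state, and have the caller issue the cleanup once after the loop. Everything below (ResampleDSP, uses_mmx, fake_emms, resample_block) is a hypothetical name made up for the example; fake_emms() merely counts calls and stands in for emms_c().

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for emms_c(); counts calls so the cost is visible. Hypothetical. */
static int emms_calls;
static void fake_emms(void) { emms_calls++; }

typedef int32_t (*dot_fn)(const int16_t *a, const int16_t *b, size_t n);

/* Hypothetical DSP context: the init code would set uses_mmx
 * whenever it installs an MMX version of the dot product. */
typedef struct ResampleDSP {
    dot_fn dot;
    int    uses_mmx;
} ResampleDSP;

/* Plain C dot product, used here in place of any asm version. */
static int32_t dot_c(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* One dot product per output sample; the cleanup runs once, after the loop. */
static void resample_block(const ResampleDSP *dsp, int32_t *dst,
                           const int16_t *src, const int16_t *filter,
                           size_t taps, size_t out_count)
{
    for (size_t i = 0; i < out_count; i++)
        dst[i] = dsp->dot(src + i, filter, taps);

    if (dsp->uses_mmx)
        fake_emms();   /* emms_c() in real code, outside the inner loop */
}
```

With this layout the per-sample function contains no emms at all, and non-MMX variants skip the cleanup entirely.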
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
The real ebay dictionary, page 2
"100% positive feedback" - "All either got their money back or didnt complain"
"Best seller ever, very honest" - "Seller refunded buyer after failed scam"