[FFmpeg-devel] [PATCH] swresample/resample: improve bessel function accuracy

Tue Nov 3 05:16:06 CET 2015

>if one removes the crippling
>-fno-tree-vectorize
Yes, I think a config option to turn this flag on (like the unsafe
bitstream reader) would be good. Defaulting to off by default if it doesn't
break anything for at least a few people (and compilers) who test it. It's
not a big performance impact but every little bit counts nowadays.

On 3 November 2015 at 03:10, Ganesh Ajjanagadde <gajjanag at mit.edu> wrote:

> On Mon, Nov 2, 2015 at 9:32 PM, Ganesh Ajjanagadde <gajjanag at mit.edu>
> wrote:
> > On Mon, Nov 2, 2015 at 6:49 PM, Ganesh Ajjanagadde <gajjanag at mit.edu>
> wrote:
> >> On Mon, Nov 2, 2015 at 6:37 PM, wm4 <nfxjfg at googlemail.com> wrote:
> >>> On Mon,  2 Nov 2015 14:49:54 -0500
> >>> Ganesh Ajjanagadde <gajjanagadde at gmail.com> wrote:
> >>>
> >>>> This improves accuracy for the bessel function, and this in turn
> should
> >>>> improve the quality of the Kaiser window.
> >>>
> >>>
> >>> "Should"? Does it or does it not? If you don't know, why the patch?
> >>
> >> It improves the window in the sense of mathematically matching the
> >> Kaiser window expression due to the improved bessel function accuracy.
> >> That claim I have verified and placed in the commit message with
> >> evidence.
> >>
> >> What that translates into in terms of resampling accuracy, I don't
> >> know. Normally, such things do help reduce the noise floor, but I
> >> don't know an easy way to test via FATE or associated tools. I put
> >> that caveat in the bottom lines of the patch.
> >
> > Turns out the init speed is definitely improved as well (~20% boost
> > with default settings).
> > I did a cheap trick of calling build filter 1000 times to get a large
> > number of runs.
> > Results (x86-64, Haswell, GNU/Linux):
> >
> > test: fate-swr-resample-dblp-44100-2626
> >
> > new:
> > 995894468 decicycles in build_filter(loop 1000),     256 runs,      0
> skips
> > 1029719302 decicycles in build_filter(loop 1000),     512 runs,      0
> skips
> > 984101131 decicycles in build_filter(loop 1000),    1024 runs,      0
> skips
> >
> > old:
> > 1250020763 decicycles in build_filter(loop 1000),     256 runs,      0
> skips
> > 1246353282 decicycles in build_filter(loop 1000),     512 runs,      0
> skips
> > 1220017565 decicycles in build_filter(loop 1000),    1024 runs,      0
> skips
> >
> > Thus, this patch has both things going for it luckily. Will leave to
> > the maintainer (Michael I believe) to give details of accuracy
> > benefits as translated to the actual resampling if easily testable,
> > and I will add the perf numbers to the message.
>
> One last comment on performance (this is something I have discussed
> with Timothy privately), if one removes the crippling
> -fno-tree-vectorize, FATE still passes on my GCC 5.2 setup, and one
> gets further ~5 % perf improvement here:
> 949318408 decicycles in build_filter(loop 1000),     256 runs,      0 skips
> 948795082 decicycles in build_filter(loop 1000),     512 runs,      0 skips
> 928925076 decicycles in build_filter(loop 1000),    1024 runs,      0 skips
>
> I am sure a lot of other places may benefit. Yes, I know it was buggy
> on older GCC versions, etc, but I see no reason to cripple users with
> modern toolchains.
> The commit 973859f5230e77beea7bb59dc081870689d6d191 is ancient and has
> a "sloppy" commit message.
>
> "It provides no speed benefit" is not true, and even back then was
> never backed up by hard numbers, though the commit would have been ok
> on non x86 at the time, possibly even now. As can be seen clearly from
> https://ffmpeg.org/pipermail/ffmpeg-devel/2009-July/069586.html, much
> of Mans's evidence came from non x86 architectures which are Mans's
> expertise. In the absence of any complaints, the change was committed
> and x86 people have suffered from it. Yes, particular versions may be
> buggy, but such things should be checked more carefully especially
> since this is "free performance".
>
> Mans's argument was that adventurous people could add it back in.
> Needless to say, it was not documented anywhere, and casual users like
> me (and I suspect some others here) were completely unaware of this.
> It is quite obscure and easy to miss unless one digs through configure
> (or config.mak), or finds out from someone else.
>
> I also recall (and this prompted the private conversation with
> Timothy) that Timothy's memcpy change from the loop actually yield no
> difference with a half decent auto-vectoriser. It is a low hanging
> fruit in my view, and I have posted a perf number above.
>
> At the very least, we should inform users of this low hanging thing
> e.g by printing out during configure an "info" message - they can try
> out an "adventurous" build with this, test out FATE, and if it passes,
> go ahead with a new build.
> Or we could have a "dev option" for --enable-vectorize.
> Or at a more ambitious scale, we could collect configs where this will
> work, test for them, and only disable otherwise.
> There are a number of possibilities, for which a separate thread is better.
>
> >
> >>
> >>> _______________________________________________
> >>> ffmpeg-devel mailing list
> >>> ffmpeg-devel at ffmpeg.org
> >>> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>