[FFmpeg-devel] [PATCH] swscale/arm: add ff_nv{12, 21}_to_{argb, rgba, abgr, bgra}_neon
Michael Niedermayer
michaelni at gmx.at
Fri Nov 20 18:46:16 CET 2015
On Thu, Nov 19, 2015 at 06:29:23PM +0100, Clément Bœsch wrote:
> On Thu, Nov 19, 2015 at 04:50:54PM +0100, Michael Niedermayer wrote:
> > On Thu, Nov 19, 2015 at 11:48:53AM +0100, Clément Bœsch wrote:
> > > From: Matthieu Bouron <matthieu.bouron at stupeflix.com>
> > >
> > > Signed-off-by: Matthieu Bouron <matthieu.bouron at stupeflix.com>
> > > Signed-off-by: Clément Bœsch <clement at stupeflix.com>
> > >
> > > ---
> > > The function takes about 29ms with a 1080p source (testsrc2) on a
> > > Cortex-A8. However, 16ms (more than half the time!) is spent in the
> > > vst2 call. Any suggestions on how to speed this up?
> > >
> > > Also, the reference code seems to cause some kind of ringing, while our
> > > ASM doesn't:
> > > http://b.pkh.me/nv12-rgba-ref.png
> > > http://b.pkh.me/nv12-rgba-neon.png
> >
> > What exactly did you test here?
>
> ./ffmpeg -f lavfi -i testsrc2 -vf format=nv12,format=rgba -ss 1 -frames:v 1 -y nv12-rgba-ref.png
>
> (on ARM though, and with -cpuflags 0)
>
> > but there are several code paths for RGB output; one uses LUTs, and
> > not all of them use full-resolution chroma
> >
>
> Yeah, we noticed...
>
> Note: on x86 there is some yuv2rgb MMX code, but it's not called above
> because it doesn't handle nv12 (only yuv420 and friends), so the chroma
> issue is reproducible there too (it's calling the LUT path).
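For reference, a scalar NV12 to RGBA conversion can be sketched in C as follows. This is a hypothetical illustration, not swscale's code: `nv12_to_rgba_ref` is an invented name, and the sketch assumes BT.601 limited-range coefficients in Q10 fixed point with nearest-neighbour chroma. In other words, each interleaved U/V pair is reused for a 2x2 block of luma, and no chroma interpolation is done. Code paths that do interpolate chroma will render edges differently, which could be one reason different paths do not produce matching output.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t clip_u8(int v) { return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v; }

/* Hypothetical scalar reference: NV12 -> RGBA, BT.601 limited range,
 * coefficients scaled by 1024 (Q10), nearest-neighbour chroma.
 * Assumes >> on negative ints is arithmetic (true on common compilers). */
static void nv12_to_rgba_ref(const uint8_t *y_plane, int y_stride,
                             const uint8_t *uv_plane, int uv_stride,
                             uint8_t *dst, int dst_stride,
                             int width, int height)
{
    for (int j = 0; j < height; j++) {
        const uint8_t *y  = y_plane  + j * y_stride;
        const uint8_t *uv = uv_plane + (j / 2) * uv_stride; /* half vertical res */
        uint8_t *out = dst + j * dst_stride;
        for (int i = 0; i < width; i++) {
            int u = uv[(i / 2) * 2]     - 128;  /* U and V are interleaved */
            int v = uv[(i / 2) * 2 + 1] - 128;
            int c = (y[i] - 16) * 1192;         /* 1.164 * 1024 */
            out[4 * i + 0] = clip_u8((c + 1634 * v + 512) >> 10);           /* R */
            out[4 * i + 1] = clip_u8((c - 400 * u - 833 * v + 512) >> 10);  /* G */
            out[4 * i + 2] = clip_u8((c + 2066 * u + 512) >> 10);           /* B */
            out[4 * i + 3] = 255;                                           /* A */
        }
    }
}
```

For example, a 2x2 block with Y=81, U=90, V=240 (roughly BT.601 red) gives RGBA (254, 0, 0, 255) for every pixel, while Y=16 and Y=235 with neutral chroma map to black and white.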
>
> >
> > >
> > > Lastly, we noticed that the y_offset is scaled to 1<<9 for some reason we
> > > couldn't figure out. Hopefully we're doing it correctly here.
> >
> > [...]
> > > +.macro compute_half_line dst half_y ofmt
> > > + vmovl.u8 q7, \half_y @ 8px of Y
> > > + vdup.16 q5, r9
> > > + vsub.s16 q7, q5
> > > + vmull.s16 q1, d14, d0 @ q1 = (srcY - y_offset) * y_coeff (left)
> > > + vmull.s16 q2, d15, d0 @ q2 = (srcY - y_offset) * y_coeff (right)
> >
> > If you do something like srcY * y_coeff - y_offset2,
> > then you could keep a bit more precision in the requested brightness
> > correction.
>
> The code in swscale/output.c seems to always use the form we use here. Is
> that intentional?
If srcY has some extra bits of precision, then it should be fine.
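The two orderings discussed above can be compared in a small C sketch. Everything in it is hypothetical: the function names, a Q13 `y_coeff` of 9535 (about 1.164), and a requested luma offset passed in Q8 so that it can be fractional (16.5 here, as a brightness adjustment might produce). Subtracting first forces the offset onto a whole 8-bit step. Multiplying first and subtracting a pre-scaled `y_offset2` keeps the fraction. If Y already carries extra fractional bits, subtracting first is fine again.

```c
#include <assert.h>
#include <stdint.h>

/* Form used in the patch: subtract in 8-bit Y units, then multiply.
 * The Q8 offset must first be rounded to a whole Y step. */
static int32_t luma_sub_first(uint8_t y, int32_t y_offset_q8, int32_t y_coeff)
{
    int32_t off = (y_offset_q8 + 128) >> 8;   /* round to integer Y units */
    return (y - off) * y_coeff;
}

/* Suggested form: multiply, then subtract an offset pre-scaled by y_coeff
 * (y_offset2), which keeps the fractional part of the requested offset. */
static int32_t luma_mul_first(uint8_t y, int32_t y_offset_q8, int32_t y_coeff)
{
    int32_t y_offset2 = (int32_t)(((int64_t)y_offset_q8 * y_coeff + 128) >> 8);
    return y * y_coeff - y_offset2;
}

/* If Y already carries EXTRA_BITS fractional bits (hypothetical value),
 * the offset lives in the same domain and subtracting first loses nothing. */
#define EXTRA_BITS 7
static int64_t luma_sub_first_hp(int32_t y_hp, int32_t y_offset_hp, int32_t y_coeff)
{
    return (int64_t)(y_hp - y_offset_hp) * y_coeff;
}
```

With Y = 100 and the 16.5 offset, the exact product is 796172.5. The subtract-first form gives 791405, the multiply-first form gives 796172, and the extra-bits form is exact (scaled by 1 << 7).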
>
> > OTOH, maybe you want to be bit-exact with some existing code path.
> >
>
> Right... I suppose we don't have many tests with custom
> brightness/contrast/saturation. Should I expose them in vf_scale and
> see how much breaks? :)
Contrast/brightness/saturation FATE tests are welcome.
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
He who knows, does not speak. He who speaks, does not know. -- Lao Tsu