[FFmpeg-devel] [PATCH] Optimized unscaled yuvp9/yuvp10 -> yuvp16 conversion.
Michael Niedermayer
michaelni at gmx.at
Sun Aug 12 14:49:45 CEST 2012
On Sun, Aug 12, 2012 at 02:47:09PM +0200, Reimar Döffinger wrote:
> On Sat, Aug 11, 2012 at 04:52:19PM +0200, Michael Niedermayer wrote:
> > On Sat, Aug 11, 2012 at 02:18:36PM +0200, Reimar Döffinger wrote:
> > > About 30% faster on 32 bit Atom, 120% faster on 64 bit Phenom2.
> > > This is interesting because supporting P16 is easier in e.g.
> > > OpenGL (can misuse support for any 2-component 8 bit format),
> > > whereas supporting p9/p10 without conversion needs a texture
> > > format with at least 14 bits actual precision.
> > >
> > > Signed-off-by: Reimar Döffinger <Reimar.Doeffinger at gmx.de>
> > > ---
> > > libswscale/swscale_unscaled.c | 26 ++++++++++++++++++++++++++
> > > 1 file changed, 26 insertions(+)
> > >
> > > diff --git a/libswscale/swscale_unscaled.c b/libswscale/swscale_unscaled.c
> > > index c391a07..6618966 100644
> > > --- a/libswscale/swscale_unscaled.c
> > > +++ b/libswscale/swscale_unscaled.c
> > > @@ -830,7 +830,33 @@ static int planarCopyWrapper(SwsContext *c, const uint8_t *src[],
> > >              srcPtr += srcStride[plane];
> > >          }
> > >      } else if (src_depth <= dst_depth) {
> > > +        int orig_length = length;
> > >          for (i = 0; i < height; i++) {
> > > +            if(isBE(c->srcFormat) == HAVE_BIGENDIAN &&
> > > +               isBE(c->dstFormat) == HAVE_BIGENDIAN) {
> > > +                unsigned shift = dst_depth - src_depth;
> > > +                length = orig_length;
> > > +#if HAVE_FAST_64BIT
> > > +#define FAST_COPY_UP(shift) \
> > > +                for (j = 0; j < length - 3; j += 4) { \
> > > +                    uint64_t v = AV_RN64A(srcPtr2 + j); \
> > > +                    AV_WN64A(dstPtr2 + j, v << shift); \
> > > +                } \
> > > +                length &= 3;
> > > +#else
> > > +#define FAST_COPY_UP(shift) \
> > > +                for (j = 0; j < length - 1; j += 2) { \
> > > +                    uint32_t v = AV_RN32A(srcPtr2 + j); \
> > > +                    AV_WN32A(dstPtr2 + j, v << shift); \
> > > +                } \
> > > +                length &= 1;
> > > +#endif
> >
> > these look wrong for the shiftonly==0 case
>
> Oops, sorry, I went back and forth a few times on how to handle that
> case, and at some point the condition was lost.
> The code is not meant to handle shiftonly==0 because
> a) The case I was looking at (MPlayer) never uses it
> b) It needs an extra "and" compared to the non-SIMDified version,
> which means that for 32 bit it tends not to be noticeably faster, at
> least with some compilers and compiler options (for example,
> when compiling with 4.6 for Atom the loop won't be unrolled, so
> there is lots of loop overhead, whereas when compiling for k8 it
> will be unrolled and prefetch added...).
ok, then the patch LGTM with an if(shiftonly) added
thanks
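
To illustrate the shiftonly point, here is a minimal standalone C sketch, not the swscale code itself. It assumes native-endian 16-bit samples and uses memcpy as a stand-in for the AV_RN64A/AV_WN64A macros from the patch; all function names are made up for the example. It also assumes that the non-shiftonly path fills the low bits by replicating the top bits of each sample, (v << up) | (v >> down), which the quoted hunk does not show. Under that assumption the right shift drags bits from the neighbouring sample across the 16-bit lane boundary, so a packed version needs the extra "and" mentioned above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar reference for the shiftonly case: a plain left shift per sample. */
static void copy_up_scalar(uint16_t *dst, const uint16_t *src, size_t n,
                           unsigned shift)
{
    for (size_t j = 0; j < n; j++)
        dst[j] = (uint16_t)(src[j] << shift);
}

/* Packed shiftonly case: four samples per 64-bit shift. This is safe because
 * every sample uses at most 16 - shift bits, so nothing crosses into the
 * next 16-bit lane. */
static void copy_up_packed(uint16_t *dst, const uint16_t *src, size_t n,
                           unsigned shift)
{
    size_t j = 0;
    for (; j + 4 <= n; j += 4) {
        uint64_t v;
        memcpy(&v, src + j, sizeof(v));   /* stand-in for AV_RN64A */
        v <<= shift;
        memcpy(dst + j, &v, sizeof(v));   /* stand-in for AV_WN64A */
    }
    for (; j < n; j++)                    /* scalar tail, 0 to 3 samples */
        dst[j] = (uint16_t)(src[j] << shift);
}

/* Scalar reference for the ASSUMED non-shiftonly case: the top bits of the
 * sample are replicated into the newly created low bits.
 * Requires src_depth <= dst_depth <= 2 * src_depth and dst_depth <= 16. */
static void replicate_scalar(uint16_t *dst, const uint16_t *src, size_t n,
                             unsigned src_depth, unsigned dst_depth)
{
    unsigned up   = dst_depth - src_depth;
    unsigned down = 2 * src_depth - dst_depth;
    for (size_t j = 0; j < n; j++)
        dst[j] = (uint16_t)((src[j] << up) | (src[j] >> down));
}

/* Packed version of the same: the right shift pulls low bits of one lane
 * into the high bits of its neighbour, so a per-lane mask (the extra "and")
 * is needed to keep only the low `up` bits of each lane. */
static void replicate_packed(uint16_t *dst, const uint16_t *src, size_t n,
                             unsigned src_depth, unsigned dst_depth)
{
    unsigned up   = dst_depth - src_depth;
    unsigned down = 2 * src_depth - dst_depth;
    uint64_t mask = ((UINT64_C(1) << up) - 1) * UINT64_C(0x0001000100010001);
    size_t j = 0;
    for (; j + 4 <= n; j += 4) {
        uint64_t v;
        memcpy(&v, src + j, sizeof(v));
        v = (v << up) | ((v >> down) & mask);
        memcpy(dst + j, &v, sizeof(v));
    }
    for (; j < n; j++)                    /* scalar tail, 0 to 3 samples */
        dst[j] = (uint16_t)((src[j] << up) | (src[j] >> down));
}
```

For 10-bit to 16-bit, up is 6 and down is 4, so the mask is 0x003F in every lane. Whether that one additional operation per 32 or 64 bits cancels the gain depends on the compiler and target, as described in the quoted text.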
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Observe your enemies, for they first find out your faults. -- Antisthenes