[FFmpeg-devel] [PATCH] lavfi/vf_deshake: fix segfaults #2443
João Bernardo
jbvsmo at gmail.com
Mon Apr 15 16:43:31 CEST 2013
> > 2nd - The SSE2 instruction PSADBW needs memory aligned for 128-bit operands (XMM)
> >
> > diff --git a/libavcodec/x86/motion_est.c b/libavcodec/x86/motion_est.c
> > index 3ffb002..d828d8a 100644
> > --- a/libavcodec/x86/motion_est.c
> > +++ b/libavcodec/x86/motion_est.c
> > @@ -104,8 +104,10 @@ static int sad16_sse2(void *v, uint8_t *blk2, uint8_t *blk1, int stride, int h)
> > "1: \n\t"
> > "movdqu (%1), %%xmm0 \n\t"
> > "movdqu (%1, %4), %%xmm1 \n\t"
> > - "psadbw (%2), %%xmm0 \n\t"
> > - "psadbw (%2, %4), %%xmm1 \n\t"
> > + "movdqu (%2), %%xmm2 \n\t"
> > + "movdqu (%2, %4), %%xmm3 \n\t"
> > + "psadbw %%xmm2, %%xmm0 \n\t"
> > + "psadbw %%xmm3, %%xmm1 \n\t"
>
> The input to this function must be aligned to the blocksize of 8 or 16
>
This is not possible with the "rx" values of the deshake filter (unless you
do extra copying).
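To illustrate the point, here is a minimal sketch. It assumes the motion search tries every horizontal offset in [-rx, rx] around a block whose own address is 16-byte aligned. The helper names is_aligned16 and count_aligned_offsets are hypothetical and do not exist in FFmpeg.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero when p lies on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}

/* Hypothetical helper: counts how many horizontal offsets in [-rx, rx]
 * yield a 16-byte-aligned block pointer. The caller must make sure that
 * center - rx and center + rx stay inside the same buffer. */
static int count_aligned_offsets(const uint8_t *center, int rx)
{
    int n = 0;

    assert(rx >= 0);
    for (int x = -rx; x <= rx; x++)
        n += is_aligned16(center + x);
    return n;
}
```

With an aligned center and rx = 16, only 3 of the 33 candidate positions (offsets -16, 0 and 16) are aligned, so most comparisons would need a copy into an aligned buffer first.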
> the caller is buggy if it calls this function on misaligned data,
> the alignment requirement exists to maximize speed and to simplify
> SIMD implementations
>
For data that is already aligned, the above change will likely not affect
performance, and it will no longer crash on misaligned data.
So either SSE2 has to be disabled for the deshake filter (a bad idea), or you
can add a "sad16_sse2_misaligned" version of the function. You could also
measure the performance of both cases.
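A rough sketch of what such a misaligned variant could look like, written with SSE2 intrinsics instead of inline assembly. The name sad16_unaligned and its signature are assumptions for illustration only, not the actual dsputil interface. A scalar fallback is included so the sketch also builds where SSE2 is not available.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: sum of absolute differences over a block that is
 * 16 pixels wide and h rows high. Neither pointer has to be aligned. */
static int sad16_unaligned(const uint8_t *blk1, const uint8_t *blk2,
                           int stride, int h)
{
    assert(h >= 0);
#if defined(__SSE2__)
    __m128i acc = _mm_setzero_si128();

    for (int y = 0; y < h; y++) {
        /* movdqu loads: no alignment requirement on either operand */
        __m128i a = _mm_loadu_si128((const __m128i *)(blk1 + y * stride));
        __m128i b = _mm_loadu_si128((const __m128i *)(blk2 + y * stride));
        /* psadbw on registers: one partial sum per 64-bit half */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(a, b));
    }
    /* add the low and the high partial sums */
    return _mm_cvtsi128_si32(acc) +
           _mm_cvtsi128_si32(_mm_srli_si128(acc, 8));
#else
    int sum = 0;

    for (int y = 0; y < h; y++)
        for (int x = 0; x < 16; x++)
            sum += abs(blk1[y * stride + x] - blk2[y * stride + x]);
    return sum;
#endif
}
```

This mirrors the patch above: both blocks are loaded with unaligned loads and PSADBW only ever sees register operands. Whether the extra loads cost anything measurable on aligned data would still have to be benchmarked.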
> please see dsputil.h:
> typedef int (*me_cmp_func)(void /*MpegEncContext*/ *s, uint8_t
> *blk1/*align width (8 or 16)*/, uint8_t *blk2/*align 1*/, int line_size,
> int h)/* __attribute__ ((const))*/;
>
>
Actually, the requirement is for it to be aligned to a width of 16 for XMM
registers. With MMX registers, you can compute the packed sum
on misaligned data if you disable the check.