[FFmpeg-devel] [PATCH] Add add_pixels4/8() to h264dsp, and remove add_pixels4 from dsputil.

Mon Feb 11 02:27:35 CET 2013

On Sun, Feb 10, 2013 at 05:16:43PM -0800, Ronald S. Bultje wrote:
> Hi,
> 
> On Sun, Feb 10, 2013 at 5:10 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> > On Sun, Feb 10, 2013 at 04:12:55PM -0800, Ronald S. Bultje wrote:
> >> Hi,
> >>
> >> On Sat, Feb 9, 2013 at 5:49 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> >> > On Sat, Feb 09, 2013 at 03:43:56PM -0800, Ronald S. Bultje wrote:
> >> >> From: "Ronald S. Bultje" <rsbultje at gmail.com>
> >> >>
> >> >> These functions are mostly H264-specific (the only other user I can
> >> >> spot is bink), and this allows us to special-case some functionality
> >> >> for H264. Also remove the 16-bit-coeff with >8bpp versions (unused)
> >> >> and merge the duplicate 32-bit-coeff for >8bpp (identical).
> >> >
> >> > [...]
> >> >
> >> >> +
> >> >> +#include "bit_depth_template.c"
> >> >> +
> >> >> +static void FUNCC(ff_h264_add_pixels4)(uint8_t *_dst, int16_t *_src, int stride)
> >> >> +{
> >> >> +    int i;
> >> >> +    pixel *dst = (pixel *) _dst;
> >> >> +    dctcoef *src = (dctcoef *) _src;
> >> >
> >> >> +    stride /= sizeof(pixel);
> >> >
> >> > a >> should be faster
> >>
> >> It's used as an increment for type int16_t, so it's actually undone in
> >> the assembly. Example disassembly on x86-32:
> >>
> >> _ff_h264_add_pixels4_8_c:
> >> 0000bc20        pushl   %ebx
> >> 0000bc21        pushl   %esi
> >> 0000bc22        movl    0x0c(%esp),%eax ; dst
> >> 0000bc26        addl    $0x03,%eax
> >> 0000bc29        xorl    %ecx,%ecx
> >> 0000bc2b        movl    0x14(%esp),%edx ; linesize
> >> 0000bc2f        movl    0x10(%esp),%esi ; block
> >> 0000bc33        nopw    _ff_h264dsp_init(%eax,%eax)
> >> 0000bc39        nopl    _ff_h264dsp_init(%eax)
> >> 0000bc40        movb    (%esi,%ecx,8),%bl ; load
> >> 0000bc43        addb    %bl,0xfd(%eax) ; add
> >> 0000bc46        movb    0x02(%esi,%ecx,8),%bl ; load
> >> 0000bc4a        addb    %bl,0xfe(%eax) ; add
> >> 0000bc4d        movb    0x04(%esi,%ecx,8),%bl ; load
> >> 0000bc51        addb    %bl,0xff(%eax) ; add
> >> 0000bc54        movb    0x06(%esi,%ecx,8),%bl ; load
> >> 0000bc58        addb    %bl,(%eax) ; add
> >> 0000bc5a        addl    %edx,%eax ; += linesize
> >> 0000bc5c        incl    %ecx ; block increment
> >> 0000bc5d        cmpl    $0x04,%ecx ; next line
> >> 0000bc60        jne     0x0000bc40 ; jump
> >> 0000bc62        movl    $_ff_h264dsp_init,0x04(%esi) ; $_... is actually zero, so this zeroes the block
> >> 0000bc69        movl    $_ff_h264dsp_init,(%esi)
> >> 0000bc6f        movl    $_ff_h264dsp_init,0x0c(%esi)
> >> 0000bc76        movl    $_ff_h264dsp_init,0x08(%esi)
> >> 0000bc7d        movl    $_ff_h264dsp_init,0x14(%esi)
> >> 0000bc84        movl    $_ff_h264dsp_init,0x10(%esi)
> >> 0000bc8b        movl    $_ff_h264dsp_init,0x1c(%esi)
> >> 0000bc92        movl    $_ff_h264dsp_init,0x18(%esi)
> >> 0000bc99        popl    %esi
> >> 0000bc9a        popl    %ebx
> >> 0000bc9b        ret
> >> 0000bc9c        nopl    _ff_h264dsp_init(%eax)
> >>
> >> As you see, there is no division or anything weird; the compiler knows what to do.
> >
> > gcc on x86 does in this case, yes.
> > Still, IMHO it would be better not to depend on the compiler
> > optimizing the division out ...
> 
> Since original code does it too, can we do this in a separate commit?

sure
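
For reference, a minimal sketch of the two forms being compared. It is not
taken from the patch: the helper names (stride_div, stride_shift,
next_row_u16, check_equivalence) are hypothetical, and the shift form assumes
the pixel size is 1 or 2 bytes. It also shows why the scaling can be "undone":
adding the converted stride to a pixel pointer multiplies it by the pixel size
again.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte stride -> stride in pixels, using a division. */
static int stride_div(int stride_bytes, size_t pixel_size)
{
    return stride_bytes / (int)pixel_size;
}

/* Hypothetical helper: same conversion with a shift.
 * Assumption: pixel_size is 1 or 2, so the shift count is 0 or 1. */
static int stride_shift(int stride_bytes, size_t pixel_size)
{
    return stride_bytes >> (pixel_size - 1);
}

/* Advance a 16-bit pixel pointer by one row. The pointer addition scales
 * the pixel stride by sizeof(uint16_t), i.e. back to the byte stride. */
static uint16_t *next_row_u16(uint16_t *row, int stride_bytes)
{
    int stride = stride_shift(stride_bytes, sizeof(uint16_t));
    return row + stride;
}

/* Check that both forms agree for non-negative, even byte strides.
 * Negative strides are left out: right-shifting a negative int is
 * implementation-defined in C. */
static void check_equivalence(void)
{
    int s;
    for (s = 0; s <= 4096; s += 2) {
        assert(stride_div(s, 1) == stride_shift(s, 1));
        assert(stride_div(s, 2) == stride_shift(s, 2));
    }
}
```

Whether the compiler emits identical code for both forms depends on the
compiler and target, which is the concern raised above.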


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The real ebay dictionary, page 2
"100% positive feedback" - "All either got their money back or didn't complain"
"Best seller ever, very honest" - "Seller refunded buyer after failed scam"