[FFmpeg-devel] [PATCH] NEON: put_pixels_clamped
David Conrad
lessen42
Wed Apr 29 10:57:37 CEST 2009
On Apr 16, 2009, at 4:23 PM, Måns Rullgård wrote:
> David Conrad <lessen42 at gmail.com> writes:
>
>> On Apr 16, 2009, at 3:44 PM, David Conrad wrote:
>>
>>> On Apr 16, 2009, at 3:32 PM, Måns Rullgård wrote:
>>>
>>>> David Conrad <lessen42 at gmail.com> writes:
>>>>
>>>>> Hi,
>>>>>
>>>>> Apparently this is used for some wmv3 files in addition to the
>>>>> signed
>>>>> variant.
>>>>> < 1% faster decode.
>>>>>
>>>>>
>>>>> commit 38cac0d21d8308e077bb762d712ab7c19e8c826d
>>>>> Author: David Conrad <davedc at Kozue.local>
>>>>> Date: Thu Apr 16 14:30:29 2009 -0400
>>>>>
>>>>> NEON: put_pixels_clamped
>>>>>
>>>>> diff --git a/libavcodec/arm/dsputil_neon.c b/libavcodec/arm/
>>>>> dsputil_neon.c
>>>>> index 37425a3..9b95130 100644
>>>>> --- a/libavcodec/arm/dsputil_neon.c
>>>>> +++ b/libavcodec/arm/dsputil_neon.c
>>>>> @@ -42,6 +42,7 @@ void ff_put_pixels8_xy2_no_rnd_neon(uint8_t *,
>>>>> const uint8_t *, int, int);
>>>>> void ff_avg_pixels16_neon(uint8_t *, const uint8_t *, int, int);
>>>>>
>>>>> void ff_add_pixels_clamped_neon(const DCTELEM *, uint8_t *, int);
>>>>> +void ff_put_pixels_clamped_neon(const DCTELEM *, uint8_t *, int);
>>>>> void ff_put_signed_pixels_clamped_neon(const DCTELEM *, uint8_t *,
>>>>> int);
>>>>>
>>>>> void ff_put_h264_qpel16_mc00_neon(uint8_t *, uint8_t *, int);
>>>>> @@ -180,6 +181,7 @@ void ff_dsputil_init_neon(DSPContext *c,
>>>>> AVCodecContext *avctx)
>>>>> c->avg_pixels_tab[0][0] = ff_avg_pixels16_neon;
>>>>>
>>>>> c->add_pixels_clamped = ff_add_pixels_clamped_neon;
>>>>> + c->put_pixels_clamped = ff_put_pixels_clamped_neon;
>>>>> c->put_signed_pixels_clamped =
>>>>> ff_put_signed_pixels_clamped_neon;
>>>>>
>>>>> c->put_h264_chroma_pixels_tab[0] = ff_put_h264_chroma_mc8_neon;
>>>>> diff --git a/libavcodec/arm/dsputil_neon_s.S b/libavcodec/arm/
>>>>> dsputil_neon_s.S
>>>>> index f16293d..159ee64 100644
>>>>> --- a/libavcodec/arm/dsputil_neon_s.S
>>>>> +++ b/libavcodec/arm/dsputil_neon_s.S
>>>>> @@ -273,6 +273,30 @@ function ff_put_h264_qpel8_mc00_neon,
>>>>> export=1
>>>>> pixfunc2 put_ pixels8_y2, _no_rnd, vhadd.u8
>>>>> pixfunc2 put_ pixels8_xy2, _no_rnd, vshrn.u16, 1
>>>>>
>>>>> +function ff_put_pixels_clamped_neon, export=1
>>>>> + vld1.64 {d16-d19}, [r0,:128]!
>>>>> + vqmovn.u16 d0, q8
>>>>> + vld1.64 {d20-d23}, [r0,:128]!
>>>>> + vqmovn.u16 d1, q9
>>>>> + vqmovn.u16 d2, q10
>>>>> + vld1.64 {d24-d27}, [r0,:128]!
>>>>> + vqmovn.u16 d3, q11
>>>>> + vqmovn.u16 d4, q12
>>>>> + vld1.64 {d28-d31}, [r0,:128]!
>>>>> + vqmovn.u16 d5, q13
>>>>> + vqmovn.u16 d6, q14
>>>>> + vst1.64 {d0}, [r1,:64], r2
>>>>> + vqmovn.u16 d7, q15
>>>>> + vst1.64 {d1}, [r1,:64], r2
>>>>> + vst1.64 {d2}, [r1,:64], r2
>>>>> + vst1.64 {d3}, [r1,:64], r2
>>>>> + vst1.64 {d4}, [r1,:64], r2
>>>>> + vst1.64 {d5}, [r1,:64], r2
>>>>> + vst1.64 {d6}, [r1,:64], r2
>>>>> + vst1.64 {d7}, [r1,:64], r2
>>>>> + bx lr
>>>>> + .endfunc
>>>>
>>>> Shouldn't those be vqmovun.s16? I'd also try to interleave them
>>>> with
>>>> the loads and stores a bit more for better dual-issue
>>>> opportunities.
>>>
>>> Unsigned pixels; MMX does the same (packuswb for put_pixels_clamped
>>> vs. packsswb for put_signed_pixels_clamped)
>>
>> Oops, you're right, I didn't read packuswb. New patch attached.
>>
>>> Also, the loads take two issue cycles since they're loading 4
>>> registers; shouldn't they be able to dual issue on both cycles?
>
> On Cortex-A8 NEON instructions with more than one issue cycle can
> dual-issue on the first or the last cycle but not both.

As discussed on IRC, all the multi-cycle NEON instructions I've checked
can dual-issue on both their first and last cycles.

I can't measure any speed difference doing it this way, but it looks a
bit more consistent with the other put_/add_ pixels_clamped.
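
For anyone following the vqmovn.u16 vs. vqmovun.s16 point quoted above,
here is a rough scalar sketch in C of the per-lane difference. The helper
names are made up for illustration and are not FFmpeg functions, and
int16_t stands in for DCTELEM, which is an assumption on my part. The
point is only that the coefficients are signed, so a negative value has
to clamp to 0, whereas reading the same bits as unsigned would clamp it
to 255.

```c
#include <stdint.h>

/* Hypothetical helpers (not FFmpeg API): each models one 16-bit lane of a
 * NEON saturating narrow. */

/* Models vqmovn.u16: the lane is read as UNSIGNED 16-bit and saturated
 * to 0..255, so the bit pattern of -1 (0xFFFF) becomes 255. */
static uint8_t sat_narrow_u16(uint16_t v)
{
    return v > 255 ? 255 : (uint8_t)v;
}

/* Models vqmovun.s16: the lane is read as SIGNED 16-bit and saturated
 * to 0..255, so negative inputs become 0. */
static uint8_t sat_narrow_s16_to_u8(int16_t v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}

/* Scalar sketch of put_pixels_clamped for one 8x8 block: 64 contiguous
 * signed coefficients in, 8 rows of 8 clamped bytes out, with the output
 * rows spaced line_size bytes apart. */
static void put_pixels_clamped_ref(const int16_t *block, uint8_t *pixels,
                                   int line_size)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            pixels[x] = sat_narrow_s16_to_u8(block[y * 8 + x]);
        pixels += line_size;
    }
}
```

With the signed variant a coefficient of -1 is stored as 0. The unsigned
variant would store 255 for the same input, which is the mismatch with
packuswb pointed out in the review.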
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ffmpeg-neon-put_pixels_clamped.txt
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090429/fd4bba0a/attachment.txt>