[FFmpeg-devel] Some IWMMXT functions for libavcodec
Siarhei Siamashka
siarhei.siamashka
Mon May 19 15:48:39 CEST 2008
On Saturday 17 May 2008, Siarhei Siamashka wrote:
> On Saturday 17 May 2008, Dmitry Antipov wrote:
> > Michael Niedermayer wrote:
> > > So write code which is near perfect on both
> >
> > As we have found,
> >
> > wldrd wr2, [%1, #8]
> > wldrd wr1, [%1], %2
> >
> > is much better than
> >
> > wldrd wr1, [%1]
> > wldrd wr2, [%1, #8]
> > add %1, %1, %2
> >
> > but the first version will work on PXA3xx cores only.
[...]
> Now about the performance. The following code is perfectly fine until we
> take potential cache misses into account:
> wldrd wr2, [%1, #8]
> wldrd wr1, [%1], %2
>
> But this code reads memory "backwards" and may (or may not) result in worse
> performance.
[...]
Well, to be on the safe side, it is probably better to use register
pre-increment in the code wherever possible, for example:
wldrd wr1, [%1, %2]!
wldrd wr2, [%1, #8]
and for WMMX1:
add %1, %1, %2
wldrd wr1, [%1]
wldrd wr2, [%1, #8]
The code for loading a WMMX register with register pre-increment could be
wrapped in a macro that expands differently for WMMX1 and WMMX2, keeping
the rest of the code identical and avoiding copy-paste.
I wanted to avoid comparing XScale with ARM11, but a similar construction would
cause a pipeline stall on ARM11 (because the base register is updated by the
instruction immediately preceding the data load):
add %1, %1, %2
wldrd wr1, [%1]
It is most likely fine on XScale, but it is still better to confirm this
by running benchmarks.
--
Best regards,
Siarhei Siamashka