[FFmpeg-devel] Some IWMMXT functions for libavcodec

Mon May 19 15:48:39 CEST 2008

On Saturday 17 May 2008, Siarhei Siamashka wrote:
> On Saturday 17 May 2008, Dmitry Antipov wrote:
> > Michael Niedermayer wrote:
> > > So write code which is near perfect on both
> >
> > As we're investigated,
> >
> >      wldrd wr2, [%1, #8]
> >      wldrd wr1, [%1], %2
> >
> > is much better than
> >
> >      wldrd wr1, [%1]
> >      wldrd wr2, [%1, #8]
> >      add %1, %1, %2
> >
> > but the first version will work on PXA3xx cores only.

[...]

> Now about the performance. The following code is perfectly fine until we
> take potential cache misses into account:
>       wldrd wr2, [%1, #8]
>       wldrd wr1, [%1], %2
>
> But this code reads memory "backwards" and may (or may not) result in worse
> performance.

[...]

Well, to be on the safe side, it is probably better to use register
pre-increment in the code whenever possible, such as:
      wldrd wr1, [%1, %2]!
      wldrd wr2, [%1, #8]
and for WMMX1:
      add %1, %1, %2
      wldrd wr1, [%1]
      wldrd wr2, [%1, #8]

The code for loading a WMMX register with register pre-increment could be
wrapped into a macro and expanded differently for WMMX1 and WMMX2, to keep
the rest of the code identical and avoid copy-paste.

I wanted to avoid comparing XScale with ARM11, but a similar construction
would cause a pipeline stall on ARM11 (because the base register is updated
by the instruction immediately preceding the data load):
      add %1, %1, %2
      wldrd wr1, [%1]

Most likely it is ok for XScale, but it is still better to confirm this
by running benchmarks.

-- 
Best regards,
Siarhei Siamashka