[FFmpeg-devel] [RFC] Loop unrolling in C code for 'vector_fmul_*' functions
Alexander Strange
astrange
Mon Apr 21 00:08:22 CEST 2008
On Sun, Apr 20, 2008 at 6:01 PM, Siarhei Siamashka
<siarhei.siamashka at gmail.com> wrote:
> [..]
>
> Getting back to this issue.
>
> It is good that I did not submit a report to the gcc devels, otherwise I would
> have made an idiot of myself by submitting an invalid report :)
>
> The problem is that
>
> void vector_fmul_c_unrolled(float *dst, const float *src, int len)
> {
>     int i;
>     for(i = 0; i < len; i += 8) {
>         dst[i + 0] *= src[i + 0];
>         dst[i + 1] *= src[i + 1];
>         dst[i + 2] *= src[i + 2];
>         dst[i + 3] *= src[i + 3];
>         dst[i + 4] *= src[i + 4];
>         dst[i + 5] *= src[i + 5];
>         dst[i + 6] *= src[i + 6];
>         dst[i + 7] *= src[i + 7];
>     }
> }
>
> and
>
> void vector_fmul_c_other_unrolled(float *dst, const float *src, int len)
> {
>     int i;
>     register float tmp0, tmp1, tmp2, tmp3, tmp4, tmp5, tmp6, tmp7;
>     for(i = 0; i < len; i += 8) {
>         tmp0 = src[i + 0];
>         tmp1 = src[i + 1];
>         tmp2 = src[i + 2];
>         tmp3 = src[i + 3];
>         tmp4 = src[i + 4];
>         tmp5 = src[i + 5];
>         tmp6 = src[i + 6];
>         tmp7 = src[i + 7];
>         dst[i + 0] *= tmp0;
>         dst[i + 1] *= tmp1;
>         dst[i + 2] *= tmp2;
>         dst[i + 3] *= tmp3;
>         dst[i + 4] *= tmp4;
>         dst[i + 5] *= tmp5;
>         dst[i + 6] *= tmp6;
>         dst[i + 7] *= tmp7;
>     }
> }
>
> are not actually identical.
>
> The compiler has to allow for the case where the 'dst' and 'src'
> buffers overlap, so it cannot reschedule the instructions of
> 'vector_fmul_c_unrolled' into the order used in
> 'vector_fmul_c_other_unrolled'.
>
> The fact that 'dst' and 'src' buffers don't overlap is one more useful
> constraint which can be exploited when doing optimizations.
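
To make the difference concrete, here is a minimal sketch, cut down to 4 elements, of what happens when the buffers do overlap. The function names and the overlap_result() helper are hypothetical, made up for this illustration only; the two bodies just mirror the shape of the two functions quoted above.

```c
/* Same shape as vector_fmul_c_unrolled, cut down to 4 elements:
 * each store to dst[] is done before the next load from src[]. */
void fmul_interleaved4(float *dst, const float *src)
{
    dst[0] *= src[0];
    dst[1] *= src[1];
    dst[2] *= src[2];
    dst[3] *= src[3];
}

/* Same shape as vector_fmul_c_other_unrolled:
 * all loads from src[] are done before any store to dst[]. */
void fmul_loads_first4(float *dst, const float *src)
{
    float tmp0 = src[0];
    float tmp1 = src[1];
    float tmp2 = src[2];
    float tmp3 = src[3];
    dst[0] *= tmp0;
    dst[1] *= tmp1;
    dst[2] *= tmp2;
    dst[3] *= tmp3;
}

/* Hypothetical helper: runs one variant with dst = buf + 1 and
 * src = buf, so the two ranges overlap, and returns the last element. */
float overlap_result(int loads_first)
{
    float buf[5] = { 2.0f, 2.0f, 2.0f, 2.0f, 2.0f };

    if (loads_first)
        fmul_loads_first4(buf + 1, buf);
    else
        fmul_interleaved4(buf + 1, buf);
    return buf[4];
}
```

With the interleaved variant every multiply sees the product just stored by the previous one, so the last element ends up as 32; with the loads-first variant it is 4. Since both results are well defined in C, the compiler is not free to turn one ordering into the other on its own.
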
>
> Those who are interested in this issue can look at the '-fargument-alias',
> '-fargument-noalias' and '-fargument-noalias-global' gcc options.
>
> Too bad that I did not find any gcc function attribute that tells the
> compiler that the pointer arguments of one particular function do not
> alias, without applying this setting to the whole project and risking
> breaking something.
>
> Anyway, at least in this case gcc was not at fault :)
The C99 keyword "restrict" will do this. gcc has some problems with it:
it's ignored for char*, so we can't use it to fix cases like
get_cabac*, where char* aliasing causes a lot of unnecessary stores.
It might work here, though. If not, you can sometimes fix it by inlining
the function into a caller where the original definitions of src and
dst are both visible.
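
As a rough sketch of what I mean, assuming a C99 compiler (the function name vector_fmul_restrict is made up for this example and is not an existing FFmpeg function), the declaration would look something like this. I have not checked what gcc generates for it, so treat it as an illustration of the syntax rather than a measured improvement.

```c
/* Hypothetical restrict-qualified variant of the plain loop.
 * "restrict" is a promise from the caller that dst and src do not
 * overlap, so the compiler may load from src[] ahead of the stores
 * to dst[]. Calling it with overlapping buffers is undefined behavior. */
void vector_fmul_restrict(float *restrict dst, const float *restrict src,
                          int len)
{
    int i;

    for (i = 0; i < len; i++)
        dst[i] *= src[i];
}
```

The point of putting the qualifier on the arguments is that it only affects this one function, unlike the -fargument-noalias family of options, which applies to everything compiled with it.
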