[Ffmpeg-devel] SSE load and store doubts
Roberto Pariset
r.pariset
Thu Oct 6 11:10:59 CEST 2005
hello everyone.
please consider the following example:
/* imagine long_array filled with floats here */
float long_array[MANY] __attribute__((aligned(16)));
i often[1] see this kind of code:
__m128 *reg = (__m128 *) long_array;
for(i=0; i<MANY; i+=4)
{
do_stuff();
reg++; /* skip to next 4 floats of long_array */
}
while i'd expect the following:
__m128 reg;
for(i=0; i<MANY; i+=4)
{
reg = _mm_load_ps( &long_array[i] );
reg = do_stuff();
_mm_store_ps( &long_array[i], reg );
}
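for reference, here is a self-contained sketch of both variants that can be compiled and disassembled side by side. a few things in it are assumptions and not from the ffmpeg sources: MANY is set to 16, do_stuff() is a hypothetical stand-in that takes a vector and doubles each float, and the asserts only document that both variants need a 16-byte aligned array.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>

#define MANY 16 /* hypothetical size, must be a multiple of 4 */

/* hypothetical stand-in for do_stuff(): doubles each of the 4 floats */
static __m128 do_stuff(__m128 v)
{
    return _mm_add_ps(v, v);
}

/* variant 1: walk the array through an __m128 pointer */
static void via_pointer(float *a)
{
    __m128 *reg = (__m128 *) a;
    int i;

    assert(((uintptr_t) a & 15) == 0); /* needs 16-byte alignment */
    for (i = 0; i < MANY; i += 4) {
        *reg = do_stuff(*reg); /* read and write through the pointer */
        reg++;                 /* skip to next 4 floats */
    }
}

/* variant 2: explicit load and store intrinsics */
static void via_load_store(float *a)
{
    __m128 reg;
    int i;

    assert(((uintptr_t) a & 15) == 0); /* _mm_load_ps needs 16-byte alignment */
    for (i = 0; i < MANY; i += 4) {
        reg = _mm_load_ps(&a[i]);
        reg = do_stuff(reg);
        _mm_store_ps(&a[i], reg);
    }
}
```

calling both functions on two identically filled, aligned arrays should leave the same values in each, so any difference would be in the generated instructions and not in the result.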
if i compile and disassemble a simple example like the one above, i see
the first doesn't actually use XMMn registers, while the second does:
reg = _mm_load_ps( &long_array[i] );
400548: 48 8d 7d 80 lea 0xffffffffffffff80(%rbp),%rdi
40054c: e8 67 00 00 00 callq 4005b8 <_mm_load_ps>
400551: 0f 29 45 e0 movaps %xmm0,0xffffffffffffffe0(%rbp)
__m128 *reg = (__m128 *) long_array;
400555: 48 8d 85 70 ff ff ff lea 0xffffffffffffff70(%rbp),%rax
40055c: 48 89 45 f0 mov %rax,0xfffffffffffffff0(%rbp)
so, basically, i am not sure if this is an error or not, as i am just a
n00b with SSE. to me, it seems that the first syntax is not taking
advantage of the sse registers, so it would not make things faster. i
might be wrong, of course. i just wanted to point it out, and would much
appreciate some explanation, as i haven't found any on the web (all the
code i have found uses either load/store or a pointer with no apparent
difference, and none of it explains the motivation for the choice).
thanks a lot,
roberto
[1] as in ffmpeg-0.4.9-pre1/libavcodec/i386/