Re: [linux-audio-dev] [i686] xmm regs + gcc inline assembly


Subject: Re: [linux-audio-dev] [i686] xmm regs + gcc inline assembly
From: Simon Jenkins (sjenkins_AT_blueyonder.co.uk)
Date: Fri Feb 13 2004 - 02:51:41 EET


Tim Goetze wrote:

>Simon Jenkins wrote:
>
>
>>>for a simplified example, i'm using
>>>
>>> float t[4];
>>> ...
>>> asm ("movaps %%xmm1, %0" : : "m" (t[0]));
>>>
>>>to move 4 packed floats from xmm1 into 't'.
>>>
>>>
>>I couldn't get this to fail in practice - though I didn't
>>try all that hard - unless t isn't on a 16 byte boundary
>>in which case it segfaults.
>>
>
>it failed here just a minute ago, with g++ -O6. not a segfault, but
>gcc seemed to think that some members of t are zero and omitted them
>from the final summation in my code (r = t[0] + t[1] + t[2] + t[3]).
>
>
>>In theory however your code is telling the compiler that
>>array element t[0] is in memory from which the instruction
>>reads. It should be more like:
>>
>> asm ("movaps %%xmm1, %0" : "=m" (t) );
>>
>>which now tells the compiler that the entire array t
>>is in memory to which the instruction writes. This
>>*ought* to discourage the optimiser from doing
>>anything too drastic. (Maybe/AFAIK/IANAL/etc).
>>
>
>you're right of course, 't' should be an output, not an input.
>however,
>
> asm ("movaps %%xmm1, %0" : "=m" (t));
>
>segfaults, but
>
> asm ("movaps %%xmm1, %0" : "=m" (t[0]));
>
>works. think i'll have to resort to 128 bit wide data types, a
>simple cast should do. all this gcc inline asm stuff is ugly anyway,
>and what's another cast among friends.
>
I can definitely get

    asm ("movaps %%xmm1, %0" : "=m" (t[0]));

to exhibit the optimisation problem (the one I couldn't get your
original line to show) and then fix it again by removing the [0].
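
For reference, a minimal sketch of the whole-array form, under a few
assumptions of mine: the names src, t and sum_via_xmm1 are made up for
illustration, both arrays live at file scope so the alignment attribute
takes effect, and the load into xmm1 is folded into the same asm
statement because gcc makes no promise to preserve xmm1 between two
separate asm statements. The non-SSE branch is only there so the sketch
still compiles on other targets.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative file-scope arrays, both forced to 16-byte alignment. */
static float src[4] __attribute__ ((aligned(16))) = { 1.0f, 2.0f, 3.0f, 4.0f };
static float t[4]   __attribute__ ((aligned(16)));

/* Hypothetical helper: copies src to t through xmm1, then sums t. */
static float sum_via_xmm1(void)
{
    /* movaps faults on a misaligned memory operand, so check first. */
    assert(((uintptr_t) src & 15) == 0 && ((uintptr_t) t & 15) == 0);

#if defined(__SSE__)
    __asm__ ("movaps %1, %%xmm1\n\t"   /* load 4 packed floats from src */
             "movaps %%xmm1, %0"       /* store them to t */
             : "=m" (t)                /* output: the WHOLE array, not t[0] */
             : "m" (src)               /* input: the whole source array */
             : "xmm1");                /* tell gcc that xmm1 is overwritten */
#else
    /* Fallback for targets without SSE: plain element-wise copy. */
    for (int i = 0; i < 4; i++)
        t[i] = src[i];
#endif

    return t[0] + t[1] + t[2] + t[3];
}
```

Because the output operand is t rather than t[0], the compiler is told
that all four elements are written, so it should have no grounds for
dropping any of them from the sum.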

I was getting a segfault on about 50% of compiles, as I modified
the code, because the array was being aligned to 8 byte boundaries
but not to 16 bytes. Declaring it as

float t[4] __attribute__ ((aligned(16)));

got rid of those. Note though that this attribute doesn't work for
automatic variables.
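
One possible workaround for automatic arrays, sketched here as an
assumption rather than something tested in this thread, is to
over-allocate slightly and round the pointer up to a 16 byte boundary
by hand. The helper names align16 and aligned_scratch_ok are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round p up to the next 16-byte boundary. */
static float *align16(float *p)
{
    return (float *) (((uintptr_t) p + 15) & ~(uintptr_t) 15);
}

/* Returns 1 if a 16-byte-aligned block of 4 floats was carved out of
   an automatic array and holds the expected values, 0 otherwise. */
static int aligned_scratch_ok(void)
{
    /* 3 extra floats (12 bytes) of slack are enough, since raw itself
       is at least 4-byte aligned. */
    float raw[4 + 3];
    float *t = align16(raw);

    /* The aligned block of 4 floats must still lie inside raw. */
    assert(t + 4 <= raw + 7);

    t[0] = 1.0f; t[1] = 2.0f; t[2] = 3.0f; t[3] = 4.0f;

    return ((uintptr_t) t & 15) == 0   /* safe as a movaps operand */
        && t[0] + t[1] + t[2] + t[3] == 10.0f;
}
```

Note that t is then a pointer rather than an array, so an "=m"
operand would need a cast such as *(float (*)[4]) t to cover all
four elements.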

Simon Jenkins
(Bristol, UK)


