Re: [linux-audio-dev] [i686] xmm regs + gcc inline assembly


Subject: Re: [linux-audio-dev] [i686] xmm regs + gcc inline assembly
From: Simon Jenkins (sjenkins_AT_blueyonder.co.uk)
Date: Fri Feb 13 2004 - 23:12:53 EET


Tim Goetze wrote:

>Simon Jenkins wrote:
>
>>I can definitely get
>>
>> asm ("movaps %%xmm1, %0" : "=m" (t[0]));
>>
>>to exhibit the optimisation problem (the one I couldn't get your
>>original line to show) and then fix it again by removing the [0].
>>
>>[snip]
>>
>
>ok, here is a distilled test of how i allocate and use the
>instructions:
>
>int main (int argc, char ** argv)
>{
> char scratch [128 + 15];
> float f = 2.3;
>
> int s = (int) scratch;
> s &= 0xF;
> if (s)
> s = 16 - s;
> float * d = (float *) (((char *) scratch) + s);
> fprintf (stderr, "%p\n", d);
>
> asm ("movss %0, %%xmm0" : : "m" (f));
> asm ("shufps $0, %xmm0, %xmm0");
> asm ("movaps %%xmm0, %0" : "=m" (d[0]));
>
> printf ("%.2f %.2f %.2f %.2f\n", d[0], d[1], d[2], d[3]);
>}
>
>you'll agree that the program should print "2.30 2.30 2.30 2.30".
>it does if you use "=m" (d[0]). if you say "=m" (d), it doesn't.
>
>here's what the assembly block compiles to with "=m" (d):
>
>#APP
> movss -148(%ebp), %xmm0
> shufps $0, %xmm0, %xmm0
> movaps %xmm0, -156(%ebp)
>#NO_APP
>
>and here's with "=m" (d[0]):
>
>#APP
> movss -148(%ebp), %xmm0
> shufps $0, %xmm0, %xmm0
>#NO_APP
> movl -156(%ebp),%eax
>#APP
> movaps %xmm0, (%eax)
>#NO_APP
>
>so saying "=m" (d) causes xmm0 to be written to &d, not d, as
>intended. if &d isn't 128-bit aligned, it will segfault now.
>even if it is, that's not where we wanted the numbers from xmm0
>to go ...
>
The discrepancy here is because you originally said you were trying to
get the data into a named array of floats:

    float t[4];

but it turns out you're actually trying to get them into some memory
to which you have a named pointer:

  float *d;

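Now, there are a great many circumstances in which you could treat
such names interchangeably, but this isn't one of them. An "m" operand
takes the lvalue exactly as written: t names the 16 bytes of the array,
whereas d names the pointer variable itself, so "=m" (d) stores over the
pointer rather than through it.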

The following code demonstrates

  asm ("movaps %%xmm0, %0" : "=m" (d));

working correctly if d is an aligned array of floats. Also, if
you change the d to d[0], it exhibits the optimization problem.

/* start */

#include <stdio.h>

float d[4] __attribute__ ((aligned(16))) = { 1.1f, 1.1f, 1.1f, 1.1f };

int main (int argc, char ** argv)
{
  float z = 1.1f;
  float f = 2.3f;

  z += 3.3f;

  asm ("movss %0, %%xmm0" : : "m" (f));
  asm ("shufps $0, %xmm0, %xmm0");
  asm ("movaps %%xmm0, %0" : "=m" (d));

  z += d[1];

  printf ("%.2f %.2f %.2f %.2f\n", d[0], d[1], d[2], d[3]);
  printf ("z is %.2f\n", z );
}
/* end */

We're expecting (and we get):

2.30 2.30 2.30 2.30
z is 6.70

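but using d[0] instead of d, gcc is only told that d[0] changes, so it
is free to use the old value of d[1] (1.1) when computing z, and we end
up getting: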

2.30 2.30 2.30 2.30
z is 5.50

Simon Jenkins
(Bristol, UK)



This archive was generated by hypermail 2b28 : Fri Feb 13 2004 - 22:10:34 EET