Re: [linux-audio-dev] [i686] xmm regs + gcc inline assembly


Subject: Re: [linux-audio-dev] [i686] xmm regs + gcc inline assembly
From: Tim Goetze (tim_AT_quitte.de)
Date: Fri Feb 13 2004 - 02:38:25 EET


Simon Jenkins wrote:

>I can definitely get
>
> asm ("movaps %%xmm1, %0" : "=m" (t[0]));
>
>to exhibit the optimisation problem (the one I couldn't get your
>original line to show) and then fix it again by removing the [0].
>
>I was getting a segfault on about 50% of compiles, as I modified
>the code, because the array was being aligned to 8 byte boundaries
>but not to 16 bytes. Declaring it as
>
>float t[4] __attribute__ ((aligned(16)));
>
>got rid of those. Note though that this attribute doesn't work for
>automatic variables.

ok, here is a distilled test of how i allocate the aligned buffer
and use the instructions:

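#include <stdio.h>
#include <stdint.h>

int main (int argc, char ** argv)
{
  /* 15 spare bytes so a 16-byte aligned block always fits */
  char scratch [128 + 15];
  float f = 2.3;

  /* distance from scratch to the next 16-byte boundary */
  uintptr_t s = (uintptr_t) scratch & 0xF;
  if (s)
    s = 16 - s;
  float * d = (float *) (scratch + s);
  fprintf (stderr, "%p\n", (void *) d);

  asm ("movss %0, %%xmm0" : : "m" (f));    /* low lane of xmm0 = f */
  asm ("shufps $0, %xmm0, %xmm0");         /* copy it to all four lanes */
  asm ("movaps %%xmm0, %0" : "=m" (d[0])); /* aligned store to d[0..3] */

  printf ("%.2f %.2f %.2f %.2f\n", d[0], d[1], d[2], d[3]);
  return 0;
}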

you'll agree that the program should print "2.30 2.30 2.30 2.30".
it does if you use "=m" (d[0]). if you say "=m" (d), it doesn't.
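
the alignment arithmetic in the test above can also be written as a small helper. this is only a sketch: the names align_up and aligned_block_fits are made up for illustration and are not part of the original test, and it assumes a compiler that provides uintptr_t in <stdint.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* hypothetical helper (not from the original post): round p up to the
   next multiple of align. align must be a power of two. */
static void *align_up (void *p, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  uintptr_t a = (uintptr_t) p;
  a = (a + (align - 1)) & ~(uintptr_t) (align - 1);
  return (void *) a;
}

/* hypothetical check: returns 1 if a 16-byte aligned block of 128 bytes
   fits inside a scratch buffer of 128 + 15 bytes, as in the test above. */
static int aligned_block_fits (void)
{
  char scratch [128 + 15];
  char *d = (char *) align_up (scratch, 16);
  return ((uintptr_t) d % 16 == 0)
      && (d >= scratch)
      && (d + 128 <= scratch + sizeof scratch);
}
```

the 15 spare bytes are what make this safe: rounding up moves the pointer by at most 15 bytes, so the aligned 128-byte block always stays inside scratch.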

here's what the assembly block compiles to with "=m" (d):

#APP
  movss -148(%ebp), %xmm0
  shufps $0, %xmm0, %xmm0
  movaps %xmm0, -156(%ebp)
#NO_APP

and here's with "=m" (d[0]):

#APP
  movss -148(%ebp), %xmm0
  shufps $0, %xmm0, %xmm0
#NO_APP
  movl -156(%ebp),%eax
#APP
  movaps %xmm0, (%eax)
#NO_APP

so saying "=m" (d) causes xmm0 to be written to &d (the pointer
variable itself) instead of to the memory d points to, which is
what was intended. if &d isn't 16-byte (128-bit) aligned, movaps
will segfault. even if it is, that's not where we wanted the
numbers from xmm0 to go ...
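
one way to make the operand say exactly what is meant is to name the pointed-to object as the output, viewed as an array of four floats, so the compiler knows all 16 bytes are written. the following is a sketch, not the code from the test above: splat4 is a made-up name, the three instructions are merged into one asm statement with xmm0 listed as clobbered, and a plain C loop stands in on targets where gcc does not define __SSE__.

```c
#include <assert.h>
#include <stdint.h>

/* hypothetical helper: store f into d[0..3].
   d must be 16-byte aligned because movaps faults otherwise. */
static void splat4 (float *d, float f)
{
  assert (((uintptr_t) d & 0xF) == 0);
#if defined (__GNUC__) && defined (__SSE__)
  /* the output operand is the object d points to, viewed as float[4],
     not the pointer variable d itself. */
  __asm__ ("movss %1, %%xmm0\n\t"
           "shufps $0, %%xmm0, %%xmm0\n\t"
           "movaps %%xmm0, %0"
           : "=m" (*(float (*)[4]) d)
           : "m" (f)
           : "xmm0");
#else
  /* portable fallback for targets without SSE */
  for (int i = 0; i < 4; ++i)
    d[i] = f;
#endif
}
```

called with a 16-byte aligned d and f = 2.5, this should leave 2.5 in each of d[0] through d[3]. keeping everything in one asm statement also avoids relying on the compiler leaving xmm0 untouched between separate statements.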

tim


