I want to use the AltiVec function for performing a scalar product (dot product) of two vectors declared:
extern float vSdot(
SInt32 n,
const vector float x[],
const vector float y[]);
I am trying to test the dot product function with a data type declared as:
typedef struct {
float real;
float imag;
float dummy[6];
} imaginary;
/**********************************************/
unsigned int avec_iterate2(imaginary *initVal, imaginary *constant, float escapeMagSquared, unsigned int escapeIterations)
/**********************************************/
{
imaginary val, val1, val2;
vFloat *pVal1, *pVal2;
float tempReal;
imaginary temp1, temp2;
pVal1 = (vFloat *)&val1; pVal2 = (vFloat *)&val2;
val1.real = (float)100.0; val1.imag = (float)100.0; val1.dummy[0] = (float)100.0; val1.dummy[1] = (float)100.0; val1.dummy[2] = (float)0.0; val1.dummy[3] = (float)0.0; val1.dummy[4] = (float)0.0; val1.dummy[5] = (float)0.0;
val2.real = (float)1.0; val2.imag = (float)1.0; val2.dummy[0] = (float)1.0; val2.dummy[1] = (float)1.0; val2.dummy[2] = (float)0.0; val2.dummy[3] = (float)0.0; val2.dummy[4] = (float)0.0; val2.dummy[5] = (float)0.0;
temp2.real = vSdot(4, pVal1, pVal2); // Why is the result stored here 300?
// temp2.imag = 2*tempReal*val.imag + constant->imag;
return(temp2.real);
}
Thanks for any assistance.
WayneContello
You are running into trouble because your input data must be 16-byte aligned, and nothing guarantees that for a plain struct of floats: casting a pointer to your struct to vFloat * does not change where the struct actually lives in memory. Try something like:
typedef union {
vFloat vectors[2];
struct {
float real;
float imag;
float dummy[6];
} elements;
} imaginary;
–LN
The issue is that all AltiVec functions require 16-byte alignment, which is not guaranteed when you create a struct the way you do. However, if you use the union type that I declared above, its vFloat member gives the whole union 16-byte alignment, so GCC will align it for you. Alternatively, you can use malloc() to get correctly aligned allocations (on Mac OS X, malloc() returns 16-byte-aligned blocks). Here's a quick test which shows the correct result, 400.0. To prove this is working, just compile it like:
gcc test.m -faltivec -framework Cocoa -framework Accelerate
#include <Cocoa/Cocoa.h>
#include <Accelerate/Accelerate.h>
typedef union {
vFloat vectors[2];
struct {
float real;
float imag;
float dummy[6];
} elements;
} imaginary;
int main() {
imaginary x;
imaginary y;
x.elements.real = (float)100.0;
x.elements.imag = (float)100.0;
x.elements.dummy[0] = (float)100.0;
x.elements.dummy[1] = (float)100.0;
x.elements.dummy[2] = (float)0.0;
x.elements.dummy[3] = (float)0.0;
x.elements.dummy[4] = (float)0.0;
x.elements.dummy[5] = (float)0.0;
y.elements.real = (float)1.0;
y.elements.imag = (float)1.0;
y.elements.dummy[0] = (float)1.0;
y.elements.dummy[1] = (float)1.0;
y.elements.dummy[2] = (float)0.0;
y.elements.dummy[3] = (float)0.0;
y.elements.dummy[4] = (float)0.0;
y.elements.dummy[5] = (float)0.0;
printf("%f\n", vSdot(4, x.vectors, y.vectors));
return 0;
}
–LN
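If you want to see the alignment issue directly rather than infer it from a wrong result, a small check like the following may help. It is only a sketch: is_aligned16, check_vector_arg, imaginary_plain and imaginary_aligned are names made up for this illustration, and the aligned attribute is a GCC/Clang extension, used here as an alternative to the union above rather than as anything vSdot itself requires you to write.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if p sits on a 16-byte boundary, else 0. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical debug guard to call on a pointer before casting it to vFloat *. */
void check_vector_arg(const void *p)
{
    assert(is_aligned16(p));
}

/* Plain struct as in the question: only float (4-byte) alignment is guaranteed,
   so an instance on the stack may or may not land on a 16-byte boundary. */
typedef struct {
    float real;
    float imag;
    float dummy[6];
} imaginary_plain;

/* Same layout, with 16-byte alignment requested explicitly
   (GCC/Clang extension; an assumption about your compiler). */
typedef struct {
    float real;
    float imag;
    float dummy[6];
} __attribute__((aligned(16))) imaginary_aligned;
```

Calling check_vector_arg(&val1) just before the cast in the original code would trip the assert on any run where val1 happens to be misaligned, which is easier to diagnose than a silently wrong sum.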
For more discussion, see AltivecOptimization, where this point is mentioned in passing.
I assume you are planning on using AltiVec for multiplying sequences of complex numbers, because otherwise the latency of getting data into and out of the AltiVec unit is going to totally negate any benefit over just using x.elements.real*y.elements.real + x.elements.imag*y.elements.imag
- not to mention the horrors of maintenance you’re invoking :)
A quick suggestion that may prove useful when you move from this toy code to something that really uses the power of AltiVec: if you’re not sure whether your input arrays are aligned or not, just do the start and end elements the non-AltiVec way and only start AltiVec up at the first 16-byte-aligned position.
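The head/body/tail split suggested above could look roughly like this. It is a portable sketch, not AltiVec code: the four-at-a-time loop only marks where the vector instructions would go, and floats_until_aligned16 and dot_head_body_tail are hypothetical names. It also assumes that both arrays are misaligned by the same amount; when they are not, no single starting index aligns both, so the sketch simply falls back to the scalar loop for everything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: how many leading floats must be handled one by one
   before p reaches a 16-byte boundary (0..3 for a float-aligned pointer). */
size_t floats_until_aligned16(const float *p)
{
    size_t misalign = (size_t)((uintptr_t)p & 15u);  /* bytes past the boundary */
    assert((misalign & 3u) == 0);                     /* floats are 4-byte aligned */
    return misalign ? (16u - misalign) / sizeof(float) : 0;
}

/* Dot product of x[0..n-1] and y[0..n-1] using a scalar head, an aligned
   body processed four floats at a time, and a scalar tail. */
float dot_head_body_tail(const float *x, const float *y, size_t n)
{
    float sum = 0.0f;
    size_t i = 0;
    size_t head = floats_until_aligned16(x);

    /* Different misalignment in x and y: treat the whole input as "head". */
    if (floats_until_aligned16(y) != head)
        head = n;
    if (head > n)
        head = n;

    /* Head: scalar code up to the first 16-byte-aligned element. */
    for (; i < head; i++)
        sum += x[i] * y[i];

    /* Body: x + i and y + i are 16-byte aligned here, so this is where
       the AltiVec loads and multiply-adds would replace the scalar math. */
    for (; i + 4 <= n; i += 4)
        sum += x[i] * y[i] + x[i + 1] * y[i + 1]
             + x[i + 2] * y[i + 2] + x[i + 3] * y[i + 3];

    /* Tail: fewer than four elements left over. */
    for (; i < n; i++)
        sum += x[i] * y[i];

    return sum;
}
```

For the eight-float toy structs in this thread the body loop would run at most twice, which is another way of seeing why the vector unit only pays off on long arrays.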