Below are parallel implementations of matrix multiplication (C = A x B), written with OpenCL and OpenMP. I am still a beginner with OpenMP, so the OpenMP code is fairly basic. I have also tabulated the run times of each version on three different machines. The serial code is the same as the OpenMP code without the #pragma line. Matrices A and B are both 1000x1000, with every element initialized to 1.
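// OpenCL matrix multiplication kernel: each work-item computes one row of C.
// A is Ndim x Pdim, B is Pdim x Mdim, C is Ndim x Mdim (all row-major).
__kernel void matrixMultiply(__global const float* A,
                             __global const float* B,
                             __global float* C,
                             const uint Ndim,
                             const uint Mdim,
                             const uint Pdim)
{
    uint idx = get_global_id(0);   // row of C handled by this work-item
    float A_private[1000];         // private copy of row idx of A; assumes Pdim <= 1000
    float tmp;
    if (idx < Ndim)
    {
        // Copy the row of A once so the inner loop does not re-read global memory.
        for (uint k = 0; k < Pdim; k++)
            A_private[k] = A[idx*Pdim + k];
        for (uint j = 0; j < Mdim; j++)
        {
            tmp = 0.0f;
            for (uint k = 0; k < Pdim; k++)
            {
                tmp += A_private[k]*B[k*Mdim + j];
            }
            C[idx*Mdim + j] = tmp;
        }
    }
}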
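To check the kernel's output on the host, one option is to recompute the product serially with the same one-row-per-work-item layout and compare it with the buffer read back from the device. The sketch below is only an illustration: `max_abs_error` and `C_device` are hypothetical names that do not appear in the original code, and the host-side OpenCL setup (context, queue, buffer reads) is assumed to exist elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the original post): recomputes
 * C = A x B serially, one row per "work-item" as in the kernel, and
 * returns the largest absolute difference from C_device, the result
 * buffer assumed to have been read back from the OpenCL device. */
float max_abs_error(const float *A, const float *B, const float *C_device,
                    size_t Ndim, size_t Mdim, size_t Pdim)
{
    assert(A && B && C_device);
    float worst = 0.0f;
    for (size_t idx = 0; idx < Ndim; idx++) {       /* plays the role of get_global_id(0) */
        for (size_t j = 0; j < Mdim; j++) {
            float tmp = 0.0f;
            for (size_t k = 0; k < Pdim; k++)
                tmp += A[idx * Pdim + k] * B[k * Mdim + j];
            float diff = tmp - C_device[idx * Mdim + j];
            if (diff < 0.0f)
                diff = -diff;
            if (diff > worst)
                worst = diff;
        }
    }
    return worst;
}
```

With the inputs used here (1000x1000 matrices of ones), every element of C should come out as exactly 1000, so the returned error should be 0 for a correct kernel.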
// OpenMP matrix multiplication: rows of C are divided among the threads.
// A is ROWA x COLA, B is COLA x COLB, C is ROWA x COLB (all row-major).
#pragma omp parallel for shared(A, B, C, ROWA, COLA, COLB)
for(int i = 0; i < ROWA; i++)
{
    for(int j = 0; j < COLB; j++)
    {
        C[j + i*COLB] = 0;
        for(int k = 0; k < COLA; k++)
        {
            C[j + i*COLB] += A[i*COLA + k]*B[j + COLB*k]; // dot-product
        }
    }
}
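For reference, here is one way the loop nest above could be wrapped in a function and timed. This is a sketch under assumptions: the element type `float` and the names `matmul_omp`, `wall_seconds`, and `time_matmul` are mine, not from the original code. It accumulates each dot product in a local variable instead of writing to C on every iteration, and it measures wall-clock time, since `clock()` reports CPU time summed over all threads and would hide any parallel speedup.

```c
#include <assert.h>
#include <time.h>

/* Assumed signature: row-major float matrices,
 * A is ROWA x COLA, B is COLA x COLB, C is ROWA x COLB. */
void matmul_omp(const float *A, const float *B, float *C,
                int ROWA, int COLA, int COLB)
{
    assert(A && B && C);
    /* i, j, k and sum are declared inside the loops, so each thread
     * gets its own copies; A, B and C are shared by default. */
    #pragma omp parallel for
    for (int i = 0; i < ROWA; i++) {
        for (int j = 0; j < COLB; j++) {
            float sum = 0.0f;                 /* local accumulator */
            for (int k = 0; k < COLA; k++)
                sum += A[i * COLA + k] * B[j + COLB * k];
            C[j + i * COLB] = sum;
        }
    }
}

/* Wall-clock time in seconds, using C11 timespec_get. */
double wall_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Runs one multiplication and returns the elapsed wall-clock seconds. */
double time_matmul(const float *A, const float *B, float *C,
                   int ROWA, int COLA, int COLB)
{
    double start = wall_seconds();
    matmul_omp(A, B, C, ROWA, COLA, COLB);
    return wall_seconds() - start;
}
```

Built with an OpenMP flag (for example `-fopenmp` with GCC) the pragma takes effect; built without it, the pragma is ignored and the same source runs as the serial version.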
Processor | Max threads | Serial (s) | OpenMP (s) | OpenCL (s) |
---|---|---|---|---|
Intel Xeon E5 | 32 | 5.93 | 0.750 | 0.258 |
Intel Core i7 | 8 | 9.84 | 2.49 | 1.90 |
Intel Atom N550 | 4 | 87.06 | 34.237 | 11.34 |
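The speedup of each parallel version follows from the timings as serial time divided by parallel time. A minimal sketch of that arithmetic, where the function name `speedup` is my own:

```c
#include <assert.h>

/* Speedup = serial run time / parallel run time (both in seconds). */
double speedup(double serial_sec, double parallel_sec)
{
    assert(serial_sec > 0.0 && parallel_sec > 0.0);
    return serial_sec / parallel_sec;
}
```

For the Xeon E5 row, `speedup(5.93, 0.750)` gives roughly 7.9x for OpenMP and `speedup(5.93, 0.258)` gives roughly 23x for OpenCL.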