Tuesday, 26 February 2013

Matrix Multiplication: serial and parallel codes (OpenCL and OpenMP) and their respective speed-ups

Below are the parallel codes I have written for matrix multiplication (C = A x B), using OpenCL and OpenMP. I am still a beginner in OpenMP, so the OpenMP code is fairly trivial. I have also tabulated the run times, from which the speed-ups follow, on three different machines. The serial code is the same as the OpenMP code except for the #pragma line. Matrices A and B are each of order 1000x1000, with every element initialized to 1.
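 // OpenCL matrix multiplication kernel: each work-item computes one row of C.
 // A is Ndim x Pdim, B is Pdim x Mdim, C is Ndim x Mdim (all row-major).
 __kernel void matrixMultiply(__global float* A,
                 __global float* B,
                 __global float* C,
                 const uint Ndim,
                 const uint Mdim,
                 const uint Pdim)
 {
  uint idx = get_global_id(0);  // row of C handled by this work-item
  float A_private[1000];        // private copy of row idx of A; requires Pdim <= 1000
  float tmp;
  if(idx < Ndim)                // guard: global size may exceed Ndim
  {
   // copy row idx of A into private memory once, then reuse it for every column
   for(uint k=0; k<Pdim; k++)
    A_private[k] = A[idx*Pdim + k];
   for(uint j=0; j<Mdim; j++)
   {
    tmp = 0.0f;
    for(uint k=0; k<Pdim; k++)
    {
     tmp += A_private[k]*B[k*Mdim + j];  // dot-product of row idx of A and column j of B
    }
    C[idx*Mdim + j] = tmp;
   }
  }
 }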

 // OpenMP Matrix Multiplication   
 #pragma omp parallel for shared(A, B, C, ROWA, COLA, COLB)  
  for(int i = 0; i < ROWA; i++)  
  {  
   for(int j = 0; j < COLB; j++)  
   {  
    C[j + i*COLB] = 0;  
    for(int k = 0; k < COLA; k++)  
    {  
     C[j + i*COLB] += A[i*COLA + k]*B[j + COLB*k];  // dot-product  
    }  
   }  
  }  
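Since the serial version differs from the OpenMP version only by the #pragma line, the two can be sketched side by side as complete functions and checked against each other. This is only a minimal sketch, not the exact benchmark code: the function names matmul_serial, matmul_omp and matrix_filled and the lower-case parameter names are my own, and a local accumulator replaces the repeated writes to C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Serial reference: C (rowA x colB) = A (rowA x colA) * B (colA x colB), row-major.
   Function and parameter names are hypothetical, chosen for this sketch. */
void matmul_serial(const float *A, const float *B, float *C,
                   int rowA, int colA, int colB)
{
    for (int i = 0; i < rowA; i++) {
        for (int j = 0; j < colB; j++) {
            float sum = 0.0f;
            for (int k = 0; k < colA; k++)
                sum += A[i * colA + k] * B[k * colB + j];
            C[i * colB + j] = sum;
        }
    }
}

/* Same loops; the pragma divides the iterations of the outer i loop among threads.
   Each thread writes a disjoint set of rows of C, so no synchronization is needed.
   Without OpenMP enabled at compile time the pragma is ignored and this runs serially. */
void matmul_omp(const float *A, const float *B, float *C,
                int rowA, int colA, int colB)
{
    #pragma omp parallel for
    for (int i = 0; i < rowA; i++) {
        for (int j = 0; j < colB; j++) {
            float sum = 0.0f;
            for (int k = 0; k < colA; k++)
                sum += A[i * colA + k] * B[k * colB + j];
            C[i * colB + j] = sum;
        }
    }
}

/* Allocate a rows x cols matrix with every element set to value.
   Returns NULL if allocation fails; the caller frees the result. */
float *matrix_filled(int rows, int cols, float value)
{
    assert(rows > 0 && cols > 0);
    size_t n = (size_t)rows * (size_t)cols;
    float *m = malloc(n * sizeof *m);
    if (m == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        m[i] = value;
    return m;
}
```

With GCC, OpenMP is enabled with the -fopenmp flag. A quick sanity check for the setup in this post: when A and B are filled with 1, every element of C must equal the inner dimension (1000 for 1000x1000 matrices), for both functions.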

Processor | Max threads | Serial | OpenMP | OpenCL
Intel Xeon E5 | 32 | 5.93 sec | 0.750 sec | 0.258 sec
Intel Core i7 | 8 | 9.84 sec | 2.49 sec | 1.90 sec
Intel Atom N550 | 4 | 87.06 sec | 34.237 sec | 11.34 sec
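
The table lists run times; the speed-up of a parallel version is the serial time divided by the parallel time. A small sketch of that calculation follows (the function name speedup is my own, not part of the benchmark code):

```c
#include <assert.h>

/* Speed-up of a parallel run relative to the serial run: T_serial / T_parallel.
   The name "speedup" is hypothetical, introduced only for this sketch. */
double speedup(double serial_sec, double parallel_sec)
{
    assert(parallel_sec > 0.0);
    return serial_sec / parallel_sec;
}
```

For example, taking the Intel Atom N550 row as 87.06 sec serial, 34.237 sec OpenMP and 11.34 sec OpenCL, speedup(87.06, 34.237) is about 2.5 and speedup(87.06, 11.34) is about 7.7.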
