Abnormally slow execution of loops

[Old posts from the commercial version of ArrayFire] Discussion of ArrayFire using CUDA or OpenCL.


Post by danielpengzuo » Tue Feb 18, 2014 1:19 pm

Hi everyone,

I am pretty new to ArrayFire and I have the following piece of code (nested loops), which did not speed up and even seemed to run slower than before:
Code:
for (im=0; im<nm; im++)
    {
        for (iz=0; iz<nz; iz++)
        {   
            omega=zgrid(iz);
                       
            LIc=LIL(iz,im);
                       
            kzc = -(1 - eta) * pow(R, (a / (a - 1))) * (-0.1e1 + exp(-k * zL(iz, im)) + exp(-k * zL(iz, im)) * k * zL(iz, im)) / k + pow(R, (a / (a - 1))) * (0.1e1 + k * zL(iz, im)) * exp(-k * zL(iz, im)) / k;
         
            gzc = -(-0.1e1 + exp(-zL(iz, im) * k) + exp(-zL(iz, im) * k) * zL(iz, im) * k) / k + pow(R, 0.1e1 / (-1 + a)) * (0.1e1 + zL(iz, im) * k) * exp(-zL(iz, im) * k) / k;
         
            bc=chi * beta * ((kzc * (1 - a) * phi * lambda * exp(omega) * pc * pow(LIc / lambda / gzc, a) + eta * lambda  * Fzc * vLold(iz,im) + beta * hLold(iz,im)) / (beta * phi * pc + beta * vLold(iz,im) + beta * hLold(iz,im)) - 0.1e1);
         
            hzc = -(-0.1e1 + exp(-zL(iz, im) * k) + exp(-zL(iz, im) * k) * zL(iz, im) * k) / k + pow(R, a / (-1 + a)) * (0.1e1 + zL(iz, im) * k) * exp(-zL(iz, im) * k) / k;
                   
           iwc= lambda * pow(pow(LIc, (1 - bb)) * pow(yIL(iz,im), bb) * pow(uL(iz,im), bb), a) / pow(lambda, a) / pow(gzc, a) * hzc * exp(omega);

          gzc_arr(iz,im) = gzc;
          iwc_arr(iz,im) = iwc;
          bc_arr(iz,im) = bc;
          hzc_arr(iz, im) = hzc;
     
        }
    }   


The outer loop runs 500 times and the inner loop runs 400 times.
In those super long lines, some of the variables are of type array; all the others are of type double. I would really appreciate an explanation of why the code runs slowly, plus some optimization tips.

Thank you!

Daniel Zuo

Re: Abnormally slow execution of loops

Post by pavanky » Tue Feb 18, 2014 1:54 pm

Hi Daniel,

You do not need for loops! All of the operations you are using are element-wise operations, so you can apply them to the whole arrays at once. You can just do something like the following.

Code:
// Repeat the zgrid column nm times so that omega has the same shape as zL
array omega = tile(zgrid, 1, nm);
array kzc = -(1 - eta) * pow(R, (a / (a - 1))) * (-0.1e1 + exp(-k * zL) + exp(-k * zL) * k * zL) / k + pow(R, (a / (a - 1))) * (0.1e1 + k * zL) * exp(-k * zL) / k;
array gzc = -(-0.1e1 + exp(-zL * k) + exp(-zL * k) * zL * k) / k + pow(R, 0.1e1 / (-1 + a)) * (0.1e1 + zL * k) * exp(-zL * k) / k;
array bc = chi * beta * ((kzc * (1 - a) * phi * lambda * exp(omega) * pc * pow(LIL / lambda / gzc, a) + eta * lambda * Fzc * vLold + beta * hLold) / (beta * phi * pc + beta * vLold + beta * hLold) - 0.1e1);
array hzc = -(-0.1e1 + exp(-zL * k) + exp(-zL * k) * zL * k) / k + pow(R, a / (-1 + a)) * (0.1e1 + zL * k) * exp(-zL * k) / k;
array iwc = lambda * pow(pow(LIL, (1 - bb)) * pow(yIL, bb) * pow(uL, bb), a) / pow(lambda, a) / pow(gzc, a) * hzc * exp(omega);
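To make the role of tile concrete: the loop version reads omega = zgrid(iz) for every im, so the whole-array version needs an omega with the same nz-by-nm shape as zL, where every column is a copy of zgrid. Below is a minimal CPU-only sketch of that layout in plain C++, assuming zgrid is a column of length nz and that the matrix is stored column-major. The helper tile_columns is a hypothetical name for illustration, not an ArrayFire function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of ArrayFire): copies a length-nz column into
// every column of an nz-by-nm matrix stored column-major. This mimics the
// result of repeating zgrid nm times along the second dimension.
std::vector<double> tile_columns(const std::vector<double>& column, std::size_t nm) {
    assert(!column.empty());
    const std::size_t nz = column.size();
    std::vector<double> tiled(nz * nm);
    for (std::size_t im = 0; im < nm; ++im)
        for (std::size_t iz = 0; iz < nz; ++iz)
            tiled[iz + nz * im] = column[iz];  // element (iz, im) equals zgrid(iz)
    return tiled;
}
```

With omega laid out this way, exp(omega) lines up element by element with zL, vLold and the other nz-by-nm arrays.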


For future reference
Code:
for (int i = 0; i < n; i++)
  r(i) = a(i) * a(i) + b(i) * b(i);


is the same as

Code:
// This is faster
array r = a * a + b * b;


The second code snippet is faster because the operation is being done on all the elements of the array in parallel instead of in sequence like the for loop.
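For anyone who wants to check the intended equivalence (each r(i) equal to a(i) * a(i) + b(i) * b(i)) without a GPU, here is a minimal sketch in standard C++. It uses std::valarray as a stand-in for the ArrayFire array type, which is an assumption made only for illustration: std::valarray supports the same whole-array expression syntax but runs on the CPU and says nothing about GPU performance. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// Loop form: computes one element at a time.
std::valarray<double> sum_of_squares_loop(const std::valarray<double>& a,
                                          const std::valarray<double>& b) {
    assert(a.size() == b.size());
    std::valarray<double> r(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        r[i] = a[i] * a[i] + b[i] * b[i];
    return r;
}

// Whole-array form: the same computation written as a single expression
// over all elements, with no explicit index.
std::valarray<double> sum_of_squares_vectorized(const std::valarray<double>& a,
                                                const std::valarray<double>& b) {
    assert(a.size() == b.size());
    return a * a + b * b;
}
```

Both functions return the same values. The whole-array form is the one that maps onto an array library, because the library sees the entire operation at once instead of one indexed element per call.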
Pavan Yalamanchili,
ArrayFire
--
~ If it is not broken, you have not tried hard enough ~

Re: Abnormally slow execution of loops

Post by danielpengzuo » Tue Feb 18, 2014 6:02 pm

Thanks a lot. This is really helpful.

If I need to manipulate a single element of the array under different conditions, I assume loops would be required?

Code:
        for (iz=1; iz<nz-1; iz++)
        {
         //#pragma omp parallel for  private(omega,muc,gzc,hzc,ghzc,kzc,pc,jc0,LIc,yIc,uc,LCc,iwc,hJc,deltaL,zJ,pp2,pm2,pp1,pm1,p1J,vp2,vm2,vp1,vm1,v1J,hp2,hm2,hp1,hm1,h1J,jp2,jm2,jp1,jm1,j1J,Fzc,bc,it,niter,gxL,gxH,dx,xL,xH,dL,rtn,Nc)
         gfor(array im, 1, nm-1) // parallel for-loop
            {
                omega = zgrid(iz);
                pc = pLold(iz, im);
            uc = uL(iz, im);
            jc0 = jLold(iz, im);
            Nc = (1 - LIL(iz,im) - ((1-LIL(iz, im)) *  (1 - phi) / (psi + 1 - phi)));

                gzc = (-(-0.1e1 + exp(-zL(iz,im) * k) + exp(-zL(iz, im) * k) * zL(iz, im) * k) / k + pow(R, 0.1e1 / (double) (-1 + a)) * (0.1e1 + zL(iz, im) * k) * exp(-zL(iz, im) * k) / k);
               
            gxL = TolX;
                gxH = 0.6-TolX;
                niter = 0;
                dx = (gxH-gxL)/10;
                xL = gxL;
                xH = gxH;

                for (it=1; it<11; it++)
            {
               gxL_itr = gxL+dx*it;
               gxL_itr_1 = gxL+dx*(it+1);
                    tmpL=FFOCLI(pc, uc, jc0, Nc, gxL_itr,  omega,  gzc,  phi,  a,  lambda,  psi, bb, gam, theta); 
                    tmpH=FFOCLI(pc, uc, jc0, Nc, gxL_itr_1,  omega,  gzc,  phi,  a,  lambda,  psi, bb, gam, theta); 
                   
                    if(tmpL<0 && tmpH>0)
                    {
                        xL=gxL+dx* it;
                        xH=gxL+dx*(it+1);
                        LIL(iz, im) = (xH+xL)/2;
                    }
                }
               
                xxL(iz, im) = xL;
                xxH(iz, im) = xH;
               
                do
                {
                    niter++;
                    dL = -FFOCLI(pc, uc,  jc0,  Nc, LIL(iz, im),  omega,  gzc,  phi,  a,  lambda,  psi, bb, gam, theta)/FFOCLIp(pc, uc, jc0,  Nc, LIL(iz, im),  omega,  gzc,  phi,  a,  lambda,  psi, bb, gam, theta);
                    rtn = LIL(iz, im).scalar<double>() + dL;
                    if(rtn<=xL)
                    {
                        LIL(iz, im)=xL;
                    }
                    else if(rtn>=xH)
                    {
                        LIL(iz, im)=xH;
                    }
                    else
                    {
                        LIL(iz, im)=rtn;
                    }
                    FFTMP = FFOCLI(pc, uc,  jc0,  Nc, LIL(iz, im),  omega,  gzc,  phi,  a,  lambda,  psi, bb, gam, theta);
                    if(fabs(FFTMP)<TolX||fabs(dL)<TolX)
                    {
                        niter=103;
                    }
                }
                while (niter < 100);

                FOCLIL(iz, im) = FFOCLI(pc, uc,  jc0,  Nc, LIL(iz, im),  omega,  gzc,  phi,  a,  lambda,  psi, bb, gam, theta);
         
            
            yIL(iz, im) = LIL(iz, im) * bb * (psi + 1 - phi) / phi / (1 - bb) / (1 - LIL(iz, im) + LIL(iz, im) * bb * (psi + 1 - phi) / phi / (1 - bb));
            if(bb==0){
               yIL(iz, im) = 0;
            }
               
            }
        }


I don't quite see a way to transform the above code because of the IFs. Any advice?
Thanks a lot!

Daniel

Re: Abnormally slow execution of loops

Post by pavanky » Thu Feb 20, 2014 6:35 pm

Hi Daniel,

This is too much code for us to debug. However, it does not look like you changed the code to incorporate the suggestions we made.

As for the while loop: on the GPU it is better to do extra calculations on the entire array than to test each element for convergence and exit early.

For example, instead of a do-while loop, you can use a for loop that always runs for 100 iterations. From the code you pasted, it does not look like the algorithm will break if you do more iterations than necessary. On the CPU the early exit is a performance optimization, but on the GPU it is a hindrance.
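A rough sketch of that idea, in plain C++ so it can be run anywhere: every element takes the same fixed number of Newton steps, and the if / else if / else that clamps the update to [xL, xH] becomes two masked assignments over the whole array. Everything here is an assumption for illustration. std::valarray stands in for the array type, f and fp are hypothetical stand-ins for FFOCLI and FFOCLIp (they solve x^2 = c), and newton_fixed and clamp_to are made-up names.

```cpp
#include <cassert>
#include <valarray>

using Vec = std::valarray<double>;

// Hypothetical stand-ins for FFOCLI and FFOCLIp: f(x) = x^2 - c, f'(x) = 2x.
Vec f(const Vec& x, const Vec& c) { return x * x - c; }
Vec fp(const Vec& x) { return 2.0 * x; }

// Element-wise clamp to [lo, hi] using boolean masks instead of if/else.
Vec clamp_to(Vec x, const Vec& lo, const Vec& hi) {
    const std::valarray<bool> below = x < lo;
    const std::valarray<bool> above = x > hi;
    x[below] = lo[below];  // elements under the bracket snap to the lower bound
    x[above] = hi[above];  // elements over the bracket snap to the upper bound
    return x;
}

// Newton iteration on all elements at once. There is no convergence test:
// every element runs exactly `iterations` steps, converged or not.
Vec newton_fixed(Vec x, const Vec& c, const Vec& lo, const Vec& hi, int iterations) {
    assert(x.size() == c.size() && x.size() == lo.size() && x.size() == hi.size());
    for (int it = 0; it < iterations; ++it) {
        const Vec next = x - f(x, c) / fp(x);
        x = clamp_to(next, lo, hi);
    }
    return x;
}
```

Elements that converge early simply keep taking steps of (nearly) zero size, so the extra iterations cost time but do not change the answer. Elements whose root lies outside the bracket stay pinned at the bound, which matches what the if / else if / else in the pasted code does for a single element.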

