Slow GFOR

[Old posts from the commercial version of ArrayFire] Discussion of ArrayFire using CUDA or OpenCL.

Moderator: pavanky

Slow GFOR

Postby DThoris » Fri Mar 14, 2014 2:45 pm

Hi,

Thanks for replying to my previous posts.
I have a question regarding GFOR.
I am trying to reduce the computational time of my code, and hence I am using parallel computations. However, as I increase the number of parallel computations, the computational time also increases.
Here is an example that should help you understand what I mean (the code doesn't really do anything interesting):

#include <arrayfire.h>
#include <iostream>
using namespace af;
using namespace std;

int main() {
    int x = 512;
    int y = 512;
    int z = 64;
    int z2 = 256;
    array A = randu(x,y,z);
    array A2 = randu(x,y,z);
    array A3 = randu(x,y,z);
    array A4 = randu(x,y,z);
    array B = randu(x,y,z2);
    array B2 = randu(x,y,z2);
    array B3 = randu(x,y,z2);
    array B4 = randu(x,y,z2);

    // 1. gfor over the 64 slices of A: square each slice into A2, A3 and A4
    timer::start();
    gfor(array k, z){
        A2(span,span,k) = A(span,span,k)*A(span,span,k);
        A3(span,span,k) = A(span,span,k)*A(span,span,k);
        A4(span,span,k) = A(span,span,k)*A(span,span,k);
    }
    cout << "time 1. z = 64 gpu timer = " << timer::stop() << " s" << endl;

    // 2. same work, one slice per iteration with a regular for loop
    timer::start();
    for (int k = 0; k < z; k++){
        A2(span,span,k) = A(span,span,k)*A(span,span,k);
        A3(span,span,k) = A(span,span,k)*A(span,span,k);
        A4(span,span,k) = A(span,span,k)*A(span,span,k);
    }
    cout << "time 2. z = 64 cpu timer = " << timer::stop() << " s" << endl;

    // 3. gfor over the 256 slices of B
    timer::start();
    gfor(array k, z2){
        B2(span,span,k) = B(span,span,k)*B(span,span,k);
        B3(span,span,k) = B(span,span,k)*B(span,span,k);
        B4(span,span,k) = B(span,span,k)*B(span,span,k);
    }
    cout << "time 3. z = 256 gpu timer = " << timer::stop() << " s" << endl;

    // 4. same work with a regular for loop
    timer::start();
    for (int k = 0; k < z2; k++){
        B2(span,span,k) = B(span,span,k)*B(span,span,k);
        B3(span,span,k) = B(span,span,k)*B(span,span,k);
        B4(span,span,k) = B(span,span,k)*B(span,span,k);
    }
    cout << "time 4. z = 256 cpu timer = " << timer::stop() << " s" << endl;
    return 0;
}


OUTPUT:
time 1. z = 64 gpu = 0.0619928307
time 2. z = 64 cpu = 0.0680218307
time 3. z = 256 gpu = 0.1168248307
time 4. z = 256 cpu = 0.2153428307

My questions are:
1. Why is time 3 much longer than time 1?
2. Shouldn't we expect a bigger time difference between time 1 and time 2 (and between times 3 and 4)?
3. I would expect time 1 to be on the order of 1e-3 s. Am I wrong?

I would appreciate it if you could give me any advice on how to speed up my code.
Thanks a lot for your help!
DThoris
 
Posts: 6
Joined: Thu Feb 20, 2014 7:42 am

Re: Slow GFOR

Postby pavanky » Fri Mar 14, 2014 2:49 pm

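You should call af::eval(A2, A3, A4); af::sync(); before timer::stop() to get more accurate results. ArrayFire queues GPU work asynchronously (and evaluates element-wise expressions lazily), so without the sync the timer mostly measures how long it takes to enqueue the work, not to finish it.

Also run each timed section several times in a loop and average the results; the first run also includes one-time costs such as kernel compilation.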
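The measuring pattern described above (warm up, repeat, synchronize before reading the clock) can be sketched in plain C++ with std::chrono. This is a minimal sketch, not an ArrayFire API: the helper name `average_seconds` and its parameters are hypothetical, and the device-specific part is passed in as a `sync` callback.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical helper: average wall-clock seconds per call of `work`.
// Warm-up runs are discarded, the timed section is repeated, and `sync`
// is called before reading the clock so that queued (asynchronous)
// device work is included in the measurement.
double average_seconds(const std::function<void()>& work,
                       const std::function<void()>& sync,
                       std::size_t warmup_runs,
                       std::size_t timed_runs) {
    // Warm-up: absorbs one-time costs such as kernel compilation.
    for (std::size_t i = 0; i < warmup_runs; ++i) {
        work();
    }
    sync();  // make sure warm-up work has finished before timing starts

    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (std::size_t i = 0; i < timed_runs; ++i) {
        work();
    }
    sync();  // wait for all queued work; otherwise only launch cost is timed
    const std::chrono::duration<double> elapsed = clock::now() - start;

    if (timed_runs == 0) {
        return 0.0;
    }
    return elapsed.count() / static_cast<double>(timed_runs);
}
```

With ArrayFire, `work` could be a lambda that computes the arrays and calls af::eval on them, and `sync` a lambda that calls af::sync(). Syncing once after the timed loop measures average throughput rather than the latency of a single call.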

The fastest way is neither for nor gfor.

Instead, write A2 = A * A; which performs the element-wise multiplication on the whole array in a single call. You don't HAVE to use gfor; it should only be used if and when necessary.
Pavan Yalamanchili,
ArrayFire
--
~ If it is not broken, you have not tried hard enough ~
pavanky
Site Admin
 
Posts: 1123
Joined: Mon Mar 15, 2010 7:39 pm
Location: Atlanta, GA

Re: Slow GFOR

Postby DThoris » Sun Mar 16, 2014 10:31 am

Thanks for your reply.
This was just an example; I can email you the code if necessary.
I need to use a for or gfor loop because my matrix A is computed within the loop and changes with k. I still don't understand why the computational time increases with the number of iterations when gfor is used. I would expect that for a for loop, but not when the iterations run in parallel.
Thanks again.

Re: Slow GFOR

Postby pavanky » Sun Mar 16, 2014 4:42 pm

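Simple functions such as element-wise operations are memory bound: most of the time is spent reading data from global memory into registers, not on the arithmetic itself. When the size of the matrices increases, the time taken to read the data increases too. gfor runs the iterations in parallel, but they all share the same memory bandwidth, so the z = 256 case has to move four times as much data as the z = 64 case.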
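To make the memory-bound argument concrete, here is a back-of-the-envelope estimate for the arrays in the question (x = y = 512, single precision). It assumes the best case of reading A once and writing A2, A3 and A4 once, and a hypothetical sustained bandwidth; the function names are illustrative only.

```cpp
#include <cstdint>

// Bytes in one x-by-y-by-z volume of single-precision floats.
std::uint64_t volume_bytes(std::uint64_t x, std::uint64_t y, std::uint64_t z) {
    return x * y * z * sizeof(float);
}

// Lower bound on global-memory traffic for the loop body in the question,
// assuming everything happens in one pass: read A once, write A2, A3, A4 once.
std::uint64_t min_traffic_bytes(std::uint64_t x, std::uint64_t y, std::uint64_t z) {
    const std::uint64_t reads = 1;   // A
    const std::uint64_t writes = 3;  // A2, A3, A4
    return (reads + writes) * volume_bytes(x, y, z);
}

// Time needed just to move `bytes` at an assumed sustained bandwidth of
// `gb_per_s` (10^9 bytes per second). Arithmetic time is ignored because
// the kernel is assumed to be memory bound.
double bandwidth_limited_seconds(std::uint64_t bytes, double gb_per_s) {
    return static_cast<double>(bytes) / (gb_per_s * 1e9);
}
```

For z = 64 this gives 268,435,456 bytes, and for z = 256 exactly four times as much (1,073,741,824 bytes). At an assumed 150 GB/s that is roughly 1.8 ms versus 7.2 ms, so a bandwidth-bound kernel's time grows in proportion to the data volume whether or not the slices run in parallel.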

Re: Slow GFOR

Postby DThoris » Mon Mar 17, 2014 10:56 am

Thanks for your reply.
Is there any way around it? For example, can I choose the number of blocks being used?

Re: Slow GFOR

Postby pavanky » Mon Mar 17, 2014 11:01 am

This is a hardware limitation. If you can show us the code, maybe we can help restructure it to do more work per global memory read. For the code you showed here, however, there is no alternative.
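What "more work per global memory read" means can be illustrated with a CPU analogue of the loop body in the question. This is plain C++ on std::vector, not ArrayFire code, and the function names are hypothetical. Computing the three outputs in separate passes streams A from memory three times, while a fused pass reads each element once and reuses the squared value. Whether ArrayFire's JIT already fuses the three GPU statements this way is not stated in this thread.

```cpp
#include <cstddef>
#include <vector>

// Three separate passes: every element of `a` is read from memory three times.
void square_three_pass(const std::vector<float>& a, std::vector<float>& a2,
                       std::vector<float>& a3, std::vector<float>& a4) {
    const std::size_t n = a.size();
    a2.resize(n);
    a3.resize(n);
    a4.resize(n);
    for (std::size_t i = 0; i < n; ++i) a2[i] = a[i] * a[i];
    for (std::size_t i = 0; i < n; ++i) a3[i] = a[i] * a[i];
    for (std::size_t i = 0; i < n; ++i) a4[i] = a[i] * a[i];
}

// Fused pass: each element of `a` is read once, squared once, and the
// result is written to all three outputs (more work per memory read).
void square_one_pass(const std::vector<float>& a, std::vector<float>& a2,
                     std::vector<float>& a3, std::vector<float>& a4) {
    const std::size_t n = a.size();
    a2.resize(n);
    a3.resize(n);
    a4.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        const float sq = a[i] * a[i];
        a2[i] = sq;
        a3[i] = sq;
        a4[i] = sq;
    }
}
```

Both functions produce identical results; only the number of reads of `a` changes. Even the fused version must still read A once and write three full outputs, which is why, for the posted code, there is little left to optimize.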

Re: Slow GFOR

Postby pavanky » Mon Mar 17, 2014 11:01 am

By the way, are you using CUDA or OpenCL?

Re: Slow GFOR

Postby DThoris » Mon Mar 17, 2014 12:15 pm

I'm using CUDA.
I'm implementing a filtered back projection algorithm. Could I send you the code via e-mail?

Thanks for your help!

Re: Slow GFOR

Postby pavanky » Mon Mar 17, 2014 12:40 pm

Hi,

You can email support@accelereyes.com. Please include a link to this forum post.