CUDA device ptr and Array wrapper

[Old posts from the commercial version of ArrayFire] Discussion of ArrayFire using CUDA or OpenCL.

Moderator: pavanky

CUDA device ptr and Array wrapper

Postby GBenito » Sun Nov 10, 2013 4:12 pm

Hello,

I'm currently using CUDA and Arrayfire on a work for college involving image processing. My question is as how to wrapp a device memory pointer that points towards a CudaMallocPitch-type of memory space in which i saved the results for a filter kernel. This data is float type of 32 rows by 32 columns.
I've tried to wrap the pointer into an array with:

-af::array CKernelR = af::array(32,32, d_ComplexKernelR, af::afDevice);

d_ComplexKernelR = CKernelR.device<float>();

But wasn't able to succesfully convert the device memory to an array, data was not the same....

How could i put this data into an array so i can go on using arrayfire on it?


Here is my cuda kernel that generates the filter:

__global__ void kernel_matrixReal (float *gaborR, float *gaborI, int pitchR, int pitchI, double pi, float lambda, float gama, float theta, float sigma,
int width, int height){

int idx = blockIdx.x * blockDim.x + threadIdx.x;
int x=idx - height/2;
int idy = blockIdx.y * blockDim.y + threadIdx.y;
int y=-(idy - width/2);
float menos05 = -0.5;

double theta_dev = theta;
double x_theta=x*cos(theta_dev)+y*sin(theta_dev);
double y_theta=-x*sin(theta_dev)+y*cos(theta_dev);

double pot_sigma=pow(sigma,2);

double pot_x=pow(x_theta, 2);
double div_pot_x=pot_x / pot_sigma;

double pot_y=pow(y_theta, 2);
double div_pot_y=pot_y / pot_sigma;


gaborR[idy*pitchR + idx] = (float) (1/(sqrt(pi)*sigma)) * exp(menos05 *( div_pot_x + gama *div_pot_y))*cos(2*pi/lambda*x_theta);
gaborI[idy*pitchI + idx] = (float) (1/(sqrt(pi)*sigma)) * exp(menos05 *( div_pot_x + gama *div_pot_y))*sin(2*pi/lambda*x_theta);

}
GBenito
 
Posts: 32
Joined: Sun Nov 10, 2013 3:58 pm

Re: CUDA device ptr and Array wrapper

Postby GBenito » Wed Nov 13, 2013 6:30 pm

If it's too far fetched, then how would it be to make the same in arrayfire as i'm currently doing in my kernel....

Obviously, i'm working on a Gabor filter...

Thanks for any ideas
GBenito
 
Posts: 32
Joined: Sun Nov 10, 2013 3:58 pm

Re: CUDA device ptr and Array wrapper

Postby shehzan » Tue Nov 19, 2013 3:36 pm

Hi

If I am understanding your problem correctly, you are trying to copy data from other than global memory to an ArrayFire array and then retrieve the device pointer of the array.
The following answer is based on the assumption mentioned.
ArrayFire allocated memory in global memory space. It assumes that any pointer passed using the constructor is a global memory pointer if af::afDevice is used. Hence you cannot copy memory from anywhere other than global memory (or host memory using afHost).
What you could do is explicitly copy the memory using CUDA commands into global memory space and then assign it to an array, following which you can retrieve the device pointer for the array.

Also, all the functions you have in your kernel are available in ArrayFire. You can take a look at the ArrayFire documentation and convert your kernel into ArrayFire code.

I hope this helps. Please do let me know if I misunderstood the problem.
----
Shehzan
Developer
AccelerEyes
User avatar
shehzan
 
Posts: 121
Joined: Tue Feb 12, 2013 7:20 pm

Re: CUDA device ptr and Array wrapper

Postby GBenito » Tue Nov 19, 2013 9:25 pm

Hi Shezan, first of all thanks for your reply!

My idea was to use the memory space created with CudaMallocPitch within my kernel (so i thought i was operating on device global memory), and wrap that memory device pointer in an array so i could have it already started with my filter values.

Once this first aproach failed, I tried to create an array with a device pointer associated to it, and after that, use the device pointer in my kernel to fill the memory with my filter kernel elements. I had created the memory space with af::alloc.

Lastly, I tried something like what you said.

Due to my lack of experience programming CUDA code and Arrayfire, I might be as you say using other than device memory (which is not what I intend).

Could you tell me how to perform that explicit memory copy into globlal memory space and the following array assignation, with the array device pointer retrieval?
I ask you this in order to make sure I don't make any mistakes in the code this time.

I'd also be interested in the pure Arrayfire approach. I'm a quite clueless with Arrayfire but I guess I should be making something like a meshgrid to apply the formula to each point of the filter kernel...

I must add, I'm studying Electronic Engineering, so that's why i work on this, but don't have much programming skills... :oops:
GBenito
 
Posts: 32
Joined: Sun Nov 10, 2013 3:58 pm

Re: CUDA device ptr and Array wrapper

Postby shehzan » Wed Nov 20, 2013 11:34 am

Hi,

If you are using ArrayFire, then I would suggest that you create a gabor filter function completely using ArrayFire. This would probably be the best approach to take if you are not well-versed with CUDA.

I will show you a small example of how you could use ArrayFire. You can read more about the functions in our documentation page http://www.accelereyes.com/arrayfire/c/.

This is sample CPU code:
Code: Select all
for(i.....) {
    for(j....) {
        x_theta[i][j]=x[i][j]*cos(theta)+y[i][j]*sin(theta);
        y_theta[i][j]=-x[i][j]*sin(theta)+y[i][j]*cos(theta);
    }
}


This would map to ArrayFire code as:
Code: Select all
//array x, y;
//Do work
array x_theta = x * cos(theta) + y * sin(theta);
array y_theta =-x * sin(theta) + y * cos(theta);


In the ArrayFire code above, cos(theta) and sin(theta) return float values. All elements x and y are multiplied by the values returned by the sin and cos functions. These operations take place completely on the GPU.

Similarly, you can apply a pow function as:
Code: Select all
array pot_x = pow(x_theta, 2);    //where x_theta is a array.

This will square all values in the array x_theta and store them in pot_x (again on gpu).

So I suggest that you use ArrayFire functions. Your kernel looks fairly straight forward to map to ArrayFire code.
----
Shehzan
Developer
AccelerEyes
User avatar
shehzan
 
Posts: 121
Joined: Tue Feb 12, 2013 7:20 pm

Re: CUDA device ptr and Array wrapper

Postby GBenito » Wed Nov 20, 2013 11:52 am

Hi

Thanks for the tip Shezan! Now i get a grasp on how to turn my kernel to Arrayfire. Of course, i should use some offset with x and y values in order to make it centered in the middle of the matrix, but this will get me a start point for that.

I'll see how it goes...
GBenito
 
Posts: 32
Joined: Sun Nov 10, 2013 3:58 pm

Re: CUDA device ptr and Array wrapper

Postby shehzan » Wed Nov 20, 2013 11:55 am

Sounds good. You can use offsets using the indexing functions http://www.accelereyes.com/arrayfire/c/group__indexing.htm.

Let me know how it goes.
Down the line, may be we can look at adding Gabor filters into ArrayFire.

Shehzan
----
Shehzan
Developer
AccelerEyes
User avatar
shehzan
 
Posts: 121
Joined: Tue Feb 12, 2013 7:20 pm

Re: CUDA device ptr and Array wrapper

Postby GBenito » Fri Nov 22, 2013 6:32 pm

Hi,

I've been converting my kernel to arrayfire as suggested, still there's something I don't quite get:

I create 2 arrays x, y using seq() command as follows:

af::array x(af::seq(32)-16), y(af::seq(32)-16);

So these two arrays are of dimension 32 by 1.

You said that I could map part of my kernel to:

af::array x_theta = x * cos(theta) + y * sin(theta);
af::array y_theta = -x * sin(theta) + y * cos(theta);


where all elements of x and y would be multiplied by cos() and sin() respectively. Still, the resulting arrays x_theta and y_theta are of dimensions 32 by 1 and not 32 by 32, and I know it is so entirely due to how i set up x and y. Of course, x and y are coordinates in kernel's space meaning row and column index.
I believe my mistake was to think that the instructions below would result in each component in x being combined with every component in y, generating 32 by 32 result elements.

How could i make this?:

array x:
-6 -5 -4 -3 -2 -1 0 1 2 3 4 5
-6 -5 -4 -3 -2 -1 0 1 2 3 4 5
-6 -5 -4 -3 -2 -1 0 1 2 3 4 5
-6 -5 -4 -3 -2 -1 0 1 2 3 4 5
-6 -5 -4 -3 -2 -1 0 1 2 3 4 5
-6 -5 -4 -3 -2 -1 0 1 2 3 4 5
-6 -5 -4 -3 -2 -1 0 1 2 3 4 5
-6 -5 -4 -3 -2 -1 0 1 2 3 4 5
-6 -5 -4 -3 -2 -1 0 1 2 3 4 5
-6 -5 -4 -3 -2 -1 0 1 2 3 4 5
-6 -5 -4 -3 -2 -1 0 1 2 3 4 5
-6 -5 -4 -3 -2 -1 0 1 2 3 4 5
-6 -5 -4 -3 -2 -1 0 1 2 3 4 5

array y:
-6 -6 -6 -6 -6 -6 -6 -6 -6 -6 -6 -6
-5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5
-4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4
-3 -3 -3 -3 -3 -3 -3 -3 -3 -3 -3 -3
-2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5 5 5 5
GBenito
 
Posts: 32
Joined: Sun Nov 10, 2013 3:58 pm

Re: CUDA device ptr and Array wrapper

Postby pavanky » Fri Nov 22, 2013 6:44 pm

Hi,

You can do something like this:

Code: Select all
    array x = seq(0, nx + 1);
    array y = seq(0, ny + 1);
    array X = tile(x.T(), y.elements(), 1);
    array Y = tile(y    , 1, x.elements());
Pavan Yalamanchili,
ArrayFire
--
~ If it is not broken, you have not tried hard enough ~
User avatar
pavanky
Site Admin
 
Posts: 1123
Joined: Mon Mar 15, 2010 7:39 pm
Location: Atlanta, GA

Re: CUDA device ptr and Array wrapper

Postby shehzan » Fri Nov 22, 2013 6:47 pm

I am not completely sure I understand it completely.
But from the array x and array y your showed, I would suggest you look up two functions: join and tile. Both can be found here: http://www.accelereyes.com/arrayfire/c/ ... p__mat.htm.

Also, all arithmetic operations are elements-wise.
So if you had array x + array y, they would be similar to
for all i
x[i] + y[i] (representative code).

Please also look at the examples and try playing around with them.
----
Shehzan
Developer
AccelerEyes
User avatar
shehzan
 
Posts: 121
Joined: Tue Feb 12, 2013 7:20 pm

Re: CUDA device ptr and Array wrapper

Postby GBenito » Mon Nov 25, 2013 6:21 pm

Thanks for answering, i'm currently following your tips. Using tile has made the trick, so i'll keep going from here
GBenito
 
Posts: 32
Joined: Sun Nov 10, 2013 3:58 pm

Re: CUDA device ptr and Array wrapper

Postby GBenito » Mon Dec 02, 2013 7:28 pm

Well, I could successfully generate my gabor filter kernel entirely on Arrayfire, so as a closure to this topic, i'd like to thank you guys for your help and patience...

Still have new questions, but those are on another topic so i'll open the corresponding post.
GBenito
 
Posts: 32
Joined: Sun Nov 10, 2013 3:58 pm


Return to [archive-commercial] Programming & Development with ArrayFire

cron