Example Doesn't Compile

[Old posts from the commercial version of ArrayFire] Discussion of ArrayFire using CUDA or OpenCL.

Moderator: pavanky

Postby shaklee3 » Sun Dec 15, 2013 8:51 pm

You have an example for convolution here:

http://www.accelereyes.com/docs_alpha/g ... tarted.htm

Code:
int main(void)
{
    // generate random values
    int m = 10000; // Length of Signal
    int n = 500;   // Length of Kernel
    int k = 10;    // Number of Signals to convolve
    float *d_signal, *d_filter, *d_result;
    // Generate 'k' random signals each of length m
    af_randu_S(&d_signal, m * k);
    // Generate one random kernel to convolve the signals with
    af_randu_S(&d_filter, n * 1);
    // Allocate space for result
    af_malloc(&d_result, (m + n - 1) * k * sizeof(float));
    // Perform the convolutions
    af_conv_SS(d_result,       // output
               m, d_signal, k, // Signal size, pointer, number of signals
               n, d_filter, 1, // Kernel size, pointer, number of kernels
               1);             // (FULL: 1, SAME: 0, VALID: -1)
    return 0;
}


This doesn't compile because af_randu_S takes a single pointer as its first argument. I assume it doesn't allocate those variables, so I don't see how this example would ever have worked. Can you post a working example of convolution that uses these functions instead of the convolution() wrapper? Thanks.
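
For reference, the buffer sizes in the example follow from the definition of a full convolution: each signal of length m convolved with a kernel of length n yields m + n - 1 samples. Below is a minimal CPU sketch of that arithmetic in plain C++. It does not use ArrayFire; the names conv_output_length and conv_full_batch are made up for illustration, the mode codes are copied from the comment in the example, and the back-to-back signal layout is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Output length per signal for the modes named in the example's comment
// (FULL: 1, SAME: 0, VALID: -1). Assumes m >= n >= 1.
std::size_t conv_output_length(std::size_t m, std::size_t n, int mode) {
    assert(m >= n && n >= 1);
    if (mode == 1) return m + n - 1;  // FULL
    if (mode == 0) return m;          // SAME
    return m - n + 1;                 // VALID
}

// Direct full convolution of k signals (each of length m, stored back to
// back, which is an assumed layout) with a single kernel.
std::vector<float> conv_full_batch(const std::vector<float>& signals,
                                   std::size_t m, std::size_t k,
                                   const std::vector<float>& kernel) {
    const std::size_t n = kernel.size();
    assert(n >= 1 && signals.size() == m * k);
    const std::size_t out_len = m + n - 1;
    std::vector<float> result(out_len * k, 0.0f);
    for (std::size_t s = 0; s < k; ++s)
        for (std::size_t i = 0; i < m; ++i)
            for (std::size_t j = 0; j < n; ++j)
                result[s * out_len + i + j] += signals[s * m + i] * kernel[j];
    return result;
}
```

With m = 10000, n = 500 and k = 10, the FULL result holds (10000 + 500 - 1) * 10 floats, which matches the af_malloc size in the example.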
shaklee3
 
Posts: 9
Joined: Sun Dec 15, 2013 8:47 pm

Re: Example Doesn't Compile

Postby pavanky » Mon Dec 16, 2013 9:38 am

Hi,

You are right about why the example does not compile. It is left over from an earlier version of ArrayFire, and we haven't updated the documentation to reflect the change. We will fix it as soon as possible.

To allocate memory, you can use cudaMalloc.

Also, please use the following documentation instead of the alpha version you linked to:
http://www.accelereyes.com/arrayfire/c/ ... tarted.htm
Pavan Yalamanchili,
ArrayFire
--
~ If it is not broken, you have not tried hard enough ~
pavanky
Site Admin
 
Posts: 1123
Joined: Mon Mar 15, 2010 7:39 pm
Location: Atlanta, GA

Re: Example Doesn't Compile

Postby pavanky » Mon Dec 16, 2013 9:39 am

Also, can you tell us why you are using the C interface and not the C++ interface?

Re: Example Doesn't Compile

Postby shaklee3 » Mon Dec 16, 2013 11:48 am

I was using the C interface because I already have an array of cfloats in memory. I couldn't find any examples of how to populate an "array" with complex numbers as I'm reading them in from a file.

Re: Example Doesn't Compile

Postby pavanky » Mon Dec 16, 2013 11:52 am

You can create arrays from cfloats as follows.

Code:
cfloat *vals;

// if vals is in the host memory
array V(rows, cols, vals);

// if vals is in device memory
array V(rows, cols, vals, afDevice);
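
If the file stores each complex sample as an interleaved pair of floats (real part, then imaginary part), the values can be packed into a host buffer first. The sketch below is plain C++ and makes two assumptions that the thread does not confirm: the interleaved file layout, and that cfloat has the same memory layout as std::complex<float>. The helper name pack_interleaved is hypothetical.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: turn interleaved (re, im, re, im, ...) floats,
// e.g. as read from a file, into a contiguous buffer of complex samples.
std::vector<std::complex<float>> pack_interleaved(const std::vector<float>& raw) {
    assert(raw.size() % 2 == 0);  // need a whole number of (re, im) pairs
    std::vector<std::complex<float>> samples;
    samples.reserve(raw.size() / 2);
    for (std::size_t i = 0; i + 1 < raw.size(); i += 2)
        samples.emplace_back(raw[i], raw[i + 1]);
    return samples;
}
```

Under those assumptions, samples.data() would play the role of vals in the host-memory constructor shown above, after a cast to cfloat*.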

Re: Example Doesn't Compile

Postby shaklee3 » Mon Dec 16, 2013 1:21 pm

I'll give that a shot. I noticed print doesn't seem to work either:

cc -m64 -Wall -Werror -I/opt/arrayfire-2.0/include -I/usr/local/cuda-5.5/targets/x86_64-linux/include -pthread -O2 -DNDEBUG -DAFCL -lrt -Wl,--no-as-needed -L/opt/arrayfire-2.0/lib64 -lpthread -lstdc++ -lm -Wl,-rpath,../../lib64,-rpath,/opt/arrayfire-2.0/lib64 -lafcl test.cpp -o test
test.cpp: In function ‘int main(int, char**)’:
test.cpp:70:17: error: ‘print’ was not declared in this scope


Am I missing something I need? arrayfire.h is included.

Re: Example Doesn't Compile

Postby pavanky » Mon Dec 16, 2013 1:24 pm

We moved print to a standalone header file so that it does not clash with other implementations when "arrayfire.h" is included.

To use print now, you will need to include <af/utils.h>.

For the complete list of changes, please refer to the release notes available here: http://www.accelereyes.com/arrayfire/c/releasenotes.htm

Re: Example Doesn't Compile

Postby shaklee3 » Mon Dec 16, 2013 4:16 pm

Ok, I got printing working, but now a convolution of a 10x1 with a 2048x1 throws an exception saying convolutions of that size are not supported. Maybe it's easier if I give you the application: I have 5 filters each of length 9 that I need to convolve with a vector over 2000 in length. What's the fastest way to do this?

Re: Example Doesn't Compile

Postby pavanky » Mon Dec 16, 2013 4:28 pm

Hi,

Are you trying to use the OpenCL version or the CUDA version? The limitation currently exists in the OpenCL version, but the CUDA version of ArrayFire should run just fine.

Re: Example Doesn't Compile

Postby shaklee3 » Mon Dec 16, 2013 4:38 pm

I didn't realize there were two different versions. I'm compiling with the CUDA switches now and it works, albeit slower than I was hoping (n=2048, filter=10 takes 230 ms). Thanks.

Re: Example Doesn't Compile

Postby pavanky » Mon Dec 16, 2013 5:10 pm

Hi Shaklee,

Are you using filter or convolve?

Re: Example Doesn't Compile

Postby shaklee3 » Mon Dec 16, 2013 5:14 pm

I was using fir, but convolve takes even longer at 256 ms. Here is the code:

Code:
   
    array a = randu(10);
    array b = randu(2048);

    timer::start();

    try {
        convolve(a,b);
    } catch (af::exception& e) {
        fprintf(stderr, "%s\n", e.what());
    }
    printf("Convolution time = %g seconds\n", timer::stop());

Re: Example Doesn't Compile

Postby pavanky » Mon Dec 16, 2013 5:29 pm

Hi Shaklee,

You are seeing the warm-up time. Runs after the first one are dramatically faster.

You will also need to call af::sync() before stopping the timer to get the right timings. I am including the modified code below.

Code:
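#include <stdio.h>
#include <arrayfire.h>

using namespace af;

int main(int argc, char **argv)
{
    try {

        array a = randu(10);
        array b = randu(2048);
        af::sync(); // finish generating the inputs before timing starts

        // Time ten runs; the first one includes the warm-up cost
        for (int ii = 0; ii < 10; ii++) {
            timer::start();
            try {
                convolve(a,b);
            } catch (af::exception& e) {
                fprintf(stderr, "%s\n", e.what());
            }
            af::sync(); // sync before stopping the timer to get the right timing
            printf("Convolution time = %g seconds\n", timer::stop());
        }

    } catch (af::exception& e) {
        fprintf(stderr, "%s\n", e.what());
        throw;
    }

    return 0;
}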
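
The same measure-each-run pattern can be written without ArrayFire. The sketch below uses only std::chrono; the name time_each_run is made up for illustration. It assumes the callable blocks until its work is finished, so for GPU code the synchronization call would have to sit inside the callable.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Time `runs` back-to-back calls of `work` and return each duration in
// seconds. `work` must not return before its result is ready; for GPU
// code that means synchronizing inside the callable.
template <typename Work>
std::vector<double> time_each_run(Work work, std::size_t runs) {
    std::vector<double> seconds;
    seconds.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        seconds.push_back(std::chrono::duration<double>(stop - start).count());
    }
    return seconds;
}
```

Keeping one timing per run, rather than a single total, is what makes the slow first iteration visible.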

Re: Example Doesn't Compile

Postby pavanky » Mon Dec 16, 2013 5:30 pm

Here are the benchmarks on my laptop (GT 650M):

Convolution time = 0.066969 seconds
Convolution time = 9.1e-05 seconds
Convolution time = 7.6e-05 seconds
Convolution time = 8.5e-05 seconds
Convolution time = 9.2e-05 seconds
Convolution time = 0.000106 seconds
Convolution time = 0.000107 seconds
Convolution time = 0.000104 seconds
Convolution time = 9.5e-05 seconds
Convolution time = 7.6e-05 seconds
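
One way to read numbers like these is to report the first run separately from the rest. The sketch below is plain C++ with hypothetical names (TimingSummary, summarize). It takes the median of every run after the first, so one slow outlier does not skew the steady-state figure.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct TimingSummary {
    double first_run;      // includes the warm-up cost
    double steady_median;  // median of every run after the first
};

// Split a list of per-run timings into the first run and the median of
// the remaining runs. Needs at least two timings.
TimingSummary summarize(const std::vector<double>& seconds) {
    assert(seconds.size() >= 2);
    std::vector<double> rest(seconds.begin() + 1, seconds.end());
    std::sort(rest.begin(), rest.end());
    const std::size_t mid = rest.size() / 2;
    const double median = (rest.size() % 2 == 1)
        ? rest[mid]
        : 0.5 * (rest[mid - 1] + rest[mid]);
    return {seconds.front(), median};
}
```

Applied to the ten timings above, the first run is 0.066969 seconds and the median of the other nine is 9.2e-05 seconds, so the first call is roughly 700 times slower than a steady-state call.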

Re: Example Doesn't Compile

Postby shaklee3 » Mon Dec 16, 2013 6:17 pm

What constitutes the warm-up time? What if I do a whole bunch of other GPU work in between calls to that? Is it dependent on caches?

Re: Example Doesn't Compile

Postby pavanky » Mon Dec 16, 2013 7:05 pm

Several things contribute to the run time.

1) The GPU needs to raise its clock speed when it switches from idle mode to performance mode.
2) Initializing the CUDA context takes time.
3) The NVIDIA driver takes time to convert the kernels from an intermediate format to the right "assembly" for your GPU (sort of the final step of compilation).
4) Allocating the required memory also takes time. ArrayFire has a memory manager, so allocations and frees are not frequent.

(1) and (2) usually affect the *first* CUDA call. In your case it is randu.
(3) is the biggest culprit when we talk about "warm up". The first run may include a compilation step, so if you use the same kernel multiple times, the later runs are faster.
(4) adds a very small overhead, but only for the first few calls.

And no, the successive runs are not faster because of caching. You can test that by running other GPU calls in between the convolve calls; the performance should be very similar to what you see without them.

Re: Example Doesn't Compile

Postby shaklee3 » Fri Dec 20, 2013 12:17 pm

Thanks, Pavan. Could you run one more test for me, since I don't have the pro version (yet)? I want to see how fast ArrayFire can multiply a 1480x1480 sparse matrix (Toeplitz, with about 13320 non-zero elements) by a dense 1480x1 vector. I'm seeing pretty poor results with cuSPARSE and wanted to know if you guys can beat their time. Thanks!
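
For scale, 13320 is 9 x 1480, which would fit a matrix with about nine non-zeros per row, such as a 9-tap filter written as a banded Toeplitz matrix. That reading is an inference and is not confirmed in the thread. Under that assumption, the CPU sketch below builds such a matrix in CSR form and multiplies it by a dense vector. It is plain C++, not ArrayFire or cuSPARSE, and the names CsrMatrix, banded_toeplitz and multiply, as well as the lower-banded layout, are chosen for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct CsrMatrix {
    std::size_t rows = 0;
    std::vector<std::size_t> row_start;  // rows + 1 offsets into cols/vals
    std::vector<std::size_t> cols;       // column index of each non-zero
    std::vector<float> vals;             // value of each non-zero
};

// Build the n x n lower-banded Toeplitz matrix T with T(i, j) = taps[i - j]
// for 0 <= i - j < taps.size(), and zero elsewhere (assumed layout).
CsrMatrix banded_toeplitz(const std::vector<float>& taps, std::size_t n) {
    CsrMatrix t;
    t.rows = n;
    t.row_start.push_back(0);
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t first = (i + 1 >= taps.size()) ? i + 1 - taps.size() : 0;
        for (std::size_t j = first; j <= i; ++j) {
            t.cols.push_back(j);
            t.vals.push_back(taps[i - j]);
        }
        t.row_start.push_back(t.cols.size());
    }
    return t;
}

// y = A * x, touching only the stored non-zeros of A.
std::vector<float> multiply(const CsrMatrix& a, const std::vector<float>& x) {
    assert(a.row_start.size() == a.rows + 1);
    std::vector<float> y(a.rows, 0.0f);
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t p = a.row_start[i]; p < a.row_start[i + 1]; ++p)
            y[i] += a.vals[p] * x[a.cols[p]];
    return y;
}
```

With this layout the product equals the first n samples of the full convolution of x with the taps, so a 1-D convolution may be worth timing against a general sparse multiply. For nine taps and n = 1480 this layout stores 13284 non-zeros, because the first eight rows are truncated.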

Re: Example Doesn't Compile

Postby pavanky » Fri Dec 20, 2013 12:19 pm

Hi Shaklee,

If you download the final version of ArrayFire 2.0, you will get a 15-day trial license that includes the ArrayFire pro features. Please go ahead and give it a shot.

Re: Example Doesn't Compile

Postby shaklee3 » Fri Dec 20, 2013 12:22 pm

Ah, ok I didn't see that. I don't see any functions specific to sparse matrices besides generating a CSR matrix. Do you have any functions that take advantage of most of the matrix being sparse?

Re: Example Doesn't Compile

Postby pavanky » Fri Dec 20, 2013 12:24 pm

Matrix multiplication, transpose, and solve for triangular and tridiagonal matrices. They use the same API as their dense counterparts.

