matmul() speed

[Old posts from the commercial version of ArrayFire] Discussion of ArrayFire using CUDA or OpenCL.

Moderator: pavanky

Postby sebktm » Wed Dec 11, 2013 12:20 pm

Hi, I'm new to the ArrayFire library. I want to speed up an image reconstruction program, so I am rewriting its existing routines with the af routines. The problem is that instead of a speedup I get a slowdown by a factor of 1000. This should be a simple routine that just fills an array of variable size with specific values. The array size varies from 64x64 to 4096x4096. Here are the routines for GPU and CPU:

Code:
array hanming(int nx, int ny, float alpha)
{
        float twopi = 2 * Pi;
        array w1 = seq(nx);

        w1 = alpha + (alpha - 1.) * cos(w1 * twopi / (float) nx);
        if (ny > 1) {
                array w2 = seq(ny);
                w2 = alpha + (alpha - 1.) * cos(w2 * twopi / (float) ny);
                return matmul(w1, w2, B_transpose);
        }

        return w1;
}

void hanmingcpu(float *res, int nx, int ny, float alpha)
{
    int i, j;
    float fact;
    float *w1, *w2;

    /* Initialise constants */
    double pi = 3.141592654;
    double two_pi = 2.0 * pi;

    nx = (nx > 0) ? nx : 1;
    ny = (ny > 0) ? ny : 1;

    /* Initialize window arrays */
    w1 = (float *) (malloc(nx * sizeof(float)));
    for (i = 0; i < nx; i++) {
        w1[i] = alpha +
            (alpha -
             1.) *
            ((float)
             (cos((double) (two_pi * ((float) i) / ((float) nx)))));
    }
    if (ny > 1) {
        w2 = (float *) (malloc(ny * sizeof(float)));
        for (i = 0; i < ny; i++) {
            w2[i] = alpha +
                (alpha -
                 1.) *
                ((float)
                 (cos((double) (two_pi * ((float) i) / ((float) ny)))));
        }
    }

    /* Set window */
    for (j = 0; j < ny; j++) {
        fact = (ny > 1) ? w2[j] : 1.0;
        for (i = 0; i < nx; i++) {
            res[j * nx + i] = fact * w1[i];
        }
    }

    free(w1);
    if (ny > 1)
        free(w2);
}
sebktm
 
Posts: 7
Joined: Wed Dec 11, 2013 12:07 pm

Re: matmul() speed

Postby shehzan » Wed Dec 11, 2013 12:34 pm

Hi

Can you please let me know how you are timing this code? I would suggest using the timeit function. You can read more about it here: http://www.accelereyes.com/arrayfire/c/gettingstarted.htm#gettingstarted_timing
----
Shehzan
Developer
AccelerEyes
shehzan
 
Posts: 121
Joined: Tue Feb 12, 2013 7:20 pm

Re: matmul() speed

Postby sebktm » Wed Dec 11, 2013 1:05 pm

Hi,

I was timing the code with the timer from ArrayFire, not with the timeit function. I have now rewritten it with timeit (see code below) and get a speedup, thanks :) But what was the problem? The array allocation of matmul?

Code:
using namespace std;

static int nx2 = pow(2, 12);
static int ny2 = pow(2, 12);

static float *tempcpu2;
static void cpu() {
         tempcpu2 = (float *) malloc(nx2*ny2*sizeof(float));
         hanmingcpu(tempcpu2, nx2, ny2, 0.5);
         free(tempcpu2);  /* timeit calls cpu() repeatedly; free so each call does not leak a buffer */
}
static void gpu() {af::array B = hanming(nx2, ny2, 0.5);}

int main (int argc, char *argv[])
{
af::deviceset(0);
af::info();

float * tempcpu;

int nx = pow(2,12);
int ny = pow(2,12);

tempcpu = (float *) malloc(nx*ny*sizeof(float));

af::array B;
af::timer time = timer::start();
B = hanming(nx, ny, 0.5);
std::cerr << timer::stop(time) << " seconds\n";

time = timer::start();
hanmingcpu(tempcpu, nx, ny, 0.5);
std::cerr << timer::stop(time) << " seconds\n";

printf("timeit: \n");
printf("cpu %f\n", af::timeit(cpu));
printf("gpu %f\n", af::timeit(gpu));
}
sebktm
 
Posts: 7
Joined: Wed Dec 11, 2013 12:07 pm

Re: matmul() speed

Postby shehzan » Wed Dec 11, 2013 1:28 pm

The code was probably slow because of the initial overhead associated with GPU programming (sometimes called "warmup").
It isn't really caused by either allocation or matmul; both are quite fast. Running a very small program just once on the GPU simply isn't efficient because of the initial warmup.
timeit runs the function multiple times and averages the runs. This allows the initial warmup period to be averaged out.
----
Shehzan
Developer
AccelerEyes
shehzan
 
Posts: 121
Joined: Tue Feb 12, 2013 7:20 pm

Re: matmul() speed

Postby sebktm » Wed Dec 11, 2013 1:36 pm

OK, the code is minimal because I'm testing each function separately. The final code is big, so I wanted to speed up each piece first and then combine them. I did not think of an initial warmup; I only had memory allocations and copies in mind. Thanks for the fast reply!
sebktm
 
Posts: 7
Joined: Wed Dec 11, 2013 12:07 pm

