How can I speed up this matlab code?

[Old posts from the commercial version of ArrayFire] Discussion of ArrayFire using CUDA or OpenCL.

Moderator: pavanky

How can I speed up this matlab code?

Postby bingo » Mon Jun 02, 2014 7:03 pm

I'm trying to port the following for loop from matlab to Arrayfire but without success, please help.

Code: Select all
for i = 1 : N
   x(i) = y(find(rand <= w,1));
end


Assume that x, y and w are the same lengths e.g. N = 1e6 and that w in the range of 0 to 1 and is monotonically non-decreasing.

Any help would be a great help.

Thanks in advance!
bingo
 
Posts: 3
Joined: Mon Jun 02, 2014 6:53 pm

Re: How can I speed up this matlab code?

Postby pavanky » Tue Jun 03, 2014 4:50 pm

Hi,

We don't have the equivalent of the function you are looking for.

However you can get the same behavior in the following method.

Code: Select all
for (int i = 0; i < N; i++) {
     array rnd = randu(1, 1);
     array cmp = diff((rnd <= w).as(f32));
     array val, array idx;
     max(val, idx, cmp);
     x(i) = y(idx + 1);
}


You can speed up the process by running a parallel algorithm that uses NxN memory. But since N is big (1e6), you can use a combination of for loop and the parallel version in the following manner.

Code: Select all
array get_idx(array w, int n)
{
    int N = w.elements();
    array wn = tile(w, 1, n);
    array rnd = tile(randu(1, n), N, 1);
    array cmp = diff1((rnd <= wn).as(f32));
    array val, idx;
    max(val, idx, cmp);
   return idx + 1;
}

int main()
{
/// other code

for (int i = 0; i < N - 1000; i += 1000) {
   array idx = get_idx(w, 1000);
   x(i +  seq(1000)) = y(idx);
}

int n = N % 1000;
int off = N - n;
array idx = get_idx(w, n);
x(off + seq(n)) = y(idx);
/// other code
}
Pavan Yalamanchili,
ArrayFire
--
~ If it is not broken, you have not tried hard enough ~
User avatar
pavanky
Site Admin
 
Posts: 1123
Joined: Mon Mar 15, 2010 7:39 pm
Location: Atlanta, GA

Re: How can I speed up this matlab code?

Postby bingo » Tue Jun 03, 2014 6:10 pm

Hi,

Wow, thanks for the quick response. A ha! Tiling! of course!

I did try out your code but I got hit with a memory limit error (which I thought odd), anyway, in the spirit of thinking differently, I've revisited my data and I think that if I add a few low cost transformations on the data in w as a preprocessing step all I'll need to do is a single lower bound with a N random numbers to derive the indexes and I can ditch that slow for loop forever!

I didn't see a lower bound function in the library docs does ArrayFire have one?

I was thinking that af::setintersect could work but I suspect not because the logic op in that function is '==' and I need '<='.

I did see that Thrust implement lower_bound but my code with ArrayFire is so pretty that I really don't want to start adding untidy Thrust calls.

Thanks again!
bingo
 
Posts: 3
Joined: Mon Jun 02, 2014 6:53 pm

Re: How can I speed up this matlab code?

Postby pavanky » Wed Jun 04, 2014 2:15 pm

You can use the diff1 + max method I showed earlier to get the lower bound in the following manner.

Code: Select all
int lower_bound(array &in, double val)
{   
    array tmp = diff1((val <= in ).as(f32));
    float res;
    int idx;
    max<float>(&res, &idx, tmp);
    return idx + 1;
}

int upper_bound(array &in, double val)
{   
    array tmp = diff1((val < in).as(f32));
    float res;
    int idx;
    max<float>(&res, &idx, tmp);
    return idx + 1;
}

int main(int argc, char **argv)
{   
    try {
        array a = round(10 * sort(randu(20, 1)));
        print(a);
        printf("lower: %d, upper: %d\n", lower_bound(a, 4), upper_bound(a, 4));

    } catch (af::exception& e) {
        fprintf(stderr, "%s\n", e.what());
        throw;
    }
    return 0;
}


Code: Select all
a [20 1] =
        0.0000
        1.0000
        2.0000
        3.0000
        3.0000
        3.0000
        4.0000
        4.0000
        4.0000
        5.0000
        5.0000
        7.0000
        7.0000
        7.0000
        7.0000
        8.0000
        8.0000
        9.0000
        9.0000
       10.0000

lower: 6, upper: 9


diff1((val < in).as(f32)) returns a vector of all zeros except for the location showing the cross over point. max<float>() gets the index of that value.

Ideally diff1 would not be necessary if max<float>(..) was stable i.e. if a == b, and indx(a) < indx(b)), indx(a) would be returned. However internally we don't use the stability condition and the order is not preserved.
Pavan Yalamanchili,
ArrayFire
--
~ If it is not broken, you have not tried hard enough ~
User avatar
pavanky
Site Admin
 
Posts: 1123
Joined: Mon Mar 15, 2010 7:39 pm
Location: Atlanta, GA

Re: How can I speed up this matlab code?

Postby pavanky » Wed Jun 04, 2014 2:17 pm

As for the memory error, i miscalculated the memory requirements. If you use 50 instead of 1000, the code will work.
Pavan Yalamanchili,
ArrayFire
--
~ If it is not broken, you have not tried hard enough ~
User avatar
pavanky
Site Admin
 
Posts: 1123
Joined: Mon Mar 15, 2010 7:39 pm
Location: Atlanta, GA

Re: How can I speed up this matlab code?

Postby bingo » Thu Jun 05, 2014 6:35 pm

Hi,

Thank you for the further code examples.

I ran your code again (combination of for loop and the parallel version) with 50 instead of 1000 and it took a very long time to run with 1e6 items of data (I left it running for 5+ mins before I quit my app), so I switched the dataset size to 1e5 and 1e3 and I got the following error in both instances:

src/cuda/diff_single.cu:34: CUDA runtime error: unknown error (30)

Any ideas?
bingo
 
Posts: 3
Joined: Mon Jun 02, 2014 6:53 pm

Re: How can I speed up this matlab code?

Postby pavanky » Fri Jun 06, 2014 3:51 pm

Can you send some code that I can reproduce the problem with ?

BTW were you able to get the lower_bound version to work ?
Pavan Yalamanchili,
ArrayFire
--
~ If it is not broken, you have not tried hard enough ~
User avatar
pavanky
Site Admin
 
Posts: 1123
Joined: Mon Mar 15, 2010 7:39 pm
Location: Atlanta, GA


Return to [archive-commercial] Programming & Development with ArrayFire

cron