Maximum FFT Size?

[Old posts from the commercial version of ArrayFire] Discussion of ArrayFire using CUDA or OpenCL.

Moderator: pavanky

Postby neuralPanther » Mon Feb 24, 2014 2:18 pm

Hi again,

My code uses variable-size FFTs up to a system maximum (I had assumed 33,554,432 complex points), but I get an exception at 8 million points (8,388,608).

"src/cuda/fft_double.cu/fft_double.cu:27: CUFFT failure (execution failed)"

My platform specs are:
CUDA toolkit 5.5, Driver 320.57
0: GeForce GTX 580M, 2048 MB of memory, CUDA compute 2.1

Note that when I use cuFFT directly I can do 16 to 33 million point complex double FFTs (one of the reasons I switched to ArrayFire was to make it simpler to change the FFT size).

Here's some code:
Code: Select all
// in myClass constructor:
// maxFFTSize = 33554432
try{
   my_signal =  af::array(maxFFTSize,1,af::f64);
   }catch(af::exception& e)
   {
      fprintf(stderr, "%s\n", e.what());
      
      throw;
   }

// in myClass::processingFunction(long long fftSize)
try{
   
   FFT_my_signal = af::fft(my_signal, fftSize);

    }catch (af::exception& e) {
        fprintf(stderr, "%s\n", e.what());
        throw;
   }

// myClass->processingFunction(size) is called from a main loop; fftSize starts at 2048 and doubles every time
// fftSizes are: 2048, 4096, 8192, ..., 8388608, ...


If I let it go past 8388608, it hits the exception.
~ NP
neuralPanther
 
Posts: 25
Joined: Fri Feb 14, 2014 8:03 pm

Re: Maximum FFT Size?

Postby pavanky » Mon Feb 24, 2014 2:22 pm

It is likely that you are running out of memory. Are you doing anything else in the program?
Pavan Yalamanchili,
ArrayFire
--
~ If it is not broken, you have not tried hard enough ~
pavanky
Site Admin
 
Posts: 1123
Joined: Mon Mar 15, 2010 7:39 pm
Location: Atlanta, GA

Re: Maximum FFT Size?

Postby neuralPanther » Mon Feb 24, 2014 6:17 pm

Yes, lots.

This is a large OpenGL application.
Here's essentially my entire data-processing function, broken up for ease of reading:

Allocate Variables:
Code: Select all
// NOTE: Most of the variables are allocated in a different function
   cdouble *d_ptr;
   size_t numBytes;
   af::array x_signal=af::constant(0,signalSize,1,af::f64);

   cdouble* fftResult = NULL;
   af::array max_value = af::constant(0,1,1,af::f64);

Run custom kernels:
Code: Select all

   // reset the data
   my_signal = af::constant(0,signalSize,1,af::f64);
   // point my device pointer to the data
   d_signal = (double*) my_signal.device<double>();
   
   // d_Data is filled (from the host) with a cudaMemcpy in a different function
   cudaKernel1(d_Data,d_signal,signalSize);

   cudaKernel2(d_signal,signalSize);

   my_signal.unlock();
   

Transform and massage the data:
Code: Select all
   try{
         FFT_my_signal = af::fft(my_signal, signalSize);

    }catch (af::exception& e) {
        fprintf(stderr, "%s\n", e.what());
        throw;
   }
   
   // magnitude of the transformed data
   try{      
      my_signal = abs(FFT_my_signal);
      max_value = af::max(my_signal);      
   }catch (af::exception& e) {
        fprintf(stderr, "%s\n", e.what());
        throw;
   }
   
   // Normalize the magnitude by the max value
   try{
      my_signal = my_signal/max_value;
   }catch (af::exception& e) {
        fprintf(stderr, "%s\n", e.what());
        throw;
   }
   // 20*log10(data)
   try{
      my_signal = 20.0 *(af::log10(my_signal));
   }catch (af::exception& e) {
        fprintf(stderr, "%s\n", e.what());
        throw;
   }
   //fftshift(data)
   try{
      my_signal = af::shift(my_signal,(signalSize/2)+1);
   }catch (af::exception& e) {
        fprintf(stderr, "%s\n", e.what());
        throw;
   }
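
A note on the shift amount above: for an even length N, the conventional fftshift is a circular shift by N/2, which puts the zero-frequency bin at index N/2. The snippet shifts by (signalSize/2)+1, which would place it one bin further along unless `offset` in the x axis is meant to compensate. The host-side sketch below illustrates the difference with `std::rotate`. It assumes `af::shift` shifts circularly in the same direction as this helper; `circularShift` is a hypothetical name used only for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Circularly shift v to the right by k positions:
// the element at index i ends up at index (i + k) % n.
std::vector<int> circularShift(std::vector<int> v, std::size_t k) {
    if (v.empty()) return v;
    k %= v.size();
    // std::rotate makes the element at the "middle" iterator the new first
    // element, so middle = end - k gives a right shift by k.
    std::rotate(v.begin(), v.end() - static_cast<std::ptrdiff_t>(k), v.end());
    return v;
}
```

With the bins labelled 0..7, a shift by 4 gives 4 5 6 7 0 1 2 3 (bin 0 at index 4), while a shift by 5 gives 3 4 5 6 7 0 1 2 (bin 0 at index 5).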



Generate the x vector for plotting against:
Code: Select all
   gfor ( af::array i, 0,signalSize)
   {
      x_signal(i) = (increment*i)-(offset);
   }
   //pack it back into our complex array
   FFT_my_signal = af::complex(x_signal, my_signal);

Put the data into the OpenGL interop buffer:
Code: Select all

   try
   {
       fftResult =FFT_my_signal.device<cdouble>();
   }catch (af::exception& e) {
        fprintf(stderr, "%s\n", e.what());
        throw;
   }
   checkCudaErrors(cudaGraphicsMapResources(1,&cuda_GL_resource,0));
   checkCudaErrors(cudaGraphicsResourceGetMappedPointer((void **)&d_ptr, &numBytes, cuda_GL_resource));
   
   // copy the formatted data into the openGL buffer(s)
   checkCudaErrors(cudaMemcpy(d_ptr,fftResult,(signalSize)*2*sizeof(double),cudaMemcpyDeviceToDevice));
   
   //release for plotting
   checkCudaErrors(cudaGraphicsUnmapResources(1,&cuda_GL_resource,0));



I think I have too many arrays floating around.

Is it possible to do more of this "in place"?
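
For a rough sense of scale, the buffers named in the snippets above can be tallied by element type. This is only a back-of-the-envelope sketch: it counts my_signal and x_signal (real double), FFT_my_signal (complex double), and the OpenGL buffer behind d_ptr (signalSize*2*sizeof(double)), and it leaves out d_Data, any temporaries ArrayFire creates, and whatever work area cuFFT allocates, none of which are sized in the thread. The function names are hypothetical.

```cpp
#include <cstddef>

// Bytes per element for the two element types used in the thread.
constexpr std::size_t kRealDouble = sizeof(double);         // af::f64
constexpr std::size_t kComplexDouble = 2 * sizeof(double);  // af::c64 / cdouble

// Footprint in MiB of one array holding n elements of elemBytes each.
double mib(std::size_t n, std::size_t elemBytes) {
    return static_cast<double>(n) * static_cast<double>(elemBytes) /
           (1024.0 * 1024.0);
}

// Sum of the buffers named in the posted function, in MiB.
// Unsized buffers (d_Data, temporaries, cuFFT work area) are NOT included.
double namedBuffersMiB(std::size_t n) {
    return mib(n, kRealDouble)       // my_signal
         + mib(n, kRealDouble)       // x_signal
         + mib(n, kComplexDouble)    // FFT_my_signal
         + mib(n, kComplexDouble);   // OpenGL buffer mapped to d_ptr
}
```

At 8,388,608 points this gives 64 MiB per real array, 128 MiB per complex array, and 384 MiB for the four named buffers; at 33,554,432 points the same four buffers come to 1536 MiB, before any temporaries or FFT scratch space, on a card with 2048 MB in total.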
neuralPanther
 
Posts: 25
Joined: Fri Feb 14, 2014 8:03 pm

Re: Maximum FFT Size?

Postby pavanky » Mon Feb 24, 2014 6:35 pm

I cannot think of any immediate improvements. The problem is that CUFFT uses a lot of scratch space internally as well.

The only suggestion I can give right now is to not create FFT_my_signal before the final memcpy. You can instead copy the two arrays to d_ptr using two memory copies. And if OpenGL permits, use (d_ptr[i], d_ptr[i + signalSize]) instead of (d_ptr[i], d_ptr[i + 1]).
Pavan Yalamanchili,
ArrayFire
--
~ If it is not broken, you have not tried hard enough ~
pavanky
Site Admin
 
Posts: 1123
Joined: Mon Mar 15, 2010 7:39 pm
Location: Atlanta, GA

