Cuda Driver Error and Memory Error w/ Jacket v.1.8

[Old posts from the commercial version of ArrayFire] Issues and comments for download and installation. Getting up and running.


Postby cbcase » Tue Jul 19, 2011 2:31 pm

Hi,

Some of my colleagues and I are hitting a strange bug with the latest version of Jacket, and unfortunately we have not been able to pin down the specific cause.

Back with version 1.7.1, we had an issue where the following occurred:
1) "Most" code works fine
2) Large matrix operations trigger memory errors
3) If you try to use GSELECT to switch GPUs, it gives a "GPU Error" but appears to switch anyway

After reinstalling and then adding /path/to/jacket/engine/lib64 to LD_LIBRARY_PATH, these errors went away.
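For reference, the 1.7.1 workaround described here amounts to something like the sketch below. It has to run in the shell that launches MATLAB: the Linux dynamic linker reads LD_LIBRARY_PATH when a process starts, so changing it from inside a running MATLAB session generally has no effect. /path/to/jacket/engine/lib64 is only the placeholder used in this post, not a real install location. Also, as reported next, this same change did not help with 1.8.

```shell
# Placeholder path from the post; substitute the real Jacket install directory.
JACKET_LIB=/path/to/jacket/engine/lib64

# Prepend Jacket's lib64 so the libraries packaged with Jacket are searched first,
# keeping any existing entries (and avoiding a stray ':' when the variable was unset).
export LD_LIBRARY_PATH="$JACKET_LIB${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Show the resulting search path, one entry per line.
printf '%s\n' "$LD_LIBRARY_PATH" | tr ':' '\n'
```

Starting `matlab` from that same shell lets it inherit the setting. Putting the export in a login script makes it persistent.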

We upgraded the system-wide installation to v.1.8 at the end of last week, and the same errors are back. Furthermore, modifying LD_LIBRARY_PATH now has no effect. The only visible change I can think of that might have (re)triggered this issue is the move from CUDA toolkit 3.2 to 4.0.

Some more details:
GINFO output:
Jacket v1.8 (build 2966384) by AccelerEyes
License Type: Designated Computer (on 64-bit Linux)
Licensed Addons: none
Multi-GPU: Licensed for 2 GPUs

Detected CUDA-capable GPUs:
CUDA driver 275.09.07, CUDA toolkit 4.0
GPU0 GeForce GTX 580, 3072 MB, Compute 2.0 (single,double) (in use)
GPU1 GeForce GTX 580, 3072 MB, Compute 2.0 (single,double)
GPU2 GeForce GTX 580, 3072 MB, Compute 2.0 (single,double)
GPU3 Tesla C1060, 4096 MB, Compute 1.3 (single,double)

GPU Memory Usage: 3000 MB free (3072 MB total)


(GACTIVATE output is the same, of course, except for the installation checks, which all pass).


If I try GSELECT(2), then I get the following error:
src/matlab/gpgpu.cpp:535: CUDA driver error: driver: invalid context
??? Error using ==> gpu_entry
GPU failure (http://accelereyes.com/faq?q=116)

Error in ==> /usr/local/jacket/engine/gselect.p>gselect at 5
% Supported Syntax


But if you run GINFO again, it now says GPU2 is in use.

Next, if I run one of our standard large-matrix operations (i.e., one that worked without trouble with 1.7.1 after modifying LD_LIBRARY_PATH), I get:
??? Error using ==> sum
Unable to allocate memory (http://accelereyes.com/faq?q=112) (src/matlab/calls/sum.cpp:91)

Error in ==> [function name] at 59
cost = 0.5 * sum ( (h(:) - data(:)).^2) * (1/m) + ...


Other system information:
Linux 10.04 64-bit
CPU: 16 cores Intel Xeon 2.67GHz
GPU info: (see above)
CUDA info (from above): CUDA driver 275.09.07, CUDA toolkit 4.0

Thanks in advance for any help,
cbcase

Re: Cuda Driver Error and Memory Error w/ Jacket v.1.8

Postby vishy » Wed Jul 20, 2011 10:33 am

cbcase,

cbcase wrote: We upgraded the system-wide installation to v.1.8 at the end of last week, and the same errors are back. Furthermore, modifying LD_LIBRARY_PATH now has no effect. The only visible change I can think of that might have (re)triggered this issue is the move from CUDA toolkit 3.2 to 4.0.


That makes sense - Jacket 1.8 would probably have erased your changes to LD_LIBRARY_PATH.
Do you have a separate CUDA toolkit [apart from the one packaged with Jacket] that you add to LD_LIBRARY_PATH?
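One way to answer that question is to list which LD_LIBRARY_PATH directories actually contain a CUDA runtime library, in the left-to-right order the dynamic linker searches them. This is a sketch that assumes the runtime follows the usual Linux naming libcudart.so*. The function name find_cudart is made up for illustration.

```shell
# Hypothetical helper: print every libcudart copy found via LD_LIBRARY_PATH, in search order.
# The body runs in a subshell so the IFS change does not leak into the caller.
find_cudart() (
  IFS=:
  for dir in $LD_LIBRARY_PATH; do
    for lib in "$dir"/libcudart.so*; do
      # An unmatched glob stays literal, so check that the file really exists.
      [ -e "$lib" ] && printf '%s\n' "$lib"
    done
  done
  true  # succeed even when nothing is found
)

find_cudart
```

If more than one copy shows up (for example, one under Jacket's engine/lib64 and one from a separately installed CUDA 4.0 toolkit), the one listed first is the likely candidate to be loaded.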

cbcase wrote: If I try GSELECT(2), then I get the following error:
src/matlab/gpgpu.cpp:535: CUDA driver error: driver: invalid context
??? Error using ==> gpu_entry
GPU failure (http://accelereyes.com/faq?q=116)
But if you run GINFO again, it now says GPU2 is in use.


Following a 'CUDA Driver Error', the GPU's (and Jacket's) ability to report information correctly is compromised, so we recommend restarting MATLAB. Your second GINFO output therefore cannot be trusted.

It is possible that there's something wrong with the NVIDIA driver on your system. Can you try reinstalling the driver and report to me any issues you face?
--------------------------------------------------
Vish Venugopalakrishnan
Software Engineer (Q/A)
AccelerEyes LLC
vishy.v@accelereyes.com


Re: Cuda Driver Error and Memory Error w/ Jacket v.1.8

Postby wooyounglee » Sat Sep 03, 2011 8:28 pm

Hi, is there any update on this issue?
I'm having a similar problem with v1.8, and it would be really helpful if you could let me know how to fix it.

Re: Cuda Driver Error and Memory Error w/ Jacket v.1.8

Postby vishy » Tue Sep 06, 2011 5:20 pm

Wooyoung,
I take it that the problem you are facing is what we have been discussing in email?
If so, let us continue over email - I want to avoid multiple lines of communication.

