Not Recognizing Multiple GPUs

[Old posts from the commercial version of ArrayFire] Issues and comments for download and installation. Getting up and running.



Postby amaas » Fri Jan 20, 2012 8:18 pm

I have 64-bit Linux machines with two GTX 470/570 cards each, and Jacket does not seem to recognize that there are two cards. All of them are running Jacket 2.0.

Code:
>> ginfo
Jacket v2.0 (build b71ce48) by AccelerEyes (64-bit Linux)
License Type: Designated Computer (/afs/cs.stanford.edu/package/jacket/2.0-20111221/jacket/engine/jlicense.dat)
Addons: none
CUDA toolkit 4.0, driver 280.13
GPU1 GeForce GTX 570, 1280 MB, Compute 2.0 (single,double)
Memory Usage: 1187 MB free (1280 MB total)

>> ginfo('all')
       system: '64-bit Linux'
       driver: 280.1300
      toolkit: 4
      version: '2.0 (build b71ce48)'
      default: 'single'
    gpu_count: 1
      current: 0
         name: []
    gpu_total: 0
     gpu_free: 0
      compute: 0
        flops: 0

However, /proc/driver/nvidia/gpus/ lists cards 0 and 1, which suggests that the driver recognizes both cards. Is there a known reason why Jacket might not recognize both cards on a Linux system with two GPUs?
amaas
 
Posts: 10
Joined: Wed Apr 07, 2010 12:47 am

Re: Not Recognizing Multiple GPUs

Postby jaideep » Fri Jan 20, 2012 10:09 pm

This is a known issue that has already been fixed. Please download the latest nightly build from http://www.accelereyes.com/nightly/ and give it a try.

Thanks!
Jaideep Singh
Software Engineer
AccelerEyes
jaideep
 
Posts: 207
Joined: Tue Oct 11, 2011 2:40 pm

Re: Not Recognizing Multiple GPUs

Postby amaas » Sun Jan 22, 2012 5:10 pm

I installed the nightly build, but the problem does not seem to be fixed.
Code:
>> ginfo
Jacket v2.0 (build a795154) by AccelerEyes (64-bit Linux)
License Type: Designated Computer (/usr/local/jacket/engine/jlicense.dat)
Addons: MGL16, SDK, DLA, SLA
CUDA toolkit 4.0, driver 280.13
GPU1 GeForce GTX 570, 1280 MB, Compute 2.0 (single,double)
Memory Usage: 1177 MB free (1280 MB total)
>> ginfo('all')

ans =

       system: '64-bit Linux'
       driver: 280.1300
      toolkit: 4
      version: '2.0 (build a795154)'
      default: 'single'
    gpu_count: 1
      current: 1
         name: 'GeForce GTX 570'
    gpu_total: 1.3418e+09
     gpu_free: 1.2339e+09
      compute: 2
        flops: 702720000

>> ginfo(2)
??? Error using ==> ginfo at 83
gpu specified exceeds available gpus(1)


My /proc/driver/nvidia/gpus/ still shows two GPUs.

Re: Not Recognizing Multiple GPUs

Postby jaideep » Mon Jan 23, 2012 1:26 pm

Please upgrade the driver from http://www.nvidia.com/Download/index.aspx?lang=en-us and see if this fixes the issue. Thanks!
Jaideep Singh
Software Engineer
AccelerEyes

Re: Not Recognizing Multiple GPUs

Postby amaas » Thu Jan 26, 2012 12:56 am

I just tried the 290.10 driver on one of our machines, and it does not appear to fix the problem.


Code:
>> ginfo
Jacket v2.0 (build a795154) by AccelerEyes (64-bit Linux)
License Type: Designated Computer (/usr/local/jacket/engine/jlicense.dat)
Addons: MGL16, SDK, DLA, SLA
CUDA toolkit 4.0, driver 290.10
GPU1 GeForce GTX 470, 1280 MB, Compute 2.0 (single,double)
Memory Usage: 1216 MB free (1280 MB total)
>> ginfo('all')

ans =

       system: '64-bit Linux'
       driver: 290.1000
      toolkit: 4
      version: '2.0 (build a795154)'
      default: 'single'
    gpu_count: 1
      current: 1
         name: 'GeForce GTX 470'
    gpu_total: 1.3418e+09
     gpu_free: 1.2747e+09
      compute: 2
        flops: 544320000

So it appears that Jacket is seeing only one card on the system. However, if I look at /proc, the driver seems to be interfacing properly with both cards:
Code:
$ cat /proc/driver/nvidia/gpus/0/information
Model:        GeForce GTX 470
IRQ:          177
Video BIOS:     70.00.21.00.03
Card Type:     PCI-E
DMA Size:     39 bits
DMA Mask:     0x7fffffffff
Bus Location:     0000:02.00.0
$ cat /proc/driver/nvidia/gpus/1/information
Model:        GeForce GTX 470
IRQ:          185
Video BIOS:     ??.??.??.??.??
Card Type:     PCI-E
DMA Size:     32 bits
DMA Mask:     0xffffffff
Bus Location:     0000:03.00.0

Re: Not Recognizing Multiple GPUs

Postby amaas » Mon Feb 06, 2012 1:18 pm

Resolved.

It turns out that even though everything showed up properly in /proc (i.e., both GPUs were listed there), the CUDA devices were not being created properly. This is a known issue on machines that do not start X, and it is covered in the NVIDIA release notes. We now run a script to set up the /dev/nvidia* devices, and both GPUs are correctly recognized within Jacket.
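The script itself is not included in the thread. The following is only a minimal sketch of that kind of setup script, under stated assumptions: the nvidia kernel module is already loaded (it was here, since /proc listed both cards), the driver's character devices use major number 195 with GPU N at minor N, and /dev/nvidiactl is minor 255. The file name nvidia-nodes.sh and the helper names count_gpus and missing_node_cmds are hypothetical.

```shell
#!/bin/sh
# nvidia-nodes.sh (hypothetical name): sketch for headless machines where
# /proc/driver/nvidia/gpus lists every GPU but /dev/nvidia* nodes are missing.
# Assumptions: the nvidia kernel module is already loaded, its character
# devices use major 195, GPU N is minor N, and /dev/nvidiactl is minor 255.

# Print how many GPUs the driver lists under its /proc directory.
count_gpus() {
    proc_dir=${1:-/proc/driver/nvidia/gpus}
    count=0
    for entry in "$proc_dir"/*; do
        # An unmatched glob is left as a literal string, so check it exists.
        if [ -e "$entry" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Print a mknod command for each device node that does not exist yet.
missing_node_cmds() {
    gpus=$1
    dev_dir=${2:-/dev}
    i=0
    while [ "$i" -lt "$gpus" ]; do
        if [ ! -e "$dev_dir/nvidia$i" ]; then
            echo "mknod -m 666 $dev_dir/nvidia$i c 195 $i"
        fi
        i=$((i + 1))
    done
    if [ ! -e "$dev_dir/nvidiactl" ]; then
        echo "mknod -m 666 $dev_dir/nvidiactl c 195 255"
    fi
}

# Create the nodes only when run as "sh nvidia-nodes.sh run" (as root), so the
# functions above can also be sourced and used as a dry run.
if [ "${1:-}" = "run" ]; then
    missing_node_cmds "$(count_gpus)" | sh
fi
```

Sourcing the file and calling missing_node_cmds "$(count_gpus)" only prints the commands it would run, which is a cheap way to confirm the diagnosis from this thread (two entries in /proc, missing /dev/nvidia1). The GPU count is taken from /proc rather than from lspci output so that it matches exactly what the driver itself reports.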

