Moderator: jacket_guy
>> ginfo
Jacket v1.1 (build XXXX) data: 0 CPU-used, 0 GPU-used, 3987 GPU-free (in MB)
GPU0 (enabled) Quadro FX 5800, 1265 MHz, 4095 MB VRAM, Capability 1.3
>> A = gsingle(1:4);
>> B = gdouble(1:4);
>> C = A > 2;
>> D = guint32(B);
>> E = B + 1i;
>> whos
  Name      Size      Bytes  Class       Attributes
  A         1x4         508  gsingle
  B         1x4         508  gdouble
  C         1x4        1048  glogical
  D         1x4         984  guint32
  E         1x4        1052  gdouble     complex
>> N %%% standard double-precision CPU Matlab variable
N =
1.407449003374228 -1.303193638472176 -0.582186939674376
-0.381293350245835 0.656453029287915 -1.658488523670904
1.292504363290397 0.079473728930311 1.179880057296001
>> N_cpu_single = single(N)
N_cpu_single =
1.4074490 -1.3031937 -0.5821869
-0.3812934 0.6564530 -1.6584885
1.2925043 0.0794737 1.1798800
>> N_gpu_single = gsingle(N)
N_gpu_single =
1.4074490 -1.3031937 -0.5821869
-0.3812934 0.6564530 -1.6584885
1.2925043 0.0794737 1.1798800
>> N_gpu_double = gdouble(N)
N_gpu_double =
1.407449003374228 -1.303193638472176 -0.582186939674376
-0.381293350245835 0.656453029287915 -1.658488523670904
1.292504363290397 0.079473728930311 1.179880057296001
%% Compute the norm differences between some of these:
>> norm(N(:) - N_gpu_double(:))
ans =
0 %% double precision GPU result
>> norm(N(:) - N_gpu_single(:))
ans =
8.302859345298218e-08 %% double precision GPU result
>> norm(N(:) - N_cpu_single(:))
ans =
8.3028596e-08 %% single precision CPU result
(4 core Intel Core i7 CPU 975 @ 3.33GHz, 12 GB RAM, with Tesla C1060 4GB GPU)
Jacket v1.2.2 (build 3170) data: 0 CPU-used, 0 GPU-used, 4044 GPU-free (in MB)
GPU0 (enabled) Tesla C1060, 1265 MHz, 4095 MB VRAM, Compute 1.3 (single,double) (in use)
GPU1 (enabled) Quadro FX 380, 1074 MHz, 255 MB VRAM, Compute 1.1 (single)
n=3000;
A=rand(n,n);
G=gdouble(A);
tic
R=A*A;
toc
tic
S=G*G;
toc
sum(sum(abs(R-double(S))))
Elapsed time is 2.440815 seconds.
Elapsed time is 0.028268 seconds.
ans =
8.3630e-006
n=3000;
A=single(rand(n,n));
G=gsingle(A);
tic
R=A*A;
toc
tic
S=G*G;
toc
sum(sum(abs(R-single(S))))
Elapsed time is 1.211856 seconds.
Elapsed time is 0.019458 seconds.
ans =
4.4995e+003
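The single-precision residual above looks large next to the double-precision one, but it is a sum over all n^2 entries of a product whose entries are themselves large. A rough sketch of how to put it on a relative scale follows. It assumes the entries of A are uniform on [0,1] (as `rand` produces), so each entry of A*A is about n/4; the variable names are illustrative only and not part of Jacket.

```matlab
% Figures copied from the single-precision run above (n = 3000).
n = 3000;
absResid = 4.4995e+003;            % sum(sum(abs(R - single(S)))) as reported

% Assumption: A = rand(n,n), so each entry of A*A is a sum of n products
% with mean 1/4, i.e. roughly n/4 in magnitude.
typicalEntry   = n / 4;
totalMagnitude = n^2 * typicalEntry;   % estimate of sum(sum(abs(R)))

relResid  = absResid / totalMagnitude; % estimated relative residual
epsSingle = double(eps('single'));     % single-precision machine epsilon

fprintf('relative residual ~ %.2e (%.1f x eps(single))\n', ...
        relResid, relResid / epsSingle);
```

Under that assumption the CPU and GPU single-precision products differ by only a few units of single-precision round-off per entry, which is what different summation orders would be expected to produce.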
n=10000;
A=rand(n,n);
G=gdouble(A);
tic
R=A*A;
toc
tic
S=G*G;
toc
sum(sum(abs(R-double(S))))
Elapsed time is 55.508999 seconds.
Elapsed time is 0.382455 seconds.
ans =
5.7276e-004
[CPUtime', GPUtime', Resid']
ans =
0.0959, 0.0063, 0.0000
0.7258, 0.0155, 0.0000
1.6517, 0.0255, 0.0000
4.5068, 0.0652, 0.0000
6.8876, 0.0932, 0.0001
15.9968, 0.1334, 0.0001
19.5711, 0.1863, 0.0002
28.1169, 0.3501, 0.0002
49.7120, 0.3253, 0.0004
60.9479, 0.3726, 0.0006
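To read the table as speedups, the CPU column can be divided by the GPU column. The sketch below only reuses the numbers printed above; the matrix size behind each row is not stated in the thread, so rows are identified by index. The ratios are also only meaningful if the GPU timings include forced evaluation (see the `gforce` calls elsewhere in this thread).

```matlab
% Timings copied from the [CPUtime', GPUtime', Resid'] table above (seconds).
CPUtime = [0.0959 0.7258 1.6517 4.5068 6.8876 15.9968 19.5711 28.1169 49.7120 60.9479];
GPUtime = [0.0063 0.0155 0.0255 0.0652 0.0932  0.1334  0.1863  0.3501  0.3253  0.3726];

speedup = CPUtime ./ GPUtime;      % element-wise CPU/GPU ratio per row

for k = 1:numel(speedup)
    fprintf('row %2d: CPU %8.4f s, GPU %6.4f s, speedup %6.1fx\n', ...
            k, CPUtime(k), GPUtime(k), speedup(k));
end
```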
vitaly wrote:
Hi!
Thanks for the full and useful answer.
I work in the field of neurotechnology. My algorithm takes about a month to run on a 4 GHz Core 2 Duo, so cutting that time by a factor of 10 (with the GT300) would be a very welcome result. I have already tried switching to single-precision data, and the loss of accuracy was catastrophic. So the only way for me to use the GPU's power is to work with double-precision data.
If possible, I would suggest trying this code:
n=3000;
A=rand(n,n);
G=gdouble(A);
tic
R=A*A;
toc
tic
S=G*G;
toc
sum(sum(abs(R-double(S))))
Clearly gdouble will be substantially slower than gsingle, but it would be interesting to see whether it is still faster than the ordinary CPU.
Could you also tell me what options are available for speeding up combined CPU+GPU work? As I understand it, CPU speed does not strongly affect the overall speed? And should the system memory speed be increased (to speed up data transfer from CPU to GPU)? Would triple-channel memory on a Core i7 be the best option?
I look forward to being able to test v1.1 on the GTX250.
Thanks.
n=3000;
A=rand(n,n);
G=gdouble(A);
tic
R=A*A;
toc
tic
S=G*G;
toc
sum(sum(abs(R-double(S))))
n=3000;
A=rand(n,n);
G=gdouble(A);
gforce(G); gforce; % Ensures that G is formed on the GPU and any CPU/GPU sync is done
tic
R=A*A;
toc
tic
S=G*G;
gforce(S); % Need to force computation of the matrix S here
toc
sum(sum(abs(R-double(S))))