Code:
// BEFORE
gfor (array k, 400) {
    array B = A(span,k);
    C(span,span,k) = B * B.T(); // outer product expansion runs out of memory
}

// AFTER
for (int kk = 0; kk < 400; kk += 100) {
    gfor (array k, kk, kk+99) { // four batches of 100
        array B = A(span,k);
        C(span,span,k) = B * B.T(); // now several smaller problems fit in card memory
    }
}
Are there any general guidelines for how big the segments can be, or does this have to be done by trial and error?
More specifically, in the example the loop of 400 is broken up into 4 batches of 100. I assume the increment (the batch size) of 100 depends on the size of array B and on the available device memory. Is there some way to deduce from this information that 100 is the ideal (or at least an acceptable) batch size?
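To make the question concrete, here is the kind of back-of-the-envelope estimate I have in mind. It is only a sketch: `max_batch_size` is a hypothetical helper of my own, not a library function, and the per-iteration cost (a few n-by-n temporaries plus the column B itself) is my assumption about what the outer product needs, not something I found documented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of any GPU library): estimate how many gfor
// iterations might fit in device memory at once.
//   free_bytes  : free device memory after A and C are allocated
//   n           : length of each column B = A(span,k)
//   elem_bytes  : bytes per element (4 for single precision)
//   temp_copies : ASSUMED number of n-by-n temporaries per iteration
//   safety      : fraction of the free memory we allow ourselves to use
std::size_t max_batch_size(std::size_t free_bytes, std::size_t n,
                           std::size_t elem_bytes, std::size_t temp_copies,
                           double safety) {
    assert(n > 0 && elem_bytes > 0 && temp_copies > 0);
    assert(safety > 0.0 && safety <= 1.0);
    // Assumed cost of one iteration: temp_copies n-by-n tiles plus B itself.
    const std::size_t per_iter = (temp_copies * n * n + n) * elem_bytes;
    // Only spend a fraction of the free memory, leaving headroom.
    const std::size_t budget = static_cast<std::size_t>(free_bytes * safety);
    // Always allow at least one iteration per batch.
    return std::max<std::size_t>(1, budget / per_iter);
}
```

With such a number, the outer loop would step by that batch size instead of a hard-coded 100, with the last batch clipped to the end of the range. What I don't know is whether the assumed per-iteration cost is anywhere near what gfor really allocates.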
Thank you