1.
For a tiled 1D convolution, if the output tile width is 250 elements and the mask width is 7 elements, what is the input tile width loaded into shared memory?
Correct Answer
C. 256
Explanation
In a tiled 1D convolution, each block computes an output tile of 250 elements, but to do so it needs not only those 250 input elements but also the halo elements the mask overlaps on either side: (7 - 1)/2 = 3 extra elements on the left and 3 on the right. The input tile width loaded into shared memory is therefore the output tile width plus the mask width minus one: 250 + 7 - 1 = 256.
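A minimal sketch of one common loading strategy, with one thread per input-tile element (the kernel and macro names are illustrative, and the mask is assumed to live in constant memory):

#define OUTPUT_TILE_WIDTH 250
#define MASK_WIDTH        7
#define INPUT_TILE_WIDTH  (OUTPUT_TILE_WIDTH + MASK_WIDTH - 1)   // 256

__constant__ float M[MASK_WIDTH];   // convolution mask, assumed in constant memory

// Launch with blockDim.x == INPUT_TILE_WIDTH (256): every thread stages one
// input element; only the first OUTPUT_TILE_WIDTH threads produce an output.
__global__ void conv1D_tiled(const float *N, float *P, int width) {
    __shared__ float tile[INPUT_TILE_WIDTH];

    // Global index of the element this thread loads, shifted left by the halo radius.
    int loadIdx = blockIdx.x * OUTPUT_TILE_WIDTH + threadIdx.x - MASK_WIDTH / 2;

    // Stage the input tile; out-of-range (ghost) positions become 0.
    tile[threadIdx.x] = (loadIdx >= 0 && loadIdx < width) ? N[loadIdx] : 0.0f;
    __syncthreads();

    if (threadIdx.x < OUTPUT_TILE_WIDTH) {
        float acc = 0.0f;
        for (int j = 0; j < MASK_WIDTH; ++j)
            acc += tile[threadIdx.x + j] * M[j];
        int outIdx = blockIdx.x * OUTPUT_TILE_WIDTH + threadIdx.x;
        if (outIdx < width) P[outIdx] = acc;
    }
}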
2.
For the work-inefficient scan kernel based on reduction trees, assuming we have 1024 elements, which of the following gives the closest approximation of the number of add operations performed?
Correct Answer
D. 1024*10
Explanation
The work-inefficient scan kernel based on reduction trees (the Kogge-Stone style scan) performs roughly n*log2(n) additions: in each of the log2(n) steps, nearly all of the n elements are updated. With n = 1024 and log2(1024) = 10, the total is approximately 1024*10 = 10,240 additions. (The exact count is the sum over strides 1, 2, ..., 512 of (1024 - stride), i.e. 10*1024 - 1023 = 9,217, for which 1024*10 is the closest option.)
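A small host-side sketch of that count (the helper name is illustrative):

#include <cstdio>

// Additions performed by the work-inefficient (Kogge-Stone style) scan:
// at each stride s = 1, 2, 4, ..., n/2, the kernel does (n - s) additions.
int scanAddCount(int n) {
    int adds = 0;
    for (int stride = 1; stride < n; stride *= 2)
        adds += n - stride;
    return adds;
}

int main() {
    printf("adds for n = 1024: %d\n", scanAddCount(1024));   // 9217, closest option is 1024*10
    return 0;
}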
3.
Barrier synchronizations should be used whenever we want to ensure all threads have completed a common phase of their execution_____________
Correct Answer
A. Before any of them start the next phase
Explanation
Barrier synchronization should be used whenever we want to ensure that all threads have completed a common phase of their execution before any of them starts the next phase. The barrier blocks each thread until every thread in the group has reached it, so they all finish the current phase before moving on to the next one. This coordinates the execution of multiple threads and guarantees that they all reach a specific point before proceeding further.
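In CUDA this barrier is __syncthreads(), which synchronizes all threads of a block. A minimal sketch of the load-then-compute pattern (the kernel name and 256-thread block size are illustrative assumptions):

// Assumes a launch with 256 threads per block.
__global__ void twoPhase(const float *in, float *out, int n) {
    __shared__ float buf[256];
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Phase 1: every thread stages one element into shared memory.
    buf[threadIdx.x] = (i < n) ? in[i] : 0.0f;

    // Barrier: no thread starts phase 2 until all threads finish phase 1.
    __syncthreads();

    // Phase 2: it is now safe to read data written by other threads.
    float right = (threadIdx.x + 1 < blockDim.x) ? buf[threadIdx.x + 1] : 0.0f;
    if (i < n) out[i] = buf[threadIdx.x] + right;
}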
4.
Each time a DRAM location is accessed, then __________
Correct Answer
C. All the locations that include the requested location are actually accessed
Explanation
When a DRAM location is accessed, all the locations that include the requested location are actually accessed. DRAM is organized so that each access activates an entire row of the memory array into the row buffer (sometimes called a DRAM "page"), and data is then delivered in bursts of consecutive locations. So whenever one location is requested, its whole burst/row is read, not just the requested element; this is why accessing neighboring locations together is much more efficient than scattered accesses.
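The practical consequence for CUDA code is memory coalescing: when consecutive threads of a warp access consecutive addresses, their requests are served from the same DRAM burst, while scattered accesses waste most of each burst. A hedged sketch of the two patterns (kernel names are illustrative):

// Coalesced: thread k of a warp reads element k of a contiguous run, so the
// whole warp is served from one (or a few) DRAM bursts.
__global__ void coalescedRead(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Strided: consecutive threads touch addresses far apart, so each access lands
// in a different burst and most of the data the DRAM delivers goes unused.
__global__ void stridedRead(const float *in, float *out, int n, int stride) {
    int i = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
    if (i < n) out[i / stride] = in[i];
}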
5.
Consider performing a 1D convolution on array N= {4,1,3,2,3} with mask M={2,1,4}. What is the resulting output array?
Correct Answer
B. {8,21,13,20,7}
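A quick host-side check (assuming the mask is anchored at its center and ghost cells at the boundaries are treated as 0) reproduces option B:

#include <cstdio>

int main() {
    const int n = 5, m = 3, r = m / 2;        // r = mask radius
    float N[n] = {4, 1, 3, 2, 3};
    float M[m] = {2, 1, 4};
    float P[n];

    // P[i] = sum over j of M[j] * N[i + j - r], out-of-range elements taken as 0.
    for (int i = 0; i < n; ++i) {
        P[i] = 0.0f;
        for (int j = 0; j < m; ++j) {
            int k = i + j - r;
            if (k >= 0 && k < n) P[i] += M[j] * N[k];
        }
    }
    printf("{%g, %g, %g, %g, %g}\n", P[0], P[1], P[2], P[3], P[4]);  // {8, 21, 13, 20, 7}
    return 0;
}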
6.
The correct syntax to declare constant memory is:
Correct Answer
A. cudaMemcpyToSymbol(dest, src, size)
Explanation
The option refers to cudaMemcpyToSymbol(dest, src, size), the runtime call used to copy data from host memory into constant memory on the device. The constant-memory variable itself is declared at file scope with the __constant__ qualifier; cudaMemcpyToSymbol then fills it. The "dest" parameter is the destination symbol in constant memory, the "src" parameter is the source data in host memory, and the "size" parameter is the number of bytes to copy.
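A minimal sketch, assuming a 7-element mask (the symbol and function names are illustrative):

#include <cuda_runtime.h>

#define MASK_WIDTH 7

// The constant-memory variable itself is declared at file scope with the
// __constant__ qualifier.
__constant__ float d_M[MASK_WIDTH];

// cudaMemcpyToSymbol then copies the host data into that symbol.
void uploadMask(const float *h_M) {
    cudaMemcpyToSymbol(d_M, h_M, MASK_WIDTH * sizeof(float));
}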
7.
Consider performing a 1D convolution on an array of size n with a mask of size m. How many halo cells are there in total?
Correct Answer
B. M-1
Explanation
When performing a 1D convolution on an array of size n with a mask of size m, the halo (ghost) cells are the positions the mask reaches beyond the two ends of the array. A mask of size m extends (m-1)/2 cells past each end, so there are (m-1)/2 ghost cells on the left and (m-1)/2 on the right, for a total of m-1. For example, a mask of size 7 needs 3 ghost cells on each side, 6 in total.
8.
How many multiplications are performed, if halo cells are treated as multiplications (by 0), for an array of size n and a mask of size m in the case of 1D convolution?
Correct Answer
C. M*n
Explanation
In 1-D convolution, each output element is computed by multiplying every element of the mask with the corresponding input element and summing the products. Since there are n output elements and the mask has size m, and the halo (ghost) positions at the boundaries are still counted as multiplications by 0, every output performs exactly m multiplications, for a total of m*n. For example, with n = 5 and m = 3 there are 15 multiplications.
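A tiny host-side check for the n = 5, m = 3 case, counting every multiply, including those against ghost cells treated as 0:

#include <cstdio>

int main() {
    const int n = 5, m = 3, r = m / 2;   // r = mask radius
    long mults = 0, ghostMults = 0;

    // Every output element i multiplies all m mask elements, even when the
    // input index i + j - r falls outside the array (a ghost cell, value 0).
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < m; ++j) {
            ++mults;
            int k = i + j - r;
            if (k < 0 || k >= n) ++ghostMults;
        }

    printf("%ld multiplications (= m*n = %d), %ld against ghost cells\n",
           mults, m * n, ghostMults);
    return 0;
}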
9.
Which of the memories is referred to as “scratchpad memory”?
Correct Answer
C. Shared memory
Explanation
Shared memory is the on-chip memory that is shared among the threads of a block on a GPU. It is referred to as "scratchpad memory" because it is software-managed: the programmer explicitly stages data into it and uses it as a fast temporary workspace for data reuse and for exchanging data between threads of the same block. It is much faster to access than global memory, which makes it ideal for frequently accessed data that needs to be shared among threads. Constant memory, global memory, and registers are other GPU memory spaces, but none of them serves the role of a scratchpad.
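A minimal sketch of the scratchpad pattern, reversing each block's chunk of the input inside shared memory (the kernel name and 256-thread block size are illustrative assumptions):

// Assumes a launch with 256 threads per block and n a multiple of blockDim.x.
__global__ void reverseWithinBlock(const float *in, float *out, int n) {
    // Software-managed scratchpad: the programmer declares it and fills it.
    __shared__ float scratch[256];

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) scratch[threadIdx.x] = in[i];
    __syncthreads();                         // make all writes visible to the block

    // Each thread now reads an element that a *different* thread staged,
    // writing the block's chunk back out in reverse order.
    if (i < n) out[i] = scratch[blockDim.x - 1 - threadIdx.x];
}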
10.
Does the CUDA memory architecture have caches and cache levels?
Correct Answer
A. True
Explanation
The CUDA memory architecture does include caches and cache levels. The GPU's memory hierarchy contains multiple levels of cache, such as the per-SM L1 caches and the device-wide L2 cache, which hold recently accessed data and reduce memory access latency. These caches cut the time needed to fetch data from device (global) memory, improving overall performance.
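On devices where the L1 cache and shared memory share the same on-chip storage, the preferred split can even be hinted at from the host. A small sketch that queries the L2 cache size and expresses an L1 preference (the hint is only advisory on recent architectures):

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("L2 cache size: %d bytes\n", prop.l2CacheSize);

    // Ask the runtime to favor L1 capacity over shared memory where the two
    // share on-chip storage; on many recent GPUs this is only a hint.
    cudaDeviceSetCacheConfig(cudaFuncCachePreferL1);
    return 0;
}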