I think the biggest difference between programming with CUDA and programming in a traditional procedure-oriented language is concurrency. A traditional program, whether object-oriented or procedure-oriented, typically executes step by step on one CPU, so the relationships (and ordering) among functions or objects, data flows, and data changes are intuitive. Suppose you are writing a C program to process images. There are two ways to pass data around: function parameters and global variables. In the extreme case, you use only parameters to pass and share data. Then all you need to do is keep your eye on each parameter and keep in mind how it is copied, when it is passed to other procedures, and how the return values are merged. Do that and you will find the bug, sooner or later.
In CUDA, things are a little different. The GPU has many functional units, so your code is executed by many threads at the same time. It is like all your classmates, each holding a copy of the recipe, trying to cook one complicated Chinese dish together. If the recipe is not well organized, all you get is 30 copies of the same dish. Or worse, one dish with 30 times the salt and none for the others. How you divide the task matters a lot in CUDA. I cannot tell you here how to make your division better, only remind you to handle it carefully.
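To make the recipe picture a bit more concrete, here is a minimal sketch of one common way to divide work: each thread computes its own global index and touches only that one element. The names scalePixels and scaleOnGpu, the gain parameter, and the block size of 256 are my own assumptions for illustration, not something from a particular project.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread scales exactly one pixel, picked by its global index.
// No two threads write the same element, so nobody salts the same dish twice.
__global__ void scalePixels(float *pixels, int n, float gain)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique per thread
    if (i < n) {            // the last block may have spare threads
        pixels[i] *= gain;
    }
}

// Hypothetical host helper: copy to the device, run the kernel, copy back.
// Returns true if every CUDA call succeeded.
bool scaleOnGpu(std::vector<float> &host, float gain)
{
    int n = static_cast<int>(host.size());
    if (n == 0) return true;                        // nothing to do

    size_t bytes = host.size() * sizeof(float);
    float *dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;

    bool ok = cudaMemcpy(dev, host.data(), bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int threads = 256;                          // assumed block size
        int blocks = (n + threads - 1) / threads;   // round up to cover all n
        scalePixels<<<blocks, threads>>>(dev, n, gain);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(host.data(), dev, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev);
    return ok;
}
```

If you drop the index and let every thread loop over the whole array, you get the "30 copies of the same dish" case; if every thread adds into one shared value without synchronization, you get the salt problem.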
The next thing that needs attention is memory copying. If you have used MEX between C++ and Matlab, or some other interface between two languages, you may understand this problem better than I do. There are two things to watch: the data type and the data size. We often use float2 in CUDA and then copy it from the device into a plain float pointer. Or we write 200 float1 values in CUDA but copy only 100 back. The resulting errors are extremely difficult to track down. On the one hand, you may not have a debugger as powerful as GDB or the Visual Studio debugger; on the other hand, you will be unsure whether the problem is caused by your algorithm or by something else…
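Here is a small sketch of how I would try to keep type and size consistent on both sides of the copy. The idea is to use the same element type (float2) for the host buffer and the device buffer, and to compute the byte count once from that type. The names fillPairs and fetchPairs are hypothetical and only serve the example.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Writes n float2 values on the device: element i becomes (i, -i).
__global__ void fillPairs(float2 *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = make_float2(static_cast<float>(i), -static_cast<float>(i));
    }
}

// Hypothetical host helper: produce n float2 values on the GPU and copy
// all of them back. Returns true if every CUDA call succeeded.
bool fetchPairs(std::vector<float2> &host, int n)
{
    if (n <= 0) { host.clear(); return true; }

    // Same element type on both sides, and one byte count used everywhere.
    host.assign(n, make_float2(0.0f, 0.0f));
    size_t bytes = static_cast<size_t>(n) * sizeof(float2);  // not sizeof(float)

    float2 *dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;

    int threads = 128;                              // assumed block size
    int blocks = (n + threads - 1) / threads;
    fillPairs<<<blocks, threads>>>(dev, n);

    // cudaMemcpy counts bytes, not elements. Passing n or n * sizeof(float)
    // here would copy back only part of the data without any error.
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaMemcpy(host.data(), dev, bytes,
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dev);
    return ok;
}
```

Checking the return value of every CUDA call, as the helper does, will not catch a wrong size, but it at least separates "the copy failed" from "my algorithm is wrong".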
Good luck, CUDAer