I’ve been working on a simulator for a plasma engine. It uses a back-to-basics method: computing the interaction of every charged particle in the plasma with every other one. Higher-level simulations are possible as long as your ideas fit the standard differential equations. It’s analogous to programming in machine language for ultimate flexibility instead of using a full web framework – the framework only helps if you decide to build the same things that have already been built.
Particle-to-particle simulations are classic O(N^2) problems: use 10x more particles and the job gets 100x harder. Add to the basic interaction the need to check for collisions with walls and to interact with the neutral gas floating around as well, and the problem gets quite large. A good way to measure the performance of such an application is the number of particle-particle interactions per second it can perform. My original C++ simulator managed over 34 million interactions per second on a dual-core 3.4 GHz 64-bit machine, using OpenMP to manage its threading. Sounds pretty good, right? At that pace, a certain simulation I needed still took 6 hours!
Enter CUDA. CUDA is NVIDIA’s C-based technology for doing general-purpose computation on their graphics cards (GPUs). It’s a major mind shift to think in terms that work well on a GPU. The issue isn’t that everything must be expressed as a texture or bitmap – CUDA does away with that. I’m talking about the massive parallelism involved. A normal CPU would run very, very, very poorly if you added two arrays by spawning a thread for each entry in the output array. A GPU runs poorly if you don’t do exactly that.
Suffice it to say, a complete rewrite of the simulator was necessary, but it was ultimately worth it. It now runs 748 million interactions per second (that’s 22x faster), completing that 6-hour simulation in just 16 minutes!