CUDA provides a small, fast, on-chip memory for its threads called shared memory. As the name suggests, this memory is shared among all threads within a block. We want to use this property to have the threads of a block cooperatively load data from global memory into shared memory, work on it together, and afterwards write the result back to global memory, avoiding repeated accesses to slow global memory. Nevertheless, there are some rules one needs to respect to get high performance out of it.
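The load, synchronize, compute, write-back pattern described above can be sketched as a minimal CUDA kernel. The kernel name and the tile-reversal task are hypothetical examples chosen for illustration; the sketch assumes `n` is a multiple of the block size so every shared-memory slot is initialized before it is read.

```cuda
// Sketch of the shared-memory pattern: each block loads its tile of the
// input into shared memory, synchronizes, then writes the tile back to
// global memory in reversed order. Assumes n is a multiple of blockDim.x.
__global__ void reverse_tile(const int *in, int *out, int n)
{
    extern __shared__ int tile[];           // shared memory, sized at launch

    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    if (gid < n)
        tile[threadIdx.x] = in[gid];        // one coalesced read per thread

    __syncthreads();                        // all loads must finish before reuse

    int rev = blockDim.x - 1 - threadIdx.x; // position within the reversed tile
    int out_gid = blockIdx.x * blockDim.x + rev;
    if (out_gid < n)
        out[out_gid] = tile[threadIdx.x];   // single write back to global memory
}

// Launch with the shared-memory size as the third configuration argument:
// reverse_tile<<<blocks, threads, threads * sizeof(int)>>>(d_in, d_out, n);
```

The `__syncthreads()` barrier is essential: without it, a thread could read a shared-memory slot before the thread responsible for filling it has completed its load.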
How to set up and configure an Arch Linux based workstation for scientific computing in a multi-user environment.
Welcome to my personal website. Besides presenting my portfolio of projects and latest research, I will post updates about my private projects and my latest journeys down rabbit holes of every kind. The motivation behind most of these sometimes long-lasting problem-solving trips is to find the simplest solution by breaking each complex problem down into manageable bits and pieces.