Table 2. Performance increase for each FullMonteCUDA optimization over FullMonteSW.
Optimization | Incremental Speedup |
---|---|
Naive | 2x |
CUDA vector datatypes and math operations | 2.5x |
Materials constant cache | 1.6x |
Thread local accumulation cache | 1.3x |
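
As a rough reading aid, the incremental factors in Table 2 can be compounded into a running speedup over FullMonteSW. This sketch assumes each factor multiplies the result of the previous row, which the table itself does not state; the function name `cumulative_speedups` is hypothetical. Under that assumption the four rows give roughly 2x, 5x, 8x, and 10.4x.

```python
# Incremental speedups copied from Table 2, in the order listed.
incremental = {
    "Naive": 2.0,
    "CUDA vector datatypes and math operations": 2.5,
    "Materials constant cache": 1.6,
    "Thread local accumulation cache": 1.3,
}


def cumulative_speedups(steps):
    """Return the running product of the per-step factors.

    Assumption: each incremental speedup compounds multiplicatively
    on top of the previous optimization.
    """
    total = 1.0
    running = {}
    for name, factor in steps.items():
        total *= factor
        running[name] = total
    return running


cumulative = cumulative_speedups(incremental)
for name, value in cumulative.items():
    print(f"{name}: {value:.1f}x")
```

If the factors were instead each measured against the same baseline, the running product would not apply and the rows should be read independently.
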