Author manuscript; available in PMC: 2013 Jun 1.
Published in final edited form as: IEEE Trans Med Imaging. 2012 Feb 15;31(6):1250–1262. doi: 10.1109/TMI.2012.2188039

Figure 2.

The four-level hierarchy of modern parallel systems. Nodes have disjoint DRAM address spaces and communicate over a message-passing network in the CPU case, or over a shared PCI-Express network in the GPU case. Sockets within a node (only one shown) share DRAM but have private caches – the L3 cache in CPU systems and the L2 cache in Fermi-class systems. Similarly, Cores share access to the Socket-level cache but have private caches of their own (CPU L2, GPU L1/scratchpad). Vector-style parallelism within a core is exploited via Lanes – SSE-style SIMD instructions on CPUs, or SIMT-style execution on GPUs.
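
As a rough illustration of the hierarchy described in the caption, the following minimal Python sketch maps a flat work-item index onto (Node, Socket, Core, Lane) coordinates and reports the closest resource two work items have in common. The names `Hierarchy`, `Coord`, `locate`, and `shared_level`, and the machine sizes in the usage note, are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from typing import NamedTuple


class Hierarchy(NamedTuple):
    """Sizes of the four levels (hypothetical values, not from the paper)."""
    nodes: int
    sockets_per_node: int
    cores_per_socket: int
    lanes_per_core: int


class Coord(NamedTuple):
    """Position of one work item in the Node/Socket/Core/Lane hierarchy."""
    node: int
    socket: int
    core: int
    lane: int


def locate(index: int, h: Hierarchy) -> Coord:
    """Map a flat index to hierarchy coordinates, with Lanes varying fastest."""
    total = h.nodes * h.sockets_per_node * h.cores_per_socket * h.lanes_per_core
    if not 0 <= index < total:
        raise ValueError(f"index {index} outside [0, {total})")
    index, lane = divmod(index, h.lanes_per_core)
    index, core = divmod(index, h.cores_per_socket)
    node, socket = divmod(index, h.sockets_per_node)
    return Coord(node, socket, core, lane)


def shared_level(a: Coord, b: Coord) -> str:
    """Name the closest resource two work items share, following the caption."""
    if a.node != b.node:
        return "network"             # Nodes have disjoint DRAM address spaces
    if a.socket != b.socket:
        return "DRAM"                # Sockets within a Node share DRAM
    if a.core != b.core:
        return "socket-level cache"  # Cores share the Socket-level cache
    return "core-private cache"      # Lanes of one Core share its private cache
```

For example, with an assumed machine of 2 Nodes, 2 Sockets per Node, 4 Cores per Socket, and 4 Lanes per Core, `locate(5, Hierarchy(2, 2, 4, 4))` lands on Lane 1 of Core 1 in Socket 0 of Node 0, and work items 0 and 5 meet at the socket-level cache.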