To this point in our study of systems, we have relied on a simple model of a computer system as a CPU that executes instructions and a memory system that holds instructions and data for the CPU. In our simple model, the memory system is a linear array of bytes, and the CPU can access each memory location in a constant amount of time. While this is an effective model as far as it goes, it does not reflect the way that modern systems really work.

In practice, a memory system is a hierarchy of storage devices with different capacities, costs, and access times. CPU registers hold the most frequently used data. Small, fast cache memories near the CPU act as staging areas for a subset of the data and instructions stored in the relatively slow main memory. The main memory stages data stored on large, slow disks, which in turn often serve as staging areas for data stored on the disks or tapes of other machines connected by networks.

Memory hierarchies work because well-written programs tend to access the storage at any particular level more frequently than they access the storage at the next lower level. So the storage at the next level can be slower, and thus larger and cheaper per bit. The overall effect is a large pool of memory that costs as much as the cheap storage near the bottom of the hierarchy, but that serves data to programs at the rate of the fast storage near the top of the hierarchy.

As a programmer, you need to understand the memory hierarchy because it has a big impact on the performance of your applications. If the data your program needs are stored in a CPU register, then they can be accessed in zero cycles during the execution of the instruction. If stored in a cache, 1 to 30 cycles. If stored in main memory, 50 to 200 cycles. And if stored on disk, tens of millions of cycles!
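
The cost of going down the hierarchy is easy to observe. The following microbenchmark is a rough sketch (the sizes, names, and timing method are our own choices, not the book's): it sums a 32 KB array that fits comfortably in cache and a 128 MB array that cannot, performing the same total number of element accesses in each run. The exact ratio is machine dependent, but the cache-resident run is typically several times faster.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Sum the n elements of a, repeated reps times. */
static long sum_range(const long *a, size_t n, size_t reps)
{
    long sum = 0;
    for (size_t r = 0; r < reps; r++)
        for (size_t i = 0; i < n; i++)
            sum += a[i];
    return sum;
}

int main(void)
{
    size_t small_n = 1 << 12;  /* 4K longs = 32 KB: fits in cache        */
    size_t big_n   = 1 << 24;  /* 16M longs = 128 MB: overflows any cache */
    long *small = malloc(small_n * sizeof(long));
    long *big   = malloc(big_n * sizeof(long));
    if (small == NULL || big == NULL)
        return 1;

    /* Fill both arrays; this also pre-touches every page. */
    for (size_t i = 0; i < small_n; i++) small[i] = (long)i;
    for (size_t i = 0; i < big_n; i++)   big[i] = (long)i;

    /* Both runs perform big_n element accesses in total. */
    clock_t t0 = clock();
    long s1 = sum_range(small, small_n, big_n / small_n);
    clock_t t1 = clock();
    long s2 = sum_range(big, big_n, 1);
    clock_t t2 = clock();

    printf("cache-resident:  sum=%ld  %.1f ms\n", s1,
           1000.0 * (t1 - t0) / CLOCKS_PER_SEC);
    printf("memory-resident: sum=%ld  %.1f ms\n", s2,
           1000.0 * (t2 - t1) / CLOCKS_PER_SEC);
    free(small);
    free(big);
    return 0;
}

Compile with optimization (for example, gcc -O2); printing the sums keeps the compiler from optimizing the summation loops away.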

Here, then, is a fundamental and enduring idea in computer systems: if you understand how the system moves data up and down the memory hierarchy, then you can write your application programs so that their data items are stored higher in the hierarchy, where the CPU can access them more quickly.

This idea centers on a fundamental property of computer programs known as locality. Programs with good locality tend to access the same set of data items over and over again, or they tend to access sets of nearby data items. Programs with good locality tend to access more data items from the upper levels of the memory hierarchy than programs with poor locality, and thus run faster. For example, the running times of different matrix multiplication kernels that perform the same number of arithmetic operations, but have different degrees of locality, can vary by a factor of 20!
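
The factor-of-20 claim becomes plausible once you look at the memory access patterns of concrete kernels. Here is an illustrative sketch (not the chapter's exact code) of two loop orderings for the same n × n matrix multiplication; both perform the same n^3 multiply-add operations and differ only in the stride of their inner loops.

#define N 512
double A[N][N], B[N][N], C[N][N];

/* ijk ordering: the inner loop walks down a column of B
   (stride of N doubles), so spatial locality on B is poor. */
void mm_ijk(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += A[i][k] * B[k][j];
            C[i][j] = sum;
        }
}

/* kij ordering: the inner loop walks along rows of B and C
   (stride 1), so every access has good spatial locality.
   Assumes C starts as all zeros, which is true for a fresh
   global array; clear it before reusing. */
void mm_kij(void)
{
    for (int k = 0; k < N; k++)
        for (int i = 0; i < N; i++) {
            double r = A[i][k];
            for (int j = 0; j < N; j++)
                C[i][j] += r * B[k][j];
        }
}

On a typical machine, mm_kij substantially outruns mm_ijk once N is large enough that the matrices overflow the caches, even though the arithmetic is identical; differences of this kind are what produce the large spread in running times the text describes.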