
Memory Systems Refresher

Stephen M. Reaves


2024-01-13

Refresher about Memory Systems for CS-6210

Summary

Naive Memory Model

The CPU can talk to memory or to disk.

Memory is faster than disk.

Cache Motivation

for (i = 0; i < N; i++)
  sum += a[i];

sum is a local variable kept in a CPU register, so it is fast to access

a[i] may be in main memory, which is slower to access

We can put a faster cache closer to the CPU and store a[i] there

If the data is in the cache, we call it a "hit"; otherwise it's a "miss"

Data Costs

Suppose we are given the cost, in cycles, of retrieving data on a cache hit and on a miss

How many cycles does a process with a 99% hit rate spend on average?

90%?
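The average cost is the hit rate times the hit cost plus the miss rate times the miss cost. As a sketch, assuming a hit costs 1 cycle and a miss costs 100 cycles (hypothetical numbers standing in for the original cost table):

99% hit rate: 0.99 * 1 + 0.01 * 100 = 1.99 cycles on average

90% hit rate: 0.90 * 1 + 0.10 * 100 = 10.90 cycles on average

Even a small drop in hit rate makes the miss cost dominate.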

Memory Hierarchy

There’s always a trade-off between speed and capacity

Locality and Cache Blocks

Locality assumes that if a program needs one particular memory address, it will need the same or nearby addresses again soon

Temporal locality refers to the tendency to access the same memory address repeatedly within a short span of time

Spatial locality refers to the tendency to access memory addresses that are close together
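A minimal sketch of the two access patterns, using a hypothetical array a:

a = list(range(16))
total = 0

# Spatial locality: consecutive iterations touch adjacent
# addresses, so one cached block serves several accesses.
for i in range(len(a)):
  total += a[i]

# Temporal locality: the same location (a[0], and total itself)
# is reused on every iteration within a short window of time.
for i in range(len(a)):
  total += a[0]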

Direct Mapping

Direct memory to cache mapping

Suppose a 5-bit address space (2 tag bits, 2 index bits, 1 offset bit) and fill in the table

Cache contents:

Index  Tag  Value (offset 0)  Value (offset 1)
00     01   0xFC              0x31
01     10   0x12              0x82
10     10   0x57              0x48
11     00   0x63              0x99

Address  Hit/Miss  Value
10001    ?         ?
00111    ?         ?
10100    ?         ?
01111    ?         ?
answer

Address  Hit/Miss  Value  Reason
10001    Miss      -      Index 00 holds tag 01, not 10
00111    Hit       0x99   Index of 11, tag of 00, offset of 1
10100    Hit       0x57   Index of 10, tag of 10, offset of 0
01111    Miss      -      Index 11 holds tag 00, not 01
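A minimal sketch of this lookup in code, using the same split (2 tag bits, 2 index bits, 1 offset bit) and the cache contents above:

# cache[index] = (tag, [value at offset 0, value at offset 1])
cache = {
  0b00: (0b01, [0xFC, 0x31]),
  0b01: (0b10, [0x12, 0x82]),
  0b10: (0b10, [0x57, 0x48]),
  0b11: (0b00, [0x63, 0x99]),
}

def lookup(addr):
  offset = addr & 0b1          # lowest bit
  index = (addr >> 1) & 0b11   # middle two bits
  tag = addr >> 3              # top two bits
  stored_tag, block = cache[index]
  if stored_tag == tag:
    return "Hit", block[offset]
  return "Miss", None

print(lookup(0b00111))  # ('Hit', 153), i.e. 0x99
print(lookup(0b10001))  # ('Miss', None)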

How many bits

22-bit address space

512 byte cache

64 byte block size

How many bits are used for the index?

How many bits are used for the tag?
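answer

log_2(64) = 6 bits for the offset

512 / 64 = 8 cache blocks, so log_2(8) = 3 bits for the index

22 - 3 - 6 = 13 bits for the tag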

Set Associative Mapping

Associates each memory address with more than one possible block in the cache.

A 2-way set associative cache ignores the most significant bit of the index

This allows a cache block to be stored in either of two possible places

Set assoc. cache mapping
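A minimal sketch of a 2-way lookup under the same 5-bit split: dropping the index's most significant bit leaves a 1-bit set index, the tag grows to 3 bits, and each set holds two (tag, block) ways. The contents here are hypothetical:

# sets[set_index] = list of (tag, block) ways
sets = {
  0b0: [(0b010, [0xFC, 0x31]), (0b110, [0x12, 0x82])],
  0b1: [(0b001, [0x57, 0x48]), (0b100, [0x63, 0x99])],
}

def lookup_2way(addr):
  offset = addr & 0b1
  set_index = (addr >> 1) & 0b1  # index with its MSB dropped
  tag = addr >> 2                # one bit longer than before
  for stored_tag, block in sets[set_index]:
    if stored_tag == tag:
      return "Hit", block[offset]
  return "Miss", None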

Fully Associative Mapping

Any memory block can be stored in any cache block, so there is no index; everything above the offset becomes the tag.

Fully assoc. cache mapping

Write Policy

Hit:

Write-through: update the cache and main memory at the same time

Write-back: update only the cache, mark the block dirty, and write it back to memory when the block is evicted

Miss:

Write-allocate: load the block into the cache, then write to it

No-write-allocate: write directly to memory without caching the block

Virtual Memory Abstraction

The key benefit is to trick each process into thinking it has the entire memory space to itself

Address Translation

A layer of indirection keeps processes from overwriting each other's variables

It also allows for more virtual memory than physical memory, by using disk as a backing store

Paging

A page is the granularity at which we grant memory to processes

Consider a 16-bit virtual address space and a 512-byte page size. How many virtual page numbers are there?

log_2(512) = 9 bits for the offset

2^(16-9) = 2^7 = 128 virtual page numbers
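A minimal sketch of that split in code (the example address is hypothetical):

PAGE_SIZE = 512      # bytes
OFFSET_BITS = 9      # log_2(512)

def split(virtual_addr):
  vpn = virtual_addr >> OFFSET_BITS         # virtual page number
  offset = virtual_addr & (PAGE_SIZE - 1)   # byte within the page
  return vpn, offset

print(split(0x1234))  # (9, 52): page 9, offset 0x34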

Page Table Implementation

Page tables are layered in a hierarchy, with the top level called the page directory. This saves memory: second-level tables for unused regions of the address space never have to be allocated (see the sketch below).
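A minimal sketch of a two-level walk, assuming a 32-bit address split into 10 directory bits, 10 table bits, and a 12-bit offset (as on classic x86); plain lists stand in for the hardware-defined tables:

OFFSET_BITS = 12   # 4 KB pages
LEVEL_BITS = 10    # 1024 entries per level

def walk(page_directory, virtual_addr):
  pd_index = virtual_addr >> (OFFSET_BITS + LEVEL_BITS)
  pt_index = (virtual_addr >> OFFSET_BITS) & ((1 << LEVEL_BITS) - 1)
  offset = virtual_addr & ((1 << OFFSET_BITS) - 1)

  # Unused regions have no second-level table at all,
  # which is where the memory savings come from.
  page_table = page_directory[pd_index]
  if page_table is None or page_table[pt_index] is None:
    raise LookupError("page fault")

  frame = page_table[pt_index]  # physical frame number
  return (frame << OFFSET_BITS) | offset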

Accelerating Address Translation

The Translation Lookaside Buffer (TLB) is a special cache on the CPU that holds recently used page table translations

Page Fault

If a process tries to access a page that is not in physical memory, a page fault happens

Putting it all together

Pseudocode

def load(virtual_addr):
  try:
    physical_addr = translate_addr(virtual_addr)
  except PageFault:
    # Bring the page into memory, then retry the whole load.
    handle_page_fault(virtual_addr)
    return load(virtual_addr)

  if is_in_cache(physical_addr):
    return read_from_cache(physical_addr)
  else:
    return read_from_memory(physical_addr)


def translate_addr(virtual_addr):
  if is_in_tlb(virtual_addr):
    # Fast path: the translation is already cached in the TLB.
    return read_from_tlb(virtual_addr)
  elif is_access_violation(virtual_addr):
    raise AccessViolation
  elif is_present(virtual_addr):
    # Slow path: walk the page table in main memory.
    return read_from_page_table(virtual_addr)
  else:
    # The page is valid but not in physical memory.
    raise PageFault

Virtually Indexed, Physically Tagged Caches

Virtually indexed, physically tagged cache mapping

Suppose 32-bit virtual and physical address spaces. Pages are 4KB and cache blocks are 64B. What’s the maximum number of entries in a virtually indexed, physically tagged cache?

The index must come from address bits that do not change under translation, i.e., from within the page offset, so the cache can be indexed in parallel with the TLB lookup.

4 KB page size => 12-bit page offset

64 B block size => 6-bit block offset

12 - 6 = 6 bits left for the index, so 2^6 = 64 entries