CSC374 Oct04


Contents

  1. Today
  2. L1 and L2 Caches
  3. L1 and L2 Cache Memory Organization
  4. Cache Memory Parameters
  5. Cache Memory Lookup
  6. Example
  7. Memory Management Before Virtual Memory
  8. Memory Allocation/Deallocation
  9. Memory Management Data Structures
  10. Internal Fragmentation
  11. External Fragmentation
  12. Memory Allocation Algorithms
  13. Example
  14. Summary of Memory Management (without Virtual Memory)
  15. Virtual Memory (using Pages)
  16. Address Translation
  17. Example
  18. Page Table for the Example
  19. Notation
  20. Example: Address Translation
  21. Example: Address Formats
  22. VPN and VPO
  23. TLB Lookup
  24. VPN is the index into Page Table
  25. Physical Address
  26. Data at Physical Address
  27. L1 Cache Lookup for Physical Address
  28. Page Faults
  29. Page Faults
  30. Page Fault Handling
  31. Heap Management
  32. Heap Size
  33. malloc and free
  34. Heap Management
  35. Requirements

Today[1] [top]

L1 and L2 Caches[2] [top]

Level 1 and level 2 caches (L1 and L2) can provide a very fast CPU with a copy of the portion of main memory currently being accessed.

L1 and L2 Cache Memory Organization[3] [top]

Cache Memory Parameters[4] [top]

Lookup in a cache for contents of a given address depends on:

      Notation   Item
      S          Number of sets
      B          Cache line block size (in bytes)
      E          Number of lines per cache set

Together, S, E, and B determine the total data capacity of the cache, C:

        C = S * E * B  (Sets * Lines/Set * Block Data size of each Line )
      

Cache Memory Lookup[5] [top]

To check if the cache holds the contents of an address, the address is partitioned into three parts.

For example, for a direct mapped cache with 512 sets and a block size of 32 bytes and 32 bit addresses, the cache data capacity is 512 * 1 * 32 = 16384 = 16K bytes and

      Parameter   Size   Number of address bits
      S           512    9
      B           32     5
      Tag         -      18

The address is partitioned into

      Tag            Set           Block Offset
      bits 31 - 14   bits 13 - 5   bits 4 - 0

Example[6] [top]

Lookup a 4 byte integer at address 0x08048088. Assume the machine is little endian.

	 hex > 0x08048088
      binary > 0000 1000 0000 0100 1000 0000 1000 1000
      T:S:B  > 0000 1000 0000 0100 10:00 0000 100:0 1000
	  T  > 00 0010 0000 0001 0010 = 0x02012
	  S  > 0 0000 0100 = 0x004 = 4
	  B  > 0 1000 = 0x08
      

So look in set S = 4 and compare the tag in the (only) line of set 4 with 0x02012.

  1. Suppose the first 16 bytes of the data block of the line in set 4 of the cache is

    Tag: 0x2012 Block data: 01 02 00 00 FF FF FF FF EF BE AD DE 01 02 03 04

    So address 0x08048088 is a cache hit

    Since the block offset B = 8, the integer data starts at byte offset 8 in the block data.

    Integers are 4 bytes and the bytes are stored in reverse order on little endian machines.

    So the integer value is 0xDEADBEEF

  2. If the cache line had been:

    Tag: 0x2080 Block data: 01 02 00 00 FF FF FF FF EF BE AD DE 01 02 03 04

    The tag would not match and address 0x08048088 would be a cache miss.

Memory Management Before Virtual Memory[7] [top]

Memory Allocation/Deallocation[8] [top]

Without virtual memory, programs must be loaded into contiguous memory blocks large enough to hold the entire program.

For an operating system embedded in a single purpose device with no external storage, this may be ok.

In a multiprogramming operating system, many processes will be loaded into memory.

Memory allocation occurs when a process is created and deallocation occurs when the process terminates.

However, the order of deallocation is not predictable from the order of allocation!

Memory Management Data Structures[9] [top]

So in general, there must be a free list of the memory blocks not in use and available for allocation.

This list needs to keep track of the beginning address of each free memory block and the size of the block.

Memory allocation will select a free memory block (or part of one) which is big enough for the memory request and remove the block (or part of it) from the free list.

Process termination requires returning the memory to the free list.

Internal Fragmentation[10] [top]

Internal fragmentation is memory that is allocated to a process but not used (for data or code).

This may occur if memory is allocated in minimum units such as multiples of 4K bytes. A program that requires 148K + 1 bytes would be allocated 152K bytes. Of this, 4095 bytes (4K - 1 bytes) would be internal fragmentation.

Internal fragmentation represents some inefficiency in the use of memory.

External Fragmentation[11] [top]

External fragmentation is free memory that is broken into scattered blocks so small that none of them can hold any program, and so none can be allocated.

Memory Allocation Algorithms[12] [top]

Three typical memory allocation algorithms are:

        First Fit  Memory blocks are ordered. 
                   Choose the first block large enough.

        Best Fit   Choose the smallest free block which is large
                   enough.

        Worst Fit  Choose the largest block.
    

In all cases, if the chosen free block is not the exact size, split off the amount needed, leaving a smaller free block of the extra amount. This eliminates all internal fragmentation.

Unfortunately, these allocation strategies cannot avoid external fragmentation.

Example[13] [top]

A memory management system is working with a total memory size of 700K and uses contiguous allocation and variable sized partitions so that no internal fragmentation occurs. It currently has three free blocks available. Their sizes are 50K, 155K, and 100K, respectively, and their location in the memory is as below. The order of the free list is just the increasing order of the beginning addresses of the free blocks.

0      --- increasing addresses -->          700K
+-------+-----+------+------+------+------+------+
| used  | 50K | used | 155K | used | 100K | used |
+-------+-----+------+------+------+------+------+
      
      Request   First-Fit      Best-Fit
      (start)   50, 155, 100   50, 155, 100
      90        50, 65, 100    50, 155, 10
      100       50, 65         50, 55, 10
      60        50, 5          wait

At the third request (for 60K), both First Fit and Best Fit have a free list with a total size of 115K, more than enough. However, for Best Fit this 115K is broken (fragmented) into three pieces of size 50K, 55K, and 10K. None of these three blocks is big enough for the request, so the request must wait until some process terminates and releases more memory.

Summary of Memory Management (without Virtual Memory)[14] [top]

Memory must be allocated in contiguous memory blocks large enough to hold the entire program (or at least each entire segment - code, data, etc.)

Internal and external memory fragmentation can result. Each of these represents unused memory which limits the number and/or size of simultaneous processes.

Memory allocation algorithms that split off the exact memory request size from free blocks can eliminate internal fragmentation.

However, these algorithms (first fit, best fit, worst fit) all suffer from external fragmentation.

Virtual Memory (using Pages)[15] [top]

Virtual memory removes the requirement that segments must be in contiguous memory blocks.

A consequence is that external fragmentation is eliminated.

Main idea: Physical memory is thought of as being split into fixed-size pages. All pages have the same size. Typically this is on the order of 1K bytes.

If a request for 60K is made, then this would require 60 pages (of size 1K each). Any 60 free pages can be allocated. The 60 free pages do not have to form one contiguous block of 60K bytes. They can be scattered anywhere in physical memory.

Address Translation[16] [top]

The key to making this scheme possible is:

1. Compilers still generate code as if the program (segments) are to
   be loaded into contiguous blocks of storage.
2. This means the PC will still work as before - after fetching an
   instruction, the PC is incremented.
3. The executable program is also thought of being split into pages
   of the same size as used for physical memory (e.g., 1K byte
   pages). These are called virtual pages as opposed to the
   physical pages of memory.
4. A virtual page is loaded into any free physical page. (A
   data structure, the page table must record for each virtual
   page number, the physical page number where it was stored.)
5. The virtual address in the PC is not simply copied into the MAR,
   however. It is translated by the hardware memory management unit
   (MMU) in the CPU to the corresponding physical address where the
   instruction is actually located. The MMU must have the page
   table information in order to do this.
      

Example[17] [top]

Assume the page size is 100 bytes for this example. We divide the program into pages and memory is already divided into page frames. Suppose the free list of frames (i.e., physical pages) is (0,1,2,4,6,7,8). We can use the first five to hold our program:


Virtual   Program                     Physical Memory
page                         page frame  contents
        +--------+                      +--------+
 0      | page 0 |              0       | page 1 |
        +--------+                      +--------+
 1      | page 1 |              1       | page 0 |
        +--------+                      +--------+
 2      | page 2 |              2       | page 2 |
        +--------+                      +--------+
 3      | page 3 |              3       |        |
        +--------+                      +--------+
 4      | page 4 |              4       | page 3 |
        +--------+                      +--------+
                                5       |        |              
                                        +--------+
                                6       | page 4 |
                                        +--------+
                                7       |        |
                                        +--------+
                                8       |        |
                                        +--------+

      

Page Table for the Example[18] [top]

Virtual page 0 is in physical page (or page frame) 1
Virtual page 1 is in physical page (or page frame) 0
Virtual page 2 is in physical page (or page frame) 2
Virtual page 3 is in physical page (or page frame) 4
Virtual page 4 is in physical page (or page frame) 6

               Page Table
  Page    Frame No. Protection*
         +---------+---------+
    0    |    1    |    N    |
         +---------+---------+  
    1    |    0    |    R    |
         +---------+---------+
    2    |    2    |    R    |
         +---------+---------+
    3    |    4    |    W    |
         +---------+---------+
    4    |    6    |    W    |
         +---------+---------+

Protection: R = Read only  (e.g. code)
            W = Read or Write allowed for this page (e.g. data)
            N = Neither Read nor Write access (to catch address errors)

      

Notation[19] [top]

The following notation is used in connection with translation from virtual to physical addresses:

      Address type: virtual     Cache: TLB

      VPN    Virtual page number
      VPO    Virtual page offset (in bytes)
      TLBI   TLB index
      TLBT   TLB tag

After translating the virtual address to a physical address, another cache is checked to see if it contains the memory contents of the physical address. This notation is used:

      Address type: physical    Cache: L1

      PPN    Physical page number
      PPO    Physical page offset (PPO = VPO)
      CI     Cache index
      CO     Byte offset in cache block
      CT     Cache tag

Example: Address Translation[20] [top]

This example uses the following assumptions (see practice problem 10.4): virtual addresses are 14 bits, physical addresses are 12 bits, the page size is 64 bytes (so the VPO and PPO are each 6 bits), and the TLB is a 4-way set associative cache with 4 sets.

Problem: Translate virtual address 0x03d7 to a physical address

Example: Address Formats[21] [top]

Virtual Address Format

      VPN           VPO
      bits 13 - 6   bits 5 - 0

Physical Address Format

      PPN           PPO
      bits 11 - 6   bits 5 - 0

VPN and VPO[22] [top]

First write 0x03d7 in binary:

      0000 0011 1101 0111 (but this is 16 bits, so discard left 2 bits)
    

and partition the bits into the VPN and VPO parts:

         VPN      VPO
       00001111 010111
    

Now convert VPN and VPO back to hex

      VPN = 0000 1111 = 0x0F
      VPO =   01 0111 = 0x17
    

TLB Lookup[23] [top]

The VPN is the index into the page table for the current process.

However, the page table is in memory.

We do not want to have to access memory just to translate the virtual address!

So first see if the page table entry we need is in the TLB cache in the CPU.

      VPN = 00001111 = TLBT : TLBI = 000011 : 11
    

There are 4 sets: 0,1,2,3. The right 2 bits of the VPN form the TLB index, which is the same as the set number.

So the TLBI = 11 (binary) = 3 (in decimal).

The TLB tag must be compared with the 4 entries in set 3.

(Remember that the TLB is a 4-way-associative cache of page table entries.)

      TLBT = 000011 = 00 0011 = 0x03
    

The four tags for set 3 are

      Tag PPN Valid
      07   -    0
      03   0D   1
      0A   34   1
      02   -    0
    

So the physical page number, PPN= 0x0D.

This information is also in the page table in memory, but if we have a TLB hit, we avoid having to access memory for the PTE.

VPN is the index into Page Table [24] [top]

VPN = 0x0F, VPO = 0x17

Entry at index 0x0F is valid, so PPN = 0x0D

PPO always = VPO, so the physical address is PPN:PPO.

VPN PPN Valid
00 28 1
01 - 0
02 33 1
03 02 1
04 - 0
05 16 1
06 - 0
07 - 0
08 13 1
09 17 1
0A 09 1
0B - 0
0C - 0
0D 2D 1
0E 11 1
0F 0D 1

Physical Address[25] [top]

PPN = 0x0D, PPO = VPO = 0x17.

But we have to concatenate these bits to get the physical address = PPN:PPO, remembering that the PPN is 6 bits and the PPO is 6 bits:

      PPN = 0x0D = 0000 1101 (but discard left 2 bits) = 00 1101
      PPO = 0x17 = 0001 0111 (but discard left 2 bits) = 01 0111

      Physical address = PPN:PPO = 00 1101 01 0111 = 0011 0101 0111 = 0x357
    

Data at Physical Address[26] [top]

The physical address 0x357 was determined by the MMU hardware in the CPU from the virtual address, since there was a hit in the TLB for the page table entry.

Now a lookup in the L1 cache would check to see if the contents of the physical address are available (a hit in the L1 cache).

Summary: if a miss occurs in either cache (the TLB when translating the virtual address, or the L1 cache when fetching the data), memory must be accessed, and the corresponding cache is updated.

L1 Cache Lookup for Physical Address[27] [top]

Problem: Look in the L1 cache for the byte contents of the physical address just found: 0x357

Physical address: 0x357 = 0011 0101 0111

CT = 0011 01 = 0x0D
CI = 01 01 = 0x5
CO = 11 = 0x3

L1 cache:

Idx  Tag  Valid  Blk 0  Blk 1  Blk 2  Blk 3
 0   19     1     99     11     12     11
 1   15     0     -      -      -      -
 2   1B     1     00     02     04     08
 3   36     0     -      -      -      -
 4   32     1     43     6D     8F     09
 5   0D     1     36     72     F0     1D
 6   31     0     -      -      -      -
 7   16     1     11     C2     DF     03
 8   24     1     3A     00     51     89
 9   2D     0     -      -      -      -
 A   2D     1     93     15     DA     3B
 B   0B     0     -      -      -      -
 C   12     0     -      -      -      -
 D   16     1     04     96     34     15
 E   13     1     83     77     1B     D3
 F   14     0     -      -      -      -

The line at index CI = 5 is valid and its tag matches CT = 0x0D, so this is a cache hit. The byte at offset CO = 3 of that block is 0x1D.

Page Faults[28] [top]

A page fault occurs (1) when a page is referenced which is part of the process's code or data regions, but is not currently loaded in memory or (2) when a page is referenced which is not in the process's code or data regions.

If the VPN had been 0x07, the TLBI would still have been 3, but the TLBT (0x01) would not have matched any valid entry in set 3, so we would have had a TLB miss.

The page table entry for VPN 07 would have to be fetched from memory.

But that page table entry is

      VPN  PPN  Valid
      07    -     0
    

The page is NOT in memory. So the MMU hardware generates a page fault.

The operating system page fault handler then must determine if the page belongs to any of the process's code or data regions. For now assume it does.

The page fault handler then must fetch the page from disk! (~100,000 times slower)

Then the Page Table must be updated and the entry changed to valid.

The TLB cache is also updated.

Then what?

Page Faults[29] [top]

A page fault can happen in the fetch step of the CPU cycle, when the next instruction is to be loaded from a memory address but the page containing that part of the code is not in memory.

It can also occur when an instruction's operands are being fetched just prior to execution of the instruction.

Or it can happen when an instruction is being executed in the CPU cycle - for example, a store instruction that moves the value in a CPU register to a (virtual) memory location whose page is not currently in memory.

What happens next?

Page Fault Handling[30] [top]

The sequence of actions associated with a page fault are:

  1. The page fault is detected by the MMU hardware.

  2. The MMU hardware loads the page fault handler's PC/PSW from the interrupt vector (similar to interrupt handling)

  3. After the page fault handler has fetched the page (possibly replacing some page) and recorded the changes in the page table, it must adjust the interrupted user's PC. It must be set back to the beginning of the instruction since the instruction did not execute.

  4. Finally, the page fault handler can return control to the user program, which will again attempt to execute the instruction that caused the page fault.

We would clearly like to minimize the number of page faults and avoid the extra time overhead necessary to handle page faults.

Heap Management[31] [top]

Before virtual memory, physical memory had to be contiguous and memory management needed to minimize external fragmentation.

With virtual memory, physical memory for a segment no longer has to be contiguous.

So the problem of fragmentation of physical memory goes away!

New Problem: Each segment of virtual memory must be contiguous in the virtual address space of a process.

This is not a problem for most segments, but it is a problem for the heap.

Heap Size[32] [top]

The kernel keeps track of a break address that marks the end of the heap segment for each process (kept in a variable named brk).

There are system calls to increase the size of the heap segment by increasing the break address.


    #include <unistd.h>

    int brk(void *end_data_segment); // sets the "brk" value 

    void *sbrk(intptr_t increment);  // adds increment to "brk" value

malloc and free[33] [top]

The memory map function (mmap) is one way to dynamically create a new chunk of memory (as a program is running) as a new virtual memory segment.

More commonly, applications have used (and continue to use) the functions malloc and free (or new and delete) to dynamically allocate memory from the heap segment.

#include <stdlib.h>

void *calloc(size_t nmemb, size_t sz);
void *malloc(size_t sz);
void free(void *ptr);
void *realloc(void *ptr, size_t sz);

The memory allocation functions (calloc, malloc, and realloc) all simply specify a size (in bytes) or in the case of calloc, an array of nmemb elements, each of size sz.

Heap Management[34] [top]

The implementation of calloc is not significantly different from that of malloc.

The realloc function does require a few additional details beyond malloc.

The main issues in heap management arise with the two functions:

  1. malloc
  2. free

Requirements[35] [top]

The allocator should be fast (high throughput of malloc and free operations) and should use heap memory efficiently (high utilization, low fragmentation). These two requirements are in conflict!