For 32-bit Linux, the virtual memory layout of a process:

    Kernel virtual memory address range:  0xc0000000 - 0xffffffff
    Process virtual memory address range: 0x00000000 - 0xbfffffff

                   +--------------------------------------------+
    0xffffffff     | Process-specific data structures           |
                   |   (different for each process)             |
                   +--------------------------------------------+
                   | Physical memory                            |
                   |   (identical for each process)             |
                   +--------------------------------------------+
                   | Kernel code and data                       |
                   |   (identical for each process)             |
    0xc0000000 ____+--------------------------------------------+
                   | User stack                                 |
                   +--------------------------------------------+
                   | (Unmapped)                                 |
                   +--------------------------------------------+
                   | Memory-mapped region for shared libraries  |
                   +--------------------------------------------+
                   | (Unmapped)                                 |
    brk        ____+--------------------------------------------+
                   | Run-time heap (via malloc)                 |
                   +--------------------------------------------+
                   | Uninitialized data (.bss)                  |
                   +--------------------------------------------+
                   | Initialized data (.data)                   |
                   +--------------------------------------------+
                   | Program text (.text)                       |
    0x08048000 ____+--------------------------------------------+
                   | (Unmapped)                                 |
    0x00000000 ____+--------------------------------------------+
How can multiple processes ALL exist with their program text, stack, etc.
at the same virtual addresses?

Answer: Virtual addresses are NOT the same as physical addresses.

Each of the logical units (program text, stack, heap, etc.) is grouped
into address ranges (called virtual pages) of a fixed size (such as
4K bytes).
Physical memory is also grouped into the same fixed size ranges
(called physical pages).
Then when a virtual page is loaded into physical memory, it can
be loaded into any physical page.
Note that the physical address where a virtual page is loaded
is typically NOT the same as the virtual address.
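
As an illustration, here is a minimal C sketch (assuming 4K-byte pages; the
variable names are just for this example) of how a 32-bit virtual address
splits into a virtual page number and a page offset:

#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE   4096u          /* assumed 4K-byte pages */
#define OFFSET_BITS 12             /* log2(4096)            */

int main(void) {
    uint32_t vaddr = 0x08048088;                 /* example virtual address */
    uint32_t vpn   = vaddr >> OFFSET_BITS;       /* virtual page number     */
    uint32_t off   = vaddr & (PAGE_SIZE - 1);    /* offset within the page  */
    printf("vaddr  = 0x%08x\n", vaddr);
    printf("vpn    = 0x%05x\n", vpn);            /* 0x08048 */
    printf("offset = 0x%03x\n", off);            /* 0x088   */
    return 0;
}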
    Virtual pages, process 1        Virtual pages, process 2
    ------------------------        ------------------------
    vir addr: 0x08048000            vir addr: 0x08048000
    vir addr: 0x08049000            vir addr: 0x08049000
    vir addr: 0x0804a000            vir addr: 0x0804a000
    ...                             ...

    Physical address    Physical page contents
    ----------------    ----------------------
    0x06000000 ___      proc 1: 0x08048000
    0x06001000 ___      proc 1: 0x08049000
    0x06002000 ___      proc 2: 0x08049000
    0x06003000 ___      free
    0x06004000 ___      proc 2: 0x08048000
    0x06005000 ___      free
    0x06006000 ___      proc 1: 0x0804a000
    0x06007000 ___      proc 2: 0x0804a000
    ...        ___      ...
The operating system's virtual memory management routines are
responsible for loading virtual pages into physical pages.
For each virtual page, it must be possible to find which
physical page contains that virtual page.
Associated with each process is a page table, in which the virtual memory
manager records this information.
For the previous example, part of the page tables for process 1
and process 2:
    Process 1                           Process 2
    Virtual page     Physical page      Virtual page     Physical page
    ------------     -------------      ------------     -------------
    0x08048000       0x06000000         0x08048000       0x06004000
    0x08049000       0x06001000         0x08049000       0x06002000
    0x0804a000       0x06006000         0x0804a000       0x06007000
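
A minimal sketch of the idea, using the process 1 entries above. A real
page table is a multi-level kernel data structure; the flat array and the
names used here are only for illustration:

#include <stdio.h>
#include <stdint.h>

struct pte { uint32_t vpn; uint32_t ppn; };   /* hypothetical flat entry */

/* the process 1 mappings from the table above (4K-byte pages) */
static const struct pte proc1_table[] = {
    { 0x08048, 0x06000 },   /* 0x08048000 -> 0x06000000 */
    { 0x08049, 0x06001 },   /* 0x08049000 -> 0x06001000 */
    { 0x0804a, 0x06006 },   /* 0x0804a000 -> 0x06006000 */
};

/* translate a virtual address; return 1 on success, 0 if unmapped */
static int translate(uint32_t vaddr, uint32_t *paddr) {
    uint32_t vpn = vaddr >> 12, off = vaddr & 0xfff;
    for (size_t i = 0; i < sizeof proc1_table / sizeof proc1_table[0]; i++)
        if (proc1_table[i].vpn == vpn) {
            *paddr = (proc1_table[i].ppn << 12) | off;
            return 1;
        }
    return 0;   /* a real system would take a page fault here */
}

int main(void) {
    uint32_t pa;
    if (translate(0x0804a123, &pa))
        printf("0x0804a123 -> 0x%08x\n", pa);   /* prints 0x06006123 */
    return 0;
}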
Using virtual addresses in this way involves translating a virtual address
to a physical address on every processor instruction fetch and data
operand fetch.
In order for this to be viable, the translation can't be done solely in
software, which would require additional processor instructions that would
themselves also require translation.
Processors have a memory management unit (MMU) that translates virtual
addresses to physical addresses.
So the processor effectively executes machine programs that are compiled
into instructions and data at virtual addresses.
The MMU on the processor chip has to translate virtual addresses for every
process and needs the information in each process's page table to do so.
There isn't enough room on the chip to store the entire page table of
every process.
This is a typical problem that calls for using a cache.
The cache for the MMU will contain a portion of the page table
for the currently executing process.
This cache is traditionally called the TLB - translation
lookaside buffer.
+----------------------------------------+
| prog counter |
| (virtual address) |
| | |
processor | MMU ---------->MAR (physical address) |
| |\ |\ |
| | \ | \ |
| TLB \ L1 \ |
| \ \ |
------------+--------\-----------+-\-----------------+
| \ | \ |
physical memory | | | | |
| page tables | instr/data |
| | | |
+-- kernel memory ---+-- process pages --+
Level 1 and level 2 caches (L1 and L2) can provide a very fast cpu with a
copy of the portion of main memory currently being accessed.

- The L1 cache is more expensive per byte than main memory and takes up
  valuable space on the cpu chip.
  Accessing the L1 cache is typically almost as fast as accessing
  registers.

- The L2 cache is off the cpu chip.
  It may take a few cpu clock cycles to access, but the cpu may be
  connected to it by a special bus, and accessing it is on the order of
  100 times faster than accessing main memory.

- Cache memories are organized into a fixed number of sets.

- Each set has a fixed number of blocks of size B bytes.
  Each block can hold a copy of B bytes from main memory.
  Each block also has storage for an associated tag and a valid bit.

- A cache line consists of the valid bit, the tag, and the data block.

- The number of cache lines per cache set determines the associativity of
  the cache.
  A direct mapped cache has 1 line per set.
  A 2-way associative cache has 2 lines per set.
  A fully associative cache has only 1 set and as many lines as the total
  cache memory can hold.
  (A sketch of this organization in C follows the list.)
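
Here is the sketch referred to above: C struct definitions matching this
organization, with illustrative parameters (a 2-way associative cache with
512 sets and 32-byte blocks; nothing here is tied to a particular machine):

#include <stdio.h>
#include <stdint.h>

#define B 32    /* block size in bytes (illustrative)   */
#define E 2     /* lines per set: 2-way associative     */
#define S 512   /* number of sets                       */

struct cache_line {
    uint8_t  valid;     /* does this line hold valid data?   */
    uint32_t tag;       /* identifies which block of memory  */
    uint8_t  data[B];   /* copy of B bytes from main memory  */
};

struct cache_set { struct cache_line lines[E]; };
struct cache     { struct cache_set sets[S]; };

int main(void) {
    /* total data capacity C = S * E * B bytes */
    printf("C = %d bytes\n", S * E * B);   /* 32768 for these parameters */
    return 0;
}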
Lookup in a cache for the contents of a given address depends on:

    Notation    Item
    --------    -----------------------------
    S           Number of sets
    B           Cache line block size (bytes)
    E           Number of lines per cache set

Together these parameters determine the total data capacity of the
cache, C:

    C = S * E * B   (Sets * Lines/Set * Block data size of each line)
To check if the cache holds the contents of an address, the address is
partitioned into three parts.
For example, for a direct mapped cache with 512 sets, a block size of 32
bytes, and 32-bit addresses, the cache data capacity is
512 * 1 * 32 = 16384 = 16K bytes, and:
    Parameter    Size    Number of address bits
    ---------    ----    ----------------------
    S            512     9
    B            32      5
    Tag          -       18
The address is partitioned into:

    Tag             Set            Block Offset
    bits 31 - 14    bits 13 - 5    bits 4 - 0
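
As a quick check of these numbers, a small C sketch (the log2u helper is
just for this illustration) re-derives the field widths and capacity from
S, E, and B:

#include <stdio.h>

static int log2u(unsigned x) {   /* x is assumed to be a power of two */
    int n = 0;
    while (x > 1) { x >>= 1; n++; }
    return n;
}

int main(void) {
    unsigned S = 512, E = 1, B = 32;   /* sets, lines/set, block bytes */
    unsigned C = S * E * B;            /* total data capacity          */
    int s = log2u(S);                  /* set index bits               */
    int b = log2u(B);                  /* block offset bits            */
    int t = 32 - s - b;                /* tag bits for a 32-bit address */
    printf("C = %u bytes, set bits = %d, offset bits = %d, tag bits = %d\n",
           C, s, b, t);
    /* prints: C = 16384 bytes, set bits = 9, offset bits = 5, tag bits = 18 */
    return 0;
}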
Problem: Look up a 4-byte integer at address 0x08048088. Assume the
machine is little endian.
hex > 0x08048088
binary > 0000 1000 0000 0100 1000 0000 1000 1000
T:S:B > 0000 1000 0000 0100 10:00 0000 100:0 1000
T > 00 0010 0000 0001 0010 = 0x02012
S > 0 0000 0100 = 0x004 = 4
B > 0 1000 = 0x08
So look in set S = 4 and compare the tag in the (only) line
of set 4 with 0x02012.
Suppose the first 16 bytes of the data block of the line in set 4 of the
cache are:

    Tag: 0x02012
    Block data: 01 02 00 00 FF FF FF FF EF BE AD DE 01 02 03 04

So address 0x08048088 is a cache hit.
Since the block offset B = 8, the integer data starts at byte offset 8 in
the block data.
Integers are 4 bytes and the bytes are stored in reverse order on little
endian machines.
So the integer value is 0xDEADBEEF.
- If the cache line had instead been:

    Tag: 0x02080
    Block data: 01 02 00 00 FF FF FF FF EF BE AD DE 01 02 03 04

  the tag would not match, and address 0x08048088 would be a cache miss.
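
A minimal C sketch of this lookup, with the set 4 line from the hit case
hard-coded; the names and structure are only illustrative:

#include <stdio.h>
#include <stdint.h>

#define SET_BITS    9   /* 512 sets        */
#define OFFSET_BITS 5   /* 32-byte blocks  */

int main(void) {
    uint32_t addr = 0x08048088;
    uint32_t off  = addr & ((1u << OFFSET_BITS) - 1);               /* 0x08    */
    uint32_t set  = (addr >> OFFSET_BITS) & ((1u << SET_BITS) - 1); /* 4       */
    uint32_t tag  = addr >> (OFFSET_BITS + SET_BITS);               /* 0x02012 */

    /* the (only) line stored in set 4 of the example cache */
    uint32_t line_tag  = 0x02012;
    uint8_t  block[16] = { 0x01,0x02,0x00,0x00, 0xFF,0xFF,0xFF,0xFF,
                           0xEF,0xBE,0xAD,0xDE, 0x01,0x02,0x03,0x04 };

    if (tag == line_tag) {
        /* assemble the little-endian 4-byte integer at the block offset */
        uint32_t val = block[off] | block[off+1] << 8
                     | block[off+2] << 16 | (uint32_t)block[off+3] << 24;
        printf("hit in set %u, value = 0x%08X\n", set, val);   /* 0xDEADBEEF */
    } else {
        printf("miss in set %u\n", set);
    }
    return 0;
}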
The key to making this scheme possible is:
1. Compilers still generate code as if the program (segments) are to
be loaded into contiguous blocks of storage.
2. This means the PC will still work as before - after fetching an
instruction, the PC is incremented.
3. The executable program is also thought of as being split into pages
   of the same size as used for physical memory (e.g., 1K-byte
   pages). These are called virtual pages, as opposed to the
   physical pages of memory.
4. A virtual page is loaded into any free physical page. (A
   data structure, the page table, must record, for each virtual
   page number, the physical page number where it was stored.)
5. The virtual address in the PC is not simply copied into the MAR,
   however. It is translated by the hardware memory management unit
   (MMU) in the cpu to the corresponding physical address where the
   code instruction is actually located. The MMU must have the page
   table information in order to do this.
The following notation is used in connection with translation from virtual
to physical addresses:

    Address type: virtual        Cache: TLB

    VPN     Virtual page number
    VPO     Virtual page offset (in bytes)
    TLBI    TLB index
    TLBT    TLB tag
After translating the virtual address to a physical address, another cache
is checked to see if it contains the memory contents of the physical
address. This notation is used:

    Address type: physical       Cache: L1 (physical addresses)

    PPN     Physical page number
    PPO     Physical page offset (PPO = VPO)
    CI      Cache index
    CO      Byte offset in cache block
    CT      Cache tag
This example uses the following assumptions (see practice problem 10.4):

- Virtual address size: 14 bits
- Physical address size: 12 bits
- Page size: 64 bytes
- 4-way associative TLB with 4 sets (each line contains one page table
  entry - PTE)

Problem: Translate virtual address 0x03d7 to a physical address.
Virtual address format (14 bits):

    bits 13 - 6: VPN        bits 5 - 0: VPO

Physical address format (12 bits):

    bits 11 - 6: PPN        bits 5 - 0: PPO
First write 0x03d7 in binary:
0000 0011 1101 0111 (but this is 16 bits, so discard left 2 bits)
and partition the bits into the VPN and VPO parts:
VPN VPO
00001111 010111
Now convert VPN and VPO back to hex
VPN = 0000 1111 = 0x0F
VPO = 01 0111 = 0x17
The VPN is the index into the page table for the current
process.
However, the page table is in memory.
We do not want to have to access memory just to translate the
virtual address!
So first see if the page table entry we need is in the TLB
cache in the CPU.
VPN = 00001111 = TLBT : TLBI = 000011 : 11
There are 4 sets: 0,1,2,3. The right 2 bits of the VPN form the
TLB index, which is the same as the set number.
So the TLBI = 11 (binary) = 3 (in decimal).
The TLB tag must be compared with the 4 entries in set 3.
(Remember that the TLB is a 4-way-associative cache of page
table entries.)
TLBT = 000011 = 00 0011 = 0x03
The four lines in set 3 are:

    Tag    PPN    Valid
    07     -      0
    03     0D     1
    0A     34     1
    02     -      0
So the physical page number, PPN= 0x0D.
This information is also in the page table in memory, but if we
have a TLB hit, we avoid having to access memory for the PTE.
Using the page table in memory (shown below) gives the same result: for
VPN = 0x0F the entry is valid, so PPN = 0x0D.
PPO always equals VPO = 0x17, so the physical address is PPN:PPO.
    VPN    PPN    Valid
    ---    ---    -----
    00     28     1
    01     -      0
    02     33     1
    03     02     1
    04     -      0
    05     16     1
    06     -      0
    07     -      0
    08     13     1
    09     17     1
    0A     09     1
    0B     -      0
    0C     -      0
    0D     2D     1
    0E     11     1
    0F     0D     1
PPN = 0x0D, PPO = VPO = 0x17.
But we have to concatenate these bits to get the physical
address = PPN:PPO and remember that PPN is 6 bits and PPO is 6 bits
PPN = 0x0D = 0000 1101 (but discard left 2 bits) = 00 1101
PPO = 0x17 = 0001 0111 (but discard left 2 bits) = 01 0111
Physical address = PPN:PPO = 00 1101 01 0111 = 0011 0101 0111 = 0x357
The physical address 0x357 was determined by the MMU hardware
in the cpu from the virtual address since there was a hit
in the TLB for the page table entry.
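
A minimal C sketch of the whole translation just worked out by hand, with
the TLB set 3 contents hard-coded from the table above; the structure and
names are only illustrative:

#include <stdio.h>
#include <stdint.h>

#define VPO_BITS 6    /* 64-byte pages           */
#define TLB_SETS 4    /* so 2 TLB index bits     */

struct tlbe { uint8_t tag, ppn, valid; };

/* the four lines of TLB set 3 from the example */
static const struct tlbe set3[4] = {
    {0x07, 0x00, 0}, {0x03, 0x0D, 1}, {0x0A, 0x34, 1}, {0x02, 0x00, 0},
};

int main(void) {
    uint16_t va   = 0x03d7;
    uint16_t vpo  = va & ((1u << VPO_BITS) - 1);   /* 0x17 */
    uint16_t vpn  = va >> VPO_BITS;                /* 0x0F */
    uint16_t tlbi = vpn & (TLB_SETS - 1);          /* 3    */
    uint16_t tlbt = vpn >> 2;                      /* 0x03 (2 index bits) */

    for (int i = 0; i < 4; i++)
        if (set3[i].valid && set3[i].tag == tlbt) {
            uint16_t pa = (set3[i].ppn << VPO_BITS) | vpo;
            printf("TLB hit in set %u: PPN=0x%02X, PA=0x%03X\n",
                   tlbi, set3[i].ppn, pa);         /* PPN=0x0D, PA=0x357 */
            return 0;
        }
    printf("TLB miss: consult the page table in memory\n");
    return 0;
}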
Now a lookup in the L1 cache would check to see if the contents
of the physical address are available (a hit in the L1 cache).
Summary:

- The TLB cache in the CPU is used to try to translate the virtual address
  without having to access the page table in memory.

- After the virtual address has been translated, the L1 cache is used to
  try to access the data at the physical address without having to access
  the actual physical location in main memory.

If a cache miss occurs in either case, memory must be accessed. (In that
case the corresponding cache is updated.)
Problem: Look in the L1 cache for the byte contents of the
physical address just found: 0x357
Physical address: 0x357 = 0011 0101 0111
CT = 0011 01 = 0x0D
CI = 01 01 = 0x5
CO = 11 = 0x3
L1 cache:

    Idx   Tag   Valid   Blk 0   Blk 1   Blk 2   Blk 3
    ---   ---   -----   -----   -----   -----   -----
    0     19    1       99      11      12      11
    1     15    0       -       -       -       -
    2     1B    1       00      02      04      08
    3     36    0       -       -       -       -
    4     32    1       43      6D      8F      09
    5     0D    1       36      72      F0      1D
    6     31    0       -       -       -       -
    7     16    1       11      C2      DF      03
    8     24    1       3A      00      51      89
    9     2D    0       -       -       -       -
    A     2D    1       93      15      DA      3B
    B     0B    0       -       -       -       -
    C     12    0       -       -       -       -
    D     16    1       04      96      34      15
    E     13    1       83      77      1B      D3
    F     14    0       -       -       -       -
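
A minimal C sketch that finishes the lookup, with only the index 5 line of
the cache reproduced. For this table the stored tag 0x0D matches CT and the
valid bit is set, so the access is an L1 hit and the byte at offset CO = 3
is 0x1D:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint16_t pa = 0x357;            /* physical address from the TLB step  */
    uint16_t co = pa & 0x3;         /* 2 offset bits (4-byte blocks): 0x3  */
    uint16_t ci = (pa >> 2) & 0xF;  /* 4 index bits (16 sets):        0x5  */
    uint16_t ct = pa >> 6;          /* remaining 6 tag bits:          0x0D */

    /* the line at index 5 of the example cache */
    uint8_t tag = 0x0D, valid = 1;
    uint8_t block[4] = { 0x36, 0x72, 0xF0, 0x1D };

    if (valid && tag == ct)
        printf("L1 hit at index 0x%X: byte = 0x%02X\n", ci, block[co]); /* 0x1D */
    else
        printf("L1 miss: read the block from main memory\n");
    return 0;
}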