A process is represented by a data structure (i.e., a struct or class)- the process control block (PCB).

A thread in a process is also represented by another data structure (i.e., a struct or class) - thread control block (TCB).

A process will have have one or more threads. The PCB may have a pointer to a list of thread control blocks.

The context of a thread is represented by its

stack
register state
process memory
open files
...

The TCB will need to provide access to all components of the thread context.

Thread Switching [4] [top]

In implementing threads, it will be necessary to transfer control from one thread to another.

CurrentThread is global pointer to thread control block for executing thread and executes switch below to give control to another thread pointed to by next_thread.


 void switch(thread_t *next_thread)
 {
    CurrentThread->SP = SP;
    CurrentThread = next_thread;
    SP = CurrentThread->SP;
    return;
 }

Coroutines [5] [top]

It may be desirable for an executing thread to give up the processor to allow another thread to execute.

It might do this by calling a function: thread_yield(). Another thread executes and may itself call thread_yield() causing the first thread to execute again returning from its call to thread_yield().

The threads are not calling each other, as they are mostly independently executing. But this transfer of control is known as coroutine execution.

Which other thread executes when thread_yield() is called?

How is control (i.e. the processor) transferred to the other thread?

System Calls [6] [top]

A simple model of system call implementation on the IA32 architecture:

The operating system has a table consisting of exception handlers (and interrupt handlers)

Each entry consists of a
(1) handler function address and
(2) a set of flags (as an int)

Each system call has a corresponding system call number. For example suppose getpid() has system call number 32.

The library routine for getpid() that is linked into a user program has code like this:


 move $32,%eax   ; assume 32 is the number for getpid() - no parameters
 int  $0x80      ; invoke trap instruction

The trap instruction saves the call user's context (registers including the flags register, %esp, %eip...) AND loads the %eip and flags register from the entry in the exception table corresponding to 0x80 AND switches to the kernel stack for this user.

Note that the processor will be in kernel mode because of the new flags register value and will be executing kernel code because of the %eip register value.

The handler for entry 0x80 starts executing in kernel mode.

This handler is for system calls. It just checks %eax and then indexes into another table (using the value 32 in %eax) to determine what kernel function is being called - do_getpid() in this case and calls this function indirectly using the table entry.

The kernel is then executing do_getpid() in the context of the current thread (and process). A global pointer references the currently executing current thread. So it can look up the information (pid in this case) and arrange to return that value.

Returning is a bit different than returning from an oridnary call as the registers need to be reloaded with the user's saved registers to restore the user to the user stack and frame. But also the user's flag register value must be reloaded to return to user mode.

Interrupts [7] [top]

Interrupts use the same exception handler table.

The same mode switch occurs when an interrupt occurs. However, instead of being caused by the trap instrucution, the interrupt is signal is sent by an external device to the processor (INTR line into the processor). The hardware processor then saves the currently executing thread's context in its TCB, acknowledges the interrupt and receives an integer value indicating which handler to use. This integer is then used in the same way as the 0x80 argument to lookup the handler and begin its execution.

I/O and Device Drivers [8] [top]

The simple model describe in the text:

PIO devices (programmed I/O)
DMA devices (direct memory access)

PIO [9] [top]

programmed I/O

Write a byte to the device (e.g. a terminal)

Store the byte in the write register
Set GoW in the control register
Check the RdyW bit in status register until it is set

Store the byte in the write register
Set GoW and IEW bits in the control register
Be notified when the write completes (by device interrupt)

DMA [10] [top]

direct memory access I/O

Write contents of a buffer to the disk:

Set disk address in the device address register
Set the buffer address in the memory address register
Set the op code (write) and the GO and IE bits in the control register

Linking and Loading [11] [top]

An executable program will may be built from several source files and will also include code from library routines.

The C compiler does not know the final address of functions defined in these other files.

Similarly if a global pointer is initialized with the address of another global variable, the compiler will not know the final value of this address to use to initialize the pointer.

The object files created by the compiler must contain information about which items are missing. The linker/loader then uses this information to paste the object files together, fix the missing information, and produce the single executable file.

This describes static linking. The library routines are included by the linker/loader and are a part of the final executable file.

Share Libraries [12] [top]

Shared libraries (.so files in Linux and .dll files in Windows) are loaded dynamically at the beginning or perhaps later in execution. So they are not included directly in the executable file.

Advantages

Take up less disk space.

Take up less memory space during execution.

Less memory space is used if only one copy of the shared library code is loaded into memory when multiple processes are using the library code. The code (but not data) can be shared, right?.

But what about relocation information. Won't it be different for different processes?

Does the shared library code have to be loaded into the same address in each process's address space? (Have the linker/loader fix all missing addresses assuming the library will always be loaded at the same fixed address in each processes address space.)
Make the relocation unnecessary so that it can be loaded anywhere in memory without requiring any modification.
This is a bit tricky. Code written this way is called position-independent-code (PIC).

The technique for producing PIC is to use indirection. The PIC code refers indirectly to addresses of routines, etc. to use in the shared library through a table. Each process can now have different table values with the correct addresses of where the 1 copy of the shared library is mapped into that process's address space.

Simple Operating System Design [13] [top]

Devices, Terminals, Network Communication

(Read section 4.1.1 - 4.1.2)

Processes and Threads [14] [top]

A thread may be just created, running, blocked, terminated but not joined, etc.

Each of these different situations is referred to as the thread state. The thread control block records the state and uses it to know what transitions between states are allowed or appropriate.

Abstractly we might describe the possible thread states as in chapter 4:

Runnable
Running
Waiting
Terminated

Thread control is responsible for implementing the transitions between the states and updating the states as threads proceed from creation to destruction.

Thread Implementations [15] [top]

Creating and implementing processes requires system calls; e.g., fork() in Linux. Why?

Similarly, creating and managing threads would seem to require system calls.

This would be the one-level model for implementing threads.

All data structures (Thread Control Blocks) would be in the kernel image and functions to manage threads would all be kernel code and would require system calls by user programs.

All these operations would occur in the kernel:

creation
termination
synchronization
scheduling

Disadvantage of the One-Level Implementation [16] [top]

Overhead of pthread_mutex_lock(&m) if m is not locked. Should be very fast and return immediately.

Windows has mutex objects but it also has additional methods that are also functionally equivalent to pthread's mutexex:

EnterCriticalSection
LeaveCriticalSection

Windows essentially uses the one-level model, BUT the EnterCriticalSection and LeaveCriticalSection methods are implemented (at least partially) in user level library.

Claim: Using EnterCriticalSection/LeaveCriticalSection is 20 times faster than using Windows mutexes when no waiting is necessary.

Note: If the critical section is locked, EnterCriticalSection must make a system call to block the calling thread.

Two-Level Model - Single Kernel Thread [17] [top]

User level library code provides a substantial part of the thread implementation.

However, the operating system views a multithreaded user program (using this user library) as being a single threaded process and schedules the process.

The user library must provide its own scheduler to schedule the multiple threads within this process.

Advantage: Since the thread library schedules and synchronizes the multiple threads within the process, the overhead of system calls is avoided.

Disadvantage: If some thread in the process makes a system call (not one of the thread library calls) and blocks, then none of other threads will execute even if they are "runnable".

This model is referred to as having a single "kernel" thread. Presumably this just means that the operating system views the process as being a single threaded process and the kernel only keeps track of scheduling the process.

Multiple Kernel Threads [18] [top]

What does this mean?

In this model we still have a user level thread library but the operating system now views a user process as potentially multithreaded. So the operating system may now schedules multiple threads that area all part of one process. However, the correspondence is not necessarily one-to-one. For example, more library user level threads may be created than the kernel schedules. E.g., a user program uses the user level library to create 6 threads. The user level library makes system calls and asks the kernel for 3 threads (scheduled by the kernel).

The user library then does its own scheduling of the 6 user level threads using the 3 threads known to the kernel. If one of the 6 threads makes a system call (not a thread library call) and blocks, then kernel changes the state of the thread to blocked. But that leaves 2 threads the kernel may still schedule. So the other 5 threads in the user process may still have a chance to execute.

Solaris used a version of this model (now changed).

Uthreads Library Assignment [19] [top]

This library manages threads in the process that uses the uthread functions, including scheduling and synchronization. However, only one thread in the process executes at a time and scheduling is non-preemptive!

So this is the 2-level model but with a single "kernel" thread.

A running user uthread can yield. This causes the uthread scheduler to select another thread to execute. Scheduling is based on uthread priority. uthread condition variables are to be implemented. Unlike pthreads, wait on a uthread condition variable does not return spontaneously. This can make some code simpler (such as writing code for a barrier).

Some uthread data structures [20] [top]

These are mostly defined in file uthread.h.

The uthread_t type (aka a thread control block)


typedef struct uthread {
    list_link_t		ut_link;	/* link on waitqueue / scheduler */

    uthread_ctx_t	ut_ctx;		/* context */
    char	       *ut_stack;	/* user stack */

    uthread_id_t	ut_id;		/* thread's id */
    uthread_state_t	ut_state;	/* thread state */
    int			ut_prio;	/* thread's priority */
    int			ut_errno;	/* thread's errno */
    int			ut_has_exited;	/* thread exited? */
    int			ut_exit;	/* thread's exit value */

    int			ut_detached;	/* thread is detached? */
    struct uthread	*ut_waiter;	/* thread waiting to join with me */
} uthread_t;

list_link_t is just 2 pointers: prev and next. Used to insert the uthread_t on scheduling lists, wait lists.

Thread State [21] [top]

uthread_state_t is a enum type that holds the current state of each thread.

typedef enum
{
	UT_NO_STATE,		/* invalid thread state */
	UT_ON_CPU,		/* thread is running */
	UT_RUNNABLE,		/* thread is runnable */
	UT_WAIT,		/* thread is blocked */
	UT_ZOMBIE,		/* zombie threads eat your brains! */
	UT_NUM_THREAD_STATES
} uthread_state_t;

Current Thread [22] [top]

The thread library maintains a global pointer to the currently executing thread in the process.

uthread_t *ut_curthr;

The uthread_init() function must be called (once) by a program that uses the uthread library to initialize the uthread global data structures and to convert the initial thread of the process to a uthread.

This function should initialize the global uthreads array. Each entry should be initialized to indicate not yet in use:

ut_state should be set to UT_NO_STATE
ut_id should be initialized; e.g, uthreads[k].ut_id can be set to k.
ut_stack set to NULL
ut_prio is used as the index into an array of queues. To initialize this as unused, it could be set to -1.
ut_errno can be set to 0 (no error)
ut_waiter pointer to thread that has called uthread_join for this thread. Set to NULL.

Provided part of uthread_init [23] [top]


        /* these should go last and in this order */
        uthread_sched_init();
        reaper_init();
        create_first_thr();

uthread_sched_init [24] [top]

This should initialize the multilevel scheduling queues. You need to implement it.

The array of is global in uthread_sched.c:

utqueue_t runq_table[UTH_MAXPRIO + 1];

where UTQ_MAXPRIO is defined as 7.

The utqueue_t type is a doubly linked list. A function to initialize it (as empty) is utqueue_init in the file uthread_queue.c

reaper_init()[25] [top]

This function and the reaper() function itself are provided.

In implementing the user level thread library, you are responsible for cleaning up a thread: deallocating its stack and so on.

When does this occur? What thread does it? (It has to be a user level thread; it isn't handled by the kernel.)

A thread can call uthread_join(uid,&val) if the thread uid is not detached. And this function can clean up thread uid.
If a thread IS detached, then the thread should be cleaned up when it terminates. Since that thread is executing when it terminates, it seems the thread should clean up itself.
But this is a problem since that means delete its own stack while it is using its own stack.

This is were the reaper() function plays a role. A reaper thread (with reaper() as its start function) is to clean up detached threads.

create_first_thr [26] [top]

Test programs or any program that will use the uthread library will begin as a single threaded Linux process and not yet a uthread.

So that program needs to be "transformed" into an initial uthread.

This function is provided and will become the first uthread for this process. Any uthread in the program can then call uthread_create(...) to create additional uthreads.

uthread_create [27] [top]

Check for the same errors that pthread_create does. If an error occurs set errno to the symbolic integer and return -1.

Check pointer to caller's uthread_id_t is not NULL. (E.g. use assert()
Check priority is valid. If not set errno to EINVAL and return -1.
The uthread_alloc function (you implement) should be called to get an id for the uthread to be created. If no free entry in the uthreads table, set errno to EAGAIN and return -1.
Note that errno is redefined so that it is thread specific. When a thread is the currently executing thread, errno is the member in that thread's uthreads entry.

Create a stack for the new thread and set this member in this thread's uthreads table entry. You can use the alloc_stack() function. It allocates a stack of size UTH_STACK_SIZE (64k bytes). errno = EAGAIN if alloc_stack() fails (returns NULL).
Create the context for this thread and set its ut_ctx entry in the uthreads table. Use the uthread_makecontext function.

uthread_ctx_t [28] [top]

This entry in the thread control block (i.e., the uthread_t structure)

What is this?

It is a data structure to hold:

The thread's signal mask
Its execution stack
Machine register values

Linux provides functions for initializing and swapping this context information:

int getcontext(ucontext_t *ucp)
int setcontext(const ucontext_t *ucp)
int swapcontext(ucontext_t *olducp, ucontext_t *newucp)
void makecontext(ucontext_t *ucp, void *f(), int argc, ...)

makecontext [29] [top]

void makecontext(ucontext_t *ucp, void *f(), int argc, ...)

(See problem 3 in chapter 3?)

This creates a context and stores it in the struct pointed to by ucp. In particular, it sets up the stack so that when some thread switches to this context (how? by switching to the stack and returning) the function f will be called with the specified arguments on its stack.

A thread can switch to another context by calling swapcontext or setcontext.

getcontext [30] [top]

 int getcontext(ucontext_t *ucp);

This stores the context for the calling thread in the structure pointed to by ucp.

Get the code [31] [top]

The uthreads project runs on Linux. To get the initial code for the uthreads project use wget from Linux:

  wget  http://condor.depaul.edu/glancast/443class/hw/uthreads.tar

Then extract uthreads.tar in a directory of your choosing.

Type:

 make nyi

to see the functions not yet implemented.

	(Top of call stack)
esp->	Local variables alignment padding etc.
ebp->	saved caller's ebp
	caller's ret addr
	1st parameter
	2nd parameter
	Rest of caller's stack frame

CSC443 Feb05

slide version

single file version

Contents

Basic Concepts [1] [top]

Intel IA32 Stack Frame [2] [top]

Thread Context [3] [top]

Thread Switching [4] [top]

Coroutines [5] [top]

System Calls [6] [top]

Interrupts [7] [top]

I/O and Device Drivers [8] [top]

PIO [9] [top]

DMA [10] [top]

Linking and Loading [11] [top]

Share Libraries [12] [top]

Advantages

Simple Operating System Design [13] [top]

Devices, Terminals, Network Communication

Processes and Threads [14] [top]

Thread Implementations [15] [top]

Disadvantage of the One-Level Implementation [16] [top]

Two-Level Model - Single Kernel Thread [17] [top]

Multiple Kernel Threads [18] [top]

Uthreads Library Assignment [19] [top]

Some uthread data structures [20] [top]

Thread State [21] [top]

Current Thread [22] [top]

Provided part of uthread_init [23] [top]

uthread_sched_init [24] [top]

reaper_init()[25] [top]

create_first_thr [26] [top]

uthread_create [27] [top]

uthread_ctx_t [28] [top]

makecontext [29] [top]

getcontext [30] [top]

Get the code [31] [top]

CSC443 Feb05

slide version

single file version

Contents

Basic Concepts[1] [top]

Intel IA32 Stack Frame[2] [top]

Thread Context[3] [top]

Thread Switching[4] [top]

Coroutines[5] [top]

System Calls[6] [top]

Interrupts[7] [top]

I/O and Device Drivers[8] [top]

PIO[9] [top]

DMA[10] [top]

Linking and Loading[11] [top]

Share Libraries[12] [top]

Advantages

Simple Operating System Design[13] [top]

Devices, Terminals, Network Communication

Processes and Threads[14] [top]

Thread Implementations[15] [top]

Disadvantage of the One-Level Implementation[16] [top]

Two-Level Model - Single Kernel Thread[17] [top]

Multiple Kernel Threads[18] [top]

Uthreads Library Assignment[19] [top]

Some uthread data structures[20] [top]

Thread State[21] [top]

Current Thread[22] [top]

Provided part of uthread_init[23] [top]

uthread_sched_init[24] [top]

reaper_init()[25] [top]

create_first_thr[26] [top]

uthread_create[27] [top]

uthread_ctx_t[28] [top]

makecontext[29] [top]

getcontext[30] [top]

Get the code[31] [top]

Basic Concepts [1] [top]

Intel IA32 Stack Frame [2] [top]

Thread Context [3] [top]

Thread Switching [4] [top]

Coroutines [5] [top]

System Calls [6] [top]

Interrupts [7] [top]

I/O and Device Drivers [8] [top]

PIO [9] [top]

DMA [10] [top]

Linking and Loading [11] [top]

Share Libraries [12] [top]

Simple Operating System Design [13] [top]

Processes and Threads [14] [top]

Thread Implementations [15] [top]

Disadvantage of the One-Level Implementation [16] [top]

Two-Level Model - Single Kernel Thread [17] [top]

Multiple Kernel Threads [18] [top]

Uthreads Library Assignment [19] [top]

Some uthread data structures [20] [top]

Thread State [21] [top]

Current Thread [22] [top]

Provided part of uthread_init [23] [top]

uthread_sched_init [24] [top]

create_first_thr [26] [top]

uthread_create [27] [top]

uthread_ctx_t [28] [top]

makecontext [29] [top]

getcontext [30] [top]

Get the code [31] [top]