Operating System
Is a program that controls the execution of application programs
OS must relinquish control to user programs and regain it safely and efficiently
Tells the CPU when to execute other programs
Is an interface between the user and hardware
Masks the details of the hardware to application programs
Hence the OS itself must deal with the hardware details

Evolution of an Operating System
Must adapt to hardware upgrades and new types of hardware. Examples:
Character vs graphic terminals
Introduction of paging hardware
Must offer new services, eg: internet support
The need to change the OS on a regular basis places requirements on its design:
modular construction with clean interfaces
object oriented methodology

   Simple Batch Systems
Are the first operating systems (mid-50s)
The user submits a job (written on cards or tape) to a computer operator
The computer operator places a batch of several jobs on an input device
A special program, the monitor, manages the execution of each program in the batch
Resident monitor is in main memory and available for execution
Monitor utilities are loaded when needed

The Monitor
Monitor reads jobs one at a time from the input device
Monitor places a job in the user program area
A monitor instruction branches to the start of the user program
Execution of the user program continues until:
end-of-program occurs
an error occurs
Either event causes the CPU to fetch its next instruction from the monitor
OS:
Alternates execution between user program and the monitor program
Relies on available hardware to effectively alternate execution from various parts of memory

Desirable Hardware Features
Memory protection
do not allow the memory area containing the monitor to be altered by user programs
Timer
prevents a job from monopolizing the system
an interrupt occurs when time expires
Privileged instructions
can be executed only by the monitor
an interrupt occurs if a program tries these instructions
Interrupts
provide flexibility for relinquishing control to and regaining control from user programs
Managing resource contention requires:
either hardware support through special instructions (lock, swap, …)
or software support: semaphores, monitors, …

Multiprogrammed Batch Systems
I/O operations are exceedingly slow (compared to instruction execution)
A program containing even a very small number of I/O operations will spend most of its time waiting for them
Hence: poor CPU usage when only one program is present in memory

Multiprogrammed Batch Systems
If memory can hold several programs, then the CPU can switch to another one whenever a program is waiting for an I/O operation to complete
This is multitasking (multiprogramming)

Example:
Three Jobs Submitted
Almost no contention for resources
All three can run in minimum time in a multitasking environment (assuming JOB2 and JOB3 get enough CPU time to keep their I/O operations active)

Advantages of Multiprogramming

Time Sharing Systems (TSS)
Batch multiprogramming does not support interaction with users
TSS extends multiprogramming to handle multiple interactive jobs
Processor’s time is shared among multiple users
Multiple users simultaneously access the system through terminals
Because of slow human reaction time, a typical user needs only about 2 seconds of processing time per minute
Then about 30 users (60 s / 2 s) should be able to share the same system without noticeable delay in the computer's reaction time
The file system must be protected (multiple users share it)

   Pseudo-Parallelism
Operating systems strive to do many things at the same time
Run a user program
Read from or write to a disk
Print on a printer
Would like to keep as many resources busy as possible to increase the system’s efficiency
Multi-programming or time-sharing
CPU switches from program to program
Switch when
A time slice (tens or hundreds of milliseconds) passes, or when a process does an I/O operation
Gives the appearance of concurrency
Many users feel like they have exclusive access to the machine
Process model supports this approach

   Process
Introduced to obtain a systematic way of monitoring and controlling program execution
A process is an executable program with:
associated data (variables, buffers, …)
an execution context, i.e. all the information that:
the CPU needs to execute the process:
contents of the processor registers
the OS needs to manage the process:
priority of the process
the event (if any) for which the process is waiting
other data (that we will introduce later)

   Process Model
Process model: states
Running
Blocked
Ready
The scheduler (kernel) is the lowest level of the operating system
Handles all the interrupts
Does context switching
Handles interprocess communication
The rest of the OS sits on top of the scheduler (kernel) and is implemented as processes
For example
When a disk interrupt occurs
The system stops running the current process
Makes the disk process, which was blocked waiting for the interrupt, ready to run
Runs another process (possibly resuming the original process)

Processes and Interrupt Handling Implementation
To implement processes the OS maintains a process table
Examples of important information in the table
Process management: registers, program counter, stack pointer, process id, exit status, process start time, CPU time used, message queue pointers, …
Memory management: pointers to segments, …
File management: root directory, working directory, file descriptors, …
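As a rough sketch of such an entry (field names and types are illustrative, not any particular OS's layout), the table rows above could be declared in C as:

    #include <sys/types.h>

    /* Illustrative process table entry; real kernels store much more. */
    struct process_entry {
        /* Process management */
        unsigned long registers[16];      /* saved general-purpose registers       */
        unsigned long program_counter;
        unsigned long stack_pointer;
        pid_t         pid;
        int           exit_status;
        long          start_time;
        long          cpu_time_used;
        struct message *msg_queue;        /* pending messages (hypothetical type)  */

        /* Memory management */
        void *text_seg, *data_seg, *stack_seg;   /* pointers to segments           */

        /* File management */
        struct inode *root_dir, *working_dir;    /* hypothetical directory handles */
        int open_files[32];                      /* file descriptors               */
    };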
Interrupt handling is accomplished as follows:
1. Interrupt: hardware stacks the program counter, etc. of the current program
2. Hardware loads a new program counter from the interrupt vector (table)
3. Assembly language routine saves the registers in the process table (above)
4. Assembly language routine sets up a new stack for the C handler
5. C procedure marks the interrupt service process as ready in the above table
6. Scheduler decides which process to run next (chooses the highest priority)
7. C procedure returns to the assembly code
8. Assembly language routine starts up the next process

   Process Scheduling
Criteria for scheduling
Fairness: make sure each process gets its fair share of the CPU
Efficiency: keep the CPU busy 100% of the time
Response time: minimize response times for interactive users
Turnaround: minimize the time batch users must wait for output
Throughput: maximize the number of jobs processed per hour
Notes:
Any scheduling algorithm will favor some class of processes over others
Need to find a balance of interests for each specific environment
Maximize throughput
Minimize response times
Some combination of the two

   Scheduling Methods
Examples of scheduling methods
Preemptive
Willing to context switch out current job if a higher priority job is ready to run (vs. non-preemptive)
Round robin
Define a time quantum, say x milliseconds
Preempt the current process every time quantum
Run the next ready to run process
Priority
Assign priorities to processes
Run process with the highest priority
Shortest job first
If you know how long programs will take to run, you can use this information to maximize throughput
Run the shortest job first!
Shortest job first also gives the minimum average response time (this can be proven)
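As a toy illustration of round robin (the burst lengths are made-up numbers, and a real scheduler is driven by the timer interrupt rather than a loop):

    #include <stdio.h>

    int main(void) {
        int remaining[3] = {3, 5, 2};          /* quanta still needed by jobs 0..2 */
        int done = 0, t = 0;

        while (done < 3) {
            for (int j = 0; j < 3; j++) {      /* visit the ready jobs in turn */
                if (remaining[j] > 0) {
                    printf("t=%d: run job %d for one quantum\n", t++, j);
                    if (--remaining[j] == 0) {
                        printf("job %d finished\n", j);
                        done++;
                    }
                }
            }
        }
        return 0;
    }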

   Concurrency
What is concurrency
Permit many activities to proceed in parallel
The activities synchronize and communicate at well defined points
Why concurrency?
Easier to describe a system using several independent cohesive activities than to have one monolithic coupled function
Link the activities using well-defined interaction mechanisms
Exploit multiple resources
For example: many processors
How can we time share between activities?
Need to switch between concurrent activities
Switching is referred to as context switching
Examples of concurrent programming paradigms
Event-driven environments
Co-routines
Threads

Event-driven Environments:
Window Interface Environments
Windowing environments
Callback system
Runtime environment catches events
Mouse clicks
Interprocess messages
User defined events
Application programs
Define/register events with the runtime
Associate each event with a user written subprogram
When an event occurs the runtime
Causes the corresponding subprogram to execute
Subprogram may cause other events....then returns control to the runtime system
Runtime may or may not be multi-threaded
If it is it could associate each of the subprograms with a new thread
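A minimal sketch of such a callback scheme in C; the runtime names (register_event, dispatch) and the event list are invented for illustration:

    typedef void (*handler_t)(void);

    enum event { MOUSE_CLICK, IPC_MESSAGE, USER_EVENT, N_EVENTS };

    static handler_t handlers[N_EVENTS];

    /* Application side: associate an event with a user-written subprogram. */
    void register_event(enum event e, handler_t h) {
        handlers[e] = h;
    }

    /* Runtime side: called when an event is caught. */
    void dispatch(enum event e) {
        if (handlers[e])
            handlers[e]();     /* run the subprogram, which then returns to the runtime */
    }

    static void on_click(void) { /* handle the mouse click */ }

    void app_init(void) {
        register_event(MOUSE_CLICK, on_click);
    }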

Concurrency:
Co-routines
Co-routines, a simple strategy
When the current activity is ready to transfer control it:
Saves its context locally
Context includes the registers: stack pointer, base pointer, instruction pointer, flags, etc.
Branches to an entry point of the next routine
The first instructions of the called routine restore its context and then resume execution
The schedule is often well defined and hard coded
Could also use an intermediate routine to implement a scheduler
Co-routines with a user implemented scheduler
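A simplified sketch of the idea using the POSIX ucontext routines (getcontext/makecontext/swapcontext) instead of hand-written assembler; each routine has its own stack and explicitly transfers control to the next:

    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t ctx_main, ctx_a, ctx_b;
    static char stack_a[16384], stack_b[16384];   /* one stack per co-routine */

    static void routine_a(void) {
        printf("A: part 1\n");
        swapcontext(&ctx_a, &ctx_b);   /* save own context, branch to B */
        printf("A: part 2\n");
        swapcontext(&ctx_a, &ctx_main);
    }

    static void routine_b(void) {
        printf("B: part 1\n");
        swapcontext(&ctx_b, &ctx_a);   /* give control back to A, which resumes after its swap */
    }

    int main(void) {
        getcontext(&ctx_a);
        ctx_a.uc_stack.ss_sp = stack_a;
        ctx_a.uc_stack.ss_size = sizeof stack_a;
        ctx_a.uc_link = &ctx_main;
        makecontext(&ctx_a, routine_a, 0);

        getcontext(&ctx_b);
        ctx_b.uc_stack.ss_sp = stack_b;
        ctx_b.uc_stack.ss_size = sizeof stack_b;
        ctx_b.uc_link = &ctx_main;
        makecontext(&ctx_b, routine_b, 0);

        swapcontext(&ctx_main, &ctx_a);   /* start executing in A */
        return 0;
    }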

   Co-routines (cont.)
A routine gives up control explicitly by branching to another routine
In the example:
Each routine is only aware of one entry point for each other routine
There could be more
The scheduler decides on the order of execution
Could describe the scheduler using a finite state machine/state transition diagram
These details could be hard-coded into the routines
Advantage: faster execution
Disadvantage: more difficult to maintain
Communication can take place via variables in a data segment
Co-routines of this nature are usually written in assembler
Need access to registers to save context and do branching
A single stack used in a high level language would get in the way
Why? We have not explicitly distinguished threads of control

   Threads
Has an execution state (running, ready, etc.)
Saves thread context when not running
Has an execution stack and some per-thread static storage for local variables
Has access to the memory address space and resources of its process
all threads of a process share this
when one thread alters a (non-private) memory item, all other threads (of the process) see that change
a file opened by one thread is available to the others

Benefits of Threads vs Processes
Takes less time to create a new thread than a process
Less time to terminate a thread than a process
Less time to switch between two threads within the same process
Since threads within the same process share memory and files, they can communicate with each other without invoking the kernel
Therefore necessary to synchronize the activities of various threads so that they do not obtain inconsistent views of the data

   Benefits of Threads
Example 1: a file server on a LAN
It needs to handle several file requests over a short period
Hence it is more efficient to create (and destroy) a thread for each request (rather than a process)
On a SMP machine: multiple threads can possibly be executing simultaneously on different processors
Example 2: one thread displays the menu and reads user input while the other threads execute user commands

Problem: Inconsistent Data Views
3 variables: A, B, C which are shared by thread T1 and thread T2
T1 computes C = A+B
T2 transfers amount X from A to B
T2 must do: A = A-X and B = B+X (so that A+B is unchanged)
But if T1 computes A+B after T2 has done A = A-X but before B = B+X, T1 will not obtain the correct result for C = A + B
Solution: critical sections (later)
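A small pthread sketch of this race (names follow the slide; whether the wrong value of C actually shows up depends on how the executions happen to interleave):

    #include <pthread.h>
    #include <stdio.h>

    static long A = 60, B = 40, C = 0, X = 10;   /* A + B should stay 100 */

    static void *t1(void *arg) {
        C = A + B;                /* may run after T2's debit but before its credit */
        return NULL;
    }

    static void *t2(void *arg) {
        A = A - X;                /* if T1 computes A+B here, C will be 90, not 100 */
        B = B + X;
        return NULL;
    }

    int main(void) {
        pthread_t p1, p2;
        pthread_create(&p1, NULL, t1, NULL);
        pthread_create(&p2, NULL, t2, NULL);
        pthread_join(p1, NULL);
        pthread_join(p2, NULL);
        printf("C = %ld (expected 100)\n", C);
        return 0;
    }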

   Threads States
Three key states: running, ready, blocked
Threads have no suspend state because all threads within the same process share the same address space
Indeed, suspending (i.e., swapping out) a single thread involves suspending all threads of the same process
Termination of a process, terminates all threads within the process

User-Level Threads (ULT)
The kernel is not aware of the existence of threads
All thread management is done by the application by using a thread library
Thread switching does not require kernel mode privileges (no mode switch)
Scheduling is application specific

   Threads library
Contains code for:
creating and destroying threads
passing messages and data between threads
scheduling thread execution
saving and restoring thread contexts

Kernel activity for
User-Level Threads
The kernel is not aware of thread activity but it is still managing process activity
When a thread makes a system call, the whole process will be blocked
but for the thread library that thread is still in the running state
So thread states are independent of process states

Advantages and Inconveniences of ULT
Advantages
Thread switching does not involve the kernel: no mode switching
Scheduling can be application specific: choose the best algorithm.
ULTs can run on any OS; only a thread library is needed
Inconveniences
Most system calls are blocking and the kernel blocks processes. So all threads within the process will be blocked
The kernel can only assign processes to processors. Two threads within the same process cannot run simultaneously on two processors

Kernel-Level Threads (KLT)
All thread management is done by kernel
No thread library but an API to the kernel thread facility
Kernel maintains context information for the process and the threads
Switching between threads requires the kernel
Scheduling on a thread basis
Ex: Windows NT and OS/2

Advantages and Inconveniences of KLT
Advantages
the kernel can simultaneously schedule many threads of the same process on many processors
blocking is done on a thread level
kernel routines can be multithreaded
Inconveniences
thread switching within the same process involves the kernel. We have 2 mode switches per thread switch
this results in a significant slow down

Combined ULT/KLT Approaches
Thread creation done in the user space
Bulk of scheduling and synchronization of threads done in the user space
The programmer may adjust the number of KLTs
May combine the best of both approaches
Example is Solaris

Language with Support for Concurrency: Ada
Explicit language support for concurrency
Has a programming construct named a task
Each task is a thread with its own stack
Tasks are objects with methods named entries
Task Stack is
   Entry Init(....);
   Entry Push(...);
   Entry Pop(...);
End Stack;

Task body Stack is
Begin
   Loop
      Select
         accept Init(....) do ......
         End Init;
      or
         accept Push(...) do .....
         End Push;
         ....
      End select;
      -- Other work....
   End Loop;
End Stack;

   Ada Rendezvous
Interaction mechanism
Rendezvous
Looks like a procedure call in the source code
Stack.Init(...),  Stack.Push(....)
Rendezvous as a call-accept-release sequence
The serving task can only serve one caller at a time; if it is busy when a call arrives, the calling task is placed in a queue for that entry

Stacks in a Concurrent Program
What does the stack look like in a concurrent program?
No longer a single stack
Each thread needs its own stack
Stack space can be allocated from a heap
Issue becomes complex in a language like Ada with sophisticated scoping rules
Task A specification and body
   Local variable a
   Task B specification and body
   Task C specification and body
   Procedure D
End Task A
Local variables are placed on the stack; how are they linked together?
A complex multi-linked list data structure called a cactus stack!
What would the assembler sequences look like for accessing a local variable in main from C?

   PThreads (POSIX Threads)
Instead of extensions to the language, a package of routines is used
Environment usually initialized using a subroutine call in the main environment
Interaction mechanisms
Shared memory
Mutex for synchronization
Fork and join primitives
Examples of operations:
pthread_create(thread_id, attributes, start_routine, argument_passed_to_start_routine)
pthread_yield(void)
pthread_mutex_init(mutex, attributes)
pthread_mutex_lock(mutex)
pthread_mutex_unlock(mutex)
Others.....
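A minimal sketch using a few of these calls: two threads increment a shared counter under a mutex (compile with -pthread):

    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);     /* enter the critical section */
            counter++;
            pthread_mutex_unlock(&lock);   /* leave the critical section */
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);   /* last argument is passed to worker */
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter);        /* 200000, thanks to the mutex */
        return 0;
    }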

Problems with Concurrent Execution
Concurrent processes (or threads) often need to share data (maintained either in shared memory or files) and resources
If there is no controlled access to shared data, some processes will obtain an inconsistent view of this data
The actions performed by concurrent processes will then depend on the order in which their execution is interleaved

   An Example
Processes P1 and P2 run the same procedure (sketched below) and have access to the same variable “a”
Processes can be interrupted anywhere
If P1 is first interrupted after user input and P2 executes entirely
Then the character echoed by P1 will be the one read by P2 !!
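The procedure itself is not reproduced here; a typical textbook version looks roughly like this, with the shared global a being the variable mentioned above:

    #include <stdio.h>

    static char a;            /* global: shared by P1 and P2 */

    void echo(void) {
        a = getchar();        /* P1 may be interrupted right after this input ... */
        putchar(a);           /* ... so if P2 ran in between, P1 echoes P2's character */
    }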

The Critical Section Problem
When a process executes code that manipulates shared data (or a shared resource), we say that the process is in its critical section (CS) (for that shared data)
The execution of critical sections must be mutually exclusive: at any time, only one process is allowed to execute in its critical section (even with multiple CPUs)
Each process must therefore request permission to enter its critical section (CS)

The Critical Section Problem
The section of code implementing this request is called the entry section
The critical section (CS) might be followed by an exit section
The remaining code is the remainder section
The critical section problem is to design a protocol that the processes can use so that their actions will not depend on the order in which their execution is interleaved (possibly on many processors)

Framework for Analysis of Solutions
Each process executes at nonzero speed, but no assumption is made about the relative speeds of the n processes
General structure of a process (sketched below):
many CPUs may be present, but the memory hardware prevents simultaneous access to the same memory location
No assumption about order of interleaved execution
For solutions: we need to specify entry and exit sections
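The general structure referred to above can be sketched as follows; the entry and exit sections are what a solution must provide:

    /* repeated forever by each process Pi */
    while (1) {
        /* entry section:     request permission to enter the CS    */
        /* critical section:  code that manipulates the shared data */
        /* exit section:      release the CS                        */
        /* remainder section: the rest of the code                  */
    }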

Software Solutions: Algorithm 1
The shared variable turn is initialized (to 0 or 1) before executing any Pi
Pi’s critical section is executed iff turn = i
Pi is busy waiting if Pj is in CS: mutual exclusion is satisfied
Progress requirement is not satisfied since it requires strict alternation of CSs
Ex: if turn=0 and P0 is in its RS, P1 can never enter its CS
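A sketch of Algorithm 1 for process Pi, where i is 0 or 1 and j denotes the other process:

    int turn = 0;                  /* shared, initialized before either Pi runs */

    /* Process Pi: */
    while (1) {
        while (turn != i)          /* entry section: busy wait until it is Pi's turn */
            ;
        /* critical section */
        turn = j;                  /* exit section: hand the turn to Pj */
        /* remainder section */
    }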

Software Solutions:
Algorithm 2
Keep one Boolean variable for each process: flag[0] and flag[1]
Pi signals that it is ready to enter its CS by setting flag[i]:=true
Mutual Exclusion is satisfied but not the progress requirement
If we have the sequence:
T0: flag[0]:=true
T1: flag[1]:=true
Both processes will wait forever to enter their CS: we have a deadlock
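A sketch of Algorithm 2 for Pi; the deadlock above occurs when both processes set their flags before either tests the other's:

    int flag[2] = {0, 0};          /* shared; flag[i] true means Pi wants to enter */

    /* Process Pi: */
    while (1) {
        flag[i] = 1;               /* entry section: declare willingness */
        while (flag[j])            /* wait while the other process is willing */
            ;
        /* critical section */
        flag[i] = 0;               /* exit section */
        /* remainder section */
    }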

Software Solutions:
Algorithm 3
Initialization: flag[0]:=flag[1]:=false turn:= 0 or 1
Willingness to enter CS specified by flag[i]:=true
If both processes attempt to enter their CS simultaneously, only one turn value will last
Exit section: specifies that Pi is unwilling to enter CS
This algorithm is also known as “Peterson’s Algorithm”
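A sketch of Algorithm 3 (Peterson's algorithm) for Pi:

    int flag[2] = {0, 0};          /* shared */
    int turn;                      /* shared, initialized to 0 or 1 */

    /* Process Pi (j is the other process): */
    while (1) {
        flag[i] = 1;                       /* entry: willing to enter */
        turn = j;                          /* give priority to the other process */
        while (flag[j] && turn == j)       /* wait only if Pj is willing and holds the turn */
            ;
        /* critical section */
        flag[i] = 0;                       /* exit: no longer willing */
        /* remainder section */
    }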

   Semaphores
A synchronization tool (provided by the OS) that does not require busy waiting
A semaphore S is an integer variable that, apart from initialization, can only be accessed through 2 atomic and mutually exclusive operations:
wait(S)
signal(S)
To avoid busy waiting: when a process has to wait, it will be put in a blocked queue of processes waiting for the same event

   Semaphores
Hence, in fact, a semaphore is a record (structure):
When a process must wait for a semaphore S, it is blocked and put on the semaphore’s queue
The signal operation removes (according to a fair policy like FIFO) one process from the queue and puts it in the list of ready processes

Semaphore Operations (atomic)
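The record and the two atomic operations (not reproduced on this slide) follow the usual textbook form; this is only a sketch, with the queue manipulation left as comments:

    typedef struct {
        int   count;
        void *queue;              /* queue of blocked processes (details omitted) */
    } semaphore;

    void wait(semaphore *S) {     /* executed atomically */
        S->count--;
        if (S->count < 0) {
            /* place the calling process in S->queue and block it */
        }
    }

    void signal(semaphore *S) {   /* executed atomically */
        S->count++;
        if (S->count <= 0) {
            /* remove one process from S->queue (e.g. FIFO) and put it in the ready list */
        }
    }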

Semaphores: Observations
When S.count >= 0: the number of processes that can execute wait(S) without being blocked = S.count
When S.count < 0: the number of processes waiting on S = |S.count|
Atomicity and mutual exclusion: no 2 processes can be in wait(S) and signal(S) (on the same S) at the same time (even with multiple CPUs)
This is a critical section problem: critical sections are blocks of code defining wait(S) and signal(S)
Hence: the critical sections here are very short: typically 10 instructions
Solutions:
uniprocessor: disable interrupts during these operations (i.e., for a very short period). This does not work on a multiprocessor machine.
multiprocessor: use previous software or hardware schemes. The amount of busy waiting should be small.

Semaphores and Critical Sections
For n processes
Initialize S.count to 1
Then only 1 process is allowed into CS (mutual exclusion)
To allow k processes into CS, we initialize S.count to k
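Using the wait/signal sketch above, mutual exclusion for the n processes looks like this:

    semaphore S;        /* S.count initialized to 1 (or to k to admit k processes) */

    /* Each process Pi: */
    while (1) {
        wait(&S);       /* entry section */
        /* critical section */
        signal(&S);     /* exit section */
        /* remainder section */
    }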

Process Synchronization:
Monitors
Monitors
A package of subprograms with conditions
conditions are basically semaphores
Subprograms in the monitor are protected by a single mutex variable so only one caller can enter the monitor at a time
Clients can call the subprograms and then wait on a condition within the monitor
For example: cwait (buffer_full)
When waiting, a client gives up its mutex on the monitor as well
A new caller may enter and signal the condition
For example: csignal(buffer_full)
When the new caller leaves the monitor, a single client waiting on buffer_full will resume where it left off
Note the resumed client will have mutex over the monitor’s subprograms

   Monitor
Awaiting processes are either in the entrance queue or in a condition queue
A process puts itself into condition queue cn by issuing cwait(cn)
csignal(cn) brings into the monitor one process from the condition cn queue
Hence csignal(cn) blocks the calling process and puts it in the urgent queue (unless signal is the last operation of the monitor procedure)

Monitor Example:
Producer-Consumer
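The monitor figure is not reproduced here; as a rough equivalent, here is a bounded-buffer sketch in C using a pthread mutex and condition variables, which is how monitors are commonly realized (cwait/csignal correspond to pthread_cond_wait/pthread_cond_signal):

    #include <pthread.h>

    #define N 10
    static int buffer[N], count = 0, in = 0, out = 0;
    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;        /* the monitor's single mutex */
    static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER; /* ~ the monitor's conditions */
    static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;

    void append(int x) {                       /* producer entry */
        pthread_mutex_lock(&m);
        while (count == N)
            pthread_cond_wait(&not_full, &m);  /* gives up the mutex while waiting */
        buffer[in] = x; in = (in + 1) % N; count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&m);
    }

    int take(void) {                           /* consumer entry */
        pthread_mutex_lock(&m);
        while (count == 0)
            pthread_cond_wait(&not_empty, &m);
        int x = buffer[out]; out = (out + 1) % N; count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&m);
        return x;
    }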

Managing Memory within a Program
Three ways memory is allocated
Static
Allocated as soon as execution begins
Lasts until termination
Usually reside in fixed data segments
Note: a variable can be static yet have a local scope
Local
Created on the stack or in a register when an enclosing block or function is entered
Disappear when the block or function is exited
Dynamic
Created and destroyed by specific function calls during a program
Allocation can be requested by application code or by the runtime
Operations: malloc or new, and free or delete
How does dynamic memory allocation work?
Use a datatype we call a heap
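A small C illustration of the three kinds of allocation:

    #include <stdlib.h>

    static int s;                 /* static: allocated for the whole execution */

    void f(void) {
        int local = 0;            /* local: created on the stack when f is entered */
        int *p = malloc(100 * sizeof *p);   /* dynamic: taken from the heap */
        /* ... use local and p ... */
        free(p);                  /* explicitly returned to the heap */
    }                             /* local disappears here */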

   Heap
Heap:
Pre-allocated area of memory partitioned into blocks
Linked list of blocks of various sizes
Could even start with a single large block
Operations: malloc or new and free or delete
Time complexity for operations depends on the choice of data structure
Could have a single list, multiple lists, doubly linked lists
List(s) may be sorted, or not
May split blocks if they are large compared to request sizes
Can lead to fragmentation: all blocks become small
May have to merge blocks if none are large enough to satisfy a request
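A minimal first-fit sketch over a singly linked list of blocks (splitting, merging and my_free are omitted; the names are made up):

    #include <stddef.h>

    struct block {
        size_t        size;       /* usable bytes in this block */
        int           free;       /* nonzero if the block is available */
        struct block *next;
    };

    static struct block *heap_head;   /* set up elsewhere, e.g. as one large block */

    void *my_malloc(size_t n) {
        for (struct block *b = heap_head; b != NULL; b = b->next)
            if (b->free && b->size >= n) {    /* first block that is large enough */
                b->free = 0;                  /* a fuller version would split large blocks */
                return (void *)(b + 1);       /* payload starts just after the header */
            }
        return NULL;                          /* no block large enough: allocation fails */
    }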

   Garbage Collection
Garbage collection is a reorganization of the heap to decrease fragmentation
We can merge adjacent free blocks to make larger blocks (easy to do)
Theoretically you could move data and change pointers to the data in the application program to decrease fragmentation (note: this is difficult)
Tools for managing a heap
Heap statistics
Give frequency with which each block size was used
Number of times blocks are fragmented and merged
Configuration utilities
Setting an initial configuration for a heap