Operating System
Is a program that controls the execution of application programs
OS must relinquish control to user programs and regain it safely and efficiently
Tells the CPU when to execute other programs
Is an interface between the user and hardware
Masks the details of the hardware to application programs
Hence the OS itself must deal with the hardware details

Evolution of an Operating System
Must adapt to hardware upgrades and new types of hardware. Examples:
Character vs graphic terminals
Introduction of paging hardware
Must offer new services, eg: internet support
The need to change the OS on a regular basis places requirements on its design:
modular construction with clean interfaces
object oriented methodology

   Simple Batch Systems
Are the first operating systems (mid-50s)
The user submits a job (written on cards or tape) to a computer operator
The computer operator places a batch of several jobs on an input device
A special program, the monitor, manages the execution of each program in the batch
Resident monitor is in main memory and available for execution
Monitor utilities are loaded when needed

The Monitor
Monitor reads jobs one at a time from the input device
Monitor places a job in the user program area
A monitor instruction branches to the start of the user program
Execution of the user program continues until:
end-of-program occurs
an error occurs
Either event causes the CPU to fetch its next instruction from the monitor
OS:
Alternates execution between user program and the monitor program
Relies on available hardware to effectively alternate execution from various parts of memory

Desirable Hardware Features
Memory protection
do not allow the memory area containing the monitor to be altered by user programs
Timer
prevents a job from monopolizing the system
an interrupt occurs when time expires
Privileged instructions
can be executed only by the monitor
an interrupt occurs if a program tries these instructions
Interrupts
provide flexibility for relinquishing control to and regaining control from user programs
Managing resource contention requires:
either hardware support through special instructions (lock, swap, …)
or software support: semaphores, monitors, …

Multiprogrammed Batch Systems
I/O operations are exceedingly slow (compared to instruction execution)
A program containing even a very small number of I/O operations will spend most of its time waiting for them
Hence: poor CPU usage when only one program is present in memory

Multiprogrammed Batch Systems
If memory can hold several programs, then the CPU can switch to another one whenever a program is waiting for an I/O operation to complete
This is multitasking (multiprogramming)

Example:
Three Jobs Submitted
Almost no contention for resources
All three can run in minimum time in a multitasking environment (assuming JOB2 and JOB3 get enough CPU time to keep their I/O operations active)

Advantages of Multiprogramming

Time Sharing Systems (TSS)
Batch multiprogramming does not support interaction with users
TSS extends multiprogramming to handle multiple interactive jobs
Processor’s time is shared among multiple users
Multiple users simultaneously access the system through terminals
Because of slow human reaction time, a typical user needs only about 2 seconds of processing time per minute
Then about 30 users (60 s / 2 s) should be able to share the same system without noticeable delay in the computer's reaction time
The file system must be protected (multiple users share it)

   Pseudo-Parallelism
Operating systems strive to do many things at the same time
Run a user program
Read from or write to a disk
Print on a printer
Would like to keep as many resources busy as possible to increase the system’s efficiency
Multi-programming or time-sharing
CPU switches from program to program
Switch when
A time slice (tens or hundreds of milliseconds) passes, or when a process does an I/O operation
Gives the appearance of concurrency
Many users feel like they have exclusive access to the machine
Process model supports this approach

   Process
Introduced to obtain a systematic way of monitoring and controlling program execution
A process is an executable program with:
associated data (variables, buffers, …)
an execution context, i.e. all the information that:
the CPU needs to execute the process:
contents of the processor registers
the OS needs to manage the process:
priority of the process
the event (if any) for which the process is waiting
other data (that we will introduce later)

   Process Model
Process model: states
Running
Blocked
Ready
The scheduler (kernel) is the lowest level of the operating system
Handles all the interrupts
Does context switching
Handles interprocess communication
The rest of the OS sits on top of the scheduler (kernel) and is implemented as processes
For example
When a disk interrupt occurs
The system stops running the current process
Makes the disk process, which was blocked waiting for the interrupt, ready to run
Runs another process (possibly resuming the original process)

Processes and Interrupt Handling Implementation
To implement processes the OS maintains a process table
Examples of important information in the table
Process management: registers, program counter, stack pointer, process id, exit status, process start time, CPU time used, message queue pointers, …
Memory management: pointers to segments, …
File management: root directory, working directory, file descriptors, …
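As a rough sketch of such an entry (field names and types are illustrative, not any particular OS's layout), the table rows above could be declared in C as:

    #include <sys/types.h>

    /* Illustrative process table entry; real kernels store much more. */
    struct process_entry {
        /* Process management */
        unsigned long registers[16];      /* saved general-purpose registers       */
        unsigned long program_counter;
        unsigned long stack_pointer;
        pid_t         pid;
        int           exit_status;
        long          start_time;
        long          cpu_time_used;
        struct message *msg_queue;        /* pending messages (hypothetical type)  */

        /* Memory management */
        void *text_seg, *data_seg, *stack_seg;   /* pointers to segments           */

        /* File management */
        struct inode *root_dir, *working_dir;    /* hypothetical directory handles */
        int open_files[32];                      /* file descriptors               */
    };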
Interrupt handling is accomplished as follows:
1. Interrupt: hardware stacks the program counter, etc. of the current program
2. Hardware loads a new program counter from the interrupt vector (table)
3. Assembly language routine saves the registers in the process table (above)
4. Assembly language routine sets up a new stack for the C handler
5. C procedure marks the interrupt service process as ready in the above table
6. Scheduler decides which process to run next (chooses the highest priority)
7. C procedure returns to the assembly code
8. Assembly language routine starts up the next process

   Process Scheduling
Criteria for scheduling
Fairness: make sure each process gets its fair share of the CPU
Efficiency: keep the CPU busy 100% of the time
Response time: minimize response times for interactive users
Turnaround: minimize the time batch users must wait for output
Throughput: maximize the number of jobs processed per hour
Notes:
Any scheduling algorithm will favor some class of processes over others
Need to find a balance of interests for each specific environment
Maximize throughput
Minimize response times
Some combination of the two

   Scheduling Methods
Examples of scheduling methods
Preemptive
Willing to context switch out current job if a higher priority job is ready to run (vs. non-preemptive)
Round robin
Define a time quantum, say x milliseconds
Preempt the current process every time quantum
Run the next ready to run process
Priority
Assign priorities to processes
Run process with the highest priority
Shortest job first
If you know how long programs will take to run, you can use this information to maximize throughput
Run the shortest job first!
Shortest job first also gives the minimum average response time (this can be proven)
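As a toy illustration of round robin (the burst lengths are made-up numbers, and a real scheduler is driven by the timer interrupt rather than a loop):

    #include <stdio.h>

    int main(void) {
        int remaining[3] = {3, 5, 2};          /* quanta still needed by jobs 0..2 */
        int done = 0, t = 0;

        while (done < 3) {
            for (int j = 0; j < 3; j++) {      /* visit the ready jobs in turn */
                if (remaining[j] > 0) {
                    printf("t=%d: run job %d for one quantum\n", t++, j);
                    if (--remaining[j] == 0) {
                        printf("job %d finished\n", j);
                        done++;
                    }
                }
            }
        }
        return 0;
    }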

   Concurrency
What is concurrency
Permit many activities to proceed in parallel
The activities synchronize and communicate at well defined points
Why concurrency?
Easier to describe a system using several independent cohesive activities than to have one monolithic coupled function
Link the activities using well-defined interaction mechanisms
Exploit multiple resources
For example: many processors
How can we time share between activities?
Need to switch between concurrent activities
Switching is referred to as context switching
Examples of concurrent programming paradigms
Event-driven environments
Co-routines
Threads

Event-driven Environments:
Window Interface Environments
Windowing environments
Callback system
Runtime environment catches events
Mouse clicks
Interprocess messages
User defined events
Application programs
Define/register events with the runtime
Associate each event with a user written subprogram
When an event occurs the runtime
Causes the corresponding subprogram to execute
Subprogram may cause other events....then returns control to the runtime system
Runtime may or may not be multi-threaded
If it is it could associate each of the subprograms with a new thread
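A minimal sketch of such a callback scheme in C; the runtime names (register_event, dispatch) and the event list are invented for illustration:

    typedef void (*handler_t)(void);

    enum event { MOUSE_CLICK, IPC_MESSAGE, USER_EVENT, N_EVENTS };

    static handler_t handlers[N_EVENTS];

    /* Application side: associate an event with a user-written subprogram. */
    void register_event(enum event e, handler_t h) {
        handlers[e] = h;
    }

    /* Runtime side: called when an event is caught. */
    void dispatch(enum event e) {
        if (handlers[e])
            handlers[e]();     /* run the subprogram, which then returns to the runtime */
    }

    static void on_click(void) { /* handle the mouse click */ }

    void app_init(void) {
        register_event(MOUSE_CLICK, on_click);
    }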

Concurrency:
Co-routines
Co-routines, a simple strategy
When the current activity is ready to transfer control it:
Saves its context locally
Context includes the registers: stack pointer, base pointer, instruction pointer, flags, etc.
Branches to an entry point of the next routine
The first instructions of the called routine restore its context and then resume execution
The schedule is often well defined and hard coded
Could also use an intermediate routine to implement a scheduler
Co-routines with a user implemented scheduler
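A simplified sketch of the idea using the POSIX ucontext routines (getcontext/makecontext/swapcontext) instead of hand-written assembler; each routine has its own stack and explicitly transfers control to the next:

    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t ctx_main, ctx_a, ctx_b;
    static char stack_a[16384], stack_b[16384];   /* one stack per co-routine */

    static void routine_a(void) {
        printf("A: part 1\n");
        swapcontext(&ctx_a, &ctx_b);   /* save own context, branch to B */
        printf("A: part 2\n");
        swapcontext(&ctx_a, &ctx_main);
    }

    static void routine_b(void) {
        printf("B: part 1\n");
        swapcontext(&ctx_b, &ctx_a);   /* give control back to A, which resumes after its swap */
    }

    int main(void) {
        getcontext(&ctx_a);
        ctx_a.uc_stack.ss_sp = stack_a;
        ctx_a.uc_stack.ss_size = sizeof stack_a;
        ctx_a.uc_link = &ctx_main;
        makecontext(&ctx_a, routine_a, 0);

        getcontext(&ctx_b);
        ctx_b.uc_stack.ss_sp = stack_b;
        ctx_b.uc_stack.ss_size = sizeof stack_b;
        ctx_b.uc_link = &ctx_main;
        makecontext(&ctx_b, routine_b, 0);

        swapcontext(&ctx_main, &ctx_a);   /* start executing in A */
        return 0;
    }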

   Co-routines (cont.)
A routine gives up control explicitly by branching to another routine
In the example:
Each routine is only aware of one entry point for each other routine
There could be more
The scheduler decides on the order of execution
Could describe the scheduler using a finite state machine/state transition diagram
These details could be hard-coded into the routines
Advantage: faster execution
Disadvantage: more difficult to maintain
Communication can take place via variables in a data segment
Co-routines of this nature are usually written in assembler
Need access to registers to save context and do branching
A single stack used in a high level language would get in the way
Why? We have not explicitly distinguished threads of control

   Threads
Has an execution state (running, ready, etc.)
Saves thread context when not running
Has an execution stack and some per-thread static storage for local variables
Has access to the memory address space and resources of its process
all threads of a process share this
when one thread alters a (non-private) memory item, all other threads (of the process) see that change
a file opened by one thread is available to the others

Benefits of Threads vs Processes
Takes less time to create a new thread than a process
Less time to terminate a thread than a process
Less time to switch between two threads within the same process
Since threads within the same process share memory and files, they can communicate with each other without invoking the kernel
Therefore necessary to synchronize the activities of various threads so that they do not obtain inconsistent views of the data

   Benefits of Threads
Example 1: a file server on a LAN
It needs to handle several file requests over a short period
Hence it is more efficient to create (and destroy) a thread for each request (rather than a process)
On a SMP machine: multiple threads can possibly be executing simultaneously on different processors
Example 2: one thread displays the menu and reads user input while the other threads execute user commands

Problem: Inconsistent Data Views
3 variables: A, B, C which are shared by thread T1 and thread T2
T1 computes C = A+B
T2 transfers amount X from A to B
T2 must do: A = A-X and B = B+X (so that A+B is unchanged)
But if T1 computes A+B after T2 has done A = A-X but before B = B+X, T1 will not obtain the correct result for C = A + B
Solution: critical sections (later)
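A small pthread sketch of this race (names follow the slide; whether the wrong value of C actually shows up depends on how the executions happen to interleave):

    #include <pthread.h>
    #include <stdio.h>

    static long A = 60, B = 40, C = 0, X = 10;   /* A + B should stay 100 */

    static void *t1(void *arg) {
        C = A + B;                /* may run after T2's debit but before its credit */
        return NULL;
    }

    static void *t2(void *arg) {
        A = A - X;                /* if T1 computes A+B here, C will be 90, not 100 */
        B = B + X;
        return NULL;
    }

    int main(void) {
        pthread_t p1, p2;
        pthread_create(&p1, NULL, t1, NULL);
        pthread_create(&p2, NULL, t2, NULL);
        pthread_join(p1, NULL);
        pthread_join(p2, NULL);
        printf("C = %ld (expected 100)\n", C);
        return 0;
    }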

   Threads States
Three key states: running, ready, blocked
Threads have no suspend state because all threads within the same process share the same address space
Indeed, suspending (i.e., swapping out) a single thread involves suspending all threads of the same process
Termination of a process, terminates all threads within the process

User-Level Threads (ULT)
The kernel is not aware of the existence of threads
All thread management is done by the application by using a thread library
Thread switching does not require kernel mode privileges (no mode switch)
Scheduling is application specific

   Threads library
Contains code for:
creating and destroying threads
passing messages and data between threads
scheduling thread execution
saving and restoring thread contexts

Kernel activity for
User-Level Threads
The kernel is not aware of thread activity but it is still managing process activity
When a thread makes a system call, the whole process will be blocked
but for the thread library that thread is still in the running state
So thread states are independent of process states

Advantages and Inconveniences of ULT
Advantages
Thread switching does not involve the kernel: no mode switching
Scheduling can be application specific: choose the best algorithm.
ULTs can run on any OS; only a thread library is needed
Inconveniences
Most system calls are blocking and the kernel blocks processes. So all threads within the process will be blocked
The kernel can only assign processes to processors. Two threads within the same process cannot run simultaneously on two processors

Kernel-Level Threads (KLT)
All thread management is done by kernel
No thread library but an API to the kernel thread facility
Kernel maintains context information for the process and the threads
Switching between threads requires the kernel
Scheduling on a thread basis
Ex: Windows NT and OS/2

Advantages and Inconveniences of KLT
Advantages
the kernel can simultaneously schedule many threads of the same process on many processors
blocking is done on a thread level
kernel routines can be multithreaded
Inconveniences
thread switching within the same process involves the kernel. We have 2 mode switches per thread switch
this results in a significant slow down

Combined ULT/KLT Approaches
Thread creation done in the user space
Bulk of scheduling and synchronization of threads done in the user space
The programmer may adjust the number of KLTs
May combine the best of both approaches
Example is Solaris

Language with Support for Concurrency: Ada
Explicit language support for concurrency
Has a programming construct named a task
Each task is a thread with its own stack
Tasks are objects with methods named entries
Task Stack is
   Entry Init(....);
   Entry Push(...);
   Entry Pop(...);
End Stack;

Task body Stack is
Begin
   Loop
      Select
         accept Init(....) do ......
         End Init;
      or
         accept Push(...) do .....
         End Push;
         ....
      End select;
      -- Other work....
   End Loop;
End Stack;

   Ada Rendezvous
Interaction mechanism
Rendezvous
Looks like a procedure call in the source code
Stack.Init(...),  Stack.Push(....)
Rendezvous as a call-accept-release sequence
The serving task can only serve one caller at a time; if it is busy when a call arrives, the calling task is placed in a queue for that entry

Stacks in a Concurrent Program
What does the stack look like in a concurrent program?
No longer a single stack
Each thread needs its own stack
Stack space can be allocated from a heap
Issue becomes complex in a language like Ada with sophisticated scoping rules
Task A specification and body
   Local variable a
   Task B specification and body
   Task C specification and body
   Procedure D
End Task A
Local variables are placed on the stack; how are they linked together?
A complex multi-linked list data structure called a cactus stack!
What would the assembler sequences look like for accessing a local variable in main from C?

   PThreads (POSIX Threads)
Instead of extensions to the language, a package of routines is used
Environment usually initialized using a subroutine call in the main environment
Interaction mechanisms
Shared memory
Mutex for synchronization
Fork and join primitives
Examples of operations:
pthread_create(thread_id, attributes, start_routine, argument_passed_to_start_routine)
pthread_yield(void)
pthread_mutex_init(mutex, attributes)
pthread_mutex_lock(mutex)
pthread_mutex_unlock(mutex)
Others.....
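A minimal sketch using a few of these calls: two threads increment a shared counter under a mutex (compile with -pthread):

    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);     /* enter the critical section */
            counter++;
            pthread_mutex_unlock(&lock);   /* leave the critical section */
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);   /* last argument is passed to worker */
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter);        /* 200000, thanks to the mutex */
        return 0;
    }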

Problems with Concurrent Execution
Concurrent processes (or threads) often need to share data (maintained either in shared memory or files) and resources
If there is no controlled access to shared data, some processes will obtain an inconsistent view of this data
The actions performed by concurrent processes will then depend on the order in which their execution is interleaved

   An Example
Processes P1 and P2 run the same procedure (sketched below) and have access to the same variable “a”
Processes can be interrupted anywhere
If P1 is first interrupted after user input and P2 executes entirely
Then the character echoed by P1 will be the one read by P2 !!
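The procedure itself is not reproduced here; a typical textbook version looks roughly like this, with the shared global a being the variable mentioned above:

    #include <stdio.h>

    static char a;            /* global: shared by P1 and P2 */

    void echo(void) {
        a = getchar();        /* P1 may be interrupted right after this input ... */
        putchar(a);           /* ... so if P2 ran in between, P1 echoes P2's character */
    }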

The Critical Section Problem
When a process executes code that manipulates shared data (or a shared resource), we say that the process is in its critical section (CS) (for that shared data)
The execution of critical sections must be mutually exclusive: at any time, only one process is allowed to execute in its critical section (even with multiple CPUs)
Each process must therefore request permission to enter its critical section (CS)

The Critical Section Problem
The section of code implementing this request is called the entry section
The critical section (CS) might be followed by an exit section
The remaining code is the remainder section
The critical section problem is to design a protocol that the processes can use so that their actions will not depend on the order in which their execution is interleaved (possibly on many processors)

Framework for Analysis of Solutions
Each process executes at nonzero speed, but no assumption is made about the relative speeds of the n processes
General structure of a process (sketched below):
many CPUs may be present, but the memory hardware prevents simultaneous access to the same memory location
No assumption about order of interleaved execution
For solutions: we need to specify entry and exit sections
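The general structure referred to above can be sketched as follows; the entry and exit sections are what a solution must provide:

    /* repeated forever by each process Pi */
    while (1) {
        /* entry section:     request permission to enter the CS    */
        /* critical section:  code that manipulates the shared data */
        /* exit section:      release the CS                        */
        /* remainder section: the rest of the code                  */
    }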

Software Solutions: Algorithm 1
The shared variable turn is initialized (to 0 or 1) before executing any Pi
Pi’s critical section is executed iff turn = i
Pi is busy waiting if Pj is in CS: mutual exclusion is satisfied
Progress requirement is not satisfied since it requires strict alternation of CSs
Ex: if turn=0 and P0 is in its RS, P1 can never enter its CS
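A sketch of Algorithm 1 for process Pi, where i is 0 or 1 and j denotes the other process:

    int turn = 0;                  /* shared, initialized before either Pi runs */

    /* Process Pi: */
    while (1) {
        while (turn != i)          /* entry section: busy wait until it is Pi's turn */
            ;
        /* critical section */
        turn = j;                  /* exit section: hand the turn to Pj */
        /* remainder section */
    }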

Software Solutions:
Algorithm 2
Keep one Boolean variable for each process: flag[0] and flag[1]
Pi signals that it is ready to enter its CS by setting flag[i]:=true
Mutual Exclusion is satisfied but not the progress requirement
If we have the sequence:
T0: flag[0]:=true
T1: flag[1]:=true
Both processes will wait forever to enter their CS: we have a deadlock
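A sketch of Algorithm 2 for Pi; the deadlock above occurs when both processes set their flags before either tests the other's:

    int flag[2] = {0, 0};          /* shared; flag[i] true means Pi wants to enter */

    /* Process Pi: */
    while (1) {
        flag[i] = 1;               /* entry section: declare willingness */
        while (flag[j])            /* wait while the other process is willing */
            ;
        /* critical section */
        flag[i] = 0;               /* exit section */
        /* remainder section */
    }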

Software Solutions:
Algorithm 3
Initialization: flag[0]:=flag[1]:=false turn:= 0 or 1
Willingness to enter CS specified by flag[i]:=true
If both processes attempt to enter their CS simultaneously, only one turn value will last
Exit section: specifies that Pi is unwilling to enter CS
This algorithm is also known as “Peterson’s Algorithm”
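A sketch of Algorithm 3 (Peterson's algorithm) for Pi:

    int flag[2] = {0, 0};          /* shared */
    int turn;                      /* shared, initialized to 0 or 1 */

    /* Process Pi (j is the other process): */
    while (1) {
        flag[i] = 1;                       /* entry: willing to enter */
        turn = j;                          /* give priority to the other process */
        while (flag[j] && turn == j)       /* wait only if Pj is willing and holds the turn */
            ;
        /* critical section */
        flag[i] = 0;                       /* exit: no longer willing */
        /* remainder section */
    }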

   Semaphores
A synchronization tool (provided by the OS) that does not require busy waiting
A semaphore S is an integer variable that, apart from initialization, can only be accessed through 2 atomic and mutually exclusive operations:
wait(S)
signal(S)
To avoid busy waiting: when a process has to wait, it will be put in a blocked queue of processes waiting for the same event

   Semaphores
Hence, in fact, a semaphore is a record (structure):
When a process must wait for a semaphore S, it is blocked and put on the semaphore’s queue
The signal operation removes (according to a fair policy like FIFO) one process from the queue and puts it in the list of ready processes

Semaphore Operations (atomic)
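The record and the two atomic operations (not reproduced on this slide) follow the usual textbook form; this is only a sketch, with the queue manipulation left as comments:

    typedef struct {
        int   count;
        void *queue;              /* queue of blocked processes (details omitted) */
    } semaphore;

    void wait(semaphore *S) {     /* executed atomically */
        S->count--;
        if (S->count < 0) {
            /* place the calling process in S->queue and block it */
        }
    }

    void signal(semaphore *S) {   /* executed atomically */
        S->count++;
        if (S->count <= 0) {
            /* remove one process from S->queue (e.g. FIFO) and put it in the ready list */
        }
    }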

Semaphores: Observations
When S.count >= 0: the number of processes that can execute wait(S) without being blocked = S.count
When S.count < 0: the number of processes waiting on S = |S.count|
Atomicity and mutual exclusion: no 2 processes can be in wait(S) and signal(S) (on the same S) at the same time (even with multiple CPUs)
This is a critical section problem: critical sections are blocks of code defining wait(S) and signal(S)
Hence: the critical sections here are very short: typically 10 instructions
Solutions:
uniprocessor: disable interrupts during these operations (i.e., for a very short period). This does not work on a multiprocessor machine.
multiprocessor: use previous software or hardware schemes. The amount of busy waiting should be small.

Semaphores and Critical Sections
For n processes
Initialize S.count to 1
Then only 1 process is allowed into CS (mutual exclusion)
To allow k processes into CS, we initialize S.count to k
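Using the wait/signal sketch above, mutual exclusion for the n processes looks like this:

    semaphore S;        /* S.count initialized to 1 (or to k to admit k processes) */

    /* Each process Pi: */
    while (1) {
        wait(&S);       /* entry section */
        /* critical section */
        signal(&S);     /* exit section */
        /* remainder section */
    }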

Process Synchronization:
Monitors
Monitors
A package of subprograms with conditions
conditions are basically semaphores
Subprograms in the monitor are protected by a single mutex variable so only one caller can enter the monitor at a time
Clients can call the subprograms and then wait on a condition within the monitor
For example: cwait (buffer_full)
When waiting, a client gives up its mutex on the monitor as well
A new caller may enter and signal the condition
For example: csignal(buffer_full)
When the new caller leaves the monitor, a single client waiting on buffer_full will resume where it left off
Note the resumed client will have mutex over the monitor’s subprograms

   Monitor
Awaiting processes are either in the entrance queue or in a condition queue
A process puts itself into condition queue cn by issuing cwait(cn)
csignal(cn) brings into the monitor one process from the condition cn queue
Hence csignal(cn) blocks the calling process and puts it in the urgent queue (unless signal is the last operation of the monitor procedure)

Monitor Example:
Producer-Consumer
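The monitor figure is not reproduced here; as a rough equivalent, here is a bounded-buffer sketch in C using a pthread mutex and condition variables, which is how monitors are commonly realized (cwait/csignal correspond to pthread_cond_wait/pthread_cond_signal):

    #include <pthread.h>

    #define N 10
    static int buffer[N], count = 0, in = 0, out = 0;
    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;        /* the monitor's single mutex */
    static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER; /* ~ the monitor's conditions */
    static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;

    void append(int x) {                       /* producer entry */
        pthread_mutex_lock(&m);
        while (count == N)
            pthread_cond_wait(&not_full, &m);  /* gives up the mutex while waiting */
        buffer[in] = x; in = (in + 1) % N; count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&m);
    }

    int take(void) {                           /* consumer entry */
        pthread_mutex_lock(&m);
        while (count == 0)
            pthread_cond_wait(&not_empty, &m);
        int x = buffer[out]; out = (out + 1) % N; count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&m);
        return x;
    }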

Managing Memory within a Program
Three ways memory is allocated
Static
Allocated as soon as execution begins
Lasts until termination
Usually reside in fixed data segments
Note: a variable can be static yet have a local scope
Local
Created on the stack or in a register when an enclosing block or function is entered
Disappear when the block or function is exited
Dynamic
Created and destroyed by specific function calls during a program
Allocation can be requested by application code or by the runtime
Operations: malloc or new, and free or delete
How does dynamic memory allocation work?
Use a datatype we call a heap
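A small C illustration of the three kinds of allocation:

    #include <stdlib.h>

    static int s;                 /* static: allocated for the whole execution */

    void f(void) {
        int local = 0;            /* local: created on the stack when f is entered */
        int *p = malloc(100 * sizeof *p);   /* dynamic: taken from the heap */
        /* ... use local and p ... */
        free(p);                  /* explicitly returned to the heap */
    }                             /* local disappears here */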

   Heap
Heap:
Pre-allocated area of memory partitioned into blocks
Linked list of blocks of various sizes
Could even start with a single large block
Operations: malloc or new and free or delete
Time complexity for operations depends on the choice of data structure
Could have a single list, multiple lists, doubly linked lists
List(s) may be sorted, or not
May split blocks if they are large compared to request sizes
Can lead to fragmentation: all blocks become small
May have to merge blocks if none are large enough to satisfy a request
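A minimal first-fit sketch over a singly linked list of blocks (splitting, merging and my_free are omitted; the names are made up):

    #include <stddef.h>

    struct block {
        size_t        size;       /* usable bytes in this block */
        int           free;       /* nonzero if the block is available */
        struct block *next;
    };

    static struct block *heap_head;   /* set up elsewhere, e.g. as one large block */

    void *my_malloc(size_t n) {
        for (struct block *b = heap_head; b != NULL; b = b->next)
            if (b->free && b->size >= n) {    /* first block that is large enough */
                b->free = 0;                  /* a fuller version would split large blocks */
                return (void *)(b + 1);       /* payload starts just after the header */
            }
        return NULL;                          /* no block large enough: allocation fails */
    }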

   Garbage Collection
Garbage collection is a reorganization of the heap to decrease fragmentation
We can merge adjacent free blocks to make larger blocks (easy to do)
Theoretically you could move data and change pointers to the data in the application program to decrease fragmentation (note: this is difficult)
Tools for managing a heap
Heap statistics
Give frequency with which each block size was used
Number of times blocks are fragmented and merged
Configuration utilities
Setting an initial configuration for a heap