|
|
|
|
|
Processor (CPU) |
|
Main Memory (aka real memory, aka primary
memory) |
|
holds data in code |
|
I/O modules (I/O controllers, I/O channels, I/O
processors...) |
|
hardware (with registers called I/O ports) that
moves data between cpu and peripherals like: |
|
secondary memory devices (eg: hard disks) |
|
keyboard, display... |
|
communications equipment |
|
System interconnection (ie: Buses) |
|
communication among processors, memory, and I/O
modules |
|
Secondary memory (aka stable memory, aka
storage) |
|
preserves contents even with computer powered
down |
|
|
|
|
|
|
|
Processor system |
|
Collection of registers with information to
manipulate |
|
A register is a memory location that may contain
program instructions and/or data, sequence of bits [0|1] |
|
|
|
|
|
Different processors have different numbers and
sizes of registers |
|
Some registers act as interface to outside world
(via pins) |
|
The pins are directly connected to special
purpose bus interface registers within the processor |
|
|
|
|
|
|
|
Active behaviour: perform instructions |
|
Instructions operate on the registers |
|
For example: add the contents of two registers
and place the result in a register |
|
Buses/lines are wires that physically connect
the pins of processors, memories, and other devices |
|
|
|
|
|
Registers are made up of and connected by
digital logic circuitry |
|
And, or, and not gates |
|
Values in registers + values on pins is input,
new set of values in registers + pins is output |
|
Clock used to activate changes in processor
state |
|
Several clock cycles are often needed to
accomplish what we think of as an instruction |
|
In general: |
|
The faster the clock the faster we can execute
instructions |
|
The more complex the instruction the more cycles
that are needed |
|
Some processors can execute parts of several
instructions in parallel (pipeline) to increase their instructions
processed per second |
|
|
|
|
|
|
Registers: hold addresses and data |
|
Fixed width memory word that hold binary values |
|
Width = M bits => 2M different
binary values |
|
16 bit register can point to 65536 locations |
|
32 bit register can point to 4294967296 |
|
64 bit register can point to 1.84467E+19
locations |
|
Hexadecimal notation 0...9, A..F |
|
16 values, 1 hex digit is 4 binary bits |
|
0000 => 0
0100 => 4 1000 =>
8 1100 => C |
|
0001 => 1
0101 => 5 1001 =>
9 1101 => D |
|
0010 => 2
0110 => 6 1010 =>
A 1110 => E |
|
0011 => 3
0111 => 7 1011 =>
B 1111 => F |
|
|
|
|
|
Example |
|
Can represent long binary values using shorter
hex notation |
|
1011 0110 0010 1101 16 digit binary value (base
2) |
|
B
6 2 D 4 digit hex value (base 16) |
|
How can we describe abstract information
relevant to a system’s objectives using binary values? |
|
Examples: integers, characters, real numbers,
colours, dates, graphics.... |
|
Must encode information into binary values |
|
Standard schemes often used, but programmers
must sometimes invent their own schemes |
|
Standard formats for data encoding |
|
Unsigned integers (counting integers), signed
integers, real numbers, characters |
|
|
|
|
|
Unsigned integers |
|
Interpret binary value as a base 2 value |
|
1
0 1 0
0 1 |
|
|---> 20 = 1 |
|
|-----> 21 = 0 |
|
|--------> 22 =
0 |
|
|-----------> 23 =
8 |
|
|---------------> 24
= 0 |
|
|-------------------> 25 = 32 |
|
Sum is 41 decimal (base 10) |
|
Size of register determines how many integers
can be encoded this way |
|
Some times systems have double integers or longs
for double the width and more integers to be represented |
|
|
|
|
|
|
Signed Integers |
|
Need encoding for both positive and negative
numbers |
|
Assume M bits in a register word |
|
Use one bit (say the left most, most significant
bit) to represent sign |
|
0 => positive |
|
1 => negative |
|
Encode magnitude with remaining M-1 bits |
|
Example for M=4 |
|
(+1) 0 001 |
|
(-1) 1 001 |
|
------- |
|
1 010 = -2
decimal?? Answer should be zero! |
|
Also, there are two representations of zero |
|
(+0)
0 000 |
|
(-0)
1 000 |
|
|
|
|
|
|
Solution: 2’s complement representation |
|
To negate a value take the 2’s complement |
|
i) complement every bit (flip) 0->1 and
1->0 |
|
ii) add 1 |
|
Consider the decimal value (+1), take its 2’s
complement |
|
0001 -> i) 1110 |
|
ii) 0001 |
|
Add to get 1111 This value represents (-1)
decimal |
|
Add the two together |
|
1 0000 |
|
We truncate to keep the 4 lowest bits => 0000 |
|
We ignore the 5th bit (ie the carry over bit for
now) |
|
In general, for 2’s complement M bit values we
have the range -2M-1 to 2M-1 -1 |
|
|
|
|
|
Characters |
|
Usually represented in ASCII or EBCDIC |
|
ASCII American National Character Set |
|
Can be found in any programming language text |
|
00 Null |
|
.... |
|
0A Line Feed (LF) |
|
OD Return
(CR) |
|
... |
|
41 A |
|
.... |
|
5A Z |
|
... |
|
61 a |
|
..... |
|
|
|
|
|
|
Real numbers cover a wide range of values |
|
Only useful computer representation is the
scientific notation |
|
Often called floating point |
|
For ex: |
|
0.231418x102=23.1418 |
|
0.7621x10-3 = 0.0007621 |
|
|
|
|
|
|
|
|
|
Computer uses binary number representation |
|
For ex. 0.110111 x 21101 |
|
Numbers are normalized which means that the
leading bit is always a one |
|
Some computer use this fact to give an extra bit
of precision (they simply don’t store the implicit leading one) |
|
|
|
|
Used on PCs, many larger machines (not all) |
|
32 bit single precision |
|
|
|
|
|
|
|
Exponent is biased, ie offset by a constant for
hardware efficiency reasons |
|
64 bit double precision |
|
|
|
|
|
|
|
For example |
|
20.5937510 =
.1010010011 x 25 |
|
Sign is 0, exponent is 5 + 127 (bias)=132,
leading 1 in mantissa is not stored |
|
|
|
|
|
|
Bias is chosen as follows |
|
Let e be the number of bits in the exponent |
|
bias = 2(e-1)-1 |
|
Gives exponents in range -bias +1 to bias |
|
e=8 => bias is 127 |
|
=> min exponent = -126 = 00000001 |
|
=> max exponent = 127 =
11111110 |
|
What about 11111111? |
|
Used to represent NaN - Not a Number |
|
Could be caused by overflow or underflow |
|
What about 00000000? |
|
If mantissa is zero => the number 0 |
|
Note this is special, since a leading one is
usually implicit |
|
If mantissa is non-zero => denormalized
numbers |
|
Numbers where leading bit is not 1 |
|
Depending on implementation, may cause an underflow |
|
Too small a number |
|
|
|
|
|
|
1. If mantissa is normalized some bit patterns
may not be used |
|
Wasteful of space (Not true on better
implementations) |
|
For example: |
|
Consider a 1 bit exponent |
|
a 2 bit normalized mantissa (no implied leading one) |
|
all signs positive |
|
To represent 0 : 0.00 x 20 |
|
1/2 :
0.10 x 20 |
|
not used : 0.00 x 21 |
|
not used : 0.01 x 21 |
|
2. Distribution of number representations is not
uniform (property of fixed size mantissa) |
|
Step sizes between numbers gets larger as
exponent increases |
|
Same example as above: |
|
0.10 x 20 to 0.11 x 20
stepsize is 0.01x20 = 0.2510 |
|
0.10 x 21 to 0.11 x 21
stepsize is 0.01x21 = 0.5010 |
|
|
|
|
|
|
3. Normalization of mantissa causes a “hole at
zero” |
|
With same example: |
|
Smallest non-normalized +ve is 0.01x2(-1+1)=0.12510 |
|
Smallest normalized +ve is 0.10x2(-1+1) =0.2510 |
|
Get a pattern as follows: |
|
|
|
|
|
|
|
4. Overflow and underflow |
|
If required exponent exceeds max exponent |
|
Error message? NaN? |
|
If required exponent for normalized number is
less than min exponent |
|
Could use denormal numbers |
|
Could set number to zero |
|
Could get an error message |
|
|
|
|
|
|
5. Rounding and truncation |
|
Let fl(x) be the floating point representation
of the real value x |
|
fl( x ) = x (1 + epsilon), where epsilon is a
small error multiplier, that depends on the exponent |
|
Consider an n bit effective mantissa |
|
Truncation |
|
-2-n <= epsilon <= 0 |
|
Note error is always negative |
|
Rounding |
|
-2-(n+1) <= epsilon <= 2-(n+1) |
|
Error is sometimes -ve, sometimes +ve |
|
6. Comparison |
|
Truncation error can be twice as large and
always negative (bad - results are skewed) |
|
Rounding errors may even out over calculations |
|
|
|
|
|
Implementations may use higher precision for
intermediate calculations |
|
Resort to published precision when storing
values back to variables |
|
This helps to reduce the likelihood of underflow
and overflow in intermediate calculations |
|
Using different/higher precision for
intermediate results may cause portability problems: JAVA requires/mandates
that intermediate results are written back to memory to force uniformity
across platforms |
|
|
|
|
|
Model for memory components |
|
|
|
|
|
|
|
|
|
|
|
|
|
Registers in memory are referred to as cells or
locations (Note: a processor’s registers are a form of local memory) |
|
Each cell has an address (numerical name) used
to select it using the address bus |
|
Number of cells determines number of address
bits |
|
220 memory locations needs 20 address
bits |
|
|
|
|
|
|
Types |
|
ROM (Read only memory) |
|
RAM (Random access memory) |
|
Alterable ROM (Erasable, Programmable ROM,
sometimes called EPROM) |
|
Caches and disks |
|
ROM (Read only Memory) |
|
Contents determined when component is fabricated |
|
Contents can be read but not modified during the
operation of the memory |
|
To read a cell: |
|
Processor sets the address bus to the address of
the desired cell |
|
Processor sets the control bus to indicate an
address is ready |
|
Memory puts the contents of cell on data bus |
|
Memory sets the control bus to indicate that the
data bus now contains the contents of the memory cell |
|
Properties |
|
Contents are not lost if power is removed |
|
ROMS are not programmable, they are fabricated! |
|
|
|
|
|
|
RAM (Random access memory) |
|
Contents of cells may be read or written during
operation of device |
|
Reading a cell is similar to ROM |
|
To write: |
|
Select cell using address bus |
|
Value on data bus replaces contents of cell |
|
Synchronize using the control bus signals |
|
Properties |
|
Contents are lost if power is removed from
component |
|
Once value of a cell is written, value in cell
remains the same until it is overwritten or power is turned off |
|
Contents typically unknown when power is first
applied |
|
|
|
|
|
|
|
Alterable ROM (or EPROM) |
|
Sometimes called erasable programmable ROM |
|
Less common than ROM and RAM |
|
Like a ROM |
|
Contents not lost if power is lost |
|
Like a RAM |
|
Can change the contents while in operation |
|
Usually need to use a special protocol to alter
the contents |
|
Examples: printer configurations, terminal setup |
|
Note: |
|
Disk and tape memory do not qualify as memory
components (under this model)
because individual cells are not accessed directly from a bus |
|
Blocks of data are read/written from memory used
to support input/output |
|
|
|
|
|
|
A cache is a form of memory used to provide fast
access to a copy of data and instructions that are most likely to be used
by the processor |
|
Before accessing data in slow memory check to
see if it is already in the cache, if not move it there by replacing
something else |
|
If you replace a value that has been written, it
must first be written back to the slower memory |
|
Caches are physically closer to the processor
than the rest of memory |
|
Large cache may even be within the chip (Pentium
Pro) |
|
They are usually faster than the rest of memory |
|
They are usually more expensive than the rest
of memory |
|
Rule of thumb: a processor using a cache with a
95% hit rate will run as if all memory is the same speed as the cache |
|
Memory management hardware and operating system
support are needed to ensure the contents of the cache and memory are
consistent |
|
|
|
|
|
|
Small cache of expensive but very fast memory
interacting with slower but much larger memory |
|
Invisible to OS and user programs but interact
with other memory management hardware |
|
Processor first checks if word referenced to is
in cache |
|
If not found in cache, a block of memory
containing the word is moved to the cache |
|
|
|
|
Hit ratio = fraction of access where data is in
cache |
|
T1 = access time for fast memory |
|
T2 = access time for slow memory |
|
T2 >> T1 |
|
When hit ratio is close to 1 the average access
time is close to T1 |
|
|
|
|
Memory reference for both instruction and data
tend to cluster over a long period of time. |
|
Example: once a loop is entered, there is
frequent access to a small set of instructions. |
|
|
|
Hence: once a word gets referenced, it is likely
that nearby words will get referenced often in the near future. |
|
Thus, the hit ratio will be close to 1 even for
a small cache. |
|
|
|
|
A portion of main memory used as a buffer to
temporarily to hold data for the disk |
|
Locality of reference also applies here: once a
record gets referenced, it is likely that nearby records will get
referenced often in the near future. |
|
If a record referenced is not in the disk cache,
the sector containing the record is moved into the disk cache. |
|
|
|
|
|
|
|
Caching is a concept that is exploited by
software systems as well |
|
Used for improving disk input/output subsystems
(such as a distributed file system) |
|
When reading from a disk read a full buffers
worth of data even if the program requests less |
|
On the next read, the data may already be
available in the buffer |
|
Used within application programs themselves |
|
Web browser |
|
Keep a copy of the most recently activated HTML
pages so that if the user goes back/forward the page doesn’t have to be
retrieved again |
|
|
|
|
|
|
|
|
Input/Output (IO) components permit computer
systems to interact with the real world via IO interactions |
|
What is the typical model for interacting with
an I/O component? |
|
|
|
|
|
|
|
|
|
Subset of the processor’s registers are
classified as control, status, and data registers for the device |
|
Writing to control registers influences how IO
interactions are performed, or causes them |
|
Reading data registers obtains input |
|
Writing to data registers causes output |
|
Status registers are read only and give status
of device |
|
Busy, ready, in an error mode |
|
|
|
|
|
Data to/from system bus are buffered in data
register(s) |
|
Status/Control register(s) holds |
|
current status information |
|
current control information from |
|
I/O logic interact with CPU via control bus |
|
Contains logic specific to the interface of each
device |
|
|
|
|
|
|
3 techniques are possible for I/O operation |
|
Programmed I/O |
|
Does not use interrupts: CPU has to wait for
completion of each I/O operation |
|
Interrupt-driven I/O |
|
CPU can execute code during I/O operation: it
gets interrupted when I/O operation is done. |
|
Direct Memory Access |
|
A block of data is transferred directly from/to
memory without going through CPU |
|
|
|
|
I/O module performs the action, on behalf of the
processor |
|
|
|
But the I/O module does not interrupt the CPU
when I/O is done |
|
|
|
Processor is kept busy checking status of I/O
module |
|
|
|
|
WRITE transfer control to the printer driver
(I/O pgm) |
|
I/O pgm prepare I/O module for printing (4) |
|
CPU has to WAIT for I/O command to complete |
|
Long wait for a printer |
|
I/O pgm finishes in (5) and report status of
operation |
|
User program is on hold until WRITE terminates:
not very efficient! |
|
Better: do something else while I/O operation
completes |
|
|
|
|
Processor is interrupted when I/O module ready
to exchange data |
|
Processor is free to do other work |
|
No needless waiting |
|
Consumes a lot of processor time because every
word read or written passes through the processor |
|
|
|
|
CPU issues request to a DMA module (separate
module or incorporated into I/O module) |
|
DMA module transfers a block of data directly to
or from memory (without going through CPU) |
|
An interrupt is sent when the task is complete |
|
The CPU is only involved at the beginning and
end of the transfer |
|
The CPU is free to perform other tasks during
data transfer |
|
|
|
|
|
|
Disk accesses can be caused by: |
|
Specifying: an address for a buffer in memory to
be read/written, the number of bytes, the location on the disk, and the
type of operation (read/write) |
|
Signaling the disk controller that an operation
is required |
|
The disk controller can cause an interrupt when
the operation is complete |
|
Direct memory access |
|
Direct connections can be made between memory
and a disk device |
|
Once a disk is ready to read/write data it can
transfer it directly to/from memory without the help of the processor |
|
Provides for a more efficient use of the
processor |
|
Note: the processor and disk can now compete for
the memory and possibly for the bus/lines connected to the memory |
|
|
|
|
|
|
Interrupt controller circuitry is used to
arbitrate between interrupts from: |
|
Timers |
|
Keyboards |
|
Serial ports |
|
Hard disk controllers |
|
Floppy disk controllers |
|
and Printers |
|
When the processor services the interrupt it
postpones what it is doing to support the interaction |
|
The processing needed to service the interrupt
depends on the component that causes the interrupt |
|
To process an interrupt, the interrupt code
must: |
|
Save the contents of the registers modified to
do the interrupt processing |
|
Do the interrupt processing |
|
Restore the registers before leaving |
|
|
|
|
Is a program that determines nature of the
interrupt and performs whatever actions are needed |
|
Control is transferred to this program |
|
Control must be transferred back to the
interrupted program so that it can be resumed from the point of
interruption |
|
This point of interruption can occur anywhere in
the program |
|
Thus: must save the state of the program
(content of PC + PSW + registers + ...) |
|
|
|
|
|
I/O pgm prepares the I/O module and issues the
I/O command (eg: to printer) |
|
I/O pgm branches to user pgm |
|
User code gets executed during I/O operation
(eg: printing): no waiting |
|
User pgm gets interrupted (x) when I/O operation
is done and branches to interrupt handler to examine status of I/O module |
|
Execution of user code resumes |
|
|
|
|
|
|
|
I/O |
|
signals normal completion of operation or error |
|
Program Exception |
|
overflows |
|
try to execute illegal instruction |
|
reference outside user’s memory space |
|
Timer |
|
preempts a pgm to perform another task |
|
Hardware failure (eg: memory parity error) |
|
|
|
|
Disable interrupts during an interrupt |
|
Interrupts remain pending until the processor
enables interrupts |
|
After interrupt handler routine completes, the
processor checks for additional interrupts |
|
|
|
|
Higher priority interrupts cause lower-priority
interrupts to wait |
|
Causes a lower-priority interrupt handler to be
interrupted |
|
Example: when input arrives from communication
line, it needs to be absorbed quickly to make room for more input |
|
|
|
|
When a program reads a value on an I/O device it
will need to wait for the I/O operation to complete |
|
Interrupts are mostly effective when a single
CPU is shared among several concurrently active processes. |
|
The CPU can then switch to execute another
program when a program waits for the result of the read operation. (more
later) |
|
|
|
|
|
For programming purposes, each processor is
characterized by its register set and its instruction set |
|
Instructions: operations and operands |
|
|
|
|
|
|
|
How does a processor architecture work? |
|
Von-Neuman architecture (repeat 3 steps) |
|
1. Fetch instruction |
|
2. Decode instruction |
|
3. Execute instruction |
|
Older processors did this sequentially |
|
Newer processors have pipelines so that steps
1-3 are done in parallel whenever possible |
|
|
|
|
|
|
|
|
Fetch |
|
How does a processor fetch instructions? |
|
Executes a memory read at the address in the Program
counter (PC) register |
|
Increments the PC to point to the next
instruction location |
|
Instruction fetched from bus and read into the instruction
register (IR) |
|
Instructions act on registers and operands |
|
Processor executes the operation involving its
registers |
|
May cause bus interactions to read and write
operands into registers |
|
|
|
|
|
|
How does a processor get started? |
|
Power is applied |
|
Initializes the PC to contain a specific value |
|
OR |
|
The processor initializes with a memory read
from a specific location |
|
Assumes the value read is the address of the
first instruction to be executed (ie. loads the read value into the PC) |
|
Typically, processors fetch a first instruction
from a ROM that contains a control program |
|
What are the flags for? |
|
Reflect the result of executing instructions |
|
Zero bit: the result of an operation was 0 |
|
Sign bit: the result was positive or negative |
|
Assumes a coding convention for +- values |
|
Carry bit: a carry resulted from an addition |
|
Values of the flags can be tested during some
instructions and influence how the execution proceeds |
|
|
|
|
|
|
What are the basic instruction categories? |
|
1. Data transfers |
|
2. Arithmetic and bit manipulation operations |
|
3. Program control transfer operations |
|
1. Data transfer: transfers data between |
|
memory and registers |
|
between IO components and registers |
|
between registers |
|
Some processors support block transfers to move
more than one word |
|
Ex: MOV destination, source |
|
Move the contents of source to the destination |
|
These instructions do not usually modify flags |
|
|
|
|
|
|
2. Arithmetic and bit manipulation operations |
|
Add, subtract, multiply, divide |
|
Ex: ADD destination, source |
|
Add the contents of source to the contents of
destination and leave the result in destination |
|
Modify the flags as appropriate |
|
Bit manipulation |
|
Logical (and, or, xor, not) |
|
Shift (left, right, logical/arithmetic) |
|
Rotate (left, right) |
|
Flag manipulation (set, clear, complement) |
|
Ex: SHR destination, count |
|
Shift the contents of the destination right the
number of bits specified in the count |
|
For each bit shifted, zero is shifted into the
most significant bit (MSB) |
|
Modify flags as is appropriate |
|
|
|
|
|
|
3. Program control transfer |
|
Determine where the next instruction is located
(sequential transfer is implicit in Von-Neuman Architecture) |
|
Branch instructions |
|
Unconditional |
|
Branch to a location (or offset from this
location) |
|
Conditional |
|
Test flags, transfer under some condition |
|
Iteration |
|
Perform arithmetic modification to a register
(increment or decrement) |
|
Conditionally transfer based on result |
|
Subroutine support |
|
Call/return (unconditional and conditional) |
|
HALT to terminate a program |
|
|
|
|
unsigned int R0, R1, R2; Register set for processor |
|
unsigned int M[64K]; Memory |
|
R0 = M[0]; Move |
|
R1 = 5; Move |
|
R2 = R2 + R1; -- too complicated |
|
R2 = R0; Move |
|
R2 += R1; Add |
|
If R2 != 0 Goto End Branch |
|
R2 = 0; Move |
|
M[0] = R2; Move |
|
End: HALT |
|