Pipeline r014

PIPELINING AND I/O
ORGANISATION

PIPELINING
A technique of decomposing a sequential process
into suboperations, with each subprocess being
executed in a special dedicated segment that
operates concurrently with all other segments.

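Example: compute Ai * Bi + Ci for i = 1, 2, 3, ..., 7

Segment 1:  R1 ← Ai, R2 ← Bi             Load Ai and Bi
Segment 2:  R3 ← R1 * R2, R4 ← Ci        Multiply and load Ci
Segment 3:  R5 ← R3 + R4                 Add

[Figure: three-segment pipeline. Ai and Bi are read from memory into R1 and R2 (Segment 1); the multiplier output is stored in R3 while Ci is read from memory into R4 (Segment 2); the adder output is stored in R5 (Segment 3).]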
ARITHMETIC PIPELINING

OPERATIONS IN EACH PIPELINE STAGE
Clock pulse   Segment 1    Segment 2         Segment 3
number        R1    R2     R3        R4      R5
1             A1    B1     -         -       -
2             A2    B2     A1 * B1   C1      -
3             A3    B3     A2 * B2   C2      A1 * B1 + C1
4             A4    B4     A3 * B3   C3      A2 * B2 + C2
5             A5    B5     A4 * B4   C4      A3 * B3 + C3
6             A6    B6     A5 * B5   C5      A4 * B4 + C4
7             A7    B7     A6 * B6   C6      A5 * B5 + C5
8             -     -      A7 * B7   C7      A6 * B6 + C6
9             -     -      -         -       A7 * B7 + C7
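The register contents in the table above can be reproduced with a short simulation. This is only a sketch: the function name run_pipeline and the trace layout are assumptions made for illustration, and None stands for a register that holds no valid data yet.

```python
def run_pipeline(a, b, c):
    """Simulate the three-segment pipeline computing a[i] * b[i] + c[i].

    Returns (results, trace); each trace entry is
    (clock_pulse, R1, R2, R3, R4, R5) after that pulse.
    """
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = None
    results, trace = [], []
    for pulse in range(1, n + 3):          # n + 2 clock pulses in total
        # All registers load on the same clock pulse, so every new value
        # is computed from the register contents before the pulse.
        new_r5 = r3 + r4 if r3 is not None else None          # segment 3
        if r1 is not None:                                     # segment 2
            new_r3, new_r4 = r1 * r2, c[pulse - 2]
        else:
            new_r3 = new_r4 = None
        if pulse <= n:                                         # segment 1
            new_r1, new_r2 = a[pulse - 1], b[pulse - 1]
        else:
            new_r1 = new_r2 = None
        r1, r2, r3, r4, r5 = new_r1, new_r2, new_r3, new_r4, new_r5
        trace.append((pulse, r1, r2, r3, r4, r5))
        if r5 is not None:
            results.append(r5)
    return results, trace


results, trace = run_pipeline([1, 2, 3, 4, 5, 6, 7], [2] * 7, [10] * 7)
for row in trace:
    print(row)
```

With seven operand sets the loop runs for nine clock pulses, and the first result appears in R5 on the third pulse, as in the table.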

GENERAL PIPELINE
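General Structure of a 4-Segment Pipeline
[Figure: the input passes through segments S1, S2, S3 and S4 in turn; each segment Si is followed by a register Ri, and all registers are driven by a common clock.]

Space-Time Diagram
Clock cycles:  1   2   3   4   5   6   7   8   9
Segment 1      T1  T2  T3  T4  T5  T6
Segment 2          T1  T2  T3  T4  T5  T6
Segment 3              T1  T2  T3  T4  T5  T6
Segment 4                  T1  T2  T3  T4  T5  T6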
The behavior of a pipeline can be illustrated with a space-time diagram,
which shows the segment utilization as a function of time.

Space-Time Diagram
• The horizontal axis displays the time in clock cycles
and the vertical axis gives the segment number.
• The diagram shows six tasks (T1 to T6) executed in four
segments.
• A task is defined as the total operation performed in going
through all the segments of the pipeline.

Consider
• A k-segment pipeline with clock cycle time tp is used to execute n tasks.
• The first task T1 requires a time equal to k tp to complete its operation,
since there are k segments in the pipe.
• The remaining n - 1 tasks emerge from the pipe at the rate of one
task per clock cycle, and they are completed after a time equal to
(n - 1) tp.
• Therefore, completing n tasks using a k-segment pipeline
requires k + (n - 1) clock cycles.
• Example: with 4 segments and 6 tasks, the time required to complete
all operations is 4 + (6 - 1) = 9 clock cycles.
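The cycle count above can be checked by building the space-time diagram programmatically. The sketch below assumes one task enters the pipe per clock cycle with no stalls; the function name space_time_diagram is hypothetical.

```python
def space_time_diagram(k, n):
    """Return one row per segment; each row lists the task occupying that
    segment in every clock cycle ("" when the segment is idle)."""
    cycles = k + n - 1
    rows = []
    for segment in range(1, k + 1):
        row = []
        for cycle in range(1, cycles + 1):
            # Task t enters segment s in clock cycle t + s - 1.
            task = cycle - segment + 1
            row.append(f"T{task}" if 1 <= task <= n else "")
        rows.append(row)
    return rows


for number, row in enumerate(space_time_diagram(4, 6), start=1):
    print(f"Segment {number}: " + " ".join(f"{cell:<3}" for cell in row))
```

For k = 4 and n = 6 every row has 9 entries, which agrees with 4 + (6 - 1) = 9 clock cycles.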

• Consider a nonpipeline unit that performs the same operation and takes a
time equal to tn to complete each task.
• The total time required for n tasks is n tn.
• The speedup of pipeline processing over equivalent nonpipeline
processing is defined by the ratio
• S = n tn / ((k + n - 1) tp)
• As the number of tasks increases, n becomes much larger than k - 1, and
k + n - 1 approaches the value of n. Under this condition, the speedup
becomes S = tn / tp
• If we assume that the time it takes to process a task is the same in
the pipeline and nonpipeline circuits, then tn = k tp
• With this assumption, the speedup reduces to S = k tp / tp = k
• This shows that the theoretical maximum speedup that a pipeline can
provide is k, where k is the number of segments in the pipeline.
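The speedup ratio can be evaluated numerically to see how it approaches k. In this sketch the function name speedup and the timing values (tp = 20 ns, k = 4, tn = k tp = 80 ns) are illustrative assumptions, not figures from the slides.

```python
def speedup(n, k, tn, tp):
    """Speedup of a k-segment pipeline over a nonpipeline unit for n tasks:
    S = n*tn / ((k + n - 1) * tp)."""
    return (n * tn) / ((k + n - 1) * tp)


K = 4            # number of segments (assumed)
TP = 20          # pipeline clock cycle time in ns (assumed)
TN = K * TP      # nonpipeline time per task, using tn = k*tp

for n in (6, 100, 10_000, 1_000_000):
    print(n, round(speedup(n, K, TN, TP), 4))
```

The printed values rise toward 4 as n grows but never reach it, which matches the statement that k is the theoretical maximum.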

Multiple Functional Units
[Figure: four identical units P1, P2, P3 and P4 operating in parallel on instructions Ii, Ii+1, Ii+2 and Ii+3 respectively.]

ARITHMETIC PIPELINE
Floating-point adder
[1] Compare the exponents
[2] Align the mantissa
[3] Add/sub the mantissa
[4] Normalize the result
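X = A x 2^a
Y = B x 2^b

Segment 1: Compare the exponents a and b by subtraction
Segment 2: Choose the exponent and align the mantissas
Segment 3: Add or subtract the mantissas
Segment 4: Normalize the result and adjust the exponent

[Figure: pipeline for floating-point addition and subtraction. The exponents a, b and the mantissas A, B enter through input registers R; the exponent difference from Segment 1 controls the mantissa alignment in Segment 2; a register R separates each pair of adjacent segments.]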
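The four suboperations listed above can be traced in software. The sketch below is a behavioral model only, not the hardware design: it assumes mantissas are Python floats, exponents are integers, and the result is normalized so that 0.5 <= |mantissa| < 1. The function name fp_add is hypothetical.

```python
def fp_add(a_mant, a_exp, b_mant, b_exp):
    """Add X = a_mant * 2**a_exp and Y = b_mant * 2**b_exp,
    following the four pipeline segments one after another."""
    # Segment 1: compare the exponents by subtraction.
    diff = a_exp - b_exp

    # Segment 2: choose the larger exponent and align the other mantissa
    # by shifting it right |diff| places.
    if diff >= 0:
        exp = a_exp
        b_mant = b_mant / 2 ** diff
    else:
        exp = b_exp
        a_mant = a_mant / 2 ** (-diff)

    # Segment 3: add the mantissas (a negative mantissa gives subtraction).
    mant = a_mant + b_mant

    # Segment 4: normalize the result and adjust the exponent.
    if mant == 0:
        return 0.0, 0
    while abs(mant) >= 1:        # overflow: shift right, increment exponent
        mant /= 2
        exp += 1
    while abs(mant) < 0.5:       # underflow: shift left, decrement exponent
        mant *= 2
        exp -= 1
    return mant, exp


# 0.75 * 2**3 + 0.5 * 2**2 = 6 + 2 = 8 = 0.5 * 2**4
print(fp_add(0.75, 3, 0.5, 2))
```

In a real pipeline each segment would work on a different pair of operands during the same clock cycle; the sequential code only shows what each segment contributes.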

ARITHMETIC PIPELINE
Reasons why a pipeline cannot operate at its maximum theoretical rate
• Different segments take different times to complete their
suboperations.
• The clock cycle must be chosen to equal the time delay of the segment
with the maximum propagation time.
• This causes all other segments to waste time while waiting for
the next clock pulse.
• Moreover, it is not always correct to assume that a nonpipeline
circuit has the same time delay as that of an equivalent pipeline
circuit.
• Many of the intermediate registers are not required in a single-unit
circuit, which can usually be constructed entirely as a combinational circuit.
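The effect of unequal segment delays can be illustrated with a small calculation. The delay values below are hypothetical, chosen only for illustration, and the model assumes that the nonpipeline circuit needs a single output register while the pipeline needs one register per segment.

```python
def clock_period(segment_delays, register_delay):
    """The pipeline clock must cover the slowest segment plus its register."""
    return max(segment_delays) + register_delay


def limiting_speedup(segment_delays, register_delay):
    """Speedup tn / tp for a large number of tasks."""
    # Assumption: the nonpipeline unit is one combinational circuit
    # followed by one output register.
    tn = sum(segment_delays) + register_delay
    tp = clock_period(segment_delays, register_delay)
    return tn / tp


delays = [60, 70, 100, 80]   # hypothetical segment delays in ns
t_reg = 10                   # hypothetical register delay in ns

print(clock_period(delays, t_reg))
print(round(limiting_speedup(delays, t_reg), 2))
```

With these numbers the four-segment pipeline gains noticeably less than a factor of 4, because three of the segments sit idle for part of every clock cycle.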

4-STAGE FLOATING POINT ADDER
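A = a x 2^p, B = b x 2^q
C = A + B = c x 2^r = d x 2^s
(r = max(p, q), 0.5 ≤ d < 1)

[Figure: four-stage (S1 to S4) floating-point adder. In data-flow order: the exponent subtractor takes p and q and produces t = |p - q| and r = max(p, q); the fraction selector routes the fraction with min(p, q) to the right shifter and the other fraction directly to the fraction adder; the fraction adder produces c; the leading zero counter and left shifter normalize c to d; the exponent adder combines r with the shift count to produce s.]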

INSTRUCTION CYCLE
Six Phases* in an Instruction Cycle
[1] Fetch an instruction from memory
[2] Decode the instruction
[3] Calculate the effective address of the operand
[4] Fetch the operands from memory
[5] Execute the operation
[6] Store the result in the proper place
* Some instructions skip some phases
* Effective address calculation can be done as
part of the decoding phase
* Storage of the operation result into a register
is done automatically in the execution phase
==> 4-Stage Pipeline
[1] FI: Fetch an instruction from memory
[2] DA: Decode the instruction and calculate
the effective address of the operand
[3] FO: Fetch the operand
[4] EX: Execute the operation
Instruction Pipeline

INSTRUCTION PIPELINE
Execution of Three Instructions in a 4-Stage Pipeline

Conventional
i      FI DA FO EX
i+1                FI DA FO EX
i+2                            FI DA FO EX

Pipelined
i      FI DA FO EX
i+1       FI DA FO EX
i+2          FI DA FO EX

INSTRUCTION EXECUTION IN A 4-STAGE
PIPELINE
Step:              1  2  3  4  5  6  7  8  9  10 11 12 13
Instruction  1     FI DA FO EX
             2        FI DA FO EX
   (Branch)  3           FI DA FO EX
             4              FI -  -  FI DA FO EX
             5                       FI DA FO EX
             6                          FI DA FO EX
             7                             FI DA FO EX

[Flowchart: four-segment instruction pipeline]
Segment 1: Fetch instruction from memory
Segment 2: Decode instruction and calculate effective address
           Branch? yes: update PC, empty pipe, return to Segment 1
                   no:  continue to Segment 3
Segment 3: Fetch operand from memory
Segment 4: Execute instruction
           Interrupt? yes: interrupt handling, update PC, empty pipe, return to Segment 1
                      no:  update PC, return to Segment 1
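The effect of a branch on the timing can be sketched as follows. The model assumes, as in the diagram above, that the instruction fetched right after a branch is discarded and that the next useful fetch happens one step after the branch finishes its EX phase. The function names are hypothetical.

```python
STAGES = ["FI", "DA", "FO", "EX"]


def fetch_steps(n_instructions, branch_at=None):
    """Return the step in which each instruction is (finally) fetched.

    branch_at is the 1-based number of the branch instruction, or None.
    """
    steps = []
    next_fetch = 1
    for number in range(1, n_instructions + 1):
        steps.append(next_fetch)
        if number == branch_at:
            # The branch executes in step next_fetch + 3; the pipe is
            # emptied and the following instruction is refetched afterwards.
            next_fetch += len(STAGES)
        else:
            next_fetch += 1
    return steps


def total_steps(n_instructions, branch_at=None):
    """Step in which the last instruction completes its EX phase."""
    return fetch_steps(n_instructions, branch_at)[-1] + len(STAGES) - 1


print(fetch_steps(7, branch_at=3))
print(total_steps(7, branch_at=3), total_steps(7))
```

Seven instructions with a branch at instruction 3 take 13 steps instead of the 10 needed without a branch, so the branch costs three steps in this model.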

RISC PIPELINE
RISC
- Machine with a very fast clock cycle that
executes at the rate of one instruction per cycle
- This rate is made possible by:
    Simple Instruction Set
    Fixed Length Instruction Format
    Register-to-Register Operations

Instruction Cycles of Three-Stage Instruction Pipeline
Data Manipulation Instructions
    I: Instruction Fetch
    A: Decode, Read Registers, ALU Operations
    E: Write a Register
Load and Store Instructions
    I: Instruction Fetch
    A: Decode, Evaluate Effective Address
    E: Register-to-Memory or Memory-to-Register
Program Control Instructions
    I: Instruction Fetch
    A: Decode, Evaluate Branch Address
    E: Write Register (PC)

DELAYED LOAD
Three-segment pipeline timing

LOAD:  R1 ← M[address 1]
LOAD:  R2 ← M[address 2]
ADD:   R3 ← R1 + R2
STORE: M[address 3] ← R3

Pipeline timing with data conflict
clock cycle   1  2  3  4  5  6
Load R1       I  A  E
Load R2          I  A  E
Add R1+R2           I  A  E
Store R3               I  A  E

The Add instruction uses R2 in its A segment during clock cycle 4, but
the second Load is still transferring the operand into R2 in its E
segment during that same cycle, so R2 does not yet hold the correct value.

Pipeline timing with delayed load
clock cycle   1  2  3  4  5  6  7
Load R1       I  A  E
Load R2          I  A  E
NOP                 I  A  E
Add R1+R2              I  A  E
Store R3                  I  A  E

The data dependency is taken care of by the compiler rather
than the hardware: the compiler inserts a no-operation (NOP)
instruction after the load.

Example
address 1 = 2000, address 2 = 2001, address 3 = 2003
M[2000] = 5
M[2001] = 8
R1 = 5
R2 = 8
R3 = 5 + 8 = 13
M[2003] ← R3, so M[2003] = 13
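The compiler's part in the delayed load can be sketched as a pass over the instruction list. This is a minimal illustration, not a real compiler pass: the tuple layout (operation, destination register, source registers) and the function name are assumptions, and only the hazard between a load and the instruction immediately after it is handled.

```python
def insert_delayed_loads(program):
    """Insert a NOP after every LOAD whose destination register is read
    by the instruction that immediately follows it."""
    padded = []
    for index, (op, dest, sources) in enumerate(program):
        padded.append((op, dest, sources))
        is_last = index == len(program) - 1
        if op == "LOAD" and not is_last and dest in program[index + 1][2]:
            padded.append(("NOP", None, ()))
    return padded


program = [
    ("LOAD", "R1", ()),              # R1 <- M[address 1]
    ("LOAD", "R2", ()),              # R2 <- M[address 2]
    ("ADD", "R3", ("R1", "R2")),     # R3 <- R1 + R2
    ("STORE", None, ("R3",)),        # M[address 3] <- R3
]

padded = insert_delayed_loads(program)
SEGMENTS = 3                          # I, A, E
clock_cycles = len(padded) + SEGMENTS - 1
print([op for op, _, _ in padded], clock_cycles)
```

One NOP is inserted after the second load, giving five instructions and seven clock cycles, the same as the delayed-load timing.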

DELAYED BRANCH
Compiler analyzes the instructions before and after
the branch and rearranges the program sequence by
inserting useful instructions in the delay steps

Using no-operation instructions
Clock cycles:     1  2  3  4  5  6  7  8  9  10
1. Load           I  A  E
2. Increment         I  A  E
3. Add                  I  A  E
4. Subtract                I  A  E
5. Branch to X                I  A  E
6. NOP                           I  A  E
7. NOP                              I  A  E
8. Instr. in X                         I  A  E

Rearranging the instructions
Clock cycles:     1  2  3  4  5  6  7  8
1. Load           I  A  E
2. Increment         I  A  E
3. Branch to X          I  A  E
4. Add                     I  A  E
5. Subtract                   I  A  E
6. Instr. in X                   I  A  E
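The two ways of handling the branch delay can be compared with a short sketch. It assumes two delay steps after the branch (as in the three-segment pipeline above) and that the instructions moved behind the branch do not affect the branch itself; a real compiler would have to verify that. All function names are hypothetical.

```python
def pad_with_nops(before_branch, branch, delay_slots=2):
    """Fill the delay steps after the branch with no-operation instructions."""
    return before_branch + [branch] + ["NOP"] * delay_slots


def fill_delay_slots(before_branch, branch, delay_slots=2):
    """Move the last instructions before the branch into the delay steps.

    Assumes those instructions are independent of the branch; any delay
    step that cannot be filled still receives a NOP.
    """
    split = max(0, len(before_branch) - delay_slots)
    moved = before_branch[split:]
    nops = ["NOP"] * (delay_slots - len(moved))
    return before_branch[:split] + [branch] + moved + nops


def cycles_until_target_done(sequence, segments=3):
    """Clock cycles until the first instruction at the branch target
    has passed through all segments."""
    return len(sequence) + 1 + segments - 1


before = ["Load", "Increment", "Add", "Subtract"]
with_nops = pad_with_nops(before, "Branch to X")
rearranged = fill_delay_slots(before, "Branch to X")
print(with_nops, cycles_until_target_done(with_nops))
print(rearranged, cycles_until_target_done(rearranged))
```

Rearranging saves the two clock cycles that the NOP version spends doing no useful work (8 cycles instead of 10).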
