1
Introduction to DSP Processor
Hans Kuo
hans.kuo@tatung.com
2
OUTLINE
• Introduction to DSP Processor
• C6000 Architecture
• C6000 Memory Map
• Homework 1
Silicon Solutions
• Decision table for designers of real-time systems
  Source: “Choosing the Right Architecture for Real-Time Signal Processing Designs”, Leon Adams, Texas Instruments
• Programmability : GPP > DSP > FPGA > ASIC
• Performance : ASIC > FPGA > DSP > GPP
• Example : Wireless communication
  - GPP : OS, Network Protocol
  - DSP : A/V Codec
  - ASIC, FPGA : Reed-Solomon, Viterbi decoder

Evaluating Category    ASIC  FPGA  DSP  GPP
Programmability           1     4    5    5
Development Cycle         2     3    4    5
Performance               5     5    4    2
Power consumption         4     2    2    2

GPP : general-purpose processor
DSP : digital signal processor
FPGA : field programmable gate array
ASIC : application specific IC
TI Embedded Processors
Microcontrollers
• 16-bit, MSP430: ultra-low power; up to 25 MHz; Flash 1 KB to 256 KB; Analog I/O, ADC, LCD, USB, RF; measurement, sensing, general purpose; $0.49 to $9.00
• 32-bit Real-time, C2000™: fixed & floating point; up to 300 MHz; Flash 32 KB to 512 KB; PWM, ADC, CAN, SPI, I2C; motor control, digital power, lighting, sensing; $1.50 to $20.00
• 32-bit ARM (MCU), ARM M3/M4: industry std, low power; <100 MHz; Flash 64 KB to 1 MB; USB, ENET, ADC, PWM, SPI; host control; $2.00 to $8.00
ARM-Based
• ARM (MPU), ARM9 / Cortex A-8: industry-std core, high-perf GPP; accelerators; MMU; USB, LCD, MMC, EMAC; Linux/WinCE user apps; $8.00 to $35.00
• ARM + DSP, DaVinci / OMAP: industry-std core + DSP for signal proc.; 4800 MMACs / 1.07 DMIPS/MHz; MMU, cache; VPSS, USB, EMAC, MMC; Linux/Win + video, imaging, multimedia; $12.00 to $65.00
DSP
• DSPs, C647x / C64x+ / C674x / C55x: leadership DSP performance; 24,000 MMACS; up to 3 MB L2 cache; 1G EMAC, SRIO, DDR2, PCI-66; comm, WiMAX, industrial/medical imaging; $4.00 to $99.00+
7
DSP Applications
8
Why do we need DSP processors?
• The Sum of Products (SOP), or multiply-accumulate (MAC), is the key element in most DSP algorithms:

Algorithm : Equation
Finite Impulse Response Filter : $y(n) = \sum_{k=0}^{M} a_k \, x(n-k)$
Infinite Impulse Response Filter : $y(n) = \sum_{k=0}^{M} a_k \, x(n-k) + \sum_{k=1}^{N} b_k \, y(n-k)$
Convolution : $y(n) = \sum_{k=0}^{N} x(k) \, h(n-k)$
Discrete Fourier Transform : $X(k) = \sum_{n=0}^{N-1} x(n) \exp[-j(2\pi/N)nk]$
Discrete Cosine Transform : $F(u) = \sum_{x=0}^{N-1} c(u) \, f(x) \cos\left[\frac{\pi}{2N} u (2x+1)\right]$
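As an illustration of why the MAC operation dominates, the FIR equation can be sketched in C as one multiply and one add per tap. This is a minimal sketch, not code from the slides: the function name `fir_sample`, the 16-bit sample type, and the choice to treat samples before the start of the input as zero are all assumptions made for the example.

```c
#include <stddef.h>

/* One output sample of an FIR filter:
 *   y(n) = sum over k of a[k] * x(n - k)
 * 'taps' is the number of coefficients (M + 1 in the slide's notation).
 * Assumption: samples before x[0] are treated as zero, so the sum
 * stops once n - k would become negative. */
long fir_sample(const short *a, size_t taps, const short *x, size_t n)
{
    long acc = 0;
    for (size_t k = 0; k < taps && k <= n; k++) {
        acc += (long)a[k] * x[n - k];   /* one multiply-accumulate (MAC) */
    }
    return acc;
}
```

Every iteration of the loop is exactly one multiplication followed by one addition, which is the pattern a DSP is built to execute quickly.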

9
Hardware vs. Software multiplication
• DSP processors are optimized to perform multiplication and addition operations.
• Multiplication and addition are done in hardware, in one cycle.
• Example: 4-bit multiply (unsigned), 1011 x 1110.

Hardware (one cycle):
      1011
    x 1110
  --------
  10011010

Software (shift and add):
      1011
    x 1110
  --------
      0000      Cycle 1
     1011.      Cycle 2
    1011..      Cycle 3
   1011...      Cycle 4
  --------
  10011010      Cycle 5
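The software column of the example corresponds to the classic shift-and-add method. The sketch below shows it in C for 4-bit unsigned operands; the function name `mul4_shift_add` is hypothetical, and the loop iterations are only an analogy for the cycles on the slide, not a cycle-accurate model of any processor.

```c
#include <stdint.h>

/* Shift-and-add multiplication of two unsigned 4-bit values.
 * For every set bit in b, a copy of a shifted left by that bit
 * position (a partial product) is added to the result. */
uint8_t mul4_shift_add(uint8_t a, uint8_t b)
{
    uint8_t product = 0;
    a &= 0x0F;                        /* keep only the low 4 bits */
    b &= 0x0F;
    for (int bit = 0; bit < 4; bit++) {
        if (b & (1u << bit)) {
            product = (uint8_t)(product + (a << bit));
        }
    }
    return product;                   /* at most 15 * 15 = 225, fits in 8 bits */
}
```

With the slide's operands, 1011 (0x0B) times 1110 (0x0E), the partial products are 0, 22, 44 and 88, which sum to 154, i.e. 10011010.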
10
OUTLINE
• Introduction to DSP Processor
• C6000 Architecture
• C6000 Memory Map
• Homework 1
11
C6000 System Block Diagram
[Figure: the CPU (units .D1, .M1, .L1, .S1 with registers A0-A15; units .D2, .M2, .L2, .S2 with registers B0-B15; control registers) connected through the internal buses to internal memory, external memory and the peripherals.]
12
C6000 Central Processing Unit
[Figure: the system block diagram again, showing the CPU: units .D1, .M1, .L1, .S1 with registers A0-A15; units .D2, .M2, .L2, .S2 with registers B0-B15; control registers.]
13
Implementation of Sum of Products (SOP)
• SOP is the key element of most DSP algorithms.
• Let's write the code for this algorithm and, at the same time, discover the C6000 architecture.
• The implementation in this module will be done in assembly.

$$Y = \sum_{n=1}^{N} a_n \cdot x_n = a_1 x_1 + a_2 x_2 + \dots + a_N x_N$$

Two basic operations are required for this algorithm:
(1) Multiplication
(2) Addition
Therefore two basic instructions are required.
14
Multiply (MPY)
The multiplication of a1 by x1 is done in assembly by the following instruction:
    MPY a1, x1, Y
This instruction is performed by a multiplier unit called “.M”.
$$Y = \sum_{n=1}^{N} a_n \cdot x_n = a_1 x_1 + a_2 x_2 + \dots + a_N x_N$$
15
Multiply (.M unit)
$$Y = \sum_{n=1}^{40} a_n \cdot x_n$$
The .M unit performs multiplications in hardware.
    MPY .M a1, x1, Y
16
Addition (.?)
$$Y = \sum_{n=1}^{40} a_n \cdot x_n$$
    MPY .M a1, x1, prod
    ADD .? Y, prod, Y
[Figure: functional units .M and .?]
17
Add (.L unit)
$$Y = \sum_{n=1}^{40} a_n \cdot x_n$$
    MPY .M a1, x1, prod
    ADD .L Y, prod, Y
[Figure: functional units .M and .L]
The C6000 uses registers to hold the operands, so let's change this code.
18
Register File - A
$$Y = \sum_{n=1}^{40} a_n \cdot x_n$$
    MPY .M a1, x1, prod
    ADD .L Y, prod, Y
[Figure: Register File A (A0 to A15, 32 bits wide) connected to the .M and .L units, with a1, x1, prod and Y each held in a register.]
Let us correct this by replacing a1, x1, prod and Y with the registers shown in the figure.
19
Specifying Register Names
$$Y = \sum_{n=1}^{40} a_n \cdot x_n$$
    MPY .M A0, A1, A3
    ADD .L A4, A3, A4
Register File A contains 16 registers (A0-A15), each 32 bits wide.
[Figure: Register File A (A0 to A15, 32 bits wide) connected to the .M and .L units, holding a1, x1, prod and Y.]
20
Data loading
Q: How do we load the operands into the registers?
[Figure: Register File A (A0 to A15, 32 bits wide) connected to the .M and .L units, holding a1, x1, prod and Y.]
21
Load Unit “.D”
Q: How do we load the operands into the registers?
A: The operands are loaded from memory into the registers using the .D unit.
Q: Which instruction(s) can be used for loading operands from memory into the registers?
A: The load instructions (LDB, LDH, LDW, LDDW).
[Figure: the .D unit between data memory and Register File A (A0 to A15, 32 bits wide), alongside the .M and .L units.]
22
Using the Load Instructions
$$Y = \sum_{n=1}^{40} a_n \cdot x_n$$
    LDH .D *A5, A0
    LDH .D *A6, A1
    MPY .M A0, A1, A3
    ADD .L A4, A3, A4
[Figure: the .D unit between data memory and Register File A (A0 to A15, 32 bits wide), alongside the .M and .L units.]
23
Creating a loop
• So far we have implemented the SOP for one tap only, i.e. Y = a1 * x1.
• So let's create a loop so that we can implement the SOP for N taps.
$$Y = \sum_{n=1}^{40} a_n \cdot x_n$$
    LDH .D *A5, A0
    LDH .D *A6, A1
    MPY .M A0, A1, A3
    ADD .L A4, A3, A4
24
Create a label to branch to
$$Y = \sum_{n=1}^{40} a_n \cdot x_n$$
loop    LDH .D *A5, A0
        LDH .D *A6, A1
        MPY .M A0, A1, A3
        ADD .L A4, A3, A4
25
Add a branch instruction, B.
$$Y = \sum_{n=1}^{40} a_n \cdot x_n$$
loop    LDH .D *A5, A0
        LDH .D *A6, A1
        MPY .M A0, A1, A3
        ADD .L A4, A3, A4
        B   .? loop
26
Which unit is used by the B instruction?
The .S unit.
$$Y = \sum_{n=1}^{40} a_n \cdot x_n$$
[Figure: units .S, .M, .L and .D with Register File A (A0 to A15, 32 bits wide) and data memory.]
loop    LDH .D *A5, A0
        LDH .D *A6, A1
        MPY .M A0, A1, A3
        ADD .L A4, A3, A4
        B   .S loop
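For readers more comfortable with C, the loop body can be mirrored line by line. This is a hedged sketch, not the course's code: the function name `sop16` and the variable names are made up for illustration, and the C version adds an index and an iteration count, which the assembly loop at this stage of the slides does not yet have.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of products over n 16-bit coefficient/sample pairs.
 * Each statement in the loop body corresponds to one instruction
 * of the assembly loop; the comments name that instruction. */
int32_t sop16(const int16_t *a, const int16_t *x, size_t n)
{
    int32_t y = 0;                            /* accumulator, role of A4   */
    for (size_t i = 0; i < n; i++) {          /* B .S loop                 */
        int16_t coeff  = a[i];                /* LDH .D *A5, A0            */
        int16_t sample = x[i];                /* LDH .D *A6, A1            */
        int32_t prod = (int32_t)coeff * sample; /* MPY .M A0, A1, A3       */
        y += prod;                            /* ADD .L A4, A3, A4         */
    }
    return y;
}
```

For the 40-tap case on the slides, a call would look like `sop16(a, x, 40)` with two arrays of 40 halfwords.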
27
How can we add more processing power to this processor?
(1) Increase the clock frequency.
(2) Increase the number of processing units.
[Figure: units .S, .M, .L and .D with Register File A (A0 to A15, 32 bits wide) and data memory.]
28
Increase the number of Processing units
[Figure: units .S, .M, .L and .D with Register File A (A0 to A15, 32 bits wide), a second set of units .S2, .M2, .L2 and .D2 with Register File B (B0 to B15, 32 bits wide), and data memory.]
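One way to picture what a second set of units buys is to split the sum of products into two independent partial sums, one per side. The sketch below is only an analogy in plain C, under the assumption that even-indexed taps go to one accumulator and odd-indexed taps to the other; the function name `sop16_two_paths` is hypothetical, and whether the two halves really execute in parallel depends on the compiler and the hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of products using two independent accumulators.
 * ya stands in for work done with register file A,
 * yb for work done with register file B. */
int32_t sop16_two_paths(const int16_t *a, const int16_t *x, size_t n)
{
    int32_t ya = 0;
    int32_t yb = 0;
    size_t i;
    for (i = 0; i + 1 < n; i += 2) {
        ya += (int32_t)a[i]     * x[i];       /* even taps */
        yb += (int32_t)a[i + 1] * x[i + 1];   /* odd taps  */
    }
    if (i < n) {                              /* leftover tap when n is odd */
        ya += (int32_t)a[i] * x[i];
    }
    return ya + yb;                           /* combine the partial sums */
}
```

Because the two accumulations do not depend on each other, nothing forces them to run one after the other, which is the property that duplicated functional units can exploit.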
29
C6211 Instruction Set (by unit)
.S Unit : ADD, ADDK, ADD2, AND, B, CLR, EXT, MV, MVC, MVK, MVKL, MVKH, MVKLH, NEG, NOT, OR, SET, SHL, SHR, SSHL, SUB, SUB2, XOR, ZERO
.M Unit : MPY, MPYH, SMPY, SMPYH
.L Unit : ABS, ADD, AND, CMPEQ, CMPGT, CMPLT, LMBD, MV, NEG, NORM, NOT, OR, SADD, SAT, SSUB, SUB, SUBC, XOR, ZERO
.D Unit : ADD, ADDA, LDB/H/W, MV, NEG, STB/H/W, SUB, SUBA, ZERO
Other : IDLE, NOP
30
C language vs Assembly
Source        Tool                  Efficiency   Effort
C             Compiler Optimizer    70-100%      Low
Linear ASM    Assembly Optimizer    95-100%      Med
ASM           Hand Optimize         100%         High
31
'C6x Peripherals
[Figure: the system block diagram again, showing the peripherals connected to the CPU, internal memory and external memory through the internal buses.]
32
'C6x Peripherals
EMIF (External Memory Interface)
- Glueless access to async/sync memory: EPROM, SRAM, SDRAM, SBSRAM
DMA/EDMA (Enhanced Direct Memory Access)
- 4/16 channels
BOOT
- Boot from 4M external block
- Boot from HPI/XB
McBSP (Multi-Channel Buffered Serial Port)
- High-speed sync serial comm
- T1/E1/MVIP interface
HPI (Host Port Interface) / Expansion Bus (XB)
- 16/32-bit host µP access
Timer/Counters
- Two 32-bit Timer/Counters
[Figure: the 'C6x CPU with EMIF (to external memory), DMA, Boot, McBSP, HPI/XB, Timer and PLL.]
33
OUTLINE
• Introduction to DSP Processor
• C6000 Architecture
• C6000 Memory Map
• Homework 1
• Reference
34
C6000 Memory
[Figure: the system block diagram again, showing internal memory and external memory connected to the CPU and the peripherals through the internal buses.]
35
C6416 Memory Map
Address      Region
0000_0000    Internal memory (L2 cache), 1024KB
0180_0000    On-chip peripherals
6000_0000    EMIFB, external, 64MB x 4
8000_0000    EMIFA, external, 256MB x 4
FFFF_FFFF    End of the address space

Internal Memory
• Unified (data or prog)
• 1024KB
External Memory
• Async (SRAM, ROM, etc.)
• Sync (SBSRAM, SDRAM)
Level 1 Cache
• 16KB Program
• 16KB Data
• Not in map
[Figure: CPU with 16K program (P) and 16K data (D) level 1 caches and the 1024K L2.]
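The map can be read as a lookup from address to region. The sketch below encodes it in C under stated assumptions: the EMIFB and EMIFA window sizes are taken as four times the per-space size on the slide (4 x 64MB and 4 x 256MB), the peripheral window is assumed to run from 0180_0000 up to the start of EMIFB because the slide does not give its size, and the names `region_t` and `c6416_region` are invented for the example.

```c
#include <stdint.h>

typedef enum {
    REGION_INTERNAL,      /* 1024KB internal memory at 0000_0000 */
    REGION_PERIPHERALS,   /* on-chip peripherals from 0180_0000  */
    REGION_EMIFB,         /* external, 4 x 64MB from 6000_0000   */
    REGION_EMIFA,         /* external, 4 x 256MB from 8000_0000  */
    REGION_UNMAPPED       /* anything this sketch does not cover */
} region_t;

/* Classify a 32-bit address against the slide's memory map.
 * Assumption: the peripheral window extends up to the EMIFB base. */
region_t c6416_region(uint32_t addr)
{
    if (addr < 0x00100000u)                        return REGION_INTERNAL;
    if (addr >= 0x01800000u && addr < 0x60000000u) return REGION_PERIPHERALS;
    if (addr >= 0x60000000u && addr < 0x70000000u) return REGION_EMIFB;
    if (addr >= 0x80000000u && addr < 0xC0000000u) return REGION_EMIFA;
    return REGION_UNMAPPED;
}
```

A real device has reserved holes inside these windows, so this is a teaching aid rather than a substitute for the device data sheet.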
36
Memory Allocation
[Figure: C source code -> Compiler -> Assembler -> COFF object file (Text, Data, Bss). The sections of the COFF object file (SECTION: Stack, Heap, Text, Data, Bss) are placed into target memory (MEMORY: ROM, External RAM, Internal RAM; addresses 0x00000 to 0xfffff).]
Memory Layout
MEMORY
{
    ISRAM : origin = 0x00000000, len = 0x00100000
}
SECTIONS
{
    .text > ISRAM
}
37
What is stored in memory?
• Code
• Constants
• Global and static variables
• Local variables
• Dynamic memory
[Figure: memory, addresses 0x00000 to 0xfffff.]
38
How is memory organized?
• text : Code and constant data
• data : Initialized global and static variables
• bss : Uninitialized global and static variables
• stack :
  - Local variables
  - Function return addresses
  - Function arguments
• heap : Dynamic memory
[Figure: memory, addresses 0x00000 to 0xfffff, divided into stack, heap, bss, data and text regions.]
39
How is memory allocated?
#include <stdlib.h>

char *f1(int n);            /* declared before use in main() */

long array[100];
long bufsize = 100;

int main(void) {
    int i;
    char *buf;
    i = 10;
    buf = f1(i);
    return 0;
}

char *f1(int n) {
    int k;
    return malloc(bufsize);
}

Where each item is placed in memory (0x00000 to 0xfffff):
• heap : 100 byte block
• bss : array[100]
• data : bufsize = 100
• text : int main(void) { i=10; buf=f1(i); return(0); } …
• stack : main return address, i, buf, f1 argument n, f1 return address, k
40
Memory Allocation & Deallocation
• How, and when, is memory allocated?
  - Global and static variables = program startup
  - Local variables = function call
  - Dynamic memory = malloc()
• How, and when, is memory deallocated?
  - Global and static variables = program finish
  - Local variables = function return
  - Dynamic memory = free()
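The three lifetimes can be seen side by side in a few lines of standard C. This is a minimal sketch with hypothetical names (`make_buffer`, `release_buffer`, `buffers_made`), not part of the course material.

```c
#include <stdlib.h>
#include <string.h>

/* Static variable: allocated at program startup, lives until the program ends. */
static long call_count = 0;

/* Returns a zero-filled heap block of 'size' bytes, or NULL on failure. */
char *make_buffer(size_t size)
{
    int fill = 0;                 /* local: exists only during this call (stack) */
    char *buf = malloc(size);     /* dynamic: exists until free() is called      */
    call_count++;
    if (buf != NULL) {
        memset(buf, fill, size);
    }
    return buf;                   /* 'fill' and 'buf' vanish here, the block stays */
}

/* Gives the heap block back; after this the pointer must not be used. */
void release_buffer(char *buf)
{
    free(buf);
}

/* Reports how many buffers have been requested so far. */
long buffers_made(void)
{
    return call_count;
}
```

A typical use is `char *b = make_buffer(100); ... release_buffer(b);`. Forgetting the second call leaves the 100 bytes allocated until the program finishes.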
41
When is memory allocated?
#include <stdlib.h>

char *f1(int n);

long array[100];                /* bss : 0 at startup */
long bufsize = 100;             /* data : 100 at startup */

int main(void) {
    int i;                      /* stack : at function call */
    char *buf;
    i = 10;
    buf = f1(i);
    return 0;
}

char *f1(int n) {
    int k;                      /* stack : at function call */
    return malloc(bufsize);     /* heap : 100 bytes at malloc() */
}
42
When is memory deallocated?
#include <stdlib.h>

char *f1(int n);

long array[100];                /* available till termination */
long bufsize = 100;             /* available till termination */

int main(void) {
    int i;                      /* deallocated on return from main() */
    char *buf;
    i = 10;
    buf = f1(i);
    return 0;
}

char *f1(int n) {
    int k;                      /* deallocated on return from f1() */
    return malloc(bufsize);     /* deallocated on free() */
}
43
Sections defined in C6000 compiler
• Initialized sections
  - .cinit : Initial values for global/static variables
  - .const : Global and static string literals
  - .switch : Tables for switch statements
  - .text : Code
• Uninitialized sections
  - .bss : Global and static variables
  - .stack : Stack (local variables, return addresses, arguments)
  - .far : Globals and statics declared far
  - .sysmem : Memory for malloc functions (heap)
44
Example : C6416 DSK
[Figure: C6416 DSK board; labels: 16MB, 512KB.]
45
Example : C6416 DSK
                   Base          Length
Internal Memory    0x00000000    0x00100000 (1024K)
External SDRAM     0x80000000    0x01000000 (16M)
External Flash     0x64000000    0x00080000 (512K)
46
Linker command file (*.cmd)
• MEMORY Directive
  - System memory description
  - Name : origin = address, length = size-in-bytes
MEMORY
{
ISRAM : origin = 0x00000000, len = 0x00100000
SDRAM : origin = 0x80000000, len = 0x01000000
FLASH : origin = 0x64000000, len = 0x00080000
}
47
Linker command file (*.cmd)
• SECTIONS Directive
  - Binding sections to memory
SECTIONS
{
.text > ISRAM
.bss > ISRAM
.cinit > ISRAM
…
}
48
C6416.cmd
-stack 0x400
MEMORY
{
ISRAM : origin = 0x00000000, len = 0x00100000
SDRAM : origin = 0x80000000, len = 0x01000000
FLASH : origin = 0x64000000, len = 0x00080000
}
SECTIONS
{
.text > ISRAM
.bss > ISRAM
.cinit > ISRAM
.stack > ISRAM
…}
49
DSP/BIOS Configuration Tool (*.cdb)
[Screenshot: ISRAM properties, system memory description.]
50
DSP/BIOS Configuration Tool (*.cdb)
[Screenshot: properties, binding sections to memory.]
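Program Cases :
• Case 1 :
Image declared as a local variable:
void main()
{
    int Image[1000];
    ….
}
stack = ?

Image declared as a global variable:
int Image[1000];
void main()
{
    ….
}
stack 0x400 (1024)

Program Cases :
• Case 2 :
Image declared as a local variable:
void main()
{
    double Image[200000];
    ….
}

Image declared as a global variable:
double Image[200000];
void main()
{
    ….
}

Notes for Case 2:
bss > SDRAM
stack 0x400 (1024)
bss < 0x100000 (1024k)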
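The arithmetic behind both cases can be checked with a short C sketch. It assumes the 0x400-byte stack from `-stack 0x400` and the 0x00100000-byte ISRAM from the linker command file, 32-bit `int` elements and 64-bit `double` elements, and it ignores everything else that shares the stack or the internal RAM; the macro and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_SIZE 0x400u        /* from "-stack 0x400": 1024 bytes    */
#define ISRAM_SIZE 0x00100000u   /* internal RAM length: 1024K bytes   */

/* Nonzero if an object of 'bytes' bytes could fit in the stack at all. */
int fits_on_stack(size_t bytes)
{
    return bytes <= STACK_SIZE;
}

/* Nonzero if an object of 'bytes' bytes could fit in internal RAM at all. */
int fits_in_isram(size_t bytes)
{
    return bytes <= ISRAM_SIZE;
}

/* Case 1: int Image[1000] occupies 1000 * 4 = 4000 bytes. */
size_t case1_bytes(void)
{
    return 1000u * sizeof(int32_t);
}

/* Case 2: double Image[200000] occupies 200000 * 8 = 1600000 bytes. */
size_t case2_bytes(void)
{
    return 200000u * sizeof(double);
}
```

Under these assumptions the Case 1 array (4000 bytes) is larger than a 1024-byte stack but fits easily in internal RAM as a global, while the Case 2 array (1,600,000 bytes) is larger than the whole 1024K internal RAM, which is consistent with binding .bss to SDRAM.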
Q&A
