What is Assembly Language?
Introduction to the GNU/Linux
assembler and linker
for Intel Pentium processors
High-Level Language
• Most programming nowadays is done using
so-called “high-level” languages (such as
FORTRAN, BASIC, COBOL, PASCAL, C,
C++, JAVA, SCHEME, Lisp, ADA, etc.)
• These languages deliberately “hide” from
a programmer many details concerning
HOW his problem actually will be solved
by the underlying computing machinery
The BASIC language
• Some languages allow programmers to
forget about the computer completely!
• The language can express a computing
problem with a few words of English, plus
formulas familiar from high-school algebra
• EXAMPLE PROBLEM: Compute 4 plus 5
The example in BASIC
1 LET X = 4
2 LET Y = 5
3 LET Z = X + Y
4 PRINT X; "+"; Y; "="; Z
5 END
Output: 4 + 5 = 9
The C language
• Other high-level languages do require a small
amount of awareness by the program-author of
how a computation is going to be processed
• For example, that:
- the main program will get “linked” with a
“library” of other special-purpose subroutines
- instructions and data will get placed into
separate sections of the machine’s memory
- variables and constants get treated differently
- data items have specific space requirements
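The last point can be made concrete with a small C sketch. It assumes nothing beyond standard C; the helper names are hypothetical. On a typical Linux system with gcc an int occupies 4 bytes, but the language itself only guarantees a minimum size, so the code asks the compiler rather than hard-coding a number.

```c
#include <stddef.h>

/* Hypothetical helper: how many bytes of memory one int occupies
   on the machine this code is compiled for. */
size_t int_size_in_bytes(void)
{
    return sizeof(int);
}

/* Hypothetical helper: bytes needed to hold three int variables,
   such as the x, y and z used in the 4 + 5 example. */
size_t example_storage_in_bytes(void)
{
    return 3 * sizeof(int);
}
```

Calling these from a main() and printing the results with printf("%zu") shows the space a compiler reserves for each data item.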
Same example: rewritten in C
#include <stdio.h> // needed for printf()
int x = 4, y = 5; // initialized variables
int z; // uninitialized variable
int main()
{
z = x + y;
printf( "%d + %d = %d \n", x, y, z );
}
“ends” versus “means”
• Key point: high-level languages let programmers
focus attention on the problem to be solved, and
not spend effort thinking about details of “how” a
particular piece of electrical machinery is going to
carry out the pieces of a desired computation
• Key benefit: their problem gets solved sooner
(because their program can be written faster)
• Programmers don’t have to know very much
about how a digital computer actually works
computer scientist vs. programmer
• But computer scientists DO want to know
how computers actually work:
-- so we can fix computers if they break
-- so we can employ optimum algorithms
-- so we can predict computer behavior
-- so we can devise faster computers
-- so we can build cheaper computers
-- so we can pick one suited to a problem
A machine’s own language
• For understanding how computers work,
we need familiarity with the computer’s
own language (called “machine language”)
• It’s a LOW-LEVEL language (very detailed)
• It is specific to a machine’s “architecture”
• It is a language “spoken” using voltages
• Humans represent it with zeros and ones
Example of machine-language
Here’s what a program-fragment looks like:
10100001 10111100 10010011 00000100
00001000 00000011 00000101 11000000
10010011 00000100 00001000 10100011
11000000 10010100 00000100 00001000
It means: z = x + y;
Incomprehensible?
• Though possible, it is extremely difficult,
tedious (and error-prone) for humans to
read and write “raw” machine-language
• When unavoidable, a special notation can
help (called hexadecimal representation):
A1 BC 93 04 08
03 05 C0 93 04 08
A3 C0 94 04 08
• But still this looks rather meaningless!
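The link between the binary and hexadecimal listings can be checked mechanically: each group of eight bits is one byte, written as two hexadecimal digits. The following is a minimal sketch with hypothetical helper names, not part of the original slides.

```c
#include <stdio.h>

/* Convert a string of exactly eight '0'/'1' characters into the byte
   value it denotes. Returns -1 if the string is malformed. */
int byte_from_binary(const char *bits)
{
    int value = 0;
    int i;
    for (i = 0; i < 8; i++) {
        if (bits[i] != '0' && bits[i] != '1')
            return -1;          /* also catches a string that is too short */
        value = value * 2 + (bits[i] - '0');
    }
    return bits[8] == '\0' ? value : -1;   /* reject a string that is too long */
}

/* Write the two-digit uppercase hexadecimal form of the byte into out.
   Returns 0 on success, -1 if the bit string is malformed. */
int hex_from_binary(const char *bits, char out[3])
{
    int value = byte_from_binary(bits);
    if (value < 0)
        return -1;
    snprintf(out, 3, "%02X", (unsigned)value);
    return 0;
}
```

For instance, passing the first byte of the fragment, "10100001", to hex_from_binary fills the buffer with "A1", the first entry of the hexadecimal listing.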
Hence assembly language
• There are two key ideas:
-- mnemonic opcodes: we use abbreviations of
English language words to denote operations
-- symbolic addresses: we invent “meaningful”
names for memory storage locations we need
• These make machine-language understandable
to humans – if they know their machine’s design
• Let’s see our example-program, rewritten using
actual “assembly language” for Intel’s Pentium
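A symbolic address is a name standing in for a number that the hexadecimal listing already contains. On x86 the four address bytes that follow an opcode are stored least-significant byte first (little-endian). The sketch below, with a hypothetical helper name, shows how the bytes BC 93 04 08 after the A1 opcode combine into one 32-bit address; that this address belongs to x is inferred from the fragment's stated meaning, z = x + y.

```c
#include <stdint.h>

/* Hypothetical helper: combine four bytes stored least-significant-first
   (little-endian, as on x86) into one 32-bit address. */
uint32_t address_from_le_bytes(const uint8_t b[4])
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```

Applied to the three instructions of the fragment, the operand bytes give 0x080493BC, 0x080493C0 and 0x080494C0, the numeric locations that an assembly programmer gets to write simply as names.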
Simplified Block Diagram
[Diagram: the Central Processing Unit, Main Memory, and four I/O devices, all connected by the system bus]
Pentium’s visible “registers”
• Four general-purpose registers:
eax, ebx, ecx, edx
• Four memory-addressing registers:
esp, ebp, esi, edi
• Six memory-segment registers:
cs, ds, es, fs, gs, ss
• An instruction-pointer and a flags register:
eip, eflags
The sixteen x86 registers
EAX ESP
EBX EBP
ECX ESI
EDX EDI
EIP EFLAGS
CS DS ES FS GS SS
Intel Pentium processor
The “Fetch-Execute” Cycle
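[Diagram: the central processor, whose registers include EIP, ESP, and general-purpose registers such as EAX, is connected by the system bus to main memory, which holds the program instructions (TEXT), the program variables (DATA), and temporary storage (STACK)]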
Define symbolic constants
.equ device_id, 1
.equ sys_write, 4
.equ sys_exit, 1
our program’s ‘data’ section
.section .data
x: .int 4
y: .int 5
fmt: .asciz "%d + %d = %d \n"
Our program’s ‘bss’ section
.section .bss
z: .int 0
n: .int 0
buf: .space 80
our program’s ‘text’ section
.section .text
_start:
# comment: assign z = x + y
movl x, %eax
addl y, %eax
movl %eax, z
‘text’ section (continued)
# comment: prepare program’s output
pushl z # arg 5
pushl y # arg 4
pushl x # arg 3
pushl $fmt # arg 2
pushl $buf # arg 1
call sprintf # function-call
addl $20, %esp # discard the args
movl %eax, n # save return-value
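For comparison, here is roughly what the addition and the sprintf call amount to in C. This is a sketch under one assumption: that the course's own sprintf routine formats %d the way the C library's sprintf does. The function name format_sum is hypothetical. The arguments appear left to right here, whereas the assembly pushes them onto the stack in reverse order, last argument first.

```c
#include <stdio.h>

int x = 4, y = 5;     /* initialized variables ('data' section)        */
int z;                /* uninitialized variable ('bss' section)        */
int n;                /* receives the number of characters formatted   */
char buf[80];         /* 80-byte output buffer, like 'buf: .space 80'  */

/* Hypothetical helper mirroring the assembly: compute z = x + y,
   format the result into buf, and save sprintf's return value in n. */
int format_sum(void)
{
    z = x + y;
    n = sprintf(buf, "%d + %d = %d \n", x, y, z);
    return n;
}
```

The five arguments are each 4 bytes wide in the 32-bit assembly version, which is why it discards them afterwards by adding 20 to the stack pointer.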
‘text’ section (continued)
# comment: request kernel assistance
movl $sys_write, %eax
movl $device_id, %ebx
movl $buf, %ecx
movl n, %edx
int $0x80
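The four registers loaded before int $0x80 correspond to the pieces of a write request on 32-bit Linux: the system-call number, the file descriptor, the buffer address, and the byte count. A C programmer would normally make the same request through the library's write() wrapper. The sketch below uses a hypothetical helper name.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send 'count' bytes starting at 'text' to standard
   output (file descriptor 1, the value the slides call device_id).
   Returns what write() returns: the number of bytes written, or -1. */
ssize_t write_to_stdout(const char *text, size_t count)
{
    return write(STDOUT_FILENO, text, count);
}
```

Passing the formatted buffer and its length, as in write_to_stdout(buf, n), would produce the same output as the assembly fragment.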
‘text’ section (concluded)
# comment: request kernel assistance (terminate the program)
movl $sys_exit, %eax
movl $0, %ebx
int $0x80
# comment: make label visible to linker
.global _start
.end
program translation steps
[Diagram: the program source module (demo.s) passes through assembly to become the program object module (demo.o); linking then combines it with other object modules and object module libraries to produce the executable program (demo)]
The GNU Assembler and Linker
• With Linux you get free software tools for
compiling your own computer programs
• An assembler (named ‘as’): it translates
assembly language (called the ‘source code’)
into machine language (called the ‘object code’)
$ as demo.s -o demo.o
• A linker (named ‘ld’): it combines ‘object’ files
with function libraries (if you know which ones)
How a program is loaded
[Diagram: main memory, spanning addresses 0x00000000 to 0xFFFFFFFF, showing these regions: .text (program instructions), .data (initialized variables), .bss (uninitialized variables), the stack, the runtime libraries, and the kernel’s code and data]
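One way to see these regions on a running system is to print the address of one object from each. The sketch below is illustrative only: the names are hypothetical, the exact addresses vary from run to run on modern systems, and the placement of each variable is what a typical gcc/Linux toolchain does rather than something the C language guarantees.

```c
#include <stdio.h>

int initialized_var = 4;   /* typically placed in .data */
int uninitialized_var;     /* typically placed in .bss  */

/* Hypothetical helper: print one address from each region and
   return the number of lines printed. */
int print_layout(void)
{
    int local_var = 0;     /* lives on the stack */
    int lines = 0;

    /* Converting a function pointer to void * is a common POSIX
       practice, though not strictly guaranteed by ISO C. */
    printf(".text  (instructions)   %p\n", (void *)print_layout);
    lines++;
    printf(".data  (initialized)    %p\n", (void *)&initialized_var);
    lines++;
    printf(".bss   (uninitialized)  %p\n", (void *)&uninitialized_var);
    lines++;
    printf("stack  (local variable) %p\n", (void *)&local_var);
    lines++;
    return lines;
}
```

Calling print_layout() from main() and comparing the four addresses gives a rough picture of where the loader placed each part of the program.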
What must programmer know?
• Needed to use CPU register-names (eax)
• Needed to know space requirements (int)
• Needed to know how stack works (pushl)
• Needed to make symbol global (for linker)
• Needed to understand how to quit (exit)
• And of course how to use system tools:
(e.g., text-editor, assembler, and linker)
Summary
• High-level programming (offers easy and
speedy real-world problem-solving)
• Low-level programming (offers knowledge
and power in utilizing machine capabilities)
• High-level language hides lots of details
• Low-level language reveals the workings
• High-level programs: readily ‘portable’
• Low-level programs: tied to specific CPU
In-class exercise #1
• Download the source-file for ‘demo1’, and
compile it using the GNU C compiler ‘gcc’:
$ gcc demo1.c -o demo1
Website: http://cs.usfca.edu/~cruse/cs210/
• Execute this compiled application using:
$ ./demo1
In-class exercise #2
• Download the two source-files needed for our
‘demo2’ application (i.e., ‘demo2.s’ and
‘sprintf.s’), and assemble them using:
$ as demo2.s -o demo2.o
$ as sprintf.s -o sprintf.o
• Link them using:
$ ld demo2.o sprintf.o -o demo2
• And execute this application using: $ ./demo2
In-class exercise #3
• Use your favorite text-editor (e.g., ‘vi’) to
modify the ‘demo2.s’ source-file, by using
different initialization-values for x and y
• Reassemble your modified ‘demo2.s’ file,
and re-link it with the ‘sprintf.o’ object-file
• Run the modified ‘demo2’ application, and
see if it prints out a result that is correct
In-class exercise #4
• Download the ‘ljpages.cpp’ system-utility
from our class website and compile it:
$ g++ ljpages.cpp -o ljpages
• Execute this utility-program to print your
modified assembly language source-file:
$ ./ljpages demo2.s
• Write your name on the printed hardcopy
and turn it in to your course instructor
Summary of the exercises
Download and compile a high-level program
Assemble and Link a low-level program
Edit and reassemble an assembly program
Print out and turn in your hardcopy
