‘C’ for Microcontrollers,
  Just Being Efficient
     Lloyd Moore, President


      Lloyd@CyberData-Robotics.com
       www.CyberData-Robotics.com




       Seattle Robotics Society 9/15/2012
Agenda

  Microcontroller Resources
  Knowing Your Environment
  Memory Usage
  Code Structure
  Optimization
  Summary
Disclaimer
  Some microcontroller techniques necessarily trade one benefit for another – typically gaining lower resource usage at the cost of maintainability
  The point of this presentation is to show various techniques that can be used as needed
  Use these suggestions only when necessary
  Feel free to suggest better solutions as we go along
Microcontroller Resources
  EVERYTHING resides on one die inside one package: RAM, Flash, processor, I/O
  Cost is a MAJOR design consideration
      Typical costs are $0.25 to $25 each (in quantities of 1000s)
  RAM: 16 BYTES to 256 KB typical
  Flash/ROM: 384 BYTES to 1 MB
  Clock speed: 4 MHz to 175 MHz typical
      Much lower in battery-saving modes (32 kHz)
  Bus is 8, 16, or 32 bits wide
  Dedicated on-chip peripherals (MAC, PHY, etc.)
Power Consumption
  Microcontrollers are typically used in battery-operated devices
  Power requirements can be EXTREMELY tight
      Energy harvesting applications
      Long-term battery installations (remote controls, hard-to-reach devices, etc.)
  EVERY instruction executed consumes power, even if you have the time and memory to spare!
Know Your Environment
  Traditionally we ignore hardware details
  Need to tailor code to the hardware available
      Specialized hardware is MUCH more efficient
  Compilers typically have extensions
      Interrupt – marks a function as an ISR
      Memory model – may handle banked memory and/or simultaneously accessible banks
      Multiple data pointers / address generators
  The debugger may use some resources
Memory Usage
    Put constant data into program memory (Flash/ROM)
    Alignment / padding issues
        Typically NOT an issue on 8-bit parts; non-aligned access is OK
    Avoid dynamic memory allocation, even if available
        Takes extra space and processing time
        Memory fragmentation is a big issue
    Use and reuse static buffers
        Reduces parameter-passing overhead
        Allows smaller / faster code due to fewer indirections
        Does bring back overwrite bugs if not done carefully
        More reliable for mission-critical systems
    Use the appropriate variable type
        Don’t use int and double for everything!!
        Affects processing time as well as storage
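A minimal sketch of three of the points above, assuming a toolchain that places `const` objects in Flash (many do; others need an extension such as Keil C51's `code` keyword or avr-gcc's `PROGMEM`). The table, buffer size, and the names `fill_squares` and `work_buf` are hypothetical, chosen only for illustration:

```c
#include <stdint.h>

/* Hypothetical sizes and names for this sketch only. */
#define WORK_BUF_LEN 16u

/* Constant data declared const so the toolchain may place it in Flash/ROM.
 * Table of squares 0..15; every value fits in a uint8_t. */
static const uint8_t k_squares[WORK_BUF_LEN] = {
    0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225
};

/* One statically allocated work buffer, reused instead of malloc(). */
static uint8_t g_work_buf[WORK_BUF_LEN];

/* Fill the shared buffer from the const table and return the number of
 * bytes written (clamped to the buffer size). Every caller shares
 * g_work_buf, so a second call overwrites the first result: this is the
 * "overwrite bug" risk that static buffers bring back. */
uint8_t fill_squares(uint8_t count)
{
    uint8_t i;                     /* uint8_t, not int: native on an 8-bit core */
    if (count > WORK_BUF_LEN) {
        count = WORK_BUF_LEN;
    }
    for (i = 0; i < count; i++) {
        g_work_buf[i] = k_squares[i];
    }
    return count;
}

/* Read-only view of the shared buffer. */
const uint8_t *work_buf(void)
{
    return g_work_buf;
}
```

Usage: `fill_squares(4)` leaves 0, 1, 4, 9 at the start of the buffer; no heap, so no fragmentation and a fixed, link-time-known RAM footprint.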
C99 Datatypes – stdint.h / inttypes.h

  int8_t, int16_t, int32_t, int64_t
  uint8_t, uint16_t, uint32_t, uint64_t

  Avoids the ambiguity of int and unsigned int when moving code between processors of different native size
  Makes code more portable and upgradable over time
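A short sketch of the fixed-width types in use. The function names are hypothetical. The point is that `uint16_t` and `uint32_t` have the same width on an 8-bit 8051 and a 32-bit ARM, whereas plain `int` is typically 16 bits on the former and 32 bits on the latter. `<inttypes.h>` includes `<stdint.h>` and also supplies matching `printf` format macros such as `PRIu32`:

```c
#include <inttypes.h>   /* includes <stdint.h>, adds PRIu32 and friends */
#include <stdio.h>

/* Accumulate 16-bit samples into a 32-bit sum. With plain int the sum
 * could overflow on a compiler where int is only 16 bits. */
uint32_t sum_samples(const uint16_t *samples, uint8_t count)
{
    uint32_t sum = 0;
    while (count--) {
        sum += *samples++;
    }
    return sum;
}

/* Print using the format macro that matches uint32_t on this platform. */
void print_sum(uint32_t sum)
{
    printf("sum = %" PRIu32 "\n", sum);
}
```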
Char vs. Int Increment on 8051
  char cX;
  cX++;
    000A   900000   MOV    DPTR,#cX
    000D   E0       MOVX   A,@DPTR
    000E   04       INC    A
    000F   F0       MOVX   @DPTR,A
  6 bytes of Flash
  4 instructions

  int iX;
  iX++;
    0000   900000   MOV    DPTR,#iX
    0003   E4       CLR    A
    0004   75F001   MOV    B,#01H
    0007   120000   LCALL  ?C?IILDX
  10 bytes of Flash + subroutine overhead
  Many more instruction cycles once the LCALL and its library subroutine are counted
Code Structure
  Count down instead of up
      Testing against zero saves a compare (a subtraction) against the loop limit on most processors
      Some processors have a decrement-and-jump-if-not-zero instruction (e.g. DJNZ on the 8051)
  Pointers vs. array notation
      Pointers are generally better
  Bit shifting
      May not always generate what you think
      The processor may or may not have barrel shifter hardware
      The processor may or may not have separate logical and arithmetic shifts
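The loop and shift points above, sketched in portable C. The function names are hypothetical. Whether the count-down loop actually becomes a DJNZ-style instruction depends on the compiler; the listing is what tells you. The right-shift note reflects the C standard: shifting an unsigned value right is always a logical shift, while right-shifting a negative signed value is implementation-defined.

```c
#include <stdint.h>

/* Counting up: each pass compares i against n (a subtraction). */
uint16_t sum_up(const uint8_t *data, uint8_t n)
{
    uint16_t sum = 0;
    uint8_t i;
    for (i = 0; i < n; i++) {
        sum += data[i];
    }
    return sum;
}

/* Counting down to zero while walking a pointer: the decrement can set
 * the zero flag directly, and data[i] address arithmetic becomes a
 * simple pointer increment. */
uint16_t sum_down(const uint8_t *data, uint8_t n)
{
    uint16_t sum = 0;
    while (n != 0) {
        sum += *data++;
        n--;
    }
    return sum;
}

/* Use unsigned types when bit patterns matter: this shift is logical
 * on every conforming compiler. */
uint8_t high_nibble(uint8_t x)
{
    return (uint8_t)(x >> 4);
}
```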
Shifting Example on 8051
  cX = cX << 3;
    0006   33       RLC    A
    0007   33       RLC    A
    0008   33       RLC    A
    0009   54F8     ANL    A,#0F8H

  cA = 3;
  cX = cX << cA;
    000B   900000   MOV    DPTR,#cA
    000E   E0       MOVX   A,@DPTR
    000F   FE       MOV    R6,A
    0010   EF       MOV    A,R7
    0011   A806     MOV    R0,AR6
    0013   08       INC    R0
    0014   8002     SJMP   ?C0005
    0016            ?C0004:
    0016   C3       CLR    C
    0017   33       RLC    A
    0018            ?C0005:
    0018   D8FC     DJNZ   R0,?C0004

  Constants turn into separate statements
  Variables turn into loops
  Both of these can be one instruction with a barrel shifter
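When the shift count is a small variable, one common workaround (not from the slides) is to replace the runtime shift loop with a lookup in a const table, which the toolchain may place in Flash. A sketch, with hypothetical names, that assumes the count is 0..7 (masked to that range here):

```c
#include <stdint.h>

/* Hypothetical table of single-bit masks: k_bit_mask[n] == 1 << n. */
static const uint8_t k_bit_mask[8] = {
    0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80
};

/* Variable shift: on a core without a barrel shifter this may compile
 * into a loop like the one in the listing above. */
uint8_t bit_mask_shift(uint8_t n)
{
    return (uint8_t)(1u << (n & 7u));
}

/* Same result from one indexed read. */
uint8_t bit_mask_table(uint8_t n)
{
    return k_bit_mask[n & 7u];
}

/* Shift by a constant: result truncated to 8 bits, like cX << 3. */
uint8_t times8(uint8_t x)
{
    return (uint8_t)(x << 3);
}
```

Whether the table wins depends on the part and compiler; read the listing for both versions before committing to one.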
Indexed Array vs. Pointer on M8C
  ucMode = g_Channels[uc_Channel].ucMode;
    01DC   52FC     mov    A,[X-4]
    01DE   5300     mov    [__r1],A
    01E0   5000     mov    A,0
    01E2   08       push   A
    01E3   5100     mov    A,[__r1]
    01E5   08       push   A
    01E6   5000     mov    A,0
    01E8   08       push   A
    01E9   5007     mov    A,7
    01EB   08       push   A
    01EC   7C0000   xcall  __mul16
    01EF   38FC     add    SP,-4
    01F1   5F0000   mov    [__r1],[__rX]
    01F4   5F0000   mov    [__r0],[__rY]
    01F7   060000   add    [__r1],<_g_Channels
    01FA   0E0000   adc    [__r0],>_g_Channels
    01FD   3E00     mvi    A,[__r1]
    01FF   5403     mov    [X+3],A

  ucMode = pChannel->ucMode;
    01ED   5201     mov    A,[X+1]
    01EF   5300     mov    [__r1],A
    01F1   3E00     mvi    A,[__r1]
    01F3   5405     mov    [X+5],A

  Both versions do the same thing
  The pointer version saves 29 bytes of memory AND a call to a 16-bit multiplication routine!
  The pointer version will be at least 4x faster to execute as well, maybe 10x
  Most compilers are not this bad – but you do find some!
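The same comparison in C source form. The `Channel` layout beyond `ucMode`, `NUM_CHANNELS`, and both function names are hypothetical. The indexed loop asks the compiler for base + index × sizeof(Channel) on every access, which on a weak compiler can mean a multiply-routine call as in the M8C listing; the pointer loop computes the address once and then just adds sizeof(Channel). A good optimizing compiler may generate similar code for both, so check the listing.

```c
#include <stdint.h>

/* Hypothetical channel record; only ucMode comes from the slide. */
typedef struct {
    uint8_t  ucMode;
    uint16_t uiPeriod;
    uint8_t  ucFlags;
} Channel;

#define NUM_CHANNELS 4u
Channel g_Channels[NUM_CHANNELS];

/* Indexed form: address recomputed from uc_Channel on each access. */
uint8_t count_mode_indexed(uint8_t mode)
{
    uint8_t uc_Channel, count = 0;
    for (uc_Channel = 0; uc_Channel < NUM_CHANNELS; uc_Channel++) {
        if (g_Channels[uc_Channel].ucMode == mode) {
            count++;
        }
    }
    return count;
}

/* Pointer form: one starting address, then pChannel++ per entry. */
uint8_t count_mode_pointer(uint8_t mode)
{
    const Channel *pChannel = g_Channels;
    uint8_t n = NUM_CHANNELS, count = 0;
    while (n--) {
        if (pChannel->ucMode == mode) {
            count++;
        }
        pChannel++;
    }
    return count;
}
```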
More Code Structure
    Actual parameters are typically passed in registers if available
        Keep functions to fewer than 3 parameters
        Parameters may also be passed on the stack or in a special parameter area
        It may be more efficient to pass a pointer to a struct
    Global variables
        While generally frowned upon in most code, they can be very helpful here
        Access typically ends up being a direct (fixed-address) access
    Read the assembly code for critical areas
    Know which optimizations are present
        Small compilers do not always have common optimizations
        Inlining, loop unrolling, loop-invariant code motion, array-to-pointer conversion
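A sketch of the parameter-passing and global-variable points. `MotorCmd` and the three functions are hypothetical. How many arguments fit in registers is set by each compiler's calling convention, so "five arguments spill to the stack" is an assumption to verify in your own listing, not a rule.

```c
#include <stdint.h>

/* Hypothetical command record bundling five values. */
typedef struct {
    int16_t left_speed;
    int16_t right_speed;
    uint8_t ramp;
    uint8_t timeout;
    uint8_t flags;
} MotorCmd;

/* Five separate parameters: on a small core some are likely to go on
 * the stack or into a parameter area. */
int16_t speed_diff_args(int16_t left, int16_t right, uint8_t ramp,
                        uint8_t timeout, uint8_t flags)
{
    (void)ramp; (void)timeout; (void)flags;   /* unused in this sketch */
    return (int16_t)(left - right);
}

/* One pointer parameter: typically fits in a register, and the callee
 * reads only the fields it needs. */
int16_t speed_diff_struct(const MotorCmd *cmd)
{
    return (int16_t)(cmd->left_speed - cmd->right_speed);
}

/* Global: the address is fixed at link time, so accesses can be direct
 * and nothing needs to be passed at all. */
MotorCmd g_motor_cmd;

int16_t speed_diff_global(void)
{
    return (int16_t)(g_motor_cmd.left_speed - g_motor_cmd.right_speed);
}
```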
Switch Statement Implementation

    Switch statements can be implemented in various ways
        Sequential compares
        Inline table lookup of the case block
        Call to a special library function with a lookup table
    The specific implementation can also vary based on the case values
        Clean sequence (1, 2, 3, 4, 5)
        Gaps in sequence (1, 10, 30, 255)
        Ordering of sequence (5, 4, 1, 2, 3)
    Knowing which method gets implemented is critical to optimizing!
Switch Statement Example
  switch(cA)
  {
      case 0:
          cX = 4;
          break;
      case 1:
          cX = 10;
          break;
      case 2:
          cX = 30;
          break;
      default:
          cX = 0;
          break;
  }

    0006   900000   MOV    DPTR,#cA
    0009   E0       MOVX   A,@DPTR
    000A   FF       MOV    R7,A
    000B   EF       MOV    A,R7
    000C   120000   LCALL  ?C?CCASE
    000F   0000     DW     ?C0003
    0011   00       DB     00H
    0012   0000     DW     ?C0002
    0014   01       DB     01H
    0015   0000     DW     ?C0004
    0017   02       DB     02H
    0018   0000     DW     00H
    001A   0000     DW     ?C0005
    001C            ?C0002:
    001C   900000   MOV    DPTR,#cX
    001F   7404     MOV    A,#04H
    0021   F0       MOVX   @DPTR,A
    0022   8015     SJMP   ?C0006
    ...More blocks follow for each case
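When the case values form a clean sequence, as in the example above, one alternative is to write the lookup explicitly: a bounds check plus one indexed read from a const table, so the cost no longer depends on how the compiler lowers the switch (here, a call to the ?C?CCASE library routine). A sketch with hypothetical function names; the switch version is the slide's code wrapped in a function:

```c
#include <stdint.h>

/* The slide's switch, wrapped in a function. */
uint8_t mode_value_switch(uint8_t cA)
{
    uint8_t cX;
    switch (cA) {
    case 0:  cX = 4;  break;
    case 1:  cX = 10; break;
    case 2:  cX = 30; break;
    default: cX = 0;  break;
    }
    return cX;
}

/* Equivalent explicit table: entries for cases 0..2, default handled by
 * the bounds check. */
static const uint8_t k_mode_value[3] = { 4, 10, 30 };

uint8_t mode_value_table(uint8_t cA)
{
    return (cA < sizeof k_mode_value) ? k_mode_value[cA] : 0;
}
```

This only pays off for dense case values; with gaps such as (1, 10, 30, 255) the table would be mostly empty, and the compiler's own choice may well be better.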
Optimization Process
  Step 0 – Before coding anything, think about risk points and prototype the unknowns!!!
      Use available dedicated hardware
  Step 1 – Get it working!!
      Fast but wrong is of no use to anyone
      Optimization will typically reduce readability
  Step 2 – Profile to know where to optimize
      Usually only one or two routines are critical
      You need specific performance metrics to target
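A minimal host-side timing sketch for Step 2, using the standard `clock()` function. This is an assumption about workflow, not the presenter's method: on the target itself you would more likely read a free-running hardware timer or toggle a spare I/O pin around the routine and measure it with a scope (not shown). `busy_work`, `time_routine`, and `g_sink` are hypothetical names.

```c
#include <stdint.h>
#include <time.h>

typedef void (*routine_fn)(void);

/* volatile so the compiler cannot discard the work being timed */
static volatile uint32_t g_sink;

/* Stand-in for a routine under test. */
void busy_work(void)
{
    uint32_t i, acc = 0;
    for (i = 0; i < 100000u; i++) {
        acc += i;
    }
    g_sink = acc;
}

/* Elapsed processor time, in clock ticks, for `reps` calls of fn.
 * Divide by CLOCKS_PER_SEC to get seconds. */
clock_t time_routine(routine_fn fn, uint16_t reps)
{
    clock_t start = clock();
    while (reps--) {
        fn();
    }
    return clock() - start;
}
```

Compare the measured figure against the specific performance target; only routines that miss it are worth optimizing.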
Optimization Process
  Step 3 – Let the tools do as much as they can
      Turn off debugging!
      Select the correct memory model
      Select the correct optimization level
  Step 4 – Do it manually
      Read the generated code! You might be able to make a simple code or structure change
      Last – think about assembly coding
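An example of the kind of "simple code or structure change" Step 4 can turn up: hoisting a loop-invariant computation by hand, for a compiler that does not do it automatically. The function names and parameters are hypothetical; both versions produce the same results.

```c
#include <stdint.h>

/* Before: gain * offset never changes inside the loop, but a small
 * compiler without loop-invariant code motion may recompute it (possibly
 * via a multiply routine) on every pass. */
void scale_naive(uint16_t *buf, uint8_t n, uint8_t gain, uint8_t offset)
{
    uint8_t i;
    for (i = 0; i < n; i++) {
        buf[i] = (uint16_t)(buf[i] + gain * offset);
    }
}

/* After: the invariant product is computed once, and the loop counts
 * down while walking a pointer. 255 * 255 fits in a uint16_t, and the
 * sum wraps modulo 2^16 exactly as in the naive version. */
void scale_hoisted(uint16_t *buf, uint8_t n, uint8_t gain, uint8_t offset)
{
    const uint16_t bias = (uint16_t)(gain * offset);
    while (n--) {
        *buf = (uint16_t)(*buf + bias);
        buf++;
    }
}
```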
Summary

  Microcontrollers are a resource-constrained environment
  Be familiar with the hardware in your microcontroller
  Be familiar with your compiler options and how it translates your code
  For time- or space-critical code, look at the assembly listing from time to time
Questions?
