Saturday, December 7, 2013

Shrinkify Arduino using Cheap ATtiny13A Microcontroller

You can shrinkify a simple Arduino project into the ultra-tiny ATtiny13A microcontroller, as long as the compiled code fits within the ATtiny13A's 1 KB of flash and the project needs no more than 64 bytes of RAM and 64 bytes of EEPROM. Why would you do that? For two important reasons: it's cheap (vcc2gnd.com sells this microcontroller for as low as $1.5 / Rp17.000,- for Indonesian customers, and the SOIC version is even tinier and cheaper), and it is much smaller than an ordinary Arduino, even compared with the smallest Arduino form factor (Arduino Micro).

If you need more power (more program space, more RAM), consider a bigger ATtiny such as the ATtiny2313, but that's another story. For now let's focus on the ATtiny13A (the "A" suffix marks the newer version of the ATtiny13 series; the older version has no suffix). Also note that for a serious design or commercial product, you really should consider a newer series such as the ATtiny45 or ATtiny85 (same form factor with a bunch of new capabilities).

To program any AVR tiny microcontroller you can use an SPI-based programmer such as USBasp, or an existing SPI-enabled board such as an Arduino (running the ArduinoISP sketch; see the following picture to see how simple it is to connect the ATtiny13A to your Arduino)...


Friday, December 6, 2013

Character LCD I2C Library for Arduino

With an I²C display module, you can easily connect a character LCD (Liquid Crystal Display) to your Arduino via the I²C protocol, saving a lot of valuable pins usually used for the parallel connection (at least 6 pins: 2 control pins, RS and EN, and 4 data pins D7, D6, D5, and D4 for 4-bit mode). With I²C (a.k.a. TWI / Two-Wire Interface), you need only two pins. Even better, those two pins can also be shared with other I²C-based peripherals.

Note: on the Arduino Uno, SDA is pin A4 and SCL is pin A5. For other models, please check the corresponding pin diagram.

The only drawback of these modules (besides the small additional cost of the I²C display module) is reduced speed, but it's negligible, since you don't need to update a lot of data at high speed on such a character-based display (a graphics LCD might be a different story, since it needs to fetch a lot of bitmap data).

You might want to use the new LiquidCrystal_I2C library; please download the most recent version (v1.2.1) as a zipped file (485 KB), generously contributed by F. Malpartida. Extract it to the libraries sub-folder of your Arduino working folder (i.e., My Documents\Arduino\libraries).

There are several character LCD I²C modules on the market; use the correct initialization code, which may differ slightly for each device.

Include the required libraries at the beginning of your sketch as follows:
#include <Wire.h>
#include <LiquidCrystal_I2C.h>

The next step is to instantiate the LCD object by calling the LiquidCrystal_I2C class constructor. This constructor accepts parameters in the following order: addr, en, rw, rs, d4, d5, d6, d7, bl, blpol
  • addr is the I²C address of the module. It's unique for each device; check with your I²C module supplier to get the correct address. Usually it is set to 0x20, 0x27, or 0x38.
  • en is the bit index for the Enable (EN) pin.
  • rw is the bit index for the Read/Write Selector (RW) pin.
  • rs is the bit index for the Register Selector (RS) pin.
  • d4, d5, d6, d7 are the bit indexes for the upper 4 bits of the data pins.
  • bl is the bit index for the backlight pin.
  • blpol is the polarity of the backlight pin, which may differ according to the LCD being used. The value is either POSITIVE or NEGATIVE (enum declared in LCD.h).
The first argument (addr) is mandatory, which means you have to specify the address of your I²C device manually. The other arguments are optional; if not specified, they are set to default values (bit#6 for en, bit#5 for rw, bit#4 for rs, bit#0 for d4, bit#1 for d5, bit#2 for d6, bit#3 for d7). If bl is omitted, the backlight state won't be modified (it defaults to the value set by the I²C module). The default value for blpol is POSITIVE.

Those bit indexes are required for the library to send the correct signal to the appropriate pins, since different I²C modules have different pin mappings.

Some examples of initialization code (try them to find the one that suits your device), instantiating an object named lcd:
  • LiquidCrystal_I2C lcd (0x20,2,1,0,4,5,6,7,3,POSITIVE);
  • LiquidCrystal_I2C lcd (0x27,2,1,0,4,5,6,7,3,POSITIVE);
  • LiquidCrystal_I2C lcd (0x20,4,5,6,0,1,2,3,7,NEGATIVE);
Finally, initialize the LCD by invoking the begin() method, which accepts two arguments: column count and row count. For example, for a 16x2 display invoke lcd.begin(16,2); and for a 20x4 display invoke lcd.begin(20,4);

I2C LCD Display 20x4 Sample Sketch

#include <Wire.h>
#include <LiquidCrystal_I2C.h>

// Instantiate lcd object
LiquidCrystal_I2C lcd( 0x20, 4, 5, 6, 0, 1, 2, 3, 7, NEGATIVE );

// Custom character patterns
const uint8_t charBitmap[][8] = {
   { 0xc, 0x12, 0x12, 0xc, 0, 0, 0, 0 },
   { 0x6, 0x9, 0x9, 0x6, 0, 0, 0, 0 },
   { 0x0, 0x6, 0x9, 0x9, 0x6, 0, 0, 0x0 },
   { 0x0, 0xc, 0x12, 0x12, 0xc, 0, 0, 0x0 },
   { 0x0, 0x0, 0xc, 0x12, 0x12, 0xc, 0, 0x0 },
   { 0x0, 0x0, 0x6, 0x9, 0x9, 0x6, 0, 0x0 },
   { 0x0, 0x0, 0x0, 0x6, 0x9, 0x9, 0x6, 0x0 },
   { 0x0, 0x0, 0x0, 0xc, 0x12, 0x12, 0xc, 0x0 }
};

void setup() {
  // initialize the lcd
  lcd.begin( 20, 4 ); 
  
  // create custom chars
  for( uint8_t i = 0; i < 8; i++ ) {
    lcd.createChar ( i, (uint8_t *)charBitmap[ i ] );
  }

  lcd.setCursor( 3, 2 );                  
  lcd.print( "LCD 20x4 DEMO" );
  lcd.setCursor( 0, 3 );
  lcd.print( "azTech @ vcc2gnd.com" );

}

void loop() {
   register uint8_t i;
   // Animate
   lcd.home();   
   for( i = 20; i--; ) lcd.write( random( 8 ) );
   lcd.setCursor( 0, 1 );
   for( i = 20; i--; ) lcd.write( random( 8 ) );   
   delay( 200 );
}
Watch the result on demo video below...

Wednesday, December 4, 2013

Prevent GCC from Auto-Inlining a Function

Sometimes GCC gets too "smart" and inlines a function (puts the entire block of its code directly in place) rather than assembling it as a normal subroutine (a block of code invoked by an RCALL/CALL instruction that returns with a RET instruction).

You can tell GCC not to automatically inline a specific function by placing the noinline attribute before the function's type declaration. For example:
__attribute__((noinline)) int myStrictFunction() { }

Thursday, November 28, 2013

Online Tool for Easily Writing Inline Assembler in the Arduino IDE

Inline assembler is a practical optimization technique for speeding up time-sensitive routines in your program. Unfortunately, it's not as convenient as it should be. For example:
  • You have to enclose asm instructions within quotes
  • You have to manually add hard-coded line separators (and tabs, to keep it readable for later debugging)
  • You have to manually describe which registers are used for input/output, plus you have to report which registers are clobbered
To help you with inline assembler, we created this simple tool. There is no software to download or install; it runs right in this blog using embedded JavaScript. Just type your code below and hit the Generate Code button, and inline assembler code suitable for avr-gcc will be generated automatically.

Make Your Arduino Uno Run 25% Faster

Your Arduino Uno comes with a 16 MHz crystal, so it runs at nearly 16 MIPS (since most instructions execute in a single cycle). As you might know, the Arduino Uno uses the ATmega328 MCU from Atmel. One obvious fact that, strangely, most ordinary Arduino users don't know is that the MCU's top speed is actually rated at 20 MHz, not 16 MHz! 16 MHz is the official speed limit of the MCU used in early versions of Arduino, the ATmega8 (up to Arduino NG and Severino). Carrying on with this obsolete limit, the Arduino Uno is still clocked with a 16 MHz crystal.

To make your Arduino Uno up to 25% faster, all you have to do is replace the 16 MHz crystal with a 20 MHz crystal, and update the bootloader with one designed for the upgraded speed (see the instructions below).

Please note that this is NOT overclocking; we're just tuning it to the maximum speed allowed by the manufacturer, as stated in the ATmega328 datasheet. So it's 100% safe and guaranteed to run as reliably as before, just 25% faster :) up to nearly 20 MIPS!

Step 1: Add the following content to your boards.txt. It is located in the hardware/arduino sub-directory of the Arduino application directory, i.e. \Program Files (x86)\Arduino\hardware\arduino in the default installation path on 64-bit Windows (or \Program Files\Arduino\hardware\arduino if you're still using the immortal WinXP ;)...

##############################################################

atmega328_20.name=Arduino Uno++ 20MHz

atmega328_20.upload.protocol=stk500
atmega328_20.upload.maximum_size=30720
atmega328_20.upload.speed=57600

atmega328_20.bootloader.low_fuses=0xFF
atmega328_20.bootloader.high_fuses=0xDA
atmega328_20.bootloader.extended_fuses=0x05
atmega328_20.bootloader.path=atmega
atmega328_20.bootloader.file=ATmega328_20MHz.hex
atmega328_20.bootloader.unlock_bits=0x3F
atmega328_20.bootloader.lock_bits=0x0F

atmega328_20.build.mcu=atmega328p
atmega328_20.build.f_cpu=20000000L
atmega328_20.build.core=arduino
Step 2: Create a new file named ATmega328_20MHz.hex under the hardware/arduino/bootloaders/atmega sub-directory with the following content:
Step 3: Desolder the old 16 MHz crystal from the Arduino Uno board. Please note that this may void your warranty, so proceed at your own risk! If you're unsure about this, perhaps it's better to build a brand new Arduino-compatible board from scratch. For example, you can buy the Playduino-One kit from Play-Zone (they ship worldwide) for Fr. 19.9 (about USD 21). If you live in Indonesia, you can also purchase the Playduino-One kit from azTech for only Rp150.000,- (less than USD 14).
Step 4: Install a 20 MHz crystal. Soldering should be easy since the bottom surface of the board is sparsely populated.
Step 5: Burn the bootloader: open the Arduino IDE. If you've done step #1 correctly, a new board named Arduino Uno++ 20MHz should appear under the Tools > Board menu. Select the new board, attach the USB cable to the Arduino (or an ISP programmer if you built a Playduino One), and execute the Tools > Burn Bootloader command. That's all, now you have a much faster Arduino!

Sunday, November 24, 2013

Virtual USB port for AVR Microcontrollers

V-USB (formerly known as AVR-USB) is a software-only implementation of a low-speed USB device for Atmel’s AVR® microcontrollers, making it possible to build USB hardware with almost any AVR® microcontroller, not requiring any additional chip.

V-USB can be licensed freely under the GNU General Public License or alternatively under a commercial license. A comprehensive set of example projects demonstrates the wide range of possible applications.

Features of V-USB:
  • Fully USB 1.1 compliant low-speed device, except handling of communication errors and electrical specifications.
  • Example projects demonstrate device and host driver implementations on Linux, Mac OS X and Windows.
  • Supports multiple endpoints: one control endpoint, two interrupt/bulk-in endpoints and up to 7 interrupt/bulk-out endpoints. (Note that the USB specification forbids bulk endpoints for low speed devices, but V-USB supports them to some degree.)
  • Transfer sizes up to 254 bytes by default, more as configuration option.
  • Comes with freely usable USB identifiers (Vendor-ID and Product-ID pairs).
  • Runs on any AVR microcontroller with at least 2 kB of Flash memory, 128 bytes RAM and a clock rate of at least 12 MHz.
  • No UART, timer, input capture unit or other special hardware is required (except one edge triggered interrupt).
  • Can be clocked with 12 MHz, 15 MHz, 16 MHz or 20 MHz crystal or from a 12.8 MHz or 16.5 MHz internal RC oscillator.
  • High level functionality is written in C and is well commented.
  • Only about 1150 to 1400 bytes code size.
  • Choice of licensing type: Open Source or commercial.
This diagram shows a typical circuit for a bus powered device using Atmel ATtiny2313 MCU...
D1 and D2 are a low cost replacement for a low drop 3.3 V regulator chip, such as the LE33. Operating the AVR at higher voltages exceeds the common mode range of many USB chips. If you need to run the AVR at 5 V, add 3.6 V zener diodes at D+ and D- to limit the voltage.
Download the V-USB package containing a short description and several simple code examples.

Tuesday, November 12, 2013

Arduino UNO USART

The Arduino Uno uses the ATmega328P as its main controller. Usually we use the HardwareSerial library provided by Arduino. It's a great library, except for three things:
  1. It can NOT achieve the highest speed actually possible with the hardware UART in its core processor, due to overhead in the library implementation
  2. It's bloated in size, consuming a lot of unoptimized code
  3. It hides powerful options which can actually be used to maximize performance
These conditions actually arise from necessity (given Arduino's broad range of user levels): the Arduino library is intended to transparently support a broad range of MCUs (resulting in unoptimized code) and to protect inexperienced programmers by rechecking every possible condition that may lead to a program malfunction. Take digitalRead as an example: it checks the PWM state every time it is invoked.

Now, since we're trying to boost the Arduino to its maximum potential, we need to understand the background process, sometimes at its lowest level. Once we understand it, we can write efficient code which executes faster and consumes fewer resources than the standard library.

The best source of information is certainly Atmel's own datasheet. If you don't have it yet, take the time to download it: [ ATmega328P Datasheet Complete Edition ]

The USART is described in detail in its own section; please refer to pages 127-156 of the datasheet. For your convenience, some parts are copy-pasted into this article (marked with a yellowish background).

USART Features on ATmega328

The Universal Synchronous and Asynchronous serial Receiver and Transmitter (USART) is a highly-flexible serial communication device. The main features are:
  • Full Duplex Operation (Independent Serial Receive and Transmit Registers)
  • Asynchronous or Synchronous Operation
  • Master or Slave Clocked Synchronous Operation
  • High Resolution Baud Rate Generator
  • Supports Serial Frames with 5, 6, 7, 8, or 9 Databits and 1 or 2 Stop Bits
  • Odd or Even Parity Generation and Parity Check Supported by Hardware
  • Data OverRun Detection
  • Framing Error Detection
  • Noise Filtering Includes False Start Bit Detection and Digital Low Pass Filter
  • Three Separate Interrupts on TX Complete, TX Data Register Empty and RX Complete
  • Multi-processor Communication Mode
  • Double Speed Asynchronous Communication Mode

A simplified block diagram of the USART Transmitter is shown in the following figure. CPU-accessible I/O registers are shown in green boxes, and I/O pins are shown in blue boxes. The USART Data Register (UDR) is shown in yellow boxes. Please note that although it is shown as two boxes in the diagram below, there is only one UDR address: the transmit and receive data buffers share the same I/O address, so writing UDR loads the byte to be transmitted and reading UDR returns the received byte.

ATmega328 USART Block Diagram

In the diagram above, you'll notice that the top block is the Clock Generator section. This section conducts the whole orchestra of USART operations. It generates the base clock for the Transmitter and Receiver.

There are four modes of clock operations:
  • Normal Asynchronous mode
  • Double Speed Asynchronous mode
  • Master Synchronous mode
  • Slave Synchronous mode
To select Asynchronous mode, clear the UMSEL bit in UCSRC (USART Control and Status Register C). On the other hand, set the UMSEL bit to select Synchronous mode.

To activate Double Speed Asynchronous mode, after clearing the UMSEL bit (to put the USART into Async mode), set the U2X bit of the UCSRA register. Clearing the U2X bit brings the USART back to normal Async mode. In Synchronous mode, this bit has no effect and should be cleared.

For Synchronous mode (UMSEL=1), data is clocked in sync with the XCK pin (PD.4, pin#6 of the ATmega328). In this case, the value of DDR_XCK (Data Direction Register for the XCK pin) determines whether the clock source is internal (Master mode: the MCU generates the clock signal on the XCK pin) or external (Slave mode: the MCU follows the clock signaled on the XCK pin by the other party). The XCK pin is only active in Synchronous mode. Please note that in Sync mode, since PD.4 shares the same physical pin as XCK (pin#6), it no longer functions as a general I/O pin.

Except for Synchronous Slave mode, all modes require the MCU to generate the clock signal. This is done internally by the Clock Generator section (refer back to the block diagram above).

As shown in the diagram, the register related to this section is UBRR, an acronym for USART Baud Rate Register. It's actually a 12-bit register, so it's split across two bytes: the UBRRH register for the high byte (note that only the lower nibble of this register is used) and the UBRRL register for the low byte.

Following is detailed block diagram of Clock Generator section:

USART Clock Generator Block Diagram

The UBRR and the down-counter connected to it function as a programmable prescaler or baud rate generator. The down-counter, running at system clock (fosc), is loaded with the UBRR value each time the counter has counted down to zero or when the lower byte (UBRRL) Register is written. A clock is generated each time the counter reaches zero. This clock is the baud rate generator clock output, equal to  fosc / (UBRR+1).

The Transmitter divides the baud rate generator clock output by 2, 8, or 16 depending on mode being selected. The baud rate generator output is used directly by the Receiver's clock and data recovery units. However, the recovery units use a state machine that uses 2, 8, or 16 states depending on mode set by the state of the UMSEL, U2X and DDR_XCK bits (examine flow logic shown by arrow lines on the diagram).

The following table contains equations for calculating the baud rate (in bits per second) and for calculating the UBRR value for each mode of operation using an internally generated clock source:



BAUD = Baud rate (in bits per second, bps)
fosc = System Oscillator clock frequency
UBRR = Contents of UBRRH and UBRRL Registers (12 bits, 0-4095)

Thus for fosc = 16 MHz (the frequency of the oscillator used in the Arduino Uno), we can calculate the value of UBRR in Async Double Speed mode (U2X=1) with the formula:
UBRR = ( 16,000,000 / ( 8 × BAUD ) ) - 1
     = (  2,000,000 / BAUD ) - 1
According to this equation, the values for common baud rates are as follows (the Error Margin in the third row is the rounding error caused by rounding a fractional value to the nearest integer):

Baud Rate     2400   4800   9600   14.4K  19.2K  28.8K  38.4K  57.6K  115.2K  250K
UBRR Value    832    416    207    138    103    68     51     34     16      7
Error Margin  0.0%   -0.1%  0.2%   -0.1%  0.2%   0.6%   0.2%   -0.8%  2.1%    0.0%

Note that in Double Speed mode, although there is no downside for the Transmitter, the Receiver uses only half the number of samples (reduced from 16 to 8) for data sampling and clock recovery (that's the price we pay for doubling the speed). Thus this mode requires a more accurate baud rate setting and system clock. Avoid baud rates with a high error margin (such as 28.8K, 57.6K, and especially 115.2K with its 2.1% error margin). If you need 115.2K, it's better to use Normal Async mode (double speed turned off, U2X=0) and set UBRR to 8 (although its error margin increases to -3.5%, with twice the samples the USART can compensate better and is less prone to errors).

[ to be continued ]

Monday, October 21, 2013

Fastest digitalRead / digitalWrite Alternative

Arduino's standard digitalRead/digitalWrite is well known for two things: its simplicity and ease of use, and... its extraordinarily slow speed.

The fastest alternative is direct port manipulation. For example, the alternative to digitalWrite( 13, HIGH ) is PORTB |= (1 << 5). The compiler translates that code into a single 2-cycle sbi instruction. At 16 MHz, it executes in 125 nanoseconds.

However, it's not equivalent to digitalWrite. Besides setting the corresponding pin to the specified value, digitalWrite also checks for and turns off PWM output on pins with PWM capability. If you've previously used PWM on the pin (i.e., by invoking the analogWrite function), this method won't work.

Anyway, this condition is rarely encountered. Usually once a pin is assigned as a PWM-driven pin, it is never reverted back to a "normal" (non-PWM-driven) pin. So rather than wasting execution time on unnecessary code, we should take a little control and explicitly turn PWM off only if it's really necessary.

To keep it simple and easy to use, we'll use the following macros (note that this code only works with ATmega8/168/328-based boards such as the Arduino Uno. Other MCUs might have different pin numbering!):

// Map an Arduino Uno pin number (0-19) to its PORT, DDR and PIN register
#define portOfPin(P)\
  (((P)>=0&&(P)<8)?&PORTD:(((P)>7&&(P)<14)?&PORTB:&PORTC))
#define ddrOfPin(P)\
  (((P)>=0&&(P)<8)?&DDRD:(((P)>7&&(P)<14)?&DDRB:&DDRC))
#define pinOfPin(P)\
  (((P)>=0&&(P)<8)?&PIND:(((P)>7&&(P)<14)?&PINB:&PINC))
// Bit position of the pin within its port, and the matching bit mask
#define pinIndex(P)((uint8_t)((P)>13?(P)-14:(P)&7))
#define pinMask(P)((uint8_t)(1<<pinIndex(P)))

// Each macro expands to a single statement, so it is safe inside if/else
#define pinAsInput(P) (*(ddrOfPin(P))&=~pinMask(P))
#define pinAsInputPullUp(P) do{pinAsInput(P);digitalHigh(P);}while(0)
#define pinAsOutput(P) (*(ddrOfPin(P))|=pinMask(P))
#define digitalLow(P) (*(portOfPin(P))&=~pinMask(P))
#define digitalHigh(P) (*(portOfPin(P))|=pinMask(P))
#define isHigh(P)((*(pinOfPin(P))& pinMask(P))>0)
#define isLow(P)((*(pinOfPin(P))& pinMask(P))==0)
#define digitalState(P)((uint8_t)isHigh(P))

Thus, you can save valuable code space and get dramatically faster execution by changing:
  • pinMode( pin, INPUT ); with pinAsInput( pin );
  • pinMode( pin, OUTPUT ); with pinAsOutput( pin );
  • pinMode( pin, INPUT_PULLUP); with pinAsInputPullUp( pin );
  • digitalWrite( pin, LOW ); with digitalLow( pin );
  • digitalWrite( pin, HIGH ); with digitalHigh( pin );
  • digitalRead( pin ) with digitalState( pin )

Additionally, rather than typing if( digitalState( pin ) == HIGH ) you can type if( isHigh( pin ) ) for clearer code. Likewise, use isLow( pin ) rather than digitalState( pin ) == LOW.

Now let's try it in action. Load the Blink.pde example sketch and compile it. You'll get 1,084 bytes of compiled code. Now insert our new macros at the beginning of the file, and replace the code according to the replacement guide above.

Your source code will be like this (comments removed, newly inserted macros are not shown):

int led = 13;

void setup() {              
  pinAsOutput(led);
}

void loop() {
  digitalHigh( led );
  delay( 1000 );
  digitalLow( led );
  delay( 1000 );
}

After compiling, the size drops to 956 bytes. Not much fat loss, eh? Actually you can get much smaller code by changing the way you define the led symbol.

First, it's defined as int (with a range from -32,768 to 32,767), which takes 2 bytes. A pin number on the Arduino Uno is from 0 to 19, so it's a waste to declare it as int. If you really need to put it in a variable, define it with the byte (uint8_t) type.

Second, and most importantly, since the LED won't change its pin in the middle of execution, you should define it as a constant with the const keyword. This way, the compiler evaluates the macro conditions at compile time (instead of generating run-time code to evaluate variable arguments).

Take careful note of this issue. Under any circumstances, use a variable only if you need to change its value (variable ⇒ able to vary) during execution. Otherwise, always use const (constant ⇒ always the same, never changed). You'll save a lot of code space and execution time by following this simple rule.

So, try replacing the int led = 13; statement with const byte led = 13; (or simply #define led 13) and recompile your code. Now your slim program takes only 674 bytes, more than 30% smaller than the original 1,084 bytes!

How fast is digitalHigh / digitalLow versus digitalWrite at the common 16 MHz clock rate? For digitalWrite it depends on whether the specified pin has PWM capability or not (from about 3.6 µs to 4.8 µs). For digitalHigh / digitalLow, it is exactly 125 ns (2 cycles at 62.5 ns each), so it's roughly 29 to 38 times faster.