ESP8266 Output GPIO and Optimization

Like many microcontrollers, the ESP8266 has a slew of GPIO pins that can be used to interact with the outside world. Espressif provides a number of ways to interact with these pins, including the popular function gpio_output_set. When we were first getting oriented with the ESP, we utilized demonstration code by Tom Trebisky to learn how the GPIO worked. Tom’s code is probably the most basic scenario, useful for changing the GPIO on a pin-by-pin basis. Tom’s code begins with some arrays that store the Espressif SDK’s internal codes for the GPIO pins and their functions, as well as masking bits for use in his GPIO logic.

//Tom Trebisky's GPIO code
static const int bits[] = { 1, 2, 4, 8, 0x10, 0x20, 0x40, 0x80,
0x100, 0x200, 0x400, 0x800, 0x1000, 0x2000, 0x4000, 0x8000 };

static const int mux[] = {
PERIPHS_IO_MUX_GPIO0_U, /* 0 - D3 */
PERIPHS_IO_MUX_U0TXD_U, /* 1 - uart */
PERIPHS_IO_MUX_GPIO2_U, /* 2 - D4 */
PERIPHS_IO_MUX_U0RXD_U, /* 3 - uart */
PERIPHS_IO_MUX_GPIO4_U, /* 4 - D2 */
PERIPHS_IO_MUX_GPIO5_U, /* 5 - D1 */
Z, /* 6 */
Z, /* 7 */
Z, /* 8 */
PERIPHS_IO_MUX_SD_DATA2_U, /* 9 - D11 (SD2) */
PERIPHS_IO_MUX_SD_DATA3_U, /* 10 - D12 (SD3) */
Z, /* 11 */
PERIPHS_IO_MUX_MTDI_U, /* 12 - D6 */
PERIPHS_IO_MUX_MTCK_U, /* 13 - D7 */
PERIPHS_IO_MUX_MTMS_U, /* 14 - D5 */
PERIPHS_IO_MUX_MTDO_U /* 15 - D8 */
};

static const int func[] = { 0, 3, 0, 3, 0, 0, Z, Z, Z, 3, 3, Z, 3, 3, 3, 3 };
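Every entry in bits[] is a single power of two, so the same masks can also be produced with a shift. The following is a minimal sketch of that idea; pin_mask and pins_mask are hypothetical helper names and are not part of Tom's code or the SDK.

```c
#include <stdint.h>

/* Hypothetical helper: mask for one GPIO, the same value as bits[gpio]. */
static inline uint32_t pin_mask(unsigned gpio)
{
    return (uint32_t)1 << gpio;
}

/* Hypothetical helper: OR the masks of several pins into one word. */
static inline uint32_t pins_mask(const unsigned *pins, unsigned count)
{
    uint32_t mask = 0;
    for (unsigned i = 0; i < count; i++)
        mask |= pin_mask(pins[i]);
    return mask;
}
```

A combined mask like this is what lets several pins be changed with a single register write.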

Looking at eagle_soc.h, the SDK’s header file pertaining to the CPU’s hardware registers and associated manipulation macros, we can see that the macros (created with #define) referred to in Tom’s mux[] are references to specific memory locations, and the constants in func[] refer to the register settings required to configure these registers as general-purpose I/O instead of more specific functionality like the ESP8266’s SPI or SD card I/O.

//Espressif's eagle_soc.h

#define PERIPHS_DPORT_BASEADDR 0x3ff00000
#define PERIPHS_GPIO_BASEADDR 0x60000300
#define PERIPHS_TIMER_BASEDDR 0x60000600
#define PERIPHS_RTC_BASEADDR 0x60000700
#define PERIPHS_IO_MUX 0x60000800

...

#define PERIPHS_IO_MUX_FUNC             0x13
#define PERIPHS_IO_MUX_FUNC_S           4

...

#define PERIPHS_IO_MUX_CONF_U (PERIPHS_IO_MUX + 0x00)
#define SPI0_CLK_EQU_SYS_CLK BIT8
#define SPI1_CLK_EQU_SYS_CLK BIT9
#define PERIPHS_IO_MUX_MTDI_U (PERIPHS_IO_MUX + 0x04)
#define FUNC_GPIO12 3
#define PERIPHS_IO_MUX_MTCK_U (PERIPHS_IO_MUX + 0x08)
#define FUNC_GPIO13 3
#define PERIPHS_IO_MUX_MTMS_U (PERIPHS_IO_MUX + 0x0C)
#define FUNC_GPIO14 3
#define PERIPHS_IO_MUX_MTDO_U (PERIPHS_IO_MUX + 0x10)
#define FUNC_GPIO15 3
#define FUNC_U0RTS 4
#define PERIPHS_IO_MUX_U0RXD_U (PERIPHS_IO_MUX + 0x14)
#define FUNC_GPIO3 3
#define PERIPHS_IO_MUX_U0TXD_U (PERIPHS_IO_MUX + 0x18)
#define FUNC_U0TXD 0
#define FUNC_GPIO1 3
#define PERIPHS_IO_MUX_SD_CLK_U (PERIPHS_IO_MUX + 0x1c)
#define FUNC_SDCLK 0
#define FUNC_SPICLK 1
#define PERIPHS_IO_MUX_SD_DATA0_U (PERIPHS_IO_MUX + 0x20)
#define FUNC_SDDATA0 0
#define FUNC_SPIQ 1
#define FUNC_U1TXD 4
#define PERIPHS_IO_MUX_SD_DATA1_U (PERIPHS_IO_MUX + 0x24)
#define FUNC_SDDATA1 0
#define FUNC_SPID 1
#define FUNC_U1RXD 4
#define FUNC_SDDATA1_U1RXD 7
#define PERIPHS_IO_MUX_SD_DATA2_U (PERIPHS_IO_MUX + 0x28)
#define FUNC_SDDATA2 0
#define FUNC_SPIHD 1
#define FUNC_GPIO9 3
#define PERIPHS_IO_MUX_SD_DATA3_U (PERIPHS_IO_MUX + 0x2c)
#define FUNC_SDDATA3 0
#define FUNC_SPIWP 1
#define FUNC_GPIO10 3
#define PERIPHS_IO_MUX_SD_CMD_U (PERIPHS_IO_MUX + 0x30)
#define FUNC_SDCMD 0
#define FUNC_SPICS0 1
#define PERIPHS_IO_MUX_GPIO0_U (PERIPHS_IO_MUX + 0x34)
#define FUNC_GPIO0 0
#define PERIPHS_IO_MUX_GPIO2_U (PERIPHS_IO_MUX + 0x38)
#define FUNC_GPIO2 0
#define FUNC_U1TXD_BK 2
#define FUNC_U0TXD_BK 4
#define PERIPHS_IO_MUX_GPIO4_U (PERIPHS_IO_MUX + 0x3C)
#define FUNC_GPIO4 0
#define PERIPHS_IO_MUX_GPIO5_U (PERIPHS_IO_MUX + 0x40)
#define FUNC_GPIO5 0

The first step to using a pin for any purpose is to use these registers and function values in order to configure the ESP to do what we want. Espressif provides a macro, PIN_FUNC_SELECT, that takes in the appropriate memory location and function and configures the pin for us.

PIN_FUNC_SELECT(PERIPHS_IO_MUX_MTCK_U, FUNC_GPIO13);

Essentially, at compile time, this code that instructs GPIO13 to be configured as a GPIO pin is substituted with a far more complex block of code.

do {
	WRITE_PERI_REG(PERIPHS_IO_MUX_MTCK_U,
		READ_PERI_REG(PERIPHS_IO_MUX_MTCK_U)
		&  (~(PERIPHS_IO_MUX_FUNC<<PERIPHS_IO_MUX_FUNC_S))
		|( (((FUNC_GPIO13&BIT2)<<2)|(FUNC_GPIO13&0x3))<<PERIPHS_IO_MUX_FUNC_S) );
    } while (0);

Of course, the whole purpose of providing these macros in the first place is so that you don’t need to do the heavy lifting when configuring a pin to blink a light, as Tom does in his code. Memory manipulation is hard, close-to-the-metal stuff. This example demonstrates a lot of the bitwise manipulation you have to do when you want to flip specific bits without touching other ones. It’s a lot of fun for me since I enjoy the feeling of manipulating computer hardware by manually flipping the bits, but it’s a bit of an acquired taste. When you’re trying to solve problems efficiently, even one as small as configuring a piece of hardware through its memory register as in the above example, almost-unreadable code sometimes makes a lot of sense.

Breaking down the PIN_FUNC_SELECT macro, we can see exactly how the pin function configuration works:

  1. Read the register associated with GPIO13
  2. Take a value representing the bits that can be set to configure the pin and shift it to line up with the register, PERIPHS_IO_MUX_FUNC<<PERIPHS_IO_MUX_FUNC_S (which evaluates to 0b000100110000), and invert it into the mask 0b111011001111
  3. Bitwise AND the mask with the register’s value to clear the function configuration bits, giving us a blank slate without touching bits that don’t have to do with the function configuration
  4. Interpret the provided function, whose max value can be 7 (or 0b111) by isolating the most significant bit (with a bitwise AND) and moving it to the left two bits, then combining it with the two least significant bits (with a bitwise OR) before moving the whole thing over 4 bits to line it up with the empty spaces in the mask
  5. Bitwise OR the cleared register value with the shifted configuration data, and then write the result back to the register.
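The steps above can be sketched on a plain integer, which makes them easy to try on a desktop machine. This is only an illustration: mux_apply_func and the two MUX_* names are hypothetical, and the function works on a value passed in rather than on a real hardware register.

```c
#include <stdint.h>

#define MUX_FUNC_BITS  0x13u  /* same value as PERIPHS_IO_MUX_FUNC   */
#define MUX_FUNC_SHIFT 4      /* same value as PERIPHS_IO_MUX_FUNC_S */

/* Hypothetical helper: given the current register value (step 1) and a
 * function number 0..7, return the value that would be written back. */
static uint32_t mux_apply_func(uint32_t reg, uint32_t func)
{
    /* Step 2: shift the field bits into place and invert them. */
    uint32_t clear_mask = ~(MUX_FUNC_BITS << MUX_FUNC_SHIFT);

    /* Step 3: clear only the function bits (4, 5 and 8). */
    uint32_t cleared = reg & clear_mask;

    /* Step 4: move bit 2 of func up two places, keep the low two bits. */
    uint32_t field = ((func & 0x4u) << 2) | (func & 0x3u);

    /* Step 5: merge the field into the cleared value. */
    return cleared | (field << MUX_FUNC_SHIFT);
}
```

For FUNC_GPIO13 (3) on a register that reads 0, this yields 0x30; bits outside the function field pass through unchanged.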

In case you’re wondering, WRITE_PERI_REG(addr, val) is just a nice-looking way to set a pointer referring to the location “addr” to the value “val.” It just substitutes in a line of code that turns the call into pointer magic.

#define WRITE_PERI_REG(addr, val) (*((volatile uint32_t *)ETS_UNCACHED_ADDR(addr))) = (uint32_t)(val)

Further exploration tells me that the ETS_UNCACHED_ADDR macro doesn’t actually do anything, so the above line of code is literally just casting a memory location to a pointer and writing through it.
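To see the pointer trick in isolation, here is a small host-side sketch. It assumes nothing about the ESP8266 memory map: fake_register is an ordinary variable standing in for a hardware register, and the SIM_* macro names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-ins shaped like the SDK's register macros. */
#define SIM_WRITE_REG(addr, val) (*((volatile uint32_t *)(addr)) = (uint32_t)(val))
#define SIM_READ_REG(addr)       (*((volatile uint32_t *)(addr)))

/* An ordinary variable playing the role of a hardware register. */
static uint32_t fake_register;

/* Treat the variable's address as a bare number, then write and read
 * through it the same way the macros treat a fixed hardware address. */
static uint32_t demo_write_then_read(uint32_t value)
{
    uintptr_t addr = (uintptr_t)&fake_register;
    SIM_WRITE_REG(addr, value);
    return SIM_READ_REG(addr);
}
```

The volatile qualifier matters on real hardware because it tells the compiler not to cache or reorder the access.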

With the first step out of the way, we can see how easy it is to use gpio_output_set to set a pin high or low. Looking back at Tom’s code, we can see that he’s accepting a single pin as input, grabbing the corresponding mask, and then feeding it into the arguments of the function.

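void
eio_high ( int gpio )
{
	register int mask = bits[gpio];

	/* set_mask = mask: drive the pin high and enable it as an output */
	gpio_output_set ( mask, 0, mask, 0);
}

void
eio_low ( int gpio )
{
	register int mask = bits[gpio];

	/* clear_mask = mask: drive the pin low and enable it as an output */
	gpio_output_set ( 0, mask, mask, 0);
}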

So, what do each of these arguments do? According to the SDK API Guide:

void gpio_output_set(
	uint32 set_mask,
	uint32 clear_mask,
	uint32 enable_mask,
	uint32 disable_mask
);

Set_mask sets high output, clear_mask sets low output, enable_mask enables the corresponding pin for use as an output, and disable_mask enables the corresponding pin for use as an input. This is where things get a little hairy, as the actual SDK code is not open source, so we have to do a bit of detective work to figure out what’s happening. These arguments actually correspond to four memory registers associated with the ESP8266’s GPIO. To figure out what’s going on here, let’s take a look at Espressif’s Interface GPIO Manual. It turns out, there are at least six 16-bit registers associated with the general-purpose output:

  1. GPIO_ENABLE, a status register indicating which pins are enabled as outputs. If a pin’s bit isn’t high here, the ESP considers it an input
  2. GPIO_ENABLE_W1TS, a register where writing 1s sets the corresponding bits of the GPIO_ENABLE register to high
  3. GPIO_ENABLE_W1TC, a register where writing 1s sets the corresponding bits of the GPIO_ENABLE register to low
  4. GPIO_OUT, a status register indicating the values driven to outputs
  5. GPIO_OUT_W1TS, a register where writing 1s sets the corresponding bits of the GPIO_OUT register to high
  6. GPIO_OUT_W1TC, a register where writing 1s clears the corresponding bits of the GPIO_OUT register to low
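The behavior of these registers can be modeled in a few lines of host-side C. This is a sketch of my understanding of the "write 1 to set" and "write 1 to clear" semantics, where zero bits in the written value leave the status register alone; the struct and function names are hypothetical, and on the real chip the hardware performs these updates.

```c
#include <stdint.h>

/* Simulated status registers. */
struct sim_gpio {
    uint32_t out;     /* GPIO_OUT    */
    uint32_t enable;  /* GPIO_ENABLE */
};

/* Writing to a W1TS register: 1 bits become high, 0 bits are untouched. */
static void sim_out_w1ts(struct sim_gpio *g, uint32_t mask)    { g->out |= mask; }
static void sim_enable_w1ts(struct sim_gpio *g, uint32_t mask) { g->enable |= mask; }

/* Writing to a W1TC register: 1 bits become low, 0 bits are untouched. */
static void sim_out_w1tc(struct sim_gpio *g, uint32_t mask)    { g->out &= ~mask; }
static void sim_enable_w1tc(struct sim_gpio *g, uint32_t mask) { g->enable &= ~mask; }
```

If this model is right, a caller never needs to read GPIO_OUT before changing a pin, because the set and clear registers only affect the bits named in the mask.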

Now we can come to understand what is actually happening when gpio_output_set is called. The function at the very least performs four writes to memory, and those writes will need to respect the current contents of memory. In eagle_soc.h, Espressif provides a function macro, GPIO_REG_WRITE, and additional macros designed to simplify calls to WRITE_PERI_REG when GPIO is concerned.

#define GPIO_REG_READ(reg)            READ_PERI_REG(PERIPHS_GPIO_BASEADDR + reg)
#define GPIO_REG_WRITE(reg, val)      WRITE_PERI_REG(PERIPHS_GPIO_BASEADDR + reg, val)
#define GPIO_OUT_ADDRESS              0x00
#define GPIO_OUT_W1TS_ADDRESS         0x04
#define GPIO_OUT_W1TC_ADDRESS         0x08

#define GPIO_ENABLE_ADDRESS           0x0c
#define GPIO_ENABLE_W1TS_ADDRESS      0x10
#define GPIO_ENABLE_W1TC_ADDRESS      0x14

So if we were to reconstruct the code that comprises gpio_output_set, it would look something like this:

//Reconstruction only: a guess at the behavior, not Espressif's actual source
void gpio_output_set(uint32 set_mask, uint32 clear_mask, uint32 enable_mask, uint32 disable_mask)
{
	GPIO_REG_WRITE(GPIO_OUT_W1TS_ADDRESS, GPIO_REG_READ(GPIO_OUT_ADDRESS) | set_mask);
	GPIO_REG_WRITE(GPIO_OUT_W1TC_ADDRESS, ~GPIO_REG_READ(GPIO_OUT_ADDRESS) | clear_mask);
	GPIO_REG_WRITE(GPIO_ENABLE_W1TS_ADDRESS, GPIO_REG_READ(GPIO_ENABLE_ADDRESS) | enable_mask);
	GPIO_REG_WRITE(GPIO_ENABLE_W1TC_ADDRESS, ~GPIO_REG_READ(GPIO_ENABLE_ADDRESS) | disable_mask);
}

It should be very clear at this point that gpio_output_set is a generalized function designed to do any major GPIO work a user trying to avoid memory manipulation may need. It may look clean but doesn’t lend itself to practical applications where speed is a factor. Using it necessitates entering into the function itself, which comes with overhead. Additionally, unless Espressif is detecting whether a mask is 0 (itself necessitating overhead via branching code), every gpio_output_set call performs several useless memory writes. You may be asking, useless how?

If you’re designing a substantial project or product that uses the ESP8266, you likely already know which pins you are using for I/O. In our stepper motor project, we know that we have specific pins for enabling the motor, motor direction, microstepping and stepping. If we manually enable only those pins during our program’s initialization, suddenly those calls to write to the enable addresses (and related bitwise math) are totally useless. We intended to drive the stepper motor by toggling the stepping pin with a resolution of 5 microseconds using Tom’s eio_high and eio_low functions, but the overhead of hopping into his function, and then hopping into Espressif’s slow GPIO function, caused a plethora of watchdog timer resets. Drivers are all about efficiency, especially in embedded systems where a greedy driver can cause a critical failure. Don’t take my word for it, however. Other people far more experienced than I am have proved beyond a shadow of a doubt that controlling GPIO through a special function is too slow. Nerd Ralph, working on the Arduino-based implementation of the ESP8266 SDK, replaced the digitalWrite function with inline memory manipulation and managed to squeeze 4Mbps communication out of it.
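The "enable once, then only set or clear" idea looks roughly like the sketch below. It is written so it can run on a desktop: GPIO_REG_WRITE is replaced by a stand-in that records writes instead of touching hardware, the register offsets are copied from eagle_soc.h, and STEP_PIN_MASK (GPIO13) plus the function names are hypothetical choices for illustration. On the real chip the pin would also need PIN_FUNC_SELECT during initialization.

```c
#include <stdint.h>

/* Offsets copied from eagle_soc.h. */
#define GPIO_OUT_W1TS_ADDRESS    0x04
#define GPIO_OUT_W1TC_ADDRESS    0x08
#define GPIO_ENABLE_W1TS_ADDRESS 0x10

/* Host-side stand-in: record each write instead of touching hardware. */
static uint32_t sim_last_value[8];
static unsigned sim_write_count;
#define GPIO_REG_WRITE(reg, val) \
    do { sim_last_value[(reg) / 4] = (uint32_t)(val); sim_write_count++; } while (0)

/* Hypothetical pin choice: step signal on GPIO13. */
#define STEP_PIN_MASK (1u << 13)

/* Done once at startup: mark the step pin as an output. */
static void stepper_gpio_init(void)
{
    GPIO_REG_WRITE(GPIO_ENABLE_W1TS_ADDRESS, STEP_PIN_MASK);
}

/* Done in the time-critical path: exactly one register write each. */
static inline void step_high(void) { GPIO_REG_WRITE(GPIO_OUT_W1TS_ADDRESS, STEP_PIN_MASK); }
static inline void step_low(void)  { GPIO_REG_WRITE(GPIO_OUT_W1TC_ADDRESS, STEP_PIN_MASK); }
```

Each toggle costs a single write with a constant mask, with no function call into the SDK and no writes to the enable registers.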

If that weren’t enough, Espressif provides a macro-based way to manipulate individual GPIO pins, confusingly named GPIO_OUTPUT_SET. It takes a pin number and the binary value to drive the pin. According to the Interface GPIO manual,

GPIO_OUTPUT_SET(GPIO_ID_PIN(12), 1);

is functionally equivalent to

GPIO_REG_WRITE(GPIO_ENABLE_W1TS_ADDRESS, GPIO_REG_READ(GPIO_ENABLE_ADDRESS) | 0x1000);
GPIO_REG_WRITE(GPIO_OUT_W1TS_ADDRESS, GPIO_REG_READ(GPIO_OUT_ADDRESS) | 0x1000);

Of course, a nice person from the internet with an oscilloscope proved that this macro was still too slow compared to just manually writing to the registers. Besides, it only works on a pin-by-pin basis. What happens when, as in the case of our hobby servo controller, we will potentially be flipping up to 4 pins simultaneously? Using GPIO_REG_WRITE with the correct mask is the only option.
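A sketch of the multi-pin case, again simulated on the host: one write with a combined mask changes all four signal lines together. The pin assignment in SERVO_PINS_MASK and all function names are hypothetical, and the two sim_write_* functions stand in for GPIO_REG_WRITE to the W1TS and W1TC registers.

```c
#include <stdint.h>

/* Stand-in for the GPIO_OUT status register. */
static uint32_t sim_gpio_out;

/* Stand-ins for GPIO_REG_WRITE to GPIO_OUT_W1TS / GPIO_OUT_W1TC. */
static void sim_write_w1ts(uint32_t mask) { sim_gpio_out |= mask; }
static void sim_write_w1tc(uint32_t mask) { sim_gpio_out &= ~mask; }

/* Hypothetical pin assignment for four servo signal lines. */
#define SERVO_PINS_MASK ((1u << 4) | (1u << 5) | (1u << 12) | (1u << 13))

/* Raise all four lines with a single write. */
static void servos_all_high(void)
{
    sim_write_w1ts(SERVO_PINS_MASK);
}

/* Lower any subset with a single write; bits that do not belong to the
 * servo lines are masked off so other pins cannot be disturbed. */
static void servos_low(uint32_t which)
{
    sim_write_w1tc(which & SERVO_PINS_MASK);
}
```

With the per-pin macro the same change would take four separate calls, and the edges on the four lines would no longer line up.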

Our GPIO code is still in progress. For v0.3.0 Pulled Pork, we wanted to be completely weaned off of Tom’s demo GPIO code in a way that was efficient enough that the motor driver interrupts could run at the intended resolution of 5 microseconds. For the most part, we succeeded. Having moved on to some internal product application demonstrations, we’ve discovered that the hobby servo controller still has the occasional watchdog timer reset, and I’m pretty sure I know how to make the code more efficient.

Pulled Pork’s GPIO code relies on this macro:

#define GPIO_MASK_WRITE(mask)	{			\
	GPIO_REG_WRITE(GPIO_OUT_W1TS_ADDRESS, (mask));	\
	GPIO_REG_WRITE(GPIO_OUT_W1TC_ADDRESS, ~(mask));	\
	}

Doing both of these writes in one macro is actually really unnecessary. All of the stepper motor driver’s writes to the GPIO registers outside of the initialization are either setting or clearing a bit. Doing both a set and a clear is just a waste of a few cycles. This problem is exacerbated in the hobby servo array driver, which features an embarrassingly inefficient for loop.

//OpenMYR's quad_servo_driver.c
void step_driver ( void )
{
	ticks++;
	if(ticks <= SERVO_TICKS_CEILING)
	{
		int n = 0;
		unsigned int gpio_output = 0;
		for(n; n < 4; n++)
		{
			gpio_output += (ticks <= high_ticks[n]) * gpio_output_mask[n];
		}
		GPIO_MASK_WRITE(gpio_output);
	}
	else if(ticks == PULSE_LENGTH_TICKS)
	{
		ticks = 0;
		int current_motor = 0;
		GPIO_MASK_WRITE(GPIO_ALL_MASK);
...
	}
}

The first branching block only sets the motor signal pins low to indicate the end of a duty cycle, and the else if block only sets all the signal pins high at the beginning of the pulse. Doing an unnecessary extra memory write is really just icing on the cake; the for loop is way too much overhead. All we’re doing is taking the boolean value of whether a motor is still within its duty cycle and then multiplying it by the mask associated with that pin and adding it to the mask to be sent to GPIO_MASK_WRITE. However, assembling the entire mask to be sent to a clear-only line of code can be done in one line with just as many additions and multiplications, by detecting whether the duty cycle is done instead of ongoing. What a waste!
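One possible shape for that one-line, clear-only version is sketched below. It is not the code that shipped in the driver: finished_mask is a hypothetical name, the arrays are passed as parameters so the function can be tested on a desktop, and their element types are assumed. The result is meant to be written to GPIO_OUT_W1TC only, so pins still inside their duty cycle contribute 0 and are left alone.

```c
#include <stdint.h>

#define SERVO_COUNT 4

/* Build the mask of pins whose duty cycle has ended at this tick.
 * (ticks > high_ticks[n]) is the opposite of the original test
 * (ticks <= high_ticks[n]), so each term is either 0 or that pin's mask. */
static uint32_t finished_mask(uint32_t ticks,
                              const uint32_t high_ticks[SERVO_COUNT],
                              const uint32_t pin_mask[SERVO_COUNT])
{
    return (ticks > high_ticks[0]) * pin_mask[0]
         + (ticks > high_ticks[1]) * pin_mask[1]
         + (ticks > high_ticks[2]) * pin_mask[2]
         + (ticks > high_ticks[3]) * pin_mask[3];
}
```

Inside step_driver, the first branch would then reduce to a single GPIO_REG_WRITE of this mask to GPIO_OUT_W1TC_ADDRESS, and the else if branch to a single write of GPIO_ALL_MASK to GPIO_OUT_W1TS_ADDRESS.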

I hope you’ve gained some knowledge of how the ESP SDK handles GPIO, and how some resourcefulness, Google-fu and digging in the include files can yield much better results. I still have a lot of experimentation to do. The stepper motor board has tri-state LEDs on it, and I need to learn how the pullup resistor registers work in order to use them. Additionally, one feature we want to support is external peripheral I/O, so I’m going to need to learn all about input and the input interrupts as well. I expect that I’ll be writing a post about these topics in the future!

