This wiki documents Piksi v2.3.1 which was discontinued April 1st, 2017.
Visit support.swiftnav.com for newer products including Piksi Multi.

STM32F4xx porting and speed notes

Floating-point and speed notes

  • GCC 4.5 does not fully support the Cortex-M4, but summon-arm-toolchain and libopencm3 are tested mainly with Linaro GCC 4.5.
  • The GCC manual claims that the Thumb instruction format and VFP floating-point instructions are not compatible. This is not true for the Cortex-M4: the hardware floating-point builds below combine -mthumb with -mfpu=fpv4-sp-d16.
  • zippe claims that -ftree-vectorize should start to help more from GCC 4.6 onward. The benchmark program is too simple for the flag to make any difference (I tested).
  • -flto is available in GCC 4.6 and above. You must pass this flag to the linker as well.

Floating-point benchmarks

Test results for the STM32F407 on the Discovery board, with the clock initialized to 168 MHz.

For the software floating-point tests, the following compiler and linker flags were used for both libopencm3 and the test program:

CFLAGS  += -Os -g -Wall -Wextra -I$(TOOLCHAIN_DIR)/include \
           -fno-common -mcpu=cortex-m4 -march=armv7e-m -mthumb \
           -msoft-float -mfloat-abi=soft -MD -DSTM32F4
LDFLAGS += -lc -lnosys -L$(TOOLCHAIN_DIR)/lib \
           -L$(TOOLCHAIN_DIR)/lib/stm32/f4 \
           -T$(LDSCRIPT) -nostartfiles -Wl,--gc-sections -mcpu=cortex-m4 \
           -mthumb -march=armv7e-m -mfloat-abi=soft -msoft-float

For the hardware floating-point tests, the following compiler and linker flags were used for both libopencm3 and the test program:

CFLAGS  += -Os -g -Wall -Wextra -I$(TOOLCHAIN_DIR)/include \
           -fno-common -mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 \
           -march=armv7e-m -MD -DSTM32F4
LDFLAGS += -lc -lnosys -L$(TOOLCHAIN_DIR)/lib \
           -L$(TOOLCHAIN_DIR)/lib/stm32/f4 \
           -T$(LDSCRIPT) -nostartfiles -Wl,--gc-sections \
           -mthumb -mcpu=cortex-m4 -march=armv7e-m -mfloat-abi=hard -mfpu=fpv4-sp-d16

Results

Approximate time per operation in nanoseconds:

Operation   Soft single   Hard single   Soft double   Hard double
+           369           47.6          554           41.7
-           178           47.6          583           41.7
*           619           476           399           500
/           4325          560           5895          583
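
The figures in this table appear to follow from the counts in the transcript further down: divide the 10-second run time by the iteration count, then subtract the per-iteration cost of the empty loop. The sketch below reproduces that arithmetic for the soft single-precision run. The function name `op_time_ns` is hypothetical, and the subtraction of loop overhead is an assumption inferred from the numbers, not something this page states.

```python
RUN_SECONDS = 10.0  # each timing run in the transcript lasts 10 seconds


def op_time_ns(op_count, empty_count, seconds=RUN_SECONDS):
    """Approximate time per operation in ns, net of empty-loop overhead.

    Assumption: the loop overhead measured by the empty-loop run is the
    same as the overhead inside each operation loop.
    """
    per_iteration = seconds / op_count     # loop body plus one operation
    loop_overhead = seconds / empty_count  # loop body alone
    return (per_iteration - loop_overhead) * 1e9


# Counts copied from the transcript (soft float ABI, single precision).
EMPTY_LOOPS = 210015488
SOFT_SINGLE = {
    "+": 24001769,
    "-": 44213787,
    "*": 15000789,
    "/": 2286939,
}

for op, count in SOFT_SINGLE.items():
    print(op, round(op_time_ns(count, EMPTY_LOOPS), 1))
```

Each value computed this way lands within about 1 ns of the corresponding soft single-precision entry in the table; the same function can be applied to the other three runs.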

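Hardware floating-point operations are faster than software by the following factors (software time divided by hardware time; values below 1 mean hardware was slower):

Operation   Single   Double
+           7.8      13.3
-           3.7      14.0
*           1.3      0.8
/           7.7      10.1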

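Single-precision floating-point operations are faster than double precision by the following factors (double time divided by single time; values below 1 mean single precision was slower):

Operation   Soft   Hard
+           1.5    0.875
-           3.3    0.875
*           0.64   1.05
/           1.4    1.04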
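
Each factor in a comparison table is one per-operation time divided by another, both taken from the nanosecond table. A minimal sketch for the single-versus-double comparison, using the rounded table values as inputs (the names `TIMES_NS` and `speedup` are hypothetical):

```python
# Approximate per-operation times in ns, copied from the results table.
TIMES_NS = {
    "soft": {
        "single": {"+": 369, "-": 178, "*": 619, "/": 4325},
        "double": {"+": 554, "-": 583, "*": 399, "/": 5895},
    },
    "hard": {
        "single": {"+": 47.6, "-": 47.6, "*": 476, "/": 560},
        "double": {"+": 41.7, "-": 41.7, "*": 500, "/": 583},
    },
}


def speedup(slow_ns, fast_ns):
    """Return how many times faster the second timing is than the first."""
    return slow_ns / fast_ns


# Double-precision time divided by single-precision time, per ABI.
for abi in ("soft", "hard"):
    for op in "+-*/":
        ratio = speedup(TIMES_NS[abi]["double"][op], TIMES_NS[abi]["single"][op])
        print(abi, op, round(ratio, 2))
```

A ratio below 1 means the single-precision operation was the slower one in that configuration.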

Transcript

Here is the output of the timing program, using the "soft" floating-point ABI and software operations (two runs, single precision):

Completed 210015488 empty loops in 10 seconds.
Completed 24001769 additions in 10 seconds.
Completed 44213787 subtractions in 10 seconds.
Completed 15000789 multiplications in 10 seconds.
Completed 2286939 divisions in 10 seconds.
Completed 210015488 empty loops in 10 seconds.
Completed 24001769 additions in 10 seconds.
Completed 44213787 subtractions in 10 seconds.
Completed 15000789 multiplications in 10 seconds.
Completed 2286939 divisions in 10 seconds.

And double precision:

Completed 210015489 empty loops in 10 seconds.
Completed 16634888 additions in 10 seconds.
Completed 15850225 subtractions in 10 seconds.
Completed 22386461 multiplications in 10 seconds.
Completed 1682897 divisions in 10 seconds.
Completed 210015489 empty loops in 10 seconds.
Completed 16634888 additions in 10 seconds.
Completed 15850225 subtractions in 10 seconds.
Completed 22386461 multiplications in 10 seconds.
Completed 1682897 divisions in 10 seconds.

Here is the result of the timing program, using the "hard" floating-point ABI (two runs, single precision):

Completed 210015489 empty loops in 10 seconds.
Completed 105007744 additions in 10 seconds.
Completed 105007744 subtractions in 10 seconds.
Completed 19092313 multiplications in 10 seconds.
Completed 16471793 divisions in 10 seconds.
Completed 210015489 empty loops in 10 seconds.
Completed 105007744 additions in 10 seconds.
Completed 105007744 subtractions in 10 seconds.
Completed 19092313 multiplications in 10 seconds.
Completed 16471793 divisions in 10 seconds.

And double precision:

Completed 210015489 empty loops in 10 seconds.
Completed 112008260 additions in 10 seconds.
Completed 112008260 subtractions in 10 seconds.
Completed 18262213 multiplications in 10 seconds.
Completed 15850216 divisions in 10 seconds.
Completed 210015489 empty loops in 10 seconds.
Completed 112008260 additions in 10 seconds.
Completed 112008260 subtractions in 10 seconds.
Completed 18262213 multiplications in 10 seconds.
Completed 15850216 divisions in 10 seconds.