I've been working on a Nucleo STM32L433RCP and had a great setup where I had a custom _write function that took characters buffered by printf
and transmitted them over a UART. All of it was written in C and worked wonderfully. I have since transitioned to C++ for compatibility with a group I'm working on a project in, and it doesn't work anymore. printf
will still call the write function on a flush, but the buffer is filled with garbage as far as I can tell. I am working bare-metal, and not using the STM32CubeIDE or the HAL tools it comes with.
The only changes I've made while switching between C and C++ are having _write and my reset handler be run as C code using extern "C"
(without which, nothing happens).
Below is my main.cpp, please ignore some uglier formatting
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#define GPIOBS 0x48000400
#define RCC_APB2ENR 0x4002104C
// #define ODR 0x14
// #define OTR 0x04
#define SERIAL_PORT "/dev/ttyACM0"
#include "../bsp/stm32l433xx.h"
uint32_t SystemCoreClock = 0;
extern uint32_t _sidata, _sdata, _edata, _sbss, _ebss;
uint32_t arr[2000];
extern "C" {
// Reset handler: set the stack pointer and branch to main().
__attribute__((naked)) void reset_handler(void) {
// Set the stack pointer to the 'end of stack' value.
__asm__("LDR r0, =_estack\n\t"
"MOV sp, r0");
// Branch to main().
__asm__("B main");
}
}
extern "C" {
int __io_putchar(int ch){
while (!(USART2->ISR & USART_ISR_TXE)) {};
USART2->TDR = ch;
return ch;
}
int _write(int handle, const char* data, int size) {
int count = size;
while (count--) {
__io_putchar(*data++);
}
while (!(USART2->ISR & USART_ISR_TXE)) {};
USART2->TDR = 74;
while (!(USART2->ISR & USART_ISR_TXE)) {};
USART2->TDR = 10;
while (!(USART2->ISR & USART_ISR_TXE)) {};
USART2->TDR = 13;
return size;
}
void init(){
// copy the flash varibales into ram
memcpy(&_sdata, &_sidata, 4*(&_edata - &_sdata));
// Clear the .bss section in RAM.
memset(&_sbss, 0x00, 4* (&_ebss -&_sbss));
for (int i = 0; i < 10000; ++i){
arr[i] = i;
}
// Enable floating-point unit.
SCB->CPACR |= (0xF << 20);
RCC->AHB2ENR |= 7;
// Enable floating-point unit.
SCB->CPACR |= (0xF << 20);
// Default clock source is the "multi-speed" internal oscillator.
// Switch to the 16MHz HSI oscillator.
RCC->CR |= (RCC_CR_HSION);
while (!(RCC->CR & RCC_CR_HSIRDY)) {
};
RCC->CFGR &= ~(RCC_CFGR_SW);
RCC->CFGR |= (RCC_CFGR_SW_HSI);
while ((RCC->CFGR & RCC_CFGR_SWS) != RCC_CFGR_SWS_HSI) {
};
SystemCoreClock = 16000000;
RCC->APB1ENR1 |= (RCC_APB1ENR1_USART2EN);
RCC->AHB2ENR |= (RCC_AHB2ENR_GPIOAEN);
// Configure pins A2, A15 for USART2 (AF7, AF3).
GPIOA->MODER &= ~((0x3 << (2 * 2)) | (0x3 << (15 * 2)));
GPIOA->MODER |= ((0x2 << (2 * 2)) | (0x2 << (15 * 2)));
GPIOA->OTYPER &= ~((0x1 << 2) | (0x1 << 15));
GPIOA->OSPEEDR &= ~((0x3 << (2 * 2)) | (0x3 << (15 * 2)));
GPIOA->OSPEEDR |= ((0x2 << (2 * 2)) | (0x2 << (15 * 2)));
GPIOA->AFR[0] &= ~((0xF << (2 * 4)));
GPIOA->AFR[0] |= ((0x7 << (2 * 4)));
GPIOA->AFR[1] &= ~((0xF << ((15 - 8) * 4)));
GPIOA->AFR[1] |= ((0x3 << ((15 - 8) * 4)));
uint16_t uartdiv = SystemCoreClock / 9600;
USART2->BRR = uartdiv;
USART2->CR1 |= (USART_CR1_RE | USART_CR1_TE | USART_CR1_UE);
//printf("starting, Clock speed is: %lu \r\n", SystemCoreClock);
// LED Test
GPIOB->MODER &= (~(1 << 27));
GPIOB->MODER |= (1 << 26);
GPIOB->ODR |= (1 << 13);
printf("Test\n");
}
}
int main() {
init();
int x = 0;
;
while (1) {
GPIOB->ODR |= (1 << 13);
for (int i = 0; i < 500000; i++) {
};
GPIOB->ODR &= (~(1 << 13));
for (int i = 0; i < 500000; i++) {
};
x++;
printf("APPLE\n");// %d\r\n", x);
_write(0, "APE TOGETHER",12);
fflush(stdout);
}
}
Currently I have my _write function also outputting a newline and carriage return so it's less abyssmal to look at when viewing the output over uart. I've tested everything in C and it works.
I'm compiling using arm-none-eabi-g++/gcc depending on if it's a C file or not.
Thank you
Your C/C++ runtime start-up is inadequate. It simply establishes a stack and jumps to main()
. It must do other things to establish a full run-time environment that is expected of both the C and C++ runtime, and it should normally be done before main()
, the expectation being that in main()
the runtime environment is already complete.
Some of those things you are doing in init()
albeit from rather than before main()
. For example you appear to be performing static initialisation of .data
and .bss
segments, but for C++ that is insufficient. You also need to iterate through static constructors, and prior to that there will likely be some C library initialisation that needs to be called to establish stdin
, stdout
, stderr
and a heap for example.
Some of the things that should be done (normally from __start
before main
):
main()
).Additional runtime initialisation steps may be required depending on the application and run-time environment, such as:
You would do well to take a provided C/C++ runtime start-up and adapt it to your platform's needs than to take your rather minimalist approach that is clearly not establishing a suitable environment for a full C or C++ runtime.
For your part you would at least perhaps link the CMSIS core_cm4.c, startup_stm32l433xx.s and system_stm32l4xx.c which include most of the start-up and initialisation code you will need or need to adapt. These are ARM specified CMSIS start-up code and not STM32 HAL code, they largely address core Cortex M4 features and ST's specific PLL configuration and external memory configuration (if used). I'd recommend also using stm32l4xx_it.c for basic exception handlers (which you should probably elaborate).
The easiest way to obtain the STM32L4xx CMSIS perhaps is to install the STM32Cube support and copy the installed CMSIS layer to your project. It is all source code and independent of STM32Cube and critically is not related to either of STM32 HAL or LL layers or even the older standard peripheral library (SPL).