I created a project template with CubeMX: an STM32G030 acts as SPI slave and uses circular DMA buffers for transfers.
As SPI master I'm using an ESP32 running MicroPython. This should be irrelevant, though, as I don't rely on its output but on what the logic analyzer shows, and what the ESP32 (SPI master, MicroPython) receives matches the logic analyzer's output.
Back to the STM32:
Everything appears to be working, except that changes to the MISO buffer are not /fully/ reflected in the final transfer.
I'm using a 4-byte array as MISO buffer (called spi_tx_tst_buf in the project's source posted below), which I initially set to XXXX, modify to !!!! right after boot, and then modify every 5 seconds in the main loop, setting all 4 bytes to the same random character (e.g. 'AAAA', 5 s later 'BBBB', and so on).
After the STM32 boots, I trigger a full-duplex SPI transfer from the SPI master every 2 s.
When I trigger the first SPI transfer after the STM32 boots (say, 1 s after power-up), the STM32 always sends "XX!!", which is a mix of the old and the new values.
For every subsequent SPI read within the next ~4 s (the time until the buffer is changed again), however, the STM32 provides the SPI master with the correct byte sequence (in this case "!!!!"). Once the buffer is changed again, the first read returns a mix of old and new values again.
This might look like a race condition, but I think that is unlikely: the time that passes between changing the buffer and querying it from the SPI master makes no difference. Also, reading twice after a buffer change, with definitely no change in between, produces different output.
Clocking a transfer every 2 s, starting right after the STM32 has powered up (while it changes its buffer every 5 s), the SPI master receives the following results:
XX!!
!!!!
!!!!
!!!L
LLLL
LLLV
VVVV
VVVV
VVVQ
QQQQ
QQQD
DDDD
DDDD
DDDY
YYYY
What sticks out is that only one transfer per buffer change is wrong. That also doesn't change if I increase the SPI transfer rate on the SPI master (e.g. every 2 s -> every 0.5 s).
What puzzles me most is that I assumed DMA means direct access to the memory that canonically stores the 4 bytes. So, unless I'm completely off, either the changes I apply to the buffer do not (completely) make it into memory, or the buffer that SPI reads the 4 bytes from for the transmission is a shadowed one.
Either way: I'd really like to know how/why this is happening and (surprise!) also how to do it correctly.
Here's the project's source code. Lots of boilerplate, but I'm too scared to cut anything out:
/* USER CODE BEGIN Header */
/**
******************************************************************************
* @file : main.c
* @brief : Main program body
******************************************************************************
* @attention
*
* Copyright (c) 2023 STMicroelectronics.
* All rights reserved.
*
* This software is licensed under terms that can be found in the LICENSE file
* in the root directory of this software component.
* If no LICENSE file comes with this software, it is provided AS-IS.
*
******************************************************************************
*/
/* USER CODE END Header */
/* Includes ------------------------------------------------------------------*/
#include "main.h"
/* Private includes ----------------------------------------------------------*/
/* USER CODE BEGIN Includes */
#include <string.h>
#include <stdlib.h>
/* USER CODE END Includes */
/* Private typedef -----------------------------------------------------------*/
/* USER CODE BEGIN PTD */
/* USER CODE END PTD */
/* Private define ------------------------------------------------------------*/
/* USER CODE BEGIN PD */
/* USER CODE END PD */
/* Private macro -------------------------------------------------------------*/
/* USER CODE BEGIN PM */
/* USER CODE END PM */
/* Private variables ---------------------------------------------------------*/
SPI_HandleTypeDef hspi1;
DMA_HandleTypeDef hdma_spi1_rx;
DMA_HandleTypeDef hdma_spi1_tx;
/* USER CODE BEGIN PV */
volatile char spi_tx_tst_buf[4] = "XXXX";
volatile char spi_rx_tst_buf[4] = {0,0,0,0};
/* USER CODE END PV */
/* Private function prototypes -----------------------------------------------*/
void SystemClock_Config(void);
static void MX_GPIO_Init(void);
static void MX_DMA_Init(void);
static void MX_SPI1_Init(void);
/* USER CODE BEGIN PFP */
/* USER CODE END PFP */
/* Private user code ---------------------------------------------------------*/
/* USER CODE BEGIN 0 */
/* USER CODE END 0 */
/**
* @brief The application entry point.
* @retval int
*/
int main(void)
{
  /* USER CODE BEGIN 1 */
  /* USER CODE END 1 */
  /* MCU Configuration--------------------------------------------------------*/
  /* Reset of all peripherals, Initializes the Flash interface and the Systick. */
  HAL_Init();
  /* USER CODE BEGIN Init */
  /* USER CODE END Init */
  /* Configure the system clock */
  SystemClock_Config();
  /* USER CODE BEGIN SysInit */
  /* USER CODE END SysInit */
  /* Initialize all configured peripherals */
  MX_GPIO_Init();
  MX_DMA_Init();
  MX_SPI1_Init();
  /* USER CODE BEGIN 2 */
  if (HAL_SPI_TransmitReceive_DMA(&hspi1, (uint8_t *)&spi_tx_tst_buf, (uint8_t *)&spi_rx_tst_buf, sizeof(spi_tx_tst_buf)) != HAL_OK)
  {
    /* Transfer error in transmission process */
    Error_Handler();
  }
  /* Cast drops the volatile qualifier: strncpy() takes a plain char * */
  strncpy((char *)spi_tx_tst_buf, "!!!!", 4);
  /* USER CODE END 2 */
  /* Infinite loop */
  /* USER CODE BEGIN WHILE */
  while (1)
  {
    HAL_Delay(5000);
    /* Random uppercase letter, 'A' (65) to 'Z' (90) */
    int n = rand() % ((90 + 1) - 65) + 65;
    spi_tx_tst_buf[0] = n;
    spi_tx_tst_buf[1] = n;
    spi_tx_tst_buf[2] = n;
    spi_tx_tst_buf[3] = n;
    /* USER CODE END WHILE */
    /* USER CODE BEGIN 3 */
  }
  /* USER CODE END 3 */
}
/**
* @brief System Clock Configuration
* @retval None
*/
void SystemClock_Config(void)
{
  RCC_OscInitTypeDef RCC_OscInitStruct = {0};
  RCC_ClkInitTypeDef RCC_ClkInitStruct = {0};
  /** Configure the main internal regulator output voltage
  */
  HAL_PWREx_ControlVoltageScaling(PWR_REGULATOR_VOLTAGE_SCALE1);
  /** Initializes the RCC Oscillators according to the specified parameters
  * in the RCC_OscInitTypeDef structure.
  */
  RCC_OscInitStruct.OscillatorType = RCC_OSCILLATORTYPE_HSI;
  RCC_OscInitStruct.HSIState = RCC_HSI_ON;
  RCC_OscInitStruct.HSIDiv = RCC_HSI_DIV1;
  RCC_OscInitStruct.HSICalibrationValue = RCC_HSICALIBRATION_DEFAULT;
  RCC_OscInitStruct.PLL.PLLState = RCC_PLL_NONE;
  if (HAL_RCC_OscConfig(&RCC_OscInitStruct) != HAL_OK)
  {
    Error_Handler();
  }
  /** Initializes the CPU, AHB and APB buses clocks
  */
  RCC_ClkInitStruct.ClockType = RCC_CLOCKTYPE_HCLK|RCC_CLOCKTYPE_SYSCLK
                              |RCC_CLOCKTYPE_PCLK1;
  RCC_ClkInitStruct.SYSCLKSource = RCC_SYSCLKSOURCE_HSI;
  RCC_ClkInitStruct.AHBCLKDivider = RCC_SYSCLK_DIV1;
  RCC_ClkInitStruct.APB1CLKDivider = RCC_HCLK_DIV1;
  if (HAL_RCC_ClockConfig(&RCC_ClkInitStruct, FLASH_LATENCY_0) != HAL_OK)
  {
    Error_Handler();
  }
}
/**
* @brief SPI1 Initialization Function
* @param None
* @retval None
*/
static void MX_SPI1_Init(void)
{
  /* USER CODE BEGIN SPI1_Init 0 */
  /* USER CODE END SPI1_Init 0 */
  /* USER CODE BEGIN SPI1_Init 1 */
  /* USER CODE END SPI1_Init 1 */
  /* SPI1 parameter configuration*/
  hspi1.Instance = SPI1;
  hspi1.Init.Mode = SPI_MODE_SLAVE;
  hspi1.Init.Direction = SPI_DIRECTION_2LINES;
  hspi1.Init.DataSize = SPI_DATASIZE_8BIT;
  hspi1.Init.CLKPolarity = SPI_POLARITY_LOW;
  hspi1.Init.CLKPhase = SPI_PHASE_1EDGE;
  hspi1.Init.NSS = SPI_NSS_SOFT;
  hspi1.Init.FirstBit = SPI_FIRSTBIT_MSB;
  hspi1.Init.TIMode = SPI_TIMODE_DISABLE;
  hspi1.Init.CRCCalculation = SPI_CRCCALCULATION_DISABLE;
  hspi1.Init.CRCPolynomial = 7;
  hspi1.Init.CRCLength = SPI_CRC_LENGTH_DATASIZE;
  hspi1.Init.NSSPMode = SPI_NSS_PULSE_DISABLE;
  if (HAL_SPI_Init(&hspi1) != HAL_OK)
  {
    Error_Handler();
  }
  /* USER CODE BEGIN SPI1_Init 2 */
  /* USER CODE END SPI1_Init 2 */
}
/**
* Enable DMA controller clock
*/
static void MX_DMA_Init(void)
{
  /* DMA controller clock enable */
  __HAL_RCC_DMA1_CLK_ENABLE();
  /* DMA interrupt init */
  /* DMA1_Channel1_IRQn interrupt configuration */
  HAL_NVIC_SetPriority(DMA1_Channel1_IRQn, 0, 0);
  HAL_NVIC_EnableIRQ(DMA1_Channel1_IRQn);
  /* DMA1_Channel2_3_IRQn interrupt configuration */
  HAL_NVIC_SetPriority(DMA1_Channel2_3_IRQn, 0, 0);
  HAL_NVIC_EnableIRQ(DMA1_Channel2_3_IRQn);
}
/**
* @brief GPIO Initialization Function
* @param None
* @retval None
*/
static void MX_GPIO_Init(void)
{
  /* GPIO Ports Clock Enable */
  __HAL_RCC_GPIOA_CLK_ENABLE();
}
/* USER CODE BEGIN 4 */
/* USER CODE END 4 */
/**
* @brief This function is executed in case of error occurrence.
* @retval None
*/
void Error_Handler(void)
{
  /* USER CODE BEGIN Error_Handler_Debug */
  /* User can add his own implementation to report the HAL error return state */
  __disable_irq();
  while (1)
  {
  }
  /* USER CODE END Error_Handler_Debug */
}
#ifdef USE_FULL_ASSERT
/**
* @brief Reports the name of the source file and the source line number
* where the assert_param error has occurred.
* @param file: pointer to the source file name
* @param line: assert_param error line source number
* @retval None
*/
void assert_failed(uint8_t *file, uint32_t line)
{
  /* USER CODE BEGIN 6 */
  /* User can add his own implementation to report the file name and line number,
     ex: printf("Wrong parameters value: file %s on line %d\r\n", file, line) */
  /* USER CODE END 6 */
}
#endif /* USE_FULL_ASSERT */
The main problem is that the SPI device has a FIFO, which it pre-loads from memory when you start the DMA transaction. It needs to do this, because when acting as an SPI slave (as you are), it needs to have data ready to send as soon as a transaction is initiated by the master.
From the reference manual, in section 27.5.9:
All SPI data transactions pass through the 32-bit embedded FIFOs. This enables the SPI to work in a continuous flow, and prevents overruns when the data frame size is short. Each direction has its own FIFO called TXFIFO and RXFIFO. These FIFOs are used in all SPI modes except for receiver-only mode (slave or master) with CRC calculation enabled
In a similar way, write access of a data frame to be transmitted is managed by the TXE event. This event is triggered when the TXFIFO level is less than or equal to half of its capacity. Otherwise TXE is cleared and the TXFIFO is considered as full. In this way, RXFIFO can store up to four data frames, whereas TXFIFO can only store up to three when the data frame format is not greater than 8 bits. This difference prevents possible corruption of 3x 8-bit data frames already stored in the TXFIFO when software tries to write more data in 16-bit mode into TXFIFO.
In summary, the SPI peripheral is buffering 3 bytes of data in advance of any transaction.
Let's say you start with a buffer (of 4 bytes) like this:
Buffer: "AAAA"
You then call HAL_SPI_TransmitReceive_DMA(). You now have this situation:
TXFIFO: "AAA" Buffer: "AAAA"
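The master carries out a transaction of 4 bytes and receives the AAA from the TXFIFO, plus another A read from memory. Because the DMA runs in circular mode, it keeps refilling the TXFIFO while the bytes go out, wrapping around to the start of the buffer, so by the end of the transaction it has already loaded the first three bytes again and you're still like this: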
TXFIFO: "AAA" Buffer: "AAAA"
Now you change the buffer to BBBB. So now you have this:
TXFIFO: "AAA" Buffer: "BBBB"
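The next transaction from the master will receive the AAA from the TXFIFO and a B read from memory, after which the DMA refills the TXFIFO with BBB, so the transaction after that is correct again. This is exactly the pattern in your log: one mixed packet (!!!L, LLLV, VVVQ, ...) per buffer change, no matter how long you wait or how often the master polls, because the TXFIFO contents only move on when the master clocks a transaction.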
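To check that this model really reproduces the capture in the question, here is a small host-side simulation in C. It is a sketch under simplifying assumptions, not HAL code: spi_model, dma_refill(), model_start(), model_set_buffer() and model_transfer() are hypothetical names, the TXFIFO is assumed to be kept topped up at exactly 3 bytes, and the shift register and all timing are ignored.

```c
#include <assert.h>
#include <string.h>

/* Host-side model only, not HAL code. Assumptions: 4-byte circular TX
 * buffer, and a TXFIFO that the DMA keeps topped up to 3 bytes (8-bit frames). */
#define PKT_LEN    4
#define FIFO_DEPTH 3

typedef struct {
    char buf[PKT_LEN];     /* stands in for spi_tx_tst_buf */
    char fifo[FIFO_DEPTH]; /* bytes already copied into the TXFIFO */
    int  fifo_len;
    int  dma_idx;          /* next buffer index the circular DMA will read */
} spi_model;

/* The DMA refills the TXFIFO from the circular buffer whenever it has room. */
static void dma_refill(spi_model *m)
{
    while (m->fifo_len < FIFO_DEPTH) {
        m->fifo[m->fifo_len++] = m->buf[m->dma_idx];
        m->dma_idx = (m->dma_idx + 1) % PKT_LEN;
    }
}

/* Models starting the DMA: the FIFO is primed immediately. */
static void model_start(spi_model *m, const char *init)
{
    memcpy(m->buf, init, PKT_LEN);
    m->fifo_len = 0;
    m->dma_idx = 0;
    dma_refill(m);
}

/* Models the main loop rewriting all four bytes, as in the question. */
static void model_set_buffer(spi_model *m, char c)
{
    memset(m->buf, c, PKT_LEN);
}

/* The master clocks out one packet; each byte leaves from the FIFO head,
 * and the DMA immediately tops the FIFO up again. */
static void model_transfer(spi_model *m, char out[PKT_LEN + 1])
{
    for (int i = 0; i < PKT_LEN; i++) {
        assert(m->fifo_len > 0);
        out[i] = m->fifo[0];
        memmove(m->fifo, m->fifo + 1, (size_t)(m->fifo_len - 1));
        m->fifo_len--;
        dma_refill(m);
    }
    out[PKT_LEN] = '\0';
}
```

Starting from "!!!!", doing one transfer, changing the buffer to 'L' and doing two more transfers yields "!!!!", "!!!L", "LLLL": the same one-mixed-packet-per-change pattern as the logic analyzer capture.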
Your very first transaction shows the data being like this:
XX!!
This is a simple race condition. Your code is basically:
volatile char spi_tx_tst_buf[4] = "XXXX";
HAL_SPI_TransmitReceive_DMA(&hspi1, (uint8_t *)&spi_tx_tst_buf, (uint8_t *)&spi_rx_tst_buf, sizeof(spi_tx_tst_buf));
strncpy(spi_tx_tst_buf, "!!!!", 4);
i.e. you start the DMA while the data is XXXX, then immediately overwrite it with !!!!. The race is whether the DMA can grab three bytes before you can overwrite them; in your capture it apparently managed two, hence XX!!.
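The outcome of that race can be written down directly. A tiny sketch, where first_packet() is a hypothetical helper rather than anything from the HAL: if the DMA had copied `grabbed` bytes of XXXX into the TXFIFO before strncpy() ran, the first packet is those old bytes followed by the rest of !!!!, and `grabbed` can be at most 3 because of the FIFO depth.

```c
#include <assert.h>

/* Hypothetical helper: the first 4-byte packet if the DMA had already
 * copied `grabbed` bytes of the old contents into the TXFIFO before the
 * buffer was overwritten. The TXFIFO holds at most 3 bytes (8-bit frames). */
static void first_packet(const char old_val[4], const char new_val[4],
                         int grabbed, char out[5])
{
    assert(grabbed >= 0 && grabbed <= 3);
    for (int i = 0; i < 4; i++)
        out[i] = (i < grabbed) ? old_val[i] : new_val[i];
    out[4] = '\0';
}
```

first_packet("XXXX", "!!!!", 2, out) gives "XX!!". Assuming the FIFO prefetch is the whole story, writing "!!!!" before calling HAL_SPI_TransmitReceive_DMA() rather than after it should remove this first mixed packet.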
It seems you need some kind of synchronisation between data being read out of your buffer, and you writing new data into it.
It seems likely you'll need a buffer that can contain more than one of your 4-byte data packets, so that the SPI can be reading from one, while you are writing to the other. You'd probably need to use interrupts to find out where the DMA is currently processing, e.g. a half-complete interrupt and a complete interrupt.
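As a rough sketch of that idea, here is another host-side model in C, this time with an 8-byte ring holding two 4-byte packets. Everything in it is an assumption for illustration: pingpong_model, pp_start(), pp_set(), pp_transfer(), on_half_complete() and on_complete() are hypothetical names, the TXFIFO is again assumed to stay at 3 bytes, and the master is assumed to always clock exactly 4 bytes, so each half-complete/complete event lands on a packet boundary. On the target, the two events would presumably map to HAL_SPI_TxRxHalfCpltCallback() and HAL_SPI_TxRxCpltCallback(), with the DMA started on the 8-byte buffer. The main loop only writes a pending copy; each callback copies it into the half the master has just finished reading, which the prefetch has already moved past.

```c
#include <assert.h>
#include <string.h>

/* Host-side model of a ping-pong TX buffer, not HAL code. Assumptions:
 * 8-byte circular DMA buffer holding two 4-byte packets, a TXFIFO kept at
 * 3 bytes, and a master that always clocks exactly 4 bytes per transaction. */
#define PKT_LEN    4
#define RING_LEN   (2 * PKT_LEN)
#define FIFO_DEPTH 3

typedef struct {
    char ring[RING_LEN];   /* hypothetical double-sized TX buffer */
    char fifo[FIFO_DEPTH];
    int  fifo_len;
    int  dma_idx;          /* next ring index the circular DMA will read */
    int  rx_count;         /* bytes clocked so far, modulo RING_LEN */
    char pending[PKT_LEN]; /* latest packet written by the main loop */
} pingpong_model;

static void dma_refill(pingpong_model *m)
{
    while (m->fifo_len < FIFO_DEPTH) {
        m->fifo[m->fifo_len++] = m->ring[m->dma_idx];
        m->dma_idx = (m->dma_idx + 1) % RING_LEN;
    }
}

/* Stand-ins for the half-complete / complete callbacks: refresh the half
 * the master has just finished reading (the DMA has already moved past it). */
static void on_half_complete(pingpong_model *m) { memcpy(m->ring, m->pending, PKT_LEN); }
static void on_complete(pingpong_model *m) { memcpy(m->ring + PKT_LEN, m->pending, PKT_LEN); }

static void pp_start(pingpong_model *m, const char init[PKT_LEN])
{
    memcpy(m->pending, init, PKT_LEN);
    memcpy(m->ring, init, PKT_LEN);
    memcpy(m->ring + PKT_LEN, init, PKT_LEN);
    m->fifo_len = 0;
    m->dma_idx = 0;
    m->rx_count = 0;
    dma_refill(m); /* the DMA primes the FIFO as soon as it is started */
}

/* Main loop: only the pending copy is touched, never the DMA ring. */
static void pp_set(pingpong_model *m, char c) { memset(m->pending, c, PKT_LEN); }

/* One 4-byte transaction; the callbacks fire after byte 4 and byte 8 of the ring. */
static void pp_transfer(pingpong_model *m, char out[PKT_LEN + 1])
{
    for (int i = 0; i < PKT_LEN; i++) {
        assert(m->fifo_len > 0);
        out[i] = m->fifo[0];
        memmove(m->fifo, m->fifo + 1, (size_t)(m->fifo_len - 1));
        m->fifo_len--;
        dma_refill(m);
        m->rx_count = (m->rx_count + 1) % RING_LEN;
        if (m->rx_count == PKT_LEN)
            on_half_complete(m);
        else if (m->rx_count == 0)
            on_complete(m);
    }
    out[PKT_LEN] = '\0';
}
```

In the model, changing the pending data to 'V' while the ring holds "LLLL" yields "LLLL", "LLLL", "VVVV": the update arrives up to two transactions late, but no mixed packet is ever sent. On the real chip, the main loop's write to the pending copy and the callback's read of it would still race with each other, so that copy would need protecting as well, e.g. by briefly disabling interrupts around the update.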
An alternative might be to not use DMA, and do everything in an ISR. That will be tricky if you need the first byte of the response to be correct, but I've seen FPGA implementations that send a dummy byte as the first MISO byte, followed by the actual data, e.g. the master sends (and receives) 5 bytes, but only the last 4 bytes of the response contain data.