arm real-time stm32 freertos real-time-clock

Precise Time Measurement in STM32MP1 with DWT CYCCNT


I am using an OSD32MP1 (based on the STM32MP157C) in Production Mode, with OpenSTLinux on the Cortex-A7 core and FreeRTOS on the Cortex-M4. One of the tasks is to timestamp ADC data acquired by the M4 at very high speed and very precisely (think on the order of nanoseconds to microseconds). Note that only the time difference between measurements is important.

The on-chip RTC is available (it is assigned to the A7, but its registers are accessible to the M4). However, its subsecond resolution is about 0.004 s (PREDIV_S is 255, i.e. 1/256 s per tick; see the Reference Manual for details), so it is not good enough.

Several Stack Overflow posts led me to use DWT->CYCCNT, i.e. the CPU cycle counter, to measure the time difference. The relevant portions of the code are as follows:

On M4 Side:

typedef struct tTimeStamp
{
    uint32_t nCPUFreq;
    uint32_t nCPUCycles;
    ...
}tTimeStamp;

...

tTimeStamp oTimeStamp;

...

oTimeStamp.nCPUCycles = DWT->CYCCNT;
oTimeStamp.nCPUFreq = HAL_RCC_GetSystemCoreClockFreq();

The last two statements run inside the FreeRTOS task right before the ADC values are read. The timestamps, along with other data, are handed over to the A7.

On A7 Side (given one tTimeStamp taken at time T0 and another taken at time T1):

// Second to NanoSecond Conversion
#define SECTONS 1000000000 

... 

float ComputeTimeDiffNS(tTimeStamp oTS0, tTimeStamp oTS1)
{
    // to avoid reporting time diff at t0
    // and in case CPU frequency changes
    if (oTS0.nCPUFreq != oTS1.nCPUFreq)
        return -1;
    
    // in case of counter overflow (the counter wraps from UINT32_MAX back to 0)
    if (oTS0.nCPUCycles > oTS1.nCPUCycles)
    {
        // +1 accounts for the step from UINT32_MAX to 0
        float fCyclesDiff = float(UINT32_MAX - oTS0.nCPUCycles + oTS1.nCPUCycles + 1);
        return fCyclesDiff * SECTONS / float(oTS0.nCPUFreq);
    }

    // base case 
    else
    {
        float fCyclesDiff = float(oTS1.nCPUCycles - oTS0.nCPUCycles);
        return fCyclesDiff * SECTONS / float(oTS0.nCPUFreq);
    }
}
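
Because uint32_t arithmetic is modulo 2^32, a single unsigned subtraction already yields the elapsed cycles across one counter wrap, so the explicit overflow branch is not strictly needed. A minimal sketch, assuming the counter wraps at most once between the two samples (about 20.5 s at 208878528 Hz) and using double to limit rounding; CyclesToNs is a hypothetical helper name, not part of the code above:

```c
#include <stdint.h>

/* Hypothetical helper: elapsed nanoseconds between two CYCCNT samples.
 * Assumes the counter wrapped at most once between nCycles0 and nCycles1. */
static double CyclesToNs(uint32_t nCycles0, uint32_t nCycles1, uint32_t nCPUFreq)
{
    /* Unsigned subtraction is modulo 2^32, so a single wrap is handled. */
    uint32_t nCyclesDiff = nCycles1 - nCycles0;

    return (double)nCyclesDiff * 1e9 / (double)nCPUFreq;
}
```

For example, samples of 0xFFFFFFF0 and 0x00000010 give a difference of 32 cycles, even though the second value is numerically smaller than the first.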

  1. Is this the correct way to measure a very precise time difference using DWT->CYCCNT and HAL_RCC_GetSystemCoreClockFreq()? Is there a better, more precise method?
  2. The above method gives me twice the time it should. While reading DWT->CYCCNT, I also toggle a pin and measure the interval between toggles with a logic analyzer. Say this interval, tActual, is 2 ms. The above formula, i.e. CPU_Cycles / CPU_Frequency, returns tMeasured = 4 ms.

This seems to suggest that the formula should be CPU_Cycles / (2*CPU_Frequency), so either the frequency needs to be doubled or the cycles need to be halved.

In the readouts, nCPUFreq is 208878528 (the maximum allowed per the Reference Manual is 209000000), so it must be correct and cannot be multiplied by 2.

CPU_Cycles could be divided by 2, but would that not suggest that the CPU goes through 2 cycles per clock cycle? Is that possible (the CPU cycling on both the rising and the falling edge)?
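
To make the discrepancy concrete, the interval seen on the logic analyzer can be converted into the number of cycles CYCCNT should have advanced, and that figure compared with the reported difference. A small sketch using the numbers above; ExpectedCycles is a hypothetical helper, and the 2 ms and 4 ms intervals are just the example values from the text:

```c
#include <stdint.h>

/* Hypothetical helper: cycles the counter should advance over an interval,
 * rounded to the nearest cycle. 64-bit math avoids overflow for intervals
 * of a few seconds at this clock frequency. */
static uint64_t ExpectedCycles(uint32_t nCPUFreq, uint64_t nIntervalNs)
{
    return ((uint64_t)nCPUFreq * nIntervalNs + 500000000u) / 1000000000u;
}
```

With nCPUFreq = 208878528, a 2 ms interval corresponds to 417757 cycles, while a tMeasured of 4 ms corresponds to a reported difference of 835514 cycles.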


Solution

  • TLDR: Packet drop between M4 and A7.

    Hi, I ended up solving my own problem with a lot of help from PatrikF on the ST forum, who suggested that DWT should work as ARM specifies it.

    It turned out that the problem was a very consistent packet drop between the M4 and the A7, by exactly a factor of 2, which doubled the CYCCNT difference between the timestamps that did arrive. I wasted too much time looking in the wrong direction, but at the end of the day I learnt the importance of a packet counter.

    Note that PatrikF also added some recommendations on high-precision counters in STM32:

    Maybe using STGENR is another option, independent of the Cortex-M4 frequency.

    STGEN runs by default on the HSI at 64 MHz, which gives a resolution of about 15 ns, but the HSI is not a high-precision oscillator (+/-1%).

    Alternatively, STGEN can run on the HSE at 24 MHz, which is more precise (a few tens of ppm) but gives a resolution of about 40 ns.

    See also this post: https://community.st.com/s/question/0D53W00000oXAqhSAG/how-can-i-get-access-to-m4-timers-from-a7-linux-is-it-possible-

    As STGEN is read from the Cortex-M4 over the AXI bus through asynchronous buses, it must suffer a few ns of additional latency.
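
The packet counter mentioned above can be as simple as a sequence number that the M4 increments for every message and the A7 checks for gaps. A minimal sketch, assuming a hypothetical nSeq field is added to each message and that packets arrive in order without duplicates; the struct and function names are illustrative and not from the original code:

```c
#include <stdint.h>

/* Hypothetical message header; the sender increments nSeq once per packet. */
typedef struct tPacketHeader
{
    uint32_t nSeq;
    uint32_t nCPUCycles;
} tPacketHeader;

/* Number of packets lost between two consecutively received headers.
 * Unsigned subtraction keeps the result correct when nSeq wraps around. */
static uint32_t CountDroppedPackets(const tPacketHeader *pPrev,
                                    const tPacketHeader *pCurr)
{
    return pCurr->nSeq - pPrev->nSeq - 1u;
}
```

A nonzero result means the cycle difference between the two headers spans more than one acquisition interval, so it should not be reported as the time between adjacent samples.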