Tags: c, embedded, driver, watchdog, c-libraries

Making a driver library for a slow module watchdog-friendly


Context

I'm writing some libraries to manage internet protocols through GPRS. Some parts of this communication (done through UART) are rather slow (some can take more than 30 seconds) because the module has to connect through GPRS.

First I wrote a driver library to control the module and manage TCP/IP connections. This library worked with blocking functions; for example, a function like Init_GPRS_connection() could take several seconds to finish. It has been pointed out to me that this is bad practice, because I now have to implement a watchdog timer, and this kind of function does not work well with the short timeouts that watchdogs have (I cannot kick the timer before it expires).

What I have thought of

I need to rewrite part of my libraries to be watchdog friendly. For this purpose I have thought of the following scheme: functions that contain a state machine and poll the data acquired through UART interrupts to advance through their states, so that I can write code like:

GPRS_typef Init_GPRS_connection(){
  switch(state){ //state would be a global variable that holds the current state of the state machine
   ....  //here would be all the states of the state machine
    case end:
        state = 0;
        return Done;
  }
}

while(Init_GPRS_connection() != Done){
   Do_stuff(); //Like kick the Watchdog
}

But I see a few problems with this solution:

My question

What kind of implementation should I use to make a driver library that is both user friendly and watchdog friendly? How do other driver libraries manage this?

Extra information


Solution

  • Given where you are, and assuming you do not want too much upheaval in your project to "do it properly", what you might do is add a variable watchdog timeout extension: set a counter that is decremented in a timer interrupt, and as long as the counter is not zero, the interrupt resets the watchdog.

    That way you are not allowing the timer interrupt to reset the watchdog indefinitely while your main thread is stuck, but you can extend the watchdog immediately before executing any blocking code, essentially setting a timeout for that operation.

    So you might have (pseudocode):

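    #include <stdint.h>

    void resetWatchdog( void ) ;  // provided by your hardware layer

    // Number of timer ticks for which the ISR may still reset the watchdog
    static volatile uint8_t wdg_repeat_count = 0 ;

    // Call immediately before a blocking operation
    void extendWatchdog( uint8_t repeat ) { wdg_repeat_count = repeat ; }

    void timerISR( void )
    {
        if( wdg_repeat_count > 0 )
        {
            resetWatchdog() ;
            wdg_repeat_count-- ;
        }
    }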
    

    Then you can either:

    extendWatchdog( CONNECTION_INIT_WDG_TIMEOUT ) ;
    while(Init_GPRS_connection() != Done){
       Do_stuff(); //Like kick the Watchdog
    }
    

    or continue to use your existing non-state-machine based solution:

    extendWatchdog( CONNECTION_INIT_WDG_TIMEOUT ) ;
    bool connected = Init_GPRS_connection() ;
    if( connected ) ...
    

    The idea is compatible with both what you have and what you propose; it simply allows you to extend the watchdog timeout beyond that dictated by the hardware.

    I suggest a uint8_t because it prevents a lazy developer from simply setting a large value and effectively disabling the watchdog protection (the extension is capped at 255 timer ticks), and because it is likely to be atomic and so shareable between the main and interrupt contexts.
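
    To make the cap concrete, here is a minimal sketch of a helper that converts a timeout in milliseconds into a repeat count for extendWatchdog(). The 250 ms timer period (TIMER_TICK_MS) and the helper name wdgRepeatsForMs() are assumptions for illustration only; use whatever period your timer interrupt actually has.

    ```c
    #include <stdint.h>

    /* Assumed period of the timer interrupt that services the watchdog. */
    #define TIMER_TICK_MS 250u

    /* Hypothetical helper: number of timer ticks needed to cover timeout_ms,
       rounded up and saturated at the uint8_t limit of 255 ticks. */
    static uint8_t wdgRepeatsForMs(uint32_t timeout_ms)
    {
        uint32_t ticks = timeout_ms / TIMER_TICK_MS;
        if (timeout_ms % TIMER_TICK_MS != 0u) {
            ticks++;                      /* round up partial ticks */
        }
        return (ticks > UINT8_MAX) ? (uint8_t)UINT8_MAX : (uint8_t)ticks;
    }
    ```

    With these assumed numbers a 30 second connection attempt needs 120 ticks, and no request can extend the watchdog beyond 255 ticks (63.75 seconds), so the protection cannot be switched off by accident.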

    All that said, it would clearly have been better to design in your integrity infrastructure from the outset at the architectural level rather than trying to bolt it on after the event. For example, if you were using an RTOS, you might reset the watchdog in a low priority task, so that if that task were starved the watchdog would expire. That "watchdog task" could also be used to monitor the other tasks to ensure they are being scheduled as expected.
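
    A rough sketch of such a monitoring scheme, written without any particular RTOS API: each monitored task sets its own bit when it runs, and the watchdog task resets the hardware watchdog only if every bit is set. The task names, taskCheckIn() and watchdogTaskStep() are hypothetical, and resetWatchdog() is a stand-in that only counts calls.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical task identifiers, one bit per monitored task. */
    enum { TASK_COMMS = 1u << 0, TASK_SENSOR = 1u << 1 };
    #define ALL_TASKS (TASK_COMMS | TASK_SENSOR)

    static volatile uint8_t checked_in = 0;  /* bits of tasks that have run */
    static unsigned kicks = 0;               /* counts simulated resets */

    /* Stand-in for the real hardware watchdog reset. */
    static void resetWatchdog(void) { kicks++; }

    /* Each monitored task calls this once per pass of its own loop.
       On a real RTOS this read-modify-write needs a critical section. */
    void taskCheckIn(uint8_t task_bit) { checked_in |= task_bit; }

    /* Body of the low priority watchdog task, run once per monitoring
       period. Resets the watchdog only if all tasks have checked in
       since the previous call, then clears the flags. */
    bool watchdogTaskStep(void)
    {
        bool all_alive = (checked_in & ALL_TASKS) == ALL_TASKS;
        if (all_alive) {
            resetWatchdog();
        }
        checked_in = 0;
        return all_alive;
    }
    ```

    If any task stops checking in, or the watchdog task itself never gets to run, the reset is skipped and the hardware watchdog expires.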

    Without an RTOS you might have a "big-loop" architecture with each "task" implemented as a state machine. In your example you seem to have missed the point of a state machine. "Initialising connection" should be a single state of a high level state machine, and the internals of that state may themselves be a state machine (hierarchical state machines). So your entire system would be a single master state machine in the main loop, with the watchdog reset once at each loop iteration. Nothing in any sub-state should block, so that the loop time stays low and deterministic. That is how, for example, the Arduino framework's loop() function should work (when done properly, which is unfortunately seldom the case in examples). To understand how to implement a real-time deterministic state-machine architecture you could do worse than look at the work of Miro Samek. The framework described therein is available via his company.
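
    As a minimal sketch of that structure (not a definitive implementation): a master state machine with "initialising connection" as one state, a non-blocking sub-state machine inside it, and a single watchdog reset per loop iteration. The state names, the Working return value, reply_ready, sendConnectCommand() and mainLoopIteration() are assumptions for illustration; resetWatchdog() is a stand-in that only counts calls.

    ```c
    #include <stdbool.h>

    typedef enum { SYS_INIT_CONNECTION, SYS_RUNNING } SysState;
    typedef enum { CONN_SEND_CMD, CONN_WAIT_REPLY, CONN_DONE } ConnState;
    typedef enum { Working, Done } GPRS_typef;

    static SysState  sys_state  = SYS_INIT_CONNECTION;
    static ConnState conn_state = CONN_SEND_CMD;
    static volatile bool reply_ready = false; /* would be set by the UART ISR */
    static unsigned wdg_kicks = 0;

    static void resetWatchdog(void) { wdg_kicks++; } /* stand-in for hardware */
    static void sendConnectCommand(void) { }         /* stand-in: start UART TX */

    /* Sub-state machine: performs one small step and returns at once. */
    GPRS_typef Init_GPRS_connection(void)
    {
        switch (conn_state) {
        case CONN_SEND_CMD:
            sendConnectCommand();
            conn_state = CONN_WAIT_REPLY;
            break;
        case CONN_WAIT_REPLY:
            if (reply_ready) {           /* check the flag, never spin on it */
                reply_ready = false;
                conn_state = CONN_DONE;
            }
            break;
        case CONN_DONE:
            conn_state = CONN_SEND_CMD;  /* re-arm for the next connection */
            return Done;
        }
        return Working;
    }

    /* One pass of the main loop: master state machine, then one reset. */
    void mainLoopIteration(void)
    {
        switch (sys_state) {
        case SYS_INIT_CONNECTION:
            if (Init_GPRS_connection() == Done) {
                sys_state = SYS_RUNNING;
            }
            break;
        case SYS_RUNNING:
            /* normal application work goes here, also non-blocking */
            break;
        }
        resetWatchdog();                 /* the only reset point */
    }
    ```

    The main program then reduces to for(;;) { mainLoopIteration(); }. A real version would also want a timeout in the waiting state so that a module that never replies moves the machine to an error state instead of waiting forever.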