
lwIP 1.4.1 stuck in an endless while loop in tcp_fasttmr


Core: Cortex-M7

Microcontroller: STM32F765ZI

IP Stack: lwIP 1.4.1

I have an STM32F7-based embedded system running a TCP server on lwIP that accepts 3 connections at the application layer. The server runs as expected for a few hours, then randomly gets stuck in the while loop of the tcp_fasttmr function shown below:

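```c
void
tcp_fasttmr(void)
{
  struct tcp_pcb *pcb;

  ++tcp_timer_ctr;

tcp_fasttmr_start:
  pcb = tcp_active_pcbs;

  while(pcb != NULL) {   /* <-- execution gets stuck in this loop */
    if (pcb->last_timer != tcp_timer_ctr) {
      struct tcp_pcb *next;
      pcb->last_timer = tcp_timer_ctr;
      /* send delayed ACKs */
      if (pcb->flags & TF_ACK_DELAY) {
        LWIP_DEBUGF(TCP_DEBUG, ("tcp_fasttmr: delayed ACK\n"));
        tcp_ack_now(pcb);
        tcp_output(pcb);
        pcb->flags &= ~(TF_ACK_DELAY | TF_ACK_NOW);
      }
      next = pcb->next;
      if (pcb->refused_data != NULL) {
        tcp_active_pcbs_changed = 0;
        tcp_process_refused_data(pcb);
        if (tcp_active_pcbs_changed) {
          /* application callback has changed the pcb list: restart the loop */
          goto tcp_fasttmr_start;
        }
      }
      pcb = next;
    } else {
      /* pcb already visited in this tick: just advance to the next one */
      pcb = pcb->next;
    }
  }
}
```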

On investigating, I found that pcb->last_timer and tcp_timer_ctr both had the same value, 58:

pcb->last_timer = tcp_timer_ctr = 58

It would be a great help if anybody could explain the reason for this behaviour of lwIP.

Further investigation also showed that two of the 3 client connections were pointing to the same pcb. I am not sure how this happens either, because the Wireshark traces show that all 3 clients were communicating successfully until the stack got stuck in the while loop.

I found a suggestion on a forum to make the while loop finite by adding an arbitrary timeout counter. But since the issue occurs randomly after a few hours, I can't tell whether that really solves it, which is why I would like to understand why this is happening.


Solution

  • I found the following answer here: https://lists.gnu.org/archive/html/lwip-users/2012-12/msg00003.html

    This is a common bug in lwIP ports that do not obey lwIP's threading requirements. The reason for pcb->next pointing to pcb is most often that more than one execution context calls lwIP functions at the same time. In your case (no OS), I guess you are using lwIP from the main loop and from an ISR (e.g. feeding packets into ethernet_input or ip_input from interrupt level or calling lwIP timer functions from interrupt level), which is not supported.