linux-kernelarmsd-cardomap

Stop MMC queue from fetching new requests when communication with card times out


We are using a custom board running version 3.2 of the kernel, but we've come across some issues when testing the robustness of the MMC layer.

First of all, our MMC slot does not have a card detection pin. The issue in question consists of the following:

  1. Load the module (omap_hsmmc). The card is detected on power-up, and is mounted appropriately.
  2. Read something from the SD card (i.e. cat foo.txt)
  3. While the file is being read, remove the card.
  4. After many failed attempts, the system hangs.

Now, I've tracked the issue to the following code section, in drivers/mmc/card/queue.c:

static int mmc_queue_thread(void *d)                                                                                    
{                                                                                                                       
   struct mmc_queue *mq = d;                                                                                            
   struct request_queue *q = mq->queue;                                                                                 

   current->flags |= PF_MEMALLOC;                                                                                       

   down(&mq->thread_sem);                                                                                               
   do {                                                                                                                 
      struct request *req = NULL;                                                                                       
      struct mmc_queue_req *tmp;                                                                                        

      spin_lock_irq(q->queue_lock);                                                                                     
      set_current_state(TASK_INTERRUPTIBLE);                                                                            
      req = blk_fetch_request(q);                                                                                       
      mq->mqrq_cur->req = req;                                                                                          
      spin_unlock_irq(q->queue_lock);                                                                                   

      if (req || mq->mqrq_prev->req) {                                                                                  
         set_current_state(TASK_RUNNING);                                                                               
         mq->issue_fn(mq, req);                                                                                         
      } else {                                                                                                          
         if (kthread_should_stop()) {                                                                                   
            set_current_state(TASK_RUNNING);                                                                            
            break;                                                                                                      
         }                                                                                                              
         up(&mq->thread_sem);                                                                                           
         schedule();                                                                                                    
         down(&mq->thread_sem);                                                                                         
      }                                                                                                                 

      /* Current request becomes previous request and vice versa. */                                                    
      mq->mqrq_prev->brq.mrq.data = NULL;                                                                               
      mq->mqrq_prev->req = NULL;                                                                                        
      tmp = mq->mqrq_prev;                                                                                              
      mq->mqrq_prev = mq->mqrq_cur;                                                                                     
      mq->mqrq_cur = tmp;                                                                                               
   } while (1);                                                                                                         
   up(&mq->thread_sem);                                                                                                 

   return 0;                                                                                                            
}

Investigating this code, I've found that it hangs inside the mq->issue_fn(mq, req) call. This function prepares and issues the appropriate commands to fufill the request passed into it, and it is aware of any errors that happens when it can't communicate with the card. The error is handled in an awkward (in my opinion) manner, and does not 'bubble up' to the mmc_queue_thread. However, I've checked my version's code against that of the most recent kernel release (4.9), and I did not see any difference in the treatment of such errors, apart from a better separation of each error case (the treatment is very similar).

I suspect that the issue is caused by the inability ofthe upper layers to stop making new requests to read from the MMC card.


What I've Tried:


My questions:

  1. Is the best way to solve this by stopping the upper layers from making new requests? Or should the thread itself be 'killed off'?
  2. In a case such as mine, should it be assumed that because there is no card detection pin, the card should not be removed?

Update: I've found out that you can tell the driver to use polling to detect card insertion/removal. It can be done by adding the MMC_CAP_NEEDS_POLL flag to the mmc's .caps on driver initialization (on newer kernels, you can use the broken-cd property on your DT). However, the problem still happens after this modification.


Solution

  • Figured out the solution!

    The issue was, as I suspected, that the block requests kept being issued even after the card was removed. The error was fixed in commit a8ad82cc1b22d04916d9cdb1dc75052e80ac803c from Texas' kernel repository:

    commit a8ad82cc1b22d04916d9cdb1dc75052e80ac803c
    Author: Sujit Reddy Thumma <sthumma@codeaurora.org>
    Date:   Thu Dec 8 14:05:50 2011 +0530
    
        mmc: card: Kill block requests if card is removed
    
        Kill block requests when the host realizes that the card is
        removed from the slot and is sure that subsequent requests
        are bound to fail. Do this silently so that the block
        layer doesn't output unnecessary error messages.
    
        Signed-off-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
        Acked-by: Adrian Hunter <adrian.hunter@intel.com>
        Signed-off-by: Chris Ball <cjb@laptop.org>
    

    The key in the commit is that a return BLKPREP_KILL was added to the mmc_prep_request(), besides some minor tweaks that added some 'granularity' to the error treatment, adding a specific code for the cases where the card was removed.