linuxtimerkernel

The Concurrency Issues of mod_timer and refcount_inc in the linux kernel


I have found several pieces of code where refcount_inc is called after the mod_timer. E.g. this one:

static int ip_frag_reinit(struct ipq *qp)
{
  unsigned int sum_truesize = 0;

  if (!mod_timer(&qp->q.timer, jiffies + qp->q.fqdir->timeout)) {
    refcount_inc(&qp->q.refcnt);
    return -ETIMEDOUT;
  }
}

Is it safe to call functions in that order?

Why this looks as a problem

The logic of the timer is like this

void timer_callback()
{
  ...
  refcount_dec(obj);
}

The timer may have already been executed on another CPU when mod_timer returns.

Even if the object is already referenced by the caller of ip_frag_reinit function, if the timer callback executes first, the object at the caller's side will face a UAF (Use After Free) issue.

At the same time there are many places in the kernel where same functions are called in the opposite, seemingly correct order: refcount_inc before mod_timer.


Solution

  • The timer function is probably the ip_expire, which looks like that:

    static void ip_expire(struct timer_list *t)
    {
        struct ipq *qp;
        int err;
    
        qp = container_of(...);
    
        rcu_read_lock();
    
        /* Paired with WRITE_ONCE() in fqdir_pre_exit(). */
        if (READ_ONCE(qp->q.fqdir->dead))
            goto out_rcu_unlock;
    
        spin_lock(&qp->q.lock);
    
        ...
    
        spin_unlock(&qp->q.lock);
        icmp_send(head, ICMP_TIME_EXCEEDED, ICMP_EXC_FRAGTIME, 0);
        goto out_rcu_unlock;
    
    out:
        spin_unlock(&qp->q.lock);
    out_rcu_unlock:
        rcu_read_unlock();
        kfree_skb_reason(head, SKB_DROP_REASON_FRAG_REASM_TIMEOUT);
        ipq_put(qp);
    }
    

    The last line calls ipq_put, which is effectively refcount_dec_and_test.

    However, before that line the function ip_expire acquires (and releases) the spinlock.

    The function ip_frag_reinit is called with the same spinlock being taken. That is, before this function calls refcount_inc, the timer function ip_expire is unable to execute all its instructions from the beginning to the ipq_put(): this execution will stop at line

        spin_lock(&qp->q.lock);
    

    (Note, that if branch with refcount_inc call is executed only when mod_timer returns 0, which means that given timer is not active, so its function is not started yet.)

    So, there is no risk for the timer function to drop reference count for the object to 0, before ip_frag_reinit function get its own reference with refcount_inc.

    Well, in the timer function ip_expire there is branch

      if (READ_ONCE(qp->q.fqdir->dead))
    

    which bypass spinlock acquiring. But I guess that condition is not fulfilled for the object processed with ip_frag_reinit.