c++ | multithreading | caching | thread-safety | lru

Is this a truly Thread-Safe LRU Cache Design with Minimal Blocking?


Below is pseudocode for an LRU cache design with minimal blocking that I've been working on.

The cache has the following characteristics:

  1. It has a fixed size MAX_CACHE_SIZE
  2. It is LRU

I want the following conditions to be true:

  1. If a cache miss occurs, other threads are not blocked while the data becomes available (downloaded, in this case)
  2. The data is not retrieved twice when two concurrent cache misses occur for the same item

Below is the pseudocode function getData(name) that returns data from the cache given the name of the data. Does this design seem safe? If not, how can it be improved?


storage : deque<data> // container for the cache data
lookupmap : map<string, deque::iterator> // maps the name of a data item to its position in the storage deque

data getData(name) {
    lock()
    item_iterator : deque::iterator
    item_iterator = lookupmap[name]

    // if item name is present in the lookup map 
    if(item_iterator != null) {
       
       // unlocks and waits while the item is being downloaded. Still a cache miss,
       // but it is not this thread's responsibility to get the data, so wait for the download to finish
       cond_var.wait({return item_iterator.item.isDownloaded}) 
       
       // this removes the item from the queue and adds it to the front, essentially "bumping" it
       item : data
       item = item_iterator.item
       storage.remove(item_iterator)
       storage.push_front(item)
       lookupmap[name] = storage.begin()
       unlock()
       
       return item
    } else {
       // registers the item name but initialises it as null to indicate it is not downloaded yet
       lookupmap[name] = null
       unlock() // unlocks so that other threads can proceed while performing the download
       item : data
       item = downloadItem(name)
       lock() // locks again once downloaded
    
       // removes last item in cache if storage is full (LRU)
       if(storage.size() == MAX_CACHE_SIZE) {
           tempitem : data
           tempitem = storage.back()
           storage.pop_back()
           lookupmap.remove(tempitem.name)
       }

       // registers new item in cache
       storage.push_front(item)
       lookupmap[name] = storage.begin()
       
       unlock()
       cond_var.notify_all() // notifies any waiting threads that an item has now been downloaded
       return item

    }

}

Edit:

Not sure if this is a concern, but just to clarify: there is one global mutex and one global condition variable in this design.


Solution

  • The pseudocode does not describe a thread-safe algorithm.

    Consider that multiple threads may be waiting for a particular item to finish downloading. Each of them would be holding (in stack memory) the same item_iterator. The download finishes. When the first waiting thread wakes up, it removes the element and re-inserts it at the front to maintain the LRU order, which invalidates the iterator still held by the others.

    (Another potential problem I thought of turns out to be OK as long as the maximum number of participating threads is less than MAX_CACHE_SIZE.)

    Is it a sufficient fix to look up a fresh item_iterator after waking up? Maybe, but it is concerning that the main shared data structure, the storage itself, is frequently mutated while threads are active in various states. It is not a robust design.

    I would consider keeping the storage stable and evaluating the LRU a different way.
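The invalidation point can be made concrete: std::deque::erase invalidates all outstanding iterators into the container, so the first waiter's "bump" breaks every other waiter's item_iterator. The snippet below shows the well-defined counterpart with std::list, whose erase only invalidates iterators to the erased element (the equivalent access through a std::deque iterator after an erase would be undefined behaviour, so it cannot be demonstrated safely):

```cpp
#include <iterator>
#include <list>
#include <string>

// Well-defined demonstration: std::list::erase leaves iterators to the
// surviving elements valid. Doing the same through the question's
// deque-based storage would be undefined behaviour after the erase.
std::string elementAfterErase() {
    std::list<std::string> storage = {"a", "b", "c"};
    auto it_b = std::next(storage.begin());  // iterator to "b"
    storage.erase(storage.begin());          // erase "a"
    return *it_b;                            // still valid for std::list
}
```

Note that merely switching storage to std::list would not save the waiters here, because the first thread erases exactly the element their iterators point to. std::list::splice, however, moves an element to the front without invalidating any iterators, so it would be one way to make the "bump" safe.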
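One possible shape for such an alternative is sketched below; all names are hypothetical, not from the question. Entries live behind std::shared_ptr in a single map, so eviction never invalidates anything a waiter holds, and a std::shared_future per entry both deduplicates concurrent downloads and lets threads wait without holding the global mutex. LRU is approximated with a monotonic use counter instead of reordering a container, under the assumption that max_size is at least 1:

```cpp
#include <cstdint>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

class LruCache {
public:
    LruCache(std::size_t max_size,
             std::function<std::string(const std::string&)> download)
        : max_size_(max_size), download_(std::move(download)) {}

    std::string getData(const std::string& name) {
        std::shared_future<std::string> fut;
        {
            std::lock_guard<std::mutex> lk(mu_);
            auto it = map_.find(name);
            if (it == map_.end()) {
                if (map_.size() >= max_size_) evictOldestLocked();
                auto entry = std::make_shared<Entry>();
                // std::async stands in for the question's downloadItem();
                // exactly one download is started per missing name.
                entry->future = std::async(std::launch::async,
                                           [fn = download_, name] { return fn(name); })
                                    .share();
                it = map_.emplace(name, std::move(entry)).first;
            }
            it->second->last_used = ++clock_;
            fut = it->second->future;  // copy the handle, then drop the lock
        }
        // Block here only if the download is still running; concurrent misses
        // on the same name all wait on the same shared state.
        return fut.get();
    }

private:
    struct Entry {
        std::shared_future<std::string> future;  // ready when downloaded
        std::uint64_t last_used = 0;             // monotonic LRU stamp
    };

    // O(n) scan: trades the O(1) "bump" of a reordered deque for a storage
    // layout that is never reshuffled under waiters.
    void evictOldestLocked() {
        auto victim = map_.begin();
        for (auto it = map_.begin(); it != map_.end(); ++it)
            if (it->second->last_used < victim->second->last_used) victim = it;
        // Waiters still holding the victim's shared_future keep its state
        // alive; only the cache's reference is dropped.
        map_.erase(victim);
    }

    std::mutex mu_;
    std::uint64_t clock_ = 0;
    std::size_t max_size_;
    std::function<std::string(const std::string&)> download_;
    std::unordered_map<std::string, std::shared_ptr<Entry>> map_;
};
```

A caveat worth noting: if the download throws, the exception is stored in the shared_future and the failed entry stays cached, so a production version would need to evict failed entries; the O(n) eviction scan could likewise be replaced by a proper LRU structure once the concurrency is settled.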