configuration latency tps

What's a good mechanism to store device config information?


I have 10k devices that share the same configuration attributes but have different values. The config file includes the name, type, model, manufacture year, temperature, usage, and status. The first four values never change; the last three change every few seconds. Each device has a computer connected to it.

Which of the following two ways of storing the config is better? Way 1: put the config information in a JSON file stored on the computer connected to each device. Way 2: put the config information in a database table.

The advantage of way 1 is lower latency, but the data is hard to maintain. Way 2 has higher latency but is easier to maintain: we can just create an API to get the data from the table. Way 2 also has a TPS (transactions per second) problem as the number of devices grows, for example if there are 80k devices and every device writes config data to the database table at the same time.

Update: As mentioned by Ciaran McHale, the three variables are dynamic information, so I added the following information to the question:

The 3 variables (temperature, usage, status) are dynamic information and can be kept in memory, but we also want to keep the final values somewhere so that we still know them after rebooting the device or restarting the application. So my question is about a good mechanism for keeping those final values (database table vs. local JSON/XML/TXT file).


Solution

  • It seems to me that only the first 4 variables (name, type, model and manufacture year) belong in a configuration file. Since the other 3 variables (temperature, usage, status) change every few seconds, they are really dynamic state information that should be held in memory, and you should provide an API so this state information can be examined. The API might be, for example, via a client-server socket connection, or via shared memory. You are probably going to say, "I can't do that because [such-and-such]", so I suggest you update your question with such reasoning. Doing that might help you obtain more useful answers.

    Edit due to extra information provided in updated question...

    What I am about to suggest will work on Linux; I don't know about other operating systems. Run man shm_open and man mmap to learn about shared memory. A shared-memory segment survives process restarts, and if it is backed by a file on disk (a regular file mapped with mmap, rather than a shm_open object, which on Linux normally lives in a memory-backed filesystem), it can also survive machine reboots. My (possibly incorrect) understanding is that, most of the time, the file's contents will be cached in kernel buffers, and virtual memory will map those buffers into a process's address space, so reading and writing will be memory-only operations; hence you won't suffer frequent disk I/O.
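
    To make the mapping step concrete, here is a minimal sketch, assuming Linux and a regular file as the backing store. The helper names mapSharedFile() and unmapSharedFile() are made up for illustration; only open, ftruncate, mmap, munmap and close are real system calls. Replacing open with shm_open should give a segment that only needs to outlive process restarts.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/types.h>
    #include <unistd.h>

    // Hypothetical helper: maps `bytes` bytes of the file at `path` into this
    // process with MAP_SHARED, creating the file if it does not exist yet.
    // Returns nullptr on failure. Every process is assumed to pass the same
    // `bytes`, because ftruncate() sets the file to exactly that size.
    inline void* mapSharedFile(const char* path, std::size_t bytes) {
      assert(path != nullptr);
      const int fd = open(path, O_RDWR | O_CREAT, 0600);
      if (fd == -1) return nullptr;
      // Newly added bytes read as zero.
      if (ftruncate(fd, static_cast<off_t>(bytes)) == -1) {
        close(fd);
        return nullptr;
      }
      void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
      close(fd);  // the mapping stays valid after the descriptor is closed
      return p == MAP_FAILED ? nullptr : p;
    }

    // Hypothetical helper: removes a mapping created by mapSharedFile().
    inline void unmapSharedFile(void* p, std::size_t bytes) {
      if (p != nullptr) munmap(p, bytes);
    }
    ```

    Anything written through the returned pointer is visible to every other process that maps the same file, and a process that maps the file again after a restart sees the last values written.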

    For simplicity, I am going to assume that each device needs to store the same sort of dynamic information, and this can be represented in a fixed-length struct, for example:

    // SomeEnum and AnotherEnum are placeholders for your own enum types.
    struct DeviceData {
      double       temperature;
      SomeEnum     usage;
      AnotherEnum  status;
    };
    

    You can have a shared-memory segment large enough to store an array of, say, 100,000 DeviceData structs. The static configuration file for each device will contain entries such as:

    name="foo";
    type="bar";
    model="widget";
    manufacture_year="2020";
    shared_memory_id="/somename";
    shared_memory_array_index="42";
    

    The last two entries in the static configuration file specify the shared memory segment that the process should connect to, and the array index it should use to update the DeviceData associated with the process.
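
    As a rough illustration of how a process might pick up those two entries, here is a sketch of a tiny parser for the key="value"; format shown above. The names ConfigEntries, parseStaticConfig(), SharedMemoryLocation and sharedMemoryLocation() are hypothetical, and the parser assumes no spaces around '=' and no quote characters inside a value. A real configuration library would be more robust.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <istream>
    #include <map>
    #include <sstream>
    #include <string>

    using ConfigEntries = std::map<std::string, std::string>;

    // Parses lines shaped like  key="value";  and skips anything else.
    inline ConfigEntries parseStaticConfig(std::istream& in) {
      ConfigEntries entries;
      std::string line;
      while (std::getline(in, line)) {
        const std::size_t eq = line.find("=\"");
        if (eq == std::string::npos) continue;
        const std::size_t close = line.find('"', eq + 2);
        if (close == std::string::npos) continue;
        const std::size_t start = line.find_first_not_of(" \t");
        assert(start != std::string::npos);  // the line contains at least '='
        if (start == eq) continue;           // no key before '='
        entries[line.substr(start, eq - start)] =
            line.substr(eq + 2, close - (eq + 2));
      }
      return entries;
    }

    // Convenience overload for text that is already in memory.
    inline ConfigEntries parseStaticConfig(const std::string& text) {
      std::istringstream in(text);
      return parseStaticConfig(in);
    }

    // Where this process should attach and which array element it owns.
    struct SharedMemoryLocation {
      std::string id;
      std::size_t arrayIndex;
    };

    // Throws std::out_of_range if a key is missing and std::invalid_argument
    // if the index is not a number.
    inline SharedMemoryLocation sharedMemoryLocation(const ConfigEntries& entries) {
      return SharedMemoryLocation{
          entries.at("shared_memory_id"),
          static_cast<std::size_t>(
              std::stoul(entries.at("shared_memory_array_index")))};
    }
    ```

    With the example file above, sharedMemoryLocation() would yield the id "/somename" and the array index 42.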

    If the above seems suitable for your needs, the remaining challenge is efficient synchronization when reading and updating a DeviceData in shared memory. A good basic approach is discussed in a blog article called A scalable reader/writer scheme with optimistic retry, which uses C# to illustrate the concept. If you are using C++, I recommend reading Chapter 5 (The C++ memory model and operations on atomic types) of C++ Concurrency in Action by Anthony Williams. By the way, if you pad DeviceData (complete with the m_version1 and m_version2 fields used in the blog article) so that its size is an exact multiple of the cache-line size (64 bytes on most CPU architectures), and align the array on a cache-line boundary, then no two devices' entries share a cache line and your implementation won't suffer from false sharing (which can needlessly reduce performance).
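
    Here is a minimal C++ sketch in the spirit of that optimistic-retry idea. It is my own illustration rather than a port of the blog's code. It assumes exactly one writer per entry (each device process updates only its own array element), a 64-byte cache line, and lock-free atomics. The names DeviceSlot, Usage, Status, updateSlot() and readSlot() are hypothetical; Usage and Status stand in for the placeholder enums. Every atomic operation uses the default sequentially consistent ordering for simplicity; weaker orderings are possible but harder to get right.

    ```cpp
    #include <atomic>
    #include <cassert>
    #include <cstdint>

    // Hypothetical stand-ins for the SomeEnum and AnotherEnum placeholders.
    enum class Usage : std::int32_t { Idle, Busy };
    enum class Status : std::int32_t { Ok, Fault };

    // Plain snapshot handed to callers.
    struct DeviceData {
      double temperature;
      Usage usage;
      Status status;
    };

    // Shared representation of one device. alignas(64) rounds the size up to
    // a multiple of 64 bytes, the cache-line size assumed here.
    struct alignas(64) DeviceSlot {
      std::atomic<std::uint64_t> m_version1{0};
      std::atomic<double> temperature{0.0};
      std::atomic<Usage> usage{Usage::Idle};
      std::atomic<Status> status{Status::Ok};
      std::atomic<std::uint64_t> m_version2{0};
    };
    static_assert(sizeof(DeviceSlot) % 64 == 0, "slot must fill whole cache lines");

    // Writer: bump m_version1, write the fields, then publish via m_version2.
    // Only the single owner of the slot may call this.
    inline void updateSlot(DeviceSlot& slot, const DeviceData& data) {
      const std::uint64_t next = slot.m_version1.load() + 1;
      slot.m_version1.store(next);
      slot.temperature.store(data.temperature);
      slot.usage.store(data.usage);
      slot.status.store(data.status);
      slot.m_version2.store(next);
    }

    // Reader: read m_version2, copy the fields, then read m_version1. If the
    // two versions differ, an update overlapped the copy, so try again.
    inline DeviceData readSlot(const DeviceSlot& slot) {
      for (;;) {
        const std::uint64_t before = slot.m_version2.load();
        const DeviceData copy{slot.temperature.load(), slot.usage.load(),
                              slot.status.load()};
        if (slot.m_version1.load() == before) return copy;
      }
    }
    ```

    Readers never block the writer; they just retry when an update overlaps their copy, which should be rare if updates arrive only every few seconds.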

    The final step is to avoid exposing the low-level shared-memory operations to developers. So write a simple wrapper API with, say, four operations: connect() and disconnect() to attach to and detach from the shared-memory segment, plus readDeviceData() and updateDeviceData().
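
    One possible shape for such a wrapper is sketched below, assuming Linux, a regular file as the backing store, one writer per array index, and plain 32-bit integers in place of the two enums. The class name DeviceStateStore, the Slot layout and the parameter lists are my own invention; only the four operation names come from the suggestion above. Strictly speaking, the C++ standard does not promise that zero-filled mapped memory is a valid std::atomic object, but this is common practice with lock-free atomics on mainstream Linux toolchains.

    ```cpp
    #include <atomic>
    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    struct DeviceData {
      double temperature;
      std::int32_t usage;   // stand-in for SomeEnum
      std::int32_t status;  // stand-in for AnotherEnum
    };

    // One cache-line-sized entry per device; all-zero bytes are treated as
    // the initial state (see the caveat in the text above).
    struct alignas(64) Slot {
      std::atomic<std::uint64_t> m_version1;
      std::atomic<double> temperature;
      std::atomic<std::int32_t> usage;
      std::atomic<std::int32_t> status;
      std::atomic<std::uint64_t> m_version2;
    };

    class DeviceStateStore {
     public:
      DeviceStateStore() = default;
      DeviceStateStore(const DeviceStateStore&) = delete;
      DeviceStateStore& operator=(const DeviceStateStore&) = delete;
      ~DeviceStateStore() { disconnect(); }

      // Maps `slotCount` slots of the file at `path`, creating or growing the
      // file if needed. Returns false on failure.
      bool connect(const char* path, std::size_t slotCount) {
        disconnect();
        const int fd = open(path, O_RDWR | O_CREAT, 0600);
        if (fd == -1) return false;
        const std::size_t bytes = slotCount * sizeof(Slot);
        struct stat st;
        void* p = MAP_FAILED;
        // Only ever grow the file, so existing mappings stay valid.
        if (fstat(fd, &st) == 0 &&
            (static_cast<std::size_t>(st.st_size) >= bytes ||
             ftruncate(fd, static_cast<off_t>(bytes)) == 0)) {
          p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        }
        close(fd);  // the mapping outlives the descriptor
        if (p == MAP_FAILED) return false;
        slots_ = static_cast<Slot*>(p);
        count_ = slotCount;
        return true;
      }

      void disconnect() {
        if (slots_ != nullptr) munmap(slots_, count_ * sizeof(Slot));
        slots_ = nullptr;
        count_ = 0;
      }

      // Call only from the process that owns `index`.
      void updateDeviceData(std::size_t index, const DeviceData& d) {
        assert(index < count_);
        Slot& s = slots_[index];
        const std::uint64_t next = s.m_version1.load() + 1;
        s.m_version1.store(next);
        s.temperature.store(d.temperature);
        s.usage.store(d.usage);
        s.status.store(d.status);
        s.m_version2.store(next);
      }

      // Safe from any process; retries if an update overlaps the copy.
      DeviceData readDeviceData(std::size_t index) const {
        assert(index < count_);
        const Slot& s = slots_[index];
        for (;;) {
          const std::uint64_t before = s.m_version2.load();
          const DeviceData d{s.temperature.load(), s.usage.load(), s.status.load()};
          if (s.m_version1.load() == before) return d;
        }
      }

     private:
      Slot* slots_ = nullptr;
      std::size_t count_ = 0;
    };
    ```

    A device process would call connect() once at start-up with the path and slot count it shares with the other processes, then call updateDeviceData() with its own array index every few seconds. A monitoring tool can call readDeviceData() for any index without disturbing the writers.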