machine-learning tensorflow-federated federated-learning

How to build a federated learning model on a small, unbalanced dataset


I am trying to build a federated learning model using TFF and I have some questions:

  1. I am preparing the dataset. I have separate data files with the same features but different samples, and I would like to treat each file as a single client. How can I set this up in TFF?

  2. The data is not balanced, meaning the number of records varies from file to file. Does this affect the modeling process?

  3. The dataset is fairly small: one file (client) has 300 records and another has 1500. Is that enough to build a federated learning model?

Thanks in advance


Solution

    1. You can create a ClientData for your dataset; see Working with tff's ClientData.
    2. The dataset doesn't have to be balanced to build a federated learning model. In Federated Averaging (https://arxiv.org/abs/1602.05629), the server takes a weighted average of the clients' model updates, where each client's weight is the number of samples it has.
    3. A few hundred records per client is no less than what each client has in the EMNIST dataset, so that would be fine. About the total number of clients: this tutorial shows FL with 10 clients; you can run the colab with a smaller NUM_CLIENTS to see how it works on the example dataset.
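
To make point 2 concrete, here is a minimal NumPy sketch of sample-weighted averaging. It is not TFF's implementation. The function name `weighted_federated_average` and the two-parameter "models" are hypothetical, chosen only for illustration; the client sizes (300 and 1500) are the ones from the question.

```python
import numpy as np


def weighted_federated_average(client_weights, client_sizes):
    """Average per-client parameter vectors, weighting each by its sample count."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)  # shape: (num_clients, num_params)
    # Scale each client's parameters by its sample count, then normalize
    # by the total number of samples across all clients.
    return (sizes[:, None] * stacked).sum(axis=0) / sizes.sum()


# Two hypothetical clients with the record counts from the question.
client_sizes = [300, 1500]
# Made-up parameter vectors standing in for locally trained models.
client_weights = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

global_weights = weighted_federated_average(client_weights, client_sizes)
print(global_weights)
```

With these sizes, the 1500-record client contributes 1500/1800 = 5/6 of the average and the 300-record client 1/6, so the imbalance is accounted for in the aggregation step rather than needing to be fixed in the data.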