python, scikit-learn, tensorflow-federated

How can I prepare my image dataset for a federated model?


How could I transform my dataset (composed of images) into a federated dataset? I am trying to create something similar to EMNIST, but for my own dataset.

tff.simulation.datasets.emnist.load_data(only_digits=True, cache_dir=None)


Solution

  • You will need to create a ClientData object first.

    For example:

    import tensorflow_federated as tff

    client_data = tff.simulation.datasets.ClientData.from_clients_and_tf_fn(
        client_ids, create_dataset)
    

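    where create_dataset is a serializable function: it receives a client ID as a tf.string tensor and returns that client's tf.data.Dataset. Before writing it, you have to prepare your images (read the TensorFlow tutorial about preprocessing data). For example: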
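from_clients_and_tf_fn needs a list of client IDs, and the answer leaves open how the image files are divided among clients. If your images have no natural owner, one common simulation choice is to shuffle the file list and deal it out evenly. The plain-Python sketch below assumes that IID split; split_among_clients and the "client_<n>" naming are hypothetical, not part of TFF. Its keys could serve as client_ids, and each value is the file list that client's dataset would read.

```python
import random


def split_among_clients(filenames, num_clients, seed=0):
    """Hypothetical helper (not a TFF API): shuffle file paths and deal them
    round-robin into `num_clients` disjoint shards keyed by client ID."""
    files = sorted(filenames)           # deterministic starting order
    random.Random(seed).shuffle(files)  # reproducible shuffle
    shards = {f"client_{i}": [] for i in range(num_clients)}
    for index, path in enumerate(files):
        shards[f"client_{index % num_clients}"].append(path)
    return shards


files = ([f"data/cat/img_{i}.jpg" for i in range(5)]
         + [f"data/dog/img_{i}.jpg" for i in range(5)])
shards = split_among_clients(files, num_clients=3)
client_ids = list(shards)  # insertion order: client_0, client_1, client_2
print(client_ids)                             # ['client_0', 'client_1', 'client_2']
print([len(shards[c]) for c in client_ids])   # [4, 3, 3]
```

A real federated dataset such as EMNIST is usually non-IID (each client is one writer), so if your images already belong to natural owners, grouping by owner is closer to that setting than a random split.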

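    import os
    import tensorflow as tf

    # labels is your list of class names; each image is assumed to sit in a
    # sub-folder named after its class, e.g. <root>/<class_name>/<file>.jpg
    labels_tf = tf.convert_to_tensor(labels)

    def parse_image(filename):
      # The parent directory name is the class label
      parts = tf.strings.split(filename, os.sep)
      label_str = parts[-2]
      # Map the class name to its integer index in labels
      label_int = tf.where(labels_tf == label_str)[0][0]

      # Read, decode, scale to [0, 1] and resize the image
      image = tf.io.read_file(filename)
      image = tf.io.decode_jpeg(image, channels=3)
      image = tf.image.convert_image_dtype(image, tf.float32)
      image = tf.image.resize(image, [32, 32])

      return image, label_int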
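Because parse_image runs inside a TensorFlow graph, a wrong folder layout can be awkward to debug there. The plain-Python sketch below mirrors only its label logic (parent folder name, then position in labels) so you can check a few paths before building the pipeline. label_index_from_path is a hypothetical helper, and the cat/dog labels are made up for illustration.

```python
import os


def label_index_from_path(path, labels):
    """Hypothetical helper mirroring parse_image's label logic: the parent
    directory name is the class, its position in `labels` is the label."""
    parts = path.split(os.sep)
    label_str = parts[-2]
    return labels.index(label_str)  # raises ValueError for an unknown class


labels = ["cat", "dog"]
path = os.path.join("data", "dog", "img_0.jpg")
print(label_index_from_path(path, labels))  # 1
```

Unlike the tf.where lookup, list.index fails loudly with a ValueError when a folder name is missing from labels, which surfaces mismatches early.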
    

    Once the parsing function is ready, use it inside the create_dataset function:

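    def create_dataset(client_id):
      # client_id arrives as a tf.string tensor; use it to select
      # only this client's files
      ....
      list_ds = tf.data.Dataset.list_files(<path of your dataset>)
      # Decode every file into an (image, label) pair
      images_ds = list_ds.map(parse_image)
      return images_ds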
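As written, create_dataset would list the same files for every client, so the elided part is where the client ID has to narrow the file pattern. One possible layout (an assumption, not something TFF requires) is <root>/<client_id>/<class_name>/<image>.jpg: the class folder is still parts[-2], so parse_image keeps working, and the client IDs are simply the top-level folder names. The plain-Python sketch below collects those IDs; discover_clients is hypothetical. Inside create_dataset, the matching pattern could then be built from the tf.string client ID, for instance with tf.strings.join using os.sep as the separator, and passed to tf.data.Dataset.list_files.

```python
import os
import tempfile


def discover_clients(root, extensions=(".jpg", ".jpeg")):
    """Hypothetical helper for a <root>/<client_id>/<class_name>/<image>
    layout. Returns {client_id: sorted list of image paths}."""
    clients = {}
    for client_id in sorted(os.listdir(root)):
        client_dir = os.path.join(root, client_id)
        if not os.path.isdir(client_dir):
            continue  # ignore stray files at the top level
        files = []
        for class_name in sorted(os.listdir(client_dir)):
            class_dir = os.path.join(client_dir, class_name)
            if not os.path.isdir(class_dir):
                continue
            for name in sorted(os.listdir(class_dir)):
                if name.lower().endswith(extensions):
                    files.append(os.path.join(class_dir, name))
        clients[client_id] = files
    return clients


# Build a tiny throwaway tree in the assumed layout
with tempfile.TemporaryDirectory() as root:
    for client_id, class_name, name in [("alice", "cat", "1.jpg"),
                                        ("alice", "dog", "2.jpg"),
                                        ("bob", "cat", "3.jpg")]:
        class_dir = os.path.join(root, client_id, class_name)
        os.makedirs(class_dir, exist_ok=True)
        open(os.path.join(class_dir, name), "wb").close()
    clients = discover_clients(root)
    client_ids = list(clients)
    counts = {c: len(f) for c, f in clients.items()}

print(client_ids)  # ['alice', 'bob']
print(counts)      # {'alice': 2, 'bob': 1}
```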
    

    After this step, you can write a preprocessing function that repeats, shuffles, batches and prefetches each client's data:

    NUM_CLIENTS = 10
    NUM_EPOCHS = 5
    BATCH_SIZE = 20
    SHUFFLE_BUFFER = 100
    PREFETCH_BUFFER = 10

    def preprocess(dataset):
      return dataset.repeat(NUM_EPOCHS).shuffle(SHUFFLE_BUFFER, seed=1).batch(
          BATCH_SIZE).prefetch(PREFETCH_BUFFER)
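To see what preprocess produces per client, here is a plain-Python model of repeat, then shuffle, then batch. It follows the documented behaviour of tf.data's shuffle (fill a buffer of SHUFFLE_BUFFER elements, emit a random one, refill its slot) and of batch without drop_remainder (the last batch may be smaller), but it does not reproduce TensorFlow's random order, and the 50-example client is made up. simulate_pipeline is a hypothetical name.

```python
import math
import random


def simulate_pipeline(examples, num_epochs, shuffle_buffer, batch_size, seed=1):
    """Hypothetical model of dataset.repeat().shuffle().batch(); mirrors the
    buffer mechanics only, not TensorFlow's exact random sequence."""
    rng = random.Random(seed)
    stream = [x for _ in range(num_epochs) for x in examples]  # repeat
    buffer, shuffled = [], []
    for x in stream:
        buffer.append(x)
        if len(buffer) == shuffle_buffer:  # buffer full: emit a random element
            shuffled.append(buffer.pop(rng.randrange(len(buffer))))
    while buffer:  # input exhausted: drain the remaining buffer randomly
        shuffled.append(buffer.pop(rng.randrange(len(buffer))))
    # batch() without drop_remainder keeps a final, smaller batch
    return [shuffled[i:i + batch_size]
            for i in range(0, len(shuffled), batch_size)]


NUM_EPOCHS, BATCH_SIZE, SHUFFLE_BUFFER = 5, 20, 100
batches = simulate_pipeline(list(range(50)), NUM_EPOCHS, SHUFFLE_BUFFER, BATCH_SIZE)
print(len(batches), len(batches[-1]))          # 13 10
print(math.ceil(50 * NUM_EPOCHS / BATCH_SIZE))  # 13
```

A client with 50 images therefore yields ceil(50 * 5 / 20) = 13 batches, the last holding 10 examples. Because repeat comes before shuffle and batch, the buffer mixes neighbouring epochs and a batch can straddle an epoch boundary.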
    

    Finally, build a list of tf.data.Dataset objects, one per client, which is the input format federated training expects:

    def make_federated_data(client_data, client_ids):
      return [
          preprocess(client_data.create_tf_dataset_for_client(x))
          for x in client_ids
      ]
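NUM_CLIENTS is defined above but not used. In federated simulations it typically decides how many clients take part in each round, so make_federated_data is called on a subset of client_data.client_ids rather than on all of them. The sketch below assumes random sampling per round; sample_clients_per_round is a hypothetical helper, and each of its lists would be passed as the client_ids argument of make_federated_data.

```python
import random


def sample_clients_per_round(client_ids, num_clients, num_rounds, seed=0):
    """Hypothetical helper: pick `num_clients` distinct client IDs for each
    training round (capped at the number of available clients)."""
    rng = random.Random(seed)
    k = min(num_clients, len(client_ids))
    return [rng.sample(client_ids, k) for _ in range(num_rounds)]


client_ids = [f"client_{i}" for i in range(25)]
rounds = sample_clients_per_round(client_ids, num_clients=10, num_rounds=3)
for round_clients in rounds:
    print(len(round_clients), len(set(round_clients)))  # 10 10
```

Taking a fixed slice such as client_ids[0:NUM_CLIENTS] also works and keeps the same clients every round; random sampling is closer to how clients join in real deployments.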
    

    After this, your dataset is ready for federated learning!