I have two dummy image datasets, with three elements in the first and six elements in the second.
For example, the image names in the first dataset are [1.png, 2.png, 3.png]
and the image names in the second dataset are [1_1.png, 1_2.png, 2_1.png, 2_2.png, 3_1.png, 3_2.png]
I'm trying to figure out how to zip these datasets so that 1.png is paired with 1_1.png and 1_2.png, 2.png is paired with 2_1.png and 2_2.png, and so on. Is this possible? Here is the code I was trying to implement, but I really don't know how to get this mapping.
import os
import tensorflow as tf

# shuffle=False keeps the file order deterministic
X = tf.data.Dataset.list_files('D:/test/clear/*.png', shuffle=False)
Y = tf.data.Dataset.list_files('D:/test/haze/*.png', shuffle=False)
# zip pairs the elements of X and Y by position
paired = tf.data.Dataset.zip((X, Y))
for x in paired:
    print(x)
The output I get:
(<tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\clear\\1.png'>, <tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\haze\\1_1.png'>)
(<tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\clear\\2.png'>, <tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\haze\\1_2.png'>)
The output I want:
(<tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\clear\\1.png'>, <tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\haze\\1_1.png'>)
(<tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\clear\\1.png'>, <tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\haze\\1_2.png'>)
(<tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\clear\\2.png'>, <tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\haze\\2_1.png'>)
(<tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\clear\\2.png'>, <tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\haze\\2_2.png'>)
(This is my first ever answer written on StackOverflow, so I hope that it will be clear (enough) and without too many formatting errors.)
The easiest way I can think of right now is to duplicate each file name in X, so that every entry of X lines up with its two counterparts in Y.
These are the dummy filepath lists I used:
files_x = ["D:\\test\\clear\\1.png", "D:\\test\\clear\\2.png", "D:\\test\\clear\\3.png"]
files_y = ["D:\\test\\haze\\1_1.png", "D:\\test\\haze\\1_2.png", "D:\\test\\haze\\2_1.png", "D:\\test\\haze\\2_2.png", "D:\\test\\haze\\3_1.png", "D:\\test\\haze\\3_2.png"]
First, you create a dataset based on the list of file paths.
ds_files_x_dup = tf.data.Dataset.from_tensor_slices(files_x)
Then you can repeat each element by applying tf.repeat via the map function. This, however, turns every element into a vector holding both copies, so the repeated file names are grouped as one sample. To get a dataset with one file name per element, you then have to apply flat_map to the dataset.
ds_files_x_dup = ds_files_x_dup.map(lambda x: tf.repeat(x, 2))
ds_files_x_dup = ds_files_x_dup.flat_map(lambda x: tf.data.Dataset.from_tensor_slices(x))
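To see why the flat_map step is needed, it can help to look at the dataset after each of the two steps. The following is only a minimal sketch: `names` is a shortened stand-in for files_x that I made up for readability, and `unbatch()` is shown as an alternative that should give the same result as the flat_map call here.

```python
import tensorflow as tf

# Stand-in file names (hypothetical, shortened for readability).
names = ["1.png", "2.png", "3.png"]

ds = tf.data.Dataset.from_tensor_slices(names)

# After map: each element is a vector that holds the same name twice.
grouped = ds.map(lambda x: tf.repeat(x, 2))
grouped_shapes = [e.shape.as_list() for e in grouped]

# After flat_map: every copy becomes its own scalar element.
flat = grouped.flat_map(lambda x: tf.data.Dataset.from_tensor_slices(x))
flat_names = [e.numpy().decode() for e in flat]

# unbatch() also splits each element along its first axis.
unbatched_names = [e.numpy().decode() for e in grouped.unbatch()]

print(grouped_shapes)
print(flat_names)
print(unbatched_names)
```

The shapes printed first show the grouping described above (one vector of length 2 per original file name), while the two flattened lists contain each name twice in a row.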
Now you just have to create the dataset based on files_y:
ds_files_y = tf.data.Dataset.from_tensor_slices(files_y)
And zip the two together:
paired = tf.data.Dataset.zip((ds_files_x_dup, ds_files_y))
The elements of paired are then:
(<tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\clear\\1.png'>, <tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\haze\\1_1.png'>)
(<tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\clear\\1.png'>, <tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\haze\\1_2.png'>)
(<tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\clear\\2.png'>, <tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\haze\\2_1.png'>)
(<tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\clear\\2.png'>, <tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\haze\\2_2.png'>)
(<tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\clear\\3.png'>, <tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\haze\\3_1.png'>)
(<tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\clear\\3.png'>, <tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\haze\\3_2.png'>)
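Note that zip only pairs elements by position, so this assumes that both lists are in matching order and that every file in X has exactly two counterparts in Y. With a plain string sort, for instance, "10_1.png" comes before "1_1.png" while "1.png" comes before "10.png", so the two orders can drift apart once there are more than nine images. As a possible alternative, the X path can be derived from each Y path instead. This is just a sketch: it assumes the `<n>_<k>.png` naming and the `clear`/`haze` folder names from the question, `clear_path_for` is a helper name I made up, and forward slashes are used to keep the regular expressions simple.

```python
import tensorflow as tf

# Dummy haze paths; "10_1.png" is included to show a two-digit prefix.
files_y = [
    "D:/test/haze/1_1.png",
    "D:/test/haze/1_2.png",
    "D:/test/haze/2_1.png",
    "D:/test/haze/10_1.png",
]


def clear_path_for(haze_path):
    """Hypothetical helper: '.../haze/1_2.png' -> '.../clear/1.png'."""
    # Drop the "_<k>" suffix in front of the extension.
    path = tf.strings.regex_replace(haze_path, r"_\d+\.png$", ".png")
    # Point to the folder that holds the clear images.
    return tf.strings.regex_replace(path, "/haze/", "/clear/")


ds_files_y = tf.data.Dataset.from_tensor_slices(files_y)

# Each element becomes a (clear path, haze path) tuple.
paired_by_name = ds_files_y.map(lambda y: (clear_path_for(y), y))

pairs = [(x.numpy().decode(), y.numpy().decode()) for x, y in paired_by_name]
for pair in pairs:
    print(pair)
```

This variant does not depend on the order of the lists or on a fixed number of Y files per X file. It does not check that the derived file actually exists, though, so a missing clear image would only show up once the file is read.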