I am using feature columns in my code, but in TensorFlow 2.16.1 and later there is no keras.layers.DenseFeatures class to build the input layer for the DNN. What is the alternative? Since I am using Python 3.11.7, I couldn't install TensorFlow 2.15 or earlier.
INPUT_COLS = [
    "pickup_longitude",
    "pickup_latitude",
    "dropoff_longitude",
    "dropoff_latitude",
    "passenger_count",
]

feature_columns = {
    colname: tf.feature_column.numeric_column(colname) for colname in INPUT_COLS
}

# Build a Keras DNN model using the Sequential API
model = Sequential(
    [
        keras.layers.DenseFeatures(feature_columns=feature_columns.values()),
        keras.layers.Dense(units=32, activation="relu", name="h1"),
        keras.layers.Dense(units=8, activation="relu", name="h2"),
        keras.layers.Dense(units=1, activation="linear", name="output"),
    ]
)
I assume you were doing this lab from Google Cloud. I have finished the training and prediction parts with TensorFlow 2.16.1 as well.
First of all, according to this migration guide, DenseFeatures is deprecated in favour of Keras preprocessing layers when transitioning from TensorFlow 2.15 to TensorFlow 2.16.1.
During the lab I also received a warning similar to:
WARNING:tensorflow:From /tmpfs/tmp/ipykernel_19805/3124623333.py:2: numeric_column (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version.
Instructions for updating:
Use Keras preprocessing layers instead, either directly or via the `tf.keras.utils.FeatureSpace` utility. Each of `tf.feature_column.*` has a functional equivalent in `tf.keras.layers` for feature preprocessing when training a Keras model.
It clearly says that we should use either preprocessing layers or the FeatureSpace utility. I tried both and can give you the solutions as follows.
First, define the constants for the batch size, number of evaluations, and number of examples, and prepare your dataset. I do this step before creating the model, which differs from how the lab was designed, but it does not change the result. The reason is that the normalizer needs the dataset as an argument in the next step.
TRAIN_BATCH_SIZE = 1000
NUM_TRAIN_EXAMPLES = 10000 * 5  # training dataset will repeat, wrap around
NUM_EVALS = 50  # how many times to evaluate
NUM_EVAL_EXAMPLES = 10000  # enough to get a reasonable sample

trainds = create_dataset(
    pattern="../data/taxi-train.csv", batch_size=TRAIN_BATCH_SIZE, mode="train"
)

evalds = create_dataset(
    pattern="../data/taxi-valid.csv", batch_size=TRAIN_BATCH_SIZE, mode="eval"
).take(NUM_EVAL_EXAMPLES // TRAIN_BATCH_SIZE)
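The create_dataset helper comes from the lab. If you don't have it handy, a minimal sketch could look like the following; the column names, defaults, and dropped columns here are assumptions based on the taxi dataset, so adjust them to your CSV files:

import tensorflow as tf

CSV_COLUMNS = [
    "fare_amount",
    "pickup_datetime",
    "pickup_longitude",
    "pickup_latitude",
    "dropoff_longitude",
    "dropoff_latitude",
    "passenger_count",
    "key",
]
LABEL_COLUMN = "fare_amount"
DEFAULTS = [[0.0], ["na"], [0.0], [0.0], [0.0], [0.0], [0.0], ["na"]]

def features_and_labels(row_data):
    # Split a row dict into (features, label) and drop columns the model does not use.
    label = row_data.pop(LABEL_COLUMN)
    row_data.pop("pickup_datetime", None)
    row_data.pop("key", None)
    return row_data, label

def create_dataset(pattern, batch_size=1, mode="eval"):
    dataset = tf.data.experimental.make_csv_dataset(
        pattern, batch_size, CSV_COLUMNS, DEFAULTS
    )
    dataset = dataset.map(features_and_labels)
    if mode == "train":
        # Shuffle and repeat so the training dataset wraps around.
        dataset = dataset.shuffle(buffer_size=1000).repeat()
    return dataset.prefetch(1)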
Then define a helper function that creates and adapts a normalizer:
def get_normalization_layer(name, dataset):
    # Create a Normalization layer for our feature.
    normalizer = tf.keras.layers.Normalization(axis=None)
    # Prepare a Dataset that only yields our feature.
    feature_ds = dataset.map(lambda x, y: x[name])
    # Learn the statistics of the data.
    normalizer.adapt(feature_ds)
    return normalizer
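As an optional sanity check, you can adapt a normalizer on a single feature and call it on a raw value; the exact output depends on the statistics of your training data:

passenger_normalizer = get_normalization_layer("passenger_count", trainds)
# A raw passenger count is mapped onto a roughly zero-mean, unit-variance scale.
print(passenger_normalizer(tf.constant([[1.0]])))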
Then build your model with the functional API as follows:
INPUT_COLS = [
    "pickup_longitude",
    "pickup_latitude",
    "dropoff_longitude",
    "dropoff_latitude",
    "passenger_count",
]

# Here you need KerasTensors with named elements.
inputs = [tf.keras.Input(shape=(1,), name=col) for col in INPUT_COLS]

# Normalize each input; these are preprocessed KerasTensors that reference the inputs.
encoded_features = [
    get_normalization_layer(col, trainds)(inputs[idx])
    for (idx, col) in enumerate(INPUT_COLS)
]

x = tf.keras.layers.Dense(32, activation="relu")(
    tf.keras.layers.concatenate(encoded_features)
)
x = tf.keras.layers.Dense(8, activation="relu")(x)
outputs = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs, outputs)
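Optionally, check the wiring with a summary; the five normalized inputs should be concatenated into a single 5-wide tensor before the first Dense layer:

model.summary()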
Compile your model:

model.compile(
    optimizer="rmsprop", loss="mse", metrics=["mse", "root_mean_squared_error"]
)
Finally, do the training as required in the lab:
%time
steps_per_epoch = NUM_TRAIN_EXAMPLES // NUM_EVALS
LOGDIR = "./taxi_trained"

history = model.fit(
    x=trainds,
    batch_size=TRAIN_BATCH_SIZE,
    epochs=NUM_EVALS,
    validation_data=evalds,
    steps_per_epoch=steps_per_epoch,
)
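For the prediction part, the functional model expects a dict keyed by the input names; the coordinates below are made-up values that only illustrate the expected input format:

sample = {
    "pickup_longitude": tf.constant([[-73.98]]),
    "pickup_latitude": tf.constant([[40.77]]),
    "dropoff_longitude": tf.constant([[-73.99]]),
    "dropoff_latitude": tf.constant([[40.75]]),
    "passenger_count": tf.constant([[1.0]]),
}
print(model.predict(sample))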
The second approach uses the tf.keras.utils.FeatureSpace utility. Again, define the input columns:

INPUT_COLS = [
    "pickup_longitude",
    "pickup_latitude",
    "dropoff_longitude",
    "dropoff_latitude",
    "passenger_count",
]
from tensorflow.keras.utils import FeatureSpace

feature_space = FeatureSpace(
    features={col: FeatureSpace.float_normalized() for col in INPUT_COLS},
    output_mode="concat",
)
# Here the normalization statistics are learned when we adapt the FeatureSpace to sample data.
feature_space.adapt(
    tf.data.Dataset.from_tensor_slices(
        {
            "dropoff_latitude": [40.751293, 40.75003],
            "dropoff_longitude": [-73.99051, -73.974396],
            "passenger_count": [1.0, 1.0],
            "pickup_latitude": [40.7661, 40.753353],
            "pickup_longitude": [-73.97977, -73.98125],
        }
    )
)
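The two rows above are just sample values copied from the dataset. A presumably better option, which mirrors what the Normalization layers do in the first approach (I have not re-run the lab this way), is to adapt the FeatureSpace on the full training features prepared earlier:

feature_ds = trainds.map(lambda x, y: x)  # keep only the feature dict, drop the label
feature_space.adapt(feature_ds)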
Then we can build our model with the Sequential API:
model = tf.keras.Sequential(
    [
        feature_space,
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1),
    ]
)
The model compilation, dataset preparation and training can be repeated as mentioned above.
After trying both approaches, I figured out that I might not understand how to use FeatureSpace.adapt() thoroughly, which would explain why the first approach performs much better (the model gives tremendously better metrics). Maybe someone can clear this up for me as well :)
I hope that helps!