python machine-learning tensorflow neural-network google-cloud-ml-engine

Export a model with tf Estimator and export_savedmodel function

I'm building a deep neural network regressor with TensorFlow, based on this tutorial. When I try to save the model with the tf.estimator export_savedmodel function, I get the following error:

 raise ValueError('Feature {} is not in features dictionary.'.format(key))
 ValueError: Feature ad_provider is not in features dictionary.

I need to export the model so that I can deploy it for prediction on Google Cloud Platform.

Here is where I define the columns:

CSV_COLUMNS = [
"ad_provider", "device", "split_group","gold", "secret_areas",
 "scored_enemies", "tutorial_sec", "video_success"
]

FEATURES = ["ad_provider", "device", "split_group","gold", "secret_areas",
 "scored_enemies", "tutorial_sec"]

LABEL = "video_success"

ad_provider = tf.feature_column.categorical_column_with_vocabulary_list(
    "ad_provider", ["Organic","Apple Search Ads","googleadwords_int",
"Facebook Ads","website"]  )

split_group = tf.feature_column.categorical_column_with_vocabulary_list(
    "split_group", [1,2,3,4])

device = tf.feature_column.categorical_column_with_hash_bucket(
    "device", hash_bucket_size=100)


secret_areas = tf.feature_column.numeric_column("secret_areas")
gold = tf.feature_column.numeric_column("gold")
scored_enemies = tf.feature_column.numeric_column("scored_enemies")
finish_tutorial_sec = tf.feature_column.numeric_column("tutorial_sec")
video_success = tf.feature_column.numeric_column("video_success")


feature_columns = [
tf.feature_column.indicator_column(ad_provider),
tf.feature_column.embedding_column(device, dimension=8),
tf.feature_column.indicator_column(split_group),
tf.feature_column.numeric_column(key="gold"),
tf.feature_column.numeric_column(key="secret_areas"),
tf.feature_column.numeric_column(key="scored_enemies"),
tf.feature_column.numeric_column(key="tutorial_sec"),
]

Next, I create a serving input function so that the exported model accepts JSON dictionaries. I'm not sure whether I've written the serving function correctly.

def json_serving_input_fn():
  """Build the serving inputs."""
  inputs = {}
  for feat in feature_columns:
    inputs[feat.name] = tf.placeholder(
        shape=[None],
        dtype=feat.dtype if hasattr(feat, 'dtype') else tf.string)

  features = {
    key: tf.expand_dims(tensor, -1)
    for key, tensor in inputs.items()
  }
  return tf.contrib.learn.InputFnOps(features, None, inputs)

Here is the rest of my code:

def main(unused_argv):

  # Normalize columns 'gold', 'tutorial_sec' and 'scored_enemies' for the training set
  train_n = training_set
  train_n['gold'] = (train_n['gold'] - train_n['gold'].mean()) / (train_n['gold'].max() - train_n['gold'].min())
  train_n['tutorial_sec'] = (train_n['tutorial_sec'] - train_n['tutorial_sec'].mean()) / (train_n['tutorial_sec'].max() - train_n['tutorial_sec'].min())
  train_n['scored_enemies'] = (train_n['scored_enemies'] - train_n['scored_enemies'].mean()) / (train_n['scored_enemies'].max() - train_n['scored_enemies'].min())

  test_n = test_set
  test_n['gold'] = (test_n['gold'] - test_n['gold'].mean()) / (test_n['gold'].max() - test_n['gold'].min())
  test_n['tutorial_sec'] = (test_n['tutorial_sec'] - test_n['tutorial_sec'].mean()) / (test_n['tutorial_sec'].max() - test_n['tutorial_sec'].min())
  test_n['scored_enemies'] = (test_n['scored_enemies'] - test_n['scored_enemies'].mean()) / (test_n['scored_enemies'].max() - test_n['scored_enemies'].min())

  train_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=train_n,
    y=pd.Series(train_n[LABEL].values),
    batch_size=100,
    num_epochs=None,
    shuffle=True)

  test_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=test_n,
    y=pd.Series(test_n[LABEL].values),
    batch_size=100,
    num_epochs=1,
    shuffle=False)


  regressor = tf.estimator.DNNRegressor(feature_columns=feature_columns,
                                      hidden_units=[40, 30, 20],
                                      model_dir="model1",
                                      optimizer='RMSProp'
                                      )


  # Train

  regressor.train(input_fn=train_input_fn, steps=5)

  regressor.export_savedmodel("test",json_serving_input_fn)

  #Evaluate loss over one epoch of test_set.
  #For each step, calls `input_fn`, which returns one batch of data.
  ev = regressor.evaluate(
    input_fn=test_input_fn)
  loss_score = ev["loss"]
  print("Loss: {0:f}".format(loss_score))
  for key in sorted(ev):
      print("%s: %s" % (key, ev[key]))


  # Print out predictions over a slice of prediction_set.
  y = regressor.predict(
    input_fn=test_input_fn)
  # Array with prediction list!
  predictions = list(p["predictions"] for p in y)

  #real = list(p["real"] for p in pd.Series(training_set[LABEL].values))
  real = test_set[LABEL].values
  diff = np.subtract(real,predictions)

  diff = np.absolute(diff)
  diff = np.mean(diff)
  print("Mean Square Error of Test Set = ",diff*diff)

Solution

Besides the issue you mentioned, there are actually several additional issues I foresee you running into:

You are using tf.estimator.DNNRegressor, which was introduced in TensorFlow 1.3. At the time of writing, CloudML Engine only officially supports TF 1.2.
You are normalizing the features in the pandas DataFrame, and that won't happen at serving time (unless you do it client side). This introduces skew between training and serving, and you'll get poor prediction results.

So let's start by using tf.contrib.learn.DNNRegressor, which only requires minor changes:

regressor = tf.contrib.learn.DNNRegressor(
    feature_columns=feature_columns,
    hidden_units=[40, 30, 20],
    model_dir="model1",
    optimizer='RMSProp'
)
regressor.fit(input_fn=train_input_fn, steps=5)
regressor.export_savedmodel("test",json_serving_input_fn)

Note the call to fit instead of train.

(NB: your json_serving_input_fn is actually already written for TF 1.2 and is incompatible with TF 1.3, which is good for now.)

Now, the root cause of the error you are seeing is that the feature ad_provider is not in the dictionaries of inputs and features (although ad_provider_indicator is). This is because you are iterating through feature_columns rather than the original list of input columns, so the placeholders end up keyed by the derived feature-column names. The way to address that is to iterate over the actual inputs instead; however, we'll need to know their types, too (simplified here to just a few columns):

CSV_COLUMNS = ["ad_provider", "gold", "video_success"] 
FEATURES = ["ad_provider", "gold"] 
TYPES = [tf.string, tf.float32] 
LABEL = "video_success" 

def json_serving_input_fn(): 
  """Build the serving inputs.""" 
  inputs = {} 
  for feat, dtype in zip(FEATURES, TYPES): 
    inputs[feat] = tf.placeholder(shape=[None], dtype=dtype) 

  features = {
    key: tf.expand_dims(tensor, -1)
    for key, tensor in inputs.items()
  }
  return tf.contrib.learn.InputFnOps(features, None, inputs)
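
To sanity-check that the placeholder keys line up with what a client will send, it can help to build a request line by hand. The sketch below is plain Python with no TensorFlow dependency. It assumes the simplified FEATURES list above and the one-JSON-object-per-line format that gcloud reads via --json-instances; the make_instance helper and the sample values are made up for illustration.

```python
import json

# Same simplified feature list as in the serving function above.
FEATURES = ["ad_provider", "gold"]


def make_instance(**values):
    """Return one JSON line whose keys exactly match FEATURES (hypothetical helper)."""
    missing = [f for f in FEATURES if f not in values]
    extra = [k for k in values if k not in FEATURES]
    if missing or extra:
        # A key such as "ad_provider_indicator" would be rejected here,
        # mirroring the mismatch behind the original ValueError.
        raise ValueError("missing: {}, unexpected: {}".format(missing, extra))
    return json.dumps({f: values[f] for f in FEATURES}, sort_keys=True)


# Sample values are invented; "gold" is assumed to be already normalized.
line = make_instance(ad_provider="Organic", gold=0.25)
print(line)
```

Each line produced this way uses the raw input names (ad_provider, gold), which is what the placeholders in the rewritten serving function are keyed by.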

Finally, to normalize your data, you'll probably want to do that in the graph. You could try using tf.transform or, alternatively, write a custom estimator that does the transformation and delegates the actual model implementation to DNNRegressor.
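
Whichever route you take, the key point is that the mean and range should be computed once, from the training set, and then reused unchanged for evaluation and serving. Here is a minimal pure-Python sketch of that idea. fit_stats and normalize are hypothetical helpers and the sample values are made up; in practice the frozen constants would be baked into the graph (or applied by the client) rather than recomputed per dataset, as the test_n code in the question does.

```python
def fit_stats(values):
    """Compute the mean and range once, from the training data only."""
    mean = sum(values) / float(len(values))
    span = max(values) - min(values)
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    return {"mean": mean, "span": span or 1.0}


def normalize(value, stats):
    """Apply the frozen training statistics to a single raw value."""
    return (value - stats["mean"]) / stats["span"]


# Made-up training values for the "gold" column.
train_gold = [0.0, 50.0, 100.0]
gold_stats = fit_stats(train_gold)

# The same gold_stats are used for test rows and for serving requests.
print(normalize(75.0, gold_stats))
```

Because gold_stats is fixed after training, a raw value maps to the same normalized input no matter which dataset or request it arrives in, which is what removes the skew.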