I am hoping to design a single end-to-end CNN that extracts three features: segmentation and bifurcation for thing A, and detection for thing B. There will be a common-weight trunk, then three branches with their own weights for the three feature types, and then the branches will be merged. I'm hoping to achieve this with a custom stochastic gradient descent function.
I need to combine different datasets of the same subject matter, where each image contains A and B, but the datasets provide different ground truths. I am hoping to attach an extra vector to each image indicating which of the three ground truths are available (e.g. [0 0 1]). The idea is that the common weights w_0 always update, but the individual branch weights w_t know to ignore an unsuitable image when one is encountered, or even suitable images if too few of them appear within a batch.
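Conceptually, the per-image behaviour I want is something like this toy NumPy sketch (all names are made up, purely to illustrate the masking):

import numpy as np

# avail[i] marks which ground truths exist for image i: [seg, bif, det], e.g. [0, 0, 1]
avail = np.array([[1, 1, 0],
                  [0, 0, 1],
                  [1, 0, 0]], dtype=float)

# stand-in per-image, per-branch losses (random numbers in place of real losses)
branch_losses = np.random.rand(3, 3)

# each image contributes only the branch losses it has truth for, so the trunk w_0
# always receives a gradient, but the branch weights w_t only do when avail[:, t] == 1
total_loss = (avail * branch_losses).sum()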
The problem is I'm not sure how to handle this.
I am considering doing this with Theano in Lasagne instead of my original intention of Keras, due to the latter's higher level of abstraction. Detection of thing B can also be ignored if it overcomplicates things.
So, you have two different shapes for the ground truth.
But since they are "truth", they should go on the Y side, never the X.
Assuming that segmentation of A results in a two-dimensional (side,side) matrix the same size as the input image, and that the result for bifurcation is a one-dimensional (2,) array, you can do this:
# this is the data you already have
batch = the number of training images you have
side = the size in pixels of one side of your training images
segTruth = your truth images for segmentation, shaped (batch,side,side)
bifTruth = your truth coordinates for bifurcations, shaped (batch,2)
trainImages = your training images, shaped (batch,side,side,1) (the convolutional layers expect an explicit channel axis; channels-last ordering is assumed here)
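For a quick shape check, these can be stubbed out with random arrays, for example (the values are meaningless placeholders):

import numpy as np

batch, side = 8, 64                                                     # arbitrary example sizes
trainImages = np.random.rand(batch, side, side, 1).astype('float32')    # random "images"
segTruth = np.random.randint(0, 2, size=(batch, side, side)).astype('float32')
bifTruth = np.random.rand(batch, 2).astype('float32')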
Now, let's create the main trunk:
from keras.models import Model
from keras.layers import *
inp = Input((side,side,1))   # the images need a channel axis for the convolutional layers
x = Convolution2D(blablabla)(inp)
x = AnyOtherLayerYouNeed(blablabla)(x)
....
trunkOut = TheLastTrunkLayer(blablabla)(x)
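For concreteness, one possible trunk could look like this. The layer choices and sizes are arbitrary illustrations, assuming the Keras 2 signature of Convolution2D and channels-last images, and reusing side from above:

from keras.models import Model
from keras.layers import Input, Convolution2D

inp = Input((side, side, 1))                                            # grayscale input with a channel axis
x = Convolution2D(32, (3, 3), activation='relu', padding='same')(inp)
x = Convolution2D(64, (3, 3), activation='relu', padding='same')(x)
trunkOut = x                                                            # feature maps shaped (side, side, 64)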
Now, we split the model:
b1 = FirstLayerInBranch1(blablabla)(trunkOut)
b2 = FirstLayerInBranch2(blablabla)(trunkOut)
....
out1 = LastLayerInBranch1(blablabla)(b1)
out2 = LastLayerInBranch2(blablabla)(b2)
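Continuing the concrete sketch from above (again with arbitrary, illustrative layer choices): a segmentation branch ending in a (side,side) map that matches segTruth, and a bifurcation branch ending in two values that match bifTruth:

from keras.layers import GlobalAveragePooling2D, Dense, Reshape

# segmentation branch: a 1-channel map the same size as the input, reshaped to match segTruth
b1 = Convolution2D(16, (3, 3), activation='relu', padding='same')(trunkOut)
segMap = Convolution2D(1, (1, 1), activation='sigmoid', padding='same')(b1)
out1 = Reshape((side, side))(segMap)

# bifurcation branch: pool the trunk features and regress two values to match bifTruth
b2 = GlobalAveragePooling2D()(trunkOut)
b2 = Dense(64, activation='relu')(b2)
out2 = Dense(2)(b2)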
And finally, when we define the model, we pass both outputs:
model = Model(inp, [out1,out2])
When compiling, you can define loss = [lossfunction1, lossfunction2] if you want, or simply give one loss function that will be the same for both outputs.
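For example, assuming a binary segmentation map and a real-valued coordinate pair (the optimizer and loss choices are only illustrative):

model.compile(optimizer='adam',
              loss=['binary_crossentropy', 'mse'])   # one loss per output, in the same order as [out1, out2]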
And when training, pass the truth values also in a list:
model.fit(trainImages, [segTruth,bifTruth],.....)
As you can see, the results are not merged, and the model has two outputs, each with its own loss function.
If you do need to merge the outputs, that would be a very complicated task, since they have different shapes. If you need one loss function to be more important than the other, you can pass the loss_weights argument in the compile call.
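For instance, to make the segmentation loss count three times as much as the bifurcation loss (the numbers are arbitrary):

model.compile(optimizer='adam',
              loss=['binary_crossentropy', 'mse'],
              loss_weights=[3.0, 1.0])               # weights applied to the losses of out1 and out2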
In case you want to train or predict using only one branch, all you have to do is to create a new Model, without changing any layer:
modelB1 = Model(inp, out1)
modelB2 = Model(inp, out2)
So, suppose you only have "bifTruth" for a certain set of images. Then just use modelB2 for training; it doesn't consider the other branch at all.
Before training, you will have to compile each model, but their weights are shared across all three models (model, modelB1 and modelB2).
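For example, with a hypothetical subset bifOnlyImages / bifOnlyTruth that only has bifurcation ground truth (both names are made up for illustration):

modelB2.compile(optimizer='adam', loss='mse')
# only the trunk and branch 2 receive gradients from these images
modelB2.fit(bifOnlyImages, bifOnlyTruth, epochs=10, batch_size=8)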
If you want some part of the model to remain unchanged while training, you can go to each model.layers[i] and set .trainable = False before compiling. (This will not affect models that are already compiled.)
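For example, assuming (purely for illustration) that the last two layers of modelB1 form its branch and everything before them is the shared trunk:

# layer objects are shared between model, modelB1 and modelB2, so freezing them
# here affects whichever model is compiled afterwards
for layer in modelB1.layers[:-2]:
    layer.trainable = False
modelB1.compile(optimizer='adam', loss='binary_crossentropy')   # recompile after changing .trainable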