I have two different neural network architectures. Both of them are for image segmentation. I run single input through both of them and got two sigmoid outputs (x and y). I want to combine them to get the best possible result, but I am unsure how.
My current idea is: I have threshold 0.5.
x < 0.5 && y < 0.5
-> pick min(x, y)
x > 0.5 && y > 0.5
-> pick max(x, y)
x < 0.5 && y > 0.5
-> calculate how "far" x is from 0 and y from 1 and pick the value with the smaller "error" (e.g. x = 0.3 and y = 0.8 => the error of y is only 0.2 => pick y = 0.8)
x > 0.5 && y < 0.5
-> calculate how "far" x is from 1 and y from 0 and pick the value with the smaller "error"

Is this a valid approach, or is there a better way? Can this logic somehow be "translated" into a math function instead of if-else branches?
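For reference, the branching rule described above can be spelled out directly; a minimal NumPy sketch (function name is illustrative, works elementwise on arrays or on scalars):

```python
import numpy as np

def combine_heuristic(x, y, t=0.5):
    """Combine two sigmoid outputs following the if-else rules above."""
    both_neg = (x < t) & (y < t)
    both_pos = (x > t) & (y > t)
    # Disagreement case: the "error" of a value is its distance to the
    # nearest class (0 or 1); pick the more certain (smaller-error) value.
    err_x = np.minimum(x, 1 - x)
    err_y = np.minimum(y, 1 - y)
    out = np.where(err_x < err_y, x, y)
    # Agreement cases override: min when both negative, max when both positive.
    out = np.where(both_neg, np.minimum(x, y), out)
    out = np.where(both_pos, np.maximum(x, y), out)
    return out
```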
Unless extra information/data is available, the only really valid approach is to average the predictions. Unfortunately, even this does not have a single solution; there are typically two paths:
1. c(x,y) = exp([log(x) + log(y)]/2), the geometric mean, equal to sqrt(x*y)
2. c(x,y) = (x+y)/2, the arithmetic mean
Each has some desirable properties that the other lacks. In the neural network community, "2." is the typical path, which corresponds to treating your collection of models as a "naive ensemble".
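Both pooling rules are one-liners; a sketch for two per-pixel sigmoid outputs (function names are illustrative):

```python
import numpy as np

def geometric_mean(x, y):
    # exp([log(x) + log(y)] / 2), which simplifies to sqrt(x * y)
    return np.sqrt(x * y)

def arithmetic_mean(x, y):
    # The "naive ensemble" of two models
    return (x + y) / 2
```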
Now, if you have some validation data to do model selection on, you can learn the mixture coefficient, e.g. by maximising the performance of
c(x,y|a) = a x + (1-a) y
for different values of a.
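A simple grid search over a is enough for a single coefficient; a sketch, assuming held-out sigmoid outputs x_val, y_val and binary ground-truth labels (all names are illustrative, and accuracy stands in for whatever segmentation metric you actually care about):

```python
import numpy as np

def tune_alpha(x_val, y_val, labels, grid=np.linspace(0, 1, 101)):
    """Pick the mixture weight a maximising validation accuracy
    of c(x, y | a) = a*x + (1 - a)*y thresholded at 0.5."""
    best_a, best_acc = 0.5, -1.0
    for a in grid:
        pred = (a * x_val + (1 - a) * y_val) > 0.5
        acc = np.mean(pred == labels)
        if acc > best_acc:
            best_a, best_acc = a, acc
    return best_a, best_acc
```

If one model is consistently better on the validation set, the learned a will shift towards it rather than staying at the naive 0.5.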
Now let's analyse your heuristic. Rephrasing your rules, you are just saying: if both models agree, trust them; otherwise take the one that is "more certain". That is just a convoluted way of expressing "2.". Quick proof:
1) x<0.5, y<0.5 => (x+y)/2 < 0.5, thus it is classified as negative (and so it is by your rule)
2) x>0.5, y>0.5 => (x+y)/2 > 0.5, thus it is classified as positive (and so it is by your rule)
3a) x<0.5, y>0.5, and x is "closer to 0 than y is to 1": writing x = 0.5-a and y = 0.5+b, this means a > b, thus (x+y)/2 = (0.5-a + 0.5+b)/2 < 0.5 and it is classified as negative (and so it is by your rule)
3b) x<0.5, y>0.5, and y is "closer to 1 than x is to 0": with the same notation this means a < b, thus (x+y)/2 > 0.5 and it is classified as positive (and so it is by your rule)
4a) and 4b) (the x>0.5, y<0.5 cases) are analogous.
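The equivalence is also easy to check empirically; a small sketch comparing the thresholded heuristic against the thresholded average on random inputs (ties at exactly 0.5 aside, which have probability zero for continuous outputs):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(size=10_000)
y = rng.uniform(size=10_000)

# The heuristic: agreement -> min/max, disagreement -> the more certain value
err_x = np.minimum(x, 1 - x)
err_y = np.minimum(y, 1 - y)
pick = np.where(err_x < err_y, x, y)
pick = np.where((x < 0.5) & (y < 0.5), np.minimum(x, y), pick)
pick = np.where((x > 0.5) & (y > 0.5), np.maximum(x, y), pick)

heuristic_class = pick > 0.5
average_class = (x + y) / 2 > 0.5

agree = np.all(heuristic_class == average_class)
print(agree)
```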
TL;DR: just average the probabilities and threshold the average.