The problem:
For nearly all test cases the output probability is close to 0.95; no output was below 0.9!
Even for nearly impossible outcomes, it gave a probability that high.
PS: I think this is because I only trained it on cases that happened, not on ones that did not. But I cannot, at each step in the episode, train it with output = 0.0 for every action that did not happen!
Any suggestions on how to overcome this problem? Or maybe another way to use a NN or to implement the probability function?
The problem is that the probabilities of all possible following states have to sum to 1. If you construct your network like that, this is not guaranteed. Two possible alternatives come to mind, where I assume discrete states.
These two are actually roughly equivalent from a mathematical perspective.
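As a minimal sketch of one way to enforce the sum-to-1 constraint for discrete states (not necessarily either of the alternatives meant above), the raw network outputs can be passed through a softmax, with one output per candidate next state. The names `softmax`, `cross_entropy` and `raw_scores` are made up for illustration, and the scores are invented numbers rather than real network outputs.

```python
import math

def softmax(scores):
    """Turn arbitrary real-valued scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, observed_index):
    """Loss for one observed transition: -log p(observed next state)."""
    return -math.log(probs[observed_index])

# Hypothetical raw outputs, one per candidate next state (assumed values).
raw_scores = [2.0, 0.5, -1.0, 0.1]
probs = softmax(raw_scores)
loss = cross_entropy(probs, observed_index=0)
```

Because all outputs share one normalization, training only on the transitions that actually happened still pushes the probabilities of the others down, so there is no need to present explicit 0.0 targets for them.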
In the case of continuous variables, you will have to assume a distribution (e.g. a multivariate Gaussian) and use the parameters of that distribution (e.g. the mean and the covariance or standard deviation) as outputs.
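For the continuous case, a rough sketch of the training loss might look like the following. It assumes a Gaussian with a diagonal covariance, and that the network outputs a mean and a log standard deviation per dimension; both are simplifying assumptions of this example, and `gaussian_nll` is a hypothetical helper name.

```python
import math

def gaussian_nll(x, mean, log_std):
    """Negative log-likelihood of x under a diagonal Gaussian.

    mean and log_std stand in for the network outputs; predicting the
    log of the standard deviation keeps the variance positive.
    """
    nll = 0.0
    for xi, mi, ls in zip(x, mean, log_std):
        var = math.exp(2.0 * ls)
        nll += 0.5 * math.log(2.0 * math.pi) + ls + 0.5 * (xi - mi) ** 2 / var
    return nll

# Assumed example values: the observed next state and two candidate predictions.
observed = [1.0, -0.5]
close_fit = gaussian_nll(observed, mean=[0.9, -0.4], log_std=[0.0, 0.0])
poor_fit = gaussian_nll(observed, mean=[3.0, 2.0], log_std=[0.0, 0.0])
```

Minimizing this loss over observed transitions fits a proper density, so the predicted distribution integrates to 1 by construction.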