# How do I handle a custom loss function with (1/(1-exp(-x))-1/x) in it?

I am working on a deep learning model with a ragged tensor where the custom loss function is related to:

f(x)+f(x+50)

and f(x)=1/(1-exp(-x))-1/x when x!=0, f(x)=0.5 when x=0.

f(x) ranges between 0 and 1, and it is continuous and differentiable for all x. Below is the graph of f(x):

I first tried to implement this function as `tf.where(tf.abs(x)<0.1, 0.5+x/12, 1/(1-exp(-x))-1/x)`

as the gradient at x=0 is 1/12. But the problem was that the loss became nan after some fitting, like below:

```
Epoch: 0 train_loss: 0.072233 val_loss: 0.052703
Epoch: 10 train_loss: 0.008087 val_loss: 0.041443
Epoch: 20 train_loss: 0.005942 val_loss: 0.029767
Epoch: 30 train_loss: 0.005200 val_loss: 0.026407
Epoch: 40 train_loss: nan val_loss: nan
Epoch: 50 train_loss: nan val_loss: nan
```

I tried several things to solve this problem, but all of them failed.

- I made the code calculate f(x) separately for x<-10 and x>10 as well, using the asymptotic forms:

```
tf.where(tf.abs(x) < 0.1, 0.5 + x/12,
         tf.where(x < -10., -1/x,
                  tf.where(x > 10., 1 - 1/x,
                           1/(1 - tf.exp(-x)) - 1/x)))
```

but it gave the same result.

Lowering the learning rate and changing the optimizer gave the same result and started giving nan at a similar training loss to the one above.

I set the default float to float64 with `tf.keras.backend.set_floatx('float64')`. This let the model train further, but it again started giving nan, this time at a lower training loss:

```
Epoch: 0 train_loss: 0.043096 val_loss: 0.050407
Epoch: 10 train_loss: 0.006179 val_loss: 0.034259
Epoch: 20 train_loss: 0.005841 val_loss: 0.034110
...
Epoch: 210 train_loss: 0.003594 val_loss: 0.026524
Epoch: 220 train_loss: nan val_loss: nan
Epoch: 230 train_loss: nan val_loss: nan
```

**Replacing f(x) with the sigmoid function solved the problem.** But I really want to use f(x) because it is really meaningful to what I am doing.

I guess some inf/inf, 0/0, or inf-inf occurred while calculating the gradient, but I am not an expert and could not get a more detailed clue. I would be really grateful if you know how to solve this, or if you know what I need to look at to solve the problem.

## Solution

This is a problem of catastrophic numerical cancellation more than anything else: you can't just evaluate the algebraic form in IEEE 754 arithmetic and expect it to work for very small or large numbers.
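A quick pure-Python (float64) illustration of the cancellation: for tiny x, `1 - exp(-x)` loses almost all of its significant digits, so the two large terms 1/(1-exp(-x)) and 1/x no longer cancel down to the true value near 0.5.

```python
import math

def f_naive(x):
    # direct algebraic form: 1/(1-exp(-x)) - 1/x
    return 1.0 / (1.0 - math.exp(-x)) - 1.0 / x

print(f_naive(1e-3))   # close to the true value 0.5 + x/12
print(f_naive(1e-13))  # wildly off: the cancellation error dominates
```

At x = 1e-13 the rounding error in `1 - exp(-x)` is a fraction of one ulp of 1.0, but that is already a relative error of order 1e-3 in the denominator, which blows up to an absolute error of order 1e9 after the division.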

Your definition of f(x) is:

```
f(x)=1/(1-exp(-x))-1/x when x!=0, f(x)=0.5 when x=0.
```

Many languages provide a function `expm1` which computes exp(x)-1 to full machine precision (it dates back at least to the hardware implementation in the x87 numeric coprocessor, and possibly earlier). That may be enough to solve your immediate problem, since it avoids the cancellation in 1-exp(-x), and the resulting division by zero, for small values of x (below about 1e-7 for floats, 2e-16 for doubles).

TensorFlow has such a function, `tf.math.expm1`, and its precision-preserving purpose is explained in its documentation.
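A quick pure-Python check of the difference (`math.expm1` serves the same purpose as `tf.math.expm1`; the exact digits printed may vary by platform):

```python
import math

x = 1e-13
naive  = math.exp(-x) - 1.0   # cancellation: only ~3 significant digits survive
stable = math.expm1(-x)       # accurate to full double precision

print(naive)    # noticeably off from -1e-13
print(stable)   # ~ -x + x**2/2, correct to full precision
```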

But you can probably do even better by multiplying through and evaluating the expression over a common denominator, which gives slightly better accuracy:

```
f(x) = (x - (1-exp(-x))) / (x*(1-exp(-x)))   when x != 0
```

evaluated as

```
f(x) = -(x + expm1(-x)) / (x*expm1(-x))
```

It will still lose accuracy and return zero for some very small values of x, but it should no longer generate nans when it reaches its precision limit.
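As a sketch, the rearranged form in pure Python (`math.expm1` standing in for `tf.math.expm1`):

```python
import math

def f(x):
    # f(x) = -(x + expm1(-x)) / (x * expm1(-x)),  x != 0
    e = math.expm1(-x)
    return -(x + e) / (x * e)

print(f(1e-13))  # ~0.5, where the naive form was wildly off
print(f(1.0))    # ~0.58198
print(f(30.0))   # ~1 - 1/30
```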

If you really need it to work continuously for any x, no matter how small, the fix-up is to replace the numerator with its first Taylor term that doesn't cancel out, `x^2/2`, when the input `x` is tiny:

```
f(x) = -x^2/(2*x*expm1(-x)) for x << 1e-16
```

evaluated as

```
f(x) = -x/(2*expm1(-x))   when x != 0
```
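In pure Python, the tiny-x fallback looks like this; it tends smoothly to 0.5 as x approaches 0 and never produces inf or nan, even for subnormal-scale inputs:

```python
import math

def f_tiny(x):
    # leading-order numerator x**2/2 cancels one factor of x:
    # f(x) ~ -x / (2 * expm1(-x))   (x != 0)
    return -x / (2.0 * math.expm1(-x))

print(f_tiny(1e-20))   # 0.5
print(f_tiny(1e-300))  # 0.5, no overflow or nan
```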

This sort of problem is common in numerical calculations whenever two nearly equal quantities are subtracted. There are various classical rearrangement tricks to circumvent catastrophic cancellation.

**EDIT.** Inspired by Martin's answer, I first implemented the function as below:

```
-(x + tf.math.expm1(-x))/(x * tf.math.expm1(-x))
```

Here, I used `tf.math.expm1` from the main TensorFlow package.

But when abs(x)<0.1, this function started to deviate from the true value in float32, so I used `tf.where` to handle that region:

```
tf.where(tf.abs(x)<0.1, 0.5+x/12, -(x + tf.math.expm1(-x))/(x * tf.math.expm1(-x)))
```

It really helped with the nan problem, but it could not handle the case where x was exactly 0. Luckily, there is a function for that, `tf.math.xdivy`, which makes the calculation return 0/0 = 0, so I added it:

```
tf.where(tf.abs(x) < 0.1, 0.5 + x/12,
         -tf.math.xdivy(x + tf.math.expm1(-x), x * tf.math.expm1(-x)))
```
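For reference, `tf.math.xdivy(a, b)` returns 0 whenever its first argument is exactly 0, which is what defuses the 0/0 here (the `tf.where` branch still supplies 0.5 + x/12 near zero; xdivy just keeps the unselected branch nan-free). A pure-Python sketch of that guard:

```python
import math

def xdivy(a, b):
    # mimics tf.math.xdivy: returns 0 when a == 0, even if b == 0
    return 0.0 if a == 0.0 else a / b

def branch(x):
    # the 'else' branch of the tf.where above
    return -xdivy(x + math.expm1(-x), x * math.expm1(-x))

print(branch(0.0))  # 0.0 instead of a nan from 0/0
print(branch(1.0))  # ~0.58198, unchanged away from zero
```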

But again, I found that this expression could not handle the case x < -40, because it became -inf/inf. So I looked for another form of the exponential term: `1/(1-exp(-x)) = 0.5*coth(0.5*x) + 0.5`. With it, I could come up with a final expression that handles all cases:

```
tf.where(tf.abs(x) < 0.1, 0.5 + x/12,
         0.5 + tf.math.xdivy(x/2 - tf.math.tanh(x/2), x * tf.math.tanh(x/2)))
```
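A pure-Python sketch of this final expression, checked against the direct form for moderate x; tanh saturates at ±1 instead of overflowing, which is what rescues the large-negative-x case:

```python
import math

def xdivy(a, b):
    # stand-in for tf.math.xdivy
    return 0.0 if a == 0.0 else a / b

def f_final(x):
    if abs(x) < 0.1:                       # the tf.where branch near zero
        return 0.5 + x / 12.0
    t = math.tanh(x / 2.0)
    return 0.5 + xdivy(x / 2.0 - t, x * t)

def f_direct(x):                           # valid only for moderate x
    return 1.0 / (1.0 - math.exp(-x)) - 1.0 / x

print(f_final(-50.0))   # ~0.02 (= -1/x), instead of -inf/inf
print(f_final(1.0))     # matches f_direct(1.0)
```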
