This is a sample document with the following points: Pharmaceutical Marketing Building – responsibilities.  Mass. – Aug. 13, 2020 –Â
How can I remove special characters or non-ASCII Unicode characters from the content while indexing? I'm using Elasticsearch (ES) 7.x and StormCrawler 1.17.
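This looks like the charset was detected incorrectly. "Â" followed by a no-break space is what the UTF-8 encoding of a no-break space (bytes 0xC2 0xA0) turns into when it is decoded as ISO-8859-1 or Windows-1252. You could normalise the content before indexing by writing a custom parse filter that removes the unwanted characters.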
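Below is a minimal sketch in Java of the cleanup step itself. It deliberately does not touch the StormCrawler parse filter API. The class TextCleaner and its methods misDecode and toAsciiText are hypothetical names for illustration. The idea is that your own parse filter would call something like toAsciiText on the extracted text. misDecode only reproduces the symptom, so you can check that the cleanup handles it.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class TextCleaner {

    // Reproduces the symptom: UTF-8 bytes decoded with the wrong charset.
    // A no-break space (U+00A0) is 0xC2 0xA0 in UTF-8, which ISO-8859-1 reads as "Â" + U+00A0.
    public static String misDecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Hypothetical cleanup a custom parse filter could apply before the text is indexed.
    public static String toAsciiText(String text) {
        if (text == null) {
            return null;
        }
        // NFKC folds compatibility characters, e.g. U+00A0 (no-break space) becomes a plain space.
        String normalised = Normalizer.normalize(text, Normalizer.Form.NFKC);
        // Drop every character outside the ASCII range (including stray "Â" and en dashes).
        String asciiOnly = normalised.replaceAll("[^\\x00-\\x7F]", "");
        // Collapse whitespace runs (line breaks included) left by the removals, then trim.
        return asciiOnly.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String garbled = misDecode("Aug. 13, 2020\u00A0");
        System.out.println(garbled);              // shows the stray "Â"
        System.out.println(toAsciiText(garbled)); // Aug. 13, 2020
    }
}
```

Stripping everything outside ASCII is blunt. It also removes legitimate characters such as the en dashes in the sample above. If you can fix the charset detection at the source, the original characters are preserved instead of being discarded.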