regex, kotlin, regex-negation

Regex (Kotlin) to match only end-of-sentence periods and ignore mid-sentence periods, such as those in abbreviations


I need a regex that finds all sentence-ending periods and ignores periods in the middle of a sentence, such as those in abbreviations. Note: I understand that there are many other variations and that it may not be possible to account for all of them, so the focus of the question is: can at least the sample below be solved with a regex?

Suppose I have this text. The regex rule below matches any period followed by whitespace, but it also matches the periods in p.m. and U.S. How can I ignore a) periods in a word that consists of characters all separated by periods (such as U.S.) and b) a period preceded by one character only (such as J.)? This is in Kotlin.

        val text = "At 12.51 p.m. local time, J. Knapp, former U.S. Navy,  went out for a walk. Yes he did. And then a Mw6.3 earthquake happened."
        val regexRule = "\\.\\s+"
        val splitText = text.split(regexRule.toRegex())
        val result = splitText.joinToString(separator = ".\n\n")

Current result with just that rule:

At 12.51 p.m.

local time, J.

Knapp, former U.S.

Navy, went out for a walk.

Yes he did.

And then a Mw6.3 earthquake happened.


Solution

  • You can use

    val regexRule = "(?<!\\b\\p{L})\\.(?<!\\d.(?=\\d))(?!\\s*\$)\\s*"
    

    See the regex demo.
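
As a minimal sketch of how the pattern above could be plugged into the split from the question: the helper name `splitSentences` and the variable `sentenceEnd` are hypothetical and not part of the original answer; only the pattern string itself is taken from it.

```kotlin
// The pattern from the answer, unchanged (regular string, so backslashes are doubled
// and the dollar sign is escaped to avoid string-template interpolation).
val sentenceEnd = Regex("(?<!\\b\\p{L})\\.(?<!\\d.(?=\\d))(?!\\s*\$)\\s*")

// Hypothetical helper: splits text at sentence-ending periods.
// The matched period and any whitespace after it are consumed by the split,
// so the caller has to add the period back when joining the pieces.
fun splitSentences(text: String): List<String> = text.split(sentenceEnd)
```

With the sample text from the question, `splitSentences(text).joinToString(separator = ".\n\n")` should keep "p.m.", "J.", "U.S.", "12.51" and "Mw6.3" intact and break only after "walk" and "did". The final period is never matched, because the pattern refuses to match at the end of the string, so it stays attached to the last piece.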

    Details: