
Dart: Is there a way to split strings into sentences without using Dart's split method?


I'm looking to split a paragraph of text into individual sentences using Dart. The problem I am having is that sentences can end in a number of punctuation marks (e.g. '.', '!', '?') and in some cases (such as the Japanese language), sentences can end in unique symbols (e.g. '。').

Additionally, Dart's split method removes the split value from the string. For example, "Hello World!" becomes "Hello World" when using the code text.split('! ').
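A minimal illustration of that behaviour (the sample string here is made up for the example):

```dart
void main() {
  // Hypothetical sample text with two sentences, each followed by a space.
  const text = 'Hello World! Goodbye World! ';

  // split() drops the '! ' separator, so the exclamation marks are lost,
  // and the trailing separator leaves an empty string at the end.
  final parts = text.split('! ');
  print(parts); // [Hello World, Goodbye World, ]
}
```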

I've looked around at Dart packages available but I'm unable to find anything that does what I'm looking for.

Ideally, I'm looking for something similar to Java's BreakIterator, which lets the programmer choose the locale used to detect sentence boundaries and keeps the punctuation mark when splitting the string into sentences. I'm happy to use a Dart solution that doesn't detect sentence endings automatically based on locale, but in that case I would like to be able to define all the sentence endings to look for when splitting a string.

Any help is appreciated. Thank you in advance.


Solution

  • It can be done with a regular expression, something like this:

      void main() {
        String str1 = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. In vulputate odio eros, sit amet ultrices ipsum auctor sed. Mauris in faucibus elit. Nulla quam orci? ultrices a leo a, feugiat pharetra ex. Nunc et ipsum lorem. Integer quis congue nisi! In et sem eget leo ullamcorper consectetur dignissim vitae massa。Nam quis erat ac tellus laoreet posuere. Vivamus eget sapien eget neque euismod mollis.";

        // Regular expression: a run of word characters, whitespace, commas
        // and apostrophes, then any sentence-ending marks and trailing spaces.
        final re = RegExp(r"(\w|\s|,|')+[。.?!]*\s*");

        // Iterate over all matches; group(0) is the full matched text.
        for (final m in re.allMatches(str1)) {
          final match = m.group(0)!;
          print("match: $match");
        }
      }
    

    output:

    // match: Lorem ipsum dolor sit amet, consectetur adipiscing elit. 
    // match: In vulputate odio eros, sit amet ultrices ipsum auctor sed. 
    // match: Mauris in faucibus elit. 
    // match: Nulla quam orci? 
    // match: ultrices a leo a, feugiat pharetra ex. 
    // match: Nunc et ipsum lorem. 
    // match: Integer quis congue nisi! 
    // match: In et sem eget leo ullamcorper consectetur dignissim vitae massa。
    // match: Nam quis erat ac tellus laoreet posuere. 
    // match: Vivamus eget sapien eget neque euismod mollis.
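One caveat with the pattern above: \w only matches ASCII letters, digits and underscore, so text written in Japanese (or containing hyphens, quotes and so on) will not be matched as expected. If you would rather define the sentence endings yourself, a plain character scan may be enough. The following is only a sketch: splitSentences and its terminators parameter are names invented for this example, not part of any Dart library, and it does no locale-aware detection, so abbreviations such as "Dr." or numbers such as "3.14" will be split incorrectly.

```dart
/// Hypothetical helper: splits [text] into sentences, keeping the ending
/// punctuation. [terminators] is the caller-defined set of sentence endings.
List<String> splitSentences(
  String text, {
  Set<String> terminators = const {'.', '!', '?', '。', '！', '？'},
}) {
  final sentences = <String>[];
  final buffer = StringBuffer();

  // Work on code points so that non-BMP characters are not cut in half.
  final chars = text.runes.map((r) => String.fromCharCode(r)).toList();

  for (var i = 0; i < chars.length; i++) {
    buffer.write(chars[i]);

    final atTerminator = terminators.contains(chars[i]);
    final nextIsTerminator =
        i + 1 < chars.length && terminators.contains(chars[i + 1]);

    // Close the sentence at the end of a run of terminators ("?!", "...").
    if (atTerminator && !nextIsTerminator) {
      final sentence = buffer.toString().trim();
      if (sentence.isNotEmpty) sentences.add(sentence);
      buffer.clear();
    }
  }

  // Keep any trailing text that has no ending punctuation.
  final rest = buffer.toString().trim();
  if (rest.isNotEmpty) sentences.add(rest);

  return sentences;
}

void main() {
  const text = 'Hello World! How are you? 元気です。Fine';
  for (final sentence in splitSentences(text)) {
    print(sentence);
  }
}
```

Passing a different set, for example terminators: {'.', '。'}, changes which characters end a sentence without touching the rest of the code.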