
Snowball Stemming: defining Null Region


I'm trying to understand the Snowball stemming algorithm. HW90 asked a similar question with examples, but it doesn't cover mine. The algorithm uses two regions, R1 and R2, which are defined as follows:

R1 is the region after the first non-vowel following a vowel, or is the null region at the end of the word if there is no such non-vowel.

R2 is the region after the first non-vowel following a vowel in R1, or is the null region at the end of the word if there is no such non-vowel.

http://snowball.tartarus.org/texts/r1r2.html

I don't understand what "the null region at the end of the word" is. Could anybody give me some examples, please?


Solution

  • The null region is simply an empty region: it contains no letters. The documentation page you linked has examples:

    Below, R1 and R2 are shown for a number of English words,

    b   e   a   u   t   i   f   u   l
                      |<------------->|    R1
                              |<----->|    R2
    

    Letter t is the first non-vowel following a vowel in beautiful, so R1 is iful. In iful, the letter f is the first non-vowel following a vowel, so R2 is ul.

    b   e   a   u   t   y
                      |<->|    R1
                        ->|<-  R2 
    

    In beauty, the last letter y is classed as a vowel. Again, letter t is the first non-vowel following a vowel, so R1 is just the last letter, y. R1 contains no non-vowel, so R2 is the null region at the end of the word.

    b   e   a   u
                ->|<-  R1
                ->|<-  R2
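
To check the definitions against the three words above, here is a minimal Python sketch. The names `VOWELS`, `region_after` and `r1_r2` are my own, not part of Snowball, and it assumes the vowels are a, e, i, o, u, y, which matches these examples; individual Snowball stemmers define their own vowel sets and may add special cases that this ignores. A null region shows up as an empty string.

```python
# Assumption: a, e, i, o, u, y count as vowels, as in the examples above.
VOWELS = set("aeiouy")


def region_after(word: str, start: int = 0) -> int:
    """Return the index where the region begins.

    The region starts just after the first non-vowel that follows a vowel,
    looking only at letters from `start` onwards. If there is no such
    non-vowel, return len(word): the null region at the end of the word.
    """
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def r1_r2(word: str) -> tuple[str, str]:
    """Return (R1, R2) as substrings; an empty string is a null region."""
    r1_start = region_after(word, 0)
    # R2 applies the same rule again, but only inside R1.
    r2_start = region_after(word, r1_start)
    return word[r1_start:], word[r2_start:]


for w in ("beautiful", "beauty", "beau"):
    print(w, r1_r2(w))
```

This prints `('iful', 'ul')` for beautiful, `('y', '')` for beauty and `('', '')` for beau. In the last case the word has no non-vowel after a vowel at all, so both regions start at the end of the word and are empty.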