For example, say I have a document that contains 2 sentences: I am a person. He also likes apples. Do we need to count the cooccurrence of "person" and "He"?
Each document is separated with a line break. Context windows of cooccurrences are limited to each document.
Based on the implementation here.
A newline is taken as indicating a new document (contexts won't cross newline).
So, depending on how you prepare sentences, you may get different results:
Setting 1: ('He', 'person') cooccurred
...
I am a person. He also likes apples.
...
Setting 2: ('He', 'person') did not cooccur
...
I am a person.
He also likes apples.
...
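
To make the difference between the two settings concrete, here is a minimal Python sketch of window-based counting where the window never crosses a newline. It is not the GloVe code itself: the function name `cooccurrence_counts`, the default window of 5, the punctuation stripping, and the unweighted counts are all assumptions made for illustration (GloVe, for instance, weights pairs by their distance).

```python
from collections import Counter


def cooccurrence_counts(corpus: str, window: int = 5) -> Counter:
    """Count unordered word pairs that fall within `window` tokens of each
    other. Each line of `corpus` is treated as a separate document."""
    counts = Counter()
    for line in corpus.splitlines():  # a newline starts a new document
        # Simplified tokenization (assumption): split on whitespace and
        # drop surrounding punctuation so "person." becomes "person".
        tokens = [t.strip(".,!?") for t in line.split()]
        tokens = [t for t in tokens if t]
        for i, word in enumerate(tokens):
            # Look back at most `window` tokens, never past the line start.
            for j in range(max(0, i - window), i):
                pair = tuple(sorted((tokens[j], word)))
                counts[pair] += 1
    return counts


setting_1 = "I am a person. He also likes apples."    # one line
setting_2 = "I am a person.\nHe also likes apples."   # two lines

print(cooccurrence_counts(setting_1)[("He", "person")])  # 1
print(cooccurrence_counts(setting_2)[("He", "person")])  # 0
```

With both sentences on one line, "person" and "He" are adjacent tokens and get counted; with a newline between them, they belong to different documents and the pair is never seen.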