I'm learning about sentiment analysis and can't find anything online that explains how to create a dataset in PTB format. I'm using Stanford CoreNLP with Java. I've downloaded the train, dev and test data that they used, but I can't get my head around how the files are laid out:
test.txt:
(3 (2 (2 The) (2 Rock)) (4 (3 (2 is) (4 (2 destined) (2 (2 (2 (2 (2 to) (2 (2 be) (2 (2 the) (2 (2 21st) (2 (2 (2 Century) (2 's)) (2 (3 new) (2 (2 ``) (2 Conan)))))))) (2 '')) (2 and)) (3 (2 that) (3 (2 he) (3 (2 's) (3 (2 going) (3 (2 to) (4 (3 (2 make) (3 (3 (2 a) (3 splash)) (2 (2 even) (3 greater)))) (2 (2 than) (2 (2 (2 (2 (1 (2 Arnold) (2 Schwarzenegger)) (2 ,)) (2 (2 Jean-Claud) (2 (2 Van) (2 Damme)))) (2 or)) (2 (2 Steven) (2 Segal))))))))))))) (2 .)))
I figure the numbers correspond to sentiment values, but I'm still not sure how it works.
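To make the layout easier to see, here is a minimal sketch that reads one such line and prints every phrase with its label. It assumes each node has the form "(label token)" or "(label child child ...)", which is what the sample above looks like. SentimentTreeReader and Node are hypothetical names made up for illustration, not CoreNLP classes, and malformed input is not handled gracefully.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of CoreNLP.
class SentimentTreeReader {

    /** One node of the tree: its sentiment label and the words it covers. */
    static final class Node {
        final int label;
        final String text;
        final List<Node> children;

        Node(int label, String text, List<Node> children) {
            this.label = label;
            this.text = text;
            this.children = children;
        }
    }

    private final String s;
    private int pos = 0;

    private SentimentTreeReader(String s) {
        this.s = s;
    }

    /** Parses one line such as "(2 (2 The) (2 Rock))". */
    static Node parse(String line) {
        return new SentimentTreeReader(line.trim()).node();
    }

    // Assumed grammar: node := "(" label " " ( token | node node* ) ")"
    private Node node() {
        expect('(');
        int start = pos;
        while (Character.isDigit(s.charAt(pos))) pos++;
        int label = Integer.parseInt(s.substring(start, pos));
        skipSpaces();

        List<Node> children = new ArrayList<>();
        String text;
        if (s.charAt(pos) == '(') {
            // Internal node: its text is the concatenation of its children's text.
            List<String> words = new ArrayList<>();
            while (s.charAt(pos) == '(') {
                Node child = node();
                children.add(child);
                words.add(child.text);
                skipSpaces();
            }
            text = String.join(" ", words);
        } else {
            // Leaf node: a single token running up to the closing parenthesis.
            start = pos;
            while (s.charAt(pos) != ')') pos++;
            text = s.substring(start, pos);
        }
        expect(')');
        return new Node(label, text, children);
    }

    private void expect(char c) {
        if (pos >= s.length() || s.charAt(pos) != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at position " + pos);
        }
        pos++;
    }

    private void skipSpaces() {
        while (pos < s.length() && s.charAt(pos) == ' ') pos++;
    }

    /** Prints every phrase with its label, indented by depth. */
    static void print(Node n, String indent) {
        System.out.println(indent + n.label + "  " + n.text);
        for (Node child : n.children) print(child, indent + "  ");
    }

    public static void main(String[] args) {
        // A fragment copied from the test.txt line above.
        print(parse("(3 (3 (2 a) (3 splash)) (2 (2 even) (3 greater)))"), "");
    }
}
```

Read this way, every pair of parentheses is a phrase, the number is that phrase's sentiment, and the outermost number is the sentiment of the whole sentence.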
TL;DR: I'm trying to develop my own model for news analysis. The Stanford CoreNLP model was trained on movie reviews, which leads to poor sentiment analysis on news, so I want to train my own. I can't find anything online that explains what each element of the dataset means or how to build one.
The best I've found is this page: https://nlp.stanford.edu/sentiment/code.html
It has the dataset and the code to train.
It says models can be retrained on a PTB-format dataset with the following command:
java -mx8g edu.stanford.nlp.sentiment.SentimentTraining -numHid 25 -trainPath train.txt -devPath dev.txt -train -model model.ser.gz
I already have the data that I want to parse.
Okay, so I've done some more digging and have finally started to understand (somewhat) how to create a dataset tree. I'll try to break it down for anyone who stumbles upon this post with the same trouble I've been having.
Step 1.
Collect the sentences you want to label, one per line:
UK renters: are you living with someone you’ve fallen out with?
UK property asking prices stagnating, lifting hopes of softer landing for housing market
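Step 2.
Give each sentence an overall label, then list any phrases inside it that you want to label individually, one per line. Each sentence and its phrases form one block, and blocks are separated by a blank line: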
2 UK renters: are you living with someone you’ve fallen out with?
1 fallen out with
1 fallen out
2 UK renters
2 living with someone
3 fallen
2 :
2 ?
2 living with
2 someone

3 UK property asking prices stagnating, lifting hopes of softer landing for housing market
2 UK property
3 asking prices stagnating
2 asking prices
4 lifting hopes
2 hopes
4 lifting hopes of softer landing
3 softer landing for housing market
2 housing market
2 lifting
2 landing
2 ,
Annotation Meanings
Very Positive = 4
Positive = 3
Neutral = 2
Negative = 1
Very Negative = 0
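If you want those meanings available in code, for example when printing results, a small lookup is enough. This is only a sketch: SentimentLabels and nameOf are hypothetical names, and the five names are taken straight from the list above.

```java
// Hypothetical helper for illustration; the names follow the list above.
class SentimentLabels {

    // Index in the array is the numeric label used in the dataset.
    private static final String[] NAMES = {
        "Very Negative", "Negative", "Neutral", "Positive", "Very Positive"
    };

    /** Returns the readable name for a label in the range 0 to 4. */
    static String nameOf(int label) {
        if (label < 0 || label >= NAMES.length) {
            throw new IllegalArgumentException("Label out of range: " + label);
        }
        return NAMES[label];
    }

    public static void main(String[] args) {
        for (int label = 0; label < 5; label++) {
            System.out.println(label + " = " + nameOf(label));
        }
    }
}
```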
Structure
2 UK renters: are you living with someone you’ve fallen out with?
// Overall sentiment (Neutral)
1 fallen out with
// Negative
1 fallen out
// Negative
2 UK renters
// Neutral
...etc..
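If your labels live in code rather than in a hand-written file, the structure above can be generated. The sketch below assumes the layout described in the BuildBinarizedDataset documentation as I understand it: the first line of a block is the sentence label plus the full sentence, the following lines are labeled phrases, and blocks are separated by a blank line. SentimentInputWriter, formatBlock and formatFile are hypothetical names, not CoreNLP APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of CoreNLP.
class SentimentInputWriter {

    /** Builds one block: the labeled sentence, then one line per labeled phrase. */
    static String formatBlock(int sentenceLabel, String sentence, Map<String, Integer> phrases) {
        StringBuilder sb = new StringBuilder();
        sb.append(checked(sentenceLabel)).append(' ').append(sentence).append('\n');
        for (Map.Entry<String, Integer> e : phrases.entrySet()) {
            sb.append(checked(e.getValue())).append(' ').append(e.getKey()).append('\n');
        }
        return sb.toString();
    }

    /** Joins blocks so that exactly one blank line separates them. */
    static String formatFile(List<String> blocks) {
        // Each block already ends with a newline, so one more newline gives a blank line.
        return String.join("\n", blocks);
    }

    private static int checked(int label) {
        if (label < 0 || label > 4) {
            throw new IllegalArgumentException("Label must be between 0 and 4: " + label);
        }
        return label;
    }

    public static void main(String[] args) {
        // Labels copied from the second headline in Step 2.
        Map<String, Integer> phrases = new LinkedHashMap<>();
        phrases.put("UK property", 2);
        phrases.put("asking prices stagnating", 3);
        phrases.put("lifting hopes", 4);

        List<String> blocks = new ArrayList<>();
        blocks.add(formatBlock(3,
            "UK property asking prices stagnating, lifting hopes of softer landing for housing market",
            phrases));
        System.out.print(formatFile(blocks));
    }
}
```

A LinkedHashMap is used so the phrases come out in the order they were added, which makes the file easier to compare against your notes.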
Step 3:
Locate your stanford-corenlp-4.5.2.jar. If you use Maven, it is in your local repository:
~/.m2/repository/edu/stanford/nlp/stanford-corenlp/4.5.2
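Step 4:
From that directory, run BuildBinarizedDataset on your labeled file. The trees are printed to the console, so redirect the output to a file if you want to train on it: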
java -cp "*" -mx5g edu.stanford.nlp.sentiment.BuildBinarizedDataset -input /c/Users/rusku/Desktop/StanfordNPL/rusSample/sample.txt
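Step 5:
The output is one binarized tree per sentence, in the same format as test.txt above. Any token or phrase you did not label yourself takes the sentence's overall label, which is why so many nodes in the second tree are 3: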
(2 (2 (2 (2 UK) (2 renters)) (2 :)) (2 (2 (2 (2 are) (2 you)) (2 (2 living) (2 (2 with) (2 (2 someone) (2 (2 you) (2 (2 ’ve) (1 (1 (3 fallen) (2 out)) (2 with)))))))) (2 ?)))
(3 (3 (2 (3 UK) (3 property)) (2 (3 asking) (3 prices))) (3 (3 (3 stagnating) (3 (2 ,) (4 (2 lifting) (2 hopes)))) (3 (3 of) (3 (3 (3 softer) (2 landing)) (3 (3 for) (2 (3 housing) (3 market)))))))
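Before training on trees like these, it can help to check how the labels are distributed, since a file dominated by one label gives the model little to learn from. The sketch below simply counts every "(label " occurrence in a list of tree lines. LabelHistogram is a hypothetical name, and the regular expression assumes labels are always followed by a single space, as in the output above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of CoreNLP.
class LabelHistogram {

    // Every node starts with "(", then its numeric label, then a space.
    private static final Pattern NODE = Pattern.compile("\\((\\d+) ");

    /** Returns counts[label] = number of nodes carrying that label (0 to 4). */
    static int[] count(List<String> treeLines) {
        int[] counts = new int[5];
        for (String line : treeLines) {
            Matcher m = NODE.matcher(line);
            while (m.find()) {
                int label = Integer.parseInt(m.group(1));
                if (label < 0 || label > 4) {
                    throw new IllegalArgumentException("Unexpected label " + label + " in: " + line);
                }
                counts[label]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // A fragment copied from the first tree above.
        int[] counts = count(Arrays.asList("(1 (1 (3 fallen) (2 out)) (2 with))"));
        for (int label = 0; label < counts.length; label++) {
            System.out.println(label + ": " + counts[label]);
        }
    }
}
```

To run it on a real file, read the lines first (for example with java.nio.file.Files.readAllLines) and pass them to count.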
Resource: Train Stanford CoreNLP about the sentiment of domain-specific phrases
This is as far as I've currently gotten.
Hope this helps.