Anyone have an idea about what could be going on here?
The first block shows what I would generally expect to see - the first character of a string is in index '0', with the 'problem' string commented out, replaced by the exact same thing, however never run before.
public void finderTest(){
String theDoc = "Hello, I want this to work, and work well! Do you think it will work, and if not, why not?";
//String wordOne = "abc"; // old, pre-used string, used to hold a comma.
String wordOne = "abc";// new, never run before with a comma
String wordTwo = "and";
System.out.println("Type of character at index '0' in theDoc: "+Character.getType(theDoc.charAt(0)));
System.out.println("Character at index '0' in theDoc: "+theDoc.charAt(0));
System.out.println();
System.out.println("All of wordOne: "+"'"+wordOne+"'");
System.out.println("Type of character at index '0' in wordOne: "+Character.getType(wordOne.charAt(0)));
System.out.println("Character at index '0' in wordOne: "+wordOne.charAt(0));
System.out.println();
System.out.println("Type of Character at index '0' in wordTwo: "+Character.getType(wordTwo.charAt(0)));
System.out.println("Character at index '0' in wordTwo: "+wordTwo.charAt(0));
}
Which gives output:
/*
Type of character at index '0' in theDoc: 1
Character at index '0' in theDoc: H
All of wordOne: 'abc'
Type of character at index '0' in wordOne: 2 // okay
Character at index '0' in wordOne: a // okay
Type of Character at index '0' in wordTwo: 2
Character at index '0' in wordTwo: a
*/
The second block has the 'new' string commented out, and the first character of 'wordOne' is nothing. It isn't a null character, or newline. I had been using that variable to find commas in 'theDoc'… but when I ran it, index '0' held nothing, and index 1 had the comma in it. If i copy and paste the string, the problem remains. However, commenting it out / deleting it, gets rid of the issue.
public void finderTest(){
String theDoc = "Hello, I want this to work, and work well! Do you think it will work, and if not, why not?";
String wordOne = "abc"; // now running old string, used to hold comma
//String wordOne = "abc";
String wordTwo = "and";
System.out.println("Type of character at index '0' in theDoc: "+Character.getType(theDoc.charAt(0)));
System.out.println("Character at index '0' in theDoc: "+theDoc.charAt(0));
System.out.println();
System.out.println("All of wordOne: "+"'"+wordOne+"'");
System.out.println("Type of character at index '0' in wordOne: "+Character.getType(wordOne.charAt(0)));
System.out.println("Character at index '0' in wordOne: "+wordOne.charAt(0));
System.out.println();
System.out.println("Type of Character at index '0' in wordTwo: "+Character.getType(wordTwo.charAt(0)));
System.out.println("Character at index '0' in wordTwo: "+wordTwo.charAt(0));
}
Which gives output:
/*
Type of character at index '0' in theDoc: 1
Character at index '0' in theDoc: H
All of wordOne: 'abc'
Type of character at index '0' in wordOne: 16 // What does this mean?
Character at index '0' in wordOne: // where is the a? (well, its in wordOne index '1'... but why??)
Type of Character at index '0' in wordTwo: 2
Character at index '0' in wordTwo: a
*/
Is there something about commas or symbols in java that would cause an issue like this? I tried using character arrays, cleaning the workspace to re-build everything, and nothing has changed this… Which is a huge problem for finding indices of 'ngrams' within sentences, when some grams are things like ", and". At one point last night, it was working, and then all of a sudden started not working. I'm quite confused.
Any ideas?
Character type 16 corresponds to Unicode DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING (U+202B). It's an unprintable character; you can print it's hex value to confirm.