I have two CSV files: "userfeatures" and "itemfeatures". Each line in "userfeatures" corresponds to a specific user. For example, the first line of the "userfeatures" file is:
"005c2e08","Action","nm0000148","dir_ nm0764316","USA"
I need to find the intersection of this line with every line of the second file, "itemfeatures". (Actually, I need to repeat this procedure for all users, i.e., for all lines of "userfeatures".)
So, the first comparison will be with the first line of "itemfeatures", which is:
"tt0306047","Comedy,Action","nm0267506,nm0000221,nm0356021","dir_ nm0001878","USA"
The result of the intersection should be ["Action", "USA"],
but unfortunately, my code only finds ["USA"] as a match. Here is what I've tried so far:
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class Main {
    public static void main(String[] args) throws Exception {
        BufferedReader userfeatures = new BufferedReader(new FileReader("userFeatureVectorsTest.csv"));
        BufferedReader itemfeatures = new BufferedReader(new FileReader("ItemFeatureVectorsTest.csv"));
        ArrayList<String> userlines = new ArrayList<>();
        ArrayList<String> itemlines = new ArrayList<>();
        String Uline = null;
        while ((Uline = userfeatures.readLine()) != null) {
            for (String Iline = itemfeatures.readLine(); Iline != null; Iline = itemfeatures.readLine()) {
                System.out.println(Uline);
                System.out.println(Iline);
                System.out.println(intersect(Uline, Iline));
                System.out.println(union(Uline, Iline));
            }
        }
        userfeatures.close();
        itemfeatures.close();
    }

    static Set<String> intersect(String Uline, String Iline) {
        Set<String> result = new HashSet<String>(Arrays.asList(Uline.split(",")));
        Set<String> IlineSet = new HashSet<String>(Arrays.asList(Iline.split(",")));
        result.retainAll(IlineSet);
        return result;
    }

    static Set<String> union(String Uline, String Iline) {
        Set<String> result = new HashSet<String>(Arrays.asList(Uline.split(",")));
        Set<String> IlineSet = new HashSet<String>(Arrays.asList(Iline.split(",")));
        result.addAll(IlineSet);
        return result;
    }
}
I think the problem is related to Uline.split(",") and Iline.split(","), because they consider "Comedy,Action" as one word and so cannot find [Action] as the intersection of "Comedy,Action" and "Action".
I would appreciate any ideas on how to fix this issue.
Try removing the double quotes from both strings before splitting, because when you split
"tt0306047","Comedy,Action","nm0267506,nm0000221,nm0356021","dir_ nm0001878","USA"
on commas, you get an Action" token (with only a trailing quote), which will never match the "Action" token.
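The suggestion above could be sketched roughly as follows. This is only a minimal illustration, not a full CSV parser: it assumes no field contains a quote or comma that is meant to be part of a value, so multi-valued fields such as "Comedy,Action" are simply flattened into separate tokens. The class name QuoteStrippingDemo and the helper toFeatureSet are hypothetical names introduced for this example; the two sample lines are the ones from the question.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class QuoteStrippingDemo {
    // Hypothetical helper: removes every double quote, then splits on commas.
    // Assumes quotes and commas never occur inside an actual feature value.
    static Set<String> toFeatureSet(String line) {
        Set<String> features = new LinkedHashSet<>();
        for (String token : line.replace("\"", "").split(",")) {
            features.add(token.trim());
        }
        return features;
    }

    // Same idea as intersect() in the question, but on quote-free tokens.
    static Set<String> intersect(String uline, String iline) {
        Set<String> result = toFeatureSet(uline);
        result.retainAll(toFeatureSet(iline));
        return result;
    }

    public static void main(String[] args) {
        String user = "\"005c2e08\",\"Action\",\"nm0000148\",\"dir_ nm0764316\",\"USA\"";
        String item = "\"tt0306047\",\"Comedy,Action\",\"nm0267506,nm0000221,nm0356021\","
                + "\"dir_ nm0001878\",\"USA\"";
        System.out.println(intersect(user, item)); // prints [Action, USA]
    }
}
```

A LinkedHashSet is used here only so that the printed order follows the user line; a plain HashSet gives the same elements. The union method from the question can reuse the same helper with addAll instead of retainAll.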