
Group, Combine and Dedup elements in a Java collection by maximum


I have the following classes:

public class Student {  
    String studentId;
    List<Subject> subjects;
    //+Getters, setters, constructors
}

public class Subject {  
    int subjectId;
    String grade;   
    int marks;
    //+Getters, setters, constructors
}

Example input:

Student S1 subjects=[1, A, 90], [2, B, 80], [2, C, 70]
Student S1 subjects=[2, A+, 95], [3,C,70]
Student S2 subjects=[1, D, 50]
Student S2 subjects=[1, D, 50]

Example output:

Student S1 subjects=[1, A, 90], [2, A+, 95], [3,C,70]  //Subject 2 is selected based on highest marks
Student S2 subjects=[1, D, 50] //Avoided duplicate for same mark

I want to implement a consolidation and de-duplication function. Each student ID should appear only once in the result.

For each student, every subject ID that appeared under that student ID should also appear only once. When a subject ID occurs more than once, the entry with the highest marks should be kept.

public List<Student> consolidate (List<Student> students)
{
    List<Student> consolidatedStudents = new ArrayList<Student>();
    //???
    return consolidatedStudents;
}

How can I achieve this in the most efficient way?

Example main class for testing:

public class Main {
    
    public static void main(String args[])
    {
        List<Subject> s1Subjects_1 = new ArrayList<>();
        List<Subject> s1Subjects_2 = new ArrayList<>();
        Subject s1Physics = new Subject(1,"A",90);
        Subject s1Chemistry_1 = new Subject(2,"B",80);      
        Subject s1Chemistry_2 = new Subject(2,"C",70);
        Subject s1Chemistry_3 = new Subject(2,"A+",95);
        Subject s1Biology = new Subject(3,"C",70);
        
        s1Subjects_1.add(s1Physics);
        s1Subjects_1.add(s1Chemistry_1);
        s1Subjects_1.add(s1Chemistry_2);
        s1Subjects_2.add(s1Chemistry_3);
        s1Subjects_2.add(s1Biology);
        
        List<Subject> s2Subjects_1 = new ArrayList<>();
        List<Subject> s2Subjects_2 = new ArrayList<>();
        Subject s2Physics_1 = new Subject(1,"D",50);
        Subject s2Physics_2 = new Subject(1,"D",50);
        s2Subjects_1.add(s2Physics_1);
        s2Subjects_2.add(s2Physics_2);
        
        Student s1_1 = new Student("s1", s1Subjects_1);
        Student s1_2 = new Student("s1", s1Subjects_2);
        Student s2_1 = new Student("s2", s2Subjects_1);
        Student s2_2 = new Student("s2", s2Subjects_2);
        
        List<Student> input = new ArrayList<>();
        input.add(s1_1);
        input.add(s1_2);
        input.add(s2_1);
        input.add(s2_2);        
        
        List<Student> output = consolidate(input);
    }
    
    public static List<Student> consolidate (List<Student> students)
    {
        List<Student> consolidatedStudents = new ArrayList<Student>();
        //
        //???
        //
        return consolidatedStudents;
    }

}

Solution

  • Here's one way to do it using streams.

    Here I'm using Collectors.toMap to build a temporary map from student ID to that student's list of subjects. When the same student ID appears more than once, the two subject lists are combined by a helper merge function.

    The merge function keeps, for each subject ID, the subject with the highest marks.

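    public static List<Student> consolidate(List<Student> students) {
        Map<String, List<Subject>> map = students.stream()
                .collect(Collectors.toMap(Student::getStudentId,
                        // De-duplicate each list up front: the merge function below only runs
                        // when a student ID repeats, so a student seen once would otherwise
                        // keep duplicate subject IDs.
                        student -> new ArrayList<>(subjectsMap(student.getSubjects()).values()),
                        (subjects1, subjects2) -> merge(subjects1, subjects2)));
        // Turn each (student ID, merged subjects) entry back into a Student.
        return map.entrySet()
                .stream()
                .map(e -> new Student(e.getKey(), e.getValue()))
                .collect(Collectors.toList());
    }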
    
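    As a side note on Collectors.toMap: its third argument is only called when two stream elements produce the same key, so a key that appears once goes through the value mapper alone. The sketch below illustrates this with hypothetical "key=value" strings rather than the Student and Subject classes; the class and method names (ToMapMergeDemo, maxByKey) are made up for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ToMapMergeDemo {

    // Parses "key=value" strings and keeps the largest value per key.
    static Map<String, Integer> maxByKey(List<String> entries) {
        return entries.stream()
                .map(e -> e.split("="))
                .collect(Collectors.toMap(
                        parts -> parts[0],                    // key mapper
                        parts -> Integer.parseInt(parts[1]),  // value mapper, applied to every element
                        Integer::max));                       // merge function, called only on key collisions
    }

    public static void main(String[] args) {
        Map<String, Integer> result = maxByKey(List.of("s1=80", "s1=95", "s2=50"));
        System.out.println(result.get("s1")); // 95 (two entries collided, the larger one won)
        System.out.println(result.get("s2")); // 50 (single entry, merge function never called)
    }
}
```

    Without a merge function, the two-argument toMap throws an IllegalStateException on the first duplicate key, which is why the three-argument form is needed here.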

    The merge function builds a temporary map (subject ID -> subject) for each of the two subject lists and merges the second into the first using Map#merge. For subjects with the same ID, it picks the one with the highest marks.

    private static List<Subject> merge(List<Subject> subjects1, List<Subject> subjects2) {
        Map<Integer, Subject> subjects1ById = new HashMap<>(subjectsMap(subjects1));
        Map<Integer, Subject> subjects2ById = subjectsMap(subjects2);
       
        subjects2ById.forEach((subId, sub) -> subjects1ById.merge(subId, sub,
                (sub1, sub2) -> pickBest(sub1, sub2)));
        return new ArrayList<>(subjects1ById.values());
    }
    
    private static Map<Integer, Subject> subjectsMap(List<Subject> subjects) {
        return subjects.stream()
                .collect(Collectors.toMap(Subject::getSubjectId, Function.identity(),
                        (sub1, sub2) -> pickBest(sub1, sub2)));
    }
    
    private static Subject pickBest(Subject s1, Subject s2) {
        return s1.getMarks() > s2.getMarks() ? s1 : s2;
    }
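
    For comparison, here is a self-contained sketch of the same idea written with plain loops instead of streams. It is not the approach above but a single-pass variant under a few assumptions: Student and Subject are replaced by minimal records (Java 16+) standing in for the question's classes, the class name ConsolidateSketch is hypothetical, and LinkedHashMap is used so the output keeps first-seen order. Ties on marks keep the subject seen first.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ConsolidateSketch {

    // Minimal stand-ins for the question's classes (assumption: records, Java 16+).
    record Subject(int subjectId, String grade, int marks) {}
    record Student(String studentId, List<Subject> subjects) {}

    static List<Student> consolidate(List<Student> students) {
        // studentId -> (subjectId -> best Subject seen so far)
        Map<String, Map<Integer, Subject>> best = new LinkedHashMap<>();
        for (Student student : students) {
            Map<Integer, Subject> bySubject =
                    best.computeIfAbsent(student.studentId(), id -> new LinkedHashMap<>());
            for (Subject subject : student.subjects()) {
                // Map#merge stores the subject if the ID is new,
                // otherwise keeps whichever has the higher marks.
                bySubject.merge(subject.subjectId(), subject,
                        (a, b) -> a.marks() >= b.marks() ? a : b);
            }
        }
        // Rebuild one Student per ID from the collected subjects.
        List<Student> result = new ArrayList<>();
        best.forEach((id, subjects) ->
                result.add(new Student(id, new ArrayList<>(subjects.values()))));
        return result;
    }

    public static void main(String[] args) {
        List<Student> input = List.of(
                new Student("s1", List.of(new Subject(1, "A", 90),
                        new Subject(2, "B", 80), new Subject(2, "C", 70))),
                new Student("s1", List.of(new Subject(2, "A+", 95), new Subject(3, "C", 70))),
                new Student("s2", List.of(new Subject(1, "D", 50))),
                new Student("s2", List.of(new Subject(1, "D", 50))));
        consolidate(input).forEach(System.out::println);
    }
}
```

    With the question's example input, this yields s1 with subjects 1 (A, 90), 2 (A+, 95) and 3 (C, 70), and s2 with the single subject 1 (D, 50). Because every subject goes through Map#merge, duplicates are also removed for a student ID that appears only once in the input.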