algorithm data-structures genetic-algorithm genetic-programming

Modified version of Student Project Allocation algorithm

I am working on a project for a non-profit organization where they are trying to help students with special needs to match to different project topics. Each student will have four preferences and a set of supervisors will have their list of preferences on the topics they supervise.

The solution I look for is an algorithm which can find an optimum solution to match students to project topics and supervisors.

I have done extensive reading on SPA, HR and other Greedy Algorithms and even tried a flavour of Genetic algorithm. So far I have nothing but stress.

Here is the flow of the program.

We have a pool of topics for supervisors to show their interest. Supervisors can pick topics where they like to supervise and they can also suggest a topic and decide how many project groups they would like to supervise.

P1, P2, P3, P4, P5 ...... Pn ... SP1, SP2, SP3 .... SPn

In the above list, P1 ... Pn are existing topics and SP1...SPn are suggested topics.

Let's say after this round, we have a list of supervisors with the following preference.

supervisor | Topics of Interest | No. Of Groups
L1         | P1, P3, P4         |     2
L2         | P5, P2, P9         |     1
L3         | P1, P3, P4         |     1
L4         | P1, P3, P4         |     4
L5         | SP1, P3, P8        |     3
L6         | P32, P3, P40       |     3

After the above round, we know that there are only supervisors who can Supervise students on the following topics.

P1, P2, P3, P4, P8, P9, P32, P40, SP1

When we open the topics for the students, they will only be able to pick projects from the above list, with their preference/priority. For example

student    | Pref1 | Pref 2 | Pref 3 | Pref 4 |
S1         |  P4   |  P1    |  SP1   |   P5   |
S2         |  P1   |  P9    |  SP1   |   P5   |
S3         |  P3   |  P1    |  P2    |   P5   |
S4         |  P4   |  P1    |  P40   |   P5   |
S5         |  P4   |  P32   |  SP1   |   P5   |
...
Sn         |  P9   |  P1    |  SP1   |   P5   |

Now, once the students pick the preference, We will then decide a number MAX_GROUP_SIZE and we will run our algorithm to group these students into partitions where we will

a. Group students with similar interests into same group ( eg. We add Students who picked P1 as their pref1 and fill in the rest with pref2, pref3, pref4 when they don't have groups for their first choices). b. Assign a supervisor to a group where he has shown interest in the project ( Ideally, every students first preferences or best-matched project) c. We need to make sure that we don't overload the supervisor, if a supervisor has shown interest in P1, P2, P3 and mentioned that he can only supervise 2 projects, then we should only add him to 2 projects.

So far, I have been trying the above-explained algorithms and I still don't think I have a justified solution for the students. I prefer a solution which is more biased towards the students since they have special needs. If anyone can point me in the right direction or can provide me with a well-defined algorithm or an implementation I would not only appreciate the effort but I would buy you a coffee as well.

Solution

This is a (more correct) answer based on the same approach as the previous answer, however, it solves the entire problem as a single weighted bipartite matching.

The same considerations apply as for the previous answer; however, this answer will find an answer if one exists. It has to, however, condition on the number of projects used in the final solution, so it can find multiple "good" solutions for different numbers of projects used (a project is considered used if it has 1 or more students.)

#!/usr/bin/python

"""
filename: student_assign.py
purpose:  demonstrate that the problem described in 
          https://stackoverflow.com/questions/62755778/modified-version-of-student-project-allocation-algorithm
          can be solved as an instance of MCF.
"""


import networkx as nx

# For this demonstration we take data directly from the problem description
#supervisor | Topics of Interest | No. Of Groups
#L1         | P1, P3, P4         |     2
#L2         | P5, P2, P9         |     1
#L3         | P1, P3, P4         |     1
#L4         | P1, P3, P4         |     4
#L5         | SP1, P3, P8        |     3
#L6         | P32, P3, P40       |     3


supervisors = {
        'L1' : { 'topics' : ['P1', 'P3', 'P4'], 'num_groups' : 2},
        'L2' : { 'topics' : ['P5', 'P2', 'P9'], 'num_groups' : 1},
        'L3' : { 'topics' : ['P1', 'P3', 'P4'], 'num_groups' : 1},
        'L4' : { 'topics' : ['P1', 'P3', 'P4'], 'num_groups' : 4},
        'L5' : { 'topics' : ['SP1', 'P3', 'P8'], 'num_groups' : 3},
        'L6' : { 'topics' : ['P32', 'P3', 'P40'], 'num_groups' : 3},
        }

all_topics = sorted(list({ t  for s in supervisors for t in supervisors[s]['topics'] }))


# assuming there is a typo in the problem specification and 'supervisor' = 'student' below
#supervisor | Pref1 | Pref 2 | Pref 3 | Pref 4 |
#S1         |  P4   |  P1    |  SP1   |   P5   |
#S2         |  P1   |  P9    |  SP1   |   P5   |
#S3         |  P3   |  P1    |  P2    |   P5   |
#S4         |  P4   |  P1    |  P40   |   P5   |
#S5         |  P4   |  P32   |  SP1   |   P5   |
#S6         |  P9   |  P1    |  SP1   |   P5   |

students = {
        'S1' : ['P4', 'P1', 'SP1', 'P5'] ,
        'S2' : ['P1', 'P9', 'SP1', 'P5'] ,
        'S3' : ['P3', 'P1', 'P2', 'P5'] ,
        'S4' : ['P4', 'P1', 'P40', 'P5'] ,
        'S5' : ['P4', 'P32', 'SP1', 'P5'] ,
        'S6' : ['P9', 'P1', 'SP1', 'P5'] ,
        }

MAX_GROUP_SIZE = 2


def get_student_topic_supervisor_assignments(all_topics,students,supervisors,num_topics_used,max_group_size=MAX_GROUP_SIZE,do_supervisor_load_balancing=False):

    G = nx.DiGraph()
    G.add_node('sink',demand=len(students) - num_topics_used)

    for topic in all_topics:
        G.add_node(topic)
        G.add_edge(topic, 'sink', weight = 0, capacity = max_group_size-1)

    for student in students:
        prefs = students[student]
        G.add_node(student,demand=-1)
        # add increasing weight edges from student to preferences (lowest == best)
        for i, topic in enumerate(prefs):
            G.add_edge(student, topic, weight = i, capacity = 1)


    G.add_node('sink_2',demand=num_topics_used)

    for topic in all_topics:
        G.add_node(topic + "_2")
        G.add_edge(topic, topic + "_2", weight = 0, capacity = 1 )

    for supervisor in supervisors:
        supervisor_properties = supervisors[supervisor]
        for topic in supervisor_properties['topics']:
            G.add_edge(topic + "_2", supervisor, weight = 0, capacity = 1)
        if do_supervisor_load_balancing:
            for i in range(supervisor_properties['num_groups']):
                G.add_node(supervisor + "_dummy")
                G.add_edge(supervisor, supervisor + "_dummy", weight = i, capacity = 1)
                G.add_edge(supervisor + "_dummy", 'sink_2', weight = 0, capacity = 1)
        else:
            G.add_edge(supervisor, 'sink_2', weight = 0, capacity = supervisor_properties['num_groups'])

    # solve the weighted matching
    flow_dict = nx.min_cost_flow(G)
    
    for topic in all_topics:
        edges = flow_dict[topic]
        if edges['sink'] and  not edges[topic+"_2"]:
            raise RuntimeError('Solution with num_topics_used={n} is not valid.'.format(n=num_topics_used))

    # decode solution
    topic_assignments = {t : [] for t in all_topics}
    for student in students:
        edges = flow_dict[student]
        for target in edges:
            if edges[target]:
                topic_assignments[target].append(student)
                break

    supervisor_assignments = {s : [] for s in supervisors}
    for topic in all_topics:
        edges = flow_dict[topic+"_2"]
        for target in edges:
            if edges[target]:
                supervisor_assignments[target].append(topic)
    
    return topic_assignments, supervisor_assignments

num_students = len(students)
for n in range(1,num_students+1):
    try:
        topic_assignments, supervisor_assignments =\
                get_student_topic_supervisor_assignments(all_topics,students,supervisors,num_topics_used=n)
        print ' An optimal solution was found with `num_topics_used`={n}'.format(n=n)
        print ' Topic assignments:\n', topic_assignments
        print ' Supervisor assignments:\n', supervisor_assignments
    except Exception as e:
        pass

This outputs:

 An optimal solution was found with `num_topics_used`=4
 Topic assignments:
{'P2': [], 'P3': ['S3'], 'P1': ['S2', 'S4'], 'P4': ['S1', 'S5'], 'P5': [], 'SP1': [], 'P8': [], 'P9': ['S6'], 'P32': [], 'P40': []}
 Supervisor assignments:
{'L6': ['P3'], 'L4': ['P4'], 'L5': [], 'L2': ['P9'], 'L3': ['P1'], 'L1': []}
 An optimal solution was found with `num_topics_used`=5
 Topic assignments:
{'P2': [], 'P3': ['S3'], 'P1': ['S2'], 'P4': ['S1', 'S4'], 'P5': [], 'SP1': [], 'P8': [], 'P9': ['S6'], 'P32': ['S5'], 'P40': []}
 Supervisor assignments:
{'L6': ['P3', 'P32'], 'L4': ['P1'], 'L5': [], 'L2': ['P9'], 'L3': ['P4'], 'L1': []}
 An optimal solution was found with `num_topics_used`=6
 Topic assignments:
{'P2': [], 'P3': ['S3'], 'P1': ['S2'], 'P4': ['S4'], 'P5': [], 'SP1': ['S1'], 'P8': [], 'P9': ['S6'], 'P32': ['S5'], 'P40': []}
 Supervisor assignments:
{'L6': ['P3', 'P32'], 'L4': ['P1'], 'L5': ['SP1'], 'L2': ['P9'], 'L3': ['P4'], 'L1': []}

Supervisor Load Balancing

An update of this solution added an extra parameter to the function do_supervisor_load_balancing, which (when set to true) will prefer solutions where the number of topics each supervisor is assigned is similar.

Note, that using load balancing can potentially set the two criteria at odds:

balancing supervisor loads
giving students preference on which projects they work

Setting the weights of one higher than the other (by an order of magnitude) will ensure that criteria is much more highly weighted. As it stands, the solution presented here gives about equal weight to both criteria.

In the above example, when load balancing is used the following is output:

 An optimal solution was found with `num_topics_used`=4
 Topic assignments:
{'P2': [], 'P3': ['S3'], 'P1': ['S2', 'S4'], 'P4': ['S1', 'S5'], 'P5': [], 'SP1': [], 'P8': [], 'P9': ['S6'], 'P32': [], 'P40': []}
 Supervisor assignments:
{'L6': ['P3'], 'L4': [], 'L5': [], 'L2': ['P9'], 'L3': ['P4'], 'L1': ['P1']}
 An optimal solution was found with `num_topics_used`=5
 Topic assignments:
{'P2': [], 'P3': ['S3'], 'P1': ['S2'], 'P4': ['S1', 'S4'], 'P5': [], 'SP1': [], 'P8': [], 'P9': ['S6'], 'P32': ['S5'], 'P40': []}
 Supervisor assignments:
{'L6': ['P32'], 'L4': [], 'L5': ['P3'], 'L2': ['P9'], 'L3': ['P4'], 'L1': ['P1']}
 An optimal solution was found with `num_topics_used`=6
 Topic assignments:
{'P2': [], 'P3': ['S3'], 'P1': ['S2'], 'P4': ['S4'], 'P5': [], 'SP1': ['S1'], 'P8': [], 'P9': ['S6'], 'P32': ['S5'], 'P40': []}
 Supervisor assignments:
{'L6': ['P32'], 'L4': ['P3'], 'L5': ['SP1'], 'L2': ['P9'], 'L3': ['P4'], 'L1': ['P1']}