This is a complicated one, but I suspect there's some principle I can apply to make it simple - I just don't know what it is.
I need to parcel out presentation slots to a class full of students for the semester. There are multiple possible dates, and multiple presentation types. I conducted a survey where students could rank their interest in the different topics. What I'd like to do is get the best (or at least a good) distribution of presentation slots to students.
So, what I have:
What I'd like to get:
intro
), one of presentation type B (figures
) and 3 of presentation type C (aims
)I should note that I realize this looks like a homework problem, but it's real life :-). I was thinking that I might make a Student
class for each student that contains the dates for each presentation type, but I wasn't sure what the best way to populate it would be. Actually, I'm not even sure where to start.
TL;DR: I think you're giving your students too much choice :D
But I had a shot at this problem anyway. Pretty fun exercise actually, although some of the constraints were a little vague. Most of all, I had to guess what the actual students' preference distribution would look like. I went with uniformly distributed, independent variables, although that's probably not very realistic. Still I think it should work just as well on real data as it does on my randomly generated data.
I considered brute forcing it, but a rough analysis gave me an estimate of over 10^65 possible configurations. That's kind of a lot. And since we don't have a trillion trillion years to consider all of them, we'll need a heuristic approach.
Because of the size of the problem, I tried to avoid doing any backtracking. But this meant that you could get stuck; there might not be a solution where everyone only gets dates they gave 4's and 5's.
I ended up implementing a double-edged Iterative Deepening-like search, where both the best case we're still holding out hope for (i.e., assign students to a date they gave a 5) and the worst case scenario we're willing to accept (some student might have to live with a 3) are gradually lowered until a solution is found. If we get stuck, reset, lower expectations, and try again. Tasks A and B are assigned first, and C is done only after A and B are complete, because the constraints on C are far less stringent.
I also used a weighting factor to model the trade off between maximizing students happiness with satisfying the types-of-presentations-per-day limits.
Currently it seems to find a solution for pretty much every random generated set of preferences. I included an evaluation metric; the ratio between the sum of the preference values of all assigned student/date combos, and the sum of all student ideal/top 3 preference values. For example, if student X had two fives, one four and the rest threes on his list, and is assigned to one of his fives and two threes, he gets 5+3+3=11 but could ideally have gotten 5+5+4=14; he is 11/14 = 78.6% satisfied.
After some testing, it seems that my implementation tends to produce an average student satisfaction of around 95%, at lot better than I expected :) But again, that is with fake data. Real preferences are probably more clumped, and harder to satisfy.
Below is the core of the algorihtm. The full script is ~250 lines and a bit too long for here I think. Check it out at Github.
...
# Assign a date for a given task to each student,
# preferring a date that they like and is still free.
def fill(task, lowest_acceptable, spread_weight=0.1, tasks_to_spread="ABC"):
random_order = range(nStudents) # randomize student order, so everyone
random.shuffle(random_order) # has an equal chance to get their first pick
for i in random_order:
student = students[i]
if student.dates[task]: # student is already assigned for this task?
continue
# get available dates ordered by preference and how fully booked they are
preferred = get_favorite_day(student, lowest_acceptable,
spread_weight, tasks_to_spread)
for date_nr in preferred:
date = dates[date_nr]
if date.is_available(task, student.count, lowest_acceptable == 1):
date.set_student(task, student.count)
student.dates[task] = date
break
# attempt to "fill()" the schedule while gradually lowering expectations
start_at = 5
while start_at > 1:
lowest_acceptable = start_at
while lowest_acceptable > 0:
fill("A", lowest_acceptable, spread_weight, "AAB")
fill("B", lowest_acceptable, spread_weight, "ABB")
if lowest_acceptable == 1:
fill("C", lowest_acceptable, spread_weight_C, "C")
lowest_acceptable -= 1
And here is an example result as printed by the script:
Date
================================================================================
Student | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
================================================================================
1 | | A | B | | C | | | | | | | |
2 | | | | | A | | | | | B | C | |
3 | | | | | B | | | C | | A | | |
4 | | | | A | | C | | | | | | B |
5 | | | C | | | | A | B | | | | |
6 | | C | | | | | | | A | B | | |
7 | | | C | | | | | B | | | | A |
8 | | | A | | C | | B | | | | | |
9 | C | | | | | | | | A | | | B |
10 | A | B | | | | | | | C | | | |
11 | B | | | A | | C | | | | | | |
12 | | | | | | A | C | | | | B | |
13 | A | | | B | | | | | | | | C |
14 | | | | | B | | | | C | | A | |
15 | | | A | C | | B | | | | | | |
16 | | | | | | A | | | | C | B | |
17 | | A | | C | | | B | | | | | |
18 | | | | | | | C | A | B | | | |
================================================================================
Total student satisfaction: 250/261 = 95.00%