Introduction
The Set Cover Problem is a fundamental optimization problem in computer science and operations research, with many practical applications, such as network design, resource allocation, and scheduling.
In this tutorial, we'll explore how to use Python and Google's OR-Tools library to solve it.
Intended audienceThis blog post is intended for beginner-to-intermediate Python developers.
You don't need any prior knowledge of optimization or OR-Tools, but it will help if you understand basic Python concepts like loops, functions and lists.
Problem description
Imagine that a university wants to schedule a set of courses and needs to ensure that all topics are covered by the available professors. Each professor can cover multiple topics, and our goal is to select the minimum number of professors required to cover all topics.
Here’s the basic timetable:
Subject | Time Slot | Teacher |
---|---|---|
Maths | 9:00 - 10:00 AM | ? |
Physics | 10:00 - 11:00 AM | ? |
Chemistry | 11:00 - 12:00 PM | ? |
Biology | 1:00 - 2:00 PM | ? |
And here are the professors and the topics they’re able to cover:
Professor | Topics Covered |
---|---|
Prof. A | (Math, Physics) |
Prof. B | (Physics, Chemistry) |
Prof. C | (Math, Biology) |
Prof. D | (Biology, Chemistry) |
Prof. E | (Math, Chemistry, Biology) |
The simplest way to ensure that all courses are covered would be to assign a different professor to each class, e.g.:
Subject | Time Slot | Teacher |
---|---|---|
Maths | 9:00 - 10:00 AM | Prof. A |
Physics | 10:00 - 11:00 AM | Prof. B |
Chemistry | 11:00 - 12:00 PM | Prof. D |
Biology | 1:00 - 2:00 PM | Prof. E |
But, while it might be the “fairest” solution, it would also be incredibly inefficient; we’d need to pay 4 professors for a teaching day, even though each professor only teaches for a single hour.
Instead, we can use mathematical programming to to find the smallest set of professors that cover all the topics (Math, Physics, Chemistry, and Biology).
Isn't this an unrealistic problem?When creating timetables in the real world, we’ll likely have much more complex requirements relating to room availability, concurrent classes, and multi-day or multi-week timetables. The reason we’ve articulated the problem in this simplistic way is to illustrate the basic principles of the Set Cover Problem.
If you’re interested in using linear programming to solve more complex timetabling problems, check out some of the other articles on this site:
- Article 1
- Article 2
Mathematics
To solve this problem using mathematical optimization, we need to translate our constraints and goals into mathematical language. Let's break this down step by step.
Sets
First, let’s define the sets: the “populations” or “groups” of data which frame the problem.
In this problem, we have two main sets: the set of professors, and the set of topics. We’ll use mathematical notation to denote them:
- Let be the set of all professors
- Let be the set of all topics
Parameters
Next we’ll define a parameter which refers to each professor’s topics:
- Let be the subset of topics covered by professor
Decision variables
Decision variables are variables which have an as-yet-unknown value; later, when we solve the problem, we’ll work out which value they should take on.
In this problem, we’ll need one Boolean decision variable for each professor. Each variable will take the value 1 if the corresponding professor is chosen, and 0 if not:
- Let be a binary variable representing whether professor is included in the solution (1 if included, 0 otherwise)
Constraints
Next we need to define the constraints:
-
Coverage constraint (every topic must be covered by at least one professor):
What’s going on here? 🤔
The goal of the Set Cover Problem is to ensure that every topic is covered by at least one selected professor. Mathematically, this can be expressed with the above constraint. There are two parts to this:
-
Summation: : For each professor , if they are selected (), and if they cover the topic (), then that professor contributes 1 to the sum. If either the professor is not selected or does not cover the topic, they contribute 0 to the sum.
- Note: the in is not a “1”; it’s the indicator function. This might be a bit confusing if you’re new to maths. But the “1” in is not a constant value; rather, it signifies that the indicator function outputs 1 (true) or 0 (false) based on the condition .
-
Constraint: Finally, the constraint ≥1 ensures that the sum is at least 1, meaning that at least one selected professor covers the topic .
Objective function
Having defined our variables and constraints, we can now proceed to articulating our goals.
Remember: our goal is to minimize the number of selected professors. We can express this goal via the following objective function:
Solving with Python
Now that we've formulated the problem mathematically, let's implement the solution using Python and OR-Tools.
First, we'll import the necessary libraries:
from ortools.linear_solver import pywraplp import pandas as pd
Next, let's define our data:
data = { 'professor': ['Prof. A', 'Prof. B', 'Prof. C', 'Prof. D', 'Prof. E'], 'topics': [ {'Math', 'Physics'}, {'Physics', 'Chemistry'}, {'Math', 'Biology'}, {'Biology', 'Chemistry'}, {'Math', 'Chemistry', 'Biology'} ] } df = pd.DataFrame(data) topics = {'Math', 'Physics', 'Chemistry', 'Biology'}
Now, let's create our solver and define the variables:
# Create the solver solver = pywraplp.Solver.CreateSolver('SCIP') # Create variables x = {} for i in df.index: x[i] = solver.BoolVar(f'x_{i}') # Print variables to verify print('Number of variables =', solver.NumVariables())
Next, we'll add our constraints:
# Coverage constraints for topic in topics: solver.Add(solver.Sum(x[i] for i in df.index if topic in df.loc[i, 'topics']) >= 1) # Print constraints to verify print('Number of constraints =', solver.NumConstraints())
Now, let's define our objective function:
# Objective function objective = solver.Objective() for i in df.index: objective.SetCoefficient(x[i], 1) objective.SetMinimization()
Finally, we can solve the problem and print the results:
# Solve the problem status = solver.Solve() # Print solution if status == pywraplp.Solver.OPTIMAL: print('Solution:') print('Objective value =', solver.Objective().Value()) for i in df.index: if x[i].solution_value() > 0: print(f"{df.loc[i, 'professor']}: {x[i].solution_value()}") else: print('The problem does not have an optimal solution.')
When we run this code, we get the following output:
Number of variables = 5 Number of constraints = 4 Solution: Objective value = 2.0 Prof. A: 1.0 Prof. D: 1.0
Interpreting the results
Our optimization model suggests that the university should select Prof. A and Prof. D. This combination will ensure that all topics (Math, Physics, Chemistry, and Biology) are covered with the minimum number of professors (2).
Conclusion
In this article, we've explored how to solve a real-world optimization problem using mathematical programming techniques with Python and Google OR-Tools. We've seen how to:
- Formulate a problem mathematically
- Implement the solution using OR-Tools
- Interpret the results
This approach can be applied to a wide range of optimization problems beyond just scheduling professors. Whether you're optimizing resource allocation, network design, or other combinatorial problems, the principles remain the same.
Remember, while our model provides a mathematically optimal solution, real-world considerations might lead to different choices. For example, the university might have preferences for certain professors or other constraints not considered in the model.
Happy optimizing, and good luck with your scheduling!