Priority Queue in Python
Priority queue is a data structure similar to a queue, but where each element has an associated priority. A queue is a first in, first out (FIFO) data structure, whereas in a priority queue the element with the highest priority is served before lower priority elements.
Priority queue in Python? Short answer: use queue.PriorityQueue
Python comes with a built-in PriorityQueue
class, contained in the queue
module.
In the simplest case, an entry in the priority queue will be a tuple (priority_number, data)
. Here's a dummy example of how to use it:
import random
from queue import PriorityQueue
def generate_items():
items = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
priorities = [2, 1, 3, 3, 0, 1, 1, 2]
return items, priorities
if __name__ == '__main__':
items, priorities = generate_items()
print('Items/Priorities')
for item, priority in zip(items, priorities):
print((item, priority))
q = PriorityQueue()
# We add items to priority queue in the order they're listed
# a -> b -> c -> ...
for item, priority in zip(items, priorities):
q.put((priority, item))
# Let's get items from the priority queue one by one
# They will be printed in the order of their priority
print('Getting items from the priority queue')
while not q.empty():
next_item = q.get()
print(next_item)
And here's the output:
Items/Priorities
('a', 2)
('b', 1)
('c', 3)
('d', 3)
('e', 0)
('f', 1)
('g', 1)
('h', 2)
Getting items from the priority queue
(0, 'e')
(1, 'b')
(1, 'f')
(1, 'g')
(2, 'a')
(2, 'h')
(3, 'c')
(3, 'd')
Note that by default, the lower the value of the priority number, the higher the priority of the entry. That is priority 0
is higher than priority 1
.
Technically, PriorityQueue
is implemented using a min-heap data structure. That means the smallest priority value always comes out first. If we want to reverse this, we can simply use the negative of the priority value when adding elements to the queue.
As a more robust solution, we can wrap our queue items in a custom class with overloaded comparison operators implemented to give us the desired ordering. We'll use priority queue this way in our next example.
Resolving equal priorities and order stability
Let's consider a more involved example. We'll use it to reveal two important problems with our tuple-based solution from the previous section.
Assume we have a customer support center, and we want to implement a queueing policy that prioritizes requests from older customers (those who signed up for our service earlier). To make it more fair, we also want the queue to be stable with respect to the insertion order - if two or more customers with the same priority are in the queue, we must serve them in the order of arrival.
To make things more realistic, each item we push to the queue be an instance of the Task
class. It will contain a reference to a customer and some details about their request. For prioritization, we'll simply use the year of signup.
Let's start by defining the Customer
class and generating some instances:
import random
from queue import PriorityQueue
from dataclasses import dataclass
from datetime import date
import functools
@dataclass
class Customer():
signup_date: date
first_name: str
last_name: str
def signup_year(self):
return self.signup_date.year
def generate_customers(n):
first_names = ['Jessica', 'Mary', 'Patricia', 'Jennifer', 'Elizabeth', 'James', 'John', 'Robert', 'Michael', 'Noah', 'Damian']
last_names = ['Smith', 'Johnson', 'Williams', 'Brown', 'Davis', 'Rodriguez', 'Murphy', 'Walsh', 'Wilson', 'Taylor']
customers = []
for i in range(n):
c = Customer(
first_name=random.choice(first_names),
last_name=random.choice(last_names),
signup_date = date(
year=random.randint(2020, 2022),
month=random.randint(1, 12),
day=random.randint(1, 28)
)
)
customers.append(c)
return customers
if __name__ == '__main__':
random.seed(3)
customers = generate_customers(5)
for customer in customers:
print(customer)
Running this, we get the following output:
('2022-03-12', 'Jennifer Taylor')
('2022-10-03', 'Noah Walsh')
('2021-05-18', 'Noah Smith')
('2022-08-18', 'Jennifer Brown')
('2021-11-28', 'Michael Walsh')
Let's now create the Task
class which will hold a customer and the description of why they are in the queue.
class Task():
def __init__(self, customer, description):
self.customer = customer
self.description = description
self.priority = customer.get_signup_year()
def __repr__(self):
return str((self.priority, self.description, self.customer))
For our customers, the list of tasks might look something like this:
Task((2022, 'Delivery delay', ('2022-03-12', 'Jennifer Taylor')))
Task((2022, 'Delivery delay', ('2022-10-03', 'Noah Walsh')))
Task((2021, 'Unavailable products', ('2021-05-18', 'Noah Smith')))
Task((2022, 'Delivery delay', ('2022-08-18', 'Jennifer Brown')))
Task((2021, 'Unavailable products', ('2021-11-28', 'Michael Walsh')))
If we try to add these tasks to a priority queue, it will not work. We get the following error:
TypeError: '<' not supported between instances of 'Task' and 'Task'
Why? It's because our PriorityQueue
doesn't know how to compare (i.e. prioritize) the Tasks. Comparison operators are not implemented for the class Task
.
Fair enough. Let's just employ our old trick - add a (priority, Task)
tuple to the queue. But we get the same thing again:
TypeError: '<' not supported between instances of 'Task' and 'Task'
Why doesn't this work? In the simplest case, we'll put into the queue tuples of (priority, item)
. Behind the scenes, PriorityQueue
will perform tuple comparisons to determine which queue item is the smallest, i.e. at the front of the queue. Remember that tuples are compared position by position, comparing corresponding elements. In our case, this means that if the priorities of some two entries are equal (tuple position 0), we proceed to compare the items (tuple position 1).
In many real-world applications, this can pose two problems:
- Comparison between items is not always possible, i.e. comparison operators are not defined for the item class
- For any two items of the same priority, how do we return them from the queue in the order in which they were added to the queue?
For problem #1 we have an obvious solution - wrap the items in a class where we can implement comparison operators. We can hold the priority as a field in this class and encapsulate prioritization within comparison operators. For us, this will be our Task
class.
For problem #2, sort stability, there is also a simple, although less obvious solution: add an entry counter to priority comparison. Whenever we add an item to the queue, we increase the counter. We use the counter to resolve ties in comparison between equal priority items.
The intuition here is simple. Arrive at the queue earlier, and you get a smaller entry counter value. If somebody with the same priority arrives after you, you go first because of your lower entry counter value.
If we use tuple entries, we can simply add the entry counter right after priority in the tuple: (priority, entry_counter, item)
. The entry counter in the tuple solves our sorting stability problem - in case of equal priorities, we compare the second entry in the tuple, the entry counter. But it also solves our sorting operator problem - since entry_counter
must be unique for every queue entry, we'll never need to compare items directly.
We'll leave this as an exercise for the reader and proceed with encapsulating priority and entry_counter within the Task
class. This way, we'll only push Task
instances to the queue instead of (priority, entry_counter, Task)
tuples.
That said, let's fix our Task
class:
@functools.total_ordering
class Task():
def __init__(self, customer, description, entry_counter):
self.customer = customer
self.description = description
self.priority = customer.get_signup_year()
self.entry_counter = entry_counter
def __repr__(self):
return "Task(" + str((self.priority, self.entry_counter, self.description, self.customer)) + ")"
def __lt__(self, other):
return (self.priority, self.entry_counter) < (other.priority, other.entry_counter)
def __eq__(self, other):
return (self.priority, self.entry_counter) == (other.priority, other.entry_counter)
Two important things are going on here:
- We added the
entry_counter
field, and we'll make sure to set it properly before adding a task to the queue. - We implemented comparison methods for the
Task
class. To make instances of a class comparable, we implement<
and==
operators (__lt__
and__eq__
methods, respectively). The@functools.total_ordering
decorator completes the class with other comparison operators:<=
,>
and>=
.
Now we can complete our example by adding our customers to the queue and serving them in order of priority while respecting arrival order in case of equal priorities.
import random
import functools
from datetime import date
from dataclasses import dataclass
from queue import PriorityQueue
@dataclass
class Customer():
signup_date: date
first_name: str
last_name: str
def get_signup_year(self):
return self.signup_date.year
def __repr__(self):
return str((self.signup_date.strftime('%Y-%m-%d'), self.first_name + ' ' + self.last_name))
def generate_customers(n):
first_names = ['Jessica', 'Mary', 'Patricia', 'Jennifer', 'Elizabeth', 'James', 'John', 'Robert', 'Michael', 'Noah', 'Damian']
last_names = ['Smith', 'Johnson', 'Williams', 'Brown', 'Davis', 'Rodriguez', 'Murphy', 'Walsh', 'Wilson', 'Taylor']
customers = []
for i in range(n):
c = Customer(
first_name=random.choice(first_names),
last_name=random.choice(last_names),
signup_date = date(
year=random.randint(2020, 2022),
month=random.randint(1, 12),
day=random.randint(1, 28)
)
)
customers.append(c)
return customers
@functools.total_ordering
class Task():
def __init__(self, customer, description, entry_counter):
self.customer = customer
self.description = description
self.priority = customer.get_signup_year()
self.entry_counter = entry_counter
def __repr__(self):
return "Task(" + str((self.priority, self.entry_counter, self.description, self.customer)) + ")"
def __lt__(self, other):
return (self.priority, self.entry_counter) < (other.priority, other.entry_counter)
def __eq__(self, other):
return (self.priority, self.entry_counter) == (other.priority, other.entry_counter)
if __name__ == '__main__':
random.seed(3)
customers = generate_customers(5)
q = PriorityQueue()
print("Customers arriving...")
entry_counter = 0
for customer in customers:
# Create a task for the customer
task = Task(
customer=customer,
description=random.choice(['Delivery delay', 'Poor product quality', 'Unavailable products']),
entry_counter=entry_counter
)
# Add the task to queue
q.put(task)
# Increment entry counter
entry_counter += 1
print(task)
# Pull the elements from the queue one by one
print()
print("Customers served...")
while not q.empty():
next_customer = q.get()
print(next_customer)
Pay attention to lines #68 - #81. This is where we create Task
instances and assign the entry_counter
to each of them.
If you run the snippet above, you'll get the following output:
Customers arriving...
Task((2022, 0, 'Delivery delay', ('2022-03-12', 'Jennifer Taylor')))
Task((2022, 1, 'Delivery delay', ('2022-10-03', 'Noah Walsh')))
Task((2021, 2, 'Unavailable products', ('2021-05-18', 'Noah Smith')))
Task((2022, 3, 'Delivery delay', ('2022-08-18', 'Jennifer Brown')))
Task((2021, 4, 'Unavailable products', ('2021-11-28', 'Michael Walsh')))
Customers served...
Task((2021, 2, 'Unavailable products', ('2021-05-18', 'Noah Smith')))
Task((2021, 4, 'Unavailable products', ('2021-11-28', 'Michael Walsh')))
Task((2022, 0, 'Delivery delay', ('2022-03-12', 'Jennifer Taylor')))
Task((2022, 1, 'Delivery delay', ('2022-10-03', 'Noah Walsh')))
Task((2022, 3, 'Delivery delay', ('2022-08-18', 'Jennifer Brown')))
See how our customers arrived in a random order but were served (taken out of the queue) by the year of their signup (older customers first). Between customers from the same year, we kept the order in which they arrived in the queue.
As an exercise, try removing entry_counter
from comparison operators and observe how this affects the serving order.
Is the Python priority queue thread safe?
Yes, all classes from the built-in queue
module are thread-safe. According to the official documentation:
Internally, those three types of queues [Queue, LifoQueue, PriorityQueue] use locks to temporarily block competing threads; however, they are not designed to handle reentrancy within a thread.
Is priority queue the same thing as heap?
Yes and no. While the terms are sometimes used interchangeably, the difference is subtle.
A heap is a data structure (a tree) with an important property - in a max-heap the largest element is always at the root; in a min-heap it is the smallest element. Heaps can efficiently maintain this property (called heap property) while adding or removing elements to the data structure.
This makes heap an obvious choice for implementing priority queues. Priority queue is an abstract data structure with properties described in the opening section of this article:
a priority queue is an abstract data type similar to a regular queue or stack data structure in which each element additionally has a priority associated with it. In a priority queue, an element with high priority is served before an element with low priority.
Python's queue.PriorityQueue
is, in fact, built on top of the heapq
module. This module provides a lower lever interface to a heap-based priority queue.
Aside from the heap, there are other ways to implement a priority queue, although mostly less efficient.