Python sort lambda - How to sort with lambda key function in Python
Sorting in Python is simple. You'll pass a list, dict, set or any other kind of iterable object to the built-in function sorted
. It returns a sorted list of elements of that object.
Here's an example:
A = [4, 2, 5, 3, 1] # An unsorted array of integers
B = sorted(A) # sorted(A) returns a sorted copy of A
print(B)
[1, 2, 3, 4, 5]
Sorting complex objects using a lambda key function
Let's say we have a list of city names along with their country, population and area. We'd like to sort the list in various ways:
- Sort by population
- Sort alphabetically by country. If there are cities from the same country order them alphabetically by name
- Sort by population density
For the sake of example let's represent cities as simple Python dicts:
cities = [
{
"name": "New York City",
"country": "United States",
"population": 20.14,
"area": 1223.59
},
{
"name": "Tokyo",
"country": "Japan",
"population": 37.47,
"area": 2194.07,
},
{
"name": "Los Angeles",
"country": "United States",
"population": 13.2,
"area": 1299.01,
},
{
"name": "Madrid",
"country": "Spain",
"population": 6.79,
"area": 604.31,
},
{
"name": "Osaka",
"country": "Japan",
"population": 19.3,
"area": 5107.0,
},
{
"name": "London",
"country": "United Kingdom",
"population": 14.26,
"area": 8382.0,
}
]
What do you think happens if we just call sorted(dicts)
? How will the sorted
function know which field(s) of our cities to use for sorting?
We'll need to explicitly tell the sorted
method how we want to sort our objects. For that we'll use the key parameter.
Here's how we'd do this for our examples:
- Sort by population
We'll tell sorted
that for each city, we want it to take population as the sorting key. This means that each city will be compared against other cities by looking only at their population field.
# Sort by population
cities = sorted(cities, key=lambda city: city['population'])
A very common pattern is to reverse the sorting order by using negative value of the sorting key (note the minus sign in front of city['population']
)
# Sort by population DESCENDING
cities = sorted(cities, key=lambda city: -city['population'])
2. Sort alphabetically by country, then by name
Here, for each city we'll provide a tuple (country, name) to the sorting function. As tuples are sorted by comparing them field by field, this will effectively sort our cities by country, then by name.
# Sort by country, then by name
cities = sorted(cities, key=lambda city: (city['country'], city['name']))
3. Sort by population density
A nice trick with key function is that we can provide an arbitrary function that accepts an object to be sorted, for example we can calculate population density on the fly.
# Sort by population density
cities = sorted(cities, key=lambda city: (city['population'] / city['area']))
How it actually works?
In order to understand how sorting by key works in the background, let's first see how sorted
works when sorting lists of lists (or tuples).
Let's say we have a list of movies with their ratings and year of release:
movies = [
(1994, 'The Shawshank Redemption', 9.2),
(1999, 'Fight Club', 8.8),
(1994, 'Pulp Fiction', 8.9),
(1972, 'The Godfather', 9.2),
(2008, 'The Dark Knight', 9.0)
]
Calling sorted(movies)
returns the following list:
(1972, 'The Godfather', 9.2)
(1994, 'Pulp Fiction', 8.9)
(1994, 'The Shawshank Redemption', 9.2)
(1999, 'Fight Club', 8.8)
(2008, 'The Dark Knight', 9.0)
Can you guess what happened?
The tuples get sorted by first comparing their elements at position 0
, in this case the year. In case of a tie, the tuples are then sorted by comparing the second element, then the third, and so on.
In our movie example, this means that the movies get sorted by comparing:
movie[0]
- year or release, ascendingmovie[1]
- title, lexicographically (case sensitive), ascendingmovie[2]
- rating, ascending
Note that sorted
always sorts in ascending order unless we modify its behaviour using key
or reverse
parameters.
Understanding how lists of tuples or lists of lists are sorted is crucial to understanding more complex sorting patterns - those using the key
parameter.
How it works - Sorted key
parameter
In our previous example, what if we wanted to sort the movies by rating first, then by year?
Python's sorted
function offers a simple solution - use the key
parameter. The official Python docs explain this in simple terms:
Bothlist.sort()
andsorted()
have a key parameter to specify a function (or other callable) to be called on each list element prior to making comparisons.
The value of the key parameter should be a function (or other callable) that takes a single argument and returns a key to use for sorting purposes.
Let's use this knowledge to sort our movies by rating, in descending order.
movies = sorted(movies, key=lambda movie: movie[2], reverse=True)
This will get us the following ordering of movies (note the descending ordering of ratings)
(1994, 'The Shawshank Redemption', 9.2)
(1972, 'The Godfather', 9.2)
(2008, 'The Dark Knight', 9.0)
(1994, 'Pulp Fiction', 8.9)
(1999, 'Fight Club', 8.8)
The key thing to unpack here is key=lambda movie: movie[2]
- what exactly is going on here?
Lambda functions - what are they?
First we need to understand what a lambda
is. A really simple concept with a fancy name, lambda functions or simply lambdas are a syntactic sugar in Python.
The lambda
keyword allows us to declare small, anonymous, single-line functions, taking one or more parameters and returning a value of an expression.
So when we say:
f = lambda x, y: x + y
It is equivalent to:
def f(x, y):
return x + y
Lambdas really shine when we need to supply a simple function as a parameter to another function, such as supplying a function as a key
parameter to sorted
.
Back to our sorting example, when we write:
movies = sorted(movies, key=lambda movie: movie[2], reverse=True)
it is just a shorthand for:
def movie_key(movie):
return movie[2]
movies = sorted(movies, key=movie_key, reverse=True)
Sorting by key under the hood
In order to understand how the key
parameter is used in sorted
, let's implement a sorting algorithm and arm it with a key parameter.
For the sake of simplicity we'll implement Bubble sort, but please note that the builtin sorted
method definitely uses more efficient sorting algorithms.
def bubble_sort(array: list):
new_array = [e for e in array] # Create a copy of array to work on
while True:
swapped = False
for i in range(0, len(new_array) - 1):
if new_array[i+1] < new_array[i]:
new_array[i], new_array[i+1] = new_array[i+1], new_array[i] # Swap elements that are out of order
swapped = True
if not swapped:
break
return new_array
The most important part is:
if new_array[i+1] < new_array[i]:
new_array[i], new_array[i+1] = new_array[i+1], new_array[i]
In case a smaller element is found after a larger element, they get swapped. In order to facilitate sorting by arbitrary key
function we'll modify our algorithm as follows:
def bubble_sort(array: list, key):
new_array = [e for e in array] # Create a copy of array to work on
while True:
swapped = False
for i in range(0, len(new_array) - 1):
if key(new_array[i+1]) < key(new_array[i]):
new_array[i], new_array[i+1] = new_array[i+1], new_array[i] # Swap elements that are out of order
swapped = True
if not swapped:
break
return new_array
The important line here is:
if key(new_array[i+1]) < key(new_array[i]):
new_array[i], new_array[i+1] = new_array[i+1], new_array[i]
Note that we're now comparing key(a)
vs key(b)
instead of a
vs b
. This allows us to impose arbitrary sorting rules by using a simple function returning a sorting key. You can imagine that there's a very similar line somewhere inside the code for Python's sorted
function.
Let's test our bubble_sort
function - we'll sort our movies by rating in descending order:
movies = [
(1994, 'The Shawshank Redemption', 9.2),
(1999, 'Fight Club', 8.8),
(1994, 'Pulp Fiction', 8.9),
(1972, 'The Godfather', 9.2),
(2008, 'The Dark Knight', 9.0)
]
movies = sorted(movies, key=lambda movie: -movie[2])
for movie in movies:
print(movie)
And voilà!
(1994, 'The Shawshank Redemption', 9.2)
(1972, 'The Godfather', 9.2)
(2008, 'The Dark Knight', 9.0)
(1994, 'Pulp Fiction', 8.9)
(1999, 'Fight Club', 8.8)