Python sort lambda - How to sort with lambda key function in Python

Sorting in Python is simple. You'll pass a list, dict, set or any other kind of iterable object to the built-in function sorted. It returns a sorted list of elements of that object.

Here's an example:

A = [4, 2, 5, 3, 1] # An unsorted array of integers

B = sorted(A) # sorted(A) returns a sorted copy of A

print(B) 
[1, 2, 3, 4, 5]

Sorting complex objects using a lambda key function

Let's say we have a list of city names along with their country, population and area. We'd like to sort the list in various ways:

  • Sort by population
  • Sort alphabetically by country. If there are cities from the same country order them alphabetically by name
  • Sort by population density

For the sake of example let's represent cities as simple Python dicts:

cities = [
	{
		"name": "New York City",
		"country": "United States",
		"population": 20.14,
		"area": 1223.59
	},
	{
		"name": "Tokyo",
		"country": "Japan",
		"population": 37.47,
		"area": 2194.07,
	},
	{
		"name": "Los Angeles",
		"country": "United States",
		"population": 13.2,
		"area": 1299.01,
	},
	{
		"name": "Madrid",
		"country": "Spain",
		"population": 6.79,
		"area": 604.31,
	},
	{
		"name": "Osaka",
		"country": "Japan",
		"population": 19.3,
		"area": 5107.0,
	},
	{
		"name": "London",
		"country": "United Kingdom",
		"population": 14.26,
		"area": 8382.0,
	}
]

What do you think happens if we just call sorted(dicts)? How will the sorted function know which field(s) of our cities to use for sorting?

We'll need to explicitly tell the sorted method how we want to sort our objects. For that we'll use the key parameter.

Here's how we'd do this for our examples:

  1. Sort by population

We'll tell sorted that for each city, we want it to take population as the sorting key. This means that each city will be compared against other cities by looking only at their population field.

# Sort by population
cities = sorted(cities, key=lambda city: city['population'])

A very common pattern is to reverse the sorting order by using negative value of the sorting key (note the minus sign in front of city['population'])

# Sort by population DESCENDING
cities = sorted(cities, key=lambda city: -city['population'])
 

2. Sort alphabetically by country, then by name

Here, for each city we'll provide a tuple (country, name) to the sorting function. As tuples are sorted by comparing them field by field, this will effectively sort our cities by country, then by name.

# Sort by country, then by name
cities = sorted(cities, key=lambda city: (city['country'], city['name']))

3. Sort by population density

A nice trick with key function is that we can provide an arbitrary function that accepts an object to be sorted, for example we can calculate population density on the fly.

# Sort by population density
cities = sorted(cities, key=lambda city: (city['population'] / city['area']))

How it actually works?

In order to understand how sorting by key works in the background, let's first see how sorted works when sorting lists of lists (or tuples).

Let's say we have a list of movies with their ratings and year of release:

movies = [
	(1994, 'The Shawshank Redemption', 9.2),
	(1999, 'Fight Club', 8.8),
	(1994, 'Pulp Fiction', 8.9),
	(1972, 'The Godfather', 9.2),
	(2008, 'The Dark Knight', 9.0)
]

Calling sorted(movies) returns the following list:

(1972, 'The Godfather', 9.2)
(1994, 'Pulp Fiction', 8.9)
(1994, 'The Shawshank Redemption', 9.2)
(1999, 'Fight Club', 8.8)
(2008, 'The Dark Knight', 9.0)

Can you guess what happened?

The tuples get sorted by first comparing their elements at position 0, in this case the year. In case of a tie, the tuples are then sorted by comparing the second element, then the third, and so on.

In our movie example, this means that the movies get sorted by comparing:

  1. movie[0] - year or release, ascending
  2. movie[1] - title, lexicographically (case sensitive), ascending
  3. movie[2] - rating, ascending

Note that sorted always sorts in ascending order unless we modify its behaviour using key or reverse parameters.

Understanding how lists of tuples or lists of lists are sorted is crucial to understanding more complex sorting patterns - those using the key parameter.

How it works - Sorted key parameter

In our previous example, what if we wanted to sort the movies by rating first, then by year?

Python's sorted function offers a simple solution - use the key parameter. The official Python docs explain this in simple terms:

Both list.sort() and sorted() have a key parameter to specify a function (or other callable) to be called on each list element prior to making comparisons.

The value of the key parameter should be a function (or other callable) that takes a single argument and returns a key to use for sorting purposes.

Let's use this knowledge to sort our movies by rating, in descending order.

movies = sorted(movies, key=lambda movie: movie[2], reverse=True)

This will get us the following ordering of movies (note the descending ordering of ratings)

(1994, 'The Shawshank Redemption', 9.2)
(1972, 'The Godfather', 9.2)
(2008, 'The Dark Knight', 9.0)
(1994, 'Pulp Fiction', 8.9)
(1999, 'Fight Club', 8.8)

The key thing to unpack here is key=lambda movie: movie[2] - what exactly is going on here?

Lambda functions - what are they?

First we need to understand what a lambda is. A really simple concept with a fancy name, lambda functions or simply lambdas are a syntactic sugar in Python.

The lambda keyword allows us to declare small, anonymous, single-line functions, taking one or more parameters and returning a value of an expression.

So when we say:

f = lambda x, y: x + y

It is equivalent to:

def f(x, y):
	return x + y

Lambdas really shine when we need to supply a simple function as a parameter to another function, such as supplying a function as a key parameter to sorted.

Back to our sorting example, when we write:

movies = sorted(movies, key=lambda movie: movie[2], reverse=True)

it is just a shorthand for:

def movie_key(movie):
	return movie[2]

movies = sorted(movies, key=movie_key, reverse=True)

Sorting by key under the hood

In order to understand how the key parameter is used in sorted, let's implement a sorting algorithm and arm it with a key parameter.

For the sake of simplicity we'll implement Bubble sort, but please note that the builtin sorted method definitely uses more efficient sorting algorithms.  

def bubble_sort(array: list):
	new_array = [e for e in array] # Create a copy of array to work on

	while True:
		swapped = False
		for i in range(0, len(new_array) - 1):
			if new_array[i+1] < new_array[i]:
				new_array[i], new_array[i+1] = new_array[i+1], new_array[i] # Swap elements that are out of order
				swapped = True

		if not swapped:
			break

	return new_array

The most important part is:

if new_array[i+1] < new_array[i]:
	new_array[i], new_array[i+1] = new_array[i+1], new_array[i]

In case a smaller element is found after a larger element, they get swapped. In order to facilitate sorting by arbitrary key function we'll modify our algorithm as follows:

def bubble_sort(array: list, key):
	new_array = [e for e in array] # Create a copy of array to work on

	while True:
		swapped = False
		for i in range(0, len(new_array) - 1):
			if key(new_array[i+1]) < key(new_array[i]):
				new_array[i], new_array[i+1] = new_array[i+1], new_array[i] # Swap elements that are out of order
				swapped = True

		if not swapped:
			break

	return new_array

The important line here is:

if key(new_array[i+1]) < key(new_array[i]):
	new_array[i], new_array[i+1] = new_array[i+1], new_array[i]

Note that we're now comparing key(a) vs key(b) instead of a vs b. This allows us to impose arbitrary sorting rules by using a simple function returning a sorting key. You can imagine that there's a very similar line somewhere inside the code for Python's sorted function.

Let's test our bubble_sort function - we'll sort our movies by rating in descending order:

movies = [
	(1994, 'The Shawshank Redemption', 9.2),
	(1999, 'Fight Club', 8.8),
	(1994, 'Pulp Fiction', 8.9),
	(1972, 'The Godfather', 9.2),
	(2008, 'The Dark Knight', 9.0)
]

movies = sorted(movies, key=lambda movie: -movie[2])

for movie in movies:
	print(movie)

And voilà!

(1994, 'The Shawshank Redemption', 9.2)
(1972, 'The Godfather', 9.2)
(2008, 'The Dark Knight', 9.0)
(1994, 'Pulp Fiction', 8.9)
(1999, 'Fight Club', 8.8)