Python enumerate - How does it actually work?
What does enumerate do in Python?
Let's motivate the answer with a very simple task:
Given a list, print the elements of the list along with their position in the list (starting from 1).
The most obvious way to do this is by looping while maintaining an index variable i:
cities = [
    'Tokyo',
    'New York',
    'London',
    'Los Angeles',
]
if __name__ == '__main__':
    i = 0
    while i < len(cities):
        print(i+1, cities[i])
        i += 1
1 Tokyo
2 New York
3 London
4 Los Angeles
That looks ugly, right? We can do it in a more Pythonic way by iterating over the list directly and updating i along the way, starting from 1:
cities = [
    'Tokyo',
    'New York',
    'London',
    'Los Angeles',
]
if __name__ == '__main__':
    i = 1
    for city in cities:
        print(i, city)
        i += 1
That looks a little less like C++.
But we can do even better.
The enumerate built-in function - simple iteration with a counter
Instead of iterating over the original sequence, enumerate lets us iterate over a sequence of (count, element) tuples.
cities = [
    'Tokyo',
    'New York',
    'London',
    'Los Angeles',
]
if __name__ == '__main__':
    for i, city in enumerate(cities, start=1):
        print(i, city)
By default the counter starts at 0. In our example we wanted it to start from 1, so we passed 1 as the start argument to enumerate.
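To see the effect of start without writing a loop, we can materialize the pairs with list(). This is only a quick sketch for experimenting in a REPL; as explained further down, enumerate itself doesn't build a list.

```python
cities = ['Tokyo', 'New York', 'London', 'Los Angeles']

# Default: the counter starts at 0
default_pairs = list(enumerate(cities))

# start=1 shifts every counter by one; the elements themselves are unchanged
one_based_pairs = list(enumerate(cities, start=1))

# start may also be passed positionally, and any integer is accepted
from_ten = list(enumerate('abc', 10))

print(default_pairs)    # [(0, 'Tokyo'), (1, 'New York'), (2, 'London'), (3, 'Los Angeles')]
print(one_based_pairs)  # [(1, 'Tokyo'), (2, 'New York'), (3, 'London'), (4, 'Los Angeles')]
print(from_ten)         # [(10, 'a'), (11, 'b'), (12, 'c')]
```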
So, intuitively, instead of iterating over a list:
[
'Tokyo',
'New York',
'London',
'Los Angeles',
]
enumerate enables us to iterate through the list:
[
(0, 'Tokyo'),
(1, 'New York'),
(2, 'London'),
(3, 'Los Angeles'),
]
By taking advantage of tuple unpacking, we get the counter/index variable for free in our loops:
for i, city in enumerate(cities):
    ...
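The for i, city in ... form is ordinary tuple unpacking. The sketch below spells it out: each item enumerate produces is a 2-tuple, which we unpack by hand. The labels list is only there for illustration.

```python
cities = ['Tokyo', 'New York', 'London', 'Los Angeles']

labels = []
for pair in enumerate(cities, start=1):
    # Each item is a (counter, element) tuple...
    assert isinstance(pair, tuple) and len(pair) == 2
    # ...which "for i, city in ..." would unpack for us automatically
    i, city = pair
    labels.append(f'{i}. {city}')

print(labels)  # ['1. Tokyo', '2. New York', '3. London', '4. Los Angeles']
```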
How does enumerate actually work?
Our explanation above is not entirely correct, though.
To make things simple and clear, we said that instead of iterating through the original list, enumerate lets us iterate through a list of (counter, element) tuples.
Technically this is not true. enumerate doesn't build any new list internally. Additionally, it works on any iterable, not just lists: a dict, set or tuple works just as well.
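A short sketch of enumerate over a few non-list iterables. Two things are worth noticing: iterating a dict yields its keys (in insertion order, guaranteed since Python 3.7), so enumerate numbers the keys, and .items() is needed to get key/value pairs; and because enumerate is lazy, it can even wrap an infinite iterator such as itertools.count, as long as we only consume a finite prefix (here with itertools.islice). With a set, the numbering simply follows the set's arbitrary iteration order. The capitals dict is just sample data.

```python
import itertools

capitals = {'Japan': 'Tokyo', 'UK': 'London'}

# A tuple works exactly like a list
from_tuple = list(enumerate(('Tokyo', 'London')))

# Iterating a dict yields its keys, so the keys get numbered
from_dict_keys = list(enumerate(capitals))

# .items() yields (key, value) pairs; note the nested unpacking in the loop target
from_dict_items = [(i, country, city)
                   for i, (country, city) in enumerate(capitals.items())]

# Laziness: no list is built, so an infinite iterator is fine
# as long as we only take a finite number of items
first_three = list(itertools.islice(enumerate(itertools.count(100)), 3))

print(from_tuple)       # [(0, 'Tokyo'), (1, 'London')]
print(from_dict_keys)   # [(0, 'Japan'), (1, 'UK')]
print(from_dict_items)  # [(0, 'Japan', 'Tokyo'), (1, 'UK', 'London')]
print(first_three)      # [(0, 100), (1, 101), (2, 102)]
```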
In fact, calling enumerate() on a container or sequence creates a new iterator object. But what's an iterator object? 🤔
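We can poke at that iterator object directly. The sketch below uses iter() and next(), both explained in the next section. The key observations: enumerate returns an object of its own type (not a list), that object is its own iterator, and it can only be consumed once.

```python
cities = ['Tokyo', 'New York', 'London', 'Los Angeles']

e = enumerate(cities)

type_name = type(e).__name__    # 'enumerate': a dedicated iterator type, not a list
is_own_iterator = iter(e) is e  # True: iter() on an iterator returns the same object

first = next(e)     # (0, 'Tokyo')
rest = list(e)      # [(1, 'New York'), (2, 'London'), (3, 'Los Angeles')]
leftover = list(e)  # []: the iterator is exhausted after one pass

print(type_name, is_own_iterator, first, rest, leftover)
```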
The shortest intro to Python iterators
An iterator is an object that exposes two special methods: __iter__() and __next__().
In the simplest (and most common) cases, __iter__() just returns the iterator object itself. It is also invoked implicitly at the start of a for loop. For example, when we say:
for city in cities:
    ...
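the __iter__() method of the cities list is invoked to start the iteration. Since a list is an iterable rather than an iterator, its __iter__() doesn't return the list itself but a fresh list iterator, and that iterator is what the loop actually steps through. To explicitly get an iterator for a container, we use the built-in function iter().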
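The difference between an iterable (like a list) and an iterator is easy to check interactively. In the sketch below, iter(cities) produces a separate iterator object, while calling iter() on that iterator hands back the very same object.

```python
cities = ['Tokyo', 'New York', 'London', 'Los Angeles']

it = iter(cities)  # equivalent to cities.__iter__()

list_is_iterator = it is cities              # False: the list is only an iterable
list_has_next = hasattr(cities, '__next__')  # False: lists have no __next__()
it_has_next = hasattr(it, '__next__')        # True: the iterator does
same_object = iter(it) is it                 # True: an iterator's __iter__() returns itself

# Each iter() call on the list gives a new, independent iterator
a, b = iter(cities), iter(cities)
independent = a is not b                     # True

print(list_is_iterator, list_has_next, it_has_next, same_object, independent)
```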
The __next__() method is more interesting. It advances the iterator to the next iteration step. When iterating over a list, __next__() gives us the next element of the list, and once there are no elements left, it raises StopIteration. A for loop calls __next__() behind the scenes before each iteration and stops as soon as it sees StopIteration. To invoke __next__() explicitly, we use the built-in function next() on an iterator.
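Here is next() driving a list iterator by hand, including what happens once it runs out. The optional second argument to next() is a default that is returned instead of raising StopIteration.

```python
it = iter(['Tokyo', 'New York'])

first = next(it)   # 'Tokyo' (same as it.__next__())
second = next(it)  # 'New York'

# The iterator is now exhausted, so __next__() raises StopIteration
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True

# With a default, next() returns it instead of raising
fallback = next(it, 'no more cities')

print(first, second, exhausted, fallback)
```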
Knowing this, we can play around and rewrite any for loop in a more elaborate (and uglier) way, using iterators explicitly. Understanding how this works will be crucial for the next step, where we build enumerate from scratch. So:
for city in cities:
    print(city)
becomes:
iterator = iter(cities)
while True:
    try:
        city = next(iterator)
    except StopIteration:
        break
    print(city)
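The same desugaring works for any iterator, including the one enumerate returns. The sketch below replays the 1-based city listing from the beginning of the article with explicit iter() and next() calls; the lines list only exists so the output can be checked.

```python
cities = ['Tokyo', 'New York', 'London', 'Los Angeles']

lines = []
# iter() on an enumerate object returns the object itself
iterator = iter(enumerate(cities, start=1))
while True:
    try:
        # Each step yields a (counter, element) tuple, unpacked as in a for loop
        i, city = next(iterator)
    except StopIteration:
        break
    lines.append(f'{i} {city}')

print('\n'.join(lines))
```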
Even with this limited knowledge of iterators, we're dangerous enough to build our own enumerate.
enumerate from scratch
Let's call it EnumerateFromScratch. Here's what it looks like:
class EnumerateFromScratch:
    def __init__(self, container, start=0):
        self.container_iterator = iter(container)
        self.counter = start

    def __iter__(self):
        return self

    def __next__(self):
        return_val = (self.counter, next(self.container_iterator))
        self.counter += 1
        return return_val
Let's test it:
cities = [
    'Tokyo',
    'New York',
    'London',
    'Los Angeles',
]
if __name__ == '__main__':
    for i, city in EnumerateFromScratch(cities):
        print(i, city)
0 Tokyo
1 New York
2 London
3 Los Angeles
🎉
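A quick way to gain more confidence in EnumerateFromScratch is to compare it with the built-in enumerate on a few different iterables and start values. The sketch below repeats the class so it runs on its own; the sample inputs are arbitrary. Since a generator can only be consumed once, a fresh one is created for each side of that comparison.

```python
class EnumerateFromScratch:
    def __init__(self, container, start=0):
        self.container_iterator = iter(container)
        self.counter = start

    def __iter__(self):
        return self

    def __next__(self):
        return_val = (self.counter, next(self.container_iterator))
        self.counter += 1
        return return_val


samples = [
    ['Tokyo', 'New York', 'London', 'Los Angeles'],  # list
    ('Tokyo', 'London'),                             # tuple
    'abc',                                           # string
    {'Japan': 'Tokyo', 'UK': 'London'},              # dict: keys are enumerated
    [],                                              # empty input
]

# Each list(...) call gets a fresh iterator over data, so reuse is safe here
all_match = all(
    list(EnumerateFromScratch(data, start)) == list(enumerate(data, start))
    for data in samples
    for start in (0, 1, -5)
)

# Generators are single-use, so build a new one for each side of the comparison
gen_match = (list(EnumerateFromScratch((x * x for x in range(4)), start=1))
             == list(enumerate((x * x for x in range(4)), start=1)))

print(all_match, gen_match)  # True True
```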
So how does EnumerateFromScratch work?
Let's start with the __next__() method. Remember that simply iterating through a container (e.g. a list) yields one element per iteration. Our goal here is to have a counter that increments with each iteration step. That is exactly what we do in __next__():
- we prepare the return tuple for this step of iteration:
    return_val = (self.counter, next(self.container_iterator))
- we increment the counter
- we return the iteration tuple
If you know how classes work, it should be obvious how the counter works: we initialize it when constructing the iterator and increment it whenever we hop to the next step.
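Stepping through EnumerateFromScratch by hand with next() makes the order of those steps visible. One detail stands out: when the container iterator is exhausted, next(self.container_iterator) raises StopIteration while the tuple is still being built, so the increment line is never reached and the counter stays put. The class is repeated so the snippet is self-contained.

```python
class EnumerateFromScratch:
    def __init__(self, container, start=0):
        self.container_iterator = iter(container)
        self.counter = start

    def __iter__(self):
        return self

    def __next__(self):
        return_val = (self.counter, next(self.container_iterator))
        self.counter += 1
        return return_val


e = EnumerateFromScratch(['Tokyo', 'London'], start=1)

step1 = next(e)  # (1, 'Tokyo'); counter is now 2
step2 = next(e)  # (2, 'London'); counter is now 3

try:
    next(e)      # the container iterator is exhausted
    stopped = False
except StopIteration:
    stopped = True

counter_after = e.counter  # 3: StopIteration was raised before the increment

print(step1, step2, stopped, counter_after)
```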
But how do we know what the next element of the original container is? Note that in __init__() we get a fresh iterator for the container. We never save a reference to the container itself; the iterator is enough. Then, in each call to EnumerateFromScratch.__next__(), we explicitly advance the container iterator by calling:
next(self.container_iterator)
When the container runs out of elements, this call raises StopIteration, which propagates out of our __next__() and ends the iteration.
That's all there is to it.
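For comparison, the official Python documentation describes enumerate as roughly equivalent to a generator function. A generator keeps its local variables alive between yields, so the counter and the container iterator that EnumerateFromScratch stores as attributes become plain local variables, and the for loop inside takes care of iter(), next() and StopIteration. The name enumerate_generator is our own, chosen to avoid shadowing the built-in.

```python
def enumerate_generator(iterable, start=0):
    """Yield (counter, element) pairs, like the built-in enumerate."""
    n = start
    for elem in iterable:  # the for loop calls iter() and next() for us
        yield n, elem
        n += 1


cities = ['Tokyo', 'New York', 'London', 'Los Angeles']

pairs = list(enumerate_generator(cities, start=1))
print(pairs)  # [(1, 'Tokyo'), (2, 'New York'), (3, 'London'), (4, 'Los Angeles')]
print(pairs == list(enumerate(cities, start=1)))  # True
```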
Seeing a custom iterator class for the first time can be confusing. Luckily, it takes just a few simple exercises to really understand how everything works. Starting with our EnumerateFromScratch class, try this:
- change EnumerateFromScratch so that it iterates over every element, but increments the counter by 2
- change EnumerateFromScratch so that it works the same as a simple iteration of the original sequence (i.e. no counter is added)
- change EnumerateFromScratch so that it skips elements at odd indices (1, 3, 5, ...)