Leetcode Two Sum - The Optimal Solution in Python
Two Sum is a rather famous coding interview problem, often asked in Python coding interviews. Being problem #1 on LeetCode, you may also know it as the LeetCode Two Sum.
The problem is stated as follows:
Given an array of integersA = [a1, a2, ...]
and an integerS
, write a functiontwo_sum(A,S)
that returns all pairs of indices(i, j), i≠j
such thatA[i] + A[j] == S
.
If there are no such pairs, return an empty array[]
.
In other words, we want to find all pairs of integers from A
that add up to S
and return their positions in the array.
Here are a few examples:
two_sum(A=[5, 7, 2], S=9) -> [(1, 2)] # A[1] + A[2] == 9
two_sum(A=[3, 5, 7, 9], S=12) -> [(0, 3), (1, 2)]
two_sum(A=[3, 23, 19], S=21) -> [] # There is no such pair
two_sum(A=[1], S=10) -> [] # Only one integer, no valid pairs
Seems easy, so what's the catch? We'll there's a better solution than the naive one. Let's go step by step.
Naive Solution - Two Nested Loops, O(n²)
Let's consider the following solution: for each distinct pair of indices (i,j)
check if A[i] + A[j] == S
. If true, add the pair to the results.
def two_sum(A, S):
pairs = []
for i in range(len(A)):
for j in range(i+1, len(A)):
if A[i] + A[j] == S:
pairs.append((i, j))
return pairs
It wouldn't be such a well known problem if this was the desired solution, right?
The time complexity of this solution is O(n²). In fact, we'll have exactly $$\sum_{n=1}^{len(A)-1}n=\frac{\left(len(A)-1\right)\cdot\left(1+len(A)-1\right)}{2}=\frac{\left(len(A)-1\right)\cdot len(A)}{2}$$ iterations of the inner loop. Here $len(A)$ denotes the number of elements in our input array A
.
This means that for an input array with 20,000 integers we'll check 1.999.900.000 pairs. This takes ~20 seconds on an intel i7 machine.
Using a Hash Map - O(n)
Let's analyze our solution closely.
We have 2 nested loops. The outer loop considers each integer in the array, A[i]
. The inner loop then searches for the complement, A[j]
such that A[i] + A[j] == S
.
Do you see the catch? For each A[i]
we know exactly what number we're looking for - it's the complement C
, such that C == S - A[i]
.
Here's the trick: If we already knew all the positions of the complement in A
we could just add the pairs of indices to the result and move to the next i
.
So, let's do exactly that with a hash map, which is nothing other than dict
in Python. We'll first construct a dict
which for each integer x
from A
holds an array of indices where x
occurs.
For example, for the following array A
:
[2, 6, 10, 3, 7, 5, 9, 6, 6, 1]
We want to build the following dictionary:
{
1: [9],
2: [0],
3: [3],
5: [5],
6: [1, 7, 8 ],
7: [4],
9: [6],
10: [2]
}
Once we have it, it takes O(1) time to find out that 6 occurs in A at indices [1, 7, 8]
, 9 occurs at index [6]
, and so on.
The following function builds this dictionary for us:
def build_index_dict(A):
indices = {}
for i in range(len(A)):
a = A[i]
if a not in indices:
indices[a] = []
indices[a].append(i)
return indices
Now let's use this lookup dict to speed up our initial solution:
def build_index_dict(A):
indices = {}
for i in range(len(A)):
a = A[i]
if a not in indices:
indices[a] = []
indices[a].append(i)
return indices
def two_sum(A, S):
index_dict = build_index_dict(A)
pairs = []
for i in range(len(A)):
C = S - A[i] # C <- complement of A[i] to S
if C in index_dict:
for j in index_dict[C]: # run through all indices of C
if j > i: # make sure we're not repeating any pairs
pairs.append((i, j))
return pairs
Now, you may argue that this is still an O(n²) solution. You'd be right.
To see why this is still O(n²), consider the following input ([1, 1, 1, 1, 1, 1], 2)
. However, it's not hard to see that our hash map solution will be at least as good as the naive solution, for any input.
Now consider a slightly altered problem formulation - assume that all integers in A[i] are unique. In that case, we can change the solution as follows:
- Indices dict won't need to hold an array of indices. For each integer there is only one position in
A
where it occurs. - For each
A[i]
the complement either exists at a single position, or it doesn't exist at all. The inner loop is not really a loop anymore.
def build_index_dict(A):
indices = {}
for i in range(len(A)):
a = A[i]
indices[a] = i
return indices
def two_sum(A, S):
index_dict = build_index_dict(A)
pairs = []
for i in range(len(A)):
C = S - A[i]
if C in index_dict:
j = index_dict[C]
if j > i:
pairs.append((i, j))
return pairs
This makes our solution run in O(n) time: we build index_dict
in O(n) and find all pairs of indices in O(n).