Leetcode Two Sum - The Optimal Solution in Python

Two Sum is a rather famous coding interview problem, often asked in Python coding interviews. Being problem #1 on LeetCode, you may also know it as the LeetCode Two Sum.

The problem is stated as follows:

Given an array of integers A = [a1, a2, ...] and an integer S , write a function two_sum(A,S) that returns all pairs of indices (i, j), i≠j such that A[i] + A[j] == S .
If there are no such pairs, return an empty array [] .

In other words, we want to find all pairs of integers from A that add up to S and return their positions in the array.

Here are a few examples:

two_sum(A=[5, 7, 2], S=9) -> [(1, 2)] # A[1] + A[2] == 9
two_sum(A=[3, 5, 7, 9], S=12) -> [(0, 3), (1, 2)]
two_sum(A=[3, 23, 19], S=21) -> [] # There is no such pair
two_sum(A=[1], S=10) -> [] # Only one integer, no valid pairs

Seems easy, so what's the catch? We'll there's a better solution than the naive one. Let's go step by step.

Naive Solution - Two Nested Loops, O(n²)

Let's consider the following solution: for each distinct pair of indices (i,j) check if A[i] + A[j] == S. If true, add the pair to the results.

def two_sum(A, S):
	pairs = []
	for i in range(len(A)):
		for j in range(i+1, len(A)):
			if A[i] + A[j] == S:
				pairs.append((i, j))

	return pairs

It wouldn't be such a well known problem if this was the desired solution, right?

The time complexity of this solution is O(n²). In fact, we'll have exactly $$\sum_{n=1}^{len(A)-1}n=\frac{\left(len(A)-1\right)\cdot\left(1+len(A)-1\right)}{2}=\frac{\left(len(A)-1\right)\cdot len(A)}{2}$$ iterations of the inner loop. Here $len(A)$ denotes the number of elements in our input array A .

This means that for an input array with 20,000 integers we'll check   1.999.900.000 pairs. This takes ~20 seconds on an intel i7 machine.

Using a Hash Map - O(n)

Let's analyze our solution closely.

We have 2 nested loops. The outer loop considers each integer in the array, A[i]. The inner loop then searches for the complement, A[j] such that A[i] + A[j] == S.

Do you see the catch? For each A[i] we know exactly what number we're looking for - it's the complement C , such that C == S - A[i].

Here's the trick: If we already knew all the positions of the complement in A we could just add the pairs of indices to the result and move to the next i.

So, let's do exactly that with a hash map, which is nothing other than dict in Python. We'll first construct a dict which for each integer x from A holds an array of indices where x occurs.

For example, for the following array A:

[2, 6, 10, 3, 7, 5, 9, 6, 6, 1]

We want to build the following dictionary:

{
  1: [9],
  2: [0],
  3: [3],
  5: [5],
  6: [1, 7, 8 ],
  7: [4],
  9: [6],
  10: [2]
}

Once we have it, it takes O(1) time to find out that 6 occurs in A at indices [1, 7, 8], 9 occurs at index [6], and so on.

The following function builds this dictionary for us:

def build_index_dict(A):
	indices = {}

	for i in range(len(A)):
		a = A[i]
		if a not in indices:
			indices[a] = []
		indices[a].append(i)
        
	return indices

Now let's use this lookup dict to speed up our initial solution:

def build_index_dict(A):
	indices = {}

	for i in range(len(A)):
		a = A[i]
		if a not in indices:
			indices[a] = []
		indices[a].append(i)
        
	return indices
    

def two_sum(A, S):
	index_dict = build_index_dict(A)

	pairs = []
	for i in range(len(A)):
		C = S - A[i] # C <- complement of A[i] to S
		if C in index_dict:
			for j in index_dict[C]: # run through all indices of C
				if j > i: # make sure we're not repeating any pairs
					pairs.append((i, j))

	return pairs

Now, you may argue that this is still an O(n²) solution. You'd be right.

To see why this is still O(n²), consider the following input ([1, 1, 1, 1, 1, 1], 2) . However, it's not hard to see that our hash map solution will be at least as good as the naive solution, for any input.

Now consider a slightly altered problem formulation - assume that all integers in A[i] are unique. In that case, we can change the solution as follows:

  • Indices dict won't need to hold an array of indices. For each integer there is only one position in A where it occurs.
  • For each A[i] the complement either exists at a single position, or it doesn't exist at all. The inner loop is not really a loop anymore.
def build_index_dict(A):
	indices = {}

	for i in range(len(A)):
		a = A[i]
		indices[a] = i

	return indices


def two_sum(A, S):
	index_dict = build_index_dict(A)

	pairs = []
	for i in range(len(A)):
		C = S - A[i]
		if C in index_dict:
			j = index_dict[C]
			if j > i:
				pairs.append((i, j))

	return pairs

This makes our solution run in O(n) time: we build index_dict in O(n) and find all pairs of indices in O(n).