Two Sum is a rather famous coding interview problem, often asked in Python coding interviews. Being the problem #1 on LeetCode, you may also know it as the LeetCode Two Sum.

The problem is stated as follows:

Given an array of integers A = [a1, a2, ...] and an integer S , write a function two_sum(A,S) that returns all pairs of indices (i, j), i≠j such that A[i] + A[j] == S .
If there are no such pairs, return an empty array [] .

In other words, we want to find all pairs of integers from A that add up to S and return their positions in the array.

Here are a few examples:

two_sum(A=[5, 7, 2], S=9) -> [(1, 2)] # A + A == 9
two_sum(A=[3, 5, 7, 9], S=12) -> [(0, 3), (1, 2)]
two_sum(A=[3, 23, 19], S=21) -> [] # There is no such pair
two_sum(A=, S=10) -> [] # Only one integer, no valid pairs

Seems easy, so what's the catch? We'll there's a better solution than the naive one. Let's go step by step.

## Naive Solution - Two Nested Loops, O(n²)

Let's consider the following solution: for each distinct pair of indices (i,j) check if A[i] + A[j] == S. If true, add the pair to the results.

def two_sum(A, S):
pairs = []
for i in range(len(A)):
for j in range(i+1, len(A)):
if A[i] + A[j] == S:
pairs.append((i, j))

return pairs



It wouldn't be such a well known problem if this was the desired solution, right?

The time complexity of this solution is O(n²). In fact, we'll have exactly $$\sum_{n=1}^{len(A)-1}n=\frac{\left(len(A)-1\right)\cdot\left(1+len(A)-1\right)}{2}=\frac{\left(len(A)-1\right)\cdot len(A)}{2}$$ iterations of the inner loop. Here $len(A)$ denotes the number of elements in our input array A .

This means that for an input array with 20,000 integers we'll check   1.999.900.000 pairs. This takes ~20 seconds on an intel i7 machine.

## Using a Hash Map - O(n)

Let's analyze our solution closely.

We have 2 nested loops. The outer loop considers each integer in the array, A[i]. The inner loop then searches for the complement, A[j] such that A[i] + A[j] == S.

Do you see the catch? For each A[i] we know exactly what number we're looking for - it's the complement C , such that C == S - A[i].

Here's the trick: If we already knew all the positions of the complement in A we could just add the pairs of indices to the result and move to the next i.

So, let's do exactly that with a hash map, which is nothing other than dict in Python. We'll first construct a dict which for each integer x from A holds an array of indices where x occurs.

For example, for the following array A:

[2, 6, 10, 3, 7, 5, 9, 6, 6, 1]

We want to build the following dictionary:

{
1: ,
2: ,
3: ,
5: ,
6: [1, 7, 8 ],
7: ,
9: ,
10: 
}


Once we have it, it takes O(1) time to find out that 6 occurs in A at indices [1, 7, 8], 9 occurs at index , and so on.

The following function builds this dictionary for us:

def build_index_dict(A):
indices = {}

for i in range(len(A)):
a = A[i]
if a not in indices:
indices[a] = []
indices[a].append(i)

return indices


Now let's use this lookup dict to speed up our initial solution:

def build_index_dict(A):
indices = {}

for i in range(len(A)):
a = A[i]
if a not in indices:
indices[a] = []
indices[a].append(i)

return indices

def two_sum(A, S):
index_dict = build_index_dict(A)

pairs = []
for i in range(len(A)):
C = S - A[i] # C <- complement of A[i] to S
if C in index_dict:
for j in index_dict[C]: # run through all indices of C
if j > i: # make sure we're not repeating any pairs
pairs.append((i, j))

return pairs


Now, you may argue that this is still an O(n²) solution. You'd be right.

To see why this is still O(n²), consider the following input ([1, 1, 1, 1, 1, 1], 2) . However, it's not hard to see that our hash map solution will be at least as good as the naive solution, for any input.

Now consider a slightly altered problem formulation - assume that all integers in A[i] are unique. In that case, we can change the solution as follows:

• Indices dict won't need to hold an array of indices. For each integer there is only one position in A where it occurs.
• For each A[i] the complement either exists at a single position, or it doesn't exist at all. The inner loop is not really a loop anymore.
def build_index_dict(A):
indices = {}

for i in range(len(A)):
a = A[i]
indices[a] = i

return indices

def two_sum(A, S):
index_dict = build_index_dict(A)

pairs = []
for i in range(len(A)):
C = S - A[i]
if C in index_dict:
j = index_dict[C]
if j > i:
pairs.append((i, j))

return pairs


This makes our solution run in O(n) time: we build index_dict in O(n) and find all pairs of indices in O(n).