# Leetcode Two Sum - The Optimal Solution in Python

*Two Sum* is a rather famous coding interview problem, often asked in Python coding interviews. Being problem #1 on LeetCode, you may also know it as the *LeetCode Two Sum.*

The problem is stated as follows:

Given an array of integers`A = [a1, a2, ...]`

and an integer`S`

, write a function`two_sum(A,S)`

that returns all pairs of indices`(i, j), i≠j`

such that`A[i] + A[j] == S`

.

If there are no such pairs, return an empty array`[]`

.

In other words, we want to find all pairs of integers from `A`

that add up to `S`

and return their positions in the array.

Here are a few examples:

```
two_sum(A=[5, 7, 2], S=9) -> [(1, 2)] # A[1] + A[2] == 9
two_sum(A=[3, 5, 7, 9], S=12) -> [(0, 3), (1, 2)]
two_sum(A=[3, 23, 19], S=21) -> [] # There is no such pair
two_sum(A=[1], S=10) -> [] # Only one integer, no valid pairs
```

Seems easy, so what's the catch? We'll there's a better solution than the naive one. Let's go step by step.

## Naive Solution - Two Nested Loops, O(n²)

Let's consider the following solution: for each distinct pair of indices `(i,j)`

check if `A[i] + A[j] == S`

. If true, add the pair to the results.

```
def two_sum(A, S):
pairs = []
for i in range(len(A)):
for j in range(i+1, len(A)):
if A[i] + A[j] == S:
pairs.append((i, j))
return pairs
```

It wouldn't be such a well known problem if this was the desired solution, right?

The time complexity of this solution is O(n²). In fact, we'll have exactly $$\sum_{n=1}^{len(A)-1}n=\frac{\left(len(A)-1\right)\cdot\left(1+len(A)-1\right)}{2}=\frac{\left(len(A)-1\right)\cdot len(A)}{2}$$ iterations of the inner loop. Here $len(A)$ denotes the number of elements in our input array `A`

.

This means that for an input array with 20,000 integers we'll check 1.999.900.000 pairs. This takes ~20 seconds on an intel i7 machine.

## Using a Hash Map - O(n)

Let's analyze our solution closely.

We have 2 nested loops. The outer loop considers each integer in the array, `A[i]`

. The inner loop then searches for the complement, `A[j]`

such that `A[i] + A[j] == S`

.

Do you see the catch? For each `A[i]`

we know exactly what number we're looking for - it's the complement `C`

, such that `C == S - A[i]`

.

Here's the trick: If we already knew all the positions of the complement in `A`

we could just add the pairs of indices to the result and move to the next `i`

.

So, let's do exactly that with a hash map, which is nothing other than `dict`

in Python. We'll first construct a `dict`

which for each integer `x`

from `A`

holds an array of indices where `x`

occurs.

For example, for the following array `A`

:

`[2, 6, 10, 3, 7, 5, 9, 6, 6, 1]`

We want to build the following dictionary:

```
{
1: [9],
2: [0],
3: [3],
5: [5],
6: [1, 7, 8 ],
7: [4],
9: [6],
10: [2]
}
```

Once we have it, it takes O(1) time to find out that `6 occurs in A at indices [1, 7, 8]`

, `9 occurs at index [6]`

, and so on.

The following function builds this dictionary for us:

```
def build_index_dict(A):
indices = {}
for i in range(len(A)):
a = A[i]
if a not in indices:
indices[a] = []
indices[a].append(i)
return indices
```

Now let's use this lookup dict to speed up our initial solution:

```
def build_index_dict(A):
indices = {}
for i in range(len(A)):
a = A[i]
if a not in indices:
indices[a] = []
indices[a].append(i)
return indices
def two_sum(A, S):
index_dict = build_index_dict(A)
pairs = []
for i in range(len(A)):
C = S - A[i] # C <- complement of A[i] to S
if C in index_dict:
for j in index_dict[C]: # run through all indices of C
if j > i: # make sure we're not repeating any pairs
pairs.append((i, j))
return pairs
```

Now, you may argue that this is still an O(n²) solution. You'd be right.

To see why this is still O(n²), consider the following input `([1, 1, 1, 1, 1, 1], 2)`

. However, it's not hard to see that our hash map solution will be at least as good as the naive solution, for any input.

Now consider a slightly altered problem formulation - *assume that all integers in A[i] are unique*. In that case, we can change the solution as follows:

- Indices dict won't need to hold an array of indices. For each integer there is only one position in
`A`

where it occurs. - For each
`A[i]`

the complement either exists at a single position, or it doesn't exist at all. The inner loop is not really a loop anymore.

```
def build_index_dict(A):
indices = {}
for i in range(len(A)):
a = A[i]
indices[a] = i
return indices
def two_sum(A, S):
index_dict = build_index_dict(A)
pairs = []
for i in range(len(A)):
C = S - A[i]
if C in index_dict:
j = index_dict[C]
if j > i:
pairs.append((i, j))
return pairs
```

This makes our solution run in O(n) time: we build `index_dict`

in O(n) and find all pairs of indices in O(n).