Original

We would like to select **k** items from a large data stream of unknown size **n**. All the items in the stream should be equally likely to be chosen. However, there is usually no sufficient memory to save the data stream and that's where **Reservoir Sampling** comes into play.

Assume that we have a data stream of n items [i1, i2, ... in ].

When **k = 1**， the size of the reservoir **x** is one. We have to iterate through the data stream to consider each item until all the data are exhausted. The probability of each item being selected should be **1 / n**.

- When the 1st item i1 is considered, let x = i1. So the probability of i1 being selected Pi1 = 1.
- When the 2rd item i2 is considered, we replace x with i2 with probability 1/2. Therefore, Pi1 = 1 × (1 - 1/2) = 1/2, Pi2 = 1/2.
- When the 3rd item i3 is considered, we replace x with i3 with probability 1/3. Therefore, Pi1 = 1/2 × (1 - 1/3) = 1/3, Pi2 = 1/2 × (1 - 1/3) = 1/3, pi3 = 1/3.
- ... ...
- Finally when the last item in is considered, we replace x with in with probability 1/n. Therefore, Pi1 = Pi2 = .... = Pin = 1/n.

When **k > 1**, the probability of each item being selected should be **k / n**.

- For the first k items, we add them to the reservoir. So, Pi1 = Pi2 = .... = Pik = 1.
- For the k + 1 th item ik+1, we keep it with probability k / (k + 1) and let it substitute a random item in the current reservoir. For an item among i1 - ik, the probability of it being in the reservoir now is 1 - 1 / k × k / (k + 1) = k / (k + 1).
- For the k + 2 th item ik+2, we keep it with probalibility k / (k + 2). For an item among i1 - ik+1, the probability of it being in the reservoir now is k / (k + 1) × (1 - 1 / k × k / (k + 2) )= k / (k + 2).
- ... ...
- Finally for the n th item, we keep it with probability k / n. Similarly, for an item among i1 - ikn, the probability of it being in the reservoir now is k / n.

```
import random
def reservoirSampling(nums, k):
n = len(nums)
results = nums[:k]
for i in range(k, n):
rand = random.randrange(n)
if rand < k:
results[rand] = nums[i]
return results
```

**References:**

[1] Leetcode 382. Linked List Random Node

[2] Illustration

## Comments