Find the median of an unsorted stream of integers at any given time

An unsorted integer stream is coming, we need to always keep the median of all the stream elements. The median can be defined as follows.

To find the median of any given array,

  1. Sort the array
  2. Find integer(s) in the middle of the sorted array
  3. The average of middle these integers is median ( If total elements are in odd size then we get only 1 middle integer, otherwise these are 2 in which we have to take an average).

For example integer {10, 5, 8, 2, 20, 13}.

Sorted array is { 2, 5, 8, 10, 13, 20 }. Here the size of the array is even (which is 6), we have to calculate the average of integers in the middle which are {8, 10} is (8+10)/2 which is 9.

Median in stream of integers

As per the input, calculating median in integers won’t solve our problem. Here we are getting a stream of integers and we have to calculate median at any point of time when stream keeps on coming.

This problem can be solved by having two equally sized heaps. One holds the first half of the sorted integers (which is max-heap) whereas the other holds the second half of the sorted integers (which is min-heap). Before we go deep into this problem, I would recommend you to understand the min-heap and max heap data structures.

Let see below pictorial representation of the two heap solution.

Consider we have max-heap with {5} and min-heap with {10} and trying to insert 8 in it.

As 8 is greater than integer 5. So, 8 is going to insert in right side min-heap and the result looks like below.

After insertion of integer “8” in the min-heap the median is “8”. Now we are trying to insert integer 20.

20 is also greater than integer “8” which is on top of the min-heap. It should actually insert in the min-heap, but as max-heap size is lesser here, we have move the minimum integer “8” from the min-heap to max-heap. Now the heaps are as showing below.

This way we have to insert each incoming integer by balancing both the heaps.

Let’s see the code for this two heaps solution

float findMedian(MaxHeap *heap1, MinHeap *heap2, int val)
{
    int size1 = heap1->getSize();
    int size2 = heap2->getSize();
    if (size1 == size2)
    {
        if (size1 == 0)
        {
            heap1->insert(val);
        }
        else
        {
            if (val < heap1->getMax())
            {
                heap1->insert(val);
            }
            else
            {
                heap2->insert(val);
            }
        }
    }
    else
    {
        if (val < heap1->getMax())
        {
            if (size1 > size2)
            {
                heap2->insert(heap1->getMax());
                heap1->removeTop();
                heap1->insert(val);
            }
            else
            {
                heap1->insert(val);
            }
        }
        else
        {
            if (size2 > size1)
            {
                heap1->insert(heap2->getMin());
                heap2->removeTop();
                heap2->insert(val);
            }
            else
            {
                heap2->insert(val);
            }
        }
    }
    if (heap1->getSize() == heap2->getSize())
    {
        return (heap1->getMax() + heap2->getMin()) / 2.0f;
    }
    else
    {
        if (heap1->getSize() > heap2->getSize())
        {
            return (float)heap1->getMax();
        }
        else
        {
            return (float)heap2->getMin();
        }
    }
}

Complexity to get median by inserting into heap is: O(logN).

For any group of “N” size stream the complexity will be O(NlogN).

Complete code with main is as follows.

// Output
Median: 10
Median: 7.5
Median: 8
Median: 6.5
Median: 8
Median: 9
Median: 10

Runtime complexity here is O(NlogN) and Space is O(N).

This is the best possible solution that we have to get the median of integer stream at any given point of time. If you know any better solution than this, please let our readers know by commenting here below. Happy Learning!.