Boyer–Moore majority vote algorithm

Links: Original Paper published in 1991 (pg. 105-117)

I was solving 169. Majority Element and was struggling to find a one-pass algorithm with O(1) space (these are called streaming algorithms). The best I could come up with was O(n^2) time + O(1) space, or O(n log n) time + O(n) space. Then I saw this mind-blowing one-pass approach called MJRTY – A Fast Majority Vote Algorithm by Boyer and Moore.

The approach becomes evident with the image below:

https://www.cs.utexas.edu/~moore/best-ideas/mjrty/example.html#step04

We start by counting the first element. If a different element appears, instead of starting a new count for it, we decrement the current count. When the count reaches 0, we start counting whatever element we encounter next. At the end of the iteration, the element we are left with is the majority element.

This only works when an element appears more than n/2 times in the array. The reason is intuitive: each decrement cancels one occurrence of the current candidate against one different element, and an element that makes up more than half of the array cannot be cancelled out completely.

Implementation

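int majorityElement(vector<int>& nums) {
    int x = 0;    // current candidate (initialized so an empty input is well-defined)
    int cnt = 0;  // votes for the candidate
    for (int i = 0; i < (int)nums.size(); i++) {
        if (cnt == 0) {
            // no candidate left: adopt the current element
            x = nums[i];
            cnt++;
        } else {
            if (nums[i] == x) cnt++;
            else cnt--;
        }
    }
    return x;
}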

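or, as a variant that seeds the candidate with the first element (it assumes a non-empty array; both versions are O(n) time and O(1) space, so any speedup is a constant factor):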

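int majorityElement(vector<int>& nums) {
    int x = nums[0];  // candidate, seeded with the first element
    int cnt = 1;
    for (int i = 1; i < (int)nums.size(); i++) {
        if (nums[i] == x) {
            cnt++;
        } else {
            cnt--;
        }
        // candidate fully cancelled: the current element takes over
        if (cnt == 0) {
            x = nums[i];
            cnt = 1;
        }
    }
    return x;
}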
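
Since the vote only yields a meaningful answer when a majority element actually exists, a second pass can confirm the candidate when that is not guaranteed by the problem. The following is a minimal sketch under that assumption; the function name verifiedMajority and its out-parameter are hypothetical and not part of the original notes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original notes): runs the vote, then
// verifies the candidate. Returns true and writes the element to `out`
// only if it really occurs more than n/2 times.
bool verifiedMajority(const std::vector<int>& nums, int& out) {
    int candidate = 0;
    int count = 0;
    // Pass 1: Boyer-Moore vote to pick a candidate.
    for (int v : nums) {
        if (count == 0) {
            candidate = v;
            count = 1;
        } else if (v == candidate) {
            ++count;
        } else {
            --count;
        }
    }
    // Pass 2: count how often the candidate really occurs.
    std::size_t occurrences = 0;
    for (int v : nums) {
        if (v == candidate) ++occurrences;
    }
    if (occurrences * 2 > nums.size()) {
        out = candidate;
        return true;
    }
    return false;
}
```

The extra pass keeps the algorithm at O(n) time and O(1) space, at the cost of reading the input twice.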

Queue – Basics

Implementation

A simple queue can use a dynamic array plus an index pointing to the head of the queue.

It should support enqueue (append a new element at the tail) and dequeue (remove the first element).

class MyQueue {
    private:
        vector<int> data;      
        int p_start;            
    public:
        MyQueue() {p_start = 0;}
        bool enQueue(int x) {
            data.push_back(x);
            return true;
        }

        bool deQueue() {
            if (isEmpty()) {
                return false;
            }
            p_start++;
            return true;
        };
        int Front() {
            return data[p_start];
        };
        bool isEmpty()  {
            return p_start >= data.size();
        }
};

But this has a drawback: slots before p_start are never reused, so memory is wasted every time we dequeue and the array keeps growing.

Thus, we need a circular queue.

Circular Queue

A circular queue uses a fixed-size array and two pointers indicating the head and the tail. The goal is to reuse the wasted storage.

When designing a data structure, it’s good to have as few attributes as possible and to keep the manipulation logic simple. Sometimes a little redundancy is worthwhile to reduce time complexity.

Attributes:

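  • queue: fixed-size array.
  • headIndex: integer index of the current head element.
  • count: the current number of elements. Together with `headIndex`, it locates the tail element.
  • capacity: maximum number of elements the queue can hold. Storing it avoids querying the array length (len(queue)) on every operation.

The implementation below uses a different but equivalent set of attributes: head and tail indices (both -1 when the queue is empty) and size, which plays the role of capacity.
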
class MyCircularQueue {
private:
    vector<int> data;
    int head;
    int tail;
    int size;

public:
    MyCircularQueue(int k) {
        data.resize(k);
        head = -1;
        tail = -1;
        size = k;
    }
    bool enQueue(int value) {
        if (isFull()) {
            return false;
        }
        if (isEmpty()) {
            head = 0;
        }
        tail = (tail + 1) % size;
        data[tail] = value;
        return true;
    }
    bool deQueue() {
        if (isEmpty()) {
            return false;
        }
        if (head == tail) {
            head = -1;
            tail = -1;
            return true;
        }
        head = (head + 1) % size;
        return true;
    }
    int Front() {
        if (isEmpty()) {
            return -1;
        }
        return data[head];
    }
    int Rear() {
        if (isEmpty()) {
            return -1;
        }
        return data[tail];
    }
    bool isEmpty() {
        return head == -1;
    }
    bool isFull() {
        return ((tail + 1) % size) == head;
    }
};
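
The attribute list earlier in this section (queue, headIndex, count, capacity) describes a slightly different design from the head/tail class above. As a rough sketch of that alternative, assuming a non-negative capacity k and the same -1 convention for reads on an empty queue, it could look like this. The class name CountedCircularQueue is hypothetical.

```cpp
#include <vector>

// Hypothetical class: follows the attribute list (queue, headIndex, count,
// capacity) instead of the head/tail pair used in MyCircularQueue.
class CountedCircularQueue {
private:
    std::vector<int> queue;  // fixed-size storage
    int headIndex = 0;       // index of the current head element
    int count = 0;           // number of stored elements
    int capacity;            // maximum number of elements

public:
    explicit CountedCircularQueue(int k) : queue(k), capacity(k) {}

    bool enQueue(int value) {
        if (isFull()) return false;
        // The next free slot sits `count` positions after the head.
        queue[(headIndex + count) % capacity] = value;
        ++count;
        return true;
    }
    bool deQueue() {
        if (isEmpty()) return false;
        headIndex = (headIndex + 1) % capacity;
        --count;
        return true;
    }
    int Front() const { return isEmpty() ? -1 : queue[headIndex]; }
    int Rear() const {
        if (isEmpty()) return -1;
        // The tail is the last occupied slot: head + count - 1, wrapped.
        return queue[(headIndex + count - 1) % capacity];
    }
    bool isEmpty() const { return count == 0; }
    bool isFull() const { return count == capacity; }
};
```

Tracking count makes isEmpty and isFull plain comparisons and removes the need for the -1 sentinel values.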

Built In Queue C++

In practice, use the STL queue (std::queue).

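#include <queue>
using namespace std;

int main() {
    queue<int> q;

    q.push(5);         // enqueue

    if (q.empty()) {}  // check whether the queue is empty

    q.front();         // get first element

    q.back();          // get last element

    q.size();          // get size

    q.pop();           // dequeue (done last so front()/back() never run on an empty queue)
}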

Hoare’s Quickselect & Lomuto’s Partition Scheme

Quickselect Algorithm

Hoare’s selection algorithm (quickselect) is a textbook algorithm used to solve “find the kth something” problems: kth smallest, kth largest, kth most frequent, kth least frequent, etc.

Time Complexity: Average O(n), Worst case O(n^2)

See 347. Top K Frequent Elements. Below, we solve this question with Hoare’s Quickselect.

Hoare’s Quickselect

Build a hashmap of <element, frequency> and an array of the unique elements. Select a random pivot and use a partition scheme to move it into a position where less frequent elements are to its left and more or equally frequent elements are to its right.

If the pivot index is N - k (N being the number of unique elements), the pivot and everything to its right are the k most frequent elements, so return them. Otherwise, recurse into the side of the array that contains index N - k.

Lomuto’s Partition Scheme

It places elements >= pivot on the right side of the pivot and elements < pivot on the left side.

  1. Swap the pivot with the last element (or just start with the last element as the pivot).
  2. Start a pointer from the left (index 0). Call this store_index.
  3. Iterating from 0 to the end, if an element is less than the pivot value, swap it with the element at store_index and advance store_index by one. If the element is >= the pivot value, skip it.
  4. After the iteration, swap the pivot with the element at store_index (this puts the pivot in its final position).
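
The four steps can be sketched on a plain array of integers, before frequencies enter the picture. This is an illustrative sketch only; the function name lomutoPartition and its signature are assumptions, not part of the original notes.

```cpp
#include <utility>
#include <vector>

// Hypothetical helper: partitions a[left..right] around a[pivot_index]
// and returns the pivot's final index.
int lomutoPartition(std::vector<int>& a, int left, int right, int pivot_index) {
    int pivot = a[pivot_index];
    std::swap(a[pivot_index], a[right]);  // step 1: park the pivot at the end
    int store_index = left;               // step 2: next slot for a smaller element
    for (int i = left; i < right; ++i) {  // step 3: sweep everything but the pivot
        if (a[i] < pivot) {
            std::swap(a[store_index], a[i]);
            ++store_index;
        }
    }
    std::swap(a[store_index], a[right]);  // step 4: pivot into its final place
    return store_index;
}
```

For example, partitioning {3, 1, 4, 1, 5, 9, 2, 6} around the value 4 (pivot_index 2) returns index 4: the four smaller values end up to the left of the pivot and the larger ones to its right, in no particular order.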

Solution to 347. Top K Frequent Elements

class Solution {
private:
    vector<int> unique;                 // distinct elements of nums
    unordered_map<int, int> count_map;  // element -> frequency

public:
    int partition(int left, int right, int pivot_index) {
        int pivot_frequency = count_map[unique[pivot_index]];
        // 1. Move the pivot to the end
        swap(unique[pivot_index], unique[right]);

        // 2. Move all less frequent elements to the left
        int store_index = left;
        for (int i = left; i <= right; i++) {
            if (count_map[unique[i]] < pivot_frequency) {
                swap(unique[store_index], unique[i]);
                store_index += 1;
            }
        }

        // 3. Move the pivot to its final place
        swap(unique[right], unique[store_index]);

        return store_index;
    }

    void quickselect(int left, int right, int k_smallest) {
        /*
        Sort a list within left..right till the kth least frequent element
        takes its place.
        */

        // base case: the list contains only one element
        if (left == right) return;

        int pivot_index = left + rand() % (right - left + 1);

        // Find the pivot position in a sorted list
        pivot_index = partition(left, right, pivot_index);

        // If the pivot is in its final sorted position
        if (k_smallest == pivot_index) {
            return;
        } else if (k_smallest < pivot_index) {
            // go left
            quickselect(left, pivot_index - 1, k_smallest);
        } else {
            // go right
            quickselect(pivot_index + 1, right, k_smallest);
        }
    }

    vector<int> topKFrequent(vector<int>& nums, int k) {
        // build hash map: element and how often it appears
        for (int n : nums) {
            count_map[n] += 1;
        }

        // array of unique elements
        int n = count_map.size();
        for (const auto& p : count_map) {
            unique.push_back(p.first);
        }

        // kth top frequent element is the (n - k)th least frequent.
        // Do a partial sort: from least frequent to most frequent, till the
        // (n - k)th least frequent element takes its place (n - k) in a sorted array.
        // All elements on the left are less frequent.
        // All elements on the right are more or equally frequent.
        quickselect(0, n - 1, n - k);
        // Return top k frequent elements
        vector<int> top_k_frequent(k);
        copy(unique.begin() + n - k, unique.end(), top_k_frequent.begin());
        return top_k_frequent;
    }
};
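
For comparison, the standard library offers std::nth_element, which performs the same kind of partial ordering as quickselect. The sketch below swaps it in for the hand-written partition. The free function topKFrequentNth is a hypothetical name, and it assumes 1 <= k <= number of distinct values.

```cpp
#include <algorithm>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical free function, not part of the Solution class above.
// Assumes 1 <= k <= number of distinct values in nums.
std::vector<int> topKFrequentNth(const std::vector<int>& nums, int k) {
    std::unordered_map<int, int> freq;  // element -> frequency
    for (int v : nums) ++freq[v];

    // Copy the (element, frequency) pairs so they can be reordered.
    std::vector<std::pair<int, int>> items(freq.begin(), freq.end());

    // Partially order by descending frequency so that the k most frequent
    // pairs occupy items[0..k-1] (in unspecified order among themselves).
    std::nth_element(items.begin(), items.begin() + (k - 1), items.end(),
                     [](const std::pair<int, int>& a, const std::pair<int, int>& b) {
                         return a.second > b.second;
                     });

    std::vector<int> result;
    for (int i = 0; i < k; ++i) result.push_back(items[i].first);
    return result;
}
```

As with the quickselect solution, the order of the returned elements is not specified.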

Hashmap in C++

Basic Usages

// 1. initialize a hash map
unordered_map<int, int> hashmap;
// 2. insert a new (key, value) pair
hashmap.insert(make_pair(2, 3));
// 3. insert a new (key, value) pair or update the value of an existing key
hashmap[1] = 1;
hashmap[1] = 2;
// 4. get the value of a specific key
int value = hashmap[1];
// 5. delete a key
hashmap.erase(2);
// 6. check if a key is in the hash map
if (hashmap.count(2) == 0) {
    // key 2 is not in the hash map
}
// 7. get the size of the hash map
int size = hashmap.size();
// 8. iterate the hash map
for (auto it = hashmap.begin(); it != hashmap.end(); ++it) {
    cout << it->first << "," << it->second << endl;
}
// 9. clear the hash map
hashmap.clear();
// 10. check if the hash map is empty
if (hashmap.empty()) {
    // nothing is stored
}

Iterating over list of keys:
// Template (pseudocode): Type, keys, value and the requirement check are placeholders.
for (Type key : keys) {
    if (hashmap.count(key) > 0) {
        if (hashmap[key] satisfies the requirement) {
            return needed_information;
        }
    }
    // Value can be any information you need (e.g. index)
    hashmap[key] = value;
}
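
One concrete way to fill in this template is to check whether a value repeats within a distance of at most k positions. The key is the value itself, the stored information is its most recent index, and the requirement is that the index gap is small enough. This is only one possible instance, and the function name containsNearbyDuplicate is chosen here for illustration.

```cpp
#include <unordered_map>
#include <vector>

// Illustrative instance of the template: returns true if some value occurs
// at two indices i < j with j - i <= k.
bool containsNearbyDuplicate(const std::vector<int>& nums, int k) {
    std::unordered_map<int, int> last_index;  // value -> most recent index
    for (int i = 0; i < static_cast<int>(nums.size()); ++i) {
        auto it = last_index.find(nums[i]);
        // Key already seen: does the stored index satisfy the requirement?
        if (it != last_index.end() && i - it->second <= k) {
            return true;
        }
        // Record (or refresh) the information kept for this key.
        last_index[nums[i]] = i;
    }
    return false;
}
```

Using find instead of count followed by operator[] avoids looking the key up twice.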

(to be continued…)

Iterating over map
// copy each (key, value) pair of myMap into a vector
std::vector<std::pair<int, int>> vec;
for (auto& it : myMap) {
    vec.push_back(it);
}
Sorting map based on value
#include <iostream>
#include <map>
#include <vector>
#include <algorithm>

// Comparator function to sort pairs by second value
bool sortByVal(const std::pair<int, int>& a, const std::pair<int, int>& b) {
    return (a.second < b.second);
}

int main() {
    // Define a map
    std::map<int, int> myMap = {{1, 40}, {2, 30}, {3, 60}, {4, 20}};

    // Copy elements to a vector of pairs
    std::vector<std::pair<int, int>> vec;
    for (auto& it : myMap) {
        vec.push_back(it);
    }

    // Sort the vector by value
    std::sort(vec.begin(), vec.end(), sortByVal);

    // Print the sorted vector
    for (auto& it : vec) {
        std::cout << it.first << ": " << it.second << std::endl;
    }

    return 0;
}

Tips

  • When counting frequency, you don’t need to check whether hashmap[x] exists: operator[] inserts a missing key with value 0.
// don't do this:
if (hashmap.count(x) > 0) hashmap[x]++;
else hashmap[x] = 1;
// do this:
hashmap[x]++;

// same for adding up the frequency (note that reading hashmap[x] also inserts x if it is missing):
for (...){
    sum += hashmap[x];
}
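
One side effect of the tip above is worth seeing in code: reading a missing key through operator[] inserts it with value 0, so the map grows. The sketch below contrasts that with a read through find, which leaves the map unchanged. Both helper names (countOf, sizeAfterBracketRead) are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical helper: reads a count without inserting missing keys.
int countOf(const std::unordered_map<int, int>& freq, int key) {
    auto it = freq.find(key);
    return it == freq.end() ? 0 : it->second;
}

// Hypothetical helper: works on a copy of the map and reports its size
// after reading `key` through operator[].
std::size_t sizeAfterBracketRead(std::unordered_map<int, int> freq, int key) {
    int ignored = freq[key];  // inserts {key, 0} if key is absent
    (void)ignored;
    return freq.size();
}
```

When only a lookup is needed and the map should stay as it is (or is const), the find-based read is the safer choice. For counting, hashmap[x]++ remains the shortest form.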