C++ Policy Based Data Structures

C++ Policy Based Data Structures

in

C++ : POLICY BASED DATA STRUCTES

Whether you’re solving brain-teasers in a coding contest or building the next big software, the right data structures can mean the difference between “nailed it” and “why is this still running?!”

Think of the Standard Template Library (STL) as your trusty toolbox. It’s got the hammers, screwdrivers, and wrenches you need for most jobs. But what about those times when you need a laser cutter or a 3D printer? That’s where GNU C++ PBDS (Policy-Based Data Structures) comes in—it’s the advanced arsenal that turns you from a good coder into a coding superhero.

GNU C++ PBDS (Policy-Based Data Structures) is a powerful extension of STL that introduces a variety of advanced data structures, perfect for competitive programming and C++ development. With features like order statistics, dynamic range queries, and priority queue optimizations, PBDS enables you to tackle problems with efficiency and elegance. Whether you need to efficiently find the k-th smallest element, maintain dynamic order statistics, or handle graph algorithms with ease, PBDS has you covered.

In this blog, we’ll dive into the key components of PBDS, showcasing practical examples and exploring the real-world benefits of using these advanced data structures.

pb_ds : The Hidden Gem

  • Efficiency Boost: Say goodbye to writing custom data structures from scratch.
  • Versatility: Handle complex operations like order statistics, range queries, and priority updates effortlessly.
  • Compact Syntax: Write less, achieve more.

Here are some scenarios where pb_ds shines:

  • Finding the k-th smallest/largest element
  • Prefix-based search and string matching
  • Efficient handling of graph algorithms like Dijkstra’s or Prim’s

The underlying data-structures currently supported in pb_ds are :

image

Let’s discuss a few of them, we will focus on how these data structures can be useful to us in solving real-world problems and competitive programming challenges, rather than diving into the internal details of their implementations.yy

Key Components of pb_ds

1. Tree-Based Containers

Let’s talk about the star of the show : ordered_set, implemented as a red-black tree with extended functionality. These containers are perfect for problems involving dynamic order statistics or range queries.

Usage

To use this data-structure :

#include <ext/pb_ds/assoc_container.hpp> 
#include <ext/pb_ds/tree_policy.hpp>  
using namespace __gnu_pbds; 
typedef tree<int, null_type, less<int>, rb_tree_tag, tree_order_statistics_node_update> ordered_set;

Methods

Other than the standard std::set methods, ordered_set also offers some other methods:

  • find_by_order
  • order_of_key

Time Complexity : O(log(n))

Example :

int main(){
    ordered_set s;
    s.insert(2);
    s.insert(9);
    s.insert(3);
    
    // Count elements less than a given value
    cout << s.order_of_key(9) << "\n"; //Outputs 2
   
    // Find k-th smallest element (2nd smallest acc to 0-based indexing)
    cout << *(s.find_by_order(1)) << "\n"; //Outputs 3
}

There’s also the ordered_multiset that is commonly implemented one of the two ways :

  1. Use an ordered_set of pairs to count occurences in p.second.
  2. Use the comparison operator less_equal<data_type>. (Note that this exchanges the two functions s.lower_bound(x) and s.upper_bound(x)).

Applications of ordered_set :

  • Inversion Count: Count the number of inversions in an array efficiently.
  • Median Maintenance: Dynamically track the median in a stream of numbers.
  • Range Counting: Count the number of elements in a specific range.

Above we have discussed about the order-statistics tree implemented by the red-black trees. A general tree from PBDS is discussed below.

We can define a tree as :

template<
    typename Key, // Key type
    typename Mapped, // Mapped-policy
    typename Cmp_Fn = std::less<Key>, // Key comparison function
    typename Tag = rb_tree_tag, // Inner Data structure
    typename Node_Update = null_node_update, // Node update policy
    typename Allocator = std::allocator<char> // Allocator type
    > class tree;
  • Keeping the Mapped-policy to null_type declares a set. Otherwise it creates a map.
  • Tag specifies the underlying data structure. It can be among these :
    • rb_tree_tag (default) = red-black tree
    • splay_tree_tag = splay tree
    • ov_tree_tag = ordered-vector tree
  • Node_Update is the policy which takes care of internal node invariants.
    • null_node_update (default), memory efficient
    • tree_order_statistics_node_update , enables the order-statistic methods : find_by_order and order_of_key.

For more , refer the official documentation at tree Interface and tree Design.

More Methods :

Even without the tree_order_statistics_node_update policy, PBDS trees support :

  • Join
  • Split

Time Complexity : O(log(n)) Examples can be found at : 1 , 2

Observations :

  • rb_tree_tag is the most efficient for order-statistic methods. (reference)
  • ov_tree_tag is the most efficient for split and join methods.(reference)

2. Hash-Based Containers

Many of us have experienced the frustration of unordered_map underperforming in certain scenarios. Despite its theoretical efficiency, the constant factors involved sometimes slow down the performance considerably. (which may lead to an unexpected TLE in competitive programming). Hash-based containers in pb_ds, such as gp_hash_table are designed for ultra-fast operations with lower memory consumption. These containers are designed like std::unordered_map and offer more flexibility, better performance with certain workloads, and the ability to use custom hash functions as well as resize_policy. We can get around get 2x to 5x improvement in execution time. The relevant tests can be found here. The primary types of hash-based containers in PBDS are:

  • gp_hash_table – General-probing hash table.
  • cc_hash_table – Collision-chaining hash table.

Why PBDS Hash Tables Outshine std::unordered_map

Feature std::unordered_map PBDS Hash Tables
Speed Moderate Lightning Fast
Memory Efficiency High Overhead Optimized
Customizability Limited Flexible (hash/resizing)
TLE Risk in Competitive Programming High Low

Usage

To use this data-structure :

#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/hash_policy.hpp>
using namespace __gnu_pbds;

typedef cc_hash_table<int, int> cc_table;
typedef gp_hash_table<int, int> gp_table;

Specifications

  • Collision-chaining hash tables (cc_hash_table) offer better flexibility and timing performance because of their internal structure. They handle memory less efficiently than probing containers but are preferred for general use cases.
  • General Probing container (gp_hash_table), on the other hand, are more memory-efficient, especially for simple data types.
  • They are also advantageous in multi-threaded applications due to reduced memory allocation contention.
  • For operations like eliminating duplicates or counting occurrences, probing containers might be more efficient.

Examples

1) gp_hash_table with a custom hash function

// Custom hash function
struct CustomHash {
    size_t operator()(const std::string& key) const {
        size_t hash = 0;
        for (char c : key) {
            hash = hash * 31 + c; // Simple polynomial hash
        }
        return hash;
    }
};

int main() {
    // Define a gp_hash_table mapping strings to integers
    gp_hash_table<std::string, int, CustomHash> my_table;

    my_table["one"] = 1;
    my_table["two"] = 2;

    for (const auto& pair : my_table) {
        std::cout << pair.first << ": " << pair.second << "\n";
    }

    return 0;
}

2) gp_hash_table with resize policy

The code can be found in the repository.

3) The solution to this problem is analysed in this blog

It depicts faster time and AC solution when gp_hash_table is used in place of std::unordered_map.


3. Better Priority Queues

The priority queue alternative offered by pb_ds outperforms the STL’s std::priority_queue in terms of flexibility and time complexity.

The priority queue alternative offered by pb_ds outperforms the STL’s std::priority_queue in terms of flexibility and time complexity.

While std::priority_queue is based on a binary heap, GNU pb_ds gives you immense freedom by offering multiple underlying implementations, each tailored to different scenarios. The available tags for priority queues include:

  • pairing_heap : Pairing heap (self-adjusting heap with tree structure).
  • binary_heap : Binary heap (complete binary tree).
  • binomial_heap : Binomial heap (collection of binomial trees linked together).
  • rc_binomial_heap : Relaxed binomial heap (variation of binomial heap with relaxed constraints).
  • thin_heap : Thin heap (space-efficient variation of Fibonacci heap).

The default tag is binary_heap_tag. Each of these implementations comes with its own perks and specific use cases, offering better alternatives to STL’s priority_queue.

Usage

#include <ext/pb_ds/priority_queue.hpp>
using namespace __gnu_pbds;

// replace thin_heap with the <tag> of your choice
typename __gnu_pbds::priority_queue<int, less<int>, thin_heap_tag> myPQ;

Methods

Other than the standard std::priority_queue methods, pb_ds also offers some other methods :

  • modify
  • erase
  • join
  • split

Dealing with Datatype <string>

  • pairing_heap offers the best time and memory performance for push and pop operations. (reference : 1, 2, 3)
  • join method: binomial_heap tag is the best for to join operations on multiple priority queues. (4).
  • modification : pairing_heap for decrease_key method (5) and thin_heap for increase_key method (6).

    For Datatype <integer>

Using the binary_heap (default) PBDS priority_queue would be better due to its efficiency in handling common operations like push and pop.

Tests for the same can be found at (7), (8).

The following table shows the complexities of the different underlying data structures in terms of orders of growth.

priority_queue (tags) push pop modify erase join
std : : priority_queue Θ(n) worst & Θ(log(n)) amortized Θ(log(n)) worst Θ(nlog(n)) worst Θ(nlog(n)) Θ(nlog(n))
pairing_heap_tag O(1) Θ(n) worst & Θ(log(n)) amortized Θ(n) worst & Θ(log(n)) amortized Θ(n) worst & Θ(log(n)) amortized O(1)
binary_heap_tag Θ(n) worst & Θ(log(n)) amortized Θ(n) worst & Θ(log(n)) amortized Θ(n) Θ(n) Θ(n)
binomial_heap_tag Θ(log(n)) worst & O(1) amortized Θ(log(n)) Θ(log(n)) Θ(log(n)) Θ(log(n))
rc_binomial_heap_tag O(1) Θ(log(n)) Θ(log(n)) Θ(log(n)) Θ(log(n))
thin_heap_tag O(1) Θ(n) worst & Θ(log(n)) amortized Θ(log(n)) worst & Θ(1) amortized Θ(n) worst & Θ(log(n)) amortized Θ(n)

Reference : (9)

Examples

1) Modify Method

int main() {
    typedef __gnu_pbds::priority_queue<int, std::less<int>, binary_heap_tag> PQ;

    PQ pq;
    PQ::point_iterator it = pq.push(15);
    pq.push(10);
    pq.push(5);

    // Modify the element 15 to 7
    pq.modify(it, 7);

    // Output the elements after modification
    while (!pq.empty()) {
        std::cout << pq.top() << " ";
        pq.pop();
    }
    // Output: 10 7 5

    return 0;
}

2) Join Method

pb_ds supports merging two priority queues of the same type, which is especially useful in scenarios like graph algorithms.

int main() {
    typedef __gnu_pbds::priority_queue<int, std::less<int>, binomial_heap_tag> PQ;

    PQ pq1, pq2;
    pq1.push(10);
    pq1.push(20);

    pq2.push(15);
    pq2.push(25);

    // Merge pq2 into pq1 using binomial heap join
    pq1.join(pq2);

    // To verify the above method
    while (!pq1.empty()) {
        std::cout << pq1.top() << " ";
        pq1.pop();
    }
    // Output: 25 20 15 10
    return 0;
}

3.) Priority Queue Split and Join example.


4. Trie-Based Containers

PBDS supports trie-based containers for prefix-based searches, longest-prefix matching, and other trie-centric algorithms. Unlike std::map or std::unordered_map, trie containers are optimized for hierarchical or lexicographical keys. Tries are particularly useful for:

  • Prefix-based searches: If you’re working with autocompletion systems or word suggestion algorithms, Tries are excellent for efficiently finding all words that share a common prefix.
  • String matching: Tries can be employed in algorithms where you need to find specific patterns in a large set of strings.

Usage

#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/trie_policy.hpp>
using namespace __gnu_pbds;

// To define the trie-based set container with prefix search
typedef trie<string, null_type, trie_string_access_traits<>, pat_trie_tag, trie_prefix_search_node_update> trie_set;

// To define the trie-based map container
typedef trie<string, int, trie_string_access_traits<>, pat_trie_tag, null_trie_node_update> trie_map;

You can look more into it at this link.

PATRICIA Tries

Patricia tries (practical prefix trees) optimize space usage by collapsing non-branching paths into single nodes.

  • Trie-Based Set (trie_set): This container allows you to store keys, where the keys are stored in a Trie structure. The Trie guarantees that common prefixes are stored only once, resulting in a more compact representation.
  • Trie-Based Map (trie_map): Similar to the Trie-based set, but instead of just storing keys, this container also stores values associated with each key. This makes it ideal for situations where you need to store a mapping from a key to a value.

Patricia tries prove highly useful for :

  • IP routing tables.
  • Efficient prefix compression in strings.
  • Fast substring matching.

Example

Using trie_set

int main() {
    trie_set t;

    // Insert elements
    t.insert("apple");
    t.insert("app");
    t.insert("banana");
    t.insert("bat");

    // Search for a specific key
    if (t.find("app") != t.end())
        cout << "Found: app\n";

    // Prefix search
    auto range = t.prefix_range("ba");
    for (auto it = range.first; it != range.second; ++it)
        cout << "Prefix match: " << *it << "\n";
    // Outputs "banana" and "bat"
    return 0;
}

pb_ds in Action

1) Count Number of Inversions in an Array

Using ordered_set

#include <bits/stdc++.h>
using namespace std;

#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
using namespace __gnu_pbds;

typedef tree<int, null_type, less<int>, rb_tree_tag, tree_order_statistics_node_update> ordered_set;

int countInversions(vector<int>& arr) {
    ordered_set os;
    int inversions = 0;
    for (int i = arr.size() - 1; i >= 0; i--) {
        inversions += os.order_of_key(arr[i]);
        os.insert(arr[i]);
    }
    return inversions;
}

int main() {
    vector<int> arr = {3, 1, 2};
    cout << countInversions(arr) << endl; // Output: 2
    return 0;
}
  • Time Complexity : O(nlogn)
  • Space Complexity : O(n)

2) Shortest Path using Pairing Heap

The below code uses priority queue implemented using pairing heap.

#include <bits/stdc++.h>
using namespace std;

#include <ext/pb_ds/priority_queue.hpp>
using namespace __gnu_pbds;

using PairingHeap = __gnu_pbds::priority_queue<std::pair<int, int>, std::greater<std::pair<int, int>>, pairing_heap_tag>;

void dijkstra(int start, const std::vector<std::vector<std::pair<int, int>>>& graph) {
    PairingHeap pq;
    std::vector<int> dist(graph.size(), INT_MAX);
    pq.push({0, start});
    dist[start] = 0;

    while (!pq.empty()) {
        auto top = pq.top();
        pq.pop();
        int d = top.first, u = top.second;
        if (d > dist[u]) continue;

        for (const auto& neighbor : graph[u]) {
            int weight = neighbor.first, v = neighbor.second;
            if (dist[u] + weight < dist[v]) {
                dist[v] = dist[u] + weight;
                pq.push({dist[v], v});
            }
        }
    }

    for (int i = 0; i < dist.size(); ++i) {
        std::cout << "Distance to " << i << ": " << dist[i] << "\n";
    }
}

int main() {
    std::vector<std::vector<std::pair<int, int>>> graph(4);
    graph[0].push_back( {1, 1} );
    graph[0].push_back( {4, 2} );
    graph[1].push_back( {1, 0} );
    graph[1].push_back( {2, 2} );
    graph[1].push_back( {6, 3} );
    graph[2].push_back( {4, 0} );
    graph[2].push_back( {2, 1} );
    graph[2].push_back( {3, 3} );
    graph[3].push_back( {6, 1} );
    graph[3].push_back( {3, 2} );

    dijkstra(0, graph);
    return 0;
    /* Output:
        Distance to 0: 0
        Distance to 1: 1
        Distance to 2: 3
        Distance to 3: 6
    */
}

The pb_ds pairing heap version of Dijkstra’s algorithm has a better amortized complexity O(V log V + E) than the traditional STL binary heap version O((V + E) log V). It performs particularly well in dense graphs and scenarios where decrease-key operations dominate.

3) Finding the Frequency of Elements in a Sliding Window

The gp_hash_table provides O(1) average time complexity for insertions, deletions, and lookups, making it well-suited for dynamic updates required in a sliding window problem.

#include <bits/stdc++.h>
using namespace std;

#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/hash_policy.hpp>
using namespace __gnu_pbds;

// Define gp_hash_table
typedef gp_hash_table<int, int> hash_table;

void sliding_window_frequencies(const std::vector<int>& arr, int k) {
    if (k > arr.size()) {
        std::cout << "Window size is larger than the array size.\n";
        return;
    }

    hash_table freq;

    // Initialize the first window
    for (int i = 0; i < k; ++i) {
        freq[arr[i]]++;
    }

    // Output frequencies for the first window
    std::cout << "Window [0, " << k - 1 << "]:\n";
    for (const auto& pair : freq) {
        std::cout << pair.first << ": " << pair.second << "\n";
    }

    // Slide the window
    for (int i = k; i < arr.size(); ++i) {
        int outgoing = arr[i - k]; // Element leaving the window
        int incoming = arr[i];    // Element entering the window

        // Decrease the count of the outgoing element
        if (--freq[outgoing] == 0) {
            freq.erase(outgoing);
        }

        // Increase the count of the incoming element
        freq[incoming]++;

        // Output frequencies for the current window
        std::cout << "Window [" << i - k + 1 << ", " << i << "]:\n";
        for (const auto& pair : freq) {
            std::cout << pair.first << ": " << pair.second << "\n";
        }
    }
}

int main() {
    std::vector<int> arr = {1, 2, 1, 3, 4, 7, 2, 4};
    int k = 5;

    sliding_window_frequencies(arr, k);

    return 0;
}

These are some more examples of the class methods from the github repository.

Conclusion

pb_ds is a treasure trove for competitive programmers. It saves time, simplifies code, and enables you to focus on solving problems rather than debugging custom data structures. With its variety of priority queue implementations, tree-based containers, hash-based structures and many more, it’s a must-have tool in your arsenal. Thank you!

References :