Google OR-Tools v9.11
a fast and portable software suite for combinatorial optimization
Loading...
Searching...
No Matches
operations_research::DenseIntDuplicateRemover Class Reference

#include <duplicate_remover.h>

Public Member Functions

 DenseIntDuplicateRemover (int n)
 
template<class IntContainer >
void RemoveDuplicates (IntContainer *container)
 
template<class IntContainer >
void AppendAndLazilyRemoveDuplicates (int x, IntContainer *container)
 
template<>
void Append (int x, std::vector< int > *container)
 
template<>
void Append (int x, google::protobuf::RepeatedField< int > *container)
 
template<>
void Truncate (size_t new_size, std::vector< int > *container)
 
template<>
void Truncate (size_t new_size, google::protobuf::RepeatedField< int > *container)
 

Detailed Description

This class offers an alternative to gtl::linked_hash_set<> which is:

  • stateless: it works directly on a vector<int> or any similar container, without storing extra data anywhere;
  • faster when the number of unique values is 5K or above.

The memory usage can be O(num_distinct_values) at any time if you use AppendAndLazilyRemoveDuplicates(). In fact, unit tests verify that the average number of elements kept is ≤ 1.5 * num_distinct_values, making it comparable to a flat_hash_set<int> (whose overhead factor is ~1.68).

Usage pattern:

///< One instance of this can handle many sets on the same [0, n) domain. int N = 100'000; DenseIntDuplicateRemover deduper(N); ///< Uses N/8 bytes of memory. std::vector<int> values; ///< Your container. Could be RepeatedField<int>. for (int x : ...) { deduper.AppendAndLazilyRemoveDuplicates(x, &values); ///< O(1) amortized. } deduper.RemoveDuplicates(&values); ///< O(values.size())

Definition at line 51 of file duplicate_remover.h.

Constructor & Destructor Documentation

◆ DenseIntDuplicateRemover()

operations_research::DenseIntDuplicateRemover::DenseIntDuplicateRemover ( int n)
inlineexplicit

Definition at line 53 of file duplicate_remover.h.

Member Function Documentation

◆ Append() [1/2]

template<>
void operations_research::DenseIntDuplicateRemover::Append ( int x,
google::protobuf::RepeatedField< int > * container )
inline

Definition at line 123 of file duplicate_remover.h.

◆ Append() [2/2]

template<>
void operations_research::DenseIntDuplicateRemover::Append ( int x,
std::vector< int > * container )
inline

Definition at line 117 of file duplicate_remover.h.

◆ AppendAndLazilyRemoveDuplicates()

template<class IntContainer >
void operations_research::DenseIntDuplicateRemover::AppendAndLazilyRemoveDuplicates ( int x,
IntContainer * container )

ALGORITHM: In order to remain stateless, yet call RemoveDuplicates() often enough that the size of the container remains O(num_distinct_elements), but not too often since we must remain O(1) time amortized, we randomize: every time we append an element, we'll call RemoveDuplicates() with probability 1/k, where k is the current size of the container. That way, the added expected complexity is O(k)*1/k = O(1), yet we know that we'll eventually call it. See the unit tests that verify the claims. As an important optimization, since drawing the pseudo-random number is expensive, we only perform it every kCheckPeriod, and to compensate we multiply the probability by the same amount.

Definition at line 89 of file duplicate_remover.h.

◆ RemoveDuplicates()

template<class IntContainer >
void operations_research::DenseIntDuplicateRemover::RemoveDuplicates ( IntContainer * container)

Implementation of the templates.

Definition at line 83 of file duplicate_remover.h.

◆ Truncate() [1/2]

template<>
void operations_research::DenseIntDuplicateRemover::Truncate ( size_t new_size,
google::protobuf::RepeatedField< int > * container )
inline

Definition at line 135 of file duplicate_remover.h.

◆ Truncate() [2/2]

template<>
void operations_research::DenseIntDuplicateRemover::Truncate ( size_t new_size,
std::vector< int > * container )
inline

Definition at line 129 of file duplicate_remover.h.


The documentation for this class was generated from the following files: