Google OR-Tools v9.11
A fast and portable software suite for combinatorial optimization

operations_research::DenseIntDuplicateRemover Class Reference
#include <duplicate_remover.h>
Public Member Functions

- DenseIntDuplicateRemover(int n)
- template<class IntContainer> void RemoveDuplicates(IntContainer* container)
- template<class IntContainer> void AppendAndLazilyRemoveDuplicates(int x, IntContainer* container)
- template<> void Append(int x, std::vector<int>* container)
- template<> void Append(int x, google::protobuf::RepeatedField<int>* container)
- template<> void Truncate(size_t new_size, std::vector<int>* container)
- template<> void Truncate(size_t new_size, google::protobuf::RepeatedField<int>* container)
This class offers an alternative to gtl::linked_hash_set<>. Its memory usage can be kept at O(num_distinct_values) at any time if you use AppendAndLazilyRemoveDuplicates(). In fact, unit tests verify that the average number of elements kept is ≤ 1.5 * num_distinct_values, making it comparable to a flat_hash_set<int> (whose overhead factor is ~1.68).
Usage pattern:

```cpp
// One instance of this can handle many sets on the same [0, n) domain.
int N = 100'000;
DenseIntDuplicateRemover deduper(N);  // Uses N/8 bytes of memory.
std::vector<int> values;  // Your container. Could be RepeatedField<int>.
for (int x : ...) {
  deduper.AppendAndLazilyRemoveDuplicates(x, &values);  // O(1) amortized.
}
deduper.RemoveDuplicates(&values);  // O(values.size())
```
Definition at line 51 of file duplicate_remover.h.
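For concreteness, here is a self-contained variant of the pattern above with a made-up input stream; the include path is an assumption based on the usual OR-Tools source layout. With N = 100'000, the internal bitmask costs N/8 = 12,500 bytes and can be reused across many sets on the same domain.

```cpp
#include <cstdio>
#include <initializer_list>
#include <vector>

#include "ortools/algorithms/duplicate_remover.h"  // Assumed path.

int main() {
  const int N = 100'000;  // All values must lie in [0, N).
  operations_research::DenseIntDuplicateRemover deduper(N);

  std::vector<int> values;
  for (int x : {5, 17, 5, 99999, 17, 0, 5}) {  // Made-up input stream.
    deduper.AppendAndLazilyRemoveDuplicates(x, &values);  // O(1) amortized.
  }
  deduper.RemoveDuplicates(&values);  // Each value now appears exactly once.

  for (int x : values) std::printf("%d\n", x);
  return 0;
}
```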
operations_research::DenseIntDuplicateRemover::DenseIntDuplicateRemover(int n)  [inline, explicit]

Definition at line 53 of file duplicate_remover.h.
template<>
void operations_research::DenseIntDuplicateRemover::Append(int x, std::vector<int>* container)  [inline]

Definition at line 123 of file duplicate_remover.h.
template<>
void operations_research::DenseIntDuplicateRemover::Append(int x, google::protobuf::RepeatedField<int>* container)  [inline]

Definition at line 117 of file duplicate_remover.h.
template<class IntContainer>
void operations_research::DenseIntDuplicateRemover::AppendAndLazilyRemoveDuplicates(int x, IntContainer* container)
ALGORITHM: To remain stateless, we must call RemoveDuplicates() often enough that the size of the container remains O(num_distinct_elements), but not so often that we lose the O(1) amortized time per append. We therefore randomize: each time we append an element, we call RemoveDuplicates() with probability 1/k, where k is the current size of the container. The added expected complexity is O(k) * 1/k = O(1), yet the cleanup is eventually guaranteed to run. See the unit tests that verify these claims. As an important optimization, since drawing a pseudo-random number is expensive, we only draw one every kCheckPeriod appends, and compensate by multiplying the probability by the same factor. A sketch of this scheme follows below.
Definition at line 89 of file duplicate_remover.h.
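To make the scheme concrete, here is a minimal, self-contained sketch of the lazy-removal idea described above; it is not the library's actual code. The value of kCheckPeriod, the choice of RNG, and the bitmask-based RemoveDuplicates() are all assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Sketch only: illustrates the randomized lazy-removal scheme described
// above. Names and constants are assumptions, not OR-Tools internals.
class LazyDeduperSketch {
 public:
  explicit LazyDeduperSketch(int n) : seen_((n + 7) / 8, 0) {}

  void AppendAndLazilyRemoveDuplicates(int x, std::vector<int>* container) {
    container->push_back(x);
    // Drawing a pseudo-random number is expensive, so only draw one every
    // kCheckPeriod appends, and scale the probability up to compensate.
    if (++appends_since_check_ < kCheckPeriod) return;
    appends_since_check_ = 0;
    const size_t k = container->size();
    // Compact with probability kCheckPeriod / k: the expected extra cost is
    // O(k) * kCheckPeriod / k = O(kCheckPeriod) = O(1) amortized per append.
    std::uniform_int_distribution<size_t> dist(0, k - 1);
    if (dist(rng_) < kCheckPeriod) RemoveDuplicates(container);
  }

  // Stable, O(container->size()) deduplication using a bitmask over [0, n).
  void RemoveDuplicates(std::vector<int>* container) {
    size_t out = 0;
    for (const int x : *container) {
      uint8_t& byte = seen_[x >> 3];
      const uint8_t bit = 1 << (x & 7);
      if (byte & bit) continue;   // Duplicate: an earlier copy was kept.
      byte |= bit;
      (*container)[out++] = x;    // Keep the first occurrence, in order.
    }
    container->resize(out);
    // Clear the mask for the next call, touching only the kept values.
    for (size_t i = 0; i < out; ++i) seen_[(*container)[i] >> 3] = 0;
  }

 private:
  static constexpr size_t kCheckPeriod = 32;  // Assumed value.
  std::vector<uint8_t> seen_;  // One bit per value in [0, n): n/8 bytes.
  std::mt19937 rng_;
  size_t appends_since_check_ = 0;
};
```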
template<class IntContainer>
void operations_research::DenseIntDuplicateRemover::RemoveDuplicates(IntContainer* container)
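A hypothetical standalone call, assuming all values lie in [0, n) and that the first occurrence of each value is the one kept (as the lazy-append pattern above implies):

```cpp
operations_research::DenseIntDuplicateRemover deduper(/*n=*/10);
std::vector<int> v = {3, 1, 3, 7, 1};
deduper.RemoveDuplicates(&v);  // v now holds {3, 1, 7} under the
                               // first-occurrence assumption above.
```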
template<>
void operations_research::DenseIntDuplicateRemover::Truncate(size_t new_size, std::vector<int>* container)  [inline]

Definition at line 135 of file duplicate_remover.h.
template<>
void operations_research::DenseIntDuplicateRemover::Truncate(size_t new_size, google::protobuf::RepeatedField<int>* container)  [inline]

Definition at line 129 of file duplicate_remover.h.