Google OR-Tools v9.12
a fast and portable software suite for combinatorial optimization
Loading...
Searching...
No Matches
set_cover_heuristics.h
Go to the documentation of this file.
1// Copyright 2010-2025 Google LLC
2// Licensed under the Apache License, Version 2.0 (the "License");
3// you may not use this file except in compliance with the License.
4// You may obtain a copy of the License at
5//
6// http://www.apache.org/licenses/LICENSE-2.0
7//
8// Unless required by applicable law or agreed to in writing, software
9// distributed under the License is distributed on an "AS IS" BASIS,
10// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
11// See the License for the specific language governing permissions and
12// limitations under the License.
13
14#ifndef OR_TOOLS_ALGORITHMS_SET_COVER_HEURISTICS_H_
15#define OR_TOOLS_ALGORITHMS_SET_COVER_HEURISTICS_H_
16
17#include <vector>
18
19#include "absl/types/span.h"
23
24namespace operations_research {
25
26// Solver classes for the weighted set covering problem.
27//
28// The solution procedure is based on the general scheme known as local search.
29// Once a solution exists, it is improved by modifying it slightly, for example
30// by flipping a binary variable, so as to minimize the cost.
31// But first, we have to generate a first solution that is as good as possible.
32//
33// The first solution is then improved by using local search descent, which
34// eliminates the S_j's that have no interest in the solution.
35//
36// A mix of the guided local search (GLS) and Tabu Search (TS) metaheuristic
37// is also provided.
38//
39// The term 'focus' hereafter means a subset of the S_j's designated by their
40// indices. Focus make it possible to run the algorithms on the corresponding
41// subproblems.
42//
43// TODO(user): make the different algorithms concurrent, solving independent
44// subproblems in different threads.
45//
46
47// An obvious idea is to take all the S_j's (or equivalently to set all the
48// x_j's to 1). It's very silly but fast, and we can improve on it later using
49// local search.
50
51// The consistency level is maintained up to kFreeAndUncovered.
53 public:
54 explicit TrivialSolutionGenerator(SetCoverInvariant* inv) : inv_(inv) {}
55
56 // Returns true if a solution was found.
57 // TODO(user): Add time-outs and exit with a partial solution. This seems
58 // unlikely, though.
59 bool NextSolution();
60
61 // Computes the next partial solution considering only the subsets whose
62 // indices are in focus.
63 bool NextSolution(absl::Span<const SubsetIndex> focus);
64
65 private:
66 // The data structure that will maintain the invariant for the model.
68};
69
70// A slightly more complicated but better way to compute a first solution is to
71// select columns randomly. Less silly than the previous one, and provides
72// much better results.
73// TODO(user): make it possible to use other random generators. Idea: bias the
74// generator towards the columns with the least marginal costs.
75
76// The consistency level is maintained up to kFreeAndUncovered.
78 public:
79 explicit RandomSolutionGenerator(SetCoverInvariant* inv) : inv_(inv) {}
80
81 // Returns true if a solution was found.
82 bool NextSolution();
83
84 // Computes the next partial solution considering only the subsets whose
85 // indices are in focus.
86 bool NextSolution(const std::vector<SubsetIndex>& focus);
87
88 private:
89 // The data structure that will maintain the invariant for the model.
91};
92
93// The first solution is obtained using the Chvatal heuristic, that guarantees
94// that the solution is at most 1 + log(n) times the optimal value.
95// Vasek Chvatal, 1979. A greedy heuristic for the set-covering problem.
96// Mathematics of Operations Research, 4(3):233-235, 1979.
97// http://www.jstor.org/stable/3689577
98//
99// Chvatal's heuristic works as follows: Choose the subset that covers as many
100// remaining uncovered elements as possible for the least possible cost per
101// element and iterate.
102//
103// The following papers dive into the details of this class of algorithms.
104//
105// Young, Neal E. 2008. “Greedy Set-Cover Algorithms.” In Encyclopedia of
106// Algorithms, 379–81. Boston, MA: Springer US. Draft at:
107// http://www.cs.ucr.edu/~neal/non_arxiv/Young08SetCover.pdf
108//
109// Cormode, Graham, Howard Karloff, and Anthony Wirth. 2010. “Set Cover
110// Algorithms for Very Large Datasets.” In CIKM ’10. ACM Press.
111// https://doi.org/10.1145/1871437.1871501.
112
113// The consistency level is maintained up to kFreeAndUncovered.
115 public:
116 explicit GreedySolutionGenerator(SetCoverInvariant* inv) : inv_(inv) {}
117
118 // Returns true if a solution was found.
119 // TODO(user): Add time-outs and exit with a partial solution.
120 bool NextSolution();
121
122 // Computes the next partial solution considering only the subsets whose
123 // indices are in focus.
124 bool NextSolution(absl::Span<const SubsetIndex> focus);
125
126 // Same with a different set of costs.
127 bool NextSolution(absl::Span<const SubsetIndex> focus,
128 const SubsetCostVector& costs);
129
130 private:
131 // The data structure that will maintain the invariant for the model.
132 SetCoverInvariant* inv_;
133};
134
135// Solution generator based on the degree of elements.
136// The degree of an element is the number of subsets covering it.
137// The generator consists in iteratively choosing a non-covered element with the
138// smallest degree, and selecting a subset that covers it with the least
139// ratio cost / number of uncovered elements. The number of uncovered elements
140// are updated for each impacted subset. The newly-covered elements degree
141// are also updated and set to zero.
142
143// The consistency level is maintained up to kFreeAndUncovered.
145 public:
147
148 // Returns true if a solution was found.
149 // TODO(user): Add time-outs and exit with a partial solution.
150 bool NextSolution();
151
152 // Computes the next partial solution considering only the subsets whose
153 // indices are in focus.
154 bool NextSolution(absl::Span<const SubsetIndex> focus);
155
156 // Same with a different set of costs.
157 bool NextSolution(absl::Span<const SubsetIndex> focus,
158 const SubsetCostVector& costs);
159
160 private:
161 // Same with a different set of costs, and the focus defined as a vector of
162 // Booleans. This is the actual implementation of NextSolution.
163 bool NextSolution(const SubsetBoolVector& in_focus,
164 const SubsetCostVector& costs);
165
166 // The data structure that will maintain the invariant for the model.
167 SetCoverInvariant* inv_;
168};
169
170// Solution generator based on the degree of elements.
171// The heuristic is the same as ElementDegreeSolutionGenerator, but the number
172// of uncovered elements for a subset is computed on-demand. In empirical
173// tests, this seems to be faster than ElementDegreeSolutionGenerator because
174// a very small percentage of need to be computed, and even fewer among them
175// need to be computed again later on.
176
177// Because the number of uncovered elements is computed on-demand, the
178// consistency level only needs to be set to kCostAndCoverage.
180 public:
182 : inv_(inv) {}
183
184 // Returns true if a solution was found.
185 // TODO(user): Add time-outs and exit with a partial solution.
186 bool NextSolution();
187
188 // Computes the next partial solution considering only the subsets whose
189 // indices are in focus.
190 bool NextSolution(absl::Span<const SubsetIndex> focus);
191
192 // Same with a different set of costs.
193 bool NextSolution(absl::Span<const SubsetIndex> focus,
194 const SubsetCostVector& costs);
195
196 private:
197 // Same with a different set of costs, and the focus defined as a vector of
198 // Booleans. This is the actual implementation of NextSolution.
199 bool NextSolution(const SubsetBoolVector& in_focus,
200 const SubsetCostVector& costs);
201
202 // The data structure that will maintain the invariant for the model.
203 SetCoverInvariant* inv_;
204};
205
206// Once we have a first solution to the problem, there may be (most often,
207// there are) elements in E that are covered several times. To decrease the
208// total cost, SteepestSearch tries to eliminate some redundant S_j's from
209// the solution or equivalently, to flip some x_j's from 1 to 0. the algorithm
210// gets its name because it goes in the steepest immediate direction, taking
211// the S_j with the largest total cost.
212
213// The consistency level is maintained up to kFreeAndUncovered.
215 public:
216 explicit SteepestSearch(SetCoverInvariant* inv) : inv_(inv) {}
217
218 // Returns true if a solution was found within num_iterations.
219 // TODO(user): Add time-outs and exit with a partial solution.
220 bool NextSolution(int num_iterations);
221
222 // Computes the next partial solution considering only the subsets whose
223 // indices are in focus.
224 bool NextSolution(absl::Span<const SubsetIndex> focus, int num_iterations);
225
226 // Same as above, with a different set of costs.
227 bool NextSolution(absl::Span<const SubsetIndex> focus,
228 const SubsetCostVector& costs, int num_iterations);
229
230 private:
231 // Same with a different set of costs, and the focus defined as a vector of
232 // Booleans. This is the actual implementation of NextSolution.
233 bool NextSolution(const SubsetBoolVector& in_focus,
234 const SubsetCostVector& costs, int num_iterations);
235
236 // Updates the priorities on the impacted_subsets.
237 void UpdatePriorities(absl::Span<const SubsetIndex> impacted_subsets);
238
239 // The data structure that will maintain the invariant for the model.
240 SetCoverInvariant* inv_;
241};
242
243// A Tabu list is a fixed-sized set with FIFO replacement. It is expected to
244// be of small size, usually a few dozens of elements.
245template <typename T>
246class TabuList {
247 public:
248 explicit TabuList(T size) : array_(0), fill_(0), index_(0) {
249 array_.resize(size.value(), T(-1));
250 }
251
252 // Returns the size of the array.
253 int size() const { return array_.size(); }
254
255 // Initializes the array of the Tabu list.
256 void Init(int size) {
257 array_.resize(size, T(-1));
258 fill_ = 0;
259 index_ = 0;
260 }
261
262 // Adds t to the array. When the end of the array is reached, re-start at 0.
263 void Add(T t) {
264 const int size = array_.size();
265 array_[index_] = t;
266 ++index_;
267 if (index_ >= size) {
268 index_ = 0;
269 }
270 if (fill_ < size) {
271 ++fill_;
272 }
273 }
274
275 // Returns true if t is in the array. This is O(size), but small.
276 bool Contains(T t) const {
277 for (int i = 0; i < fill_; ++i) {
278 if (t == array_[i]) {
279 return true;
280 }
281 }
282 return false;
283 }
284
285 private:
286 std::vector<T> array_;
287 int fill_;
288 int index_;
289};
290
291// As usual and well-known with local search, SteepestSearch reaches a local
292// minimum. We therefore implement Guided Tabu Search, which is a crossover of
293// Guided Local Search and Tabu Search.
294//
295// Guided Local Search penalizes the parts of the solution that have been often
296// used. It behaves as a long-term memory which "learns" the most used
297// features and introduces some diversification in the search.
298//
299// C. Voudouris (1997) "Guided local search for combinatorial optimisation
300// problems", PhD Thesis, University of Essex, Colchester, UK, July, 1997.
301//
302// Tabu Search makes it possible to degrade the solution temporarily
303// by disallowing to go back for a certain time (changes are put in a "Tabu"
304// list).
305//
306// Tabu behaves like a short-term memory and is the intensification part of the
307// local search metaheuristic.
308//
309// F. Glover (1989) "Tabu Search – Part 1". ORSA Journal on Computing.
310// 1 (2):190–206. doi:10.1287/ijoc.1.3.190.
311// F. Glover (1990) "Tabu Search – Part 2". ORSA Journal on Computing.
312// 2 (1): 4–32. doi:10.1287/ijoc.2.1.4.
313
314// The consistency level is maintained up to kFreeAndUncovered.
316 public:
318 : inv_(inv),
319 lagrangian_factor_(kDefaultLagrangianFactor),
320 penalty_factor_(kDefaultPenaltyFactor),
321 epsilon_(kDefaultEpsilon),
322 augmented_costs_(),
323 times_penalized_(),
324 tabu_list_(SubsetIndex(kDefaultTabuListSize)) {
325 Initialize();
326 }
327
328 // Initializes the Guided Tabu Search algorithm.
329 void Initialize();
330
331 // Returns the next solution by running the Tabu Search algorithm for maximum
332 // num_iterations iterations.
333 bool NextSolution(int num_iterations);
334
335 // Computes the next partial solution considering only the subsets whose
336 // indices are in focus.
337 bool NextSolution(absl::Span<const SubsetIndex> focus, int num_iterations);
338
339 bool NextSolution(const SubsetBoolVector& in_focus, int num_iterations);
340
341 // TODO(user): re-introduce this is the code. It was used to favor
342 // subsets with the same marginal costs but that would cover more elements.
343 // But first, see if it makes sense to compute it.
344 void SetLagrangianFactor(double factor) { lagrangian_factor_ = factor; }
345 double GetLagrangianFactor() const { return lagrangian_factor_; }
346
347 void SetEpsilon(double r) { epsilon_ = r; }
348 double GetEpsilon() const { return epsilon_; }
349
350 // Setters and getters for the Guided Tabu Search algorithm parameters.
351 void SetPenaltyFactor(double factor) { penalty_factor_ = factor; }
352 double GetPenaltyFactor() const { return penalty_factor_; }
353
354 void SetTabuListSize(int size) { tabu_list_.Init(size); }
355 int GetTabuListSize() const { return tabu_list_.size(); }
356
357 private:
358 // Updates the penalties on the subsets in focus.
359 void UpdatePenalties(absl::Span<const SubsetIndex> focus);
360
361 // The data structure that will maintain the invariant for the model.
362 SetCoverInvariant* inv_;
363
364 // Search handling variables and default parameters.
365 static constexpr double kDefaultLagrangianFactor = 100.0;
366 double lagrangian_factor_;
367
368 // Guided local search-related data.
369 static constexpr double kPenaltyUpdateEpsilon = 1e-1;
370
371 // Guided Tabu Search parameters.
372 static constexpr double kDefaultPenaltyFactor = 0.3;
373 double penalty_factor_;
374
375 // Tabu Search parameters.
376 static constexpr double kDefaultEpsilon = 1e-6;
377 double epsilon_;
378
379 // Penalized costs for each subset as used in Guided Tabu Search.
380 SubsetCostVector augmented_costs_;
381
382 // The number of times each subset was penalized during Guided Tabu Search.
383 SubsetToIntVector times_penalized_;
384
385 // TODO(user): remove and use priority_queue.
386 // Utilities for the different subsets. They are updated ("penalized") costs.
387 SubsetCostVector utilities_;
388
389 // Tabu search-related data.
390 static constexpr int kDefaultTabuListSize = 17; // Nice prime number.
391 TabuList<SubsetIndex> tabu_list_;
392};
393
394// Guided Local Search penalizes the parts of the solution that have been often
395// used. It behaves as a long-term memory which "learns" the most used
396// features and introduces some diversification in the search.
397// At each iteration, the algorithm selects a subset from the focus with maximum
398// utility of penalization and penalizes it.
399
400// It has been observed that good values for the penalisation factor can be
401// found by dividing the value of the objective function of a local minimum
402// with the number of features present in it [1]. In our case, the penalisation
403// factor is the sum of the costs of the subsets selected in the focus divided
404// by the number of subsets in the focus times a tunable factor alpha_.
405// [1] C. Voudouris (1997) "Guided local search for combinatorial optimisation
406// problems", PhD Thesis, University of Essex, Colchester, UK, July, 1997.
407
408// The consistency level is maintained up to kRedundancy.
410 public:
412 : inv_(inv), epsilon_(kDefaultEpsilon), alpha_(kDefaultAlpha) {
413 Initialize();
414 }
415
416 // Initializes the Guided Local Search algorithm.
417 void Initialize();
418
419 // Returns the next solution by running the Guided Local algorithm for
420 // maximum num_iterations iterations.
421 bool NextSolution(int num_iterations);
422
423 // Computes the next partial solution considering only the subsets whose
424 // indices are in focus.
425 bool NextSolution(absl::Span<const SubsetIndex> focus, int num_iterations);
426
427 bool NextSolution(const SubsetBoolVector& in_focus, int num_iterations);
428
429 private:
430 // The data structure that will maintain the invariant for the model.
431 SetCoverInvariant* inv_;
432
433 // Setters and getters for the Guided Local Search algorithm parameters.
434 void SetEpsilon(double r) { epsilon_ = r; }
435
436 double GetEpsilon() const { return epsilon_; }
437
438 void SetAlpha(double r) { alpha_ = r; }
439
440 double GetAlpha() const { return alpha_; }
441
442 // The epsilon value for the Guided Local Search algorithm.
443 // Used to penalize the subsets within epsilon of the maximum utility.
444 static constexpr double kDefaultEpsilon = 1e-8;
445 double epsilon_;
446
447 // The alpha value for the Guided Local Search algorithm.
448 // Tunable factor used to penalize the subsets.
449 static constexpr double kDefaultAlpha = 0.5;
450 double alpha_;
451
452 // The penalization value for the Guided Local Search algorithm.
453 double penalization_factor_;
454
455 // The penalties of each feature during Guided Local Search.
456 SubsetToIntVector penalties_;
457
458 // Computes the delta of the cost of the solution if subset state changed.
459 Cost ComputeDelta(SubsetIndex subset) const;
460
461 // The priority heap used to select the subset with the maximum priority to be
462 // updated.
463 AdjustableKAryHeap<float, SubsetIndex::ValueType, 2, true> priority_heap_;
464
465 // The utility heap used to select the subset with the maximum utility to be
466 // penalized.
467 AdjustableKAryHeap<float, SubsetIndex::ValueType, 2, true> utility_heap_;
468};
469
470// Randomly clears a proportion num_subsets variables in the solution.
471// Returns a list of subset indices to be potentially reused as a focus.
472// Randomly clears at least num_subsets variables in the
473// solution. There can be more than num_subsets variables cleared because the
474// intersecting subsets are also removed from the solution. Returns a list of
475// subset indices that can be reused as a focus.
476
477// The consistency level is maintained up to kCostAndCoverage.
478std::vector<SubsetIndex> ClearRandomSubsets(BaseInt num_subsets,
479 SetCoverInvariant* inv);
480
481// Same as above, but clears the subset indices in focus.
482std::vector<SubsetIndex> ClearRandomSubsets(absl::Span<const SubsetIndex> focus,
483 BaseInt num_subsets,
484 SetCoverInvariant* inv);
485
486// Clears the variables (subsets) that cover the most covered elements. This is
487// capped by num_subsets. If the cap is reached, the subsets are chosen
488// randomly.
489// Returns the list of the chosen subset indices.
490// This indices can then be used ax a focus.
491
492// The consistency level is maintained up to kCostAndCoverage.
493std::vector<SubsetIndex> ClearMostCoveredElements(BaseInt num_subsets,
494 SetCoverInvariant* inv);
495
496// Same as above, but clears the subset indices in focus.
497std::vector<SubsetIndex> ClearMostCoveredElements(
498 absl::Span<const SubsetIndex> focus, BaseInt num_subsets,
499 SetCoverInvariant* inv);
500} // namespace operations_research
501
502#endif // OR_TOOLS_ALGORITHMS_SET_COVER_HEURISTICS_H_
bool NextSolution(const SubsetBoolVector &in_focus, int num_iterations)
void Initialize()
Initializes the Guided Local Search algorithm.
void Initialize()
Initializes the Guided Tabu Search algorithm.
void SetPenaltyFactor(double factor)
Setters and getters for the Guided Tabu Search algorithm parameters.
bool NextSolution(const SubsetBoolVector &in_focus, int num_iterations)
bool NextSolution()
Returns true if a solution was found.
void Init(int size)
Initializes the array of the Tabu list.
bool Contains(T t) const
Returns true if t is in the array. This is O(size), but small.
int size() const
Returns the size of the array.
void Add(T t)
Adds t to the array. When the end of the array is reached, re-start at 0.
In SWIG mode, we don't want anything besides these top-level includes.
util_intops::StrongVector< SubsetIndex, Cost > SubsetCostVector
std::vector< SubsetIndex > ClearRandomSubsets(BaseInt num_subsets, SetCoverInvariant *inv)
The consistency level is maintained up to kCostAndCoverage.
util_intops::StrongVector< SubsetIndex, BaseInt > SubsetToIntVector
std::vector< SubsetIndex > ClearMostCoveredElements(BaseInt max_num_subsets, SetCoverInvariant *inv)
The consistency level is maintained up to kCostAndCoverage.
util_intops::StrongVector< SubsetIndex, bool > SubsetBoolVector
double Cost
Basic non-strict type for cost. The speed penalty for using double is ~2%.