Google OR-Tools v9.11
a fast and portable software suite for combinatorial optimization
Loading...
Searching...
No Matches
set_cover_heuristics.h
Go to the documentation of this file.
1// Copyright 2010-2024 Google LLC
2// Licensed under the Apache License, Version 2.0 (the "License");
3// you may not use this file except in compliance with the License.
4// You may obtain a copy of the License at
5//
6// http://www.apache.org/licenses/LICENSE-2.0
7//
8// Unless required by applicable law or agreed to in writing, software
9// distributed under the License is distributed on an "AS IS" BASIS,
10// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
11// See the License for the specific language governing permissions and
12// limitations under the License.
13
14#ifndef OR_TOOLS_ALGORITHMS_SET_COVER_HEURISTICS_H_
15#define OR_TOOLS_ALGORITHMS_SET_COVER_HEURISTICS_H_
16
17#include <cstddef>
18#include <vector>
19
20#include "absl/base/nullability.h"
21#include "absl/types/span.h"
25
26namespace operations_research {
27
28// Priority aggregate for the subset priority queue.
30 public:
31 using Index = SubsetIndex::ValueType;
32 using Priority = float;
33 SubsetIndexWithPriority() : index_(-1), priority_(0) {}
36 Priority priority() const { return priority_; }
37 Index index() const { return index_; }
38 inline bool operator<(const SubsetIndexWithPriority other) const {
39 if (other.priority() != priority()) {
40 return priority() < other.priority();
41 }
42 return index() < other.index();
43 }
44
45 private:
46 Index index_;
47 Priority priority_;
48};
49
50// Solver classes for the weighted set covering problem.
51//
52// The solution procedure is based on the general scheme known as local search.
53// Once a solution exists, it is improved by modifying it slightly, for example
54// by flipping a binary variable, so as to minimize the cost.
55// But first, we have to generate a first solution that is as good as possible.
56//
57// The first solution is then improved by using local search descent, which
58// eliminates the S_j's that have no interest in the solution.
59//
60// A mix of the guided local search (GLS) and Tabu Search (TS) metaheuristic
61// is also provided.
62//
63// The term 'focus' hereafter means a subset of the S_j's designated by their
64// indices. Focus make it possible to run the algorithms on the corresponding
65// subproblems.
66//
67// TODO(user): make the different algorithms concurrent, solving independent
68// subproblems in different threads.
69//
70
71// The preprocessor finds the elements that can only be covered by one subset.
72// Obviously, such subsets which are the only ones that can cover a given
73// element are chosen.
75 public:
76 explicit Preprocessor(absl::Nonnull<SetCoverInvariant*> inv)
77 : inv_(inv), num_columns_fixed_by_singleton_row_(0) {}
78
79 // Returns true if a solution was found.
80 // TODO(user): Add time-outs and exit with a partial solution. This seems
81 // unlikely, though.
82 bool NextSolution();
83
84 // Computes the next partial solution considering only the subsets whose
85 // indices are in focus.
86 bool NextSolution(absl::Span<const SubsetIndex> focus);
87
88 // Returns the number of columns that are the only one for a given row.
90 return num_columns_fixed_by_singleton_row_;
91 }
92
93 private:
94 // The data structure that will maintain the invariant for the model.
96
97 // The number of columns that are the only one for a given row, i.e.
98 // the subsets that are unique in covering a particular element.
99 BaseInt num_columns_fixed_by_singleton_row_;
100};
101
102// An obvious idea is to take all the S_j's (or equivalently to set all the
103// x_j's to 1). It's a bit silly but fast, and we can improve on it later using
104// local search.
106 public:
107 explicit TrivialSolutionGenerator(SetCoverInvariant* inv) : inv_(inv) {}
108
109 // Returns true if a solution was found.
110 // TODO(user): Add time-outs and exit with a partial solution. This seems
111 // unlikely, though.
112 bool NextSolution();
113
114 // Computes the next partial solution considering only the subsets whose
115 // indices are in focus.
116 bool NextSolution(absl::Span<const SubsetIndex> focus);
117
118 private:
119 // The data structure that will maintain the invariant for the model.
120 SetCoverInvariant* inv_;
121};
122
123// A slightly more complicated but better way to compute a first solution is to
124// select columns randomly. Less silly than the previous one, and provides
125// much better results.
126// TODO(user): make it possible to use other random generators. Idea: bias the
127// generator towards the columns with the least marginal costs.
129 public:
130 explicit RandomSolutionGenerator(SetCoverInvariant* inv) : inv_(inv) {}
131
132 // Returns true if a solution was found.
133 bool NextSolution();
134
135 // Computes the next partial solution considering only the subsets whose
136 // indices are in focus.
137 bool NextSolution(const std::vector<SubsetIndex>& focus);
138
139 private:
140 // The data structure that will maintain the invariant for the model.
141 SetCoverInvariant* inv_;
142};
143
144// The first solution is obtained using the Chvatal heuristic, that guarantees
145// that the solution is at most 1 + log(n) times the optimal value.
146// Vasek Chvatal, 1979. A greedy heuristic for the set-covering problem.
147// Mathematics of Operations Research, 4(3):233-235, 1979.
148// http://www.jstor.org/stable/3689577
149//
150// Chvatal's heuristic works as follows: Choose the subset that covers as many
151// remaining uncovered elements as possible for the least possible cost per
152// element and iterate.
153//
154// The following papers dive into the details of this class of algorithms.
155//
156// Young, Neal E. 2008. “Greedy Set-Cover Algorithms.” In Encyclopedia of
157// Algorithms, 379–81. Boston, MA: Springer US. Draft at:
158// http://www.cs.ucr.edu/~neal/non_arxiv/Young08SetCover.pdf
159//
160// Cormode, Graham, Howard Karloff, and Anthony Wirth. 2010. “Set Cover
161// Algorithms for Very Large Datasets.” In CIKM ’10. ACM Press.
162// https://doi.org/10.1145/1871437.1871501.
163
165 public:
166 explicit GreedySolutionGenerator(SetCoverInvariant* inv) : inv_(inv) {}
167
168 // Returns true if a solution was found.
169 // TODO(user): Add time-outs and exit with a partial solution.
170 bool NextSolution();
171
172 // Computes the next partial solution considering only the subsets whose
173 // indices are in focus.
174 bool NextSolution(const std::vector<SubsetIndex>& focus);
175
176 // Same with a different set of costs.
177 bool NextSolution(const std::vector<SubsetIndex>& focus,
178 const SubsetCostVector& costs);
179
180 private:
181 // The data structure that will maintain the invariant for the model.
182 SetCoverInvariant* inv_;
183};
184
185// Solution generator based on the degree of elements.
186// The degree of an element is the number of subsets covering it.
187// The generator consists in iteratively choosing a non-covered element with the
188// smallest degree, and selecting a subset that covers it with the least cost.
189// The newly-covered elements degree are also updated.
191 public:
193
194 // Returns true if a solution was found.
195 // TODO(user): Add time-outs and exit with a partial solution.
196 bool NextSolution();
197
198 // Computes the next partial solution considering only the subsets whose
199 // indices are in focus.
200 bool NextSolution(absl::Span<const SubsetIndex> focus);
201
202 // Same with a different set of costs.
203 bool NextSolution(absl::Span<const SubsetIndex> focus,
204 const SubsetCostVector& costs);
205
206 private:
207 // Same with a different set of costs, and the focus defined as a vector of
208 // Booleans. This is the actual implementation of NextSolution.
209 bool NextSolution(const SubsetBoolVector& in_focus,
210 const SubsetCostVector& costs);
211
212 // The data structure that will maintain the invariant for the model.
213 SetCoverInvariant* inv_;
214};
215
216// Once we have a first solution to the problem, there may be (most often,
217// there are) elements in E that are covered several times. To decrease the
218// total cost, SteepestSearch tries to eliminate some redundant S_j's from
219// the solution or equivalently, to flip some x_j's from 1 to 0. the algorithm
220// gets its name because it goes in the steepest immediate direction, taking
221// the S_j with the largest total cost.
223 public:
224 explicit SteepestSearch(SetCoverInvariant* inv) : inv_(inv) {}
225
226 // Returns true if a solution was found within num_iterations.
227 // TODO(user): Add time-outs and exit with a partial solution.
228 bool NextSolution(int num_iterations);
229
230 // Computes the next partial solution considering only the subsets whose
231 // indices are in focus.
232 bool NextSolution(absl::Span<const SubsetIndex> focus, int num_iterations);
233
234 // Same as above, with a different set of costs.
235 bool NextSolution(absl::Span<const SubsetIndex> focus,
236 const SubsetCostVector& costs, int num_iterations);
237
238 private:
239 // Same with a different set of costs, and the focus defined as a vector of
240 // Booleans. This is the actual implementation of NextSolution.
241 bool NextSolution(const SubsetBoolVector& in_focus,
242 const SubsetCostVector& costs, int num_iterations);
243
244 // Updates the priorities on the impacted_subsets.
245 void UpdatePriorities(absl::Span<const SubsetIndex> impacted_subsets);
246
247 // The data structure that will maintain the invariant for the model.
248 SetCoverInvariant* inv_;
249};
250
251// A Tabu list is a fixed-sized set with FIFO replacement. It is expected to
252// be of small size, usually a few dozens of elements.
253template <typename T>
254class TabuList {
255 public:
256 explicit TabuList(T size) : array_(0), fill_(0), index_(0) {
257 array_.resize(size.value(), T(-1));
258 }
259
260 // Returns the size of the array.
261 int size() const { return array_.size(); }
262
263 // Initializes the array of the Tabu list.
264 void Init(int size) {
265 array_.resize(size, T(-1));
266 fill_ = 0;
267 index_ = 0;
268 }
269
270 // Adds t to the array. When the end of the array is reached, re-start at 0.
271 void Add(T t) {
272 const int size = array_.size();
273 array_[index_] = t;
274 ++index_;
275 if (index_ >= size) {
276 index_ = 0;
277 }
278 if (fill_ < size) {
279 ++fill_;
280 }
281 }
282
283 // Returns true if t is in the array. This is O(size), but small.
284 bool Contains(T t) const {
285 for (int i = 0; i < fill_; ++i) {
286 if (t == array_[i]) {
287 return true;
288 }
289 }
290 return false;
291 }
292
293 private:
294 std::vector<T> array_;
295 int fill_;
296 int index_;
297};
298
299// As usual and well-known with local search, SteepestSearch reaches a local
300// minimum. We therefore implement Guided Tabu Search, which is a crossover of
301// Guided Local Search and Tabu Search.
302//
303// Guided Local Search penalizes the parts of the solution that have been often
304// used. It behaves as a long-term memory which "learns" the most used
305// features and introduces some diversification in the search.
306//
307// C. Voudouris (1997) "Guided local search for combinatorial optimisation
308// problems", PhD Thesis, University of Essex, Colchester, UK, July, 1997.
309//
310// Tabu Search makes it possible to degrade the solution temporarily
311// by disallowing to go back for a certain time (changes are put in a "Tabu"
312// list).
313//
314// Tabu behaves like a short-term memory and is the intensification part of the
315// local search metaheuristic.
316//
317// F. Glover (1989) "Tabu Search – Part 1". ORSA Journal on Computing.
318// 1 (2):190–206. doi:10.1287/ijoc.1.3.190.
319// F. Glover (1990) "Tabu Search – Part 2". ORSA Journal on Computing.
320// 2 (1): 4–32. doi:10.1287/ijoc.2.1.4.
322 public:
324 : inv_(inv),
325 lagrangian_factor_(kDefaultLagrangianFactor),
326 penalty_factor_(kDefaultPenaltyFactor),
327 epsilon_(kDefaultEpsilon),
328 augmented_costs_(),
329 times_penalized_(),
330 tabu_list_(SubsetIndex(kDefaultTabuListSize)) {
331 Initialize();
332 }
333
334 // Initializes the Guided Tabu Search algorithm.
335 void Initialize();
336
337 // Returns the next solution by running the Tabu Search algorithm for maximum
338 // num_iterations iterations.
339 bool NextSolution(int num_iterations);
340
341 // Computes the next partial solution considering only the subsets whose
342 // indices are in focus.
343 bool NextSolution(absl::Span<const SubsetIndex> focus, int num_iterations);
344
345 bool NextSolution(const SubsetBoolVector& in_focus, int num_iterations);
346
347 // TODO(user): re-introduce this is the code. It was used to favor
348 // subsets with the same marginal costs but that would cover more elements.
349 // But first, see if it makes sense to compute it.
350 void SetLagrangianFactor(double factor) { lagrangian_factor_ = factor; }
351 double GetLagrangianFactor() const { return lagrangian_factor_; }
352
353 void SetEpsilon(double r) { epsilon_ = r; }
354 double GetEpsilon() const { return epsilon_; }
355
356 // Setters and getters for the Guided Tabu Search algorithm parameters.
357 void SetPenaltyFactor(double factor) { penalty_factor_ = factor; }
358 double GetPenaltyFactor() const { return penalty_factor_; }
359
360 void SetTabuListSize(int size) { tabu_list_.Init(size); }
361 int GetTabuListSize() const { return tabu_list_.size(); }
362
363 private:
364 // Updates the penalties on the subsets in focus.
365 void UpdatePenalties(absl::Span<const SubsetIndex> focus);
366
367 // The data structure that will maintain the invariant for the model.
368 SetCoverInvariant* inv_;
369
370 // Search handling variables and default parameters.
371 static constexpr double kDefaultLagrangianFactor = 100.0;
372 double lagrangian_factor_;
373
374 // Guided local search-related data.
375 static constexpr double kPenaltyUpdateEpsilon = 1e-1;
376
377 // Guided Tabu Search parameters.
378 static constexpr double kDefaultPenaltyFactor = 0.3;
379 double penalty_factor_;
380
381 // Tabu Search parameters.
382 static constexpr double kDefaultEpsilon = 1e-6;
383 double epsilon_;
384
385 // Penalized costs for each subset as used in Guided Tabu Search.
386 SubsetCostVector augmented_costs_;
387
388 // The number of times each subset was penalized during Guided Tabu Search.
389 SubsetToIntVector times_penalized_;
390
391 // TODO(user): remove and use priority_queue.
392 // Utilities for the different subsets. They are updated ("penalized") costs.
393 SubsetCostVector utilities_;
394
395 // Tabu search-related data.
396 static constexpr int kDefaultTabuListSize = 17; // Nice prime number.
397 TabuList<SubsetIndex> tabu_list_;
398};
399
400// Guided Local Search penalizes the parts of the solution that have been often
401// used. It behaves as a long-term memory which "learns" the most used
402// features and introduces some diversification in the search.
403// At each iteration, the algorithm selects a subset from the focus with maximum
404// utility of penalization and penalizes it.
405
406// It has been observed that good values for the penalisation factor can be
407// found by dividing the value of the objective function of a local minimum
408// with the number of features present in it [1]. In our case, the penalisation
409// factor is the sum of the costs of the subsets selected in the focus divided
410// by the number of subsets in the focus times a tunable factor alpha_.
411// [1] C. Voudouris (1997) "Guided local search for combinatorial optimisation
412// problems", PhD Thesis, University of Essex, Colchester, UK, July, 1997.
414 public:
416 : inv_(inv), epsilon_(kDefaultEpsilon), alpha_(kDefaultAlpha) {
417 Initialize();
418 }
419
420 // Initializes the Guided Local Search algorithm.
421 void Initialize();
422
423 // Returns the next solution by running the Guided Local algorithm for
424 // maximum num_iterations iterations.
425 bool NextSolution(int num_iterations);
426
427 // Computes the next partial solution considering only the subsets whose
428 // indices are in focus.
429 bool NextSolution(absl::Span<const SubsetIndex> focus, int num_iterations);
430
431 bool NextSolution(const SubsetBoolVector& in_focus, int num_iterations);
432
433 private:
434 // The data structure that will maintain the invariant for the model.
435 SetCoverInvariant* inv_;
436
437 // Setters and getters for the Guided Local Search algorithm parameters.
438 void SetEpsilon(double r) { epsilon_ = r; }
439
440 double GetEpsilon() const { return epsilon_; }
441
442 void SetAlpha(double r) { alpha_ = r; }
443
444 double GetAlpha() const { return alpha_; }
445
446 // The epsilon value for the Guided Local Search algorithm.
447 // Used to penalize the subsets within epsilon of the maximum utility.
448 static constexpr double kDefaultEpsilon = 1e-8;
449 double epsilon_;
450
451 // The alpha value for the Guided Local Search algorithm.
452 // Tunable factor used to penalize the subsets.
453 static constexpr double kDefaultAlpha = 0.5;
454 double alpha_;
455
456 // The penalization value for the Guided Local Search algorithm.
457 double penalization_factor_;
458
459 // The penalties of each feature during Guided Local Search.
460 SubsetToIntVector penalties_;
461
462 // Computes the delta of the cost of the solution if subset state changed.
463 Cost ComputeDelta(SubsetIndex subset) const;
464
465 // The priority heap used to select the subset with the maximum priority to be
466 // updated.
468
469 // The utility heap used to select the subset with the maximum utility to be
470 // penalized.
472};
473
474// Randomly clears a proportion num_subsets variables in the solution.
475// Returns a list of subset indices to be potentially reused as a focus.
476// Randomly clears at least num_subsets variables in the
477// solution. There can be more than num_subsets variables cleared because the
478// intersecting subsets are also removed from the solution. Returns a list of
479// subset indices that can be reused as a focus.
480std::vector<SubsetIndex> ClearRandomSubsets(std::size_t num_subsets,
481 SetCoverInvariant* inv);
482
483// Same as above, but clears the subset indices in focus.
484std::vector<SubsetIndex> ClearRandomSubsets(absl::Span<const SubsetIndex> focus,
485 std::size_t num_subsets,
486 SetCoverInvariant* inv);
487
488// Clears the variables (subsets) that cover the most covered elements. This is
489// capped by num_subsets. If the cap is reached, the subsets are chosen
490// randomly.
491// Returns the list of the chosen subset indices.
492// This indices can then be used ax a focus.
493std::vector<SubsetIndex> ClearMostCoveredElements(std::size_t num_subsets,
494 SetCoverInvariant* inv);
495
496// Same as above, but clears the subset indices in focus.
497std::vector<SubsetIndex> ClearMostCoveredElements(
498 absl::Span<const SubsetIndex> focus, std::size_t num_subsets,
499 SetCoverInvariant* inv);
500} // namespace operations_research
501
502#endif // OR_TOOLS_ALGORITHMS_SET_COVER_HEURISTICS_H_
IntegerValue size
bool NextSolution(const SubsetBoolVector &in_focus, int num_iterations)
void Initialize()
Initializes the Guided Local Search algorithm.
void Initialize()
Initializes the Guided Tabu Search algorithm.
void SetPenaltyFactor(double factor)
Setters and getters for the Guided Tabu Search algorithm parameters.
bool NextSolution(const SubsetBoolVector &in_focus, int num_iterations)
Preprocessor(absl::Nonnull< SetCoverInvariant * > inv)
int num_columns_fixed_by_singleton_row() const
Returns the number of columns that are the only one for a given row.
bool NextSolution()
Returns true if a solution was found.
Priority aggregate for the subset priority queue.
SubsetIndexWithPriority(Priority priority, Index index)
bool operator<(const SubsetIndexWithPriority other) const
void Init(int size)
Initializes the array of the Tabu list.
bool Contains(T t) const
Returns true if t is in the array. This is O(size), but small.
int size() const
Returns the size of the array.
void Add(T t)
Adds t to the array. When the end of the array is reached, re-start at 0.
In SWIG mode, we don't want anything besides these top-level includes.
std::vector< SubsetIndex > ClearMostCoveredElements(std::size_t max_num_subsets, SetCoverInvariant *inv)
util_intops::StrongVector< SubsetIndex, BaseInt, IntAllocator > SubsetToIntVector
std::vector< SubsetIndex > ClearRandomSubsets(std::size_t num_subsets, SetCoverInvariant *inv)
double Cost
Basic non-strict type for cost. The speed penalty for using double is ~2%.