Google OR-Tools v9.12
a fast and portable software suite for combinatorial optimization
Loading...
Searching...
No Matches
set_cover_model.h
Go to the documentation of this file.
1// Copyright 2010-2025 Google LLC
2// Licensed under the Apache License, Version 2.0 (the "License");
3// you may not use this file except in compliance with the License.
4// You may obtain a copy of the License at
5//
6// http://www.apache.org/licenses/LICENSE-2.0
7//
8// Unless required by applicable law or agreed to in writing, software
9// distributed under the License is distributed on an "AS IS" BASIS,
10// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
11// See the License for the specific language governing permissions and
12// limitations under the License.
13
14#ifndef OR_TOOLS_ALGORITHMS_SET_COVER_MODEL_H_
15#define OR_TOOLS_ALGORITHMS_SET_COVER_MODEL_H_
16
17#include <cstdint>
18#include <string>
19#include <vector>
20
21#include "absl/log/check.h"
22#include "absl/strings/str_cat.h"
23#include "ortools/algorithms/set_cover.pb.h"
26
27// Representation class for the weighted set-covering problem.
28//
29// Let E be a "universe" set, let (S_j) be a family (j in J) of subsets of E,
30// and c_j costs associated to each S_j. Note that J = {j in 1..|S|}.
31//
32// The minimum-cost set-covering problem consists in finding K (for covering),
33// a subset of J such that the union of all the S_j for k in K is equal to E
34// (the subsets indexed by K "cover" E), while minimizing total cost sum c_k (k
35// in K).
36//
37// In Mixed-Integer Programming and matrix terms, the goal is to find values
38// of binary variables x_j, where x_j is 1 when subset S_j is in K, 0
39// otherwise, that minimize the sum of c_j * x_j subject to M.x >= 1. Each row
40// corresponds to an element in E.
41//
42// The matrix M for linear constraints is defined as follows:
43// - it has as many rows as there are elements in E.
44// - its columns are such that M(i, j) = 1 iff the i-th element of E is present
45// in S_j.
46//
47// We also use m to denote |E|, the number of elements, and n to denote |S|, the
48// number of subsets.
49// Finally, NNZ denotes the numbers of non-zeros, i.e. the sum of the
50// cardinalities of all the subsets.
51
52namespace operations_research {
53// Basic non-strict type for cost. The speed penalty for using double is ~2%.
54using Cost = double;
55
56// Base non-strict integer type for counting elements and subsets.
57// Using ints makes it possible to represent problems with more than 2 billion
58// (2e9) elements and subsets. If need arises one day, BaseInt can be split
59// into SubsetBaseInt and ElementBaseInt.
60// Quick testing has shown a slowdown of about 20-25% when using int64_t.
61using BaseInt = int32_t;
62
63// We make heavy use of strong typing to avoid obvious mistakes.
64// Subset index.
66
67// Element index.
69
70// Position in a vector. The vector may either represent a column, i.e. a
71// subset with all its elements, or a row, i,e. the list of subsets which
72// contain a given element.
75
79
82
85
88
89// Views of the sparse vectors. These need not be aligned as it's their contents
90// that need to be aligned.
93
96
97// Useful for representing permutations,
102
103// Main class for describing a weighted set-covering problem.
105 public:
106 // Constructs an empty weighted set-covering problem.
108 : num_elements_(0),
109 num_subsets_(0),
110 num_nonzeros_(0),
111 row_view_is_valid_(false),
112 elements_in_subsets_are_sorted_(false),
113 subset_costs_(),
114 columns_(),
115 rows_(),
116 all_subsets_() {}
117
118 // Constructs a weighted set-covering problem from a seed model, with
119 // num_elements elements and num_subsets subsets.
120 // - The distributions of the degrees of the elements and the cardinalities of
121 // the subsets are based on those of the seed model. They are scaled
122 // affininely by row_scale and column_scale respectively.
123 // - By affine scaling, we mean that the minimum value of the distribution is
124 // not scaled, but the variation above this minimum value is.
125 // - For a given subset with a given cardinality in the generated model, its
126 // elements are sampled from the distribution of the degrees as computed
127 // above.
128 // - The costs of the subsets in the new model are sampled from the
129 // distribution of the costs of the subsets in the seed model, scaled by
130 // cost_scale.
131 // IMPORTANT NOTICE: The algorithm may not succeed in generating a model
132 // where all the elements can be covered. In that case, the model will be
133 // empty.
134
135 static SetCoverModel GenerateRandomModelFrom(const SetCoverModel& seed_model,
138 double row_scale,
139 double column_scale,
140 double cost_scale);
141
142 // Returns true if the model is empty, i.e. has no elements, no subsets, and
143 // no nonzeros.
144 bool IsEmpty() const { return rows_.empty() || columns_.empty(); }
145
146 // Current number of elements to be covered in the model, i.e. the number of
147 // elements in S. In matrix terms, this is the number of rows.
148 BaseInt num_elements() const { return num_elements_; }
149
150 // Current number of subsets in the model. In matrix terms, this is the
151 // number of columns.
152 BaseInt num_subsets() const { return num_subsets_; }
153
154 // Current number of nonzeros in the matrix.
155 int64_t num_nonzeros() const { return num_nonzeros_; }
156
157 double FillRate() const {
158 return 1.0 * num_nonzeros() / (1.0 * num_elements() * num_subsets());
159 }
160
161 // Vector of costs for each subset.
162 const SubsetCostVector& subset_costs() const { return subset_costs_; }
163
164 // Column view of the set covering problem.
165 const SparseColumnView& columns() const { return columns_; }
166
167 // Row view of the set covering problem.
168 const SparseRowView& rows() const {
169 DCHECK(row_view_is_valid_);
170 return rows_;
171 }
172
173 // Returns true if rows_ and columns_ represent the same problem.
174 bool row_view_is_valid() const { return row_view_is_valid_; }
175
176 // Access to the ranges of subsets and elements.
180
183 ElementIndex(num_elements_));
184 }
185
186 // Returns the list of indices for all the subsets in the model.
187 std::vector<SubsetIndex> all_subsets() const { return all_subsets_; }
188
189 // Adds an empty subset with a cost to the problem. In matrix terms, this
190 // adds a column to the matrix.
191 void AddEmptySubset(Cost cost);
192
193 // Adds an element to the last subset created. In matrix terms, this adds a
194 // 1 on row 'element' of the current last column of the matrix.
195 void AddElementToLastSubset(BaseInt element);
196 void AddElementToLastSubset(ElementIndex element);
197
198 // Sets 'cost' to an already existing 'subset'.
199 // This will CHECK-fail if cost is infinite or a NaN.
200 void SetSubsetCost(BaseInt subset, Cost cost);
201 void SetSubsetCost(SubsetIndex subset, Cost cost);
202
203 // Adds 'element' to an already existing 'subset'.
204 // No check is done if element is already in the subset.
205 void AddElementToSubset(BaseInt element, BaseInt subset);
206 void AddElementToSubset(ElementIndex element, SubsetIndex subset);
207
208 // Sorts the elements in each subset. Should be called before exporting the
209 // model to a proto.
211
212 // Creates the sparse ("dual") representation of the problem. This also sorts
213 // the elements in each subset.
214 void CreateSparseRowView();
215
216 // Returns true if the problem is feasible, i.e. if the subsets cover all
217 // the elements.
218 bool ComputeFeasibility() const;
219
220 // Reserves num_subsets columns in the model.
222 void ReserveNumSubsets(SubsetIndex num_subsets);
223
224 // Reserves num_elements rows in the column indexed by subset.
227 SubsetIndex subset);
228
229 // Returns the model as a SetCoverProto. Note that the elements of each subset
230 // are sorted locally before being exported to the proto. This is done to
231 // ensure that the proto is deterministic. The function is const because it
232 // does not modify the model. Therefore, the model as exported by this
233 // function may be different from the initial model.
234 SetCoverProto ExportModelAsProto() const;
235
236 // Imports the model from a SetCoverProto.
237 void ImportModelFromProto(const SetCoverProto& message);
238
239 // A struct enabling to show basic statistics on rows and columns.
240 // The meaning of the fields is obvious.
241 struct Stats {
242 double min;
243 double max;
244 double median;
245 double mean;
246 double stddev;
247
248 std::string DebugString() const {
249 return absl::StrCat("min = ", min, ", max = ", max, ", mean = ", mean,
250 ", median = ", median, ", stddev = ", stddev, ", ");
251 }
252 };
253
254 // Computes basic statistics on costs and returns a Stats structure.
255 Stats ComputeCostStats();
256
257 // Computes basic statistics on rows and returns a Stats structure.
258 Stats ComputeRowStats();
259
260 // Computes basic statistics on columns and returns a Stats structure.
261 Stats ComputeColumnStats();
262
263 // Computes deciles on rows and returns a vector of deciles.
264 std::vector<int64_t> ComputeRowDeciles() const;
265
266 // Computes deciles on columns and returns a vector of deciles.
267 std::vector<int64_t> ComputeColumnDeciles() const;
268
269 // Computes basic statistics on the deltas of the row and column elements and
270 // returns a Stats structure. The deltas are computed as the difference
271 // between two consecutive indices in rows or columns. The number of bytes
272 // computed is meant using a variable-length base-128 encoding.
273 // TODO(user): actually use this to compress the rows and columns.
274 Stats ComputeRowDeltaSizeStats() const;
275 Stats ComputeColumnDeltaSizeStats() const;
276
277 private:
278 // Updates the all_subsets_ vector so that it always contains 0 to
279 // columns.size() - 1
280 void UpdateAllSubsetsList();
281
282 // Number of elements.
283 BaseInt num_elements_;
284
285 // Number of subsets. Maintained for ease of access.
286 BaseInt num_subsets_;
287
288 // Number of nonzeros in the matrix. The value is an int64_t because there can
289 // be more than 1 << 31 nonzeros even with BaseInt = int32_t.
290 int64_t num_nonzeros_;
291
292 // True when the SparseRowView is up-to-date.
293 bool row_view_is_valid_;
294
295 // True when the elements in each subset are sorted.
296 bool elements_in_subsets_are_sorted_;
297
298 // Costs for each subset.
299 SubsetCostVector subset_costs_;
300
301 // Vector of columns. Each column corresponds to a subset and contains the
302 // elements of the given subset.
303 // This takes NNZ (number of non-zeros) BaseInts, or |E| * |S| * fill_rate.
304 // On classical benchmarks, the fill rate is in the 2 to 5% range.
305 // Some synthetic benchmarks have fill rates of 20%, while benchmarks for
306 // rail rotations have a fill rate of 0.2 to 0.4%.
307 // TODO(user): try using a compressed representation like VarInt or LEB128,
308 // since the data is only iterated upon.
309 SparseColumnView columns_;
310
311 // Vector of rows. Each row corresponds to an element and contains the
312 // subsets containing the element.
313 // The size is exactly the same as for columns_.
314 SparseRowView rows_;
315
316 // Vector of indices from 0 to columns.size() - 1. (Like std::iota, but built
317 // incrementally.) Used to (un)focus optimization algorithms on the complete
318 // problem.
319 // This takes |S| BaseInts.
320 // TODO(user): use this to enable deletion and recycling of columns/subsets.
321 // TODO(user): replace this with an iterator?
322 std::vector<SubsetIndex> all_subsets_;
323};
324
325// The IntersectingSubsetsIterator is a forward iterator that returns the next
326// intersecting subset for a fixed seed_subset.
327// The iterator is initialized with a model and a seed_subset and
328// allows a speedup in getting the intersecting subsets
329// by not storing them in memory.
330// The iterator is at the end when the last intersecting subset has been
331// returned.
332// TODO(user): Add the possibility for range-for loops.
334 public:
336 SubsetIndex seed_subset)
337 : intersecting_subset_(-1),
338 element_entry_(0),
339 subset_entry_(0),
340 seed_subset_(seed_subset),
341 model_(model),
342 subset_seen_(model_.columns().size(), false) {
343 CHECK(model_.row_view_is_valid());
344 subset_seen_[seed_subset] = true; // Avoid iterating on `seed_subset`.
345 ++(*this); // Move to the first intersecting subset.
346 }
347
348 // Returns (true) whether the iterator is at the end.
349 bool at_end() const {
350 return element_entry_.value() == model_.columns()[seed_subset_].size();
351 }
352
353 // Returns the intersecting subset.
354 SubsetIndex operator*() const { return intersecting_subset_; }
355
356 // Move the iterator to the next intersecting subset.
358 DCHECK(model_.row_view_is_valid());
359 DCHECK(!at_end());
360 const SparseRowView& rows = model_.rows();
361 const SparseColumn& column = model_.columns()[seed_subset_];
362 for (; element_entry_ < ColumnEntryIndex(column.size()); ++element_entry_) {
363 const ElementIndex current_element = column[element_entry_];
364 const SparseRow& current_row = rows[current_element];
365 for (; subset_entry_ < RowEntryIndex(current_row.size());
366 ++subset_entry_) {
367 intersecting_subset_ = current_row[subset_entry_];
368 if (!subset_seen_[intersecting_subset_]) {
369 subset_seen_[intersecting_subset_] = true;
370 return *this;
371 }
372 }
373 subset_entry_ = RowEntryIndex(0); // 'carriage-return'
374 }
375 return *this;
376 }
377
378 private:
379 // The intersecting subset.
380 SubsetIndex intersecting_subset_;
381
382 // The position of the entry in the column corresponding to `seed_subset_`.
383 ColumnEntryIndex element_entry_;
384
385 // The position of the entry in the row corresponding to `element_entry`.
386 RowEntryIndex subset_entry_;
387
388 // The seed subset.
389 SubsetIndex seed_subset_;
390
391 // The model to which the iterator is applying.
392 const SetCoverModel& model_;
393
394 // A vector of Booleans indicating whether the current subset has been
395 // already seen by the iterator.
396 SubsetBoolVector subset_seen_;
397};
398
399} // namespace operations_research
400
401#endif // OR_TOOLS_ALGORITHMS_SET_COVER_MODEL_H_
bool at_end() const
Returns (true) whether the iterator is at the end.
SubsetIndex operator*() const
Returns the intersecting subset.
IntersectingSubsetsIterator & operator++()
Move the iterator to the next intersecting subset.
IntersectingSubsetsIterator(const SetCoverModel &model, SubsetIndex seed_subset)
Main class for describing a weighted set-covering problem.
void ImportModelFromProto(const SetCoverProto &message)
Imports the model from a SetCoverProto.
static SetCoverModel GenerateRandomModelFrom(const SetCoverModel &seed_model, BaseInt num_elements, BaseInt num_subsets, double row_scale, double column_scale, double cost_scale)
void ReserveNumElementsInSubset(BaseInt num_elements, BaseInt subset)
Reserves num_elements rows in the column indexed by subset.
Stats ComputeColumnStats()
Computes basic statistics on columns and returns a Stats structure.
std::vector< int64_t > ComputeRowDeciles() const
Computes deciles on rows and returns a vector of deciles.
util_intops::StrongIntRange< ElementIndex > ElementRange() const
void AddElementToLastSubset(BaseInt element)
Stats ComputeCostStats()
Computes basic statistics on costs and returns a Stats structure.
Stats ComputeRowStats()
Computes basic statistics on rows and returns a Stats structure.
const SparseRowView & rows() const
Row view of the set covering problem.
void ReserveNumSubsets(BaseInt num_subsets)
Reserves num_subsets columns in the model.
int64_t num_nonzeros() const
Current number of nonzeros in the matrix.
util_intops::StrongIntRange< SubsetIndex > SubsetRange() const
Access to the ranges of subsets and elements.
void AddElementToSubset(BaseInt element, BaseInt subset)
void SetSubsetCost(BaseInt subset, Cost cost)
bool row_view_is_valid() const
Returns true if rows_ and columns_ represent the same problem.
const SubsetCostVector & subset_costs() const
Vector of costs for each subset.
SetCoverProto ExportModelAsProto() const
SetCoverModel()
Constructs an empty weighted set-covering problem.
std::vector< int64_t > ComputeColumnDeciles() const
Computes deciles on columns and returns a vector of deciles.
const SparseColumnView & columns() const
Column view of the set covering problem.
std::vector< SubsetIndex > all_subsets() const
Returns the list of indices for all the subsets in the model.
In SWIG mode, we don't want anything besides these top-level includes.
util_intops::StrongVector< RowEntryIndex, SubsetIndex > SparseRow
util_intops::StrongVector< SubsetIndex, Cost > SubsetCostVector
util_intops::StrongVector< SubsetIndex, BaseInt > SubsetToIntVector
util_intops::StrongIntRange< SubsetIndex > SubsetRange
util_intops::StrongVector< SubsetIndex, SubsetIndex > SubsetToSubsetVector
util_intops::StrongVector< ElementIndex, Cost > ElementCostVector
util_intops::StrongVector< ElementIndex, ElementIndex > ElementToElementVector
Useful for representing permutations,.
util_intops::StrongIntRange< ColumnEntryIndex > ColumnEntryRange
util_intops::StrongVector< ElementIndex, BaseInt > ElementToIntVector
util_intops::StrongVector< SubsetIndex, bool > SubsetBoolVector
util_intops::StrongVector< SubsetIndex, SparseColumn > SparseColumnView
util_intops::StrongVector< ElementIndex, bool > ElementBoolVector
double Cost
Basic non-strict type for cost. The speed penalty for using double is ~2%.
util_intops::StrongIntRange< ElementIndex > ElementRange
util_intops::StrongVector< ColumnEntryIndex, ElementIndex > SparseColumn
util_intops::StrongVector< ElementIndex, SparseRow > SparseRowView
#define DEFINE_STRONG_INT_TYPE(int_type_name, value_type)
Definition strong_int.h:168