Introduction to Sets & Maps
Authors: Darren Yao, Benjamin Qi, Allen Li, Jesse Choe, Nathan Gong
Maintaining collections of distinct elements/keys with sets and maps.
Prerequisites
Resources
Resource | Notes
---|---
IUSACO | module is based off this
CPH | covers similar material
C++
C++ contains two versions of sets and maps: one using sorting and the other using hashing.
Java
Java contains two versions of sets and maps: one using sorting and the other using hashing.
Python
Python has built-in sets and maps, but they do not store elements in sorted order.
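Since Python's built-in containers are not kept sorted, one common workaround is to sort on demand with `sorted()`. The following is a minimal sketch; the names `values`, `in_order`, and `ages` are made up for illustration. Note that each call to `sorted()` costs $\mathcal{O}(N \log N)$, so this is not a substitute for a true sorted set when sorted order is needed after every update.

```python
# A Python set has no guaranteed iteration order,
# so sort its elements explicitly when order matters.
values = {4, 1, 2}
in_order = sorted(values)  # builds a new list; the set itself is unchanged
print(in_order)  # [1, 2, 4]

# The same idea applies to dict keys.
ages = {"carol": 31, "alice": 25, "bob": 40}
for name in sorted(ages):
    print(name, ages[name])
```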
Sets
Focus Problem – try your best to solve this problem before continuing!
A set is a collection of elements that contains no duplicates.
C++
Unordered Sets
In unordered sets, elements are stored in an arbitrary order through hashing. Insertions, deletions, and searches are all $\mathcal{O}(1)$ on average (with a high constant factor). Unordered sets are implemented by std::unordered_set in the <unordered_set> header.
Some operations on an std::unordered_set named s include:
- s.insert(x), which adds the element x to s if not already present.
- s.erase(x), which removes the element x from s if present.
- s.count(x), which returns 1 if s contains x and 0 if it doesn't.
Unordered sets work with primitive types, but require a custom hash function for structures/classes like vectors and pairs.
Warning: Unordered Set Performance
unordered_set actually has $\mathcal{O}(N)$ worst-case behavior per operation, and a program that uses it times out on Distinct Numbers.
For more on unordered maps and sets, check out this module.
In any case, just default to using ordered sets in C++.
unordered_set<int> s;
s.insert(1);  // {1}
s.insert(4);  // {1, 4}
s.insert(2);  // {1, 4, 2}
s.insert(1);  // does nothing because 1's already in the set
cout << s.count(1) << endl;  // 1
s.erase(1);  // {4, 2}
cout << s.count(5) << endl;  // 0
s.erase(0);  // does nothing because 0 wasn't in the set
Sorted Sets
In sorted sets, the elements are kept in sorted order. Insertions, deletions, and searches are all $\mathcal{O}(\log N)$, where $N$ is the number of elements in the set. Sorted sets are implemented by std::set in the <set> header.
std::set includes all the essential operations that std::unordered_set has (including insertion, deletion, searches, etc.), but also some additional ones. Refer to the More Operations on Sorted Sets module for more detail.
You can iterate through a set in sorted order using a for-each loop.
set<int> s;
s.insert(1);  // [1]
s.insert(4);  // [1, 4]
s.insert(2);  // [1, 2, 4]
// Outputs 1, 2, and 4 on separate lines
for (int element : s) { cout << element << endl; }
Java
Unordered Sets
In unordered sets, elements are stored in an arbitrary order. Insertions, deletions, and searches are all $\mathcal{O}(1)$ on average (with a high constant factor). Unordered sets are implemented in Java by the HashSet class (which comes from the java.util library).
Some operations on a HashSet named set include:
- set.add(x), which adds the element x to set if not already present.
- set.remove(x), which removes the element x from set if present.
- set.contains(x), which checks whether set contains the element x.
A HashSet works out of the box with the object wrappers of primitive types (such as Integer) and with String, but custom classes must override hashCode() and equals() to be stored correctly.
Set<Integer> set = new HashSet<>();
set.add(1);  // {1}
set.add(4);  // {1, 4}
set.add(2);  // {1, 4, 2}
set.add(1);  // {1, 4, 2}
// the add method did nothing because 1 was already in the set
System.out.println(set.contains(1));  // true
set.remove(1);  // {4, 2}
System.out.println(set.contains(5));  // false
set.remove(0);  // does nothing because 0 wasn't in the set
Sorted Sets
In sorted sets, the elements are stored in sorted order. Insertions, deletions, and searches are all $\mathcal{O}(\log N)$, where $N$ is the number of elements in the set. Sorted sets are implemented in Java by the TreeSet class.
TreeSet includes all the operations that HashSet has, but also includes some additional ones. Refer to the More Operations on Sorted Sets module for more detail.
You can iterate through a TreeSet in sorted order using a for-each loop.
Set<Integer> set = new TreeSet<>();
set.add(1);  // {1}
set.add(4);  // {1, 4}
set.add(2);  // {1, 2, 4}
// Outputs 1, 2, and 4 on separate lines
for (int element : set) { System.out.println(element); }
Python
Python's built-in set uses hashing to support $\mathcal{O}(1)$ insertion, deletion, and searches. Some operations on a Python set named s include:
- s.add(x): adds element x to s if not already present
- s.remove(x): removes element x from s, raising a KeyError if x is not present
- s.discard(x): removes element x from s if present, and does nothing otherwise
- x in s: checks whether s contains the element x
s = set()
s.add(1)  # {1}
s.add(4)  # {1, 4}
s.add(2)  # {1, 4, 2}
s.add(1)  # {1, 4, 2}
# the add method did nothing because 1 was already in the set
print(1 in s)  # True
s.remove(1)  # {4, 2}
print(5 in s)  # False
s.discard(0)  # {4, 2}
# discard does nothing if the element does not exist;
# s.remove(0) would raise a KeyError here instead
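The difference between the two removal methods is easy to miss, so here is a small sketch that contrasts them. The variable `raised` is only there to record what happened.

```python
s = {1, 2, 4}

s.discard(0)  # 0 is absent: nothing happens

try:
    s.remove(0)  # 0 is absent: raises KeyError
    raised = False
except KeyError:
    raised = True

print(raised)  # True
print(s == {1, 2, 4})  # True, the set is unchanged
```

When it is not known whether the element is present, `discard` (or an explicit `x in s` check before `remove`) avoids the exception.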
Solution - Distinct Numbers
This problem asks us to calculate the number of distinct values in a given list.
Method 1 - Set
This is probably the easier of the two methods, but requires knowledge of sets. Because sets only store one copy of each value, we can insert all the numbers into a set, and then print out the size of the set.
C++
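#include <bits/stdc++.h>
using namespace std;

int main() {
	int n;
	cin >> n;
	set<int> distinctNumbers;
	for (int i = 0; i < n; i++) {
		int x;
		cin >> x;
		distinctNumbers.insert(x);  // duplicates are ignored
	}
	cout << distinctNumbers.size() << endl;
}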
Java
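// Source: Daniel
import java.io.*;
import java.util.*;

public class DistinctNumbers {
	public static void main(String[] args) throws IOException {
		// Kattio is assumed to be the guide's fast I/O helper class;
		// its definition is omitted here.
		Kattio io = new Kattio();
		int n = io.nextInt();
		Set<Integer> set = new HashSet<>();
		for (int i = 0; i < n; i++) { set.add(io.nextInt()); }
		io.println(set.size());
		io.close();
	}
}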
Python
n = int(input())  # unused
nums = [int(x) for x in input().split()]
distinct_nums = set(nums)
print(len(distinct_nums))
We can do this more efficiently by skipping the creation of the list and using a set comprehension directly:
n = int(input())  # unused
distinct_nums = {int(x) for x in input().split()}
print(len(distinct_nums))
Warning!
The solutions above do not receive full credit on CSES, because it is possible to construct test cases where Python sets and dicts are extremely slow. See this CF post for more information.
To fix this, we can use strs as keys instead of ints:
n = int(input())  # unused
distinct_nums = set(input().split())
print(len(distinct_nums))
Should I worry about anti-hash tests in USACO?
No - historically, no USACO problem has included an anti-hash test. However, these sorts of tests often appear in Codeforces, especially in educational rounds, where open hacking is allowed.
Method 2 - Sorting
Check out the solution involving sorting.
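The linked solution is not reproduced here, but the idea can be sketched as follows: after sorting, equal values sit next to each other, so it is enough to count the positions where a value differs from its predecessor. The function name `count_distinct_by_sorting` is made up for this illustration.

```python
def count_distinct_by_sorting(nums):
    """Count distinct values by sorting and comparing neighbors."""
    ordered = sorted(nums)
    distinct = 0
    for i, value in enumerate(ordered):
        # After sorting, equal values are adjacent, so a value is new
        # exactly when it differs from the one before it.
        if i == 0 or value != ordered[i - 1]:
            distinct += 1
    return distinct


print(count_distinct_by_sorting([2, 3, 2, 2, 3]))  # 2
```

This runs in $\mathcal{O}(N \log N)$ because of the sort and does not rely on hashing.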
Maps
Focus Problem – try your best to solve this problem before continuing!
A map is a set of entries, each consisting of a key and a value. In a map, all keys are required to be unique, but values can be repeated. Maps have three primary methods:
- one to add a specified key-value pairing
- one to retrieve the value for a given key
- one to remove a key-value pairing from the map
C++
In sorted maps, the pairs are sorted in order of key. Insertions, deletions, and searches are all $\mathcal{O}(\log N)$, where $N$ is the number of pairs in the map. In unordered maps, the pairs aren't kept in sorted order and insertions, deletions, and searches are all $\mathcal{O}(1)$ on average. Sorted maps are implemented with std::map and unordered maps are implemented with std::unordered_map.
Some operations on an std::map and std::unordered_map named m include:
- m[key], which returns a reference to the value associated with the key key.
  - If key is not present in the map, then the value associated with key is constructed using the default constructor of the value type. For example, if the value type is int, then calling m[key] for a key not within the map sets the value associated with that key to 0. As another example, if the value type is std::string, then calling m[key] for a key not within the map sets the value associated with that key to the empty string. More discussion regarding what happens in this case can be found here.
  - Alternatively, m.at(key) behaves the same as m[key] if key is contained within m but throws an exception otherwise.
  - m[key] = value will assign the value value to the key key.
- m.count(key), which returns the number of times the key is in the map (either one or zero), and therefore checks whether a key exists in the map.
- m.erase(key), which removes the map entry associated with the specified key if the key was present in the map.
map<int, int> m;
m[1] = 5;  // [(1, 5)]
m[3] = 14;  // [(1, 5); (3, 14)]
m[2] = 7;  // [(1, 5); (2, 7); (3, 14)]
m[0] = -1;  // [(0, -1); (1, 5); (2, 7); (3, 14)]
m.erase(2);  // [(0, -1); (1, 5); (3, 14)]
cout << m[1] << endl;  // 5
cout << m.count(7) << endl;  // 0
cout << m.count(1) << endl;  // 1
cout << m[2] << endl;  // 0
Java
In sorted maps, the pairs are sorted in order of key. Insertions, deletions, and searches are all $\mathcal{O}(\log N)$, where $N$ is the number of pairs in the map. In unordered maps, the pairs aren't kept in sorted order and insertions, deletions, and searches are all $\mathcal{O}(1)$ on average. Sorted maps are implemented with TreeMap and unordered maps are implemented with HashMap.
In both TreeMap and HashMap, the put(key, value) method assigns a value to a key and places the key and value pair into the map. The get(key) method returns the value associated with the key. The containsKey(key) method checks whether a key exists in the map. Lastly, remove(key) removes the map entry associated with the specified key.
Map<Integer, Integer> map = new TreeMap<Integer, Integer>();
map.put(1, 5);  // [(1, 5)]
map.put(3, 14);  // [(1, 5); (3, 14)]
map.put(2, 7);  // [(1, 5); (2, 7); (3, 14)]
map.remove(2);  // [(1, 5); (3, 14)]
System.out.println(map.get(1));  // 5
System.out.println(map.containsKey(7));  // false
System.out.println(map.containsKey(1));  // true
Python
Colloquially, maps are referred to as dicts in Python. They act as hash maps, so they have $\mathcal{O}(1)$ insertion, deletion, and searches.
d = {}
d[1] = 5  # {1: 5}
d[3] = 14  # {1: 5, 3: 14}
d[2] = 7  # {1: 5, 2: 7, 3: 14}
del d[2]  # {1: 5, 3: 14}
print(d[1])  # 5
print(7 in d)  # False
print(1 in d)  # True
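Unlike `m[key]` on a C++ map, indexing a dict with a missing key raises a `KeyError` rather than creating a default entry. The sketch below shows two standard-library ways to handle missing keys, `dict.get` with a fallback and `collections.defaultdict`. The sample data is made up for illustration.

```python
from collections import defaultdict

d = {1: 5, 3: 14}

# d[7] would raise KeyError; get() returns a fallback instead
# and does not insert the key.
print(d.get(7, 0))  # 0
print(7 in d)  # False

# Counting occurrences with defaultdict(int): a missing key starts at 0.
freq = defaultdict(int)
for x in [3, 1, 3, 3, 2]:
    freq[x] += 1
print(dict(freq))  # {3: 3, 1: 1, 2: 1}
```

Note that, like C++'s `m[key]`, reading `freq[x]` on a `defaultdict` does insert the key, so use `x in freq` when you only want to test for membership.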
Iterating Over Maps
C++
An std::map stores entries as pairs in the form {key, value}. To iterate over maps, you can use a for loop. The auto keyword suffices to iterate over any type of pair (here, auto substitutes for pair<const int, int>).
// Both of these output the same thing
for (const auto &x : m) { cout << x.first << " " << x.second << endl; }
for (auto x : m) { cout << x.first << " " << x.second << endl; }
The first method (iterating over const references) is generally preferred over the second because the second will make a copy of each element that it iterates over. Additionally, you can pass by reference when iterating over a map, allowing you to modify the values (but not the keys) of the pairs stored in the map:
for (auto &x : m) {
	x.second = 1234;  // Change all values to 1234
}
Java
To iterate over maps, you can use a for-each loop over the keys:
for (int k : m.keySet()) { System.out.println(k + " " + m.get(k)); }
You can also use a for-each loop over the entries:
for (Map.Entry<Integer, Integer> entry : m.entrySet()) {
	System.out.println(entry.getKey() + " " + entry.getValue());
}
It's also possible to change the values while iterating over the keys (or over the values themselves, if they're mutable):
for (int k : m.keySet()) {
	m.put(k, 1234);  // Change all values to 1234
}
Python
To iterate over dicts, there are three options, all of which involve for loops. Dicts are iterated in insertion order, which is guaranteed as of Python 3.7 (and is an implementation detail of CPython 3.6).
You can iterate over the keys:
for key in d:
    print(key)
Over the values:
for value in d.values():
    print(value)
And even over key-value pairs:
for key, value in d.items():
    print(key, value)
It's also possible to change the values while iterating over the keys (or over the values themselves, if they're mutable):
for key in d:
    d[key] = 1234  # Change all values to 1234
While you are free to change the values in a map when iterating over it (as demonstrated above), it is generally a bad idea to insert or remove elements of a map while iterating over it.
Python
For example, the following code attempts to remove every entry from a map, but results in a runtime error.
d = {i: i for i in range(10)}
for i in d:
    del d[i]
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    for i in d:
RuntimeError: dictionary changed size during iteration
One way to get around this is to create a new map.
d = {i: i for i in range(10)}
# only includes every third element
d_new = dict(item for i, item in enumerate(d.items()) if i % 3 == 0)
print("new dict:", d_new)  # new dict: {0: 0, 3: 3, 6: 6, 9: 9}
Another is to maintain a list of all the keys you want to remove and remove them after the iteration finishes:
d = {i: i for i in range(10)}
# removes every third element
to_remove = {key for i, key in enumerate(d) if i % 3 == 0}
for key in to_remove:
    del d[key]
print("new dict:", d)  # new dict: {1: 1, 2: 2, 4: 4, 5: 5, 7: 7, 8: 8}
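A closely related variant, sketched here as one possible approach, is to iterate over a snapshot of the keys. `list(d)` copies the keys before the loop starts, so deleting from `d` inside the loop no longer changes the sequence being iterated.

```python
d = {i: i for i in range(10)}

# list(d) copies the keys up front, so deleting from d inside the loop
# does not disturb the iteration.
for key in list(d):
    if key % 3 == 0:
        del d[key]

print(d)  # {1: 1, 2: 2, 4: 4, 5: 5, 7: 7, 8: 8}
```

The snapshot costs $\mathcal{O}(N)$ extra memory, the same as keeping a separate collection of keys to remove.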
C++
For example, the following code attempts to remove every entry from a map, but results in a segmentation fault.
map<int, int> m;
for (int i = 0; i < 10; ++i) m[i] = i;
for (auto &it : m) {
	cout << "Current Key: " << it.first << endl;
	m.erase(it.first);
}
This happens because of "iterators, pointers and references referring to elements removed by the function [being] invalidated" (as stated in the documentation for erase), though iterators are beyond the scope of this module.
One way to get around this is to just create a new map instead of removing from the old one.
map<int, int> m, M;
for (int i = 0; i < 10; ++i) m[i] = i;
int current_iteration = 0;
for (const auto &it : m) {
	// only includes every third element
	if (current_iteration % 3 == 0) { M[it.first] = it.second; }
	current_iteration++;
}
// M now holds the entries with keys 0, 3, 6, and 9
Another is to maintain a list of all the keys you want to erase and erase them after the iteration finishes.
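map<int, int> m;
for (int i = 0; i < 10; ++i) { m[i] = i; }
vector<int> to_erase;
int current_iteration = 0;
for (const auto &it : m) {
	// removes every third element
	if (current_iteration % 3 == 0) { to_erase.push_back(it.first); }
	current_iteration++;
}
// erase only after the iteration has finished
for (int key : to_erase) { m.erase(key); }
// m now holds the entries with keys 1, 2, 4, 5, 7, and 8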
Java
Modifying a Collection (Set, Map, etc.) in the middle of a for-each loop will cause a ConcurrentModificationException. See the following snippet for an example:
Map<Integer, Integer> m = new TreeMap<>();
// m starts as {0: 0, 1: 1, 2: 2}
m.put(0, 0);
m.put(1, 1);
m.put(2, 2);
for (int key : m.keySet()) {
	m.remove(key);  // ConcurrentModificationException thrown!!
}
One workaround is to use an Iterator and its .remove() method to remove elements while looping over them, like in the next code snippet:
Map<Integer, Integer> m = new TreeMap<>();
// m starts as {0: 0, 1: 1, 2: 2}
m.put(0, 0);
m.put(1, 1);
m.put(2, 2);
Iterator<Map.Entry<Integer, Integer>> iter = m.entrySet().iterator();
while (iter.hasNext()) {
	int key = iter.next().getKey();
	// iter.remove() deletes the entry most recently returned by next()
	if (key == 0 || key == 2) { iter.remove(); }
}
System.out.println(m);  // {1=1}
However, Iterator is outside the scope of this module.
The easiest option (in most cases) if you want to remove or insert multiple entries at once is to use your Collection's .addAll(c) or .removeAll(c) methods. That means that you should put all the elements you want to remove (or add) in a new Collection, and then pass that new Collection to the .addAll(c) or .removeAll(c) method that you call on your original Collection. A Map has no .removeAll(c) of its own, so call it on the map's keySet() view instead. See the following code snippet for an example (it works equivalently to the code above):
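Map<Integer, Integer> m = new TreeMap<>();
// m starts as {0: 0, 1: 1, 2: 2}
m.put(0, 0);
m.put(1, 1);
m.put(2, 2);
Set<Integer> keysToRemove = new TreeSet<>();
for (Map.Entry<Integer, Integer> entry : m.entrySet()) {
	int key = entry.getKey();
	if (key == 0 || key == 2) { keysToRemove.add(key); }
}
// removing keys from the keySet() view also removes their entries from m
m.keySet().removeAll(keysToRemove);
System.out.println(m);  // {1=1}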
Problems
Some of these problems can be solved by sorting alone, though sets or maps could make their implementation easier.
Source | Problem Name | Difficulty | Tags
---|---|---|---
CSES | | Easy | Map
Bronze | | Easy | Set
Bronze | | Normal | Set, Simulation
Bronze | | Normal | Map
Bronze | | Normal | Map, Sorting
Silver | | Normal | Map
CF | | Normal | Prefix Sums, Set
AC | | Hard | Map
Check Your Understanding
C++
What is the time complexity of insertions, deletions, and searches in a sorted set of size $N$?
Java
What is the time complexity of insertions, deletions, and searches in a sorted set of size $N$?
Python
What is the time complexity of insertions, deletions, and searches in a sorted set of size $N$?