Dictionary and SortedDictionary
Dictionary and SortedDictionary
Hello everybody. Today I plan to talk about Dictionary
and SortedDictionary
in .NET – how they are arranged and what is the difference between them.
Why?
First, I was asked about this several times in interviews and the first time I was completely lost with the answer, which was not the most pleasant experience that I want to spare you. Second, a dictionary is one of the most frequently used data structures, and when developing it is useful to understand what its subdomains are, as well as to know when to use SortedDictionary
justified
Dictionary.
To begin with, let’s deal with Dictionary<TKey, TValue>
. It is a collection of keyvalue pairs. The key must be unique. On average, receiving, adding, removing an element from it takes place according to . How does this happen? Let’s understand each other.
Internally, the dictionary uses a structure called name Entry
and two arrays buckets
and entries
.
private struct Entry {
public int hashCode; // хеш код, вычисленный для ключа
public int next; // индекс следующего элемента с тем же хешем, 1, если текущий элемент последний
public TKey key;
public TValue value;
}
private int[] buckets; // индексы начала бакетов
private Entry[] entries; // элементы словаря
Each element is stored in a bucket, which corresponds to the remainder of the division of the hash from the key by the size of the array buckets
. Note that we have possible collisions when two different keys give the same hash. In this case, we will have a linked list of elements in the bucket.
As long as we keep the bucket sizes small (03 elements) and their number proportional to the number of elements (ideally, each bucket contains 0 or 1 element), we get an averaged access by , because internally we take an element from the array by its index. Two arrays are used to implement a similar structure (an array containing linked lists). buckets
and entries
. In the array bucket
the index is the bucket number, and the value is the beginning of the linked list with elements entries
.
As the number of elements increases, the size of arrays also increases. At each moment of time, the size of the array buckets
is a prime number. And here are the simplest numbers? Now we will find out.
To determine in which bucket to put the added pair, a hash of the key is calculated inside the dictionary [1]. Then we take the remainder from dividing that hash by the size of the array buckets
and put our pair in the index with the received value. This way, we guarantee that any hash result of the function will be within the size of the array buckets
[2]. However, if we choose the array sizes randomly, we can distribute the load of values badly, for example, many numbers have the same remainder of 2 or some other power of that wonderful number. In contrast, prime numbers by definition have only two divisors, one and the number itself, so the probability that the remainder from dividing two different numbers by a third prime number will be the same is very small. Due to this, we guarantee that with a normally working hash function, the buckets will be filled evenly.
Separately, note that in the dictionary, in order to distinguish between two objects with the same hash, the method is used when requesting the value Equals
.
On this, I will finish with the analysis of the mechanism of workDictionary
. I didn’t mention here how the deletion and then the insertion of elements is handled, but to understand why on average a dictionary works for this is not necessary.
Now let’s talk a little about thread safety. As long as the different threads only read data from the dictionary, it will be safe. If we want to somehow modify the dictionary, we need to use ConcurrentDictionary
or ImmutableDictionary
, or own data access synchronization mechanism. An important point, if during a parallel passage through the dictionary through Parallel.ForEach
(or in some other way) we somehow modify the current value, then such an operation will not be safe.
Intermediate results and recommendations:

The effectiveness of the dictionary depends on the quality of the hash key function. If the hash function gives many collisions, the dictionary will not save us. When overriding
GetHashCode
to use class objects as a key you should keep this in mind. 
If we have a constantly growing number of elements in the dictionary, it makes sense to immediately set some simple number as the initial capacity of the dictionary. This will help to avoid the load associated with the reconstruction of arrays
buckets
andentries
during expansion. 
If two keys have the same hash function result, they are then compared by
Equals
. 
Himself
Dictionary
is threadsafe as long as different threads only read data from it. In other cases, synchronization should be ensured or usedConcurrentDictionary
,ImmutableDictionary
.
Created by Citonaria.
As the name suggests SortedDictionary<TKey, TValue>
– a collection of keyvalue pairs that is permanently sorted by key. The key, as in the case of a regular dictionary, must be unique. However, disagreements continue. The speed of working with elements of a sorted dictionary is equal to , where the number of elements in the dictionary, while sometimes it will be faster than for a regular dictionary. What is it connected with? With internal implementation.
Inside SortedDictionary
is a binary search tree. It is no longer used here GetHashCode
and fission residues. The comparison takes place through the standard IComparable<TKey>
for TKey
or through the passed in constructor for SortedDictionary
object IComparer<TKey>
. Ago SortedDictionary
does not suffer from frequent collisions, which will give us a more efficient interaction than with a regular dictionary in this scenario. A binary search tree itself is a data structure for which the following statement is correct:
Each vertex has 0 to 2 descendants, all elements in the left subtree are less than or equal to the value in the parent vertex, and all elements in the right subtree are greater than the value in the parent.
There are different types of binary search trees that differ in their approach to balancing, including SortedDictionary
redblack wood is used. But I will not delve into the construction of trees in this article, maybe another time.
For us, the main thing is to understand the temporal complexity of operations SortedDictionary
and what it is related to.
For a sorted dictionary, security is guaranteed only for reading data. If we want to somehow modify the sorted dictionary or any of the values, we need to implement our synchronization mechanism or block the collection entirely for the time when changes are possible.
Intermediate results and recommendations:

If the order of the keys in the dictionary is important to us, we should use
SortedDictionary
. The main thing to remember is that the complexity of operations is average. 
If we only need a reversed output, and the rest of the time the order of the keys does not play a role, we can think in the direction of a regular dictionary.

If we need a collection of keyvalue pairs, but key hashes often collide, it makes sense to use
SortedDictionary
instead of the usual one. It is also important when it comes to the predictability of the complexity of operations, for a hash it depends on the hash function, and sometimes the predictability of complexity is more important to us than the speed of operations. 
Since
SortedDictionary
keeps the order of the keys constant, key range selection operations can be performed faster than for a regular dictionary.
Conclusion.
After we learned how the inside works Dictionary
and SortedDictionary
, we can understand the specifics of working with each of these data structures. I hope you found this article useful and that you were able to better understand how and when to use these data structures.