Dictionary and SortedDictionary

Dictionary and SortedDictionary

Dictionary and SortedDictionary

Hello everybody. Today I plan to talk about Dictionary and SortedDictionary in .NET – how they are arranged and what is the difference between them.

Why?

First, I was asked about this several times in interviews and the first time I was completely lost with the answer, which was not the most pleasant experience that I want to spare you. Second, a dictionary is one of the most frequently used data structures, and when developing it is useful to understand what its subdomains are, as well as to know when to use SortedDictionary justified

Dictionary.

To begin with, let’s deal with Dictionary<TKey, TValue>. It is a collection of key-value pairs. The key must be unique. On average, receiving, adding, removing an element from it takes place according to . How does this happen? Let’s understand each other.
Internally, the dictionary uses a structure called name Entry and two arrays buckets and entries.

private struct Entry {
  public int hashCode; // хеш код, вычисленный для ключа
  public int next;     // индекс следующего элемента с тем же хешем, -1, если текущий элемент последний 
  public TKey key;
  public TValue value;
}

private int[] buckets;   // индексы начала бакетов
private Entry[] entries; // элементы словаря

Each element is stored in a bucket, which corresponds to the remainder of the division of the hash from the key by the size of the array buckets. Note that we have possible collisions when two different keys give the same hash. In this case, we will have a linked list of elements in the bucket.

Collisions

As long as we keep the bucket sizes small (0-3 elements) and their number proportional to the number of elements (ideally, each bucket contains 0 or 1 element), we get an averaged access by , because internally we take an element from the array by its index. Two arrays are used to implement a similar structure (an array containing linked lists). buckets and entries. In the array bucket the index is the bucket number, and the value is the beginning of the linked list with elements entries.

An example of what an array might look like

As the number of elements increases, the size of arrays also increases. At each moment of time, the size of the array buckets is a prime number. And here are the simplest numbers? Now we will find out.
To determine in which bucket to put the added pair, a hash of the key is calculated inside the dictionary [1]. Then we take the remainder from dividing that hash by the size of the array buckets and put our pair in the index with the received value. This way, we guarantee that any hash result of the function will be within the size of the array buckets [2]. However, if we choose the array sizes randomly, we can distribute the load of values ​​badly, for example, many numbers have the same remainder of 2 or some other power of that wonderful number. In contrast, prime numbers by definition have only two divisors, one and the number itself, so the probability that the remainder from dividing two different numbers by a third prime number will be the same is very small. Due to this, we guarantee that with a normally working hash function, the buckets will be filled evenly.
Separately, note that in the dictionary, in order to distinguish between two objects with the same hash, the method is used when requesting the value Equals.
On this, I will finish with the analysis of the mechanism of workDictionary. I didn’t mention here how the deletion and then the insertion of elements is handled, but to understand why on average a dictionary works for this is not necessary.
Now let’s talk a little about thread safety. As long as the different threads only read data from the dictionary, it will be safe. If we want to somehow modify the dictionary, we need to use ConcurrentDictionaryor ImmutableDictionary, or own data access synchronization mechanism. An important point, if during a parallel passage through the dictionary through Parallel.ForEach (or in some other way) we somehow modify the current value, then such an operation will not be safe.
Intermediate results and recommendations:

  • The effectiveness of the dictionary depends on the quality of the hash key function. If the hash function gives many collisions, the dictionary will not save us. When overriding GetHashCode to use class objects as a key you should keep this in mind.

  • If we have a constantly growing number of elements in the dictionary, it makes sense to immediately set some simple number as the initial capacity of the dictionary. This will help to avoid the load associated with the reconstruction of arrays buckets and entries during expansion.

  • If two keys have the same hash function result, they are then compared by Equals.

  • Himself Dictionary is thread-safe as long as different threads only read data from it. In other cases, synchronization should be ensured or used ConcurrentDictionary, ImmutableDictionary.

Created by Citonaria.

As the name suggests SortedDictionary<TKey, TValue> – a collection of key-value pairs that is permanently sorted by key. The key, as in the case of a regular dictionary, must be unique. However, disagreements continue. The speed of working with elements of a sorted dictionary is equal to , where the number of elements in the dictionary, while sometimes it will be faster than for a regular dictionary. What is it connected with? With internal implementation.
Inside SortedDictionary is a binary search tree. It is no longer used here GetHashCode and fission residues. The comparison takes place through the standard IComparable<TKey> for TKey or through the passed in constructor for SortedDictionary object IComparer<TKey>. Ago SortedDictionary does not suffer from frequent collisions, which will give us a more efficient interaction than with a regular dictionary in this scenario. A binary search tree itself is a data structure for which the following statement is correct:

Each vertex has 0 to 2 descendants, all elements in the left subtree are less than or equal to the value in the parent vertex, and all elements in the right subtree are greater than the value in the parent.

There are different types of binary search trees that differ in their approach to balancing, including SortedDictionary red-black wood is used. But I will not delve into the construction of trees in this article, maybe another time.
For us, the main thing is to understand the temporal complexity of operations SortedDictionaryand what it is related to.
For a sorted dictionary, security is guaranteed only for reading data. If we want to somehow modify the sorted dictionary or any of the values, we need to implement our synchronization mechanism or block the collection entirely for the time when changes are possible.
Intermediate results and recommendations:

  • If the order of the keys in the dictionary is important to us, we should use SortedDictionary. The main thing to remember is that the complexity of operations is average.

  • If we only need a reversed output, and the rest of the time the order of the keys does not play a role, we can think in the direction of a regular dictionary.

  • If we need a collection of key-value pairs, but key hashes often collide, it makes sense to use SortedDictionary instead of the usual one. It is also important when it comes to the predictability of the complexity of operations, for a hash it depends on the hash function, and sometimes the predictability of complexity is more important to us than the speed of operations.

  • Since SortedDictionary keeps the order of the keys constant, key range selection operations can be performed faster than for a regular dictionary.

Conclusion.

After we learned how the inside works Dictionary and SortedDictionary, we can understand the specifics of working with each of these data structures. I hope you found this article useful and that you were able to better understand how and when to use these data structures.

Related posts