Task.WhenAll or Parallel.ForEachAsync in C#

Everyone wants to write code that runs fast. We often sit down, review the algorithms we have written, and try to figure out how to squeeze more performance out of them. A common answer is to execute tasks in parallel: if several pieces of work can run at the same time, the total processing time should go down.

If you search the internet for advice on implementing this, you will find many similar yet subtly different suggestions from different programmers. At some point you will almost certainly run into the idea of using Task.WhenAll or Parallel.ForEachAsync.

As you read through this material, you will find plenty of conflicting answers, both on StackOverflow and elsewhere on the internet. Today I am going to compare these two methods with benchmarks that pit them against each other, to finally find out where each one is applicable.


▍ Method of comparison

Before we move on to the benchmark code, written using the popular NuGet package BenchmarkDotNet, it is worth noting that to compare Task.WhenAll and Parallel.ForEachAsync we will actually simulate two different kinds of work for the system.

Reading discussions of these two methods online, you will often come across the suggestion to evaluate them within the IO-bound vs CPU-bound paradigm.

What does that even mean?

IO-bound means a task spends its time waiting for something to complete that is outside your process, or possibly not even running on your machine, such as a network request or a disk read.

CPU-bound, in turn, describes the opposite situation: the task keeps the processor busy with computation.

So, in the next two benchmarks, we will simulate IO-bound work and CPU-bound work to compare Task.WhenAll and Parallel.ForEachAsync.
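To make the distinction concrete, here is a minimal sketch of the two kinds of work. The class and method names are illustrative, not part of the benchmarks below:

```csharp
using System.Threading.Tasks;

public static class WorkKinds
{
    // IO-bound: the thread is released while we wait on something external;
    // Task.Delay stands in for e.g. a network call or a disk read.
    public static async Task IoBoundAsync(int delayMs) =>
        await Task.Delay(delayMs);

    // CPU-bound: the thread stays busy doing computation the whole time.
    public static long CpuBound(int iterations)
    {
        long sum = 0;
        for (var i = 0; i < iterations; i++)
            sum += i;
        return sum;
    }
}
```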

▍ Code of benchmarks

Let’s start with an example of simulating IO bound work. There are a few points worth noting in the code below.

First, the benchmark is parameterized with two settings so that we can explore how the application behaves as the workload changes.

Second, no “real” IO-bound work, such as a call to a third-party resource, takes place, for safety reasons. Instead it is simulated with a call to await Task.Delay, whose waiting time is controlled by a configured parameter.

[ShortRunJob]
public class BenchmarkSimulatedIo
{
    // Number of simulated IO tasks to run.
    [Params(1, 10, 100)]
    public int CollectionCount;
 
    // Simulated IO latency in milliseconds.
    [Params(1, 10, 100, 1000)]
    public int SimulatedIoDelays;
 
    [Benchmark]
    public async Task TaskWhenAll()
    {
        var tasks = Enumerable
            .Range(0, CollectionCount)
            .Select(async _ => await Task.Delay(SimulatedIoDelays))
            .ToArray();
 
        await Task.WhenAll(tasks);
    }
 
    [Benchmark]
    public async Task ParallelForEach() =>
        await Parallel.ForEachAsync(
            Enumerable.Range(0, CollectionCount),
            cancellationToken: default,
            async (i, ct) => await Task.Delay(SimulatedIoDelays, ct));
}
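Outside of BenchmarkDotNet, the same comparison can be sanity-checked with a plain Stopwatch. This is only a rough sketch (the helper names are mine), not a substitute for the benchmarks:

```csharp
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class IoComparison
{
    // Starts all delays at once and awaits them together.
    public static async Task<long> TimeWhenAllAsync(int count, int delayMs)
    {
        var sw = Stopwatch.StartNew();
        await Task.WhenAll(Enumerable.Range(0, count)
            .Select(_ => Task.Delay(delayMs)));
        return sw.ElapsedMilliseconds;
    }

    // Lets Parallel.ForEachAsync schedule the same delays itself.
    public static async Task<long> TimeForEachAsync(int count, int delayMs)
    {
        var sw = Stopwatch.StartNew();
        await Parallel.ForEachAsync(
            Enumerable.Range(0, count),
            async (_, ct) => await Task.Delay(delayMs, ct));
        return sw.ElapsedMilliseconds;
    }
}
```

With 100 one-millisecond delays, Task.WhenAll typically finishes in a few milliseconds, while Parallel.ForEachAsync processes the delays in batches.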

Now let’s move on to the second benchmark class, which simulates CPU-bound work.

As you can see, the code is very similar to what you just saw above.

Of course, the delay parameter has been replaced with one that gradually increases the amount of computation, and therefore the load on the processor.

Accordingly, CPU-bound work is modeled by generating random numbers a given number of times, which keeps the thread busy.

[ShortRunJob]
public class BenchmarkSimulatedCpu
{
    private int[]? _dataSet;
    
    // Number of items processed in parallel.
    [Params(1, 10, 100)]
    public int CollectionCount;
 
    // Number of random-number calls per item, i.e. the CPU load.
    [Params(1000, 10_000, 100_000, 1_000_000)]
    public int CpuWorkIterations;
 
    [GlobalSetup]
    public void GlobalSetup() =>
        _dataSet = Enumerable.Range(0, CollectionCount).ToArray();
 
    [Benchmark]
    public async Task TaskWhenAll()
    {
        // Note: the lambda body is not asynchronous, so the loop runs
        // synchronously while ToArray() enumerates the sequence.
        var tasks = _dataSet!.Select(_ =>
        {
            for (var i = 0; i < CpuWorkIterations; i++)
            {
                Random.Shared.Next();
            }
 
            return Task.CompletedTask;
        }).ToArray();
 
        await Task.WhenAll(tasks);
    }
 
    [Benchmark]
    public async Task ParallelForEach() =>
        await Parallel.ForEachAsync(
            _dataSet!,
            cancellationToken: default,
            (_, ct) =>
            {
                for (var i = 0; i < CpuWorkIterations; i++)
                {
                    Random.Shared.Next();
                }
 
                return ValueTask.CompletedTask;
            });
}

▍ Results

Let’s look first at the results of the benchmarks in which IO-bound work was simulated.

If you go through the results sequentially, from the simplest cases with the smallest numbers to the more complex ones, you will notice a clear difference between the two ends of the range, which differ in the amount of work.

In particular, we are going to focus on execution time, which means looking at the Mean column, where this time is reported.

In general, Task.WhenAll and Parallel.ForEachAsync perform neck and neck until the number of tasks reaches 100.

Starting at that mark, even with a delay of one millisecond, Parallel.ForEachAsync behaves much slower than Task.WhenAll. Moreover, the relative slowdown is proportional to the number of CPU cores.
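This behavior is consistent with a simple model: by default Parallel.ForEachAsync runs at most about Environment.ProcessorCount bodies at once, so the delays complete in batches. A back-of-the-envelope estimate (my own sketch, not measured data):

```csharp
using System;

public static class ForEachAsyncModel
{
    // N delays of delayMs each, with at most dop running concurrently,
    // complete in ceil(N / dop) batches of roughly delayMs each.
    // Task.WhenAll, by contrast, starts all N at once (~delayMs total).
    public static double EstimateMs(int n, int dop, double delayMs) =>
        Math.Ceiling((double)n / dop) * delayMs;
}
```

For 100 tasks with a 1 ms delay on 10 cores this predicts about 10 ms versus roughly 1 ms for Task.WhenAll; the gap scales with the core count, matching what the table shows.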

Now let’s move on to the results of the benchmarks in which CPU-bound work was simulated. We will read the resulting table in the same way.

As a reminder: the parameter shown below is now not the waiting delay but the amount of work loading the CPU.

At 10 simultaneous tasks and a workload of 10,000 iterations, Parallel.ForEachAsync begins to pull ahead by a little more than a factor of two. And as the load increases, the factor by which Task.WhenAll lags behind grows as well.

▍ Conclusion

So, the results turned out to be quite interesting. For some readers, perhaps even expected.

However, to summarize, we saw the following: Parallel.ForEachAsync performed better under CPU-bound conditions, while Task.WhenAll performed better under IO-bound conditions.

What can this be related to? It seems to me that, behind the scenes, the scheduler of Parallel.ForEachAsync limits how much it can parallelize. With Task.WhenAll, accordingly, there is no such throttling, and all the management and scheduling of threads falls on the shoulders of the operating system and the thread pool.
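That throttling can be observed directly, and adjusted, through the ParallelOptions overload of Parallel.ForEachAsync. A small probe sketch (the class and method names are mine), assuming .NET 6+:

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class DopProbe
{
    // Returns the highest number of loop bodies seen running at once;
    // with the default options this is capped at Environment.ProcessorCount.
    public static async Task<int> MaxObservedConcurrencyAsync(int items, int dop)
    {
        int current = 0, peak = 0;
        await Parallel.ForEachAsync(
            Enumerable.Range(0, items),
            new ParallelOptions { MaxDegreeOfParallelism = dop },
            async (_, ct) =>
            {
                var now = Interlocked.Increment(ref current);
                int seen;
                // Lock-free update of the observed maximum.
                while (now > (seen = Volatile.Read(ref peak)) &&
                       Interlocked.CompareExchange(ref peak, now, seen) != seen)
                {
                }
                await Task.Delay(10, ct); // keep the body alive briefly
                Interlocked.Decrement(ref current);
            });
        return peak;
    }
}
```

Raising MaxDegreeOfParallelism (or setting it to -1 to remove the limit) narrows the gap with Task.WhenAll in the IO-bound scenario.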

▍ PS

Environment used to run the benchmarks in this article:

BenchmarkDotNet v0.13.12, macOS Monterey 12.3 (21E230) [Darwin 21.4.0]
Apple M1 Pro, 1 CPU, 10 logical and 10 physical cores
.NET SDK 8.0.100

The full code of all benchmarks is available at the link:

gist.github.com/Stepami/17ceafbfdd91259a9821fd808e3eb08f

I also run the StepOne Telegram channel, where I post a lot of interesting content about commercial development, C# and the IT world through the eyes of an expert.
