The task “Get the value in N URLs from the list” from a Go interview

At the moment, I am looking for a new project, which means going to a lot of interviews.

I decided to share my thoughts on solving a task that (I think) is often given in interviews.

Task

Write a function that accepts a list of URLs and returns the sum of the sizes of the response bodies of those addresses, plus an error if something goes wrong.

Interested in discussing solutions?

So, we have a standalone program and two data sets: a successful case and a failing one. The failing data set deliberately includes addresses in nonexistent domain zones. I made the data sets up myself, so if you think their coverage is incomplete, write in the comments.

The naive option

In the naive version (just to get something working), we simply walk through the entire data set one URL at a time. But this option does work!

// Naive synchronous variant

package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

const byteInMegabyte = 1024 * 1024

func main() {

	urlsList1 := []string{
		"https://youtube.com",
		"https://ya.ru",
		"https://reddit.com",
		"https://google.com",
		"https://mail.ru",
		"https://amazon.com",
		"https://instagram.com",
		"https://wikipedia.org",
		"https://linkedin.com",
		"https://netflix.com",
	}
	urlsList2 := append(urlsList1, "https://111.321", "https://999.000")

	{
		t1 := time.Now()
		byteSum, err := requesSumm(urlsList1)
		fmt.Printf("Сумма страниц в Мб=%.2f, ошибка - %v \n", (float64(byteSum) / byteInMegabyte), err)
		fmt.Printf("Время выполнение запросов %.2f сек. \n", time.Now().Sub(t1).Seconds())
	}
	fmt.Println("++++++++")
	{
		t1 := time.Now()
		byteSum, err := requesSumm(urlsList2)
		fmt.Printf("Сумма страниц в Мб=%.2f, ошибка - %v \n", (float64(byteSum) / byteInMegabyte), err)
		fmt.Printf("Время выполнение запросов %.2f сек. \n", time.Now().Sub(t1).Seconds())
	}
}

func requesSumm(urlsSlv []string) (int64, error) {

	var sum int64

	client := &http.Client{
		Timeout: 10 * time.Second,
	}

	for _, v := range urlsSlv {
		resp, err := client.Get(v)
		if err != nil {
			return 0, err
		}
		body, err := io.ReadAll(resp.Body)
		// Close the body right away: a defer inside a loop would pile up
		// and keep every connection open until the function returns.
		resp.Body.Close()
		if err != nil {
			return 0, err
		}

		sum += int64(len(body))
	}
	return sum, nil
}

The execution time, as follows from the implementation, equals the sum of the times of all the requests: in the run below, ten requests add up to about 16 seconds, roughly 1.6 seconds per site.

ilia@goDevLaptop sobesi % go run httpget/v1.go
Sum of pages in MB=2.12, error - <nil> 
Request execution time 16.01 sec. 
++++++++
Sum of pages in MB=0.00, error - Get "https://111.321": context deadline exceeded (Client.Timeout exceeded while awaiting headers) 
Request execution time 18.88 sec. 
ilia@goDevLaptop sobesi %

The obvious next step in Go is to make the calls concurrent, with a separate goroutine per request. Let’s see how the execution time changes.

// Naive concurrent variant
package main

import (
	"fmt"
	"io"
	"net/http"
	"sync"
	"time"
)

const byteInMegabytev2 = 1024 * 1024

type respSt struct {
	lenBody int64
	err     error
}

func main() {
	urlsList1 := []string{
		"https://youtube.com",
		"https://ya.ru",
		"https://reddit.com",
		"https://google.com",
		"https://mail.ru",
		"https://amazon.com",
		"https://instagram.com",
		"https://wikipedia.org",
		"https://linkedin.com",
		"https://netflix.com",
	}
	urlsList2 := append(urlsList1, "https://111.321", "https://999.000")

	{
		t1 := time.Now()
		byteSum, err := requesSummAsync(urlsList1)
		fmt.Printf("Сумма страниц в Мб=%.2f, ошибка - %v \n", (float64(byteSum) / byteInMegabytev2), err)
		fmt.Printf("Время выполнение запросов %.2f сек. \n", time.Now().Sub(t1).Seconds())
	}
	fmt.Println("++++++++")
	{
		t1 := time.Now()
		byteSum, err := requesSummAsync(urlsList2)
		fmt.Printf("Сумма страниц в Мб=%.2f, ошибка - %v \n", (float64(byteSum) / byteInMegabytev2), err)
		fmt.Printf("Время выполнение запросов %.2f сек. \n", time.Now().Sub(t1).Seconds())
	}
}

func requesSummAsync(urls []string) (int64, error) {
	var wg sync.WaitGroup
	ansCh := make(chan respSt, len(urls))

	client := &http.Client{
		Timeout: 10 * time.Second,
	}

	for _, url := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			resp, err := client.Get(u)
			if err != nil {
				ansCh <- respSt{
					lenBody: 0,
					err:     err,
				}
				return
			}
			defer resp.Body.Close()

			body, err := io.ReadAll(resp.Body)
			if err != nil {
				ansCh <- respSt{
					lenBody: 0,
					err:     err,
				}
				return
			}
			ansCh <- respSt{
				lenBody: int64(len(body)),
				err:     nil,
			}
		}(url)
	}

	// Close the channel once all workers are done so the range loop below can finish.
	go func() {
		wg.Wait()
		close(ansCh)
	}()

	var sum int64
	var err error
	for res := range ansCh {
		sum += res.lenBody
		if res.err != nil {
			// Chain the errors so none of them is lost; the failing URL is
			// already part of the text of each http error.
			if err == nil {
				err = res.err
				continue
			}
			err = fmt.Errorf("%v; %v", res.err, err)
		}
	}
	if err != nil {
		return 0, err
	}

	return sum, err
}

In fact, the execution time now equals the time of the slowest request plus the time to sum the results.

ilia@goDevLaptop sobesi % go run httpget/v2.go
Sum of pages in MB=2.50, error - <nil> 
Request execution time 2.81 sec. 
++++++++
Sum of pages in MB=0.00, error - Get "https://111.321": context deadline exceeded (Client.Timeout exceeded while awaiting headers); Get "https://999.000": dial tcp: lookup 999.000: no such host 
Request execution time 10.00 sec. 
ilia@goDevLaptop sobesi %

The request timeout is 10 seconds, so the failing case still sits out the full 10. Can we speed this up in our task?


Let’s extend the implementation above with a single context shared by all the spawned goroutines, so that the first fatal error can cancel the requests still in flight.

// Concurrent variant with a shared context
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net/http"
	"sync"
	"time"
)

type respStC struct {
	lenBody int64
	err     error
}

const byteInMegabytev3 = 1024 * 1024

func main() {
	urlsList1 := []string{
		"https://youtube.com",
		"https://ya.ru",
		"https://reddit.com",
		"https://google.com",
		"https://mail.ru",
		"https://amazon.com",
		"https://instagram.com",
		"https://wikipedia.org",
		"https://linkedin.com",
		"https://netflix.com",
	}
	urlsList2 := append(urlsList1, "https://111.321", "https://999.000")

	{
		t1 := time.Now()
		byteSum, err := requestSumAsyncWithCtx(urlsList1)
		fmt.Printf("Сумма страниц в Мб=%.2f, ошибка - %v \n", (float64(byteSum) / byteInMegabytev3), err)
		fmt.Printf("Время выполнение запросов %.2f сек. \n", time.Now().Sub(t1).Seconds())
	}
	fmt.Println("++++++++")
	{
		t1 := time.Now()
		byteSum, err := requestSumAsyncWithCtx(urlsList2)
		fmt.Printf("Сумма страниц в Мб=%.2f, ошибка - %v \n", (float64(byteSum) / byteInMegabytev3), err)
		fmt.Printf("Время выполнение запросов %.2f сек. \n", time.Now().Sub(t1).Seconds())
	}
}

func requestSumAsyncWithCtx(urls []string) (int64, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	var wg sync.WaitGroup
	ansCh := make(chan respStC, len(urls))

	client := &http.Client{
		Timeout: 10 * time.Second,
	}

	for _, url := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			req, err := http.NewRequestWithContext(ctx, "GET", u, nil)
			if err != nil {
				ansCh <- respStC{lenBody: 0, err: err}
				return
			}

			resp, err := client.Do(req)
			if err != nil {
				ansCh <- respStC{lenBody: 0, err: err}
				return
			}
			defer resp.Body.Close()

			body, err := io.ReadAll(resp.Body)
			if err != nil {
				ansCh <- respStC{lenBody: 0, err: err}
				return
			}

			ansCh <- respStC{lenBody: int64(len(body)), err: nil}
		}(url)
	}

	go func() {
		wg.Wait()
		close(ansCh)
	}()

	var sum int64
	var err error
	for res := range ansCh {
		sum += res.lenBody
		// Requests aborted by cancel() report context.Canceled - skip those,
		// we only care about the error that triggered the cancellation.
		if res.err != nil && !errors.Is(res.err, context.Canceled) {
			if err != nil {
				err = fmt.Errorf("%v; %v", res.err, err)
			} else {
				err = res.err
			}
			// Cancel the shared context so the remaining in-flight requests
			// abort immediately instead of running to completion.
			cancel()
		}
	}
	return sum, err
}

Now let’s look at the execution time.

ilia@goDevLaptop sobesi % go run httpget/v3.go
Sum of pages in MB=2.50, error - <nil> 
Request execution time 2.89 sec. 
++++++++
Sum of pages in MB=0.00, error - Get "https://999.000": dial tcp: lookup 999.000: no such host 
Request execution time 0.00 sec. 
ilia@goDevLaptop sobesi %

And now it turns out that we don’t have to wait for every request: we can return an error immediately, as soon as the first one fails.
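
As an aside (not part of the original solutions): the standard extension package golang.org/x/sync/errgroup packages up exactly this cancel-on-first-error pattern. Below is a minimal sketch, assuming you have run go get golang.org/x/sync; requestSumErrgroup and the URLs in main are my own placeholder names and data. Note that, unlike the variant above, errgroup keeps only the first error instead of chaining all of them.

// Sketch: cancel-on-first-error via golang.org/x/sync/errgroup
// (an illustration of an alternative, not code from the listings above)
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"sync/atomic"

	"golang.org/x/sync/errgroup"
)

func requestSumErrgroup(urls []string) (int64, error) {
	// The group's context is cancelled as soon as any goroutine returns an error.
	g, ctx := errgroup.WithContext(context.Background())

	var sum int64
	for _, u := range urls {
		u := u // capture the loop variable (needed before Go 1.22)
		g.Go(func() error {
			req, err := http.NewRequestWithContext(ctx, http.MethodGet, u, nil)
			if err != nil {
				return err
			}
			resp, err := http.DefaultClient.Do(req)
			if err != nil {
				return err // the first error cancels ctx for the remaining requests
			}
			defer resp.Body.Close()

			body, err := io.ReadAll(resp.Body)
			if err != nil {
				return err
			}
			atomic.AddInt64(&sum, int64(len(body)))
			return nil
		})
	}

	// Wait blocks until all goroutines finish and returns the first non-nil error.
	if err := g.Wait(); err != nil {
		return 0, err
	}
	return atomic.LoadInt64(&sum), nil
}

func main() {
	sum, err := requestSumErrgroup([]string{"https://ya.ru", "https://google.com"})
	fmt.Printf("Sum of pages in MB=%.2f, error - %v \n", float64(sum)/(1024*1024), err)
}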


But real life is always a struggle against constraints. If we are handed too large a list (more URLs than the environment running the program has available network connections), we get undefined behavior from both our program and its environment. So let’s stop the program from opening more than a given number of network connections at the same time.

For this, of course, we will use a buffered channel 😉

// Concurrent variant with a context and a pool of poolHTTPReq connections
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net/http"
	"sync"
	"time"
)

type respStCWP struct {
	lenBody int64
	err     error
}

const poolHTTPReq = 2
const byteInMegabytev4 = 1024 * 1024

func main() {
	urlsList1 := []string{
		"https://youtube.com",
		"https://ya.ru",
		"https://reddit.com",
		"https://google.com",
		"https://mail.ru",
		"https://amazon.com",
		"https://instagram.com",
		"https://wikipedia.org",
		"https://linkedin.com",
		"https://netflix.com",
	}
	urlsList2 := append(urlsList1, "https://111.321", "https://999.000")

	{
		t1 := time.Now()
		byteSum, err := requestSumAsyncWithCtxAndPool(urlsList1)
		fmt.Printf("Сумма страниц в Мб=%.2f, ошибка - %v \n", (float64(byteSum) / byteInMegabytev4), err)
		fmt.Printf("Время выполнение запросов %.2f сек. \n", time.Now().Sub(t1).Seconds())
	}
	fmt.Println("++++++++")
	{
		t1 := time.Now()
		byteSum, err := requestSumAsyncWithCtxAndPool(urlsList2)
		fmt.Printf("Сумма страниц в Мб=%.2f, ошибка - %v \n", (float64(byteSum) / byteInMegabytev4), err)
		fmt.Printf("Время выполнение запросов %.2f сек. \n", time.Now().Sub(t1).Seconds())
	}
}

func requestSumAsyncWithCtxAndPool(urls []string) (int64, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	var wg sync.WaitGroup
	ansCh := make(chan respStCWP, len(urls))
	semaphore := make(chan struct{}, poolHTTPReq)

	for _, url := range urls {
		semaphore <- struct{}{} // acquire a slot; blocks while poolHTTPReq requests are already in flight
		wg.Add(1)
		go func(u string) {
			defer func() {
				<-semaphore // release the slot
				wg.Done()
			}()

			req, err := http.NewRequestWithContext(ctx, "GET", u, nil)
			if err != nil {
				ansCh <- respStCWP{lenBody: 0, err: err}
				return
			}

			// Unlike the earlier versions, this one uses http.DefaultClient, which
			// has no timeout; cancellation here relies on the shared ctx instead.
			resp, err := http.DefaultClient.Do(req)
			if err != nil {
				ansCh <- respStCWP{lenBody: 0, err: err}
				return
			}
			defer resp.Body.Close()

			body, err := io.ReadAll(resp.Body)
			if err != nil {
				ansCh <- respStCWP{lenBody: 0, err: err}
				return
			}

			ansCh <- respStCWP{lenBody: int64(len(body)), err: nil}
		}(url)
	}

	go func() {
		wg.Wait()
		close(ansCh)
		close(semaphore)
	}()

	var sum int64
	var err error
	for res := range ansCh {
		sum += res.lenBody
		// Requests aborted by cancel() report context.Canceled - skip those,
		// we only care about the error that triggered the cancellation.
		if res.err != nil && !errors.Is(res.err, context.Canceled) {
			if err != nil {
				err = fmt.Errorf("%v; %v", res.err, err)
			} else {
				err = res.err
			}
			// Cancel the shared context so the remaining in-flight requests
			// abort immediately instead of running to completion.
			cancel()
		}
	}
	return sum, err
}

The numbers are, of course, worse than in the previous version, but closer to real life.

ilia@goDevLaptop sobesi % go run httpget/v4.go
Sum of pages in MB=2.50, error - <nil> 
Request execution time 9.05 sec. 
++++++++
Sum of pages in MB=2.12, error - Get "https://999.000": dial tcp: lookup 999.000: no such host 
Request execution time 4.29 sec. 
ilia@goDevLaptop sobesi % 
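
One more aside about the errgroup sketch shown earlier: the same cap on simultaneous connections is available out of the box through the group's SetLimit method, so the hand-rolled semaphore is not needed there. A two-line delta to that sketch (SetLimit must be called before the first g.Go):

	// Delta to the requestSumErrgroup sketch above: bound the concurrency.
	g, ctx := errgroup.WithContext(context.Background())
	g.SetLimit(poolHTTPReq) // at most poolHTTPReq request goroutines run at once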

All the code is, naturally, posted on GitHub.

If you are interested in similar articles, or if you have any questions, remarks, or wishes – be sure to write a comment.

And also subscribe to my Telegram channel, where I publish my thoughts on everything interesting that catches my eye in the IT world.
