The history of the creation of an aggregator for freelance exchanges

General part

Hello, reader. I want to share the story of developing a project, implemented in Java, that collects freelance orders, currently from Russian freelance exchanges, that is, an aggregator. Development of the project https://github.com/gdevby/alert-job started on 15.10.2022, and the running service is available at https://aj.gdev.by. This article will be of interest to the following groups of people:

  • Those who want to optimize receiving notifications about new orders from freelance exchanges, that is, reduce the time spent on this task

  • Those who want to compare their own test project in the field of Java Spring microservice architecture

  • Those who want to try developing applications with a microservice architecture using Java Spring

  • Those who want to run such projects in Docker on a separate virtual server

The article first describes the general issues, then describes the details that may be of interest to Java developers.

The idea for the project arose from the task of finding work on freelance exchanges. A search for similar projects at the time turned up nothing, although after implementation we did manage to find several. Checking orders daily on several exchanges and selecting them by criteria became tiring after two months, taking about 20-45 minutes every day, so some kind of automation was needed: something flexible enough to configure so that only the orders most likely to be suitable had to be reviewed. At first I made a basic core for myself, just a console application configured through settings files. Then I wanted to share it with others, which meant building a full-fledged web application accessible to everyone through a web browser.

What functionality should this system have:

  1. Filter orders by technology

  2. Filter orders by name

  3. Filter orders by description

  4. Filter orders by price

  5. There must be exclusion words to reject orders that do not suit you, for example an order whose title contains “for training”

  6. Notification settings, so that notifications do not disturb you at night or on weekends
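
The filtering rules above can be sketched as a simple predicate. This is a minimal illustration, not the project's actual API; the class, field, and method names are assumptions:

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the filter rules: keywords that must match,
// exclusion ("minus") words that reject an order, and a minimum price.
// Keywords and minus-words are assumed to be lower-case.
public class OrderFilter {

    private final List<String> keywords;   // at least one must appear
    private final List<String> minusWords; // any hit rejects the order
    private final int minPrice;

    public OrderFilter(List<String> keywords, List<String> minusWords, int minPrice) {
        this.keywords = keywords;
        this.minusWords = minusWords;
        this.minPrice = minPrice;
    }

    public boolean matches(String title, String description, int price) {
        // Match case-insensitively against the title and description together
        String text = (title + " " + description).toLowerCase(Locale.ROOT);
        if (price < minPrice) {
            return false;
        }
        // Reject orders containing an exclusion word, e.g. "for training"
        if (minusWords.stream().anyMatch(text::contains)) {
            return false;
        }
        return keywords.stream().anyMatch(text::contains);
    }
}
```

In the real service such a predicate would be applied to every parsed order before a notification is queued.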

Now let’s move on to the finished implementation and presentation of the service and look at how to create a filter. The steps for creating an account on the service are skipped here, since that is a standard task.

  1. The system has the concept of a module: the type of activity for which orders will be collected, for example “Back or Mobile technology”. It combines categories from several exchanges for, say, back-end or mobile development, and lets you specify keywords for that type of order

    Modules

  2. Next, we specify the sources – this is where orders will be taken from, we can specify several sources at once (freelance exchanges) and categories for them

    Sources

  3. Now we configure the “Filter”, that is, which orders we will receive. We specify keywords and minus-words; if a minus-word occurs in a given place (for example, only in the description), the order is rejected, because you know for sure it is not your order

    Filters

  4. It is also useful to see which orders did not reach you, so that you can adjust the filters, that is, to have a quick feedback loop for filter changes

    Orders that are not suitable for you

  5. The message settings look as follows. Here you can set the intervals and days on which you are willing to receive messages

    Notification system
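
The notification settings above can be sketched as a simple schedule check: a message is delivered only on allowed days and inside an allowed time window. The names here are hypothetical, not the service's real code:

```java
import java.time.DayOfWeek;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.Set;

// Illustrative sketch of the message settings: allowed days plus a
// daily interval outside of which notifications are held back.
public class NotificationSchedule {

    private final Set<DayOfWeek> allowedDays;
    private final LocalTime from;
    private final LocalTime to;

    public NotificationSchedule(Set<DayOfWeek> allowedDays, LocalTime from, LocalTime to) {
        this.allowedDays = allowedDays;
        this.from = from;
        this.to = to;
    }

    // true if a notification may be sent at the given moment
    public boolean canNotify(LocalDateTime now) {
        LocalTime time = now.toLocalTime();
        return allowedDays.contains(now.getDayOfWeek())
                && !time.isBefore(from)
                && !time.isAfter(to);
    }
}
```

A notification that falls outside the window would be delayed or dropped, depending on the chosen policy.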

This concludes the general description of the functionality and tasks; we will continue with a description of the development: what was done and how. We also add new exchanges on request; there are currently 5 exchanges, although there are problems with FREELANCEHUNT.COM: it is not working for us, they deactivated my account, and my country is not in their list, so it seems they do not like my origin.

Development

Since this project was created on a volunteer basis as an open project, we decided to take technologies we had not worked with before, so that life would not seem simple. As a result, we decided to write it in a reactive style, or at least as close to one as possible. First of all we decided to try Spring Cloud and Spring WebFlux, and we also decided to use the SSE protocol (https://ua.wikipedia.org/wiki/Server-sent_events); although there was not much documentation on it, it was well suited for communication between two services receiving new orders. On the front end we decided to try React JS. In general, a microservice architecture is not required for a project like order collection; a monolithic web application would have been enough. But since we had plenty of time and a strong desire to try new things, we decided to do it anyway, realizing that some problems might haunt us. It ended up taking two to three times longer than if we had relied on our existing knowledge and experience.

There were three people on our team: two back-end developers and one front-end developer. I expected that in the worst case we would fit into 60 days; since this was an unpaid volunteer project, we worked on it not full time but whenever we were free from other tasks. In the end there were several critical problems, and some services occasionally stopped working, so the application became fully operational only after 150 days, although a prototype was ready after 50 days of work.

We broke the application down into the following microservice modules:

  1. alert-job-config-repo – repository with configuration; we store all configs for development and production here

  2. alert-job-config – serves the business-logic configuration over HTTP; on startup a service turns here and receives the configuration it will start with

  3. alert-job-eureka – for registration and discovery of services

  4. alert-job-gateway – provides a single entry point for requests

  5. keycloak – responsible for authorization and authentication via the OAuth 2.0 protocol

  6. logstash – responsible for collecting logs

  7. front – we used React JS

  8. notification-alert-job – responsible for sending order notifications

  9. prometheus and grafana – for displaying logs and metrics and alerting on ERROR-level log messages

  10. docker – a technology for running containers

  11. nginx-proxy – auxiliary container for configuring domain names and SSL certificates

  12. parser-alert-job – responsible for parsing orders

  13. core-alert-job – the main backend for interacting with the site
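
In this layout, a business service typically pulls its settings from the config service and registers with Eureka on startup. The following application.yml fragment is a hedged sketch: the host names follow the module names above, while the ports (8888 for the config server, 8761 for Eureka, the Spring Cloud defaults) are assumptions, not taken from the project:

```yaml
spring:
  application:
    name: parser-alert-job
  config:
    # on startup the service turns to alert-job-config and receives its configuration
    import: "configserver:http://alert-job-config:8888"
eureka:
  client:
    # register with alert-job-eureka so other services and the gateway can find us
    service-url:
      defaultZone: http://alert-job-eureka:8761/eureka
```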

Interesting takeaways from solving problems in the services

  1. To collect and analyze logs, we first looked at one of the popular solutions, the ELK stack, but while studying it we ran into the fact that protecting services at the Kibana level requires a paid license. We found a way out: we moved these requirements to Grafana, where you can both view logs and send alert messages.

  2. When working with SSE and WebFlux, after some time we stopped receiving orders from our service; this problem occurred once more. We had seen something similar when an error occurred at the subscription level and the subscription did not recover, so we wrapped the problematic code in try/catch so that it does not kill the WebFlux subscription. After a restart it worked for us, and we are keeping an eye on it.

@GetMapping(value = "/stream-sse", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<ServerSentEvent<List<OrderDTO>>> streamFlruEvents() {
    log.trace("subscribed on orders");
    // Poll the fl.ru parser on a fixed interval and emit each batch as an SSE event
    Flux<ServerSentEvent<List<OrderDTO>>> flruFlux = Flux
            .interval(Duration.ofSeconds(parserInterval))
            .map(sequence -> ServerSentEvent.<List<OrderDTO>>builder()
                    .id(String.valueOf(sequence))
                    .event("periodic-flru-parse-event")
                    .data(fl.flruParser())
                    .build())
            .doOnNext(s -> {
                // Record how many orders each batch produced in a Micrometer counter
                int size = s.data().size();
                context.getBean(COUNTER_FLRU, Counter.class).increment(size);
            });
    ...
    // The fluxes for the other exchanges are built the same way
    return Flux.merge(flruFlux, hubrFlux, freelanceRuFlux, weblancerFlux, freelancehuntOrderParcerFlux);
}

And the subscription to orders itself:

public void sseConnection() {
    ParameterizedTypeReference<ServerSentEvent<List<OrderDTO>>> type =
            new ParameterizedTypeReference<ServerSentEvent<List<OrderDTO>>>() {
            };
    // Subscribe to the parser's SSE endpoint and retry with backoff on failure
    Flux<ServerSentEvent<List<OrderDTO>>> sseConnection = webClient.get()
            .uri("http://parser:8017/api/stream-sse")
            .accept(MediaType.TEXT_EVENT_STREAM)
            .retrieve()
            .bodyToFlux(type)
            .doOnSubscribe(s -> log.info("trying subscribe"))
            .retryWhen(Retry.backoff(Integer.MAX_VALUE, Duration.ofSeconds(30)));
    sseConnection.subscribe(event -> {
        try {
            log.trace("got elements by subscription {} size {}", event.event(), event.data().size());
            Set<AppUser> users = userRepository.findAllUsersEagerOrderModules();
            forEachOrders(users, event.data());
        } catch (Throwable ex) {
            // Swallow per-event errors so one failure does not kill the subscription
            log.error("problem with subscribe", ex);
        }
    }, error -> log.warn("failed to get orders from parser {}", error));
}
  3. Launching in Docker required the port to be specified explicitly, although we thought Eureka would take care of this. As a result, we contact the service via http://host:port

  4. Elasticsearch did not start correctly for Logstash with this configuration. To correct the behavior, we had to change the access rights on the directory to which Elasticsearch is mounted: chmod 777 public

elasticsearch:
    restart: always
    image: elasticsearch:8.9.2
    container_name: elasticsearch
    environment:
      - discovery.type=single-node
      - ES_JAVA_OPTS=-Xms256m -Xmx1024m
      - xpack.security.enabled=false
      - TZ=Europe/Moscow
    volumes:
       - $elasticsearch_directory:/usr/share/elasticsearch/data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9200/?pretty"]
      interval: 10s
      timeout: 10s
      retries: 3
      start_period: 60s
  5. A Keycloak problem: there are different Keycloak Docker containers, and we tried quay.io/keycloak/keycloak, but when I created the configuration I could not save it to a config file so that any user could then build an image with the right settings, ready for deployment in a development environment. Here is an example of my solution: https://keycloak.discourse.group/t/keycloak-17-docker-container-how-to-export-import-realm-import-must-be-done-on-container-startup/13619/23

  6. As for security, it was implemented only at the gateway level, which is the single point of entry; user data is passed to downstream services in arguments. We wanted to make something simple, without duplicating code in each service. As it turned out, the duplication would not be that large: you can create a common module, implement the main logic there, and have each service check access rights itself, that is, validate the JWT and extract data from the token.

  7. We found interesting Docker images when setting up the reverse proxy and the HTTPS certificate: nginx-proxy and nginx-proxy-acme. With the help of these images you can obtain certificates and generate nginx proxy routing for your containers located on the same network.

  8. We described all dynamic Docker parameters in .env; it turned out to be a very convenient way to set parameters for docker compose
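
The .env approach can look like the following; the values are illustrative, and elasticsearch_directory matches the variable used in the Elasticsearch compose fragment above:

```shell
# .env placed next to docker-compose.yml; docker compose substitutes
# these values into the compose file, e.g. $elasticsearch_directory
elasticsearch_directory=/opt/alert-job/elasticsearch
TZ=Europe/Moscow
```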

Conclusion

I am glad that I took on such a project, because it brought both practical and theoretical experience, and we were able to solve the task well. A lot of time was spent on writing and administration, and support will take some more time, but these are expected costs.

Books that helped implement the program:

  1. Using Docker – Adrian Mouat

  2. ELK – Shukla, Kumar

  3. Spring Microservices in Action – Carnell, Sánchez

  4. Spring security in action – Laurentiu Spilca

  5. Hands-On Reactive Programming in Spring 5 – Dokuka, Lozynskyi
