Safe parallel development technology. Istio

At some point the idea came up in the office that we should think about how to parallelize work on a single microservice so that teams do not step on each other's toes. There are APIs that several teams work on at once. Each team develops its own feature locally and writes tests, but when it comes time to deploy to the test environment, a queue forms: the changes have to be merged into a single branch (something like develop) and rolled out for testing. On top of that, merging can produce code conflicts or changes to properties that are incompatible between branches.

The mobile bank now consists of 450+ microservices, with more than 90 teams working on them. Since the project has no code ownership, each team makes changes to whichever microservices it needs. To avoid the various complications that increase time to market, we had to separate the work of individual teams so that they do not affect each other and can work in parallel.

Problem

Our test environment is already quite heavyweight, and we did not want to stand up another one next to it: that would be another line item in the budget, and it would also have to be maintained, which takes considerable effort. Because of constant changes, the test environment periodically became unhealthy, which directly affected business processes.

At the same time, we live in K8s together with Istio. In the article “The practical magic of Istio when building the architecture of large microservice systems,” our colleagues have already written about Istio and why it was chosen; you can read it to understand what kind of beast it is.

Most importantly, Istio has a powerful request-routing mechanism that can help us make parallel development and testing safer.

How it was done

Several options were considered: organizing a deployment queue, a separate release cluster, or a cluster (namespace) per team. But with such a large number of teams, these would require an enormous amount of resources and would increase the complexity of deployment and support.

We settled on a solution we named Feature Branches. We had already partially applied it for the frontend teams developing the web. It had to be refined and scaled for Android and iOS development, as well as for the backend.

Before we get into the technical details, a bit of terminology:

  • Feature instance: a service instance for which routing rules are configured.

  • Feature name: the name of a feature, used to route requests to a feature instance.

  • Feature branch: a set of feature instances united by one feature name.

  • Master branch: the default branch; the set of release versions of services.

The implementation may vary, but we use Kubernetes and Istio. Istio has dedicated resources responsible for routing: VirtualService (VS) and DestinationRule (DR). Each feature gets its own DR, and a VS routes traffic between them. For example, the DR of the demo-api service for the master branch looks like this:

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
...
spec:
  host: demo-api.default.svc.cluster.local
  subsets:
    - labels:
        app.kubernetes.io/instance: demo-api
      name: default

This defines the default subset, which leads to the deployed master branch.

And here is the DR for a feature:

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
...
spec:
  host: demo-api.default.svc.cluster.local
  subsets:
    - labels:
        app.kubernetes.io/instance: demo-api.feature-31337
      name: feature-31337

The feature name is set as the subset name, and it is also appended to the label after a dot. VS and DR work in pairs; for our service, the VS looks like this:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
...
spec:
  hosts:
    - demo-api.default.svc.cluster.local
  http:
    - match:
        - headers:
            X-FEATURE-NAME:
              exact: feature-31337
      route:
        - destination:
            host: demo-api.default.svc.cluster.local
            subset: feature-31337
    - route:
        - destination:
            host: demo-api.default.svc.cluster.local
            subset: default

The picture is as follows: to figure out where to send a request, Istio first looks at the VS and, based on its matching rules, selects the desired subset; the subset is defined in the DR and leads to the desired service.

Now we come to what the routing of HTTP requests is based on: the presence of the X-FEATURE-NAME HTTP header. In the match section of the VS, the header's value is checked for an exact match with feature-31337; if it matches, the request lands in the pod labeled app.kubernetes.io/instance: demo-api.feature-31337.
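
For this label matching to work, the pods of a feature instance must actually carry that label. The real manifests are generated by our deployment tooling, but a minimal sketch of such a Deployment could look like the following (the app label, resource name, and image tag are illustrative assumptions):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-api-feature-31337
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/instance: demo-api.feature-31337
  template:
    metadata:
      labels:
        # assumption: the demo-api Service selects every instance by a shared label,
        # while Istio narrows traffic down via the instance label below
        app: demo-api
        # the label that the feature-31337 subset in the DR selects on
        app.kubernetes.io/instance: demo-api.feature-31337
    spec:
      containers:
        - name: demo-api
          image: registry.example.com/demo-api:31337-SNAPSHOT  # hypothetical snapshot tag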

Services can be combined into a chain if they are deployed under the same feature name (for this, the header must be propagated along the call chain). If a service in the chain has no instance for the given feature, the request falls through to the default subset, because no match section in its VS fires.

This Istio configuration exists not only for each backend service but also for the microfrontends. Frontends do not use requests with the X-FEATURE-NAME header directly; instead, they can deploy features under a dedicated URL, for example feature-31337.demo.net.

Such a URL is convenient to share with testers or designers to show your current work. A request to it arrives at a service running Spring Cloud Gateway. The gateway extracts the feature name from the URL, puts it into the X-FEATURE-NAME header, and forwards the request to the required microfrontend:

spring:
  cloud:
    gateway:
      routes:
        - id: ignored
          uri: http://demo-ui
          order: 1
          predicates:
            - Host={branch}.demo.net
            - Path=/demo-ui/**
          filters:
            - SetRequestHeader=X-FEATURE-NAME, {branch}
            - RewritePath=/demo-ui/(?<segment>.*), /$\{segment}

This is how a feature branch is structured and how it works. For example, a request to feature-31337.demo.net/demo-ui/login arrives at demo-ui as /login, with the X-FEATURE-NAME header set to feature-31337.

The feature branch deployment process is also automated. The bank uses an in-house CI/CD platform that, among other things, deploys artifacts to environments. The platform is integrated with Bitbucket via hooks, so it knows when a push to a repository has taken place.

On receiving a push event, the platform scans the commit message for keywords; if it finds one (deploy_feature), it starts the artifact build flow and deploys the result to the test environment in K8s. The developer does not need to monitor the build and deployment status: a specially trained bot notifies them in the messenger about a successful or failed build.
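
The platform is proprietary, so its actual configuration cannot be shown here; purely as an illustration, the trigger logic is roughly equivalent to the following hypothetical declarative rule (every name in it is invented):

# hypothetical pipeline rule, not the platform's real syntax
on:
  push:
    commit-message-keyword: deploy_feature   # scanned for in the commit message
steps:
  - build:
      artifact: snapshot                     # assemble a snapshot artifact
  - deploy:
      environment: test                      # K8s test environment
      feature-name: "{branch}"               # becomes the feature name / subset
  - notify:
      via: messenger-bot                     # reports success or failure to the developer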

Let’s briefly summarize what we got:

No. 1. The implementation of feature branches is based on the routing rules provided by Istio.

No. 2. To get a feature instance working, you need to:

  • Create a DestinationRule that contains a subset selecting the service by label.

  • Create a VirtualService that, based on the HTTP header, routes the request to the desired subset.

No. 3. To access a feature instance, you need:

  • On the backend, use the X-FEATURE-NAME header in requests.

  • On the frontend, use a dedicated URL such as feature-31337.demo.net, provided Spring Cloud Gateway is configured in advance as shown above.

Conclusion

After adopting feature branches, it is no longer necessary to create a shared branch in Git and deploy it to the master branch, which has greatly improved the convenience of development and the stability of the test environment.

Direct deployment to the master branch is prohibited, and it is updated only when an artifact is rolled out to production, since deployment and testing in this environment is a mandatory step.

Teams quickly adapted to the new workflow and began producing hundreds of feature instances. To avoid overwhelming the cluster, dedicated nodes were allocated for them, and a job was written that is responsible for cleaning up feature branches. It also runs in the Kubernetes cluster and is launched by cron; a sketch of such a job is shown below.
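
The real cleanup job's logic is more involved; as a minimal sketch, assuming every feature resource carries a marker label (the label, service account, and image are assumptions), such a job could be shaped like this CronJob:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: feature-branch-cleaner
spec:
  schedule: "0 3 * * *"                        # run nightly
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: feature-cleaner  # assumed SA with rights to delete these resources
          restartPolicy: OnFailure
          containers:
            - name: cleaner
              image: bitnami/kubectl:latest    # any image with kubectl on board
              command: ["/bin/sh", "-c"]
              args:
                # assumed convention: feature resources are marked with this label
                - kubectl delete deployment,destinationrule,virtualservice -l feature-branch=true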

A few numbers. As I said above, 90+ teams are currently working on the project. The total number of feature branches exceeds 300. We also ran a survey among developers about the convenience of working with feature branches. One of the questions asked for a subjective rating on a five-point scale: the average score was 4.3, which is very good. More than half of the engineers took part in the survey.

After the introduction of feature branches, the test environment became much more stable. The master branch always contains release versions of artifacts that have been tested and work reliably. That means a team can now safely deploy a version of an artifact without fear of breaking adjacent services. If an error occurs, it occurs only in that snapshot version, which makes it easy to localize.

Likewise, feature branches make it possible to debug a service remotely, again because adjacent APIs are not affected.

There is also a fly in the ointment. Using feature branches increased the load on Jenkins, which now builds far more snapshot artifacts. The approach also does not completely solve the problem of external systems, and development complexity has grown slightly.

At the moment, the feature branch mechanism is implemented only for APIs, while databases, queues, and caches are not yet covered. That task still lies ahead; we will think about how to implement it elegantly and, at the same time, make it convenient to use.
