How the code that sleeps the moon works

How the code that sleeps the moon works

IN

first part

in this small series of articles, we talked about the fact that the mechanism of durable execution (durable execution) saves the state of the program in the log, as well as the related complications in case of updating the service code, which lead to the loss of relevance of the log. We have seen that limiting the execution time of the handler greatly alleviates this problem. But… doesn’t this lead to the loss of one of the most interesting properties of continuous execution – the ability to create business processes that work with long pauses? At Restate, we believe that there is nothing to lose by using the right primitives.

However, if you like to write code with long wait times because it fits well with the thinking model, then Restate will help you implement it to the fullest. If you value robust execution but are skeptical of long-running handlers and their versioning issues, there is a solution. Below are several ways to achieve the same properties by adding persistent messaging and state to this mechanism.

▍ Suspending the handler

A simple task like sending your users an email a month after they sign up can often turn out to be surprisingly difficult. Obviously, you cannot do this while the registration request is about to complete. Apparently, here you need to write your intention in a database or queue and come back to it later. And for this, you need to create a cronjob or consumer. In fact, you just need to call the postal service, but tell them to “do it in a month”.

Tools like Temporal and Durable Functions provide you with the handler suspension abstraction. It allows you to create handlers that will perform scheduled work after, for example, a month of waiting. That is, in the case of sending a letter described above, you will simply need to run the handler during the registration request, accompanying it with a delay parameter. As a result, if the execution starts, the planned letter will be sent in a month. In Restate, this template will look like this:

const emailService = restate.router({
  email: async (ctx: restate.RpcContext, request: { email: string, delay: number }) => {
    // добавление новых шагов до этапа засыпания создаст сложности
    await ctx.sleep(request.delay)
    await ctx.sideEffect(() => ses.sendEmail(...))
  }
});
const emailApi: restate.ServiceApi<typeof emailService> = { path: "email" };

Having an active request for a month is not an ideal solution. In this case, if you want to change the mail handler code before the call

ctx.sleep

– for example, to add email validation – you need to keep the old version until the end of the month, until the existing requests in it are completed, because they will fail in the new version. This means that any functionality that is added to the command

sleep

, will not have full effect until the end of the month. In addition, it will be very difficult to apply urgent security patches because they will need to be installed for all old versions that may be invoked.

What property do we really need here? As already mentioned, we just need to send a request to the postal service, which should be processed in a month. Then the “active” pending item will not be a log of a partially completed workflow, possibly with many steps, but a single request message. And it is much easier to version the request. It is enough to enter new parameters as optional without deleting the existing ones. Protocol Buffers can also be used to implement backward compatibility checks. It will look like this:

const emailService = restate.router({
  delayedEmail: async (ctx: restate.RpcContext, request: { email: string, delay: number }) => {
    // добавление сюда новых шагов не повлияет на активные запросы
    ctx.sendDelayed(emailApi, request.delay).send({ email: request.email })
  }
  // недолго выполняющийся обработчик
  email: async (ctx: restate.RpcContext, request: { email: string }) => {
    await ctx.sideEffect(() => ses.sendEmail(...))
  }
});
const emailApi: restate.ServiceApi<typeof emailService> = { path: "email" };

The trick is to make long pauses not within calls, but between versioning requests as needed. We’ve moved from a deprecatable handler abstraction to a deprecatable RPC abstraction. Restate plays the role of a persistent event bus as well as a persistent executor.

▍ Management cycles

But what if you need to write a workflow for a control loop that involves executing a set of tasks every hour, perhaps indefinitely (or until some story limit is reached)? Many use workflows this way. This makes long-lived control cycles easier to understand and debug. In this case, versioning requires you to stop all cycles, update and restart them. But this can become a very time-consuming task with thousands of active loops. In code it looks like this:

const controlService = restate.router({
  loop: async (ctx: restate.RpcContext) => {
    // настройка до запуска цикла - не должна изменяться во время выполнения запросов
    let state: State = { ... }

    while (true) {
      // работа цикла - не должна изменяться во время выполнения запросов
      state = await ctx.sideEffect(() => mutateState(state))

      if (endCondition) {
        return
      }

      await ctx.sleep(1000 * 3600) // 1 hour
    }
  }
});
const controlApi: restate.ServiceApi<typeof controlService> = { path: "control" };

How to solve this problem without using infinitely long handlers? You can use the same principle: take long pauses between calls. In practice, this will simply be a tail recursion:

const controlService = restate.router({
  setup: async (ctx: restate.RpcContext) => {
    // настройка до цикла
    const state = { ... }

    ctx.send(controlApi).loop(state)
  },
  loop: async (ctx: restate.RpcContext, state: State) => {
    // работа цикла
    const nextState = await ctx.sideEffect(() => mutateState(state))

    if (!endCondition) {
      // планирование очередной итерации
      ctx.sendDelayed(controlApi, 1000 * 3600).loop(nextState) // 1 час
    }
  },
});
const controlApi: restate.ServiceApi<typeof controlService> = { path: "control" };

Here, again, we don’t need to version much — just the body of the request, in this case the state of the loop, which should maintain consistency between iterations.

▍ Virtual objects

Another typical case of persistent handlers is their use to “own” some complex state that is reproduced in memory from the log each time the code is executed. This mechanism mimics how Cloudflare’s Durable Objects work. Essentially, we use the specific call log as a write-only log. For example, you can use a workflow to store the state of a chess game. This template isn’t well-developed here at Restate (although you can write a library for it yourself), and we’ll explain why below. Now let’s just imagine that we have a primitive

stream

similar to

signal

Temporal, which allows active processors to pull information from other processors.

const chessService = restate.router({
  game: async (ctx: restate.RpcContext, gameID: string) => {
    let boardState = { ... }

    // ctx.stream не существует!
    for await (const move of ctx.stream("moves").receive()) {
      if (verifyMove(boardState, move)) {
        boardState = applyMove(boardState, move)
      }
    }
  },
  move: async (ctx: restate.RpcContext, move: Move) => {
    // ctx.stream не существует!
    ctx.stream("moves").send(move)
  },
});
const chessApi: restate.ServiceApi<typeof chessService> = { path: "chess" };

Let me remind you that this handler only outwardly seems long – the runtime can suspend it after any completed move, which allows it to work in serverless systems. How are previous moves actually saved? When the handler resumes upon receiving a new player move, it replays all previously received moves, rechecks them, and builds the state of the board until it is ready to process the current move. And this scheme is not very successful. Assuming that

verifyMove

has complexity O(1), we needlessly perform an O(N) operation for each move. The fact is that there is no information about the state of the handler at the time of suspension among the executions. She knows how to recreate it from scratch. And this is a typical problem in event-driven approaches. Essentially, we use an endlessly growing log of handler actions as a database. A smart decision, but it hurts performance, making the termination mechanism more expensive.

That’s why Restate exposes a database of key-value pairs so that calls to a particular key can interact with future calls to that key using state:

const chessService = restate.keyedRouter({
  move: async (ctx: restate.RpcContext, gameID: string, move: Move) => {
    const boardState = await ctx.get<BoardState>("boardState")
    if !(verifyMove(boardState, move)) {
      throw new TerminalError(`Invalid move ${move}`)
    }
    ctx.set("boardState", applyMove(boardState, move))
  },
});
const chessApi: restate.ServiceApi<typeof chessService> = { path: "chess" };

So instead of threads sending messages to idle handlers that accumulate state in the log, we can simply send RPCs to services that accumulate Restate state. This greatly simplifies everything. For services, RPC is the standard interaction mechanism, and explicit state is the standard way of remembering what’s happening. In a general sense, this is the same principle as in typical services that store state in persistent storage.

But, wait… Why can’t we just use any key-value store for this? Many write workflows that interact with Postgres, for example. The fact is that then we introduce the problem of double entry. You need to keep the storage and the execution process in sync, even in the event of failures. Only persistent performance will not save you. You will need to implement some transaction semantics and storage locking, and integrate it properly. In the case of Restate, you get a repository that is guaranteed to be updated according to the progress of the processor’s execution, maintaining consistency even in the face of crashes, race states, and network splits.

▍ Conclusion

We’re not yet sure if all instances of long-running handlers will be able to cover these techniques, but we believe that many will. And if we can create mostly short-lived handlers, then versioning can be implemented via immutable deployments, where old versions are kept until their calls complete, which would typically only be a few minutes.

But here we are not categorical. Sometimes a long handler is actually a convenient solution, and related upgrade issues can sometimes be resolved with a version-aware control flow1 or by restarting the handlers in case of changes.

▍ Note

1. It is possible to update active workflows by adding branches of new versions to the code, but we believe that this approach should be used only in extreme cases, as it greatly complicates further maintenance of the code.

Telegram channel with discounts, prize draws and IT news 💻

Related posts