A 1.5-gigabyte string

Short description

An engineer who used to support a Java service shares his experience of dealing with significant memory consumption discovered during load planning. At planning meetings the team realized that memory capacity would be a problem, which led them to work out a formula for estimating how many servers would be needed to support the anticipated user demand. After identifying memory as the bottleneck, they studied current memory use and wrote a script to streamline per-session analysis of memory dumps. One of the easy wins was a 1.5 GB string holding a screen’s chain of previous states; as a quick fix, it was truncated once it exceeded a specified number of characters.

A 1.5-gigabyte string

In my previous job, I supported a Java service that provided remote UI functionality similar to RDP or Citrix. The service was organized around sessions: each session consisted of interconnected Java objects that were supposed to be cleaned up either when the user exited or after a specified timeout.

At the load planning stage, we discovered significant memory consumption, and in this article I would like to talk about the reasons for it.


Load planning

Part of my day-to-day work with the team was planning workloads for the coming year. By analyzing usage metrics, growth patterns, and population studies, our data scientists were able to predict how many users we would have next year.

To determine the infrastructure needed to support the expected user base, we used an extremely complex formula:
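
Roughly speaking (I am simplifying here; take this as a sketch of the idea rather than the exact formula), it boiled down to:

$$
\text{servers needed next year} \approx \left\lceil \frac{\text{expected users next year}}{\text{users one server can handle}} \right\rceil
$$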

This is how we calculated the number of servers we would need next year.

At one of the load planning meetings, it became clear that, thanks to the huge popularity of the service, we were expecting a significant increase in the number of users. Our calculations showed that meeting the demand would require more servers than we had. So the challenge was to figure out how to fit more users onto each server to support the projected user base.

What are we limited by?

Thanks to load measurements, we were able to identify our system's bottleneck, which in this case was memory. As new users were added to a server, the system began to falter under the increased load and eventually ran out of memory. Understanding that we were limited by memory was critical, because it directed our efforts toward reducing memory consumption.

Studying memory usage

We calculated the approximate memory consumption of each user using the formula:
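
In essence it was just a division along these lines (a paraphrase, not the exact formula):

$$
\text{memory per user} \approx \frac{\text{memory occupied by user sessions on a server}}{\text{number of active users on that server}}
$$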

Plugging in some made-up numbers as an example, we get the following:
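
For instance, with made-up round numbers (these particular figures are my illustration): about 60 GB occupied by sessions on a server hosting 200 users gives

$$
\frac{60\ \text{GB}}{200\ \text{users}} = 300\ \text{MB per user.}
$$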

That is, each user needs approximately 300 MB of memory. To understand how to reduce this number, we performed some serious measurements of memory consumption.

To identify potential improvement opportunities, we started by analyzing Java memory dumps. At first we explored the dumps manually, but because of the large number of servers we had to develop a script to streamline the process. With this script we were able to detect memory-hungry objects associated with specific sessions. By identifying such problems, we could eliminate unnecessary overhead and optimize memory usage in the system.

Perhaps in another post I will talk about the script and the analysis, but for now I would like to take a closer look at one easy victory that the memory analysis gave us.

A very long string

We started by examining thousands of memory dumps in search of large objects. The largest “whale” turned out to be a 1.5 GB string. It looked something like this:

As you can see from the image, the string contained many backslash characters. We found many similar smaller strings, but this one was the largest.

Studying the purpose of this string, I found that we had classes arranged like this:

class Screen {
  //...
  private Screen previous; // the screen the user came from

  public String toJson() {
    JSONObject jo = new JSONObject();
    //...
    if (previous != null) {
      // previous.toJson() returns a String, so the previous screen's JSON
      // ends up embedded here as an escaped string value
      jo.put("previous", previous.toJson());
    }
    //...
    return jo.toString();
  }
}

class Session {
  //...
  String currentScreen; // JSON snapshot of the screen the user is currently on

  public void setUrl(Screen s) {
    currentScreen = s.toJson();
  }
}

So each screen holds a reference to the previous screen the user visited; this allows the user to go back to exactly the same screen they were on before (with state, scroll position, validation notifications, etc. preserved). The session, in turn, stores the current screen, so that if the user reconnects to the session, they can return to the screen they were on.

There are two architectural problems here:

  1. The stack of previous screens is unbounded, meaning we store more and more data until the server blows up.
  2. By performing jo.put("previous", previous.toJson());, we convert the nested JSON dictionary into a string. Since JSON fields contain quotation marks, and those quotation marks have to be escaped when the JSON is embedded in a string, they are stored as \". That backslash in turn has to be escaped when this string is embedded in yet another string, giving us \\\". A couple more such nestings, and we get \\\\\\\\\\\\\\\\" (see the sketch after this list).
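
To make the second point concrete, here is a minimal sketch of the effect, assuming the org.json JSONObject that the code above appears to use (the class and field names here are illustrative, not our production code):

import org.json.JSONObject;

public class EscapingDemo {
  public static void main(String[] args) {
    // Innermost screen serialized on its own: {"name":"screen1"}
    String inner = new JSONObject().put("name", "screen1").toString();

    // Embedded once as a string value: {"previous":"{\"name\":\"screen1\"}"}
    String level1 = new JSONObject().put("previous", inner).toString();

    // Embedded again: the backslashes in front of each original quote multiply:
    // {"previous":"{\"previous\":\"{\\\"name\\\":\\\"screen1\\\"}\"}"}
    String level2 = new JSONObject().put("previous", level1).toString();

    System.out.println(level2);
    // Every extra level of nesting roughly doubles the escaping overhead,
    // so a long chain of screens blows up exponentially.
  }
}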

It turned out that a user with a multi-screen session was creating a currentScreen String of enormous proportions.

Fixing the problem and moving on

We split the problem into a quick fix and a long-term solution:

The quick fix was to truncate the previous-screens string if it exceeded a certain number of characters (e.g., around 100 MB worth). Although this solution was incomplete and could degrade the UX, it was quick to implement and easy to test, and it also improved reliability (preventing a session from taking up so much space that it crashed the server).
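
For illustration, the quick fix could look something like the sketch below (this is a reconstruction, not the actual patch: the threshold constant is hypothetical, and instead of literally cutting the string, which would produce invalid JSON, this variant simply stops attaching older screens once the serialized history gets too big):

class Screen {
  //...
  private Screen previous;

  // Hypothetical threshold; the real cap was on the order of 100 MB of characters.
  private static final int MAX_HISTORY_JSON_CHARS = 100_000_000;

  public String toJson() {
    JSONObject jo = new JSONObject();
    //...
    if (previous != null) {
      String prevJson = previous.toJson();
      // Quick fix (sketch): drop the older part of the chain once it exceeds the limit,
      // trading deep "back" navigation for a bounded session size.
      if (prevJson.length() <= MAX_HISTORY_JSON_CHARS) {
        jo.put("previous", prevJson);
      }
    }
    //...
    return jo.toString();
  }
}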

The long-term solution was to completely rewrite the previous-screens mechanism: we created a separate, real stack with internal size limits and its own reporting. It took longer to write, test, and release, but it actually prevented the wasted memory rather than just turning the “whale” strings into a different kind of memory hog (i.e., very deeply nested JSON objects).
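
What such a bounded stack might look like, sketched very roughly (the class name, eviction policy, and depth limit are my illustration; the real implementation also tracked sizes and exposed its own reporting):

import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative bounded history: it keeps real Screen objects instead of nested JSON strings
// and evicts the oldest entries once the configured depth is reached.
class ScreenHistory {
  private final Deque<Screen> stack = new ArrayDeque<>();
  private final int maxDepth;

  ScreenHistory(int maxDepth) {
    this.maxDepth = maxDepth;
  }

  void push(Screen screen) {
    if (stack.size() == maxDepth) {
      stack.removeLast(); // drop the oldest screen rather than growing without bound
    }
    stack.addFirst(screen);
  }

  Screen pop() {
    return stack.pollFirst(); // null when there is no previous screen to go back to
  }

  int depth() {
    return stack.size(); // useful for the stack's own reporting/metrics
  }
}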

Epilogue

We continued to use the memory dump analysis tool and found other issues, but none were as easy to solve as this one.

The main takeaway from this story for me is that sometimes looking at the details of a program’s resource usage (for example, looking at a memory dump instead of just measuring memory consumed) is critical to success and has immediate benefits.
