The saga of the agent, or how LLMs can help with Jmix development
Hello everyone! In recent years, large language models (LLMs) have become extremely popular for solving a wide variety of tasks, from classic document retrieval to analyzing financial news for decision-making. In this article, we'll show how we applied these technologies to create an intelligent assistant that is ready to answer your Jmix questions and help you write code.
What is Jmix AI Assistant?
Jmix AI Assistant is an LLM-based agent that can significantly speed up your development on Jmix thanks to four tools: search over the latest version of the documentation, UI Samples, training materials, and a section of the forum where the assistant can find answers to specific questions.
You can try the assistant here.
First experiments
When we started designing the assistant, using agents seemed optional for this task; we felt that a standard RAG approach should suffice. As the LLM we took GPT-4, which already had some knowledge of Jmix and Java development.
We loaded several pages from the documentation into the Chroma vector database, and the model responded well to simple questions. However, we quickly realized that it lacked context for broader questions, which caused it to hallucinate and invent answers. To solve this problem, we turned to the LangChain framework: to avoid breaking sentences in half we used RecursiveCharacterTextSplitter, and as the retriever type we chose ParentDocumentRetriever, which searches over small chunks but returns the entire text of the documentation page (or more). This noticeably improved the quality of the answers.
Then we decided to check how well the assistant could write code, but we immediately ran into the fact that a single request to the retriever is often not enough. Besides the main documentation, we also wanted to give the model additional data containing relevant code snippets, and we did not want to put all of this information into a single retriever. So we concluded that it was worth trying an agent-based approach for this task.
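To make the retrieval setup concrete, here is a minimal sketch of such a configuration in LangChain. The chunk sizes, collection name, and the doc_pages variable are illustrative assumptions, not our production values.

```python
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Small chunks are embedded for precise vector search...
child_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=50)

vectorstore = Chroma(
    collection_name="jmix_docs",
    embedding_function=OpenAIEmbeddings(),
)
docstore = InMemoryStore()  # ...but chunks map back to full parent pages stored here

retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    child_splitter=child_splitter,
)
retriever.add_documents(doc_pages)  # doc_pages: documentation pages loaded earlier (assumed)
pages = retriever.invoke("How do I configure data access roles in Jmix?")
```

Searching over small chunks keeps the vector similarity precise, while returning the whole parent page gives the model enough context to actually answer.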
What is an LLM agent?
Essentially, an agent is an abstraction meaning that a large language model can interact with its environment in some way and obtain additional information from it to solve a task. Today there are many types of LLM agents with different functionality, from Chain-of-Thought to the OpenAI Tools Agent.
We started by testing whether the agent could use one of the three tools (data sources) offered to it to produce a ready answer. We used the OpenAI Functions agent built into LangChain, and it really worked, but only for simple questions. This approach had a significant drawback: it could not build a logical chain that would let the model interpret the information received from one tool before deciding which tool to call next.
We considered different agent architectures, such as Chain-of-Thought and its logical continuation, Tree-of-Thought, which fully analyze the user's request and break it into logical pieces. The ability to build coherent reasoning was exactly what we were missing with OpenAI Functions, but these approaches in turn lack the important tool-calling functionality. And then we discovered another type of agent, ReAct, which we discuss in the next section.
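For reference, a tool-calling agent of this kind looks roughly like the sketch below in LangChain; tools is assumed to be the list of three data-source tools mentioned above, and the prompt is the standard hub prompt.

```python
from langchain import hub
from langchain.agents import AgentExecutor, create_openai_functions_agent
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4", temperature=0)
prompt = hub.pull("hwchase17/openai-functions-agent")  # standard functions-agent prompt

# tools: the three data-source tools (documentation, UI Samples, forum), defined elsewhere
agent = create_openai_functions_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
executor.invoke({"input": "How do I add a column to a dataGrid in Jmix?"})
```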
ReAct LLM-agent
One of the key features of a ReAct agent is its ability not only to reason, as in Chain-of-Thought, but also to actively use the available tools. After forming an initial plan based on its internal reasoning, ReAct performs specific actions to obtain information or produce an effect. This can be a database query, an API call, or any other action that can yield the desired result.
After each action, the ReAct agent analyzes the results and receives feedback from the environment. This information helps it adjust its next steps and decide whether the information it already has is sufficient to answer. This approach lets ReAct respond to user requests more flexibly and decide on its own which tool to use next.
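A minimal ReAct agent built this way in LangChain might look like the following sketch. The hub prompt is the standard ReAct prompt (Thought / Action / Observation loop), and tools is again assumed to be the list of retrieval tools.

```python
from langchain import hub
from langchain.agents import AgentExecutor, create_react_agent
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4", temperature=0)
prompt = hub.pull("hwchase17/react")  # Thought -> Action -> Observation loop

agent = create_react_agent(llm, tools, prompt)
executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    handle_parsing_errors=True,  # ask the model to retry if it breaks the ReAct format
)
executor.invoke({"input": "How do I show a progress bar while a background task runs in Jmix?"})
```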
Turning an abstract agent into Jmix AI Assistant
Once we had decided on the type of agent, we needed to figure out what data could help it answer. We decided to expand the retriever to cover all pages of the documentation, and to create two additional retrievers: one for searching UI Samples, and one for the section of the forum containing answers to real user questions that are not covered in the documentation.
We also toyed with the idea of building a fashionable multi-agent system in which the main ReAct agent would request information from other LLMs that had access to the retrievers with the required data. This approach proved reasonably good: the assistant began to give plausible answers to fairly complex questions. However, digging a little deeper, it turned out that the models inside the tools lacked the context of the original task, so their responses to the main agent could be irrelevant even when the relevant information was present in the retriever. There was also another big problem: the price of such requests. One request to GPT-4 may seem inexpensive, but when four agents work in the system and each is billed for both input and output tokens, it becomes rather painful.
In the end, we decided to keep only retrievers in the tools and, when they contain no relevant information, to return a prompt asking the agent to reformulate the question or use another tool.
During internal testing, many developers noted that thanks to the visible "thoughts", the assistant is good at suggesting the right direction for solving a task and is helpful even when it cannot produce the correct final answer.
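In LangChain terms, wrapping each retriever as a tool can look like the sketch below; the tool names and descriptions are illustrative, not the exact ones used by the assistant.

```python
from langchain.tools.retriever import create_retriever_tool

docs_tool = create_retriever_tool(
    docs_retriever,
    name="jmix_documentation",
    description="Searches the latest Jmix documentation. Use for API and configuration questions.",
)
samples_tool = create_retriever_tool(
    samples_retriever,
    name="ui_samples",
    description="Searches Jmix UI Samples with ready-made code snippets.",
)
forum_tool = create_retriever_tool(
    forum_retriever,
    name="forum_search",
    description="Searches answered Jmix forum topics not covered by the documentation.",
)
tools = [docs_tool, samples_tool, forum_tool]
```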
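A minimal sketch of such a fallback, assuming a custom tool wrapping one of the retrievers (the function name and the exact wording of the hint are hypothetical):

```python
from langchain_core.tools import tool

@tool
def search_jmix_docs(query: str) -> str:
    """Search the Jmix documentation for the given query."""
    docs = docs_retriever.invoke(query)
    if not docs:
        # Nudge the agent instead of returning an empty observation.
        return ("No relevant documents were found. "
                "Rephrase the query or try another tool.")
    return "\n\n".join(d.page_content for d in docs)
```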
We chose Streamlit as the shell for interacting with the agent, since it let us demonstrate all of the agent's capabilities in a short time, including showing the "thoughts" and tool results in collapsible blocks, thanks to the StreamlitCallbackHandler available in LangChain.
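The Streamlit integration is essentially a few lines around the agent executor; a rough sketch, assuming the executor from the earlier snippets:

```python
import streamlit as st
from langchain_community.callbacks import StreamlitCallbackHandler

st.title("Jmix AI Assistant")

if question := st.chat_input("Ask a question about Jmix"):
    st.chat_message("user").write(question)
    with st.chat_message("assistant"):
        # Renders the agent's thoughts and tool results as expandable blocks.
        st_callback = StreamlitCallbackHandler(st.container())
        result = executor.invoke({"input": question}, {"callbacks": [st_callback]})
        st.write(result["output"])
```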
Preparing Jmix AI Assistant for release
While testing the assistant, we collected a lot of feedback and made a number of improvements:
- Switched to the more advanced GPT-4o model;
- Updated the Jmix AI Assistant knowledge base, adding training materials and more topics from the forum;
- Added reformulation of the user request in several ways to improve the quality of vector search (one possible implementation is sketched after this list);
- Analyzed user feedback and improved the agent's system instructions based on it;
- Increased the stability of the agent's responses by switching to the more modern LangGraph framework for working with agents;
- Reduced the probability of the agent answering in a language other than the language of the user's request;
- Added conversation context memory;
- Switched from Streamlit to our own Jmix-based front end.
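For the query-reformulation item above, one off-the-shelf option is LangChain's MultiQueryRetriever, which generates several paraphrases of the user query and merges the retrieved results. Whether the assistant uses exactly this component or a custom prompt is our assumption; the sketch only illustrates the idea.

```python
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_openai import ChatOpenAI

# Paraphrasing the query several times compensates for wording mismatches
# between the user's question and the documentation text.
multi_query = MultiQueryRetriever.from_llm(
    retriever=docs_retriever,
    llm=ChatOpenAI(model="gpt-4o", temperature=0),
)
docs = multi_query.invoke("How do I add a progress bar to a Jmix view?")
```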
All this allowed us to take Jmix AI Assistant to a new level and make it a genuinely effective helper for everyday Jmix development tasks, but we do not plan to stop there.
Further plans for service development
We already see great potential for further use of Jmix AI Assistant and, of course, we plan to keep developing it. During development we ran into many problems, some of which we touched on in this article, and now we would like to tell you what we will be working on in the coming months.
First of all, we would like to reduce the share of cases where the requested information is present in the retrievers but, for one reason or another, does not reach the agent. The limited context length of the LLM does not allow us to pass full versions of all the material found, so we want to filter the information received from the retrievers more carefully. A reranker model can help here: it reranks the source documents from the retriever more accurately before they are passed to the LLM. We can then pass fewer documents, since we will be more confident in their relevance.
Another idea is to move the ability to view the full text of a documentation page into a separate tool. This would let us return more document chunks with links to the source, which the agent can follow to obtain complete information.
Of course, one cannot overlook the importance of good embedding models when using retrievers. We currently build our embeddings with OpenAI's text-embedding-3 family of models, but we are actively experimenting with other models that could increase the share of relevant responses.
Another potential improvement is the parallel use of several retrievers of different natures to increase the diversity of their output. For example, alongside the standard vector retriever you can use BM25, a probabilistic ranking algorithm for determining the relevance of documents to a search query and a logical extension of TF-IDF. This lets the system better find words and phrases that were rarely encountered in the training data of embedding models.
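As an illustration of the reranking idea, here is a sketch with an open cross-encoder model from sentence-transformers; the specific model name and top_k value are just examples.

```python
from sentence_transformers import CrossEncoder

# A cross-encoder scores each (query, document) pair jointly, which is usually
# more accurate than the bi-encoder similarity used for the initial retrieval.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, docs: list[str], top_k: int = 4) -> list[str]:
    scores = reranker.predict([(query, d) for d in docs])
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked[:top_k]]
```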
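For reference, switching embedding models in LangChain is a one-line change; the snippet below uses text-embedding-3-small purely as an example.

```python
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
query_vector = embeddings.embed_query("How do I secure a REST endpoint in Jmix?")
doc_vectors = embeddings.embed_documents(["Documentation page text ..."])
```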
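A hybrid setup of this kind is available out of the box in LangChain via EnsembleRetriever. A minimal sketch, assuming the doc_pages and dense docs_retriever from earlier; the weights are arbitrary.

```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever  # requires the rank_bm25 package

# BM25 catches exact terms (class names, annotations) that embedding models
# may have rarely seen; the dense retriever handles paraphrases.
bm25 = BM25Retriever.from_documents(doc_pages)
bm25.k = 4

hybrid = EnsembleRetriever(
    retrievers=[bm25, docs_retriever],
    weights=[0.4, 0.6],
)
results = hybrid.invoke("@JmixEntity annotation usage")
```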
These are just some of our ideas for further improving the service, but hopefully you’ve also picked up a couple of promising ideas for your projects.
Conclusion
The development of Jmix AI Assistant is an example of how modern technologies can radically transform the software development process. Using LLM agents not only speeds up writing code and searching documentation, but also significantly accelerates learning new tools, such as Jmix, for developers.
However, the development path of Jmix AI Assistant is far from over. There are many interesting tasks ahead of us, but it is already clear that applying artificial intelligence in software engineering is an opportunity to accelerate development and devote more time to design rather than to reading documentation and writing code.