VK tests an AI model that trains on independent data sets

VK is testing its own solution for training artificial intelligence models on data from several databases at once, without the data actually being exchanged and without the risk of leaking personal data. This approach is called vertical federated learning (VFL). Artem Agafonov, head of the Data Science group in the VK Predict division, described the development of the solution, Vedomosti writes.
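To make the idea concrete, here is a minimal sketch of vertical federated training for a logistic regression model. It is an illustration only, not VK's implementation: the `Party` class, the `train` and `predict` functions, the synthetic data and the feature names are all hypothetical. Each party holds different feature columns for the same rows and exchanges only partial scores and prediction errors, never the raw features. Production systems typically add encryption or secure aggregation on top of this exchange, which the sketch omits.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class Party:
    """One data owner. Its raw feature rows never leave this object."""

    def __init__(self, rows):
        self.rows = rows                      # private feature columns
        self.weights = [0.0] * len(rows[0])   # private model weights

    def partial_scores(self):
        # Only these per-row sums are shared, not the features themselves.
        return [sum(w * x for w, x in zip(self.weights, row)) for row in self.rows]

    def apply_residuals(self, residuals, lr):
        # Local gradient step using the shared prediction errors.
        n = len(residuals)
        for j in range(len(self.weights)):
            grad = sum(r * row[j] for r, row in zip(residuals, self.rows)) / n
            self.weights[j] -= lr * grad


def predict(parties, bias):
    partials = [p.partial_scores() for p in parties]
    return [sigmoid(bias + sum(scores)) for scores in zip(*partials)]


def train(parties, labels, epochs=300, lr=1.0):
    """Coordinator loop (assumption: the label holder acts as coordinator)."""
    bias = 0.0
    for _ in range(epochs):
        preds = predict(parties, bias)
        residuals = [p - y for p, y in zip(preds, labels)]
        for party in parties:
            party.apply_residuals(residuals, lr)
        bias -= lr * sum(residuals) / len(residuals)
    return bias


# Synthetic, made-up data: the rows are assumed to be already aligned
# between the two parties (same entities, same order).
rng = random.Random(0)
n_rows = 200
retailer_rows = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n_rows)]
mall_rows = [[rng.uniform(-1, 1)] for _ in range(n_rows)]
# The label depends on features held by both parties.
labels = [1.0 if r[0] + m[0] > 0 else 0.0 for r, m in zip(retailer_rows, mall_rows)]

retailer = Party(retailer_rows)
mall = Party(mall_rows)
bias = train([retailer, mall], labels)

preds = predict([retailer, mall], bias)
accuracy = sum((p > 0.5) == (y == 1.0) for p, y in zip(preds, labels)) / n_rows
```

In this sketch the label depends on one column from each party, so neither party could fit it well alone, while the joint model can. Note that the shared prediction errors can still reveal information about the labels, which is one reason real deployments protect this exchange cryptographically.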

At present, companies that want to train an analytical model have to share their data with each other, with the service developers and with the owner of the infrastructure that processes it, says Agafonov. Data security rests only on trust between the participants in the training process, he adds.

Another option is for each company to train its own model and then train a meta-model on those models' predictions, which combines the results, Agafonov continues. “But in this case, the models do not see all the data at once, and the data itself needs to be transmitted, which is dangerous in itself,” he explained.

Predictive models make it possible to forecast supply and demand or, for example, equipment breakdowns, says Agafonov. For instance, a model could analyze a retailer's sales data and a shopping center's traffic data at the same time. Based on the resulting analytics, the retailer can forecast demand for its goods in a particular shopping center, and the shopping center can choose the right tenants. The solution may be in demand in various sectors, such as fintech and industry, Agafonov added. VK is currently testing the service with several partners in retail and development, and plans to sell the solution in the future.

Federated learning is not yet widespread in Russia, says Nikita Nazarov, technical director of HFLabs. “If you train a model on a small number of features, it will be useless,” he explains. “Moreover, with a small amount of data in the training sample, confidentiality may be violated. But I think VK will have no problems with this. VK is a high-tech company with VKontakte, the most popular social network in Russia. So federated learning fits well into their product profile.”

Google was one of the first to use federated learning, applying it to train spam filters, says Nazarov. Google, according to the expert, needs to train a model on the contents of mailboxes without revealing them, and to derive rules that can be used to detect spam. Platforms based on vertical federated learning (VFL) are also being developed by Amazon, IBM and Nvidia.

When training a vertical federated learning (VFL) model, two points are important to take into account, said Andriy Komisarov, AI architect of DK “Litak” and an expert of the Artificial Intelligence Alliance. First, it is necessary to find a partner that holds the complementary data, and this can be difficult, Komisarov notes: companies may not advertise that they possess such data.

Second, it is not enough to feed the data to the neural network; it must also be labeled correctly, he continues. The quality of the labeling affects the quality of training, and data owners are generally not equipped for model training. “If VK succeeds in solving these two tasks, a rather promising solution could emerge,” says Komisarov. “In general, I would say that this is more of a PR move, an attempt to stake out the topic, than a real business case. Although if VK has an army of data engineers sitting in the closet, then in business terms such a platform could also open up good prospects.”
