xAI has published the weights and architecture of the Grok-1 language model

Illustration from Midjourney

Elon Musk’s startup has open-sourced the Grok-1 language model under the Apache 2.0 license. Almost 300 GiB of files are offered for download as a torrent; they contain the weights of a mixture-of-experts (MoE) model with 314 billion parameters.

In recent months, OpenAI has attracted a significant share of the attention in the field of artificial intelligence. The company was founded in December 2015 as a research organization by a number of prominent figures in the technology industry. Elon Musk was among them.

In 2018, Musk left OpenAI over disagreements about its goals. He would later argue that the organization had violated the mission for which it was founded. The specific details of this conflict have come to light in recent weeks.

However, Elon did not give up on AI. In April last year, the businessman promised to create a startup that would build a large language model (LLM) without biases. A month before that, Musk had founded X Corp., the new corporate name for Twitter. He gave the new startup a similar name, xAI, although he set it up as a structure separate from X Corp.

In November 2023, xAI introduced the Grok LLM. At first, the chatbot was available only to selected people who were personally invited by Musk. Access was then extended to subscribers of X Premium+, the more expensive subscription tier of the X microblogging service ($16 per month or $168 per year, compared with $8 or $84 for Premium).

Last Monday, March 11, 2024, Musk promised to open-source Grok. He did not name a specific day; the laconic tweet said only “this week”.

Considering that the week in the USA starts on Sunday, Elon did not keep his promise. Only six days later, on Sunday, March 17, at 10:12 p.m. Moscow time, a post appeared on the project’s microblog: “░W░E░I░G░H░T░S░I░N░B░I░O░”.

The text of the tweet is a joke about the rampant pornographic spam that X users have been complaining about for several weeks. To attract traffic, bots post calls to look at their profile description, which contains a link to some fraudulent site. In xAI’s case, the link in the profile leads to the files needed to run the LLM.

The way the model was distributed also looks familiar. Earlier, the startup Mistral AI made a name for itself in the AI community by releasing models as tweets (1, 2) containing magnet links to torrents with the files. Mistral AI explained nothing at the time: there were no press releases, no benchmark claims, not even a description of what was inside.

Contents of the RELEASE file

In xAI’s case, there is at least a short description. It states that Grok-1 is a mixture-of-experts model (8 experts, of which 2 are active) with 314 billion parameters, of which 86 billion are active. The LLM was trained by xAI from scratch. The published model is a base model: it has not been fine-tuned for any specific task.
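The phrase "8 experts, of which 2 are active" can be illustrated with a minimal sketch of top-2 routing. This is a generic toy example, not xAI's implementation: the function names, the toy experts, and the choice to renormalize the two selected weights are all assumptions made for illustration. Only the numbers 8 and 2 come from the release notes.

```python
import math
import random

NUM_EXPERTS = 8  # from the release notes
TOP_K = 2        # experts active per token, from the release notes


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts and renormalize their weights.

    Renormalizing over the selected experts is one common choice;
    it is an assumption here, not a documented Grok-1 detail.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_layer(x, experts, router_logits):
    """Weighted sum of the outputs of the selected experts only."""
    return sum(w * experts[i](x) for i, w in route_token(router_logits))


# Hypothetical toy experts: expert number i multiplies its input by i + 1.
experts = [lambda x, s=s: s * x for s in range(1, NUM_EXPERTS + 1)]

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(logits)          # two (expert_index, weight) pairs
output = moe_layer(1.0, experts, logits)
```

The other six experts are never called for this token, which is why the number of active parameters is much smaller than the total.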

Since the model has 314 billion parameters, Musk’s eccentric style would have called for releasing the files last Thursday. March 14 is written as 3/14 in the American tradition, which is why it is known as Pi Day. Elon probably wanted to do exactly that, and the xAI engineers simply missed the symbolic date.

Almost 300 GiB of files with the Grok-1 weights are distributed via a torrent file on the Academic Torrents site or via a magnet link. Instructions for running the LLM are available at github.com/xai-org/grok-1 and on Hugging Face. A model of this size will clearly require a significant number of GPUs for inference.
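A rough back-of-the-envelope estimate shows what "a significant number of GPUs" might mean. The 300 GiB and 314 billion figures come from the text above; the 80 GB of memory per GPU and the 90% usable fraction are assumptions, and the estimate covers only the weights, ignoring activations and other runtime memory.

```python
import math

GIB = 2**30

checkpoint_bytes = 300 * GIB  # "almost 300 GiB", from the article
total_params = 314e9          # from the release notes

# Average storage per parameter implied by the checkpoint size.
bytes_per_param = checkpoint_bytes / total_params


def min_gpus(weight_bytes, gpu_memory_gb=80, usable_fraction=0.9):
    """Lower bound on the GPU count needed just to hold the weights.

    gpu_memory_gb and usable_fraction are assumed values, not
    figures published by xAI.
    """
    usable_bytes = gpu_memory_gb * 1e9 * usable_fraction
    return math.ceil(weight_bytes / usable_bytes)


gpus_needed = min_gpus(checkpoint_bytes)
print(f"{bytes_per_param:.2f} bytes per parameter, at least {gpus_needed} GPUs")
```

At roughly one byte per parameter, the checkpoint appears to store the weights in a compact format; loading them in a wider format would increase the GPU count accordingly.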

xAI writes that the model was trained on a large amount of text data. Data sources for Grok-1 training are not specified.

Both the published code and the Grok-1 weights are licensed under Apache 2.0. This means that derivative works can be distributed under a different license and even turned into a proprietary commercial product.

In practice, this means that anyone building a competitor to OpenAI and Anthropic can now start by further training the Grok-1 model. Not having to do your own pre-training could potentially save millions of dollars. This suggestion was made by machine learning specialist Andriy Burkov.

Perhaps this is what xAI is counting on: the model release is accompanied by the wish “Happy coding!”.
