Andrej Karpathy. Software 2.0: the inscrutable software of the future

Short description

Andrej Karpathy, co-founder of OpenAI and former lead developer of Tesla's Autopilot, has returned to the research laboratory after a period spent recording educational videos on neural networks. Karpathy is known for his work in artificial intelligence and machine learning: he co-founded OpenAI and worked on the development of GPT-4 and the ChatGPT chatbot. He has also articulated the concept of Software 2.0, a style of programming based on neural networks that requires far less direct human input. Karpathy predicts that in the future neural networks will become so advanced that training new ones will become unnecessary.


When we talk about modern developments in the field of neural networks and machine learning, the first name that comes to mind is Andrej Karpathy. The young Slovak quickly became a star in this field and one of its leading authorities. This is, among other things, the person who taught John Carmack.

Andrej Karpathy is a co-founder of OpenAI (GPT-4, ChatGPT) and was the lead developer of Tesla's Autopilot. Recently, however, he left Tesla for an understandable reason: there is a chance that humanity is on the brink of a grand discovery of unparalleled importance, one that will divide the history of our species into before and after the singularity. We are talking about AGI, that is, artificial general intelligence. If so, there is no point in working on anything else right now.

▍ Career and projects

Andrej Karpathy was born in Bratislava (Slovakia) in 1986 and emigrated to Canada at the age of 15.

In 2009, he graduated from the University of Toronto (Computer Science/Physics + Mathematics); in 2011, he completed his graduate studies at the University of British Columbia (machine learning for robotics); and in 2015, he defended a thesis on neural networks at the intersection of computer vision and natural language processing.

He developed and became the primary lecturer of Stanford's first course on deep learning, CS 231n: "Convolutional Neural Networks for Visual Recognition." Over the years, the class became one of the largest at Stanford, and in its first years it grew roughly exponentially: 150 students in 2015, 330 in 2016, and 750 in 2017.

Along the way, Andrej completed three internships: at Google Brain in 2011 (unsupervised feature learning from video), at Google Research in 2013 (supervised learning from YouTube videos), and at DeepMind in 2015 (deep reinforcement learning).

In 2015, he co-founded the OpenAI research laboratory, which is now known worldwide thanks to the DALL-E image generator, the GPT-4 large language model (released on March 14, 2023) and the ChatGPT chatbot that runs on this model.

The original GPT model, illustration from a scientific article dated June 11, 2018

OpenAI was originally conceived as a non-profit organization that would "freely collaborate" with universities and researchers around the world. In 2019, however, a for-profit subsidiary, OpenAI Limited Partnership (OpenAI LP), was registered under the non-profit, with returns capped at 100 times the investment.

By that time, however, two co-founders had already left the organization: Elon Musk and Andrej Karpathy (2017). The former cited a conflict of interest related to the development of Tesla's Autopilot; the latter went to Tesla as the lead developer of that very Autopilot, on which he worked for five years, with the goal of creating a fully autonomous Full Self-Driving system.

After five years of hard work, Andrej took a sabbatical and then resigned from Tesla. For the following six months, he devoted himself to recording educational YouTube videos on building neural networks, with step-by-step instructions and source code on GitHub (these videos with worked examples are recommended for all beginners).

As part of this educational activity, at the beginning of 2023 Andrej published nanoGPT, an open-source repository of roughly 300 lines of training code and 300 lines of model definition for training and fine-tuning medium-sized GPTs. The code is based on minGPT and written for educational purposes, so that anyone with suitable hardware can train a neural network from scratch. Everything is quite clear (with step-by-step instructions in the videos). In particular, the current version reproduces GPT-2 (124 million parameters) on OpenWebText on a single 8×A100 40GB node in about four days of training.

In February 2023, Karpathy announced his return to OpenAI.

“I’m joining OpenAI (again :)). Like many others in the AI field and beyond, I am greatly inspired by the impact of their work and have personally benefited greatly from it. The future potential is especially exciting; I am very pleased to jump back in and take part in the development,” wrote Andrej.

Thus, he returned to the research laboratory where he had worked in 2015-2017.

▍ Software 2.0: the inscrutable software of the future

Before leaving OpenAI in 2017, Karpathy wrote a notable essay, "Software 2.0", in which he discussed the use of neural networks in programming.

In his opinion, neural networks will bring about a fundamental shift in software development, enabling the creation of fundamentally more complex software that is inaccessible to human understanding, as shown in the illustration below.

The classic Software 1.0 stack is written in familiar languages like Python and C++. It consists of explicit instructions to the computer, created by a programmer. By writing each line of code, the programmer pins down a specific point in program space with some desired behavior.
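For instance, a Software 1.0 spam filter might consist entirely of hand-written rules. A hypothetical sketch (not from the essay):

```python
# Software 1.0: the programmer spells out every rule explicitly.
# A hypothetical, hand-written rule-based spam check.
def looks_like_spam(message: str) -> bool:
    banned_phrases = ("free money", "click here", "act now")
    text = message.lower()
    # Each condition below is a deliberate, human-chosen point in program space.
    if any(phrase in text for phrase in banned_phrases):
        return True
    if text.count("!") > 3:
        return True
    return False

print(looks_like_spam("FREE MONEY!!! Click here"))  # True
print(looks_like_spam("Meeting moved to 3pm"))      # False
```

Every branch here was chosen deliberately by a human; that is what "defining a point in program space" means in practice.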

In contrast, Software 2.0 is written in a far more abstract, human-unfriendly language: neural network weights. No human takes an active part in writing this code, because there are a huge number of weights (millions in typical networks), and coding directly in weights is quite difficult.

The program will look something like this (fragment):
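The fragment the article refers to is not reproduced here; as a hypothetical stand-in, a Software 2.0 "program" is nothing but a long array of learned numbers, opaque to a human reader:

```python
# Hypothetical fragment of a Software 2.0 "program": just learned weights,
# with no human-readable structure or intent visible in the values.
weights = [
    0.1374, -0.5121, 0.0082, 2.3045, -0.0917,
    -1.2210, 0.4388, 0.0651, -0.7734, 1.0056,
]
# ... a real network continues for millions of values like these.
```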

The programmer's task is not so much to write the code as to define the desired behavior of the program: for example, "satisfy a set of input-output example pairs" or "win a game of Go". The programmer sets the goal, writes a rough code skeleton (i.e., the neural network architecture) that defines a subset of program space to search, and then uses the computing resources at their disposal to search this space for a program that works.
“In the case of neural networks, we restrict the search to a continuous subset of the program space, where the search process can be made (somewhat surprisingly) efficient using backpropagation and stochastic gradient descent,” Karpaty writes.
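This search can be sketched in miniature in plain Python (a hypothetical toy example, not Karpathy's code): the "source code" is a handful of input-output pairs plus a two-parameter skeleton, and gradient descent finds the weights that satisfy them.

```python
# Software 2.0 in miniature: we never write the function y = 2x + 1 by hand.
# We specify example pairs and let gradient descent search the weight space.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # sampled from y = 2x + 1

w, b = 0.0, 0.0   # the "program" to be found by search
lr = 0.05         # learning rate: step size of the search

for step in range(2000):
    grad_w = grad_b = 0.0
    for x, y in data:
        err = (w * x + b) - y              # prediction error on one example
        grad_w += 2 * err * x / len(data)  # d(MSE)/dw
        grad_b += 2 * err / len(data)      # d(MSE)/db
    w -= lr * grad_w                       # gradient descent step
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # converges close to 2.0 and 1.0
```

Real Software 2.0 differs only in scale: millions of weights instead of two, and stochastic gradients over huge datasets instead of the full batch.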

Therefore, Software 1.0 is human-written source code that is compiled into a binary that does the useful work.

In Software 2.0, the source code consists of:

  1. A dataset that defines the desired behavior.
  2. A neural network architecture that provides a rough skeleton of the code, with many details (the weights) left to be filled in.

During training, the dataset is compiled into the final neural network, the analogue of the binary. In most practical applications, neural network architectures and training systems will become a standardized commodity, so most "development" will take the form of curating, growing, massaging, and cleaning labeled datasets.

This fundamentally changes the programming paradigm by which we iteratively develop software. According to Karpathy, development teams split into two groups:

  • 2.0 programmers (data-labeling specialists) edit and extend the datasets;
  • a few 1.0 programmers maintain and iteratively develop the surrounding infrastructure: training code, analytics, visualizations, and labeling interfaces.

Software (1.0) is eating the world, and now AI (2.0) is eating software.

We can now see the transition from 1.0 to 2.0 in many industries where neural networks are beginning to be actively used.

In a recent post, "Deep Neural Nets: 33 Years Ago and 33 Years From Now", Karpathy extrapolated the development of neural networks from 1989 to 2055. He suggested looking at the neural networks of 1989, with their tiny datasets, and imagining that future researchers will look at the neural networks of 2023 the same way: they will seem like toys that can be trained in about a minute on a personal computer or smartphone.

Datasets will be roughly ten million times larger than today's "childhood experiments" such as GPT-4 or GPT-5, which some impatient investors have already compared to AGI.

Even if the architecture of neural networks remains roughly the same, increasing the number of parameters produces a qualitative change in how the models function. For example, the human brain and the brain of the Drosophila fly are functionally different because of the vast difference in the number of neurons (86 billion and 100 thousand, respectively) and the connections between them. At the same time, the mechanism of operation of individual neurons in humans and in Drosophila is approximately the same.

Drosophila brain connectome

It is easy to calculate that the quantitative difference between the Drosophila and human brains is much smaller (860,000×) than between the neural networks of 2022 and 2055 (10,000,000×).
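The arithmetic behind that comparison is easy to check (a quick sketch):

```python
# Quantitative gap between brains vs. between neural-network generations.
human_neurons = 86_000_000_000  # ~86 billion
fly_neurons = 100_000           # ~100 thousand (Drosophila)

brain_ratio = human_neurons / fly_neurons
print(f"{brain_ratio:,.0f}x")   # 860,000x

network_ratio = 10_000_000      # projected 2022 -> 2055 growth from the article
print(network_ratio > brain_ratio)  # True: the projected gap is larger
```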

Andrej Karpathy believes that, in the most extreme extrapolation, in a few decades we will not need to train new neural networks at all:

“In 2055, we will ask a neural-network megabrain, grown ten million times larger, to perform some task by speaking (or thinking) to it in its native language. And if you ask politely enough, it will comply. Yes, you can still train neural networks… but why would you need to?”


According to Metaculus statistics, the weighted average user forecast a year ago put the arrival of strong AI at 2043. But in April 2022, after news of the upcoming GPT-4, a tectonic shift occurred: the forecast moved to 2028. At the moment, the community has shifted the most likely date for the arrival of AGI to May 2026. That is, only 37 months remain…

All in all, technology in this field is advancing much faster than anticipated. Perhaps our contemporaries will witness the most important revolution in human history. And not the least of the credit belongs to the developers of models for training neural networks, Andrej Karpathy in particular.

If the plot of the movie “Terminator” ever comes true and future generations send a machine into the past to change history, then Andrej could well be its main target.

Previous articles in the series “The Most Outstanding Programmers of Our Time”
