From Calculators to ChatGPT

AI has existed for a long time and is more than just LLMs. Time for a little landscape review! How did we get here, and what is different now?

I won't go so far as to argue that AI is older than the computer. Yet, as soon as we invented calculators, researchers started fantasizing about more general thinking machines, and many of their approaches looked nothing like today's LLMs. What were they?

First off, it wasn't obvious that computers could learn, so some early attempts at machine intelligence tried to hand-craft general principles of thinking instead. Outside the box of Machine Learning, we find disciplines such as 'symbolic reasoning,' 'cognitive architectures' (looking at you, ACT-R and Clarion), and 'cellular automata.'

Inside the box of Machine Learning, we find many statistical methods. Anyone who has taken statistics at university has probably had to estimate the weight of a mouse from its length. This box holds prediction methods like linear regression, classification methods like support vector machines, and clustering methods like k-means. The single-layer neural network lives here, too (by virtue of not being 'deep').
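For the curious, here is roughly what that mouse exercise looks like in code: a minimal sketch of ordinary least-squares linear regression in plain Python. The lengths and weights are invented for illustration, and the function names are my own.

```python
# Hypothetical toy data (invented for illustration): mouse length in cm, weight in g.
lengths = [7.0, 7.5, 8.0, 8.5, 9.0, 9.5]
weights = [18.0, 20.5, 22.0, 24.5, 26.0, 28.5]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y divided by the variation of x gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(lengths, weights)


def predict_weight(length_cm):
    """Predict weight (g) for a given length (cm) using the fitted line."""
    return slope * length_cm + intercept


print(f"weight = {slope:.2f} * length + {intercept:.2f}")
print(f"8.2 cm mouse -> {predict_weight(8.2):.1f} g")
```

The 'learning' here is nothing more than picking the slope and intercept that minimize the squared prediction error; after that, you can plug in any length.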

So what is 'Deep Learning' then? Out of all machine learning methods, neural networks proved the most flexible and versatile. Yet, a single layer of neurons often fell short of making precise predictions. Researchers found that stacking many layers of neurons one after another produced more accurate predictions, and that this approach generalized very well.
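Why does stacking help? One classic illustration is XOR (output 1 when exactly one of two inputs is 1). A single neuron can only draw one straight line through its inputs, and no single line separates XOR's outputs; add one hidden layer and the problem becomes easy. The sketch below uses hand-picked weights, an assumption made for clarity; real networks learn their weights from data.

```python
def step(z):
    """Threshold activation: fire (1) if the weighted input is positive."""
    return 1 if z > 0 else 0


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus bias, then threshold."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def two_layer_xor(a, b):
    # Hidden layer (hand-picked weights): one neuron computes OR, the other AND.
    h_or = neuron([a, b], [1, 1], -0.5)
    h_and = neuron([a, b], [1, 1], -1.5)
    # Output layer: "OR but not AND" is exactly XOR.
    return neuron([h_or, h_and], [1, -1], -0.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", two_layer_xor(a, b))

# A coarse grid search (an illustration, not a proof): no single neuron
# with weights and bias on this grid reproduces XOR on all four inputs.
cases = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
grid = [x / 2 for x in range(-6, 7)]  # -3.0 to 3.0 in steps of 0.5
single_layer_solves = any(
    all(neuron(inp, [w1, w2], bias) == target for inp, target in cases)
    for w1 in grid
    for w2 in grid
    for bias in grid
)
print("single neuron found:", single_layer_solves)
```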

It generalized so well that it could be used for many practical purposes, including computer vision (CV) and natural language processing (NLP), and it even powered deep reinforcement learning (RL), used in many autonomous decision-making applications such as playing chess or Go. The method has one major drawback: training these neural network prediction models takes a long time and a lot of data.

The speed part of that limitation was largely overcome in 2017. In the paper 'Attention Is All You Need,' researchers from Google proposed a neural network architecture called the 'transformer,' which looks at all the words of a text at once instead of one after another, so training could be parallelized across many processors. Imagine being able to read 100 books simultaneously. If you could do that, sitting through high school would have taken you much less time.
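Here is the core operation of the transformer, scaled dot-product attention, softmax(QKᵀ/√d_k)·V, as a minimal NumPy sketch. It leaves out the learned projections, multiple heads, and masking of a full transformer, and the random inputs are placeholders. The point to notice is that one matrix product scores every word against every other word, so no position has to wait for the one before it, which is what lets the work be spread across parallel hardware.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(Q, K, V):
    """Scaled dot-product attention for every position at once.

    Q, K: (seq_len, d_k); V: (seq_len, d_v). One matrix product compares
    every position with every other position; there is no left-to-right loop.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (seq_len, seq_len) similarity scores
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights


# Placeholder inputs: 5 "words", each represented by an 8-dimensional vector.
rng = np.random.default_rng(0)
seq_len, d_k, d_v = 5, 8, 8
Q = rng.normal(size=(seq_len, d_k))
K = rng.normal(size=(seq_len, d_k))
V = rng.normal(size=(seq_len, d_v))

out, weights = attention(Q, K, V)

# The same result computed one position at a time, for comparison.
out_loop = np.stack([attention(Q[i:i + 1], K, V)[0][0] for i in range(seq_len)])
print(np.allclose(out, out_loop))
```

The final comparison recomputes each position on its own and gets the same answer, confirming that no position depends on another position's output.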

This breakthrough in training neural nets made it feasible to build much larger networks, around 100 layers deep with thousands to tens of thousands of neurons per layer (billions of parameters in total), and to train them on even larger datasets of up to trillions of tokens (words and word fragments).
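Where do 'billions of parameters' come from? A common rule of thumb for a standard transformer layer is about 12·d² weights, where d is the layer width: roughly 4·d² in the attention projections and 8·d² in the feed-forward block. The sketch below applies that approximation to GPT-3's published shape (96 layers, width 12,288); it ignores embeddings, biases, and normalization, so treat it as a back-of-the-envelope estimate.

```python
def approx_transformer_params(n_layers: int, d_model: int) -> int:
    """Rough parameter count of a standard transformer's layers.

    Per layer: attention projections (Q, K, V, output) ~ 4 * d_model**2,
    feed-forward block with 4x expansion ~ 8 * d_model**2.
    Embeddings, biases and layer norms are ignored.
    """
    per_layer = 4 * d_model**2 + 8 * d_model**2
    return n_layers * per_layer


# Published GPT-3 (175B) shape: 96 layers, width 12,288.
n = approx_transformer_params(n_layers=96, d_model=12_288)
print(f"{n:,} parameters ≈ {n / 1e9:.0f} billion")  # 173,946,175,488 parameters ≈ 174 billion
```

That lands within about 1% of the 175 billion reported for GPT-3. Most of the count comes from the d² weight matrices repeated in every layer, which is why a width of about twelve thousand is enough to reach the billions.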

But where would you find such a dataset? Well, the internet is pretty large... so that is what OpenAI used. They trained a transformer neural network on an enormous amount of text collected from the internet. This yielded a machine that, given an unfinished sentence, could predict the next word: a Large (billions of parameters) Language Model (predicting language usage).
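An LLM learns its next-word predictions with a giant transformer, but the task itself can be shown with a far simpler stand-in: count which word follows which in a tiny corpus and predict the most frequent follower. This is a bigram model, not how GPT works, and the corpus below is invented for illustration.

```python
from collections import Counter, defaultdict

# Tiny invented corpus standing in for "the internet".
corpus = (
    "the cat sat on the mat . "
    "the cat ate the fish . "
    "the dog sat on the rug ."
)


def train_bigrams(text):
    """Count, for every word, how often each other word follows it."""
    words = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, sentence):
    """Predict the next word from the last word of an unfinished sentence."""
    last = sentence.split()[-1]
    if last not in follows:
        return None  # never seen this word, so no prediction
    return follows[last].most_common(1)[0][0]


model = train_bigrams(corpus)
print(predict_next(model, "the dog sat on"))  # the
print(predict_next(model, "yesterday the"))   # cat
```

A real LLM conditions on the whole preceding text rather than just the last word, which is exactly what attention makes affordable at scale.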

Nice, but we don't want a next-word predictor. We want an assistant that does our work for us. With further training (fine-tuning), OpenAI turned it into one and released it into the world as ChatGPT.

This came as such a shock because AI applications had mostly been limited to numbers and quantitative analysis. As a result, they were mostly useful to people who are good with math and have access to large numerical datasets. That is a large market, but not an all-encompassing one.

Yet everyone uses language, everywhere, every day. Language is the operating system of humanity, and we have now made it compatible with the operating system of computers. We can tell them what to do with hardly any friction. That IS a big deal, and it opens up a wealth of possibilities and applications for everyone.

Especially when we teach them to use tools like calculators.
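What does 'using a calculator' look like in practice? Roughly: the model writes a request instead of guessing the answer, and ordinary code carries it out. The sketch below assumes a made-up convention, [calc: expression], that a model might be trained to emit; real products each define their own tool-calling format and usually feed the result back to the model rather than pasting it into the text as done here.

```python
import ast
import operator
import re

# Hypothetical convention: the model writes [calc: <expression>] when it wants the calculator.
CALC_PATTERN = re.compile(r"\[calc:\s*([^\]]+)\]")

# Only these arithmetic operators are allowed.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def safe_eval(expr):
    """Evaluate plain arithmetic (+, -, *, /, parentheses) without running arbitrary code."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression: {expr!r}")

    return walk(ast.parse(expr.strip(), mode="eval"))


def run_tools(model_output):
    """Replace every calculator request in the model's text with the computed result."""
    return CALC_PATTERN.sub(lambda m: str(safe_eval(m.group(1))), model_output)


# Pretend this text came back from a language model.
reply = "A crate holds 12 boxes of 37 pens, so that is [calc: 12 * 37] pens."
print(run_tools(reply))  # A crate holds 12 boxes of 37 pens, so that is 444 pens.
```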
