What Does AI Really Do?
What you really need to know as a user, not developer, of modern AI systems.

Your Level
Imagine we ask a person off the street how a TV works.
If we're lucky, we might find a sophisticated engineer. He knows about electrical signals. He knows how OLEDs are made. He could take a job at Samsung and build your next TV. Let's call this "level three".
Or maybe we find a three-year-old. TV is where you find Cocomelon. He's not sure if the characters hear him or not, but he acts like they do. Let's call that "level one".
But many adults are in between. It's got a little grid. It's mixing colors. It updates ever so many times per second, and of course, you've got to plug something into it. You can't fix a dead television, but you know what it can do. Let's call this "level two".
Level two is normal. You can't explain radio waves, but you know you'll get zero bars in a tunnel. You can't raise a cocoa tree, but you can mix hot chocolate.
When it comes to AI, the average user is on level one.

If you're on level one, come. I'll bring you to level two.
Writing Step-By-Step
Let's watch ChatGPT answer a simple trivia question.
If I ask you, "tell me about the Marathon crater", your brain will search its memory for any mention of a Marathon crater. When you find nothing, you'll have an intention to answer in some way. Then, you'll form the words, plan your tone, your body language. Maybe you'll say, "I don't know," or "I've never heard of it."
These are true statements about your own mind. But they're also the only right answer; there's no such thing as the Marathon crater.
Overview
When you send your text, the computer on the other end weaves it together with other information and hands it to a "model" (the AI core), which writes text, piece by piece, until it decides to stop. The resulting text is sent back for you to read.
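If you're a little technical, here's a minimal sketch of that round trip using OpenAI's Python library. The model name and question are just examples, and the real pipeline behind ChatGPT is much more elaborate:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from your environment
response = client.chat.completions.create(
    model="gpt-4o",  # the model name here is just an example
    messages=[
        {"role": "user", "content": "Tell me about the Marathon crater."}
    ],
)
print(response.choices[0].message.content)
```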
Let's go deeper.
Tokens
The model can't see your text directly. It's a normal, though very complex, function, and it needs numbers. This will sound niche and unimportant, but it's not.
The computer's going to break your text into little fragments called "tokens".

Each of those tokens is, for the AI, an atom. Indivisible. The word "What", for example, is a single token with its own ID (4827), and that question mark has ID 30. Individual letters are no longer visible, and all the tokens the model knows form a big menu.
Technical readers will notice I skipped embeddings. Those are neat, but we don't need to go into that here. Instead, we'll focus on this big menu aspect.
The typical chat model has a fixed, limited menu of these tokens, including common words, punctuation and fragments of words from popular languages. If the model can write "fox", "sąsiad" ("neighbor") or "привет" ("hello"), it either has these words on its big menu or it can string them together from smaller parts.
How big? Very big. The menu for OpenAI's GPT-4o has about 200,000 different tokens, while Llama3 has 128,256 different tokens.
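You can watch this splitting happen yourself. Here's a minimal sketch using OpenAI's open-source tiktoken library; "o200k_base" is the encoding GPT-4o uses, and the sample sentence is mine:

```python
import tiktoken

enc = tiktoken.get_encoding("o200k_base")
print(enc.n_vocab)  # the size of the menu: roughly 200,000

token_ids = enc.encode("What does AI really do?")
print(token_ids)  # one integer ID per token

# Turn each ID back into its fragment of text.
for token_id in token_ids:
    print(token_id, repr(enc.decode([token_id])))
```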
Let's have a look at a small sample of Llama3's menu.
| Text | Token ID |
|---|---|
| staple | 50056 |
| /INFO | 50057 |
| supernatural | 50058 |
| steak | 50059 |
| zzle | 50061 |
| Nepal | 50064 |
A big menu with specific concepts, like "steak", lets the AI work with single atoms of text that have rich, specific meaning. In this way, it's more like the expansive system of Chinese characters, but multilingual and soundless.
So, what's it going to do with the menu?
Browsing The Menu
At each step, it picks a token off the menu, building up the text one piece at a time. But how?
First, the computer will put odds on every token on the menu. Then, it rolls dice. Let's watch.
I've chosen a small model, Qwen 2.5, that I can run on my own computer. Let's give a short text and see how it scores the menu:

We can see here "pizza" is given 6% odds. "Noodles" is lower, at 5%. You can also see that some words on the menu aren't foods at all, but could start a longer phrase: "Chinese", "ice" and "hot" could be followed later by words like "cuisine", "cream" and "chocolate". Words like "socks", "stars" and "nitrogen" have been scored as well, but they're too low to make it onto my little chart.
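If you'd like to reproduce a chart like this, here's a minimal sketch with the Hugging Face transformers library. I'm assuming the 0.5B variant of Qwen 2.5, and the prompt is my own example:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assuming the 0.5B variant; any small Qwen 2.5 model works.
model_name = "Qwen/Qwen2.5-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "My favorite food is"  # the prompt is my own example
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # raw scores, every step

# Scores for the *next* token only, converted into odds.
probs = torch.softmax(logits[0, -1], dim=-1)

# The ten most likely items on the menu.
top = torch.topk(probs, 10)
for p, token_id in zip(top.values.tolist(), top.indices.tolist()):
    print(f"{tokenizer.decode([token_id])!r:>14}  {p:.1%}")
```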

This time, we've given the model a famous song lyric. The model has put a 98.5% probability on the word "lie". This time, the rest of the menu scores so low, we can't even see the bars.

For "two plus two is", the score on "four" is only 72.5%. What's the other 27.5%? A long tail of possibilities, including words like "obviously" and even other numbers, like "three".
Decisions
You may notice I'm getting bar charts, not decisions.
That's correct. The model does not make a "decision" the way you think of it. After it scores the menu, it's not going to pause and think over the shortlist. The computer will roll dice, and the choice is the choice. If it put 72.5% on "four", there's a 27.5% chance it will go with something other than "four".
That's why, in ChatGPT, you can always click that little retry button and get a new text. It's not entirely random, but it's partly random.
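In code, the dice roll is one line. Continuing the earlier sketch, where `probs` holds the odds for every token on the menu:

```python
# `probs` is the distribution over the whole menu from above.
next_id = torch.multinomial(probs, num_samples=1)  # the dice roll
print(tokenizer.decode(next_id.tolist()))

# Run it twice and you may get two different tokens.
# The "retry" button is nothing more than a re-roll.
```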
Prompting
So, we've got a model that knows Sir Mix-a-lot, can't do math and may or may not claim to like pizza.
If the question was "what would a person say?", it's great. Probably, they'll say pizza. Maybe they'll say sushi. This may resemble an actual survey.
But if the question was "what does the model think?", it's absurd. ChatGPT does not eat pizza, and if it did, a dice roll isn't an opinion.
If we're building a chat AI product, we'd rather it "talk like a robot". This is easier said than done, but I'll show you the basic idea. Since the model can in fact take a lot of text into account, let's enrich the question like so:

When the text indicates the character of a robot, suddenly the strong odds are on "none". This is prompting: adding special text to provoke a certain response from the model.
Notice pizza is still on the menu, with a score of 2.3%. Sushi has 1%. Prompting isn't enough to fully steer a small model. With millions of users, a 2.3% chance means "constantly, daily, for thousands of users". More sophisticated work is required to get a product to work as well as ChatGPT, but it's always probabilistic.
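You can see the effect yourself by wrapping the question in character-setting text and re-scoring the menu. A sketch, continuing the earlier Qwen example; the wording of the wrapper is mine, not the exact prompt from the chart:

```python
# Same question, now wrapped in text that establishes a robot
# character. The exact wording is my own example.
prompt = (
    "The following is a conversation with an AI assistant, "
    "a computer program with no body and no senses.\n"
    "User: What is your favorite food?\n"
    "Assistant: My favorite food is"
)
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits[0, -1], dim=-1)

top = torch.topk(probs, 5)
for p, token_id in zip(top.values.tolist(), top.indices.tolist()):
    print(f"{tokenizer.decode([token_id])!r:>14}  {p:.1%}")
```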
Skill Imbalance
Earlier, I showed a model's strong, clear confidence on the lyrics of Sir Mix-a-lot. 98.5% odds on "lie" for the next word.
But when I prompt it with "two plus two", we got only 72.5% odds on "four".
That's nuts. Vastly more people can answer "two plus two". It's not enough that Baby Got Back was a bit of a meme. Many people don't even have television.
But what shows up more on the page?
Would you go on a forum and ask "two plus two"? Conversely, how often have you looked up song lyrics? How many websites come up when you do?
The models are trained to imitate human writing, and:
- People decide what to talk about.
- They don't like to say boring, obvious things.
- They don't like to say things that make them look bad.
- They don't overshare their honest, private lives.
But it gets a little weirder.
- The training text is made of text we have.
- It is not made of text we don't have.
Of course, right? But it matters, because it's the pure inverse of the world you live in. Consider:
- You don't know what's behind you, unless you look.
- You don't know 129 divided by 7, unless you calculate it.
- You don't know how someone will react, until you find out.
- You don't even see what's in front of you all at once; your eyes dart around constantly.
In life, everything is concealed from you. You don't learn to have all the answers. You learn to know your limits, direct your attention and choose your battles.
Back to Marathon
It might already be obvious why the model made up an imaginary crater, but let's walk through it.
First, we're building a "chat bot", so we'll weave together instructions with a screenplay-style history of your conversation.
system: You're a smart and helpful AI assistant.
user: What is the Marathon Crater?
assistant:
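In code, this weaving is mechanical. Here's a minimal sketch, reusing the Qwen tokenizer from earlier; chat models ship with a template for exactly this purpose:

```python
# The tokenizer's built-in chat template does the weaving.
messages = [
    {"role": "system", "content": "You're a smart and helpful AI assistant."},
    {"role": "user", "content": "What is the Marathon Crater?"},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,              # show the string, not token IDs
    add_generation_prompt=True,  # leave the assistant's turn open
)
print(text)
```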
This text is handed off to the model for completion. Step one:

Think about what we put in the text.
- The assistant is smart.
- The assistant is helpful.
- The user asked a question.
Probably, the smart and helpful assistant will answer the question. And grammatically, you might start with the definite article and a capital letter. "The".
As we step further, the model corners itself:

Maybe we need more grammar. Or maybe it's on Mars. Or Venus. But more probably Mars than Venus, because we simply know of more craters on Mars. We've got all those rovers.
"But wait," you might ask, "shouldn't it check if the crater exists?"
It's not checking anything; it's imitating the training text. And the training text never mentions the non-existence of the Marathon Crater. It doesn't exist, so it's not in the text at all!
Shouldn't it notice it's missing from the text?
In fact, no. It doesn't actually have the text. The model was created to imitate the text, but doesn't "have" it and isn't "reading" it.
In this real and practical sense, it's not self-aware. It can have a stereotype of a "robot" or an "AI assistant", and that's enough for the question of its favorite food, but it's not enough to keep real facts straight.
Defeating Marathon
So far, this would be of limited use. It can write a short story, like the famous Bottomless Pit Supervisor, and it can even write impressive code. But you can't ask it to gather information or plan.
Enter search.

If you can ask the model to answer a user's question, you can also ask it to:
- Generate search terms.
- Evaluate search results.
- Write a response with supplemental information.
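Here's a minimal sketch of that loop. Every name in it (`generate`, `web_search`) is a hypothetical stand-in, not a real product's API, but the shape is the same:

```python
# A hypothetical sketch of the search pattern described above.
def answer_with_search(question: str) -> str:
    # 1. Ask the model for search terms.
    query = generate(f"Write a web search query for: {question}")
    # 2. Run an ordinary search engine (no AI involved here).
    results = web_search(query)
    # 3. Ask the model to evaluate the results and write the answer.
    return generate(
        f"Question: {question}\n"
        f"Search results: {results}\n"
        "Answer the question using only the results above. "
        "If the results don't mention it, say so."
    )
```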
But why's there a little switch? Why can't it figure out if it needs to search or not?
Suppose I text the model, "The user's asking about the Marathon Crater. Do we need to look it up?" It can't really answer. It can put odds on it, based on its stereotype of an "AI assistant", but we're still rolling dice.
Furthermore, the model is enormous, and with millions of users, OpenAI would like to avoid running its full power on every question.
What You Need To Know
I've been a regular user of generative AI since picking up GPT-3 in 2021. Here are my tips for using it safely.
It Just Says Stuff
Numbers go in, numbers go out and dice are rolled.
If you click "retry", it rolls again.
Be The Top
When I, a programmer, let it type for me, I know instantly if it's on the right track or not. But I can't judge its medical advice. Can you?
Cheap Mode
It's possible to run a strong model in many steps with search and reasoning. It's also possible to run a small, cheap model and let it mash its face into the keyboard.
When you run the "free" plan, you get the latter.

You Are The Human
You suffered and struggled just to lift your head for the first time. It has no view of our world.
Less Respect
By all means, say "please" and "thank you", but it's not like us. It does not have opinions. So, never take what it says too seriously.
Everyone Knows
If I took your mom's phone, and started texting you with her name, could you tell?
Everyone has a way of talking. A way of writing. A mentality. A style. A voice.
Every day, ChatGPT's voice, style, mentality and quirks get a little more famous.
Dostoevsky does not sound like Hunter S. Thompson, and ChatGPT does not sound like you.
Smoke Bomb
When you buy expensive jewelry for someone you love, it shows investment. "Look at what I'm willing to spend."
When you let an AI write on your behalf, it shows the opposite.
Sometimes, that's exactly the message you'd like to send!
Sometimes, it isn't.
Concluding Thoughts
In all this, I ask you only one thing: Keep AI in its proper place.