
How AI Works Level 1 by Brilliant


Intro to Language Models

  • The corpus determines the language model’s vocabulary and what words it can generate.
    • All language models store probabilities for which word might come next, given the preceding words. They calculate these probabilities from the sequences of words in the corpus.
    • If you give a model a word it never saw during training, it can’t suggest what comes next, because it stores no information about that word. A model only knows the words in its corpus, large or small (see the sketch after this list).
  • Chatbots, predictive text, and virtual assistants all use language models.
    • Each of these models is built differently, but they all turn language into numbers and then back into language.
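
To make the corpus-and-vocabulary idea concrete, here is a minimal Python sketch (my own illustration, not code from the course): it builds a vocabulary from a toy corpus and counts which words follow which, which is essentially all the information this kind of model stores.

```python
from collections import defaultdict

# A toy corpus; the model's vocabulary is exactly the words seen here.
corpus = "the cat sat on the mat and the cat slept".split()

vocabulary = set(corpus)

# For each word, count which words follow it in the corpus.
next_word_counts = defaultdict(lambda: defaultdict(int))
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

print(sorted(vocabulary))
# ['and', 'cat', 'mat', 'on', 'sat', 'slept', 'the']

print(dict(next_word_counts["the"]))
# {'cat': 2, 'mat': 1} -- "the" is followed by "cat" twice and "mat" once

print(dict(next_word_counts["dog"]))
# {} -- "dog" never appeared in the corpus, so the model knows nothing about it
```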

Predicting the Next Word

  • Language models predict the next word by assigning a probability to each possible next word.
    • The context is the word (or phrase) that the model uses to make predictions.
  • Bigram models answer the question “given this first word, what is a likely next word?”
    • The bigram model considers only the last word of the prompt when making its prediction; it ignores everything before it (see the sketch after this list).
    • The Markov assumption says that the probability of a future word depends only on the current word.
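
In other words, the Markov assumption says P(next word | all previous words) ≈ P(next word | last word). Here is a minimal bigram sketch in Python (my own illustration; the corpus and the predict_next helper are made up for the example). Note how only the final word of the prompt is used:

```python
from collections import Counter, defaultdict

corpus = "i like green eggs and ham and i like cheese".split()

# Count how often each word follows each single word of context.
bigram_counts = defaultdict(Counter)
for context, word in zip(corpus, corpus[1:]):
    bigram_counts[context][word] += 1

def predict_next(prompt):
    """Predict the next word from the LAST word of the prompt only,
    per the Markov assumption; the rest of the prompt is ignored."""
    last_word = prompt.split()[-1]
    followers = bigram_counts[last_word]
    if not followers:
        return None  # the word never appeared in the corpus
    return followers.most_common(1)[0][0]

print(predict_next("green eggs and ham and i"))
# 'like' -- in the corpus, "i" is followed by "like" twice
```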

Larger Context Windows

  • N-gram models use n-1 words of context to predict the next word.
    • A trigram model uses two words of context to predict one word.
    • A 6-gram model needs five words of context that have appeared in exactly that order in the corpus.
      • The predicted word is almost guaranteed to make sense, but the model’s predictions are limited, since it has to match a five-word sequence (see the sketch after this list).
    • How large or small n should be depends on the task.
      • Fixing a word’s spelling or grammar requires a smaller context than determining that a string of words has been plagiarized from a source.
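
To show how the context window grows with n, here is a short Python sketch (my own illustration; build_ngram_model is a made-up helper): a trigram model keys its counts on two-word contexts, while a 6-gram model needs an exact five-word match from the corpus.

```python
from collections import Counter, defaultdict

def build_ngram_model(corpus, n):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    model = defaultdict(Counter)
    for i in range(len(corpus) - n + 1):
        context = tuple(corpus[i : i + n - 1])
        model[context][corpus[i + n - 1]] += 1
    return model

corpus = "the quick brown fox jumps over the lazy dog".split()

trigram = build_ngram_model(corpus, 3)  # two words of context
sixgram = build_ngram_model(corpus, 6)  # five words of context

print(trigram[("the", "quick")])
# Counter({'brown': 1})

print(sixgram[("the", "quick", "brown", "fox", "jumps")])
# Counter({'over': 1}) -- this exact five-word sequence appeared in the corpus

print(sixgram[("a", "quick", "brown", "fox", "jumps")])
# Counter() -- never seen in that order, so no prediction is possible
```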

Calculating Word Probabilities

  • An n-gram model’s prediction is the most likely next word for the given context.
    • To figure out which word that is, we have to translate words into numbers: probabilities.
  • The n-gram algorithm first goes through the entire corpus and, for each (n-1)-word context, counts how many times (the frequency) each word follows it.
    • For each context, divide each word’s count by the total number of times the context appears, giving the percentage of the time that word follows the context.
    • That percentage is the probability the model uses to decide which word to predict after the context (see the sketch after this list).
  • N-gram models cannot connect pieces of information that are separated by a lot of words.
    • N-gram models are simple and good at predicting the next word, but not at more complicated tasks.
    • Because of this limitation, they have been replaced by more complex models, such as neural networks, for many tasks.
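
Putting the counting and percentage steps together, here is a minimal Python sketch of the frequency-to-probability calculation described above (my own illustration; ngram_probabilities is a made-up helper):

```python
from collections import Counter, defaultdict

def ngram_probabilities(corpus, n):
    """For each (n-1)-word context, compute the fraction of the time
    each word follows it in the corpus."""
    counts = defaultdict(Counter)
    for i in range(len(corpus) - n + 1):
        context = tuple(corpus[i : i + n - 1])
        counts[context][corpus[i + n - 1]] += 1

    probabilities = {}
    for context, followers in counts.items():
        total = sum(followers.values())
        probabilities[context] = {word: c / total for word, c in followers.items()}
    return probabilities

corpus = "the cat sat on the mat and the cat ran".split()
probs = ngram_probabilities(corpus, 2)  # bigram: one word of context

print(probs[("the",)])
# {'cat': 0.666..., 'mat': 0.333...} -- "cat" follows "the" 2 out of 3 times

# The prediction is the word with the highest probability after the context.
print(max(probs[("the",)], key=probs[("the",)].get))  # 'cat'
```

Because each context is a fixed-length tuple used as a dictionary key, nothing outside that window can influence the prediction, which is exactly the limitation noted above.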
