
How AI Works Level 1 by Brilliant


Intro to Language Models

  • The corpus determines the language model’s vocabulary and what words it can generate.
    • All language models store probabilities for which word might come next, given the preceding words. They calculate these probabilities from the sequences of words in the corpus.
    • If you give a model a word it never saw during training, it can’t suggest what comes next, because it stores no information about that word. A model only knows the words in its corpus, large or small (see the sketch after this list).
  • Chatbots, predictive text, and virtual assistants all use language models.
    • Each of these models is built differently, but they all turn language into numbers and then back into language.
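
To make the corpus-and-vocabulary idea concrete, here is a minimal Python sketch (my own illustration, not code from the course): it builds a vocabulary from a toy corpus and counts which words follow which, which is essentially all the information this kind of model stores.

```python
from collections import defaultdict

# A toy corpus; the model's vocabulary is exactly the words seen here.
corpus = "the cat sat on the mat and the cat slept".split()

vocabulary = set(corpus)

# For each word, count which words follow it in the corpus.
next_word_counts = defaultdict(lambda: defaultdict(int))
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

print(sorted(vocabulary))
# ['and', 'cat', 'mat', 'on', 'sat', 'slept', 'the']

print(dict(next_word_counts["the"]))
# {'cat': 2, 'mat': 1} -- "the" is followed by "cat" twice and "mat" once

print(dict(next_word_counts["dog"]))
# {} -- "dog" never appeared in the corpus, so the model knows nothing about it
```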

Predicting the Next Word

  • Language models predict the next word by assigning a probability to each possible next word.
    • The context is the word (or phrase) that the model uses to make predictions.
  • Bigram models answer the question “given this first word, what is a likely next word?”
    • The bigram model considers only the last word of the prompt when making its prediction; it ignores everything before it (see the sketch after this list).
    • The Markov assumption says that the probability of a future word depends only on the current word.
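
In other words, the Markov assumption says P(next word | all previous words) ≈ P(next word | last word). Here is a minimal bigram sketch in Python (my own illustration; the corpus and the predict_next helper are made up for the example). Note how only the final word of the prompt is used:

```python
from collections import Counter, defaultdict

corpus = "i like green eggs and ham and i like cheese".split()

# Count how often each word follows each single word of context.
bigram_counts = defaultdict(Counter)
for context, word in zip(corpus, corpus[1:]):
    bigram_counts[context][word] += 1

def predict_next(prompt):
    """Predict the next word from the LAST word of the prompt only,
    per the Markov assumption; the rest of the prompt is ignored."""
    last_word = prompt.split()[-1]
    followers = bigram_counts[last_word]
    if not followers:
        return None  # the word never appeared in the corpus
    return followers.most_common(1)[0][0]

print(predict_next("green eggs and ham and i"))
# 'like' -- in the corpus, "i" is followed by "like" twice
```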

Larger Context Windows

  • N-gram models use n-1 words of context to predict the next word.
    • A trigram model uses two words of context to predict one word.
    • A 6-gram model needs five words of context that have appeared in exactly that order in the corpus.
      • The predicted word is almost guaranteed to make sense, but the model’s predictions are limited, since it has to match a five-word sequence (see the sketch after this list).
    • How large or small n should be depends on the task.
      • Fixing a word’s spelling or grammar requires a smaller context than determining that a string of words has been plagiarized from a source.
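
To show how the context window grows with n, here is a short Python sketch (my own illustration; build_ngram_model is a made-up helper): a trigram model keys its counts on two-word contexts, while a 6-gram model needs an exact five-word match from the corpus.

```python
from collections import Counter, defaultdict

def build_ngram_model(corpus, n):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    model = defaultdict(Counter)
    for i in range(len(corpus) - n + 1):
        context = tuple(corpus[i : i + n - 1])
        model[context][corpus[i + n - 1]] += 1
    return model

corpus = "the quick brown fox jumps over the lazy dog".split()

trigram = build_ngram_model(corpus, 3)  # two words of context
sixgram = build_ngram_model(corpus, 6)  # five words of context

print(trigram[("the", "quick")])
# Counter({'brown': 1})

print(sixgram[("the", "quick", "brown", "fox", "jumps")])
# Counter({'over': 1}) -- this exact five-word sequence appeared in the corpus

print(sixgram[("a", "quick", "brown", "fox", "jumps")])
# Counter() -- never seen in that order, so no prediction is possible
```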

Calculating Word Probabilities

  • An n-gram model’s prediction is the most likely next word for the given context.
    • To figure out which word that is, we have to translate words into numbers: probabilities.
  • The n-gram algorithm first goes through the entire corpus and, for each (n-1)-word context, counts how many times (the frequency) each word follows it.
    • For each context, divide each word’s count by the total number of times the context appears, giving the percentage of the time that word follows the context.
    • That percentage is the probability the model uses to decide which word to predict after the context (see the sketch after this list).
  • N-gram models cannot connect pieces of information that are separated by a lot of words.
    • N-gram models are simple and good at predicting the next word, but not at more complicated tasks.
    • Because of this limitation, they have been replaced by more complex models, such as neural networks, for many tasks.
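
Putting the counting and percentage steps together, here is a minimal Python sketch of the frequency-to-probability calculation described above (my own illustration; ngram_probabilities is a made-up helper):

```python
from collections import Counter, defaultdict

def ngram_probabilities(corpus, n):
    """For each (n-1)-word context, compute the fraction of the time
    each word follows it in the corpus."""
    counts = defaultdict(Counter)
    for i in range(len(corpus) - n + 1):
        context = tuple(corpus[i : i + n - 1])
        counts[context][corpus[i + n - 1]] += 1

    probabilities = {}
    for context, followers in counts.items():
        total = sum(followers.values())
        probabilities[context] = {word: c / total for word, c in followers.items()}
    return probabilities

corpus = "the cat sat on the mat and the cat ran".split()
probs = ngram_probabilities(corpus, 2)  # bigram: one word of context

print(probs[("the",)])
# {'cat': 0.666..., 'mat': 0.333...} -- "cat" follows "the" 2 out of 3 times

# The prediction is the word with the highest probability after the context.
print(max(probs[("the",)], key=probs[("the",)].get))  # 'cat'
```

Because each context is a fixed-length tuple used as a dictionary key, nothing outside that window can influence the prediction, which is exactly the limitation noted above.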
