Garden Path Sentences II
This was my second (yet to fail) attempt at studying how LLMs process Garden Path Sentences. You can read the previous post for the first failed attempt.
After my first attempt failed, I decided to go back to the drawing board and look for something easier. Instead of going for “breaking the translation”, which is quite a complex, high-level goal, I focused on smaller goals that are easier to compute.
Simple introduction to LLMs and autoregressivity
An LLM is trained to read and write using only a specific vocabulary of tokens: text is converted into an ordered list of tokens (which can be converted back into the original text) and fed into the LLM. The input is then transformed by the internal layers of the LLM, and the output is a list of logits: values that represent scores, one for each possible token in the dictionary.
Even though both input and output are lists of numbers, they are sort of perpendicular: the input is “horizontal”, a transformation of the input text, while the output list is “vertical”, a representation of how strongly the LLM thinks each possible token of the dictionary should be used as the next one in the input list.
The score, though, is a number that can take any value. To choose a single token from the whole dictionary, following what the LLM suggests is the right choice, we might want to switch from a list of scores to a list of probabilities.
To do so, we use softmax: for each logit we compute the exponential (e^logit), and then we divide it by the sum of the exponentials of all the logits.
The resulting values have 2 very important features: they all are between 0 and 1, and they all sum up to 1. This makes them a great proxy for a probability.
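The softmax step described above can be sketched in a few lines of plain Python. This is only a minimal illustration: the logits are made-up numbers for a three-token dictionary, and subtracting the maximum before exponentiating is a common numerical-stability trick that I am adding here, not something the definition requires (it leaves the result unchanged).

```python
import math

def softmax(logits):
    """Turn a list of raw scores into values between 0 and 1 that sum to 1."""
    # Subtracting the max keeps exp() from overflowing; the result is the same.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for a dictionary of three tokens (illustration only).
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
print(probs, sum(probs))
```

The ordering of the scores is preserved: the token with the highest logit also gets the highest probability.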
Now that we have a list of probabilities, we can simply choose the most probable token according to the LLM and output it. This output is then appended to the input list, and the new input list is fed to the LLM again (unless the chosen token is a special one that represents the end of the answer). This makes an LLM autoregressive: its output becomes part of its input.
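Here is a rough sketch of that loop. Everything in it is hypothetical: `toy_model` is a stand-in I made up that returns one logit per token of a four-token dictionary, and token 3 plays the role of the special end-of-answer token. A real LLM would replace `toy_model`, but the loop around it would look similar when always picking the most probable token.

```python
EOS = 3  # hypothetical id of the special "end of the answer" token

def toy_model(tokens):
    """Made-up stand-in for an LLM: returns one logit per dictionary entry."""
    vocab_size = 4
    # This toy simply favors "last token + 1" as the next token.
    favored = (tokens[-1] + 1) % vocab_size
    return [5.0 if i == favored else 0.0 for i in range(vocab_size)]

def generate(tokens, max_new_tokens=10):
    tokens = list(tokens)  # do not modify the caller's list
    for _ in range(max_new_tokens):
        logits = toy_model(tokens)
        # Softmax preserves ordering, so the highest logit is also
        # the most probable token.
        next_token = max(range(len(logits)), key=logits.__getitem__)
        if next_token == EOS:
            break
        tokens.append(next_token)  # the output becomes part of the input
    return tokens

result = generate([0])
print(result)
```

Starting from `[0]`, the toy model proposes 1, then 2, then the end token, so the loop stops with `[0, 1, 2]`.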
Surprisal
Now that we know how an LLM works with probabilities, we can use the probabilities to compute how much an LLM “expects” a specific word, given a context. And we compute it with a different proxy, the log-probability, which is exactly what it seems: we take the log of each probability after the softmax.
We do this for two reasons: to avoid operating on probabilities that are dangerously close to 0, since floating-point arithmetic loses precision and can round such small numbers down to zero; and to make it easier to compute the log-probability of a sequence of tokens: summing the logs of several probabilities is easier than multiplying the probabilities themselves.
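A small experiment shows both reasons at once. The probabilities below are made up (100 tokens, each with probability 1e-5), chosen only so that their product is too small for a 64-bit float.

```python
import math

# Made-up per-token probabilities for a 100-token sequence.
probs = [1e-5] * 100

# Multiplying them directly: the true value is 1e-500, which is
# smaller than any positive 64-bit float, so the result becomes 0.0.
product = 1.0
for p in probs:
    product *= p

# Summing the logs instead stays comfortably within range.
log_prob = sum(math.log(p) for p in probs)

print(product)   # the product has underflowed
print(log_prob)  # a finite negative number
```

The sum of the log-probabilities carries the same information as the product, without the risk of collapsing to zero.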
I’ll use surprisal: it’s just the negative of the log-probability. I expect that when an LLM parses a garden path sentence (GPS), it will reach a point where it shows high surprisal, and that’s exactly where the GPS gets strange. I want to find which parts of the LLM are responsible for this.
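To make the definition concrete, here is a minimal sketch of per-token surprisal computed from logits. The logits and token ids are invented for illustration, and the helper names `log_softmax` and `surprisals` are my own, not taken from any library. Surprisal is measured in nats here because I use the natural logarithm.

```python
import math

def log_softmax(logits):
    """Log of the softmax probabilities, computed in a numerically stable way."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_total for x in logits]

def surprisals(logits_per_step, token_ids):
    """Surprisal of each observed token, given the logits predicted for its position."""
    return [
        -log_softmax(logits)[tok]
        for logits, tok in zip(logits_per_step, token_ids)
    ]

# Made-up example: three-token dictionary, two positions.
# At both positions the model strongly favors token 0.
steps = [[4.0, 0.0, 0.0], [4.0, 0.0, 0.0]]
observed = [0, 2]  # the first token is expected, the second is not
s = surprisals(steps, observed)
print(s)
```

The second token gets a much higher surprisal than the first, which is the kind of spike I expect to see at the point where a garden path sentence forces a reinterpretation.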