Explainable AI: Language Models
Hold your large language models accountable by learning how to “explain” their predictions
Introduction
Just like a coin, explainability in AI has two faces: one it shows to the developers (who actually build the models) and the other to the users (the end customers). The former, intrinsic explainability (IE), is a technical indicator that explains the inner workings of the model to its builder. The latter, extrinsic explainability (EE), is proof to the customers that the model’s predictions can be trusted. While IE is required for any reasonable model improvement, EE is needed for factual confirmation: a layperson who acts on the model’s prediction needs to know why the model is suggesting something. This article provides a brief introduction to both types of explainability and popular techniques used to infer them, with a focus on large language models.
Large Language Models
As the name suggests, language models (LMs) try to model a language by creating a probability distribution over sequences of tokens (which can be words). This can be applied to the text generation task: once the model captures these probabilities well enough, it can start predicting the next word given some contextual words (a prompt). This idea forms the basis…
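To make this concrete, here is a minimal sketch of how a pretrained LM turns a prompt into a probability distribution over the next token. It assumes the Hugging Face transformers library and the GPT-2 checkpoint, neither of which is prescribed by this article; they are used purely for illustration.

```python
# Minimal sketch: next-token probabilities from a pretrained language model.
# Assumes Hugging Face `transformers` and the GPT-2 checkpoint (illustrative choice).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# Softmax over the last position gives the distribution for the next token.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)

# Inspect the five most likely continuations.
top_probs, top_ids = next_token_probs.topk(5)
for prob, token_id in zip(top_probs, top_ids):
    print(f"{tokenizer.decode(token_id)!r}: {prob.item():.3f}")
```

Sampling or greedily picking from this distribution, appending the chosen token to the prompt, and repeating is what turns such a model into a text generator.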