Explainability - Technical Essentials
- Overview
- Key Dimensions of Explainability
- 1. Attention Visualization
- 2. Model Interpretability
- 3. Output Diversification
- 4. Error Analysis
- 5. Training Data Transparency
- Understanding Explainability in AI: EU AI Act, NIST, ISO42001
- Glossary of terms used
Overview
Transparency and accountability are fundamental for responsible AI model usage. To achieve this, all AI models, including traditional machine learning and generative AI, require explanations across multiple dimensions. While researchers may employ diverse approaches to explainability, our focus is on five key areas: attention visualization, model interpretability, output diversification, error analysis, and training data transparency. The Infosys Responsible AI toolkit is continually enriched by incorporating techniques and methods from ongoing research in these areas.
Key Dimensions of Explainability

1. Attention Visualization
A technique used in machine learning to understand how a model focuses on different parts of an input sequence when making a prediction. It is particularly useful for models that use attention mechanisms, such as transformers, which are widely used in natural language processing (NLP) and computer vision tasks.
Technique | Purpose
---|---
Token Importance Charts | To understand the relative importance of individual tokens (e.g., words, subwords).
Heat Map | To highlight the most important regions or features of an input that contribute to a model's prediction.
Saliency Map | To identify the most important regions of an image for a specific task (e.g., classification, object detection).
Superpixels | To group pixels in an image into perceptually meaningful regions (e.g., objects, textures).
Circos Plots | To represent relationships between entities in a circular format; particularly useful for visualizing complex data sets, such as genomic data, protein-protein interactions, and social networks.
Sankey Diagrams | To represent the flow of quantities through a system; particularly useful for understanding how input values are transformed into output values in a machine learning model.
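As a hedged illustration of the token-importance idea, the sketch below inspects raw attention weights from a transformer encoder. It assumes the Hugging Face transformers and PyTorch packages are available; the model name, layer choice, and head averaging are illustrative assumptions, not toolkit requirements.

```python
# Minimal sketch of token-level attention inspection, assuming a BERT-style encoder.
import torch
from transformers import AutoTokenizer, AutoModel

model_name = "bert-base-uncased"  # illustrative choice of encoder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)

text = "The movie was surprisingly good"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: one (batch, heads, seq, seq) tensor per layer.
# Average over heads in the last layer and look at attention from the [CLS] token.
last_layer = outputs.attentions[-1].mean(dim=1)[0]   # (seq, seq)
cls_attention = last_layer[0]                        # attention paid by [CLS] to each token
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for tok, score in zip(tokens, cls_attention.tolist()):
    print(f"{tok:>12s}  {score:.3f}")
```

In practice these per-token scores would feed a token importance chart or heat map rather than console output.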
2. Model Interpretability
Model interpretability aims at understanding how AI models arrive at their predictions. It involves breaking down the complex decision-making process of the model into human-understandable terms.
Feature Importance is a critical component of Model Interpretability, used to understand the relative importance of different features (variables) to an AI model's prediction. By quantifying the contribution of each feature, it helps to explain the model's decision-making process.
AI models require explanations of key features at both the overall model level and the individual instance level, termed global and local interpretability respectively. Brief descriptions of terms related to model interpretability are outlined below.
Global Interpretability
Understanding the overall behavior and decision-making process of a model across its entire input space. It provides insights into the general patterns and trends that the model has learned from the data. The following techniques help in providing Global explainability:
Permutation importance
Mean decrease impurity
Decision trees
Rule-based induction
Surrogate models
Self-reasoning techniques
Local Interpretability
The ability to understand how a machine learning model arrives at its predictions for specific instances. Unlike global interpretability, which focuses on understanding the overall behavior of the model, local interpretability provides explanations tailored to individual predictions. The following techniques help in providing Local explainability:
LIME
SHAP
Anchors
Counterfactual explanations
Self-reasoning techniques
List of Model Interpretability Techniques
Permutation Importance
a technique used to assess the importance of a feature by measuring the change in model performance when the feature's values are shuffled randomly.
Purpose: Feature ranking, Model understanding, Feature selection, Bias detection
Applicability: All Machine learning models
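A minimal permutation-importance sketch using scikit-learn's permutation_importance is shown below; the dataset and model are illustrative choices, not part of the toolkit itself.

```python
# Shuffle each feature on held-out data and measure the drop in model performance.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for name, mean_imp in sorted(zip(X.columns, result.importances_mean),
                             key=lambda t: t[1], reverse=True)[:5]:
    print(f"{name:<25s} {mean_imp:.4f}")
```

Because importance is measured on held-out data, features that the model merely memorized tend to rank lower than under impurity-based scores.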
SHAP (SHapley Additive exPlanations)
a game-theoretic method for explaining the output of any machine learning model. It calculates the contribution of each feature to a prediction, providing a more comprehensive understanding of the model's decision-making process.
Purpose: Feature ranking, Model understanding, Bias detection, Debugging
Applicability: SHAP is a model-agnostic method and can be applied to a wide range of machine learning models
(e.g., Decision trees, Random forests, Gradient boosting machines, Neural networks, Support vector machines)
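A hedged SHAP sketch follows, assuming the shap package is installed; the dataset and tree-based model are illustrative choices.

```python
# SHAP on a tree ensemble: local (per-instance) and global (summary) views.
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)            # fast, exact path for tree models
shap_values = explainer.shap_values(X.iloc[:100])

# Local view: signed contribution of each feature to the first instance's prediction.
print(dict(zip(X.columns, shap_values[0].round(3))))

# Global view: beeswarm summary of feature contributions across the sample.
shap.summary_plot(shap_values, X.iloc[:100])
```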
LIME (Local Interpretable Model-Agnostic Explanations)
a technique used in machine learning to explain the predictions of complex models in a locally interpretable way. It works by approximating a complex model with a simpler, linear model around a specific prediction.
Purpose: Local interpretability, Model agnosticism, Feature importance, Bias detection, Debugging
Applicability: LIME is a model-agnostic method and can be applied to a wide range of machine learning models
(e.g., Decision trees, Random forests, Gradient boosting machines, Neural networks, Support vector machines)
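A minimal LIME sketch for tabular data is shown below; it assumes the lime package, and the dataset and model are illustrative.

```python
# Fit a local linear surrogate around one instance and list the top feature weights.
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=data.feature_names,
    class_names=data.target_names,
    mode="classification",
)

exp = explainer.explain_instance(data.data[0], model.predict_proba, num_features=4)
print(exp.as_list())   # (feature condition, weight) pairs from the local surrogate
```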
Anchor Tabular
a technique used in explainability to identify a minimal set of conditions (anchors) that are sufficient to explain a prediction made by a machine learning model. These anchors are human-readable rules that capture the essence of the model's decision-making process for a specific instance.
Purpose: Local interpretability, Simplicity, Feature importance, Bias detection, Debugging
Applicability: Anchor Tabular is primarily designed for tabular data, but it can also be applied to other types of data with some modifications. It can be used to explain the predictions of various machine learning models (e.g., Decision trees, Random forests, Gradient boosting machines, Neural networks, Support vector machines)
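The following is a hedged sketch of anchor explanations, assuming the alibi package is installed; the dataset, model, and precision threshold are illustrative choices.

```python
# Anchor explanations: find a small set of conditions that "anchor" the prediction.
from alibi.explainers import AnchorTabular
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

explainer = AnchorTabular(model.predict, feature_names=data.feature_names)
explainer.fit(data.data)                        # learn feature quantiles for discretization
explanation = explainer.explain(data.data[0], threshold=0.95)

# The anchor is a human-readable rule that fixes the prediction with high precision.
print("Anchor:   ", " AND ".join(explanation.anchor))
print("Precision:", explanation.precision)
print("Coverage: ", explanation.coverage)
```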
Counterfactual explanations
a technique used in explainable AI to generate hypothetical scenarios that could have led to different predictions. By understanding how changes in input features would have affected the model's output, we can gain insights into the model's decision-making process.
Purpose: Local interpretability, Causality, Fairness, Debugging
Applicability: Can be applied to a wide range of machine learning models (e.g., Decision trees, Random forests, Gradient boosting machines, Neural networks, Linear models)
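Below is a minimal, library-free sketch of the counterfactual idea: nudge one feature of a rejected instance until the model's decision flips. The feature meanings, step size, and toy decision rule are hypothetical; dedicated libraries offer more principled searches.

```python
# Brute-force counterfactual search on a toy "loan approval" model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                # columns: [income, debt, age] (hypothetical)
y = (X[:, 0] - X[:, 1] > 0).astype(int)      # toy approval rule
model = LogisticRegression().fit(X, y)

x = np.array([[-1.0, 1.0, 0.0]])             # an instance currently predicted "rejected"
original = model.predict(x)[0]

candidate = x.copy()
for _ in range(100):                         # increase "income" in small steps
    candidate[0, 0] += 0.05
    if model.predict(candidate)[0] != original:
        break

print("Original prediction:", original)
print("Income change needed:", round(candidate[0, 0] - x[0, 0], 2))
print("New prediction:", model.predict(candidate)[0])
```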
Mean Decrease Impurity (MDI)
a metric used in machine learning to measure the importance of features in a decision tree model. It quantifies the average reduction in impurity (e.g., Gini impurity, entropy) that results from splitting on a particular feature.
Purpose: Feature importance, Feature selection, Model understanding, Bias detection
Applicability: MDI is primarily used to explain decision tree models, as it is a metric derived from the impurity measures used in decision trees. However, it can also be applied to ensemble models like Random Forests, where the average MDI across all trees can be used to assess feature importance.
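As a short illustration, scikit-learn's feature_importances_ attribute on tree ensembles is the impurity-based (MDI) importance averaged over all trees; the dataset below is an illustrative choice.

```python
# Mean decrease in Gini impurity per feature, normalized to sum to 1.
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

X, y = load_wine(return_X_y=True, as_frame=True)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

for name, imp in sorted(zip(X.columns, model.feature_importances_),
                        key=lambda t: t[1], reverse=True)[:5]:
    print(f"{name:<25s} {imp:.4f}")
```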
Decision Trees
a type of machine learning model that can be used for both classification and regression tasks. In the context of explainability, they are particularly valuable due to their inherent interpretability.
Purpose: Visual representation, Rule extraction, Feature importance, Local interpretability
Applicability: While decision trees themselves are highly interpretable, they can also be used to explain other types of machine learning models like Random Forests and Gradient Boosting Machines
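A hedged sketch of decision-tree interpretability: the learned tree can be dumped as human-readable if/else rules. The dataset and depth limit are illustrative.

```python
# Each root-to-leaf path is a readable rule explaining a region of the input space.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(data.data, data.target)

print(export_text(tree, feature_names=list(data.feature_names)))
```

The same rule dump also illustrates rule-based induction, since the extracted conditions are directly human-readable.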
Rule-based induction
a technique used in machine learning to extract human-readable rules from a model. These rules can provide insights into the model's decision-making process and make it easier to understand and interpret.
Purpose: Interpretability, Understanding, Debugging, Knowledge extraction
Applicability: Rule-based induction can be applied to a variety of machine learning models, including:
Decision Trees: Rules can be extracted directly from decision trees.
Rule-based Classifiers: Models that are explicitly designed to learn rules from data, such as RIPPER or CN2.
Neural Networks: Rules can be extracted from neural networks using techniques like rule extraction or symbolic regression.
Ensemble Models: Rules can be extracted from individual models in an ensemble and combined to provide a more comprehensive explanation.
Surrogate models
simpler models that approximate the behavior of a more complex machine learning model. They are used in explainability to provide a more understandable representation of the complex model's decision-making process.
Purpose: Interpretability, Approximation, Feature importance, Debugging
Applicability: Surrogate models can be used to explain a wide range of machine learning models, including:
Deep Neural Networks: Complex neural networks can be approximated by simpler models like linear regression or decision trees.
Ensemble Models: Surrogate models can be used to explain the combined behavior of multiple models in an ensemble.
Black-box Models: Any machine learning model that is difficult to understand directly can be approximated by a surrogate model.
Common Surrogate Models:
Linear Regression: A simple linear model that can be used to approximate the relationship between input features and the target variable.
Decision Trees: Decision trees are often used as surrogate models due to their interpretability.
Rule-based Models: Models that are based on a set of rules can be used as surrogate models to explain the decision-making process.
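A minimal global-surrogate sketch follows: fit an interpretable decision tree to mimic the predictions of a "black-box" model. The model choices and the fidelity metric are assumptions for illustration.

```python
# Train the surrogate on the black-box model's predictions, not the true labels.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
black_box = GradientBoostingClassifier(random_state=0).fit(X, y)

surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, black_box.predict(X))

# Fidelity: how closely the surrogate reproduces the black-box model's behavior.
fidelity = accuracy_score(black_box.predict(X), surrogate.predict(X))
print(f"Surrogate fidelity: {fidelity:.3f}")
print(export_text(surrogate, feature_names=list(X.columns)))
```

High fidelity suggests the surrogate's rules are a reasonable global approximation; low fidelity means the explanation should be treated with caution.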
Self-reasoning techniques
a set of techniques that enable LLMs to generate more comprehensive and coherent explanations for their outputs. These frameworks often involve breaking down complex problems into smaller, more manageable steps and guiding the LLM through a reasoning process. This technique relies heavily on prompt engineering, with thoughtfully crafted prompts leading to informative outcomes.
A few frameworks built on the self-reasoning approach are listed below; a minimal Chain-of-Thought prompting sketch appears at the end of this subsection.
Chain-of-Thought (CoT): A step-by-step reasoning technique that prompts the LLM to break down a complex task into smaller, more manageable steps and explain its reasoning for each step. This allows users to follow the LLM's thought process and understand how it arrived at its conclusion.
Thread-of-Thought (ToT): Generates a series of interconnected thoughts that form a coherent narrative. This helps to visualize the LLM's reasoning process and identify any inconsistencies or biases.
Graph-of-Thought (GoT): Represents the LLM's reasoning process as a graph, where nodes represent intermediate thoughts and edges represent the connections between them. This visual representation can be helpful for understanding the LLM's decision-making process.
Chain-of-Verification (CoV): Prompts the LLM to verify its responses against external knowledge sources. This helps to ensure the accuracy and reliability of the LLM's outputs.
Re-Reading (RE2): Unlike most thought-eliciting prompting methods, such as Chain-of-Thought (CoT), which aim to elicit the reasoning process in the output, RE2 shifts the focus to the input by processing the question twice, thereby enhancing understanding. Consequently, RE2 demonstrates strong generality and compatibility with most thought-eliciting prompting methods.
Logic-of-Thought (LoT): Employs propositional logic to generate expanded logical information from the input context and uses that information as an additional augmentation to the input prompt, thereby enhancing logical reasoning capability. LoT is orthogonal to existing prompting methods and can be seamlessly integrated with them.
Purpose: Step-by-step breakdown, Coherent narratives, Accuracy and reliability, Transparency
Applicability: LLMs
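The sketch below shows Chain-of-Thought style prompting in its simplest form; call_llm is a hypothetical placeholder for whichever LLM client the deployment uses, and the question and prompt wording are illustrative.

```python
# Hypothetical sketch of Chain-of-Thought prompting.
def call_llm(prompt: str) -> str:
    # Placeholder: replace with a real chat-completion call to your LLM provider.
    return "(model response would appear here)"

question = "A store sells pens at 3 for $4. How much do 12 pens cost?"

cot_prompt = (
    "Answer the question below. Think step by step, numbering each step, "
    "and state the final answer on a separate line prefixed with 'Answer:'.\n\n"
    f"Question: {question}"
)

# The numbered reasoning trace in the response is the explanation artifact that can
# be surfaced to users alongside the final answer.
print(call_llm(cot_prompt))
```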
3. Output Diversification
Output diversification in explainability refers to the practice of generating multiple, diverse explanations for a given model prediction or decision. This is done to ensure that the explanations are not biased towards a particular perspective or interpretation.
Ensemble Methods
Combining multiple models or explanations can provide a more diverse set of insights. A few examples of ensembling are outlined below (a minimal blending sketch follows the list).
Bagging
Train multiple decision trees on different subsets of the data and combine their explanations.
Boosting
Train a series of models, focusing on the instances that were misclassified by previous models, and combine their explanations.
Random Forest
Train multiple decision trees, each using a random subset of features and samples, and combine their explanations.
Stacking
Train a logistic regression model to combine the explanations from multiple base models like decision trees, neural networks.
Blending
Train multiple models (e.g., decision tree, random forest, neural network), use SHAP for feature importance, and average the results to obtain a combined explanation.
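The following hedged sketch blends explanations by computing SHAP attributions for two different models and averaging the per-feature importances. It assumes the shap package; the dataset and models are illustrative.

```python
# Blend mean |SHAP| importances across two tree regressors.
import numpy as np
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
sample = X.iloc[:100]

models = [
    RandomForestRegressor(random_state=0).fit(X, y),
    GradientBoostingRegressor(random_state=0).fit(X, y),
]

# Mean |SHAP| per feature for each model, then averaged across models.
per_model = [np.abs(shap.TreeExplainer(m).shap_values(sample)).mean(axis=0) for m in models]
blended = np.mean(per_model, axis=0)

for name, imp in sorted(zip(X.columns, blended), key=lambda t: t[1], reverse=True):
    print(f"{name:<10s} {imp:.2f}")
```

Features that rank highly under both models are less likely to be artifacts of any single model's inductive bias.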
Counterfactual Explanations
Generating alternative scenarios or hypothetical examples and evaluating the impact of specific input changes on the model's outcome.
Perturbation Techniques
Introducing small changes to the input data or model parameters can reveal different explanations (a minimal input-perturbation sketch follows the list below). Examples of perturbation include:
Input perturbation - Add random noise to the input variables to see how the explanation changes.
Model perturbation - Modify the weights of the model to see how the importance of different features changes.
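Below is a minimal input-perturbation sketch: add small Gaussian noise to one instance and observe how the model's predicted probability shifts. The noise scale and dataset are illustrative assumptions.

```python
# Repeat the perturbation several times to gauge how stable the prediction (and any
# explanation built on it) is around this instance.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

rng = np.random.default_rng(0)
x = X[0:1]
baseline = model.predict_proba(x)[0, 1]

shifts = []
for _ in range(20):
    noisy = x + rng.normal(scale=0.05 * X.std(axis=0), size=x.shape)
    shifts.append(model.predict_proba(noisy)[0, 1] - baseline)

print(f"Baseline P(class=1): {baseline:.3f}")
print(f"Mean shift under noise: {np.mean(shifts):+.3f} (std {np.std(shifts):.3f})")
```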
Feature Attribution Methods
Using various feature attribution methods (e.g., SHAP, LIME) can offer different perspectives on the importance of input features.
Explanation Clustering Techniques
Group similar explanations together to identify patterns, trends, and relationships within a set of model explanations (a short K-means sketch follows the list below). Various clustering techniques include:
K-means clustering
Hierarchical clustering
Density based clustering
Spectral clustering
Topic modeling
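The hedged sketch below groups per-instance SHAP explanation vectors with K-means to surface recurring explanation patterns. It assumes the shap package; the number of clusters is an illustrative choice.

```python
# Cluster explanation vectors: each row of shap_values is one instance's explanation.
import numpy as np
import shap
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

shap_values = shap.TreeExplainer(model).shap_values(X)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(shap_values)
for c in range(3):
    center = kmeans.cluster_centers_[c]
    top = np.argsort(np.abs(center))[::-1][:3]
    print(f"Cluster {c}: dominant features -> {[X.columns[i] for i in top]}")
```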
4. Error Analysis
Error analysis in explainability is the process of systematically examining the errors made by a machine learning model to understand why it made incorrect predictions. This involves analyzing the model's behavior, the input data, and the underlying reasoning behind its decisions. Key steps in error analysis are outlined below, followed by a short error-classification sketch.
Error Classification: Categorizing errors based on their nature. For example, false positives, false negatives, misclassifications.
Root Cause Analysis: Investigate data quality issues, model complexity, or algorithmic biases.
Bias Detection: Use fairness metrics or bias detection algorithms to uncover biases in the model's predictions
Data Quality Assessment: Verify the quality of the input data, looking for missing values, outliers, or inconsistencies.
Model Complexity Analysis: Determine if the model is too complex or too simple for the task.
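A minimal error-classification sketch: separate false positives and false negatives on a held-out set so each error type can be inspected further. The dataset and model are illustrative.

```python
# Count error types and collect misclassified instances for root-cause analysis.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

pred = model.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
print(f"False positives: {fp}, False negatives: {fn}")

# Instances to review for data quality issues, bias, or model complexity problems.
false_negatives = X_test[(y_test == 1) & (pred == 0)]
print("Instances to review:", len(false_negatives))
```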
5. Training Data Transparency
The practice of providing information about the data used to train a machine learning model is crucial for understanding the model's decision-making process. Hence, clear documentation of the following aspects of training data is required to ensure the fairness and reliability of AI models (a brief data-quality check sketch follows the list).
Data Quality: Verify the accuracy and reliability of the data and check for inconsistencies.
Data Privacy: Ensure compliance with relevant data privacy regulations and implement measures to safeguard sensitive data and prevent unauthorized access.
Data Characteristics: Outline the details about the data, such as its size, format, and distribution.
Data Preprocessing: Understand the preprocessing steps applied to the data, such as normalization, scaling, or feature engineering, and assess their impact on model performance and interpretability.
Data Biases: Use bias detection algorithms or statistical analysis to identify potential biases in the data and implement mitigation strategies.
Data Provenance: Keep detailed records of data sources, collection methods, and preprocessing steps, and provide access to these records to stakeholders.
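As a small, hedged example of the kind of check that supports this documentation, the sketch below summarizes missing values, duplicates, and label balance with pandas; the column names and values are hypothetical stand-ins for a real training set.

```python
# Basic data-quality summary to record alongside data-provenance metadata.
import pandas as pd

# Illustrative stand-in for the real training set (hypothetical columns and values).
df = pd.DataFrame({
    "age": [34, 45, None, 23, 45],
    "income": [52000, 64000, 58000, 39000, 64000],
    "label": [1, 0, 1, 0, 0],
})

report = {
    "rows": len(df),
    "missing_values_per_column": df.isna().sum().to_dict(),
    "duplicate_rows": int(df.duplicated().sum()),
    "label_distribution": df["label"].value_counts(normalize=True).to_dict(),
}

# This summary can be shared with stakeholders as part of training data documentation.
for key, value in report.items():
    print(key, ":", value)
```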
Understanding Explainability in AI: EU AI Act, NIST, ISO42001
NIST (National Institute of Standards and Technology) - AI RMF
NIST’s Artificial Intelligence Risk Management Framework (AI RMF) has proposed four key principles for explainable AI (XAI) to promote trust and transparency in AI systems:
Explanation: AI systems should provide accompanying evidence or reasons for all outputs. This means that users should be able to understand why the system made a particular decision.
e.g.: A self-driving car system should be able to explain why it decided to brake suddenly, such as by highlighting the detected object, the distance to it, and the speed of the car.
Meaningful: Explanations should be understandable to individual users, regardless of their technical expertise. This requires tailoring explanations to the specific needs and knowledge level of the user.
e.g.: A loan approval system should explain the decision in terms that a non-technical user can understand, such as using plain language instead of technical jargon.
Explanation Accuracy: Explanations should accurately reflect the system's process for generating the output. This ensures that users can trust the explanations provided by the system.
e.g.: An image classification system should not provide an explanation that is based on a misunderstanding of the image content, such as mistaking a cat for a dog.
Knowledge Limits: AI systems should only operate under conditions for which they were designed and when they reach sufficient confidence in their output. This helps to prevent the system from making decisions that are beyond its capabilities or that are based on insufficient information.
e.g.: A weather forecasting system should indicate when it is uncertain about its predictions, such as by providing a confidence level or range of possible outcomes.
European Union Artificial Intelligence Act (EU AI Act)
Explainability by Design: AI systems should be designed to be explainable from the outset, rather than as an afterthought. This means that developers should consider explainability throughout the development process.
e.g.: Use LIME or SHAP for understanding AI model decisions. Choose interpretable models like decision trees.
Human-Understandable Explanations: Explanations should be provided in a way that is understandable to humans, regardless of their technical expertise. This may involve using natural language, visualizations, or other methods.
e.g.: An AI system that recommends products generates explanations like "We recommended this product because it aligns with your past purchase history and preferences."
Contextual Explanations: Explanations should be provided in the context of the specific use case. For example, an explanation of a credit scoring algorithm should be tailored to the specific needs of the lender and the borrower.
e.g.: A financial AI system explains a loan rejection by providing examples of similar cases where loans were denied due to similar reasons.
Transparency of Decision-Making: AI systems should be transparent in their decision-making processes. This means that users should be able to understand how the system reached a particular decision.
e.g.: An AI system that predicts customer churn provides information about the most important factors that contribute to the prediction, such as customer tenure, recent purchase frequency, and customer satisfaction ratings.
Accountability: Developers and operators of AI systems should be accountable for the decisions that their systems make. This includes being able to explain the reasoning behind those decisions.
e.g.: Developers are required to provide documentation and training materials to ensure that operators understand how the AI system works and can address potential issues. Organizations establish ethical review boards to assess the potential risks and benefits of AI systems and ensure they are used responsibly.
ISO 42001 - International Standard for AI Governance
As an ISO 42001 certified organization, the Infosys Responsible AI office adheres to the applicable guidelines for explainability implementation. The following is a snapshot of ISO 42001 guidelines for Responsible AI implementation.
Ensuring Trustworthiness
Ethical AI: Adhering to ethical principles and values throughout the AI lifecycle.
Fairness: Mitigating biases and ensuring AI systems treat individuals fairly.
Transparency: Providing clear explanations for AI decision-making processes.
Accountability: Taking responsibility for the development, deployment, and use of AI systems.
Safety: Prioritizing safety and minimizing risks associated with AI applications.
Promoting Human-Centric AI
User Experience: Considering the needs and experiences of users when designing and deploying AI systems.
Inclusivity: Ensuring AI systems are accessible and inclusive for diverse populations.
Managing AI Risks
Risk Assessment: Identifying and assessing potential risks associated with AI Systems.
Mitigation Strategies: Implementing measures to mitigate identified risks.
Continuous Monitoring: Regularly monitoring and evaluating AI systems for emerging risks.
Demonstrating Commitment to Responsible AI
Credibility: Establishing credibility and trust with stakeholders.
Competitive Advantage: Gaining a competitive advantage by demonstrating a commitment to responsible AI practices.
Regulatory Compliance: Meeting regulatory requirements related to AI governance.
Driving Innovation
Ethical Innovation: Fostering innovation while adhering to ethical principles.
Responsible Development: Developing AI systems that contribute positively to society.
Glossary of terms used
Feature Ranking: To identify the most important features that contribute significantly to the model's predictions.
Model Understanding: To understand how different features interact and influence the model's outcomes.
Feature Selection: To identify redundant or irrelevant features that can be removed without affecting performance.
Bias Detection: To detect potential biases in the model by identifying features that disproportionately influence predictions.
Feature Importance: To identify the most important features that contribute significantly to the model's predictions.
Debugging: To identify and correct errors in the model's predictions.
Local Interpretability: To provide explanations tailored to individual predictions, rather than global explanations that apply to the entire model.
Model Agnosticism: Can be applied to any machine learning model, regardless of its complexity or architecture.
Simplicity: Generates simple, human-understandable rules that can be easily interpreted.
Causality: Can help to identify causal relationships between input features and the model's predictions.
Fairness: Can be used to detect and mitigate biases in the model by identifying features that disproportionately influence predictions.
Visual Representation: Decision trees can be visualized as tree-like structures, making it easy to understand the decision-making process.
Rule Extraction: Rules can be extracted from decision trees, providing human-readable explanations of the model's predictions.
Knowledge Extraction: Can be used to extract knowledge from the model that can be applied to other tasks.
Approximation: Surrogate models can be used to approximate the behavior of a complex model in a specific region of the input space.
Step-by-Step Breakdown: These frameworks break down complex problems into smaller, more manageable steps, making it easier to follow the LLM's reasoning.
Coherent Narratives: By generating interconnected thoughts or a graph representation, these techniques create a more coherent and understandable explanation.
Accuracy and Reliability: CoV helps to ensure the accuracy and reliability of the LLM's outputs by verifying them against external sources.
Transparency: These techniques provide transparency into the LLM's decision-making process, allowing users to understand how the model arrived at its conclusions.