An overview of Azimuth, an open source tool for systematic error analysis for text classification.

Thank you for watching this short video where we present Azimuth, an easy-to-use open-source tool that helps AI practitioners analyze AI models and datasets to check for errors in text classification. In this video, we walk you through a high-level overview of using Azimuth. The description below provides a summary of what is covered in the video.


Azimuth is an open-source application that helps AI practitioners and data scientists better understand their dataset and model predictions by performing thorough dataset and error analyses. The application leverages different tools, including robustness tests, semantic similarity analysis, and saliency maps, unified by concepts such as smart tags and proposed actions. This version of Azimuth focuses on NLP classification problems, but the tool could easily be adapted to other data types and models, e.g. vision or tabular use cases.

Our code and documentation are available at


See our Getting Started video for more details.


  • Top Banner*
  • The top banner contains useful information and links.
  • The project name from the config file is shown.
  • A dropdown allows you to select the different pipelines defined in the config. It also allows you to select no pipelines.
  • The settings allow you to enable/disable different analyses.
  • Dataset Class Distribution Analysis*
  • The Distribution Analysis section highlights gaps between the class distributions of the training and the evaluation sets.
  • Missing samples: Verify if each intent has sufficient samples in both sets.
  • Representation mismatch: Assess that the representation of each intent is similar in both sets.
  • Length mismatch: Verify that utterance lengths are similar for each intent in both sets.
  • Performance Analysis*
  • The Performance Analysis section summarizes the model performance in terms of the prediction outcomes and metrics available in Azimuth. Change the value in the dropdown to see the metrics broken down per label, predicted class, or smart tag family. Use the toggle to switch between performance on the training set and on the evaluation set.
  • Behavioral Testing*
  • The Behavioral Testing section summarizes the behavioral testing performance. It shows the failure rates on both the evaluation set and the training set, that is, the ratio of failed tests to the total number of tests.
  • Click the failure rates to switch between performance on the training set and on the evaluation set. Select View Details to get to the Behavioral Testing Summary, which provides more information on the tests and the option to export the results.
  • Post-processing Analysis*
  • The Post-processing Analysis provides an assessment of the performance of one post-processing step: the thresholding. The visualization shows the prediction outcome count on the evaluation set for different thresholds. Click View Details to see the plot full screen in Post-processing Analysis.
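
The three checks listed under Dataset Class Distribution Analysis above (missing samples, representation mismatch, length mismatch) can be illustrated with a minimal Python sketch. This is not Azimuth's implementation: the function names, the warning labels, and the three threshold parameters are hypothetical and chosen only for illustration, and utterance length is assumed to be a whitespace word count.

```python
from collections import Counter, defaultdict
from statistics import mean


def summarize(samples):
    """Return per-intent counts, shares, and mean word counts.

    `samples` is a list of (utterance, intent) pairs.
    """
    counts = Counter(intent for _, intent in samples)
    total = sum(counts.values())
    lengths = defaultdict(list)
    for utterance, intent in samples:
        lengths[intent].append(len(utterance.split()))
    shares = {intent: n / total for intent, n in counts.items()}
    mean_lengths = {intent: mean(v) for intent, v in lengths.items()}
    return counts, shares, mean_lengths


def distribution_warnings(train, evaluation, min_samples=2,
                          max_share_gap=0.1, max_length_gap=3.0):
    """Flag intents whose train and evaluation distributions diverge.

    All thresholds are illustrative defaults, not values used by Azimuth.
    """
    train_counts, train_shares, train_len = summarize(train)
    eval_counts, eval_shares, eval_len = summarize(evaluation)
    warnings = defaultdict(list)
    for intent in sorted(set(train_counts) | set(eval_counts)):
        # Missing samples: too few examples in at least one set.
        if min(train_counts[intent], eval_counts[intent]) < min_samples:
            warnings[intent].append("missing_samples")
        # Representation mismatch: the intent's share differs between sets.
        share_gap = abs(train_shares.get(intent, 0.0) - eval_shares.get(intent, 0.0))
        if share_gap > max_share_gap:
            warnings[intent].append("representation_mismatch")
        # Length mismatch: mean utterance length differs between sets.
        if intent in train_len and intent in eval_len:
            if abs(train_len[intent] - eval_len[intent]) > max_length_gap:
                warnings[intent].append("length_mismatch")
    return dict(warnings)


train = [
    ("what is my balance", "balance"),
    ("show balance", "balance"),
    ("send money to bob", "transfer"),
    ("transfer funds", "transfer"),
]
evaluation = [
    ("balance", "balance"),
    ("please could you kindly tell me what my current account balance is today", "balance"),
    ("pay bob", "transfer"),
]
print(distribution_warnings(train, evaluation))
```

In practice the thresholds would be tuned to the size of the dataset; the point of the sketch is only that each check compares a per-intent statistic across the two sets.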


Azimuth leverages different analyses and concepts to enhance the process of dataset analysis and error analysis. The notion of smart tags is the most important concept, as it unifies most of the other analyses.

  • Smart Tags*
  • Smart tags are assigned to utterances by Azimuth when the app is launched. They can be seen as meta-data on the utterance and/or its prediction. The goal is to guide the error analysis process, identifying interesting data samples which may require further action and investigation. Different families of smart tags exist, based on the different types of analyses that Azimuth provides.
  • Proposed Actions*
  • While smart tags are computed automatically and cannot be changed, proposed actions are annotations that the user adds to record what should be done with a specific data sample.
  • Prediction Outcomes*
  • Another key concept used throughout the application is the notion of prediction outcomes. It acts as a metric of success for a given prediction.
  • Analyses*
  • In Azimuth, different types of analysis are provided. Each analysis has a dedicated section in the documentation. Almost all of them (except saliency maps) are linked to smart tags. Get started with Azimuth and try out the tool for Saliency Maps, Syntax Analysis, Similarity Analysis, Behavioral Testing, and Uncertainty Estimation.
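
To make the idea of prediction outcomes and their dependence on the threshold more concrete, here is a minimal Python sketch of counting outcomes for several thresholds. It rests on assumptions that are not stated in this description: predictions whose confidence falls below the threshold are replaced by a rejection class (named `NO_INTENT` here, a hypothetical name), and each prediction is labeled by whether it is correct and whether it was rejected. The outcome names and functions are illustrative; refer to the Azimuth documentation for the actual definitions.

```python
from collections import Counter

# Hypothetical name for the class assigned when a prediction is rejected.
REJECTION_CLASS = "NO_INTENT"


def prediction_outcome(label, predicted, confidence, threshold):
    """Return an illustrative outcome name for one thresholded prediction."""
    # Below the threshold, the model's prediction is replaced by the rejection class.
    final = predicted if confidence >= threshold else REJECTION_CLASS
    correctness = "correct" if final == label else "incorrect"
    decision = "rejected" if final == REJECTION_CLASS else "predicted"
    return correctness + "_and_" + decision


def outcome_counts(samples, thresholds):
    """Count outcomes per threshold.

    `samples` is a list of (label, predicted_class, confidence) triples.
    """
    return {
        t: Counter(prediction_outcome(l, p, c, t) for l, p, c in samples)
        for t in thresholds
    }


samples = [
    ("balance", "balance", 0.90),
    ("transfer", "balance", 0.60),
    ("NO_INTENT", "transfer", 0.40),
    ("transfer", "transfer", 0.55),
]
for threshold, counts in outcome_counts(samples, [0.5, 0.7]).items():
    print(threshold, dict(counts))
```

Raising the threshold in this toy data turns one wrong prediction into a rejection, but also rejects a prediction that was right, which is the kind of trade-off a threshold plot is meant to expose.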