Featured Demos
- Uncovering stereotypical biases via underspecified questions | Aristo
This work focuses specifically on identifying biases in question answering (QA) models. If these models are blindly deployed in real-life settings, the biases within them could cause real harm, which raises the question: how extensive are social stereotypes in question answering models?
Try the demo
- Evaluating neural toxic degeneration in language models | Mosaic
In new joint work at AI2 and UW, we study how often popular NLP components produce problematic content, what might trigger this neural toxic degeneration from a given system, and whether or not it can be successfully avoided. We also measure how much toxicity is present in the web text these systems learned from, to understand why toxic degeneration happens.
Try the demo
- Find out whether scientific research supports or refutes a given claim | Semantic Scholar
Our fact verification demo was built using the SciFact dataset, a collection of 1.4K expert-written scientific claims paired with evidence-containing abstracts, and annotated with labels and rationales.
Try the demo
- Crossing format boundaries with a single QA system | Aristo
UnifiedQA is a single pre-trained QA model that performs surprisingly well across 17 QA datasets spanning 4 diverse formats. Fine-tuning UnifiedQA into specialized models results in a new state-of-the-art on 6 datasets, establishing this model as a strong starting point for building QA systems.
Try the demo
- Extractive search over CORD-19 with 3 powerful query modes | AI2 Israel, DIY Information Extraction
SPIKE-CORD is a powerful sentence-level, context-aware, and linguistically informed extractive search system for exploring the CORD-19 corpus.
Try the demo
- Exploring the evolving network of science in CORD-19 | Semantic Scholar
Use our exploratory search tools to find out what groups are working on what directions, see how biomedical concepts interact and evolve over time, and discover new connections.
Try the demo
- Transformers as Soft Reasoners over Language | Aristo
RuleTaker determines whether statements are True or False based on rules given in natural language.
Try the demo
- Several demos of a variety of popular computer vision models | PRIOR
The Computer Vision Explorer lets you try and compare a variety of popular computer vision models related to recognition, vision and language, human-centric vision, and scene geometry tasks. Use our example images or try with your own.
Try the demo
- Try the QDMR CopyNet parser | AI2 Israel, Question Understanding
A live demo of the QDMR CopyNet parser from the paper Break It Down: A Question Understanding Benchmark (TACL 2020). The parser receives a natural language question as input and returns its Question Decomposition Meaning Representation (QDMR). Each step in the decomposition is a subquestion that must be answered to answer the original question. More info: https://allenai.github.io/Break/
Try the demo
- A Framework for Explaining Predictions of NLP Models | AllenNLP, AI2 Irvine
The AllenNLP Interpret toolkit makes it easy to apply gradient-based saliency maps and adversarial attacks to new models, as well as develop new interpretation methods. AllenNLP Interpret contains three components: a suite of interpretation techniques applicable to most models, APIs for developing new interpretation methods (e.g., APIs to obtain input gradients), and reusable front-end components for visualizing the interpretation results.
Try the demo