Software Packages

  • A citation recommendation system that allows users to find relevant citations for their paper drafts. The tool is backed by Semantic Scholar's OpenCorpus dataset.

  • Science Parse parses scientific papers (in PDF form) and returns them in structured form. It supports these fields: title, authors, abstract, sections (each with heading and body text), bibliography entries (each with title, authors, venue, and year), and mentions (i.e., places in the paper where bibliography entries are cited).
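
    The fields listed above could be modeled along the following lines. This is a minimal sketch, not Science Parse's actual output schema: the class and attribute names (`ParsedPaper`, `Section`, `BibEntry`, `Mention`, `ref_index`, `context`) are hypothetical, and treating heading, venue, and year as optional is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Section:
    heading: Optional[str]  # assumption: a section may lack a heading
    text: str


@dataclass
class BibEntry:
    title: str
    authors: List[str]
    venue: Optional[str] = None  # assumption: may be missing
    year: Optional[int] = None   # assumption: may be missing


@dataclass
class Mention:
    ref_index: int  # hypothetical: index of the cited entry in the bibliography
    context: str    # hypothetical: text surrounding the citation


@dataclass
class ParsedPaper:
    title: str
    authors: List[str]
    abstract: str
    sections: List[Section] = field(default_factory=list)
    bibliography: List[BibEntry] = field(default_factory=list)
    mentions: List[Mention] = field(default_factory=list)

    def mentions_of(self, ref_index: int) -> List[Mention]:
        """Return every place in the paper that cites the given entry."""
        return [m for m in self.mentions if m.ref_index == ref_index]
```

    Keeping mentions separate from bibliography entries, linked by an index, lets one entry be cited from many places without duplicating it.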

  • Given a pair of sentences (premise, hypothesis), the decomposed graph entailment model (DGEM) predicts whether the premise can be used to infer the hypothesis. The model decomposes the support for a structured graph representation of the hypothesis into support for its individual nodes and edges. This model was designed for the SciTail dataset and is described in more detail in "SciTail: A Textual Entailment Dataset from Science Question Answering" (AAAI '18). This repository also contains two baseline textual entailment models built using our NLP library, AllenNLP.

  • With alexafsm, developers can model dialog agents with first-class concepts such as states, attributes, transitions, and actions. alexafsm also provides visualization and other tools to help understand, test, debug, and maintain complex FSM conversations. Learn more about alexafsm in this article.
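
    To illustrate the idea of modeling a dialog as a finite state machine with states, attributes, transitions, and actions, here is a minimal sketch in plain Python. It does not use the alexafsm API: the `DialogFSM` class, the `TRANSITIONS` table, and the intent and state names are all hypothetical and exist only for this example.

```python
from typing import Callable, Dict, Optional, Tuple

# (current state, intent) -> (next state, action that produces the reply)
Action = Callable[[Dict[str, str]], str]
Transitions = Dict[Tuple[str, str], Tuple[str, Action]]


class DialogFSM:
    """Hypothetical dialog agent; not part of alexafsm."""

    def __init__(self, initial: str, transitions: Transitions) -> None:
        self.state = initial
        self.transitions = transitions
        self.attributes: Dict[str, str] = {}  # slot values kept across turns

    def handle(self, intent: str, slots: Optional[Dict[str, str]] = None) -> str:
        """Store slots, follow the matching transition, and run its action."""
        self.attributes.update(slots or {})
        key = (self.state, intent)
        if key not in self.transitions:
            # No transition defined for this intent: stay in the same state.
            return "Sorry, I did not understand that."
        self.state, action = self.transitions[key]
        return action(self.attributes)


# Illustrative transition table for a toy coffee-ordering dialog.
TRANSITIONS: Transitions = {
    ("start", "order"): ("ask_size", lambda a: "What size would you like?"),
    ("ask_size", "give_size"): ("confirm", lambda a: f"A {a['size']} coffee. Confirm?"),
    ("confirm", "yes"): ("done", lambda a: "Order placed."),
    ("confirm", "no"): ("start", lambda a: "Order cancelled."),
}
```

    Making the transition table explicit data is what allows a conversation to be inspected, visualized, and tested state by state.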

  • The Bi-directional Attention Flow (BiDAF) network is a multi-stage hierarchical process that represents context at different levels of granularity and uses a bi-directional attention flow mechanism to achieve a query-aware context representation without early summarization.

  • Aristo mini is a lightweight question answering system that can quickly evaluate Aristo science questions with an evaluation web server and the provided baseline solvers.

  • Build tables of information by extracting facts from indexed text corpora via a simple and effective query language.

    Visit the public IKE code repository.

  • Given a paragraph that describes a process, output a "flow chart" representation of that process.

    The algorithm this software implements is described in "Modeling Biological Processes for Reading Comprehension" (EMNLP '14), and an example dataset is ProcessBank Data.

  • Given a scholarly PDF, extract figures, tables, captions, and section titles.

    PDFFigures 2.0 powers the figure extraction feature in Semantic Scholar. It is an expansion of the original PDFFigures algorithm. Also see "PDFFigures 2.0: Mining Figures from Research Papers" (JCDL '16) for a description of the algorithm.

  • Given two sentences, compute the optimal alignment of words/phrases between the two.

    This algorithm exploits a variety of background lexical knowledge resources. Also see "Semi-Markov Phrase-based Monolingual Alignment" (EMNLP '13) for a description of the algorithm.
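
The alignment task in the last entry can be illustrated with a much simpler stand-in. The sketch below is not the semi-Markov phrase-based model from the cited paper and uses no lexical knowledge resources: it is a plain dynamic-programming alignment (longest common subsequence) that only pairs identical words, ignoring case, and keeps them in order. The function name `align` is hypothetical.

```python
from typing import List, Tuple


def align(source: List[str], target: List[str]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) of aligned words, maximizing exact matches."""
    n, m = len(source), len(target)
    # score[i][j] = best number of matches between source[:i] and target[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = 1 if source[i - 1].lower() == target[j - 1].lower() else 0
            score[i][j] = max(score[i - 1][j], score[i][j - 1],
                              score[i - 1][j - 1] + match)

    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        same = source[i - 1].lower() == target[j - 1].lower()
        if same and score[i][j] == score[i - 1][j - 1] + 1:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i - 1][j] >= score[i][j - 1]:
            i -= 1  # skip a source word
        else:
            j -= 1  # skip a target word
    return pairs[::-1]


if __name__ == "__main__":
    print(align("the cat sat".split(), "the black cat sat down".split()))
```

A phrase-based aligner differs in two ways this sketch does not attempt: it can align multi-word spans as a unit, and it scores pairs that are similar in meaning rather than only identical strings.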