Our goal, in the context of open-domain textual question-answering (QA), is to explain answers by not just listing supporting textual evidence (“rationales”), but also showing how such evidence leads to the answer in a systematic way. If this could be done, new opportunities for understanding and debugging the system’s reasoning would become possible. Our approach is to generate explanations in the form of entailment trees, namely a tree of entailment steps from facts that are known, through intermediate conclusions, to the final answer. To train a model with this skill, we created EntailmentBank, the first dataset to contain multistep entailment trees. At each node in the tree (typically) two or more facts compose together to produce a new conclusion. Given a hypothesis (question + answer), we define three increasingly difficult explanation tasks: generate a valid entailment tree given (a) all relevant sentences (the leaves of the gold entailment tree) (b) all relevant and some irrelevant sentences (c) a corpus. We provide baseline results for this task, and analyze the problems involved.