Conference Papers (Refereed)

  • Laura Jehl, Adria de Gispert, Mark Hopkins, WIlliam Byrne EACL 2014

    We present a simple preordering approach for machine translation based on a feature-rich logistic regression model to predict whether two children of the same node in the source-side parse tree should be swapped or not. Given the pair-wise children regression scores we conduct an efficient depth-first branch-and-bound search through the space of possible children permutations, avoiding using a cascade of classifiers or limiting the list of possible ordering outcomes. We report experiments in translating English to Japanese and Korean, demonstrating superior performance as (a) the number of crossing links drops by more than 10% absolute with respect to other state-of-the-art preordering approaches, (b) BLEU scores improve on 2.2 points over the baseline with lexicalised reordering model, and (c) decoding can be carried out 80 times faster. Less

  • Mark Hopkins, Jonathan May ACL 2013

    What do we want to learn from a translation competition and how do we learn it with confidence? We argue that a disproportionate focus on ranking competition participants has led to lots of different rankings, but little insight about which rankings we should trust. In response, we provide the first framework that allows an empirical comparison of different analyses of competition results. We then use this framework to compare several analytical models on data from theWorkshop on Machine Translation (WMT). Less

  • Mark Hopkins, Jonathan May EMNLP 2011

    We offer a simple, effective, and scalable method for statistical machine translation parameter tuning based on the pairwise approach to ranking (Herbrich et al., 1999). Unlike the popularMERT algorithm (Och, 2003), our pairwise ranking optimization (PRO) method is not limited to a handful of parameters and can easily handle systems with thousands of features. Moreover, unlike recent approaches built upon the MIRA algorithm of Crammer and Singer (2003) (Watanabe et al., 2007; Chiang et al., 2008b), PRO is easy to implement. It uses off-the-shelf linear binary classifier software and can be built on top of an existing MERT framework in a matter of hours. We establish PRO’s scalability and effectiveness by comparing it to MERT and MIRA and demonstrate parity on both phrase-based and syntax-based systems in a variety of language pairs, using large scale data scenarios. Less

  • Extraction Programs: A Unified Approach to Translation Rule Extraction
    Mark Hopkins, Greg Langmead, Tai Vo WMT (Workshop on Machine Translation) 2011
  • Mark Hopkins, Greg Langmead EMNLP 2010

    Conventional wisdom dictates that synchronous context-free grammars (SCFGs) must be converted to Chomsky Normal Form (CNF) to ensure cubic time decoding. For arbitrary SCFGs, this is typically accomplished via the synchronous binarization technique of (Zhang et al., 2006). A drawback to this approach is that it inflates the constant factors associated with decoding, and thus the practical running time. (DeNero et al., 2009) tackle this problem by defining a superset of CNF called Lexical Normal Form (LNF), which also supports cubic time decoding under certain implicit assumptions. In this paper, we make these assumptions explicit, and in doing so, show that LNF can be further expanded to a broader class of grammars (called "scope-3") that also supports cubic-time decoding. By simply pruning non-scope-3 rules from a GHKM-extracted grammar, we obtain better translation performance than synchronous binarization. Less

  • Mark Hopkins, Jonas Kuhn SSST (Workshop on Syntax and Semantics in Statistical Translation) 2007

    We present the main ideas behind a new syntax-based machine translation system, based on reducing the machine translation task to a tree-labeling task. This tree labeling is further reduced to a sequence of decisions (of four varieties), which can be discriminatively trained. The optimal tree labeling (i.e. translation) is then found through a simple depth-first branch-and-bound search. An early system founded on these ideas has been shown to be competitive with Pharaoh when both are trained on a small subsection of the Europarl corpus. Less

  • Mark Hopkins, Jonas Kuhn ACL Workshop on Deep Linguistic Processing 2007

    In this paper, we propose a new syntax-based machine translation (MT) approach based on reducing the MT task to a tree-labeling task, which is further decomposed into a sequence of simple decisions for which discriminative classifiers can be trained. The approach is very flexible and we believe that it is particularly well-suited for exploiting the linguistic knowledge encoded in deep grammars whenever possible, while at the same time taking advantage of data-based techniques that have proven a powerful basis for MT, as recent advances in statistical MT show. Less

  • Mark Hopkins, Jonas Kuhn ACL 2006

    We revisit the idea of history-based parsing, and present a history-based parsing framework that strives to be simple, general, and exible. We also provide a decoder for this probability model that is linear-space, optimal, and anytime. A parser based on this framework, when evaluated on Section 23 of the Penn Treebank, compares favorably with other state-of-the-art approaches, in terms of both accuracy and speed. Less

  • Mark Hopkins, Jonas Kuhn EACL Workshop on Cross-Language Knowledge Induction 2006

    The standard PCFG approach to parsing is quite successful on certain domains, but is relatively inexible in the type of feature information we can include in its probabilistic model. In this work, we discuss preliminary work in developing a new probabilistic parsing model that allows us to easily incorporate many different types of features, including crosslingual information. We show how this model can be used to build a successful parser for a small handmade gold-standard corpus of 188 sentences (in Less

  • Michel Galley, Mark Hopkins, Kevin Knight, Daniel Marcu NAACL 2004

    Abstract: We propose a theory that gives formal semantics to word-level alignments defined over parallel corpora. We use our theory to introduce a linear algorithm that can be used to derive from word-aligned, parallel corpora the minimal set of syntactically motivated transformation rules that explain human translation data.

  • Mark Hopkins UAI 2003

    We analyze a new property of directed acyclic graphs (DAGs), called layerwidth, arising from a class of DAGs proposed by Eiter and Lukasiewicz. This class of DAGs permits certain problems of structural model-based causality and explanation to be tractably solved. In this paper, we first address an open question raised by Eiter and Lukasiewicz – the computational complexity of deciding whether a given graph has a bounded layerwidth. After proving that this problem is NP-complete, we proceed by proving numerous important properties of layerwidth that are helpful in efficiently computing the optimal layerwidth. Finally, we compare this new DAG property to two other important DAG properties: treewidth and bandwidth. Less

  • Mark Hopkins, Judea Pearl AAAI Spring Symposium on Logical Formalizations of Commonsense Reasoning 2003

    Recently, Halpern and Pearl proposed a definition of actual cause within the framework of structural models. In this paper, we explicate some of the assumptions underlying their definition, and re-evaluate the effectiveness of their account. We also briefly contemplate the suitability of structural models as a language for expressing subtle notions of commonsense causation. Less

  • Mark Hopkins, Adnan Darwiche PGM (Workshop on Probabilistic Graphical Models) 2002

    Algorithms that triangulate a graph such that the resulting triangulation is at most a constant-factor from the treewidth are important theoretically, but perform poorly in practice. In this paper, we show that a successful heuristic method for triangulating graphs (based on a structure known as a dtree) can be viewed as a relaxation of a well-known family of constant-factor approximation algorithms. In doing so, we provide a theoretical underpinning for this heuristic, and show how these theoretically elegant approximation algorithms can be effectively applied in practice. Less

  • Strategies for Determining Causes of Events
    Mark Hopkins AAAI 2001
  • Using Recursive Decomposition to Construct Elimination Orders, Jointrees, and Dtrees
    Adnan Darwiche, Mark Hopkins ECSQARU 2001

    Darwiche has recently proposed a graphical model for driving conditioning algorithms, called a dtree, which specifies a recursive decomposition of a directed acyclic graph (DAG) into its families. A main property of a dtree is its width, and it was shown previously how to convert a DAG elmination order of width w into a dtree of width ≤ w. The importance of this conversion is that any algorithm for constructing low-width elimination orders can be directly used for constructing low-width dtrees. We propose in this paper a more directly method for constructing dtrees based on hypergraph partitioning. Less