Publications

  • Yexiang Xue, Junwen Bai, Ronan Le Bras, Brendan Rappazzo, Richard Bernstein, Johan Bjorck, Liane Longpre, Santosh K. Suram, Robert B. van Dover, John Gregoire and Carla P. Gomes. AAAI 2017 (IAAI-17). Best Paper Award.

    High-throughput materials discovery involves the rapid synthesis, measurement, and characterization of many different but structurally related materials. A central problem in materials discovery, the phase map identification problem, involves the determination of the crystal structure of materials from materials composition and structural characterization data. We present Phase-Mapper, a novel solution platform that allows humans to interact with both the data and products of AI algorithms, including the incorporation of human feedback to constrain or initialize solutions. Phase-Mapper is compatible with any spectral demixing algorithm, including our novel solver, AgileFD, which is based on convolutive non-negative matrix factorization. AgileFD allows materials scientists to rapidly interpret XRD patterns, and can incorporate constraints to capture the physics of the materials as well as human feedback. We compare three solver variants with previously proposed methods in a large-scale experiment involving 20 synthetic systems, demonstrating the efficacy of imposing physical constraints using AgileFD. Since the deployment of Phase-Mapper at the Department of Energy’s Joint Center for Artificial Photosynthesis (JCAP), thousands of X-ray diffraction patterns have been processed and the results are yielding the discovery of new materials for energy applications, as exemplified by the discovery of a new family of metal oxide solar light absorbers in the previously unsolved Nb–Mn–V oxide system, which is provided here as an illustrative example. Phase-Mapper is also being deployed at the Stanford Synchrotron Radiation Lightsource (SSRL) to enable phase mapping on datasets in real time.

  • Mateo Diaz, Ronan Le Bras and Carla P. Gomes. CP-AI-OR 2017.

    Spatially balanced Latin squares are combinatorial structures of great importance for experimental design. From a computational perspective they present a challenging problem, and there is a need for efficient methods to generate them. Motivated by a real-world application, we consider a natural extension of this problem, balanced Latin rectangles. Balanced Latin rectangles appear to be even more defiant than balanced Latin squares, to such an extent that perfect balance may not be feasible for Latin rectangles. Nonetheless, for real applications, it is still valuable to have well-balanced Latin rectangles. In this work, we study some of the properties of balanced Latin rectangles, prove the nonexistence of perfect balance for an infinite family of sizes, and present several methods to generate the most balanced solutions.

  • Santosh K. Suram, Yexiang Xue, Junwen Bai, Ronan Le Bras, Brendan Rappazzo, Richard Bernstein, Johan Bjorck, Lan Zhou, R. Bruce van Dover, Carla P. Gomes and John M. Gregoire. ACS Combinatorial Science, 2017.

    Rapid construction of phase diagrams is a central tenet of combinatorial materials science, and accelerated materials discovery efforts are often hampered by challenges in interpreting combinatorial X-ray diffraction data sets. We address this challenge by developing AgileFD, an artificial intelligence algorithm that enables rapid phase mapping from a combinatorial library of X-ray diffraction patterns. AgileFD models alloying-based peak shifting through a novel expansion of convolutional nonnegative matrix factorization, which not only improves the identification of constituent phases but also maps their concentration and lattice parameter as a function of composition. By incorporating Gibbs’ phase rule into the algorithm, physically meaningful phase maps are obtained with unsupervised operation, and more refined solutions are attained by injecting expert knowledge of the system. The algorithm is demonstrated through investigation of the V–Mn–Nb oxide system, where decomposition of eight oxide phases, including two with substantial alloying, provides the first phase map for this pseudoternary system. This phase map enables interpretation of high-throughput band gap data, leading to the discovery of new solar light absorbers and the alloying-based tuning of the direct-allowed band gap energy of MnV2O6. The open-source family of AgileFD algorithms can be implemented into a broad range of high-throughput workflows to accelerate materials discovery.
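
As an aside, the core operation inside methods like AgileFD is nonnegative matrix factorization. The sketch below is a minimal, hypothetical illustration using plain Lee–Seung multiplicative updates on synthetic data; the actual AgileFD algorithm additionally models peak shifting (convolution) and physical constraints such as Gibbs’ phase rule.

```python
import numpy as np

def nmf(V, k, iters=200, seed=0):
    """Plain nonnegative matrix factorization V ~ W @ H via
    Lee-Seung multiplicative updates (illustration only)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + 1e-3
    H = rng.random((k, m)) + 1e-3
    eps = 1e-12  # guard against division by zero
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "diffraction-like" data: 2 nonnegative basis signals, mixed at random.
rng = np.random.default_rng(1)
V = rng.random((50, 2)) @ rng.random((2, 30))
W, H = nmf(V, k=2)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {err:.4f}")
```

The factors stay nonnegative because each update multiplies by a nonnegative ratio, which is what makes the decomposition interpretable as basis patterns and mixing weights.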

  • Yexiang Xue, Stefano Ermon, Ronan Le Bras, Carla P. Gomes and Bart Selman. ICML 2016.

    The ability to represent complex high dimensional probability distributions in a compact form is one of the key insights in the field of graphical models. Factored representations are ubiquitous in machine learning and lead to major computational advantages. We explore a different type of compact representation based on discrete Fourier representations, complementing the classical approach based on conditional independencies. We show that a large class of probabilistic graphical models have a compact Fourier representation. This theoretical result opens up an entirely new way of approximating a probability distribution. We demonstrate the significance of this approach by applying it to the variable elimination algorithm. Compared with the traditional bucket representation and other approximate inference algorithms, we obtain significant improvements.
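
To make the Fourier idea concrete, here is a small, self-contained sketch (an illustration, not the paper's algorithm) that computes the discrete Fourier (Walsh–Hadamard) coefficients of a distribution over {0,1}^n. For a product distribution whose bits are mostly uniform, almost all coefficients vanish, which is the sense in which such representations can be compact.

```python
def walsh_hadamard_coeffs(p, n):
    """Fourier coefficients c_S = sum_x p(x) * (-1)^(|S & x|) of a
    distribution p over {0,1}^n; subsets S are encoded as bitmasks."""
    coeffs = {}
    for S in range(2 ** n):
        c = 0.0
        for x in range(2 ** n):
            sign = -1 if bin(S & x).count("1") % 2 else 1
            c += p[x] * sign
        coeffs[S] = c
    return coeffs

# Product distribution over 3 bits: bit 0 is biased (P[bit0 = 1] = 0.8),
# bits 1 and 2 are uniform. Independence makes the Fourier spectrum sparse.
marginals = [(0.2, 0.8), (0.5, 0.5), (0.5, 0.5)]
p = [marginals[0][x & 1] * marginals[1][(x >> 1) & 1] * marginals[2][(x >> 2) & 1]
     for x in range(8)]
coeffs = walsh_hadamard_coeffs(p, 3)
nonzero = {S for S, c in coeffs.items() if abs(c) > 1e-12}
print(nonzero)  # only the empty set (mask 0) and {bit 0} (mask 1) survive
```

Only 2 of the 8 coefficients are nonzero here; a fully explicit table of p would need all 8 entries.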

  • Tao Zou, Ronan Le Bras, Marcos Vaz Salles, Alan Demers and Johannes Gehrke. The VLDB Journal, 2015.

    An increasing number of distributed data-driven applications are moving into shared public clouds. By sharing resources and operating at scale, public clouds promise higher utilization and lower costs than private clusters. To achieve high utilization, however, cloud providers inevitably allocate virtual machine instances non-contiguously; i.e., instances of a given application may end up in physically distant machines in the cloud. This allocation strategy can lead to large differences in average latency between instances. For a large class of applications, this difference can result in significant performance degradation, unless care is taken in how application components are mapped to instances. In this paper, we propose ClouDiA, a general deployment advisor that selects application node deployments minimizing either (i) the largest latency between application nodes, or (ii) the longest critical path among all application nodes. ClouDiA employs a number of algorithmic techniques, including mixed-integer programming and constraint programming techniques, to efficiently search the space of possible mappings of application nodes to instances. Through experiments with synthetic and real applications in Amazon EC2, we show that mean latency is a robust metric to model communication cost in these applications and that our search techniques yield a 15–55% reduction in time-to-solution or service response time, without any need for modifying application code.
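
For intuition about objective (i), the sketch below brute-forces the deployment that minimizes the largest latency between any two chosen instances, on a tiny hypothetical latency matrix. ClouDiA itself searches this space with mixed-integer and constraint programming rather than enumeration.

```python
from itertools import permutations

def best_deployment(latency, num_nodes):
    """Assign application nodes to instances so that the largest
    instance-to-instance latency among the chosen instances is
    minimized (objective (i)). Exhaustive search, illustration only."""
    best, best_assign = float("inf"), None
    for assign in permutations(range(len(latency)), num_nodes):
        worst = max(latency[a][b]
                    for i, a in enumerate(assign)
                    for b in assign[i + 1:])
        if worst < best:
            best, best_assign = worst, assign
    return best, best_assign

# Hypothetical latencies between 4 instances; instances 2 and 3 are closest.
lat = [[0, 9, 7, 8],
       [9, 0, 6, 5],
       [7, 6, 0, 2],
       [8, 5, 2, 0]]
best, nodes = best_deployment(lat, 2)
print(best, nodes)
```

Enumeration is exponential in the number of application nodes, which is exactly why the paper turns to MIP and CP search techniques.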

  • Stefano Ermon, Ronan Le Bras, Santosh Suram, John M. Gregoire, Carla Gomes, Bart Selman and Robert B. van Dover. AAAI 2015.

    Identifying important components or factors in large amounts of noisy data is a key problem in machine learning and data mining. Motivated by a pattern decomposition problem in materials discovery, aimed at discovering new materials for renewable energy, e.g. for fuel and solar cells, we introduce CombiFD, a framework for factor-based pattern decomposition that allows the incorporation of a priori knowledge as constraints, including complex combinatorial constraints. In addition, we propose a new pattern decomposition algorithm, called AMIQO, based on solving a sequence of (mixed-integer) quadratic programs. Our approach considerably outperforms the state of the art on the materials discovery problem, scaling to larger datasets and recovering more precise and physically meaningful decompositions. We also show the effectiveness of our approach for enforcing background knowledge on other application domains.

  • Ronan Le Bras, Yexiang Xue, Richard Bernstein, Carla P. Gomes and Bart Selman. HCOMP 2014.

    We propose a general framework for boosting combinatorial solvers through human computation. Our framework combines insights from human workers with the power of combinatorial optimization. The combinatorial solver is also used to guide requests for the workers, and thereby obtain the most useful human feedback quickly. Our approach also incorporates a problem decomposition approach with a general strategy for discarding incorrect human input. We apply this framework in the domain of materials discovery, and demonstrate a speedup of over an order of magnitude.

  • Ronan Le Bras, Carla P. Gomes and Bart Selman. CP 2014.

    According to the Erdős discrepancy conjecture, for any infinite ±1 sequence, there exists a homogeneous arithmetic progression of unbounded discrepancy. In other words, for any ±1 sequence (x_1, x_2, ...) and any discrepancy C, there exist integers m and d such that |∑_{i=1}^{m} x_{i·d}| > C. This is an 80-year-old open problem, and recent developments proved that this conjecture is true for discrepancies up to 2. Paul Erdős also conjectured that this property of unbounded discrepancy holds even for the restricted case of completely multiplicative sequences (CMSs), namely sequences (x_1, x_2, ...) where x_{a·b} = x_a · x_b for any a, b ≥ 1. The longest CMS with discrepancy 2 has been proven to be of size 246. In this paper, we prove that any completely multiplicative sequence of size 127,646 or more has discrepancy at least 4, proving the Erdős discrepancy conjecture for CMSs of discrepancies up to 3. In addition, we prove that this bound is tight and increases the size of the longest known sequence of discrepancy 3 from 17,000 to 127,645. Finally, we provide inductive construction rules as well as streamlining methods to improve the lower bounds for sequences of higher discrepancies.
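
The discrepancy notion above translates directly into code. The sketch below computes the discrepancy of a finite ±1 sequence over all homogeneous arithmetic progressions and checks complete multiplicativity; as a hypothetical example it uses the Liouville function, a classical completely multiplicative ±1 sequence. This is only a checker, not the paper's search or construction method.

```python
def discrepancy(seq):
    """Max over d, m of |x_d + x_{2d} + ... + x_{md}| for a ±1 sequence
    (1-indexed), i.e. over all homogeneous arithmetic progressions."""
    n, best = len(seq), 0
    for d in range(1, n + 1):
        s = 0
        for m in range(1, n // d + 1):
            s += seq[m * d - 1]
            best = max(best, abs(s))
    return best

def is_completely_multiplicative(seq):
    """Check x_{a*b} = x_a * x_b whenever a*b <= len(seq)."""
    n = len(seq)
    return all(seq[a * b - 1] == seq[a - 1] * seq[b - 1]
               for a in range(1, n + 1)
               for b in range(1, n // a + 1))

def liouville(n):
    """(-1) to the number of prime factors of n, counted with multiplicity."""
    count, m, p = 0, n, 2
    while p * p <= m:
        while m % p == 0:
            m //= p
            count += 1
        p += 1
    if m > 1:
        count += 1
    return -1 if count % 2 else 1

seq = [liouville(i) for i in range(1, 101)]
print(is_completely_multiplicative(seq), discrepancy(seq))
```

For a CMS the inner sum factors as x_d · (x_1 + ... + x_m), so its discrepancy is just the largest absolute partial sum of the sequence itself.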

  • Ronan Le Bras, Richard Bernstein, John M. Gregoire, Santosh K. Suram, Carla P. Gomes, Bart Selman and Robert B. van Dover. AAAI 2014.

    Newly-discovered materials have been central to recent technological advances. They have contributed significantly to breakthroughs in electronics, renewable energy and green buildings, and overall, have promoted the advancement of global human welfare. Yet, only a fraction of all possible materials have been explored. Accelerating the pace of discovery of materials would foster technological innovations, and would potentially address pressing issues in sustainability, such as energy production or consumption. The bottleneck of this discovery cycle lies, however, in the analysis of the materials data. As materials scientists have recently devised techniques to efficiently create thousands of materials and experimentalists have developed new methods and tools to characterize these materials, the limiting factor has become the data analysis itself. Hence, the goal of this paper is to stimulate the development of new computational techniques for the analysis of materials data, by bringing together the complementary expertise of materials scientists and computer scientists. In collaboration with two major research laboratories in materials science, we provide the first publicly available dataset for the phase map identification problem. In addition, we provide a parameterized synthetic data generator to assess the quality of proposed approaches, as well as tools for data visualization and solution evaluation.

  • Ronan Le Bras, Richard Bernstein, Carla P. Gomes, Bart Selman and Robert B. van Dover. IJCAI 2013.

    We will show how human computation insights can be key to identifying so-called backdoor variables in combinatorial optimization problems. Backdoor variables can be used to obtain dramatic speed-ups in combinatorial search. Our approach leverages the complementary strengths of human input, based on a visual identification of problem structure, crowdsourcing, and the power of combinatorial solvers to exploit complex constraints. We describe our work in the context of the domain of materials discovery. The motivation for considering the materials discovery domain comes from the fact that new materials can provide solutions for key challenges in sustainability, e.g., in energy, new catalysts for more efficient fuel cell technology.

  • Ronan Le Bras, Carla P. Gomes and Bart Selman. IJCAI 2013.

    We present the first polynomial time construction procedure for generating graceful double-wheel graphs. A graph is graceful if its vertices can be labeled with distinct integer values from {0, ..., e}, where e is the number of edges, such that each edge has a unique value corresponding to the absolute difference of its endpoints. Graceful graphs have a range of practical application domains, including in radio astronomy, X-ray crystallography, cryptography, and experimental design. Various families of graphs have been proven to be graceful, while others have only been conjectured to be. In particular, it has been conjectured that so-called double-wheel graphs are graceful. A double-wheel graph consists of two cycles of N nodes connected to a common hub. We prove this conjecture by providing the first construction for graceful double-wheel graphs, for any N > 3, using a framework that combines streamlined constraint reasoning with insights from human computation. We also use this framework to provide a polynomial time construction for diagonally ordered magic squares.
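
Verifying a graceful labeling is easy even though constructing one is the hard part. The sketch below checks the definition above on a small hypothetical example (a labeled path rather than a double wheel):

```python
def is_graceful(num_vertices, edges, labels):
    """Check a graceful labeling: vertex labels are distinct values in
    {0, ..., e} (e = number of edges), and the induced edge labels
    |label(u) - label(v)| are exactly {1, ..., e}."""
    e = len(edges)
    vals = [labels[v] for v in range(num_vertices)]
    if len(set(vals)) != num_vertices or not all(0 <= x <= e for x in vals):
        return False
    return {abs(labels[u] - labels[v]) for u, v in edges} == set(range(1, e + 1))

# A path on 4 vertices labeled 0 - 3 - 1 - 2 along the path is graceful:
# its edge labels are |0-3| = 3, |3-1| = 2, |1-2| = 1.
path_edges = [(0, 1), (1, 2), (2, 3)]
path_labels = {0: 0, 1: 3, 2: 1, 3: 2}
print(is_graceful(4, path_edges, path_labels))  # True
```

A checker like this is the easy direction; the paper's contribution is a constructive procedure that always produces such a labeling for double wheels.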

  • Tao Zou, Ronan Le Bras, Marcos Vaz Salles, Alan Demers and Johannes Gehrke. VLDB 2013.

    An increasing number of distributed data-driven applications are moving into shared public clouds. By sharing resources and operating at scale, public clouds promise higher utilization and lower costs than private clusters. To achieve high utilization, however, cloud providers inevitably allocate virtual machine instances non-contiguously; i.e., instances of a given application may end up in physically distant machines in the cloud. This allocation strategy can lead to large differences in average latency between instances. For a large class of applications, this difference can result in significant performance degradation, unless care is taken in how application components are mapped to instances. In this paper, we propose ClouDiA, a general deployment advisor that selects application node deployments minimizing either (i) the largest latency between application nodes, or (ii) the longest critical path among all application nodes. ClouDiA employs a number of algorithmic techniques, including mixed-integer programming and constraint programming techniques, to efficiently search the space of possible mappings of application nodes to instances. Through experiments with synthetic and real applications in Amazon EC2, we show that mean latency is a robust metric to model communication cost in these applications and that our search techniques yield a 15–55% reduction in time-to-solution or service response time, without any need for modifying application code.

  • Ronan Le Bras, Bistra Dilkina, Yexiang Xue, Carla P. Gomes, Kevin S. McKelvey, Claire Montgomery and Michael K. Schwartz. AAAI 2013.

    Our work is motivated by an important network design application in computational sustainability concerning wildlife conservation. In the face of human development and climate change, it is important that conservation plans for protecting landscape connectivity exhibit a certain level of robustness. While previous work has focused on conservation strategies that result in a connected network of habitat reserves, the robustness of the proposed solutions has not been taken into account. In order to address this important aspect, we formalize the problem as a node-weighted bi-criteria network design problem with connectivity requirements on the number of disjoint paths between pairs of nodes. While in most previous work on survivable network design the objective is to minimize the cost of the selected network, our goal is to optimize the quality of the selected paths within a specified budget, while meeting the connectivity requirements. We characterize the complexity of the problem under different restrictions. We provide a mixed-integer programming encoding that allows for finding solutions with optimality guarantees, as well as a hybrid local search method with better scaling behavior but no guarantees. We evaluate the typical-case performance of our approaches using a synthetic benchmark, and apply them to a large-scale real-world network design problem concerning the conservation of wolverine and lynx populations in the U.S. Rocky Mountains (Montana).

  • Bistra Dilkina, Katherine Lai, Ronan Le Bras, Yexiang Xue, Carla P. Gomes, Ashish Sabharwal, Jordan Suter, Kevin S. McKelvey, Michael K. Schwartz and Claire Montgomery. AAAI 2013.

    Biodiversity underpins ecosystem goods and services and hence protecting it is key to achieving sustainability. However, the persistence of many species is threatened by habitat loss and fragmentation due to human land use and climate change. Conservation efforts are implemented under very limited economic resources, and therefore designing scalable, cost-efficient and systematic approaches for conservation planning is an important and challenging computational task. In particular, preserving landscape connectivity between good habitat has become a key conservation priority in recent years. We give an overview of landscape connectivity conservation and some of the underlying graph-theoretic optimization problems. We present a synthetic generator capable of creating families of randomized structured problems, capturing the essential features of real-world instances but allowing for a thorough typical-case performance evaluation of different solution methods. We also present two large-scale real-world datasets, including economic data on land cost, and species data for grizzly bears, wolverines and lynx.

  • Marcelo Finger, Ronan Le Bras, Carla P. Gomes and Bart Selman. SAT 2013.

    Practical problems often combine real-world hard constraints with soft constraints involving preferences, uncertainties or flexible requirements. A probability distribution over the models that meet the hard constraints is an answer to such problems that is in the spirit of incorporating soft constraints. We propose a method using SAT-based reasoning, probabilistic reasoning and linear programming that computes such a distribution when soft constraints are interpreted as constraints whose violation is bound by a given probability. The method, called Optimized Probabilistic Satisfiability (oPSAT), consists of a two-phase computation of a probability distribution over the set of valuations of a SAT formula. Algorithms for both phases are presented and their complexity is discussed. We also describe an application of the oPSAT technique to the problem of combinatorial materials discovery.

  • Ronan Le Bras, Carla P. Gomes and Bart Selman. AAAI 2012.

    In recent years, significant progress in the area of search, constraint satisfaction, and automated reasoning has been driven in part by the study of challenge problems from combinatorics and finite algebra. This work has led to the discovery of interesting discrete structures with intricate mathematical properties. While some of those results have resolved open questions and conjectures, a shortcoming is that they generally do not provide further mathematical insights, from which one could derive more general observations. We propose an approach that integrates specialized combinatorial search, using so-called streamlining, with a human computation component. We use this approach to discover efficient constructive procedures for generating certain classes of combinatorial objects of any size. More specifically, using our framework, we discovered two complementary efficient constructions for generating so-called spatially balanced Latin squares (SBLSs) of any order N such that 2N+1 is prime. Previously, no constructions for SBLSs were known. Our approach also enabled us to derive a new lower bound for so-called weak Schur numbers, improving on a series of earlier results for Schur numbers.
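
One common way to quantify spatial balance (used here as an illustrative assumption; the paper's precise metric may differ) is the total column distance between each pair of symbols, summed across all rows. A square is perfectly balanced when this total is the same for every pair:

```python
from itertools import combinations

def pair_distances(square):
    """Total column distance between each unordered pair of symbols,
    summed over the rows of a Latin square."""
    dist = {}
    for row in square:
        pos = {sym: j for j, sym in enumerate(row)}
        for a, b in combinations(sorted(pos), 2):
            dist[(a, b)] = dist.get((a, b), 0) + abs(pos[a] - pos[b])
    return dist

def imbalance(square):
    """Spread (max - min) of the pair distances; 0 means perfect balance."""
    d = list(pair_distances(square).values())
    return max(d) - min(d)

# The back-circulant square of order 3: every symbol pair totals distance 4.
square = [[1, 2, 3],
          [2, 3, 1],
          [3, 1, 2]]
print(pair_distances(square), imbalance(square))
```

A measure like this turns the search for balanced squares into an optimization problem, which is where streamlined constraint reasoning comes in.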

  • Ronan Le Bras, Stefano Ermon, Theodoros Damoulas, Richard Bernstein, Carla P. Gomes, Bart Selman and Robert B. van Dover. CompSust 2012.

    Combinatorial materials science involves the rapid, high-throughput synthesis, measurement, and analysis of a large number of different but structurally related materials. In combinatorial materials discovery, materials scientists search for intermetallic compounds with desirable physical properties by obtaining measurements on hundreds of samples from a composition spread. Determining the structure of the materials formed in a composition spread is key to understanding composition and property relations and can potentially result in a breakthrough materials discovery. This is an important and exciting direction in the emerging field of computational sustainability [4], as it aims to achieve the best possible use of our available material resources. One ultimate objective is to help discover the next-generation materials for fuel-cell catalysis, as such materials have the potential of dramatically increasing fuel cell capacity while reducing their cost. The analysis of composition spreads remains, however, a manual and laborious task, hence the need for new techniques to automatically analyze and interpret such data. Whereas the data-intensive aspect of the area of materials discovery seems to favor data-mining or machine-learning techniques, the rigorous and highly structured physical properties that govern the crystallization on the composition spread interestingly suggest that constraint reasoning is key to a physically meaningful analysis. In this paper, we describe two novel approaches to this problem that integrate domain-specific scientific background knowledge about the physical and chemical properties of the materials. Our first approach combines constraint programming (CP) and machine learning (ML), while the second is based on satisfiability modulo theories (SMT). We evaluate the performance of our methods on realistic synthetic measurements, and we show that they provide accurate and physically meaningful interpretations of the data, even in the presence of artificially added noise.

  • Stefano Ermon, Ronan Le Bras, Carla P. Gomes, Bart Selman and Robert B. van Dover. SAT 2012.

    In combinatorial materials discovery, one searches for new materials with desirable properties by obtaining measurements on hundreds of samples in a single high-throughput batch experiment. As manual data analysis is becoming more and more impractical, there is a growing need to develop new techniques to automatically analyze and interpret such data. We describe a novel approach to the phase map identification problem where we integrate domain-specific scientific background knowledge about the physical and chemical properties of the materials into an SMT reasoning framework. We evaluate the performance of our method on realistic synthetic measurements, and we show that it provides accurate and physically meaningful interpretations of the data, even in the presence of artificially added noise.

  • Ronan Le Bras, Theodoros Damoulas, John M. Gregoire, Ashish Sabharwal, Carla P. Gomes and Robert B. van Dover. CP 2011.

    Motivated by an important and challenging task encountered in materials discovery, we consider the problem of finding K basis patterns of numbers that jointly compose N observed patterns while enforcing additional spatial and scaling constraints. We propose a Constraint Programming (CP) model which captures the exact problem structure yet fails to scale in the presence of noisy data about the patterns. We alleviate this issue by employing Machine Learning (ML) techniques, namely kernel methods and clustering, to decompose the problem into smaller ones based on a global data-driven view, and then stitch the partial solutions together using a global CP model. Combining the complementary strengths of CP and ML techniques yields a more accurate and scalable method than the few found in the literature for this complex problem.

  • Ronan Le Bras, Alessandro Zanarini and Gilles Pesant. CP 2009.

    Accurately estimating the distribution of solutions to a problem, should such solutions exist, provides efficient search heuristics. The purpose of this paper is to propose new ways of computing such estimates, with different degrees of accuracy and complexity. We build on the Expectation-Maximization Belief-Propagation (EMBP) framework proposed by Hsu et al. to solve Constraint Satisfaction Problems (CSPs). We propose two general approaches within the EMBP framework: we first derive update rules at the constraint level while enforcing domain consistency, and then derive update rules globally, at the problem level. The contribution of this paper is twofold: first, we derive new generic update rules suited to tackle any CSP; second, we propose an efficient EMBP-inspired approach, thereby improving this method and making it competitive with the state of the art. We evaluate these approaches experimentally and demonstrate their effectiveness.