Publications

Representing knowledge and querying data using double-functorial semantics, 2024. Michael Lambert, Evan Patterson.

Abstract

Category theory offers a mathematical foundation for knowledge representation and database systems. Popular existing approaches model a database instance as a functor into the category of sets and functions, or as a 2-functor into the 2-category of sets, relations, and implications. The functional and relational models are unified by double functors into the double category of sets, functions, relations, and implications. In an accessible, example-driven style, we show that the abstract structure of a 'double category of relations' is a flexible and expressive language in which to represent knowledge, and we show how queries on data in the spirit of Codd's relational algebra are captured by double-functorial semantics.

We argue that double categories of relations are a unifying language for knowledge representation and databases, combining Spivak and Kent’s functional ologs with my relational ologs. The paper is example-driven and intended to be accessible without detailed knowledge of double category theory.

We presented this paper at Applied Category Theory 2024 (video).
Transposing cartesian and other structure in double categories, 2024. Evan Patterson.

Abstract

The cartesian structure possessed by relations, spans, profunctors, and other such morphisms is elegantly expressed by universal properties in double categories. Though cartesian double categories were inspired in part by the older program of cartesian bicategories, the precise relationship between the double-categorical and bicategorical approaches has so far remained mysterious, except in special cases. We provide a formal connection by showing that every double category with iso-strong finite products, and in particular every cartesian equipment, has an underlying cartesian bicategory. To do so, we develop broadly applicable techniques for transposing natural transformations and adjunctions between double categories, extending a line of previous work rooted in the concepts of companions and conjoints.

I show that cartesian bicategories, in the sense of Carboni-Walters (I, II), can be systematically constructed from double categories with finite products, in the sense of Paré and my previous work. To do so, I introduce a general technique for transposing adjunctions in a double category.
GATlab: Modeling and programming with generalized algebraic theories, 2024. Owen Lynch, Kris Brown, James Fairbanks, Evan Patterson.
Abstract

Categories and categorical structures are increasingly recognized as useful abstractions for modeling in science and engineering. To uniformly implement category-theoretic mathematical models in software, we introduce GATlab, a domain-specific language for algebraic specification embedded in a technical programming language. GATlab is based on generalized algebraic theories (GATs), a logical system extending algebraic theories with dependent types so as to encompass category theory. Using GATlab, the programmer can specify generalized algebraic theories and their models, including both free models, based on symbolic expressions, and computational models, defined by arbitrary code in the host language. Moreover, the programmer can define maps between theories and use them to declaratively migrate models of one theory to models of another. In short, GATlab aims to provide a unified environment for both computer algebra and software interface design with generalized algebraic theories. In this paper, we describe the design, implementation, and applications of GATlab.
BibTeX
```
@inproceedings{2024-gatlab,
  title={{GATlab}: Modeling and programming with generalized algebraic theories},
  author={Lynch, Owen and Brown, Kris and Fairbanks, James and Patterson, Evan},
  booktitle={Proceedings of the Fortieth Conference on the Mathematical Foundations of Programming Semantics (MFPS 2024)},
  year={2024},
  doi={10.46298/entics.14666},
  eprinttype={arXiv},
  eprint={2404.04837}
}
```
Extracted from Catlab.jl, rewritten from scratch, and published for the first time here, GATlab.jl is a domain-specific language for algebraic specification embedded in the Julia programming language. GATlab is based on generalized algebraic theories (GATs), a logical system extending algebraic theories with dependent types.

We presented GATlab at Mathematical Foundations of Programming Semantics 2024 and also at JuliaCon 2024 (video).
A compositional framework for first-order optimization, 2024. Tyler Hanks, Matthew Klawonn, Evan Patterson, Matthew Hale, James Fairbanks.

Abstract

Optimization decomposition methods are a fundamental tool to develop distributed solution algorithms for large scale optimization problems arising in fields such as machine learning and optimal control. In this paper, we present an algebraic framework for hierarchically composing optimization problems defined on hypergraphs and automatically generating distributed solution algorithms that respect the given hierarchical structure. The central abstractions of our framework are operads, operad algebras, and algebra morphisms, which formalize notions of syntax, semantics, and structure preserving semantic transformations respectively. These abstractions allow us to formally relate composite optimization problems to the distributed algorithms that solve them. Specifically, we show that certain classes of optimization problems form operad algebras, and a collection of first-order solution methods, namely gradient descent, Uzawa's algorithm (also called gradient ascent-descent), and their subgradient variants, yield algebra morphisms from these problem algebras to algebras of dynamical systems. Primal and dual decomposition methods are then recovered by applying these morphisms to certain classes of composite problems. Using this framework, we also derive a novel sufficient condition for when a problem defined by compositional data is solvable by a decomposition method. We show that the minimum cost network flow problem satisfies this condition, thereby allowing us to automatically derive a hierarchical dual decomposition algorithm for finding minimum cost flows on composite flow networks. We implement our operads, algebras, and algebra morphisms in a Julia package and use our implementation to empirically demonstrate that hierarchical dual decomposition outperforms standard dual decomposition on classes of flow networks with hierarchical structure.

We explain how to compose optimization problems by making them into an algebra of an operad. We then show that several first-order optimization algorithms, starting with gradient descent, are morphisms between operad algebras. We implement these methods in the Julia package AlgebraicOptimization.jl.
Decapodes: a diagrammatic tool for representing, composing, and computing spatialized partial differential equations, 2024. Luke Morris, Andrew Baas, Jesus Arias, Maia Gatlin, Evan Patterson, James Fairbanks.
Abstract

We present Decapodes, a diagrammatic tool for representing, composing, and solving partial differential equations. Decapodes provides an intuitive diagrammatic representation of the relationships between variables in a system of equations, a method for composing systems of partial differential equations using an operad of wiring diagrams, and an algorithm for deriving solvers using hypergraphs and string diagrams. The string diagrams are in turn compiled into executable programs using the techniques of categorical data migration, graph traversal, and the discrete exterior calculus. The generated solvers produce numerical solutions consistent with state-of-the-art open source tools as demonstrated by benchmark comparisons with SU2. These numerical experiments demonstrate the feasibility of this approach to multiphysics simulation and identify areas requiring further development.
BibTeX
```
@article{morris2024,
  title={Decapodes: A diagrammatic tool for representing, composing, and computing spatialized partial differential equations},
  author={Morris, Luke and Baas, Andrew and Arias, Jesus and Gatlin, Maia and Patterson, Evan and Fairbanks, James},
  journal={Journal of Computational Science},
  volume={81},
  number={102345},
  year={2024},
  month={9},
  doi={10.1016/j.jocs.2024.102345},
  eprinttype={arXiv},
  eprint={2401.17432}
}
```
Decapodes.jl is a Julia package to modularly specify and then compile and simulate systems of partial differential equations. It is based on the discrete exterior calculus (DEC), implemented in our package CombinatorialSpaces.jl, and on diagrammatic differential equations, implemented in DiagrammaticEquations.jl.
The diagrammatic presentation of equations in categories, 2024. Kevin Arlin, James Fairbanks, Tim Hosgood, Evan Patterson.

Abstract

Lifts of categorical diagrams D: J→X against discrete opfibrations π: E→X can be interpreted as presenting solutions to systems of equations. With this interpretation in mind, it is natural to ask if there is a notion of equivalence of diagrams that precisely captures the idea of the two diagrams "having the same solutions."' We give such a definition, and then show how the localisation of the category of diagrams in X along such equivalences is isomorphic to the localisation of the slice category Cat/X along the class of initial functors. Finally, we extend this result to the 2-categorical setting, proving the analogous statement for any locally presentable 2-category in place of Cat.

Following up previous work on diagrammatic differential equations, we characterize when two diagrams presenting systems of equations “have the same solutions,” first, for diagrams in a category and then, replacing the 2-category of categories with any locally presentable 2-category.
Products in double categories, revisited, 2024. Evan Patterson.

Abstract

Products in double categories, as found in cartesian double categories, are an elegant concept with numerous applications, yet also have a few puzzling aspects. In this paper, we revisit double-categorical products from an unbiased perspective, following up an original idea by Paré to employ a double-categorical analogue of the family construction, or free product completion. Defined in this way, double categories with finite products are strictly more expressive than cartesian double categories, while being governed by a single universal property that is no more difficult to work with. We develop the basic theory and examples of such products and, by duality, of coproducts in double categories. As an application, we introduce finite-product double theories, a categorification of finite-product theories that extends recent work by Lambert and the author on cartesian double theories, and we construct the virtual double category of models of a finite-product double theory.

I revisit products in double categories from an unbiased perspective, following up Paré’s idea to use a double-categorical version of the family construction, or free product completion. As an application, I introduce finite-product double theories, refining the cartesian double theories studied previously.
Cartesian double theories: A double-categorical framework for categorical doctrines, 2023. Michael Lambert, Evan Patterson.
Abstract

The categorified theories known as "doctrines" specify a category equipped with extra structure, analogous to how ordinary theories specify a set with extra structure. We introduce a new framework for doctrines based on double category theory. A cartesian double theory is defined to be a small double category with finite products and a model of a cartesian double theory to be a finite product-preserving lax functor out of it. Many familiar categorical structures are models of cartesian double theories, including categories, presheaves, monoidal categories, braided and symmetric monoidal categories, 2-groups, multicategories, and cartesian and cocartesian categories. We show that every cartesian double theory has a unital virtual double category of models, with lax maps between models given by cartesian lax natural transformations, bimodules between models given by cartesian modules, and multicells given by multimodulations. In many cases, the virtual double category of models is representable, hence is a genuine double category. Moreover, when restricted to pseudo maps, every cartesian double theory has a virtual equipment of models, hence an equipment of models in the representable case. Compared with 2-monads, double theories have the advantage of being straightforwardly presentable by generators and relations, as we illustrate through a large number of examples.
BibTeX
```
@article{lambert2024,
  title={Cartesian double theories: A double-categorical framework for categorical doctrines},
  author={Lambert, Michael and Patterson, Evan},
  journal={Advances in Mathematics},
  volume={444},
  pages={109630},
  year={2024},
  doi={10.1016/j.aim.2024.109630},
  eprinttype={arXiv},
  eprint={2310.05384}
}
```
We introduce a new framework for doctrines based on double-categorical functorial semantics, a categorification of Lawvere’s functorial semantics. For a short introduction, see my blog post. Much of the mathematics behind CatColab is derived from this work.

I talked about this project at the Topos Institute Berkeley Seminar (slides, video) and at the Virtual Double Categories Workshop (slides, video).
A categorical representation language and computational system for knowledge-based planning, 2023. Angeline Aguinaldo, Evan Patterson, James Fairbanks, William Regli and Jaime Ruiz.
Abstract

Classical planning representation languages based on first-order logic have been extensively used to model and solve planning problems, but they struggle to capture implicit preconditions and effects that arise in complex planning scenarios. To address this problem, we propose an alternative approach to representing and transforming world states during planning. Based on the category-theoretic concepts of C-sets and double-pushout rewriting (DPO), our proposed representation can effectively handle structured knowledge about world states that support domain abstractions at all levels. It formalizes the semantics of predicates according to a user-provided ontology and preserves the semantics when transitioning between world states. This method provides a formal semantics for using knowledge graphs and relational databases to model world states and updates in planning. In this paper, we compare our category-theoretic representation with the classical planning representation. We show that our proposed representation has advantages over the classical representation in terms of handling implicit preconditions and effects, and provides a more structured framework in which to model and solve planning problems.
BibTeX
```
@inproceedings{2023-robot-task-planning,
  title={A categorical representation language and computational system for knowledge-based planning},
  author={Aguinaldo, Angeline and Patterson, Evan and Fairbanks, James and Regli, William and Ruiz, Jaime},
  booktitle={2023 {AAAI} Fall Symposium on Unifying Representations for Robot Application Development},
  year={2023},
  eprinttype={arXiv},
  eprint={2305.17208}
}
```
Using categorical databases and span rewriting, we propose and implement a representation language for robot task planning, and we compare it with classical representations such as PDDL.

We presented this paper at the 2023 AAAI Fall Symposium on Unifying Representations for Robot Application Development, where it received the Best Paper Award.
Structured and decorated cospans from the viewpoint of double category theory, 2023. Evan Patterson.
Abstract

Structured and decorated cospans are broadly applicable frameworks for building bicategories or double categories of open systems. We streamline and generalize these frameworks using central concepts of double category theory. We show that, under mild hypotheses, double categories of structured cospans are cocartesian (have finite double-categorical coproducts) and are equipments. The proofs are simple as they utilize appropriate double-categorical universal properties. Maps between double categories of structured cospans are studied from the same perspective. We then give a new construction of the double category of decorated cospans using the recently introduced double Grothendieck construction. Besides its conceptual value, this reconstruction leads to a natural generalization of decorated cospans, which we illustrate through an example motivated by statistical theories and other theories of processes.
BibTeX
```
@inproceedings{patterson2023,
  title={Structured and decorated cospans from the viewpoint of double category theory},
  author={Patterson, Evan},
  booktitle={Proceedings of the Sixth International Conference on Applied Category Theory (ACT 2023)},
  year={2023},
  doi={10.4204/EPTCS.397.13},
  eprinttype={arXiv},
  eprint={2304.00447}
}
```
I argue that open systems, as modeled by structured and decorated cospans, should be understood using concepts from double category theory. Structured cospans are shown to form a cocartesian equipment and decorated cospans to arise from the double Grothendieck construction.

I presented this paper at Applied Category Theory 2023 (slides). It is based on a series of blog posts (1, 2, 3).
Modeling model predictive control: A category theoretic framework for multistage control problems, 2023. Tyler Hanks, Baike She, Matthew Hale, Evan Patterson, Matthew Klawonn, James Fairbanks.
Abstract

Model predictive control (MPC) is an optimal control technique which involves solving a sequence of constrained optimization problems across a given time horizon. In this paper we show that the structure of problems in this sequence is elegantly captured via a novel category-theoretic viewpoint. Specifically, we develop a category - called Para(Conv) - whose objects are Euclidean spaces and whose morphisms represent constrained convex optimization problems. We go on to show that convex MPC optimization problems result from compositions of simpler subproblems in Para(Conv), and we also show that Para(Conv) can be used to model both initial and terminal state constraints, which often arise in practice. We present a novel Julia library that leverages our theoretical results to automate the implementation of correct-by-construction MPC problems in software. Specifically we provide code snippets showing the ease with which MPC problems can be implemented and solved using our library.
BibTeX
```
@inproceedings{hanks2024,
  title={Modeling model predictive control: A category theoretic framework for multistage control problems},
  author={Hanks, Tyler and She, Baike and Patterson, Evan and Hale, Matthew and Klawonn, Matthew and Fairbanks, James},
  booktitle={2024 American Control Conference (ACC)},
  pages={4850--4857},
  year={2024},
  organization={IEEE},
  doi={10.23919/ACC60939.2024.10644848},
  eprinttype={arXiv},
  eprint={2305.03820}
}
```
Inspired by Rockafellar’s concept of a convex bifunction, we propose a compositional framework for model predictive control (MPC) and we implement it in the Julia package AlgebraicControl.jl.
A compositional account of motifs, mechanisms, and dynamics in biochemical regulatory networks, 2023. Rebekah Aduddell, James Fairbanks, Amit Kumar, Pablo S. Ocal, Evan Patterson, Brandon T. Shapiro.
Abstract

Regulatory networks depict promoting or inhibiting interactions between molecules in a biochemical system. We introduce a category-theoretic formalism for regulatory networks, using signed graphs to model the networks and signed functors to describe occurrences of one network in another, especially occurrences of network motifs. With this foundation, we establish functorial mappings between regulatory networks and other mathematical models in biochemistry. We construct a functor from reaction networks, modeled as Petri nets with signed links, to regulatory networks, enabling us to precisely define when a reaction network could be a physical mechanism underlying a regulatory network. Turning to quantitative models, we associate a regulatory network with a Lotka-Volterra system of differential equations, defining a functor from the category of signed graphs to a category of parameterized dynamical systems. We extend this result from closed to open systems, demonstrating that Lotka-Volterra dynamics respects not only inclusions and collapsings of regulatory networks, but also the process of building up complex regulatory networks by gluing together simpler pieces. Formally, we use the theory of structured cospans to produce a lax double functor from the double category of open signed graphs to that of open parameterized dynamical systems. Throughout the paper, we ground the categorical formalism in examples inspired by systems biology.
BibTeX
```
@article{aduddell2024,
  title={A compositional account of motifs, mechanisms, and dynamics in biochemical regulatory networks},
  author={Aduddell, Rebekah and Fairbanks, James and Kumar, Amit and Ocal, Pablo S. and Patterson, Evan and Shapiro, Brandon T.},
  journal={Compositionality},
  volume={6},
  number={2},
  year={2024},
  doi={10.32408/compositionality-6-2},
  eprinttype={arXiv},
  eprint={2301.01445}
}
```
Inspired by the compositional philosophy of systems biology, we introduce a category-theoretic formalism for biochemical regulatory networks, using signed graphs to model the networks and signed functors to capture occurrences of motifs in networks. We then give a dynamical interpretation of regulatory networks based on the Lotka-Volterra equations. Most of the paper is now implemented in CatColab.
Compositional modeling with stock and flow diagrams, 2022. John Baez, Xiaoyan Li, Sophie Libkind, Nathaniel Osgood, Evan Patterson.
Abstract

Stock and flow diagrams are widely used in epidemiology to model the dynamics of populations. Although tools already exist for building these diagrams and simulating the systems they describe, we have created a new package called StockFlow, part of the AlgebraicJulia ecosystem, which uses ideas from category theory to overcome notable limitations of existing software. Compositionality is provided by the theory of decorated cospans: stock and flow diagrams can composed to form larger ones in an intuitive way formalized by the operad of undirected wiring diagrams. Our approach also cleanly separates the syntax of stock and flow diagrams from the semantics they can be assigned. We consider semantics in ordinary differential equations, although others are possible. As an example, we explain code in StockFlow that implements a simplified version of a COVID-19 model used in Canada.
BibTeX
```
@inproceedings{2022-stock-flow,
  title={Compositional modeling with stock and flow diagrams},
  author={Baez, John and Li, Xiaoyan and Libkind, Sophie and Osgood, Nathaniel and Patterson, Evan},
  booktitle={Proceedings of the Fifth International Conference on Applied Category Theory (ACT 2022)},
  pages={77--96},
  year={2022},
  doi={10.4204/EPTCS.380.5},
  eprinttype={arXiv},
  eprint={2205.08373}
}
```
Stock and flow diagrams, originating in the field of system dynamics, are also used in epidemiology to specify compartmental models. We give a category-theoretic account of stock and flow diagrams and implement it in the package StockFlow.jl. This paper complements our previous work on Petri nets in epidemiology modeling.

We presented this paper at Applied Category Theory 2022 (slides, video).
A diagrammatic view of differential equations in physics, 2022. Evan Patterson, Andrew Baas, Timothy Hosgood, James Fairbanks.
Abstract

Presenting systems of differential equations in the form of diagrams has become common in certain parts of physics, especially electromagnetism and computational physics. In this work, we aim to put such use of diagrams on a firm mathematical footing, while also systematizing a broadly applicable framework to reason formally about systems of equations and their solutions. Our main mathematical tools are category-theoretic diagrams, which are well known, and morphisms between diagrams, which have been less appreciated. As an application of the diagrammatic framework, we show how complex, multiphysical systems can be modularly constructed from basic physical principles. A wealth of examples, drawn from electromagnetism, transport phenomena, fluid mechanics, and other fields, is included.
BibTeX
```
@article{2022-diagrammatic-diff-eqs,
  title={A diagrammatic view of differential equations in physics},
  author={Patterson, Evan and Baas, Andrew and Hosgood, Timothy and Fairbanks, James},
  journal={Mathematics in Engineering},
  volume={5},
  number={2},
  pages={1-59},
  year={2023},
  doi={10.3934/mine.2023036},
  eprinttype={arXiv},
  eprint={2204.01843}
}
```
We formalize and systematize Tonti diagrams, a graphical language to present partial differential equations in physics. More generally, we explain how to reason formally about systems of equations and their solutions using category-theoretic diagrams. For an overview of this long paper, see our series of blog posts (part 1, part 2). The framework is implemented in the Julia package DiagrammaticEquations.jl.
An algebraic framework for structured epidemic modelling, 2022. Sophie Libkind, Andrew Baas, Micah Halter, Evan Patterson, James Fairbanks.
Abstract

Pandemic management requires that scientists rapidly formulate and analyse epidemiological models in order to forecast the spread of disease and the effects of mitigation strategies. Scientists must modify existing models and create novel ones in light of new biological data and policy changes such as social distancing and vaccination. Traditional scientific modelling workflows detach the structure of a model—its submodels and their interactions—from its implementation in software. Consequently, incorporating local changes to model components may require global edits to the code base through a manual, time-intensive and error-prone process. We propose a compositional modelling framework that uses high-level algebraic structures to capture domain-specific scientific knowledge and bridge the gap between how scientists think about models and the code that implements them. These algebraic structures, grounded in applied category theory, simplify and expedite modelling tasks such as model specification, stratification, analysis and calibration. With their structure made explicit, models also become easier to communicate, criticize and refine in light of stakeholder feedback.
BibTeX
```
@article{2023-structured-epi-modeling,
  title={An algebraic framework for structured epidemic modelling},
  author={Libkind, Sophie and Baas, Andrew and Halter, Micah and Patterson, Evan and Fairbanks, James P.},
  journal={Philosophical Transactions of the Royal Society A},
  volume={380},
  number={2233},
  pages={20210309},
  year={2022},
  doi={10.1098/rsta.2021.0309},
  eprinttype={arXiv},
  eprint={2203.16345}
}
```
We introduce an algebraic framework for building compartmental models in epidemiology, with high-level operations for hierarchically composing and stratifying models. The framework is implemented in the packages AlgebraicPetri.jl and AlgebraicDynamics.jl, and the paper’s examples are reproduced in Jupyter notebooks.
Computational category-theoretic rewriting, 2022. Kristopher Brown, Evan Patterson, Tyler Hanks, James Fairbanks.
Abstract

We demonstrate how category theory provides specifications that can efficiently be implemented via imperative algorithms and apply this to the field of graph transformation. By examples, we show how this paradigm of software development makes it easy to quickly write correct and performant code. We provide a modern implementation of graph rewriting techniques at the level of abstraction of finitely-presented C-sets and clarify the connections between C-sets and the typed graphs supported in existing rewriting software. We emphasize that our open-source library is extensible: by taking new categorical constructions (such as slice categories, structured cospans, and distributed graphs) and relating their limits and colimits to those of their underlying categories, users inherit efficient algorithms for pushout complements and (final) pullback complements. This allows one to perform double-, single-, and sesqui-pushout rewriting over a broad class of data structures. Graph transformation researchers, scientists, and engineers can then use this library to computationally manipulate rewriting systems and apply them to their domains of interest.
BibTeX
```
@article{2023-acset-rewriting,
  title={Computational category-theoretic rewriting},
  author={Brown, Kristopher and Patterson, Evan and Hanks, Tyler and Fairbanks, James},
  journal={Journal of Logical and Algebraic Methods in Programming},
  volume={134},
  year={2023},
  sortyear={2303},
  doi={10.1016/j.jlamp.2023.100888},
  eprinttype={arXiv},
  eprint={2111.03784}
}
```
We explain how to rewrite categorical data structures using span rewriting techniques, including double pushout (DPO), single pushout (SPO), and sesqui-pushout rewriting. We implement these methods in the Julia package AlgebraicRewriting.jl.

We presented the conference version of this paper at at 2022 International Conference on Graph Transformation.
Categorical data structures for technical computing, 2021. Evan Patterson, Owen Lynch, James Fairbanks.
Abstract

Many mathematical objects can be represented as functors from finitely-presented categories C to Set. For instance, graphs are functors to Set from the category with two parallel arrows. Such functors are known informally as C-sets. In this paper, we describe and implement an extension of C-sets having data attributes with fixed types, such as graphs with labeled vertices or real-valued edge weights. We call such structures "acsets," short for "attributed C-sets." Derived from previous work on algebraic databases, acsets are a joint generalization of graphs and data frames. They also encompass more elaborate graph-like objects such as wiring diagrams and Petri nets with rate constants. We develop the mathematical theory of acsets and then describe a generic implementation in the Julia programming language, which uses advanced language features to achieve performance comparable with specialized data structures.
BibTeX
```
@article{patterson2022,
  title={Categorical data structures for technical computing},
  author={Patterson, Evan and Lynch, Owen and Fairbanks, James},
  journal={Compositionality},
  volume={4},
  number={5},
  year={2022},
  eprinttype={arXiv},
  eprint={2106.04703},
  doi={10.32408/compositionality-4-5}
}
```
We argue that attributed C-sets, or “acsets” for short, are a fundamental data structure for technical computing. Acsets generalize both data frames and graphs, as well as more elaborate structures such as wiring diagrams and Petri nets. We provide a performant implementation of acsets in the Julia package ACSets.jl, with additional features available in Catlab.jl.
Operadic modeling of dynamical systems: Mathematics and computation, 2021. Sophie Libkind, Andrew Baas, Evan Patterson, James Fairbanks.
Abstract

Dynamical systems are ubiquitous in science and engineering as models of phenomena that evolve over time. Although complex dynamical systems tend to have important modular structure, conventional modeling approaches suppress this structure. Building on recent work in applied category theory, we show how deterministic dynamical systems, discrete and continuous, can be composed in a hierarchical style. In mathematical terms, we reformulate some existing operads of wiring diagrams and introduce new ones, using the general formalism of C-sets (copresheaves). We then establish dynamical systems as algebras of these operads. In a computational vein, we show that Euler's method is functorial for undirected systems, extending a previous result for directed systems. All the ideas in this paper are implemented as practical software using Catlab and the AlgebraicJulia ecosystem, written in the Julia programming language for scientific computing.
BibTeX
```
@inproceedings{2021-algdynamics,
  title={Operadic modeling of dynamical systems: Mathematics and computation},
  author={Libkind, Sophie and Baas, Andrew and Patterson, Evan and Fairbanks, James},
  booktitle={Proceedings of the Fourth International Conference on Applied Category Theory (ACT 2021)},
  pages={192--206},
  year={2021},
  doi={10.4204/EPTCS.372.14},
  eprinttype={arXiv},
  eprint={2105.12282}
}
```
Using the mathematics of operads and operad algebras, we show how to hierarchically compose deterministic dynamical systems, both discrete and continuous. The method is implemented in the Julia package AlgebraicDynamics.jl.

We presented this paper as a distinguished talk at Applied Category Theory 2021 (slides, video).
Wiring diagrams as normal forms for computing in symmetric monoidal categories, 2020. Evan Patterson, David I. Spivak, Dmitry Vagner.
Abstract

Applications of category theory often involve symmetric monoidal categories (SMCs), in which abstract processes or operations can be composed in series and parallel. However, in 2020 there remains a dearth of computational tools for working with SMCs. We present an "unbiased" approach to implementing symmetric monoidal categories, based on an operad of directed, acyclic wiring diagrams. Because the interchange law and other laws of a SMC hold identically in a wiring diagram, no rewrite rules are needed to compare diagrams. We discuss the mathematics of the operad of wiring diagrams, as well as its implementation in the software package Catlab.
BibTeX
```
@inproceedings{2020-wd-normal-form,
  title={Wiring diagrams as normal forms for computing in symmetric monoidal categories},
  author={Patterson, Evan and Spivak, David I. and Vagner, Dmitry},
  booktitle={Proceedings of the Third International Applied Category Theory Conference (ACT 2020)},
  pages={49--64},
  year={2020},
  doi={10.4204/EPTCS.333.4},
  eprinttype={arXiv},
  eprint={2101.12046}
}
```
We introduce a formalism for directed, acyclic wiring diagrams and explain how it leads to a normal form for morphism expressions in a symmetric monoidal category. We also discuss how directed wiring diagrams are implemented in the Julia package Catlab.jl.

We presented this paper at Applied Category Theory 2020 (video).
The algebra and machine representation of statistical models, 2020. Evan Patterson.
Abstract

As the twin movements of open science and open source bring an ever greater share of the scientific process into the digital realm, new opportunities arise for the meta-scientific study of science itself, including of data science and statistics. Future science will likely see machines play an active role in processing, organizing, and perhaps even creating scientific knowledge. To make this possible, large engineering efforts must be undertaken to transform scientific artifacts into useful computational resources, and conceptual advances must be made in the organization of scientific theories, models, experiments, and data. This dissertation takes steps toward digitizing and systematizing two major artifacts of data science, statistical models and data analyses. Using tools from algebra, particularly categorical logic, a precise analogy is drawn between models in statistics and logic, enabling statistical models to be seen as models of theories, in the logical sense. Statistical theories, being algebraic structures, are amenable to machine representation and are equipped with morphisms that formalize the relations between different statistical methods. Turning from mathematics to engineering, a software system for creating machine representations of data analyses, in the form of Python or R programs, is designed and implemented. The representations aim to capture the semantics of data analyses, independent of the programming language and libraries in which they are implemented.
BibTeX
```
@phdthesis{patterson2020,
  title={The algebra and machine representation of statistical models},
  author={Patterson, Evan},
  school={Stanford University, Department of Statistics},
  eprinttype={arXiv},
  eprint={2006.08945},
  year={2020}
}
```
In my PhD thesis, I draw a precise analogy between models in logic and statistics, enabling statistical models to be seen as models of statistical theories, in the logical sense. This is accomplished using tools from algebra, particularly categorical logic. I develop the algebra of statistical theories and models and present a wide range of examples from classical statistics. This work was not previously published. In addition, I recapitulate and extend previous work on the semantic modeling of data science code.

I presented the material about statistical theories and models at the MIT Categories Seminar (slides) and the 2020 Workshop on Categorical Probability and Statistics (slides).
Hausdorff and Wasserstein metrics on graphs and other structured data, 2019. Evan Patterson.
Abstract

Optimal transport is widely used in pure and applied mathematics to find probabilistic solutions to hard combinatorial matching problems. We extend the Wasserstein metric and other elements of optimal transport from the matching of sets to the matching of graphs and other structured data. This structure-preserving form of optimal transport relaxes the usual notion of homomorphism between structures. It applies to graphs, directed and undirected, labeled and unlabeled, and to any other structure that can be realized as a C-set for some finitely presented category C. We construct both Hausdorff-style and Wasserstein-style metrics on C-sets and we show that the latter are convex relaxations of the former. Like the classical Wasserstein metric, the Wasserstein metric on C-sets is the value of a linear program and is therefore efficiently computable.
BibTeX
```
@article{patterson2020,
  title={{Hausdorff} and {Wasserstein} metrics on graphs and other structured data},
  author={Patterson, Evan},
  journal={Information and Inference: A Journal of the {IMA}},
  volume={iaaa025},
  year={2020},
  eprinttype={arXiv},
  eprint={1907.00257},
  doi={10.1093/imaiai/iaaa025}
}
```
I extend ideas from optimal transport to structured data by probabilistically relaxing the notion of a structure-preserving map, or homomorphism. As an application, I construct Wasserstein-style metrics on graphs, simplicial sets, and other combinatorial structures.

I talked about this project at the 2019 AMS Special Session on Applied Category Theory (slides).
String diagrams for assembly planning, 2019. Jade Master, Evan Patterson, Shahin Yousfi, Arquimedes Canedo.
Abstract

Assembly planning is a difficult problem for companies. Many disciplines such as design, planning, scheduling, and manufacturing execution need to be carefully engineered and coordinated to create successful product assembly plans. Recent research in the field of design for assembly has proposed new methodologies to design product structures in such a way that their assembly is easier. However, present assembly planning approaches lack the engineering tool support to capture all the constraints associated to assembly planning in a unified manner. This paper proposes CompositionalPlanning, a string diagram based framework for assembly planning. In the proposed framework, string diagrams and their compositional properties serve as the foundation for an engineering tool where CAD designs interact with planning and scheduling algorithms to automatically create high-quality assembly plans. These assembly plans are then executed in simulation to measure their performance and to visualize their key build characteristics. We demonstrate the versatility of this approach in the LEGO assembly domain. We developed two reference LEGO CAD models that are processed by CompositionalPlanning's algorithmic pipeline. We compare sequential and parallel assembly plans in a Minecraft simulation and show that the time-to-build performance can be optimized by our algorithms.
BibTeX
```
@inproceedings{2019-compositional-planning,
  title={String diagrams for assembly planning},
  author={Master, Jade and Patterson, Evan and Yousfi, Shahin and Canedo, Arquimedes},
  booktitle={International Conference on Theory and Application of Diagrams (Diagrams 2020)},
  pages={167--183},
  year={2020},
  eprinttype={arXiv},
  eprint={1909.10475},
  doi={10.1007/978-3-030-54249-8_14}
}
```
We explore using monoidal categories and string diagrams as a foundation for assembly planning. Working in the simplified domain of LEGO assembly in Minecraft, we design and implement a proof-of-concept system that spans the assembly process, from the object’s CAD description to planning and scheduling to simulation.
Conformalized quantile regression, 2019. Yaniv Romano, Evan Patterson, Emmanuel J. Candès.
Abstract

Conformal prediction is a technique for constructing prediction intervals that attain valid coverage in finite samples, without making distributional assumptions. Despite this appeal, existing conformal methods can be unnecessarily conservative because they form intervals of constant or weakly varying length across the input space. In this paper we propose a new method that is fully adaptive to heteroscedasticity. It combines conformal prediction with classical quantile regression, inheriting the advantages of both. We establish a theoretical guarantee of valid coverage, supplemented by extensive experiments on popular regression datasets. We compare the efficiency of conformalized quantile regression to other conformal methods, showing that our method tends to produce shorter intervals.
BibTeX
```
@inproceedings{romano2019,
  title={Conformalized quantile regression},
  author={Romano, Yaniv and Patterson, Evan and Cand{\`e}s, Emmanuel J.},
  booktitle={Thirty-third Conference on Neural Information Processing Systems (NeurIPS-19)},
  eprinttype={arXiv},
  eprint={1905.03222},
  year={2019}
}
```
We propose a new method to construct prediction intervals, based on conformal inference. Our philosophy is that if interval estimation is the goal, then one should bypass point estimation and directly estimate the interval endpoints using quantile regression. One can then conformalize the intervals to attain valid, distribution-free coverage in finite samples.
Teaching machines to understand data science code by semantic enrichment of dataflow graphs, 2018. Evan Patterson, Ioana Baldini, Aleksandra Mojsilovic, Kush R. Varshney.
Abstract

Your computer is continuously executing programs, but does it really understand them? Not in any meaningful sense. That burden falls upon human knowledge workers, who are increasingly asked to write and understand code. They deserve to have intelligent tools that reveal the connections between code and its subject matter. Towards this prospect, we develop an AI system that forms semantic representations of computer programs, using techniques from knowledge representation and program analysis. To create the representations, we introduce an algorithm for enriching dataflow graphs with semantic information. The semantic enrichment algorithm is undergirded by a new ontology language for modeling computer programs and a new ontology about data science, written in this language. Throughout the paper, we focus on code written by data scientists and we locate our work within a larger movement towards collaborative, open, and reproducible science.
BibTeX
```
@inproceedings{patterson2018,
  title={Teaching machines to understand data science code by semantic enrichment of dataflow graphs},
  author={Patterson, Evan and Baldini, Ioana and Mojsilovi{\'c}, Aleksandra and Varshney, Kush R},
  booktitle={Proceedings of 2018 KDD Workshop on the Fragile Earth: Theory Guided Data Science to Enhance Scientific Discovery},
  eprinttype={arXiv},
  eprint={1807.05691},
  year={2018}
}
```
Building on previous work, we refine our method for creating semantic representations of data science programs. We implement the method for both Python and R and we release the first version of the Data Science Ontology. The ontology is written in a new ontology language called Monocl. All source code is available on GitHub.

We demoed the system at IJCAI-ECAI 2018 and we presented a shorter paper at the 2018 KDD Fragile Earth Workshop. Later, we presented the project at PyData NYC 2019 (slides).
Knowledge representation in bicategories of relations, 2017. Evan Patterson.
Abstract

We introduce the relational ontology log, or relational olog, a knowledge representation system based on the category of sets and relations. It is inspired by Spivak and Kent's olog, a recent categorical framework for knowledge representation. Relational ologs interpolate between ologs and description logic, the dominant formalism for knowledge representation today. In this paper, we investigate relational ologs both for their own sake and to gain insight into the relationship between the algebraic and logical approaches to knowledge representation. On a practical level, we show by example that relational ologs have a friendly and intuitive—yet fully precise—graphical syntax, derived from the string diagrams of monoidal categories. We explain several other useful features of relational ologs not possessed by most description logics, such as a type system and a rich, flexible notion of instance data. In a more theoretical vein, we draw on categorical logic to show how relational ologs can be translated to and from logical theories in a fragment of first-order logic. Although we make extensive use of categorical language, this paper is designed to be self-contained and has considerable expository content. The only prerequisites are knowledge of first-order logic and the rudiments of category theory.
BibTeX
```
@article{patterson2017,
  title={Knowledge Representation in Bicategories of Relations},
  author={Patterson, Evan},
  eprinttype={arXiv},
  eprint={1706.00526},
  year={2017}
}
```
I explain how to represent relational knowledge in a mathematical structure called a bicategory of relations. I compare this structure to the more familiar systems of first-order logic and description logic. Despite its heavy dose of monoidal category theory, the paper is intended to be readable by non-category-theorists. Many pictures and informal explanations are provided.

I talked about this project at the 2017 CSLI Workshop (slides) and later at the 2017 AMS Special Session on Applied Category Theory (slides).
Dataflow representation of data analyses: Toward a platform for collaborative data science, 2017. Evan Patterson, Robert McBurney, Hollie Schmidt, Ioana Baldini, Aleksandra Mojsilovic, Kush R. Varshney.
Abstract

Data science plays an increasingly important role in solving today's scientific and social challenges. To promote progress toward a cure for multiple sclerosis, the Accelerated Cure Project has created an open repository of biological and survey data on patients with multiple sclerosis. Similar large-scale repositories are being created in other domains. As the open, data-driven model of science proliferates, the research community faces a growing need for a cloud platform for collaborative data science. Such a platform should facilitate collaboration between domain experts and data scientists and possess artificial intelligence capabilities for organizing, recommending, and manipulating data analyses. In this paper, we present some foundational technologies motivated by this vision. Our system automatically extracts a high-level dataflow graph from a data analysis. This graph describes how data flows through an analysis pipeline, including which statistical methods are used and how they fit together. The system requires no special annotations from the data analyst and consumes analyses written in Python using standard tools, such as Scikit-learn and Statsmodels. In this paper, we explain how our system works and how it fits into our larger vision for a collaborative data science platform.
BibTeX
```
@article{patterson2017,
  author={Patterson, Evan and McBurney, Robert and Schmidt, Hollie and Baldini, Ioana and Mojsilovi{\'c}, Aleksandra and Varshney, Kush R.},
  journal={IBM Journal of Research and Development},
  title={Dataflow representation of data analyses: Toward a platform for collaborative data science},
  volume={61},
  number={6},
  pages={9:1-9:13},
  year={2017},
  doi={10.1147/JRD.2017.2736278}
}
```
We argue that a cloud platform for data science, powered by AI, could accelerate scientific discovery by helping data scientists and domain scientists collaborate. Existing data science platforms do not meet this need. We further argue that to enable AI capabilities, we must first create machine-interpretable representations of data analyses. We develop a first version of this technology (superseded by later work).

We presented a shorter version of this paper at the AAAI 2017 Spring Symposium on Artificial Intelligence for the Social Good (slides).

Publications

Abstract

Abstract

Abstract

BibTeX

Abstract

Abstract

BibTeX

Abstract

Abstract

Abstract

BibTeX

Abstract

BibTeX

Abstract

BibTeX

Abstract

BibTeX

Abstract

BibTeX

Abstract

BibTeX

Abstract

BibTeX

Abstract

BibTeX

Abstract

BibTeX

Abstract

BibTeX

Abstract

BibTeX

Abstract

BibTeX

Abstract

BibTeX

Abstract

BibTeX

Abstract

BibTeX

Abstract

BibTeX

Abstract

BibTeX

Abstract

BibTeX

Abstract

BibTeX