Workflow management

Workflow management is a source of pain for most data scientists (HN).

Projects

  • Kepler
    • Ludäscher et al, 2006: Scientific workflow management and the Kepler system (doi, pdf)
    • Altintas et al, 2004: Kepler: an extensible system for design and execution of scientific workflows (doi)
  • Taverna (Apache)
    • Oinn et al, 2004: Taverna: a tool for the composition and enactment of bioinformatics workflows (doi)
  • Galaxy (GitHub)
    • Goecks et al, 2010: Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences (doi, pdf)
  • Luigi (GitHub), by Spotify
  • Airflow (Apache, GitHub), originally by Airbnb

Literature

Surveys

  • Yu & Buyya, 2006: A taxonomy of workflow management systems for grid computing (doi, pdf)
  • Barker & van Hemert, 2008: Scientific workflow: a survey and research directions (pdf)
  • Liu et al, 2015: A survey of data-intensive scientific workflow management (doi)