Workflow management is a source of pain for most data scientists (HN).
Projects
- Kepler
  - Ludäscher et al, 2006: Scientific workflow management and the Kepler system
    (doi, pdf)
  - Altintas et al, 2004: Kepler: an extensible system for design and execution
    of scientific workflows (doi)
- Taverna (Apache)
  - Oinn et al, 2004: Taverna: a tool for the composition and enactment of
    bioinformatics workflows (doi)
- Galaxy (GitHub)
  - Goecks et al, 2010: Galaxy: a comprehensive approach for supporting
    accessible, reproducible, and transparent computational research in the life
    sciences (doi, pdf)
- Luigi (GitHub), by Spotify
- Airflow (Apache, GitHub), originally by Airbnb
Literature
Surveys
- Yu & Buyya, 2006: A taxonomy of workflow management systems for grid computing
(doi, pdf)
- Barker & van Hemert, 2008: Scientific workflow: a survey and research
directions (pdf)
- Liu et al, 2015: A survey of data-intensive scientific workflow management
(doi)