Data provenance
The provenance of a dataset is a record, preferably in machine-readable form, of how the data was produced and transformed. See also scientific workflow management and provenance in knowledge representation.
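As a rough illustration of what a machine-readable provenance record might contain, the sketch below wraps a transformation and records the operation, its parameters, and content hashes of the inputs and output. The names `ProvenanceRecord`, `run_with_provenance`, and `uppercase` are hypothetical and chosen only for this example. They do not come from any of the projects listed on this page.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


def content_hash(data: bytes) -> str:
    """Return a SHA-256 hex digest identifying the exact bytes of a dataset."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical record of one transformation step."""
    operation: str
    parameters: dict
    input_hashes: tuple
    output_hash: str
    recorded_at: str


def run_with_provenance(operation, func, inputs, **parameters):
    """Apply func to the inputs and return (output, provenance record)."""
    output = func(*inputs, **parameters)
    record = ProvenanceRecord(
        operation=operation,
        parameters=parameters,
        input_hashes=tuple(content_hash(item) for item in inputs),
        output_hash=content_hash(output),
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )
    return output, record


def uppercase(data: bytes, encoding: str = "utf-8") -> bytes:
    """Toy transformation: upper-case the text content of a dataset."""
    return data.decode(encoding).upper().encode(encoding)


raw = b"a,b\n1,2\n"
result, record = run_with_provenance("uppercase", uppercase, [raw], encoding="utf-8")

# The record serializes to JSON, so it can be stored next to the data.
print(json.dumps(asdict(record), indent=2))
```

Hashing the content rather than storing file paths means a record still identifies its inputs after files are moved or renamed, at the cost of reading each dataset in full.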
Projects
- VisTrails
  - Node-based graphical interface à la SPSS Modeler or LabVIEW
  - Main use case is visualization, but it also includes nodes for scikit-learn
  - Emphasis on provenance: tracks workflows both across executions and as they evolve over time
- CodaLab (GitHub)
  - By Percy Liang and his students and collaborators
  - Two elements: worksheets and competitions
  - Worksheets are overlays on an immutable execution graph
- Apache projects
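The idea noted under CodaLab, worksheets as overlays on an immutable execution graph, can be sketched roughly as follows. This is a toy model under stated assumptions, not CodaLab's actual data model or API: the classes `Run`, `ExecutionGraph`, and `Worksheet` are hypothetical, and deriving a run's identifier from its command and dependencies is an assumption made for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Run:
    """Hypothetical immutable node: a command plus the ids of its upstream runs."""
    command: str
    dependencies: tuple = ()

    @property
    def run_id(self) -> str:
        # Assumption: the id is derived from the command and dependency ids.
        payload = "\n".join((self.command, *self.dependencies))
        return hashlib.sha256(payload.encode()).hexdigest()[:12]


class ExecutionGraph:
    """Append-only store of runs; existing nodes are never modified."""

    def __init__(self):
        self._runs = {}

    def add(self, command, dependencies=()):
        for dep in dependencies:
            if dep not in self._runs:
                raise KeyError(f"unknown dependency: {dep}")
        run = Run(command, tuple(dependencies))
        self._runs.setdefault(run.run_id, run)  # re-adding the same run is a no-op
        return run.run_id

    def lineage(self, run_id):
        """Return the ids of a run and everything upstream of it, dependencies first."""
        ordered = []

        def visit(rid):
            if rid in ordered:
                return
            for dep in self._runs[rid].dependencies:
                visit(dep)
            ordered.append(rid)

        visit(run_id)
        return ordered


@dataclass
class Worksheet:
    """Mutable overlay: refers to runs by id and adds notes without changing them."""
    title: str
    items: list = field(default_factory=list)

    def annotate(self, run_id, note):
        self.items.append((run_id, note))


graph = ExecutionGraph()
raw = graph.add("fetch data.csv")
clean = graph.add("clean", [raw])
model = graph.add("train", [clean])

sheet = Worksheet("Baseline experiments")
sheet.annotate(model, "baseline model")
print(graph.lineage(model))
```

Keeping the graph append-only and putting all editable material in the worksheet means annotations can be revised freely while the record of what was executed stays fixed.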
Literature
- Simmhan, Plale, Gannon, 2005: A survey of data provenance in e-science (doi, tech report)