Data science platforms
Collaboration and challenge platforms
- Kaggle
- DrivenData
- Data science challenges for social good
- DREAM Challenges
- Challenges in systems biology and translational medicine
- Founded and managed by Gustavo Stolovitzky
- OpenML
- Open platform focused on ML, especially supervised learning
- LabBook
- Project at IBM Research Almaden and U. Toronto
- Kandogan et al., 2015: LabBook: Metadata-driven social collaborative data analysis (doi, pdf)
Hosted data science
All the major cloud providers now offer hosted data science services, currently branded as “AI” (AWS ML, Google Cloud AI, MS Azure AI, IBM Watson Studio). Other offerings in this crowded space include:
- Domino Data Lab
- “System of record” for data science work
- Collaboration and reproducibility (automatic tracking)
- Mostly infrastructure for computing in the cloud (uses Docker)
- Cloudera Data Science Workbench
- Similar in spirit to Domino
- Built on technology Cloudera acquired from Sense
- Datazar
- Hosted datasets, plus notebooks and a console for R and Python
- Some custom visualization tools
- Plotly
- Mostly hosted plots and dashboards
- Claims to be “a GitHub for data scientists”
Computing services
- Agave Platform
- “Science-as-a-service”
- Web API, not application
- H2O
- Implementations of common stats/ML algorithms for Hadoop
- Also H2O Flow, a web-based notebook-style environment (video)
- Open source, including Flow
Knowledge services
- Knowen
- Part wiki (organized as a DAG)
- Part collaborative editor with LaTeX support (cf. Overleaf)
- No computational features
- Hypothes.is (HN)
- Open annotation of arbitrary webpages
- UX via browser extensions