Open Source Projects

DrivenData maintains a number of popular open source projects for the data science, machine learning, and software engineering communities. Check them out here!

Cookiecutter Data Science

A logical, reasonably standardized, and flexible project structure for doing and sharing data science work

Since starting DrivenData, we’ve seen a lot of data science in the wild. As the field develops, it’s becoming increasingly important to organize data science work so that it’s easy to reproduce and build upon.

Cookiecutter Data Science is a widely used project template that keeps data scientists organized and on track.

Deon: An Ethics Checklist for Data Scientists

A command line tool that allows you to easily add an ethics checklist to your data science projects

When there's a lot at stake, checklists make sure big questions don't slip through the cracks and tough conversations happen even (especially) in fast-moving environments. The goal of deon is to push that conversation forward and provide concrete, actionable reminders to the developers who have influence over how data science gets done.

One command jumpstarts the conversation all data teams should be having. Explore the checklist here!


pandas-path

Path style access for pandas

Love pathlib.Path? Love pandas? Wish it were easy to use pathlib methods on pandas Series? This package is for you.

Just one import adds a .path accessor to any pandas Series or Index so that you can use all of the methods on a Path object.
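To make the idea of a path accessor concrete, here is a minimal sketch of how such an accessor could be built with pandas' public extension hook, register_series_accessor. It is not the package's actual implementation: the accessor name path_sketch, the class PathSketchAccessor, and the example file names are hypothetical, and the sketch covers Series only, not Index.

```python
from pathlib import PurePosixPath

import pandas as pd


def _unwrap(value):
    # Convert path results back to plain strings so the Series stays string-like.
    return str(value) if isinstance(value, PurePosixPath) else value


# "path_sketch" is a hypothetical accessor name chosen for this illustration.
@pd.api.extensions.register_series_accessor("path_sketch")
class PathSketchAccessor:
    """Forward attribute access to a PurePosixPath built from each element."""

    def __init__(self, series):
        self._series = series

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        # Raises AttributeError for names that pathlib does not define.
        attribute = getattr(PurePosixPath, name)
        if callable(attribute):
            # Path methods (e.g. with_suffix) are applied element-wise.
            def method(*args, **kwargs):
                return self._series.map(
                    lambda v: _unwrap(getattr(PurePosixPath(v), name)(*args, **kwargs))
                )

            return method
        # Path properties (e.g. name, suffix, parent) are returned as a new Series.
        return self._series.map(lambda v: _unwrap(getattr(PurePosixPath(v), name)))


# Hypothetical example data.
files = pd.Series(["data/raw/a.csv", "data/processed/b.parquet"])
names = files.path_sketch.name
renamed = files.path_sketch.with_suffix(".txt")
```

With this sketch, files.path_sketch.name yields the file names and files.path_sketch.with_suffix(".txt") yields the same paths with a new extension, which mirrors the kind of element-wise pathlib access the package describes.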

Winning Models from DrivenData Competitions

Prize-winning algorithms from DrivenData’s competitions

DrivenData runs machine learning competitions to help non-profits, NGOs, governments, and other social impact organizations use data science in service of humanity. Part of our mission is to enable data scientists and mission-driven organizations to learn from the work done in these competitions. To this end, the code submitted by winners is released under an open source license for others to learn from, use, and adapt.

Check out how ML experts built their winning algorithms!

Project Zamba

Computer vision for wildlife research and conservation

At the end of 2017, data scientists from more than 90 countries around the world drew on more than 300,000 video clips in a competition to build the best machine learning models for identifying wildlife from camera trap footage. Following the competition, the top-performing submission was packaged into an open source software tool and made available for general use by researchers and conservationists.

Zamba is an open-source Python package that identifies 23 animals in video data.

Concept to Clinic

An AI-powered application for early lung cancer detection built for radiologists

In the Concept to Clinic challenge, hundreds of data scientists and engineers from around the world came together to build open source tools to fight the world’s deadliest cancer. The prototype developed during the live challenge period between August 2017 and January 2018 focused on helping clinicians flag, assess, and report concerning nodules from CT scans.

This open-source project is an end-to-end application that allows radiologists to better interact with state-of-the-art AI as part of their diagnostic process.