Sample Projects

Automating wildlife identification for research and conservation

Partners: Max Planck Institute for Evolutionary Anthropology, Arcus Foundation

Detected wildlife in video footage automatically and at scale by running a global algorithm development challenge, then developing an open source Python package and web application that build on the winning solution (Project Zamba).

Approaches include: Deep learning, computer vision, transfer learning, data science competition, crowdsourced data annotations, open source software

Project Zamba blog


Building LLM solutions

Partners: private sector, social sector

Built LLM-based solutions for multiple real-world applications across a range of tasks, including text summarization, semantic search, named entity recognition (NER), and multimodal analysis. This work has ranged from research on state-of-the-art transformer models tuned for specific use cases to production-ready, retrieval-augmented AI applications.

Approaches include: LLMs, transformers, fine-tuning, prompt tuning, LLM evaluation, retrieval augmentation


Identifying crop types using satellite imagery in Yemen

Partners: The World Bank, The Conflict and Environment Observatory

Used satellite imagery to identify crop extent, crop types and climate risks to agriculture in Yemen, informing World Bank development programs in the country after years of civil war.

Approaches include: Deep learning, computer vision, earth observation data


Illuminating mobile money experiences in Tanzania

Partners: IDEO.org

Analyzed millions of mobile money records to uncover patterns in behavior, and then combined these insights with human-centered design to shape new approaches to delivering mobile money to low-income populations in Tanzania.

Approaches include: Human-centered design + data science, exploratory analysis, interactive visualization, rapid prototyping

case study


Tracking attacks on health care in Ukraine

Partners: Insecurity Insight, Physicians for Human Rights

Built a real-time, interactive map to visualize attacks on the Ukrainian health care system since the Russian invasion began in February 2022. The map will support partner efforts to provide aid, hold aggressors accountable in court, and increase public awareness.

Approaches include: Interactive visualization, open data, geospatial data, production web application

case study Explore the map


Mining chat messages with plant doctors using language models

Partners: CABI Plantwise

Automated recognition of agricultural entities (such as crops, pests, diseases, and chemicals) in WhatsApp and Telegram messages among plant doctors, enabling new ways to surface emerging trends and improve science-based guidance for smallholder farmers.

Approaches include: Natural language processing (NLP), named-entity recognition (NER), fuzzy matching, human-in-the-loop data annotation
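
As a rough illustration of how fuzzy matching can complement entity recognition on informal, typo-prone chat text, the sketch below matches message tokens against a small lexicon using Python's standard-library difflib. The lexicon, the `tag_entities` helper, and the 0.8 similarity cutoff are hypothetical choices made for this example and are not taken from the project itself.

```python
from difflib import get_close_matches

# Hypothetical mini-lexicon for illustration only; a real system would
# rely on a curated agricultural vocabulary and a trained NER model.
LEXICON = {
    "maize": "crop",
    "tomato": "crop",
    "cassava": "crop",
    "armyworm": "pest",
    "aphid": "pest",
    "blight": "disease",
    "glyphosate": "chemical",
}


def tag_entities(message, cutoff=0.8):
    """Return (token, matched_term, entity_type) for tokens close to a lexicon term."""
    found = []
    for raw in message.lower().split():
        # Drop surrounding punctuation so "maize," still matches "maize".
        token = raw.strip(".,;:!?()")
        if not token:
            continue
        # Keep only the single best lexicon match above the similarity cutoff.
        match = get_close_matches(token, list(LEXICON), n=1, cutoff=cutoff)
        if match:
            found.append((token, match[0], LEXICON[match[0]]))
    return found


if __name__ == "__main__":
    for hit in tag_entities("Armywrom on my maize, sprayed glyphosat"):
        print(hit)
```

Misspellings such as "armywrom" still resolve to the lexicon term "armyworm" here. In practice, low-confidence matches would likely be routed to human annotators rather than accepted automatically.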


Matching students with schools where they are likely to succeed

Partners: Data science company foundation

Used machine learning to match students with higher education programs where they are more likely to get in and graduate based on their unique profile. Focused on serving students from backgrounds traditionally less likely to attend college or apply for more competitive programs.

Approaches include: Recommender systems, predictive modeling, software engineering


Mapping fair trade products from source to shelf

Partners: Fair Trade USA

Visualized the flow of fair trade coffee products from the farms where they are grown to the stores where they are sold, connecting the nodes in supply chain transactions and increasing transparency for customers and auditors.

Approaches include: Interactive dashboarding, GIS analysis, Tableau

case study


Developing performance indicators and repayment models in off-grid solar

Partners: The World Bank, Angaza, GOGLA, Lighting Global

Analyzed repayment behaviors across dozens of pay-as-you-go (PAYG) solar energy companies serving off-grid populations throughout Africa; developed key performance indicators (KPIs) to facilitate standardized measurement and reporting for PAYG portfolios.

Approaches include: Predictive modeling, exploratory analytics, open source software, key performance indicators (KPIs), public-private partnerships

case study


Modeling patient pathways through hospitals

Partners: Haystack Informatics

Mapped probabilistic patient journeys through hospitals based on tens of thousands of patient experiences, giving hospitals a better view into the timing of the activities in their departments and how they relate to operational efficiency.

Approaches include: Predictive modeling, activity-based costing, Spark, production web application


Predicting public health risks from restaurant reviews

Partners: Yelp, Harvard University, City of Boston

Flagged public health risks at restaurants by combining Yelp reviews with open city data on past inspections. An algorithmic approach discovers 25% more violations with the same number of inspections.

Approaches include: Machine learning challenge, natural language processing (NLP), open data, alternative data sources

case study blog
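
To make the "more violations with the same number of inspections" idea concrete, here is a minimal sketch of risk-ranked prioritization under a fixed inspection budget. The restaurant records, risk scores, and budget are invented toy values, and the helper names are hypothetical. The sketch does not reproduce the project's model or its reported 25% figure.

```python
# Hypothetical toy data: risk_score stands in for a model's predicted
# probability of a violation (e.g. from reviews plus past inspections);
# has_violation is what an inspection would actually find.
RESTAURANTS = [
    {"name": "A", "risk_score": 0.20, "has_violation": False},
    {"name": "B", "risk_score": 0.85, "has_violation": True},
    {"name": "C", "risk_score": 0.40, "has_violation": False},
    {"name": "D", "risk_score": 0.70, "has_violation": True},
    {"name": "E", "risk_score": 0.10, "has_violation": False},
    {"name": "F", "risk_score": 0.65, "has_violation": False},
    {"name": "G", "risk_score": 0.90, "has_violation": True},
    {"name": "H", "risk_score": 0.30, "has_violation": True},
]


def violations_found(order, budget):
    """Count violations among the first `budget` restaurants in `order`."""
    return sum(r["has_violation"] for r in order[:budget])


def rank_by_risk(restaurants):
    """Sort restaurants from highest to lowest predicted risk."""
    return sorted(restaurants, key=lambda r: r["risk_score"], reverse=True)


BUDGET = 3  # number of inspections available (assumed)

baseline = violations_found(RESTAURANTS, BUDGET)  # inspect in listed order
prioritized = violations_found(rank_by_risk(RESTAURANTS), BUDGET)

if __name__ == "__main__":
    print(f"listed order: {baseline} violations, risk-ranked: {prioritized} violations")
```

With the same three inspections, the risk-ranked order reaches more of the restaurants that actually have violations. The size of the gain depends entirely on how well the risk scores track real outcomes.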


Smart auto-tagging of K-12 school spending

Partners: Education Resource Strategies

Built algorithms that put apples-to-apples labels on school budget line items so that districts understand how their spending stacks up and where they can improve, saving months of manual processing each year.

Approaches include: Natural language processing (NLP), machine learning challenge, Excel tooling, ranked prioritization for manual follow-up


Building data tools to fight human trafficking in Nepal

Partners: Love Justice

Aided anti-trafficking efforts at border crossings and airports by combining data across locations and surfacing insights that give interviewers greater intelligence about the right questions to ask and how to direct them.

Approaches include: Data entry user experience design, data repository, GIS analysis, dynamic dashboard


Putting AI into the hands of lung cancer clinicians

Partners: GO2 Foundation for Lung Cancer

Translated advances in machine learning research to practical software for clinical settings, building an open source application through a new kind of data challenge.

Approaches include: Data challenge, deep learning, open source software, computer vision, predictive modeling, computer-aided diagnosis


Driving data education through custom competitions

Partners: Microsoft

Developed online, white-label data science competitions in which students synthesize what they have learned and test their skills on applied challenges. Each capstone features a real-world dataset that focuses on an important issue in the social sector.

Approaches include: Private data challenge, regression analysis, predictive modeling, data science education