
Find all the pangolins


by Allen Downey

Videos from trail cameras are a useful tool for noninvasive observation of wildlife, but if you are studying a rare species, you might have to look at a lot of videos before you find it. Zamba, an AI tool for wildlife research and conservation, can help. Using Zamba's probabilistic classifications, you can find videos that contain a particular category of animals very efficiently.

In a previous article, we used Zamba to remove blank videos, that is, videos that don't contain animals. We found that we could remove more than 80% of blanks while losing fewer than 10% of non-blank videos, which is generally good. But as you might expect, we are more likely to miss small, rare animals. If those are the animals you are interested in, removing blank videos might be a bad strategy.

An alternative is a targeted search. For each video in a dataset, Zamba computes the probability that it contains each category of animal (species or group of species). If we are looking for a particular species, we can sort the videos in descending order by the probability they contain the category that contains the target species. If we watch the videos in that order, we can find what we are looking for much more quickly than if we chose videos at random.
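To make the idea concrete, here is a minimal sketch of the ranking step. It assumes the predictions have already been loaded into a plain dictionary that maps each video name to its per-category probabilities; the dictionary layout, the file names, and the `rank_videos` helper are all hypothetical and not part of Zamba itself.

```python
def rank_videos(predictions, category):
    """Return video names sorted by descending probability for `category`.

    `predictions` is a hypothetical structure: video name -> {category: probability}.
    """
    return sorted(
        predictions,
        key=lambda name: predictions[name][category],
        reverse=True,
    )


# Made-up probabilities, for illustration only.
predictions = {
    "a.mp4": {"chimpanzee": 0.02, "pangolin": 0.91},
    "b.mp4": {"chimpanzee": 0.97, "pangolin": 0.01},
    "c.mp4": {"chimpanzee": 0.40, "pangolin": 0.05},
}

ranked = rank_videos(predictions, "chimpanzee")
print(ranked)
```

Watching the videos in the order given by `ranked` is the targeted search; changing the category name gives a different viewing order over the same set of videos.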

Here's how we'll quantify targeted search performance:

  1. To train Zamba's classification model, we collected more than 280,000 videos from researchers working in West, Central, and East Africa.

  2. We used a training set of about 250,000 of those videos to train the model and reserved a holdout set of about 30,000 for testing. The videos in the holdout set are selected on a transect-by-transect basis; that is, videos from each camera location are assigned entirely to the training set or entirely to the holdout set. So the performance of the model on the holdout set should reflect its performance on videos from a location the model has never seen.

  3. We use selected videos from the holdout set to simulate the use of the model for targeted searches.
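The location-wise split in step 2 can be sketched as follows. This is only an illustration of the idea that every video from a location lands on the same side of the split; the function name, the `(video, location)` tuples, and the `holdout_fraction` value are assumptions, and the actual procedure used to build the holdout set may differ.

```python
import random


def split_by_location(videos, holdout_fraction=0.1, seed=0):
    """Split (video, location) pairs so that no location appears on both sides.

    `holdout_fraction` is the share of locations (not videos) held out;
    the default is an arbitrary value chosen for illustration.
    """
    locations = sorted({location for _, location in videos})
    rng = random.Random(seed)
    rng.shuffle(locations)
    n_holdout = max(1, round(len(locations) * holdout_fraction))
    holdout_locations = set(locations[:n_holdout])
    train = [v for v in videos if v[1] not in holdout_locations]
    holdout = [v for v in videos if v[1] in holdout_locations]
    return train, holdout


# Made-up data: 20 videos spread evenly over 5 hypothetical sites.
videos = [(f"v{i}.mp4", f"site{i % 5}") for i in range(20)]
train, holdout = split_by_location(videos, holdout_fraction=0.2)
print(len(train), len(holdout))
```

Because whole locations are held out, a model evaluated on `holdout` never sees the backgrounds and camera angles of those locations during training.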

Let's see how well it works.

Chimpanzees

Suppose you are a wildlife researcher working with the Pan African Programme, which has installed trail cameras at 40 sites in 12 countries in West, Central, and East Africa. One of the goals of this project is to study behavioral diversity across different chimpanzee populations. So it would be useful to find the maximum number of videos showing chimpanzee behavior while minimizing viewing time.

Here's an example showing just the sort of behavior you might be interested in:


In the holdout set we're using for testing, we have 14,980 videos collected by the Pan African Programme. Of those, 719 contain chimpanzees, so the prevalence is about 4.8%. If you watched videos at random, you would expect about one in every 21 to show a chimpanzee.

On the other hand, if you watched the 10 videos Zamba assigned the highest probability of containing a chimp, you would find that all 10 of them do. And if you watched the top 100 videos, you would find that 99 of them contain chimps. So if the goal is to find a sample of videos with chimpanzees, we can do that very quickly.
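The comparison above is between the hit rate among the top-ranked videos and the overall prevalence. Here is a small sketch of that calculation. The list of labels is invented to match the counts in the text (all of the top 10 and 99 of the top 100 are hits); the position of the single miss is an assumption, since the article does not say where it falls.

```python
def precision_at_k(ranked_labels, k):
    """Fraction of the top-k videos that actually contain the target."""
    top = ranked_labels[:k]
    return sum(top) / len(top)


# Labels in ranked order (True = contains a chimpanzee).
# The miss at index 49 is a made-up position for illustration.
labels = [True] * 100
labels[49] = False

top10 = precision_at_k(labels, 10)
top100 = precision_at_k(labels, 100)

# Baseline from the text: 719 chimpanzee videos out of 14,980.
prevalence = 719 / 14_980

print(f"top 10: {top10:.0%}, top 100: {top100:.0%}, random: {prevalence:.1%}")
```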

But suppose we want to find as many chimp videos as possible. In that case, we can use the probabilities Zamba generates to start with the videos assigned the highest probabilities and work our way down. The following figure shows the number of videos you would find with chimpanzees as a function of the number of videos you watched.

The blue line shows the actual rate of discovery. The gray line on the left shows the maximum possible rate, if you knew with certainty which videos contained chimpanzees; the gray line on the right shows the rate we would expect if we chose videos at random.
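The three curves in a figure like this can be computed from the true labels once the videos are in ranked order. The following is a sketch with made-up labels, not the code used to produce the figure; the function name is an assumption.

```python
from itertools import accumulate


def discovery_curves(ranked_labels):
    """Return (actual, ideal, random_expected) cumulative discovery counts.

    Element k-1 of each list is the number of target videos found
    after watching k videos.
    """
    n = len(ranked_labels)
    total = sum(ranked_labels)
    # Actual: running count of hits in the model's ranked order.
    actual = list(accumulate(int(label) for label in ranked_labels))
    # Ideal: every video watched is a hit until the targets run out.
    ideal = [min(k, total) for k in range(1, n + 1)]
    # Random: hits accumulate in proportion to the prevalence.
    random_expected = [k * total / n for k in range(1, n + 1)]
    return actual, ideal, random_expected


# Made-up labels in ranked order: 3 targets among 8 videos.
labels = [1, 1, 0, 1, 0, 0, 0, 0]
actual, ideal, random_expected = discovery_curves(labels)
print(actual)
print(ideal)
print(random_expected)
```

All three curves end at the same point, the total number of target videos; what differs is how quickly they get there.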

Initially, the actual curve is close to the ideal, but as we watch videos with lower probabilities, the discovery rate slows down. So when should we stop looking?

One strategy is to continue until the average probability of the selected videos falls below a threshold. For example, with a threshold of 90%, we would select the first 701 videos. Of those, 601 contain chimpanzees, which is 86% of the videos we selected.

So, after watching 701 videos, which is less than 5% of all videos, we find 601 with chimpanzees, which is 84% of the ones that contain chimpanzees. The circle marker in the figure shows where we reach this point.
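One way to read this stopping rule is as a running average over the ranked probabilities: keep adding videos while the mean probability of everything selected so far stays at or above the threshold. The sketch below implements that reading with made-up probabilities; the function name is an assumption, and the details of the rule used for the figures may differ.

```python
def select_by_average_probability(probs, threshold=0.9):
    """Return how many top-ranked videos to select.

    Videos are taken in descending order of probability until the
    running mean of the selected probabilities drops below `threshold`.
    """
    ranked = sorted(probs, reverse=True)
    total = 0.0
    n_selected = 0
    for k, p in enumerate(ranked, start=1):
        total += p
        if total / k < threshold:
            break
        n_selected = k
    return n_selected


# Made-up probabilities for six videos.
probs = [0.99, 0.98, 0.95, 0.80, 0.60, 0.10]
n = select_by_average_probability(probs, threshold=0.9)
print(n)
```

Note that the fourth video is selected even though its own probability is below 0.9, because the average of the first four is still above the threshold. That is why this rule reaches further down the list than a cutoff on individual probabilities would.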

Variation across locations

In the previous example, we combined videos from 40 locations, but if you are a researcher studying differences in behavior, you would probably search the videos from each location separately. To simulate that process, we used videos from the holdout set again and selected the locations where we find the largest numbers of videos showing chimpanzees. The following table shows the top-ten locations.

Location                                       Videos  With chimpanzees  Prevalence
Bakoun Classified Forest, Guinea                 2312               128        5.5%
Ugalla River National Park, Tanzania             1313                71        5.4%
Sapo National Park, Liberia                       396                67       16.9%
Korup National Park, Cameroon                     625                66       10.6%
Taï National Park, Ivory Coast                  10050                63        0.6%
Campo Ma'an National Park, Cameroon               489                61       12.5%
Lopé National Park, Gabon                          75                50       66.7%
Conkouati-Douli National Park, Congo Republic     549                38        6.9%
Bwindi Impenetrable National Park, Uganda        1070                35        3.3%
Gashaka-Gumti National Park, Nigeria              889                25        2.8%


In each location, we simulated a targeted search with a range of thresholds. The following figure shows the results for the top-five locations; to make them comparable, we scaled the axes to show the proportion of the videos watched, and chimpanzees found, rather than the number.

On each curve, the circle marker shows the result with a 90% threshold. In the best case, we find 96% of the videos showing chimpanzees after watching only 6% of all videos. In the worst case, we find only 73% of the chimps after watching 13% of the videos. Still, that's more than five times more efficient than searching at random. And in each case this strategy does a good job of finding the "knee" of the curve, where the discovery rate starts to drop off.
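The "times more efficient" comparison follows from a simple ratio: a random search is expected to find targets in proportion to the fraction of videos watched, so the efficiency of a targeted search relative to random is the fraction found divided by the fraction watched. Here is that arithmetic for the best and worst cases quoted above; the function name is just for illustration.

```python
def efficiency_vs_random(fraction_found, fraction_watched):
    """Ratio of targets found to what a random search would be expected to find."""
    return fraction_found / fraction_watched


worst = efficiency_vs_random(0.73, 0.13)  # worst case from the text
best = efficiency_vs_random(0.96, 0.06)   # best case from the text

print(f"worst: {worst:.1f}x, best: {best:.1f}x")
```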

Pangolins

Suppose one day, while you are watching videos of chimpanzees, you get a message from a researcher at the IUCN SSC Pangolin Specialist Group, which leads efforts to save pangolins from poaching and illegal trade. Toward that end, they would like to locate wild pangolins in Africa, and they suggest that your videos could help. In fact, during your search for chimpanzees, you have seen a few pangolins shuffle by. Here's an example:


Of 14,980 videos from the Pan African Programme in the holdout set, 42 show pangolins, according to human viewers. So the prevalence is only 0.3%, which means that if you search at random, you would expect to find about one pangolin in every 357 videos. You don't want to watch that many videos, you don't want to send all of your videos to other researchers, and they don't want that many videos, either!

A targeted search can help. If you select videos with the highest probability of containing pangolins, the following figure shows how many you would find:

Initially the discovery rate is close to ideal, but then it drops off. How many you find depends on how many videos you are willing to share.

  • Because pangolins are small, nocturnal, and rare, there are only 15 videos where Zamba assigns the pangolin category a probability greater than 50%. But all 15 of them actually show pangolins -- so that's a good start!

  • After that, the probabilities drop off quickly. Nevertheless, if you select the 100 videos with the highest probabilities, 23 of them show pangolins.

  • And if you select the top 1000 videos, 33 of them show pangolins. That's nearly 79% of the pangolin videos after watching less than 7% of all videos.
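To put the three checkpoints above on a common scale, the sketch below computes, for each one, the share of pangolin videos found, the share of all videos watched, and the number of pangolin videos a random selection of the same size would be expected to contain. The counts come from the text; the function and variable names are assumptions.

```python
TOTAL_VIDEOS = 14_980   # Pan African Programme videos in the holdout set
TOTAL_PANGOLINS = 42    # videos showing pangolins, according to human viewers


def search_summary(watched, found, total_videos, total_targets):
    """Return (recall, fraction_watched, expected_random_finds)."""
    recall = found / total_targets
    fraction_watched = watched / total_videos
    expected_random = watched * total_targets / total_videos
    return recall, fraction_watched, expected_random


# (videos selected, pangolin videos among them), as reported in the list above.
checkpoints = [(15, 15), (100, 23), (1000, 33)]

results = {}
for watched, found in checkpoints:
    results[watched] = search_summary(watched, found, TOTAL_VIDEOS, TOTAL_PANGOLINS)
    recall, fraction_watched, expected_random = results[watched]
    print(
        f"top {watched}: found {recall:.1%} of pangolin videos, "
        f"watched {fraction_watched:.2%} of all videos, "
        f"random would find about {expected_random:.1f}"
    )
```

At the last checkpoint, a random sample of 1000 videos would be expected to contain fewer than three pangolin videos, compared with the 33 found by the targeted search.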

In this way, automated classification can facilitate collaboration between researchers studying different species. One group's bycatch is another group's treasure!

Summary

Zamba does not just classify videos; it generates probabilistic classifications. In this article, we considered one way to use these probabilities, a targeted search for a particular category of animal. We presented two examples motivated by actual research projects: searches for chimpanzees and pangolins.

Chimpanzees obey something like an 80-10 rule, where we can find 80% of the videos showing chimpanzees after watching only 10% of the videos. But the efficiency of the search depends on the fraction of videos showing the target species, which varies from one location to another.

It also varies from one species to another. In our dataset, pangolins are much less common than chimpanzees; less than 0.3% of videos contain pangolins. Nevertheless, a targeted search finds almost 80% of the pangolins after viewing less than 7% of the videos.

When Zamba assigns a high probability that a video contains a particular species, it is likely to be right. For almost every species in almost every location, the video assigned the highest probability actually contains the target species. And even when it is uncertain, its probabilistic classifications make it possible to find the videos we want much more efficiently than we could by chance.


Special thanks to Dr. Mimi Arandjelovic at the Pan African Programme for guidance in the preparation of this article.