
Introduction to Image Classification using Camera Trap Images


by Michael Schlauch


Camera traps are a tool used by conservationists to study and monitor a wide range of ecologies while limiting human interference. However, they also generate a vast amount of data that quickly exceeds the capacity of humans to sift through. That's where machine learning can help! Advances in computer vision can help automate tasks like species detection and identification, so that humans can spend more time learning from and protecting these ecologies.

This post walks through an initial approach to the Conservision Practice Area challenge on DrivenData, a practice competition where you identify animal species in a real-world dataset of wildlife images from Tai National Park in Côte d'Ivoire. The competition is designed to be accessible to participants at all levels, which makes it a great place to dive into the world of data science competitions and computer vision.

camera trap images

We will go through the following steps in order to train a PyTorch model that can be used to identify the species of animal in a given image:

  1. Set up your environment (feel free to skip)
  2. Download the data
  3. Explore the data
  4. Split into train and evaluation sets
  5. Build the Model
  6. Training
  7. Evaluation
  8. Create submission

The only prerequisite is basic familiarity with Python and some of the core concepts behind deep learning. We'll guide you step by step through the rest.

Let's get started!

1. Set up your environment

Feel free to skip this step if you already have an environment set up.

The folks on our team typically use conda to manage environments. Once you have conda installed you can create a new "conserviz" environment (name it whatever you like) with:

conda create -n conserviz python=3.8

Then activate the new environment and install the required libraries with pip. The pip command below includes all the libraries we'll need for this notebook. Finally, launch a Jupyter notebook from the new environment.

conda activate conserviz
pip install pandas matplotlib Pillow tqdm scikit-learn torch torchvision

2. Download the data

Download the competition data from the Data Download page. You'll need to first register for the competition by clicking on "Compete" and agreeing to the rules.

The competition.zip file contains everything you need to take part in this competition, including this notebook benchmark.ipynb. Unzip the archive into a location of your choice. The file structure should look like this:

├── benchmark.ipynb
├── submission_format.csv
├── test_features
│   ├── ZJ000000.jpg
│   ├── ZJ000001.jpg
│   └── ...
├── test_features.csv
├── train_features
│   ├── ZJ016488.jpg
│   ├── ZJ016489.jpg
│   └── ...
├── train_features.csv
└── train_labels.csv

Next, let's import some of the usual suspects:

In [1]:
%matplotlib inline

import matplotlib.pyplot as plt
import pandas as pd
from PIL import Image
from tqdm import tqdm

Read in the train and test CSVs first and see what they look like.

In [2]:
train_features = pd.read_csv("train_features.csv", index_col="id")
test_features = pd.read_csv("test_features.csv", index_col="id")
train_labels = pd.read_csv("train_labels.csv", index_col="id")

The features CSVs contain the image ID, filepath and site ID for each image.

In [3]:
train_features.head()
Out[3]:
filepath site
id
ZJ000000 train_features/ZJ000000.jpg S0120
ZJ000001 train_features/ZJ000001.jpg S0069
ZJ000002 train_features/ZJ000002.jpg S0009
ZJ000003 train_features/ZJ000003.jpg S0008
ZJ000004 train_features/ZJ000004.jpg S0036
In [4]:
test_features.head()
Out[4]:
filepath site
id
ZJ016488 test_features/ZJ016488.jpg S0082
ZJ016489 test_features/ZJ016489.jpg S0040
ZJ016490 test_features/ZJ016490.jpg S0040
ZJ016491 test_features/ZJ016491.jpg S0041
ZJ016492 test_features/ZJ016492.jpg S0040

The train_labels CSV is an indicator matrix of the species identified in each of the training images. Some images are labeled as "blank" if no animal was detected.

In [5]:
train_labels.head()
Out[5]:
antelope_duiker bird blank civet_genet hog leopard monkey_prosimian rodent
id
ZJ000000 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0
ZJ000001 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
ZJ000002 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0
ZJ000003 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
ZJ000004 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0

Let's store a sorted list of the labels, so that we can sort the inputs and outputs to our model in a consistent way.

In [6]:
species_labels = sorted(train_labels.columns.unique())
species_labels
Out[6]:
['antelope_duiker',
 'bird',
 'blank',
 'civet_genet',
 'hog',
 'leopard',
 'monkey_prosimian',
 'rodent']

3. Explore the data

Now let's see what some of the actual images look like. The code below iterates through a list of species and selects a single random image from each species to display, along with its image ID and label. You can try changing the random_state variable to display a new set of images.

In [7]:
import matplotlib.image as mpimg

random_state = 42

# we'll create a grid with 8 positions, one for each label (7 species, plus blanks)
fig, axes = plt.subplots(nrows=4, ncols=2, figsize=(20, 20))

# iterate through each species
for species, ax in zip(species_labels, axes.flat):
    # get an image ID for this species
    img_id = (
        train_labels[train_labels.loc[:,species] == 1]
        .sample(1, random_state=random_state)
        .index[0]
    )
    # reads the filepath and returns a numpy array
    img = mpimg.imread(train_features.loc[img_id].filepath)
    # plot the image and label it with the image ID and species
    ax.imshow(img)
    ax.set_title(f"{img_id} | {species}")

Can you spot the animals? I'm still not sure where the rodent is. Birds can be tough to spot too.

Let's look at the distribution of species across the training set, first in terms of overall counts and then in percentage terms.

In [8]:
train_labels.sum().sort_values(ascending=False)
Out[8]:
monkey_prosimian    2492.0
antelope_duiker     2474.0
civet_genet         2423.0
leopard             2254.0
blank               2213.0
rodent              2013.0
bird                1641.0
hog                  978.0
dtype: float64
In [9]:
train_labels.sum().divide(train_labels.shape[0]).sort_values(ascending=False)
Out[9]:
monkey_prosimian    0.151140
antelope_duiker     0.150049
civet_genet         0.146955
leopard             0.136705
blank               0.134219
rodent              0.122089
bird                0.099527
hog                 0.059316
dtype: float64

In case you're curious, this distribution is not exactly what we find in the wild. The competition dataset has been curated a little bit to produce a more uniform distribution than we would see in the actual data.

There's a lot more data exploration to do. For example, you might also want to look at the distribution of image dimensions or camera trap sites. But since our primary goal here is to develop a benchmark, let's move on to the modeling!
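
For example, here's a minimal sketch of that kind of exploration, using only the dataframes and imports already loaded above (the 100-image sample size is an arbitrary choice). It counts images per camera trap site and spot-checks the dimensions of a random sample of images:

# how many images come from each camera trap site?
print(train_features.site.value_counts().head(10))

# spot-check the (width, height) of a random sample of images
sample_paths = train_features.filepath.sample(100, random_state=1)
pd.Series([Image.open(path).size for path in sample_paths]).value_counts()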

4. Split into train and evaluation sets

First, we'll need to split the images into train and eval sets. We'll put aside 25% of the data for evaluation and stratify by the target labels to ensure we have similar relative frequencies of each class in the train and eval sets.

For the purposes of this benchmark, we're also going to limit ourselves to a 50% subset of the training data, just so that things run faster. But feel free to adjust frac or remove it entirely if you want to run the training on the full set.

In [10]:
from sklearn.model_selection import train_test_split

frac = 0.5

y = train_labels.sample(frac=frac, random_state=1)
x = train_features.loc[y.index].filepath.to_frame()

# stratify by the one-hot label columns so class proportions are similar in train and eval
x_train, x_eval, y_train, y_eval = train_test_split(
    x, y, stratify=y, test_size=0.25
)

Here's what x_train and y_train look like now:

In [11]:
x_train.head()
Out[11]:
filepath
id
ZJ002477 train_features/ZJ002477.jpg
ZJ012222 train_features/ZJ012222.jpg
ZJ013173 train_features/ZJ013173.jpg
ZJ000959 train_features/ZJ000959.jpg
ZJ008167 train_features/ZJ008167.jpg
In [12]:
y_train.head()
Out[12]:
antelope_duiker bird blank civet_genet hog leopard monkey_prosimian rodent
id
ZJ002477 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
ZJ012222 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
ZJ013173 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
ZJ000959 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
ZJ008167 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
In [13]:
x_train.shape, y_train.shape, x_eval.shape, y_eval.shape
Out[13]:
((6183, 1), (6183, 8), (2061, 1), (2061, 8))

Next, let's validate that our split has resulted in roughly similar relative distributions of species across the train and eval sets (because of how we passed stratify=y above).

In [14]:
split_pcts = pd.DataFrame(
    {
        "train": y_train.idxmax(axis=1).value_counts(normalize=True),
        "eval": y_eval.idxmax(axis=1).value_counts(normalize=True),
    }
)
print("Species percentages by split")
(split_pcts.fillna(0) * 100).astype(int)
Species percentages by split
Out[14]:
train eval
monkey_prosimian 15 15
antelope_duiker 14 14
civet_genet 14 14
blank 13 13
leopard 13 13
rodent 11 11
bird 9 9
hog 5 5

Good, this looks as expected.

5. Build the Model

Now we can start building our model.

The Dataset

First, we'll create an ImagesDataset class that will define how we access our data and any transformations we might want to apply.

This new class will inherit from the PyTorch Dataset class, but we'll also need to define our own __init__, __len__ and __getitem__ special methods:

  • __init__ will instantiate the dataset object with two dataframes: an x_train df containing image IDs and image file paths, and a y_train df containing image IDs and labels (the labels dataframe is optional, so we can reuse the same class later for the unlabeled test set). This will run once when we first create the dataset object, e.g. with dataset = ImagesDataset(x_train, y_train).
  • __getitem__ will define how we access a sample from the data. This method gets called whenever we use an indexing operation like dataset[index]. In this case, whenever accessing a particular image sample (for example, to get the first image we'd do dataset[0]) the following will happen:
    • look up the image filepath using the index
    • load the image with PIL.Image
    • apply some transformations (more on this below)
    • return a dictionary containing the image ID, the image itself as a Tensor, and a label (if it exists)
  • __len__ simply returns the size of the dataset, which we do by calling len on the input dataframe.
In [15]:
import torch
from torch.utils.data import Dataset
from torchvision import transforms


class ImagesDataset(Dataset):
    """Reads in an image, transforms pixel values, and serves
    a dictionary containing the image id, image tensors, and label.
    """

    def __init__(self, x_df, y_df=None):
        self.data = x_df
        self.label = y_df
        self.transform = transforms.Compose(
            [
                transforms.Resize((224, 224)),
                transforms.ToTensor(),
                transforms.Normalize(
                    mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)
                ),
            ]
        )

    def __getitem__(self, index):
        image = Image.open(self.data.iloc[index]["filepath"]).convert("RGB")
        image = self.transform(image)
        image_id = self.data.index[index]
        # if we don't have labels (e.g. for test set) just return the image and image id
        if self.label is None:
            sample = {"image_id": image_id, "image": image}
        else:
            label = torch.tensor(self.label.iloc[index].values, 
                                 dtype=torch.float)
            sample = {"image_id": image_id, "image": image, "label": label}
        return sample

    def __len__(self):
        return len(self.data)

Notice that we also define a set of transformations in the __init__ method and apply them in __getitem__, so each image is transformed before it is returned. Here's what each of those transformations does and why:

  • transforms.Resize((224, 224)) ResNet50 was trained on images of size 224x224 so we resize to the same dimensions here. See pytorch docs and the ResNet paper.
  • transforms.ToTensor() converts the image to a tensor. Since we are passing in a PIL Image at this point, PyTorch can recognize it as an RGB image and will automatically convert the input values which are in the range [0, 255] to a range of [0, 1]. See more from the PyTorch docs.
  • transforms.Normalize(...) normalizes the image tensors using the mean and standard deviation of ImageNet images. Because this transformation was applied to images when training the ResNet model, we want to do the same here with our images. See more from the PyTorch docs on pretrained models.

The DataLoader

Next, we need to load the dataset into a dataloader. The DataLoader class lets us iterate through our dataset in batches.

In [16]:
from torch.utils.data import DataLoader

train_dataset = ImagesDataset(x_train, y_train)
train_dataloader = DataLoader(train_dataset, batch_size=32)
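
As a quick sanity check (optional, and just a small sketch using the objects defined above), we can pull a single batch from the dataloader and confirm that the tensor shapes are what the model will expect:

batch = next(iter(train_dataloader))
print(batch["image"].shape)  # expect torch.Size([32, 3, 224, 224]): batch, channels, height, width
print(batch["label"].shape)  # expect torch.Size([32, 8]): batch, species classes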

The data pieces are now largely in place!

6. Training

Now it's time to start building our model and then training it.

We'll use a pretrained ResNet50 model as our backbone. ResNets are one of the more popular networks for image classification tasks. The pretrained model outputs a 2048-dimension embedding, which we will then connect to two more dense layers, with a ReLU and Dropout step in between.

These final layers, defined in model.fc, are the new "head" of our model, and allow us to transform the image embeddings produced by the pretrained "backbone" into the 8-dimensional output required to learn the species classification task we're tackling here. Prior to redefining it below, model.fc would be the final, dense layer connecting the 2048-dimension embedding to a 1000-dimension output (corresponding to the 1000 ImageNet classes that the pretrained model was trained on). We will instead prepare the model for the current task by redefining model.fc to produce an 8-dimensional output corresponding to our 8 species classes (including blanks).

We'll also add a couple more layers in between. The ReLU layer introduces non-linearity into the model head, in effect activating important features and suppressing noise. And the Dropout layer is a commonly used regularization component that randomly drops some nodes from the previous layer's outputs (10% of nodes in this case) during each training step, mitigating our risk of overfitting.

In [17]:
from torch import nn
import torchvision.models as models


model = models.resnet50(pretrained=True)
model.fc = nn.Sequential(
    nn.Linear(2048, 100),  # dense layer takes a 2048-dim input and outputs 100-dim
    nn.ReLU(inplace=True),  # ReLU activation introduces non-linearity
    nn.Dropout(0.1),  # common technique to mitigate overfitting
    nn.Linear(
        100, 8
    ),  # final dense layer outputs 8-dim corresponding to our target classes
)

Define our loss function and optimizer

Cross entropy loss (or log loss) is a commonly used loss function for multi-class (not multi-label) image classification. We'll use it to compute the loss for each training batch and then update our parameters accordingly, using a standard SGD optimizer with momentum.

In [18]:
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

Train the model

We're now ready to train our model!

We'll start simple and just run it for one epoch, but feel free to run it for more num_epochs if you've got the time. We hope to see a decreasing loss as training progresses, which will provide some evidence that the model is learning. Note that we haven't frozen any weights in the pretrained model, a choice you may want to revisit; we discuss it in a little more detail below.

For each epoch we'll iterate through the batches, and for each batch we'll do the following:

  1. Zero out the gradients. PyTorch will sum the gradients from past batches when doing its backward pass, so in order to make sure we are only using the gradients computed for the current batch, we zero out the gradients at the beginning of each batch.
  2. Run the forward pass.
  3. Compute the loss and track it.
  4. Compute our gradients and update our weight parameters.
In [19]:
num_epochs = 1

tracking_loss = {}

for epoch in range(1, num_epochs + 1):
    print(f"Starting epoch {epoch}")

    # iterate through the dataloader batches. tqdm keeps track of progress.
    for batch_n, batch in tqdm(
        enumerate(train_dataloader), total=len(train_dataloader)
    ):

        # 1) zero out the parameter gradients so that gradients from previous batches are not used in this step
        optimizer.zero_grad()

        # 2) run the forward step on this batch of images
        outputs = model(batch["image"])

        # 3) compute the loss
        loss = criterion(outputs, batch["label"])
        # let's keep track of the loss by epoch and batch
        tracking_loss[(epoch, batch_n)] = float(loss)

        # 4) compute our gradients
        loss.backward()
        # update our weights
        optimizer.step()
Starting epoch 1
100%|██████████| 194/194 [35:46<00:00, 11.06s/it]

Now let's plot the loss by epoch and batch. The x-axis here is a tuple of (epoch, batch).

In [20]:
tracking_loss = pd.Series(tracking_loss)

plt.figure(figsize=(10, 5))
tracking_loss.plot(alpha=0.2, label="loss")
tracking_loss.rolling(center=True, min_periods=1, window=10).mean().plot(
    label="loss (moving avg)"
)
plt.xlabel("(Epoch, Batch)")
plt.ylabel("Loss")
plt.legend(loc=0)
Out[20]:
<matplotlib.legend.Legend at 0x7f8ef4d0e370>

Good news, the loss is going down! This is an encouraging start, especially since we haven't done anything fancy yet.

Save the model

We already have the trained model in memory, so saving isn't strictly necessary, but it's often useful to write it to disk so we can reload it later without retraining.

Here's how:

In [21]:
torch.save(model, "model.pth")
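
Note that torch.save(model, ...) pickles the entire model object. A common alternative, sketched below with an illustrative filename (model_weights.pth), is to save only the weights via state_dict and rebuild the architecture before loading; this tends to be more robust to changes in your code:

# save only the learned weights
torch.save(model.state_dict(), "model_weights.pth")

# later: rebuild the same architecture, then load the weights back in
reloaded = models.resnet50(pretrained=True)
reloaded.fc = nn.Sequential(
    nn.Linear(2048, 100),
    nn.ReLU(inplace=True),
    nn.Dropout(0.1),
    nn.Linear(100, 8),
)
reloaded.load_state_dict(torch.load("model_weights.pth"))

For the rest of this walkthrough, we'll stick with the full-model save above.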

7. Evaluation

So far, not so bad. We've shown an improvement in the loss on the training set, but that tells us little about how our model will do on new data. Let's reload our saved model and try generating some predictions on the evaluation split we created earlier.

In [22]:
loaded_model = torch.load("model.pth")

We create the eval dataset and dataloader just like we did earlier with the training dataset and dataloader:

In [23]:
eval_dataset = ImagesDataset(x_eval, y_eval)
eval_dataloader = DataLoader(eval_dataset, batch_size=32)

Make predictions

We'll iterate through the eval dataloader in batches, just like we did for training, but this time we aren't going to need to compute gradients or update weights. For each batch, we'll do the following:

  1. Run the forward pass to get the model output or logits
  2. Apply a softmax function to convert the logits into probabilities in the range [0, 1]. During training, the softmax operation was handled internally by nn.CrossEntropyLoss. We aren't computing the loss now because we are just doing evaluation, but we still want the predictions to be in the range [0, 1].
  3. Store the results in a dataframe for further analysis
In [24]:
preds_collector = []

# put the model in eval mode so layers like dropout behave appropriately for inference
loaded_model.eval()

# we aren't updating our weights so no need to calculate gradients
with torch.no_grad():
    for batch in tqdm(eval_dataloader, total=len(eval_dataloader)):
        # 1) run the forward step
        logits = loaded_model.forward(batch["image"])
        # 2) apply softmax so that model outputs are in range [0,1]
        preds = nn.functional.softmax(logits, dim=1)
        # 3) store this batch's predictions in df
        # note that PyTorch Tensors need to first be detached from their computational graph before converting to numpy arrays
        preds_df = pd.DataFrame(
            preds.detach().numpy(),
            index=batch["image_id"],
            columns=species_labels,
        )
        preds_collector.append(preds_df)

eval_preds_df = pd.concat(preds_collector)
eval_preds_df
100%|██████████| 65/65 [03:59<00:00,  3.68s/it]
Out[24]:
antelope_duiker bird blank civet_genet hog leopard monkey_prosimian rodent
ZJ005376 0.171700 0.192484 0.176443 0.009301 0.030189 0.065699 0.318819 0.035365
ZJ011044 0.001978 0.002307 0.001948 0.000101 0.001681 0.990360 0.000906 0.000719
ZJ005242 0.210068 0.189871 0.121386 0.012617 0.021859 0.031385 0.358011 0.054803
ZJ004518 0.238102 0.253908 0.109069 0.008890 0.023841 0.037896 0.283491 0.044803
ZJ000101 0.283641 0.121174 0.159294 0.024840 0.041801 0.041539 0.260422 0.067289
... ... ... ... ... ... ... ... ...
ZJ011868 0.130162 0.207689 0.163008 0.021061 0.053528 0.120070 0.230920 0.073561
ZJ002183 0.247485 0.180399 0.125990 0.009920 0.018781 0.022285 0.352043 0.043098
ZJ014186 0.069278 0.339674 0.087901 0.004488 0.021389 0.341253 0.110601 0.025416
ZJ011633 0.223148 0.143238 0.156081 0.032510 0.046612 0.049409 0.277509 0.071493
ZJ001374 0.006084 0.005903 0.006707 0.000377 0.004165 0.968445 0.006105 0.002214

2061 rows × 8 columns

Predicted labels distribution

First let's review the species distribution we saw in the training set.

In [25]:
print("True labels (training):")
y_train.idxmax(axis=1).value_counts()
True labels (training):
Out[25]:
monkey_prosimian    973
antelope_duiker     925
civet_genet         896
blank               860
leopard             841
rodent              732
bird                608
hog                 348
dtype: int64

Here's the distribution of our predictions on the eval set.

In [26]:
print("Predicted labels (eval):")
eval_preds_df.idxmax(axis=1).value_counts()
Predicted labels (eval):
Out[26]:
monkey_prosimian    919
civet_genet         403
leopard             329
rodent              132
blank               122
antelope_duiker     106
bird                 48
hog                   2
dtype: int64

The actual evaluation set is more evenly distributed than our predictions, so we already know there is some room for improvement here.

In [27]:
print("True labels (eval):")
y_eval.idxmax(axis=1).value_counts()
True labels (eval):
Out[27]:
monkey_prosimian    325
antelope_duiker     308
civet_genet         298
blank               287
leopard             280
rodent              244
bird                203
hog                 116
dtype: int64

Accuracy

Now let's compute how accurate our model is and compare that against some trivial baseline models. First let's get the labels with the highest score for each image.

In [28]:
eval_predictions = eval_preds_df.idxmax(axis=1)
eval_predictions.head()
Out[28]:
ZJ005376    monkey_prosimian
ZJ011044             leopard
ZJ005242    monkey_prosimian
ZJ004518    monkey_prosimian
ZJ000101     antelope_duiker
dtype: object

Random guessing across 8 classes would yield an accuracy of 12.5% (1/8). But we could construct a slightly better trivial model by always guessing the most common class ("monkey_prosimian" images in this case).

If we were to always guess that an image is monkey_prosimian, we could achieve accuracy of 15.8%.

In [29]:
eval_true = y_eval.idxmax(axis=1)

(eval_true == "monkey_prosimian").sum() / len(eval_predictions)
Out[29]:
0.1576904415332363

Let's see how our model compares. We take the species with the highest score for each image (eval_predictions) and compare that to the true labels.

In [30]:
correct = (eval_predictions == eval_true).sum()
accuracy = correct / len(eval_predictions)
accuracy
Out[30]:
0.49199417758369723

Our accuracy on the evaluation set is about 50%, which is not a bad start for a very simple first pass and one epoch of training.

Let's look at the predictions from another angle.

We can see from the confusion matrix below that our model does reasonably well on some species, but we have plenty of room for improvement on antelopes, birds, hogs and blanks.

In [31]:
from sklearn.metrics import ConfusionMatrixDisplay

fig, ax = plt.subplots(figsize=(10, 10))
cm = ConfusionMatrixDisplay.from_predictions(
    y_eval.idxmax(axis=1),
    eval_preds_df.idxmax(axis=1),
    ax=ax,
    xticks_rotation=90,
    colorbar=True,
)

That's where you come in! What can you do to improve on this benchmark?

Here are some ideas you might want to try:

  • Train on the full training dataset. We've only used 50% of the training data so far.
  • Train for more epochs. We've only done 1 so far.
  • Try another pretrained model. For example, you may have more success with EfficientNet, or another ResNet model with more layers like ResNet152. See what's available in torchvision.models. You may also want to review which models are or have been state of the art for image classification tasks, for example on paperswithcode.com. Keep in mind that different models will require different input and output dimensions, so you'll need to update how you construct model above.
  • Experiment with different loss functions.
  • Experiment with different learning rates or learning rate schedulers.
  • Add more layers to the model head (model.fc).
  • You may also want to consider freezing the weights in the backbone model and only training the head (model.fc). If this results in higher accuracy, that suggests the current approach may be overwriting the backbone weights in a problematic way. One approach would be to train just the model head first, and then unfreeze the backbone and continue training at a lower learning rate. (See the sketch after this list.)
  • Training will be much faster using GPUs, but you will need to make some small tweaks to the code, such as moving the model and each batch onto the GPU device (also shown in the sketch below).
  • As you become more comfortable iterating through different versions of the model, you may want to try out PyTorch Lightning or Lightning Flash, which build upon PyTorch and eliminate a lot of boilerplate code, in addition to providing a more complete research framework for deep learning problems.
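
To make the freezing and GPU ideas above more concrete, here's a rough sketch (assuming the model, criterion and optimizer definitions from section 6, and a CUDA-capable machine for the GPU part), not a definitive recipe:

# freeze every parameter in the pretrained backbone, then re-enable
# gradients for the head so that only model.fc is updated during training
for param in model.parameters():
    param.requires_grad = False
for param in model.fc.parameters():
    param.requires_grad = True

# give the optimizer only the trainable (head) parameters
optimizer = optim.SGD(model.fc.parameters(), lr=0.001, momentum=0.9)

# move the model to a GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# inside the training loop, each batch then needs to be moved to the same device:
#     outputs = model(batch["image"].to(device))
#     loss = criterion(outputs, batch["label"].to(device))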

8. Create submission

Last but not least, we'll want to participate in the competition and see where we stand on the leaderboard.

To do this we need to create predictions for the competition test set (not the eval set we used above), for which we don't have labels.

We'll create predictions in the same way we did for the eval set, but this time using the test_features we downloaded from the competition website.

In [32]:
test_dataset = ImagesDataset(test_features.filepath.to_frame())
test_dataloader = DataLoader(test_dataset, batch_size=32)
In [33]:
preds_collector = []

# put the model in eval mode so layers like dropout behave appropriately for inference
model.eval()

# we aren't updating our weights so no need to calculate gradients
with torch.no_grad():
    for batch in tqdm(test_dataloader, total=len(test_dataloader)):
        # run the forward step
        logits = model.forward(batch["image"])
        # apply softmax so that model outputs are in range [0,1]
        preds = nn.functional.softmax(logits, dim=1)
        # store this batch's predictions in df
        # note that PyTorch Tensors need to first be detached from their computational graph before converting to numpy arrays
        preds_df = pd.DataFrame(
            preds.detach().numpy(),
            index=batch["image_id"],
            columns=species_labels,
        )
        preds_collector.append(preds_df)

submission_df = pd.concat(preds_collector)
submission_df
100%|██████████| 140/140 [09:19<00:00,  4.00s/it]
Out[33]:
antelope_duiker bird blank civet_genet hog leopard monkey_prosimian rodent
ZJ016488 0.040287 0.016456 0.069033 0.639093 0.018142 0.018959 0.015982 0.182047
ZJ016489 0.257348 0.117252 0.138006 0.024684 0.049720 0.057412 0.294213 0.061365
ZJ016490 0.242068 0.075779 0.177219 0.098522 0.061314 0.060584 0.160722 0.123793
ZJ016491 0.009359 0.010228 0.009832 0.000861 0.008720 0.952021 0.005001 0.003978
ZJ016492 0.242449 0.117502 0.119074 0.007304 0.017310 0.022663 0.443947 0.029752
... ... ... ... ... ... ... ... ...
ZJ020947 0.176772 0.180421 0.153451 0.008706 0.040353 0.151517 0.258835 0.029944
ZJ020948 0.270863 0.149436 0.135711 0.009466 0.028289 0.054232 0.311244 0.040758
ZJ020949 0.072901 0.022102 0.097448 0.489528 0.043097 0.028081 0.024680 0.222163
ZJ020950 0.167972 0.235275 0.123283 0.008515 0.028146 0.047482 0.340371 0.048957
ZJ020951 0.002522 0.006116 0.004133 0.000110 0.002457 0.980648 0.003158 0.000856

4464 rows × 8 columns

Let's check a couple things on submission_df before submitting to the platform. We'll want to make sure our submission's index and column labels match the submission format. (The DrivenData platform will do these data integrity checks as well, but it will be quicker to detect problems this way.)

In [34]:
submission_format = pd.read_csv("submission_format.csv", index_col="id")

assert all(submission_df.index == submission_format.index)
assert all(submission_df.columns == submission_format.columns)

Looks like we're ready to submit! Save the dataframe out to a CSV file and then upload it via the Submissions page on the competition website.

In [35]:
submission_df.to_csv("submission_df.csv")

How did we do? We should get a score of ~1.8, though your results may differ slightly due to non-determinism in model training. (For reference, a randomly generated submission yields a score of something like ~2.4.)

benchmark submission

Now it is up to you to improve on this benchmark!

Head over to the competition for data and more background info, or the competition forum if you have any questions. Good luck!