AI for insect detection & classification

A simple, visual walkthrough of how AI can spot and identify insects using motion detection and YOLO.

After conducting the AI for Insect Detection & Classification Workshop at The Pennsylvania State University, I felt motivated to share some of the key ideas more widely.

This blog post is designed for readers with little to no coding experience. The goal is to explain how simple motion detection and modern object detection models like YOLO can be used to monitor insects — using visuals and plain language, without diving into heavy math or programming details. The images and videos in this post come from the InsectEye system.

I hope you find this exploration both accessible and interesting!

Introduction

Step into a field at dawn, and you’ll hear a quiet symphony of tiny wings.
Some belong to helpful pollinators 🐝, while others are hungry pests 🐛.

Catching these insects early can protect crops, reduce chemical sprays, and help us understand how seasons and climate shape insect life. But how do you do that without draining batteries or overwhelming tiny edge devices?

We use a two-step camera system that’s smart and simple:

  1. Notice insect motion — “Something moved.”
  2. Detect pests with YOLO — “Which insect is it?”

Let’s dive into how motion spotting and fast object detection help keep a watchful eye on the fields — one tiny wingbeat at a time.
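If you are curious what that two-step loop looks like in code, here is a minimal Python sketch. It is not the actual InsectEye code: the video path and model file are placeholders, the motion step uses OpenCV's built-in background subtractor (a simpler do-it-yourself version is explained in the next section), and the detection step assumes a YOLO model trained on insect images via the Ultralytics package.

```python
import cv2
from ultralytics import YOLO

video = cv2.VideoCapture("field_camera.mp4")   # placeholder video file
model = YOLO("insect_yolo.pt")                 # placeholder insect model

# Step 1 helper: OpenCV's built-in background subtractor flags moving pixels.
bg_subtractor = cv2.createBackgroundSubtractorMOG2()

while True:
    ok, frame = video.read()
    if not ok:
        break

    # "Something moved": count how many pixels changed in this frame.
    motion_mask = bg_subtractor.apply(frame)
    moving_pixels = cv2.countNonZero(motion_mask)

    # "Which insect is it?": only run YOLO when there is enough motion.
    if moving_pixels > 500:                    # illustrative threshold
        results = model(frame)
        for box in results[0].boxes:
            print(box.cls, box.conf, box.xyxy)

video.release()
```

The 500-pixel trigger is just a stand-in; in practice the right value depends on the image size and how sensitive you want the system to be.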


Motion detection for insects

Insect GIF
Motion Mask GIF
Fig. 1: Animated GIFs showing the insect video sequence (top) and the corresponding motion masks (bottom).

When a camera sits still, most of the scene stays the same. If a handful of pixels suddenly change, that signals motion. Here’s a simple background subtraction method to pick out moving insects:

  1. Build a background model
    Watch the scene for a few seconds with no insects, then average those frames to create a ‘clean’ background image. Alternatively, use a single snapshot without any bugs.
    Background Image
    Fig. 2: Background model for our video sequence.
  2. Frame difference
    • Subtract the background model from the current frame to get a difference image.
    • Large pixel differences usually mean motion.
    • Threshold the difference image into a black-and-white motion mask (white = moving pixels; black = background).
Motion Detection Mask
Fig. 3: Difference mask obtained by subtracting the background model from the current frame and then thresholding.
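For readers who want to see these three steps spelled out, here is a minimal Python sketch using OpenCV and NumPy. The filenames and the threshold value of 25 are illustrative placeholders, not settings from the InsectEye system.

```python
import cv2
import numpy as np

# Step 1: average a handful of insect-free frames into a 'clean' background model.
empty_frames = [cv2.imread(f"empty_{i}.png", cv2.IMREAD_GRAYSCALE) for i in range(10)]
background = np.mean(empty_frames, axis=0).astype(np.uint8)

# Step 2: subtract the background model from the current frame.
frame = cv2.imread("current_frame.png", cv2.IMREAD_GRAYSCALE)
difference = cv2.absdiff(frame, background)

# Step 3: threshold the difference into a black-and-white motion mask
# (white = moving pixels, black = background).
_, motion_mask = cv2.threshold(difference, 25, 255, cv2.THRESH_BINARY)

cv2.imwrite("motion_mask.png", motion_mask)
```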

Motion detection is all around us: security cameras that only record when something moves, hallway lights that switch on when you walk by, and automatic doors that open as you approach all rely on the same idea.


Insect detection with YOLO

Motion detection points out where to look, and YOLO tells us what is moving. YOLO (short for "You Only Look Once") is an object detection model that draws boxes around objects and labels them in a single pass over the image. That single pass makes it fast enough for the tiny, low-power devices we use in the field, which is why it is a good fit for insect monitoring.

We now describe the original version of YOLO. Although many improvements have been made across subsequent versions, the core ideas outlined below remain the same.

The quick YOLO tour

  1. Candidate frames
    After motion detection, we extract snapshots that might contain insects.
    Input Image
    Fig. 4: Input image for insect detection, obtained from motion detection.
  2. Resize & grid
    Resize each snapshot to 448×448 pixels and overlay a 7×7 grid.
    Cropped image with a 7×7 grid overlaid
    Fig. 5: A 7×7 grid overlaid on the cropped image.
  3. One glance, many predictions
    For each grid cell, YOLO’s neural network predicts several bounding boxes, confidence scores, and class probabilities for different insects.
    YOLO grid cell predictions
    Fig. 6: The green box has a high confidence score and class probability; the red boxes are low confidence.
  4. Keep the best
    Non-Maximum Suppression (NMS) keeps the highest-scoring box and removes overlapping, lower-scoring ones.
    Non Maximum Suppression process
    Fig. 7: NMS removes duplicate boxes, leaving one clear box per object.

    After predicting many boxes, NMS:

    • Keeps the box with the highest confidence score.
    • Removes boxes that overlap too much (measured by Intersection over Union, or IoU).

    This leaves one clean box per insect. In our diagram, the green box remains and the yellow ones are discarded. (A small code sketch of this step follows the list.)

  5. Final output
    We get the insect’s class, a confidence score, and the bounding box coordinates.
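For readers who want to peek under the hood, here is a small, self-contained Python sketch of IoU and NMS. Real YOLO libraries do this internally; the boxes and scores below are made up for illustration.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    intersection = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return intersection / (area_a + area_b - intersection)

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring boxes and drop ones that overlap them too much."""
    order = np.argsort(scores)[::-1]   # indices sorted by confidence, best first
    keep = []
    while len(order) > 0:
        best = order[0]
        keep.append(int(best))
        # Drop any remaining box that overlaps the kept box above the threshold.
        order = np.array([i for i in order[1:]
                          if iou(boxes[best], boxes[i]) < iou_threshold])
    return keep

# Toy example: two overlapping boxes around the same insect, plus one far away.
boxes = [(10, 10, 60, 60), (12, 14, 64, 66), (200, 200, 250, 250)]
scores = [0.9, 0.6, 0.8]
print(non_max_suppression(boxes, scores))   # -> [0, 2]
```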

You might be wondering:

  1. How are the bounding boxes represented?
    Bounding box representation
    Fig. 8: Bounding box representation.
    • Each box is defined by (cx, cy, w, h).
    • (cx, cy) is the box center, normalized within its grid cell.
    • (w, h) is the width and height, relative to the full image (see the short sketch after this list).
  2. What if the insect overlaps two grid cells?
    Overlapping grid cells
    Fig. 9: Handling overlapping grid cells.
    • Only the cell (marked as 1) containing the insect’s center (marked by a star) makes the prediction.
    • Neighboring cells ignore this insect and focus on objects whose centers lie inside them.
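To make the box representation concrete, here is a small Python sketch that converts one prediction back into pixel coordinates, assuming the 448×448 image and 7×7 grid described above. The specific numbers are made up for illustration.

```python
IMAGE_SIZE = 448                     # pixels per side after resizing
GRID_SIZE = 7                        # grid cells per side
CELL_SIZE = IMAGE_SIZE / GRID_SIZE   # 64 pixels per grid cell

def decode_box(row, col, cx, cy, w, h):
    """Turn a YOLO-style prediction into pixel corner coordinates.

    (cx, cy) is the box center, normalized within grid cell (row, col);
    (w, h) is the box size relative to the full image.
    """
    center_x = (col + cx) * CELL_SIZE
    center_y = (row + cy) * CELL_SIZE
    box_w = w * IMAGE_SIZE
    box_h = h * IMAGE_SIZE
    return (center_x - box_w / 2, center_y - box_h / 2,
            center_x + box_w / 2, center_y + box_h / 2)

# Made-up example: the cell in row 3, column 4 predicts a box centered in the cell.
print(decode_box(row=3, col=4, cx=0.5, cy=0.5, w=0.1, h=0.15))
# -> roughly (265.6, 190.4, 310.4, 257.6) in pixels
```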

Conclusion & next steps

Combining motion detection with YOLO creates a nimble tag-team. On a 90,000-frame video, motion detection trimmed the candidates down to just 1,000 frames, so YOLO only had to inspect roughly 1% of the footage. This is why pairing motion and object detection is essential for bio-monitoring systems that run on tight compute and energy budgets.

Next on our roadmap:

Have ideas, questions, or cool bug clips? Drop a comment below — we’d love to chat! 🪲👋