Motivation
Most image classification research assumes access to large, curated datasets and GPU clusters. This project investigates the opposite scenario: what happens when you have fewer than 100 images, no specialized hardware, and need to distinguish between two visually similar species — Eastern cottontail rabbits and European hares?
Approach
A set of 85 images was scraped from the web and split using stratified k-fold cross-validation to ensure balanced evaluation despite the small, uneven dataset. The CNN architecture was kept deliberately simple to avoid overfitting, with dropout regularization as the primary defense.
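A stratified split keeps each fold's class ratio close to the full dataset's. The sketch below illustrates this with scikit-learn's `StratifiedKFold`; the 48/37 class counts are hypothetical stand-ins for the real (unstated) balance of the 85 images.

```python
from sklearn.model_selection import StratifiedKFold
import numpy as np

labels = np.array([0] * 48 + [1] * 37)  # hypothetical uneven 85-image dataset
images = np.arange(len(labels))          # stand-ins for the actual image tensors

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(skf.split(images, labels)):
    # Each held-out fold preserves the ~48:37 class ratio of the full set.
    hare_fraction = labels[test_idx].mean()
    print(f"fold {fold}: {len(train_idx)} train / {len(test_idx)} test, "
          f"class-1 fraction {hare_fraction:.2f}")
```

Without stratification, a random split of a dataset this small can easily leave one class nearly absent from a fold, making that fold's metrics meaningless.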
A systematic search was conducted over optimizers (Adam, SGD, SGD with momentum), batch sizes (2, 4, and 8), and architecture depth. Each configuration was evaluated across all folds to produce more reliable metrics.
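The sweep described above can be sketched as a grid loop that averages each configuration's score across folds. This is a structural sketch only: `train_and_eval` is a hypothetical placeholder for the actual training routine, which is not shown in this writeup.

```python
from itertools import product

optimizers = ["adam", "sgd", "sgd_momentum"]
batch_sizes = [2, 4, 8]
n_folds = 5

def train_and_eval(optimizer: str, batch_size: int, fold: int) -> float:
    # Hypothetical stand-in: a real implementation would train the CNN on
    # this fold's training split and return accuracy on its held-out split.
    return 0.5

results = {}
for optimizer, batch_size in product(optimizers, batch_sizes):
    fold_scores = [train_and_eval(optimizer, batch_size, f)
                   for f in range(n_folds)]
    # Rank configurations by their mean score across all folds, so no single
    # lucky (or unlucky) split decides the winner.
    results[(optimizer, batch_size)] = sum(fold_scores) / n_folds

best_config = max(results, key=results.get)
```

Averaging over folds is what makes the comparison meaningful here: with 85 images, a single train/test split would leave the ranking dominated by sampling noise.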
Results
The best configuration — SGD with momentum and a batch size of 2 — achieved:
- Test accuracy: 0.647
- F1 score: 0.575
- Precision: 0.8
- Recall: 0.625
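With k-fold evaluation, metrics like these are typically computed per fold and then averaged; note that an averaged F1 need not equal 2PR/(P+R) of the averaged precision and recall, so the figures above can coexist. A minimal sketch of that aggregation with scikit-learn, using hypothetical fold predictions rather than the project's actual outputs:

```python
from sklearn.metrics import precision_recall_fscore_support
import numpy as np

# Hypothetical (y_true, y_pred) pairs for two folds, for illustration only.
fold_results = [
    (np.array([0, 0, 1, 1, 1]), np.array([0, 1, 1, 1, 0])),
    (np.array([0, 1, 1, 0, 1]), np.array([0, 1, 0, 0, 1])),
]

precisions, recalls, f1s = [], [], []
for y_true, y_pred in fold_results:
    # Compute binary-class metrics on this fold's held-out predictions.
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary", zero_division=0)
    precisions.append(p)
    recalls.append(r)
    f1s.append(f1)

# Report the mean of each metric across folds.
print(f"precision={np.mean(precisions):.3f} "
      f"recall={np.mean(recalls):.3f} f1={np.mean(f1s):.3f}")
```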
Smaller batch sizes consistently outperformed larger ones, likely because they introduced beneficial noise during gradient updates on such a small dataset.
Key Takeaway
Architecture and regularization can only compensate so much for insufficient data. Despite exploring a wide range of hyperparameter combinations and applying standard small-dataset techniques, accuracy plateaued well below what would be achievable with a larger corpus. The fundamental bottleneck is data quantity — not model capacity or training strategy.