Training

We provide a train.py script to train detector and segmenter models on preprocessed datasets. You must download the datasets first (see Data).

Prerequisites

The training pipeline logs to Weights & Biases, so `wandb` must be installed and configured (for example, logged in with `wandb login`) before you start training.
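Since the pipeline requires wandb, a quick pre-flight check can catch a missing setup before a run starts. The sketch below is hypothetical and not part of `canopyrs`. It only checks that the `wandb` package is importable and whether the `WANDB_API_KEY` environment variable (which wandb reads) is set. It does not verify that your credentials are valid.

```python
import importlib.util
import os
from typing import Tuple


def wandb_ready() -> Tuple[bool, str]:
    """Hypothetical pre-flight check: is wandb usable for logging on this machine?"""
    # Is the wandb package importable in the current Python environment?
    if importlib.util.find_spec("wandb") is None:
        return False, "wandb is not installed; install it with `pip install wandb`"
    # wandb reads an API key from the WANDB_API_KEY environment variable. A missing
    # variable is not fatal: `wandb login` stores the key locally instead.
    if not os.environ.get("WANDB_API_KEY"):
        return True, "wandb is installed; run `wandb login` if you have not already"
    return True, "wandb is installed and WANDB_API_KEY is set"


if __name__ == "__main__":
    ok, message = wandb_ready()
    print(message)
```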

Detector training

To train a detector, copy and modify one of the config files under `canopyrs/config/detectors/`. For example, start from `dino_swinL_multi_NQOS.yaml`.

Workflow

Download datasets:

```
python -m canopyrs.tools.detection.download_datasets -d SelvaBox Detectree2 -o /data
```

Copy and modify a detector config:
```
cp canopyrs/config/detectors/dino_swinL_multi_NQOS.yaml canopyrs/config/detectors/my_detector.yaml
```
Edit `my_detector.yaml` and update the training-specific fields marked with TODO:
- `data_root_path`: path to your dataset root folder
- `train_output_path`: path for model checkpoints and logs
- `train_dataset_names` / `valid_dataset_names`: location folders to train and validate on (see Data for details on the folder structure)
- `wandb_project`: your wandb project name
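Before launching, you can confirm that no TODO markers are left in your copy of the config. This is a hypothetical helper that assumes the markers appear as plain text (for example, YAML comments) on the lines to edit; the exact placement in the shipped configs may differ. It scans the raw file text, so it needs no YAML parser.

```python
import re
from typing import List, Tuple

# Matches the standalone word TODO anywhere on a line (e.g. inside a YAML comment).
TODO_PATTERN = re.compile(r"\bTODO\b")


def find_remaining_todos(config_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, stripped_line) pairs that still contain a TODO marker."""
    return [
        (number, line.strip())
        for number, line in enumerate(config_text.splitlines(), start=1)
        if TODO_PATTERN.search(line)
    ]
```

For example, `find_remaining_todos(Path("canopyrs/config/detectors/my_detector.yaml").read_text())` (with `from pathlib import Path`) returns an empty list once every marker is gone. Because it flags any line that still mentions TODO, delete each marker after you fill in its field.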

Run training:

Linux / macOS:

```
python train.py \
  -m detector \
  -c canopyrs/config/detectors/my_detector.yaml
```

Windows (PowerShell):

```
python train.py `
  -m detector `
  -c canopyrs/config/detectors/my_detector.yaml
```
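If you launch runs from a Python script or notebook, you can build the same invocation as an argument list. This avoids the different line-continuation characters of bash (`\`) and PowerShell (`` ` ``). The helpers below are hypothetical and not part of `canopyrs`. They assume you run them from the repository root, where `train.py` lives, and they use only the `-m` and `-c` flags from the commands above.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_train_command(mode: str, config: Path) -> List[str]:
    """Build the train.py invocation as an argument list (no shell involved)."""
    # sys.executable runs train.py with the same interpreter (and virtualenv) as this script.
    return [sys.executable, "train.py", "-m", mode, "-c", str(config)]


def run_training(mode: str, config: Path) -> int:
    """Launch training and return the process exit code."""
    if not config.is_file():
        raise FileNotFoundError(f"config not found: {config}")
    # Passing a list without shell=True sidesteps shell-specific quoting and line continuations.
    return subprocess.run(build_train_command(mode, config), check=False).returncode
```

Usage: `run_training("detector", Path("canopyrs/config/detectors/my_detector.yaml"))`.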

Configuration reference

Model parameters

| Parameter | Description |
|---|---|
| `model` | Model type: `dino_detrex` for detrex-based DINO models, or `faster_rcnn_detectron2` for detectron2-based Faster R-CNN models |
| `architecture` | Model architecture (see Supported architectures below) |
| `checkpoint_path` | Path to a pretrained model checkpoint. Keep our pretrained checkpoint to fine-tune it, or replace it with a detrex COCO checkpoint. If left as `null` for a detectron2 model (Faster R-CNN), a pretrained COCO checkpoint is downloaded automatically. |

Supported architectures

| Model type | Architecture |
|---|---|
| DINO (Swin-L) | `dino-swin/dino_swin_large_384_5scale_36ep.py` |
| DINO (ResNet-50) | `dino-resnet/dino_r50_4scale_24ep.py` |
| Faster R-CNN | `COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml` |
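The two tables above constrain which `model`, `architecture`, and `checkpoint_path` values belong together. The hypothetical check below encodes only those pairings. Other architectures may work if the table is not exhaustive. Because automatic checkpoint download is documented only for detectron2 models, the sketch conservatively flags a `null` checkpoint for DINO.

```python
from typing import List, Optional

# Pairings taken from the "Model parameters" and "Supported architectures" tables.
SUPPORTED_ARCHITECTURES = {
    "dino_detrex": {
        "dino-swin/dino_swin_large_384_5scale_36ep.py",
        "dino-resnet/dino_r50_4scale_24ep.py",
    },
    "faster_rcnn_detectron2": {
        "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml",
    },
}


def check_model_config(model: str, architecture: str, checkpoint_path: Optional[str]) -> List[str]:
    """Return a list of problems with the model section of a detector config (empty if none)."""
    problems: List[str] = []
    if model not in SUPPORTED_ARCHITECTURES:
        problems.append(f"unknown model type: {model!r}")
        return problems
    if architecture not in SUPPORTED_ARCHITECTURES[model]:
        problems.append(f"{architecture!r} is not a listed architecture for {model!r}")
    # Only detectron2 models are documented to download a COCO checkpoint when this is null.
    if checkpoint_path is None and model != "faster_rcnn_detectron2":
        problems.append(f"{model!r} needs an explicit checkpoint_path")
    return problems
```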

Data parameters

| Parameter | Description |
|---|---|
| `data_root_path` | Path to your dataset root folder (the `<DATA_ROOT>` folder where the extracted datasets are located) |
| `train_dataset_names` | List of location folder names to train on |
| `valid_dataset_names` | List of location folder names to validate on |
| `train_output_path` | Path to the output folder for model checkpoints and logs |
| `wandb_project` | Name of the wandb project to log to |

Dataset locations

SelvaBox has three locations:

- `brazil_zf2`
- `ecuador_tiputini`
- `panama_aguasalud`

Detectree2 has one location:

- `malaysia_detectree2`

You can choose to train on all locations or a subset of them.
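Because `train_dataset_names` and `valid_dataset_names` are plain folder names, a misspelled location is easy to miss. The hypothetical helpers below hard-code the locations listed above. They can expand a dataset into all of its locations or report names that match none of them. Any dataset you add yourself would need its own entry in the mapping.

```python
from typing import List

# Location folders listed in the "Dataset locations" section.
DATASET_LOCATIONS = {
    "SelvaBox": ["brazil_zf2", "ecuador_tiputini", "panama_aguasalud"],
    "Detectree2": ["malaysia_detectree2"],
}
KNOWN_LOCATIONS = {loc for locs in DATASET_LOCATIONS.values() for loc in locs}


def locations_for(*datasets: str) -> List[str]:
    """Expand dataset names (e.g. "SelvaBox") into their location folder names."""
    return [loc for dataset in datasets for loc in DATASET_LOCATIONS[dataset]]


def unknown_locations(names: List[str]) -> List[str]:
    """Return the entries of `names` that are not known location folders (catches typos)."""
    return [name for name in names if name not in KNOWN_LOCATIONS]
```

For example, `locations_for("SelvaBox", "Detectree2")` gives all four locations, and `unknown_locations(["brazil_zf2", "panama_agua_salud"])` flags the misspelled second entry.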

Other parameters

The config file also exposes other training parameters, such as `batch_size` and `lr`, which you can adjust as needed.

Segmenter training

To train a segmenter, copy and modify one of the config files under `canopyrs/config/segmenters/`. For example, start from `mask2former_swinL_multi_selvamask.yaml`.