Dataset
- class geodataset.dataset.BaseDataset[source]
Bases: ABC
Abstract class for a dataset. Requires the implementation of methods needed for use with PyTorch’s DataLoader: __getitem__, __len__ and __iter__.
- SUPPORTED_IMG_EXTENSIONS = ['tif', 'png', 'jpg', 'jpeg']
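The contract described for BaseDataset can be pictured with a minimal sketch that uses only the standard library. The names `SketchBaseDataset` and `ListDataset` are hypothetical and are not part of geodataset; the real BaseDataset may declare its abstract methods differently.

```python
from abc import ABC, abstractmethod


class SketchBaseDataset(ABC):
    """Hypothetical stand-in for an abstract dataset base class."""

    @abstractmethod
    def __getitem__(self, idx):
        ...

    @abstractmethod
    def __len__(self):
        ...

    @abstractmethod
    def __iter__(self):
        ...


class ListDataset(SketchBaseDataset):
    """Toy subclass that serves items from an in-memory list."""

    def __init__(self, items):
        self.items = list(items)

    def __getitem__(self, idx):
        return self.items[idx]

    def __len__(self):
        return len(self.items)

    def __iter__(self):
        # Yield items in index order, going through __getitem__.
        for i in range(len(self)):
            yield self[i]


dataset = ListDataset(["tile_a", "tile_b", "tile_c"])
```

Because all three methods are abstract in this sketch, instantiating `SketchBaseDataset` directly raises a `TypeError`, while `ListDataset` can be indexed, measured with `len()`, and iterated.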
- class geodataset.dataset.UnlabeledRasterDataset(root_path, transform=None, fold=None)[source]
Bases: BaseDataset
A dataset class for loading unlabeled raster tiles. It will recursively search for all ‘.tif’ files in the specified root and its sub-folders.
It can directly be used with a torch.utils.data.DataLoader.
- Parameters:
fold (str) – The dataset fold to load (e.g., ‘train’, ‘valid’, ‘test’…). This parameter is not used in this class, but is kept for consistency with the other dataset classes.
root_path (str or List[str] or pathlib.Path or List[pathlib.Path]) – The root directory of the dataset.
transform (albumentations.core.composition.Compose) – A composition of transformations to apply to the tiles and their associated annotations (applied in __getitem__).
- __getitem__(idx)[source]
Retrieves a tile by index, applying the transform passed to the constructor of the class, if any. It also normalizes the tile data between 0 and 1.
- Parameters:
idx (int) – The index of the tile to retrieve
- Return type:
Union[ndarray, Tuple[ndarray, Any]]
- Returns:
numpy.ndarray – The transformed tile (image) data, normalized between 0 and 1.
If include_polygon_id is True, returns a tuple of (transformed_image, polygon_id).
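The recursive search for ‘.tif’ files described for UnlabeledRasterDataset could look roughly like the following sketch. The helper `find_tif_tiles` is hypothetical and is not the library's implementation; it only illustrates accepting either a single root or a list of roots, as the root_path parameter does.

```python
import tempfile
from pathlib import Path


def find_tif_tiles(root_path):
    """Recursively collect '.tif' files under one or several root directories."""
    roots = root_path if isinstance(root_path, (list, tuple)) else [root_path]
    tiles = []
    for root in roots:
        # rglob walks the root and all of its sub-folders.
        tiles.extend(sorted(Path(root).rglob("*.tif")))
    return tiles


# Build a small throwaway folder tree to exercise the helper.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "site_1" / "tiles").mkdir(parents=True)
    (Path(tmp) / "site_1" / "tiles" / "a.tif").touch()
    (Path(tmp) / "b.tif").touch()
    (Path(tmp) / "notes.txt").touch()
    found_names = sorted(p.name for p in find_tif_tiles(tmp))
```

Here `found_names` contains only the two ‘.tif’ files; the text file is ignored.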
- class geodataset.dataset.BaseLabeledRasterCocoDataset(fold, root_path, transform=None, other_attributes_names_to_pass=None)[source]
Bases: BaseDataset, ABC
Abstract class for a dataset that loads COCO datasets and their associated tiles (images). It will recursively search for COCO json files and image tiles (.tif, .png…) in the specified root folder and its sub-folders. The COCO json files should follow the naming convention defined in the utils.CocoNameConvention class. COCO jsons generated by this library should automatically follow this convention.
This class implements the __len__ and __iter__ methods, and requires the implementation of the __getitem__ method. It is a good starting point for creating your own custom dataset if the ones provided in this library (e.g., DetectionLabeledRasterCocoDataset, SegmentationLabeledRasterCocoDataset, …) do not fit your needs.
- Parameters:
fold (str) – The dataset fold to load (e.g., ‘train’, ‘valid’, ‘test’…).
root_path (str or List[str] or pathlib.Path or List[pathlib.Path]) – The root directory of the dataset.
transform (albumentations.core.composition.Compose) – A composition of transformations to apply to the tiles.
other_attributes_names_to_pass (List[str]) – A list of names of other COCO annotation attributes to return when iterating over the dataset (like a global_id, confidence_score…).
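As a rough illustration of what passing extra attribute names implies, the sketch below picks the requested keys out of a COCO-style annotation dictionary. The helper `select_attributes` and the flat layout of the annotation are assumptions made for this example; where geodataset actually stores and returns these attributes is not specified here.

```python
def select_attributes(annotation, names):
    """Return the requested attributes that are present in an annotation dict."""
    return {name: annotation[name] for name in names if name in annotation}


# Hypothetical annotation with two extra attributes next to the standard COCO keys.
annotation = {
    "id": 7,
    "category_id": 1,
    "bbox": [10, 10, 20, 30],
    "global_id": 42,
    "confidence_score": 0.9,
}
extra = select_attributes(annotation, ["global_id", "confidence_score", "missing_key"])
```

Names that are absent from the annotation are skipped in this sketch rather than raising an error; a real implementation may choose differently.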
- class geodataset.dataset.DetectionLabeledRasterCocoDataset(fold, root_path, transform=None, box_padding_percentage=0.0, force_binary_class=None, other_attributes_names_to_pass=None)[source]
Bases: BaseLabeledRasterCocoDataset
A dataset class that loads COCO datasets and their associated tiles (images). It will recursively search for COCO json files and .tif tiles in the specified root folder and its sub-folders. The COCO json files should follow the naming convention defined in the CocoNameConvention class. COCO jsons generated by this library should automatically follow this convention.
Can be used for object detection tasks, where the annotations are bounding boxes OR segmentations (in this case this class will only use the bounding box of the segmentation).
It can directly be used with a torch.utils.data.DataLoader.
- Parameters:
fold (str) – The dataset fold to load (e.g., ‘train’, ‘valid’, ‘test’…).
root_path (str or List[str] or pathlib.Path or List[pathlib.Path]) – The root directory of the dataset.
transform (albumentations.core.composition.Compose) – A composition of transformations to apply to the tiles and their associated annotations (applied in __getitem__).
other_attributes_names_to_pass (List[str]) – A list of names of other COCO annotation attributes to return when iterating over the dataset (like a global_id, confidence_score…).
- __getitem__(idx)[source]
Retrieves a tile and its annotations by index, applying the transform passed to the constructor of the class, if any. It also normalizes the tile data between 0 and 1.
- Parameters:
idx (int) – The index of the tile to retrieve
- Returns:
The transformed tile (image) data, normalized between 0 and 1, and a dictionary containing the annotations and metadata of the tile. The dictionary has the following keys:
boxes (list of numpy.ndarray): A list of bounding boxes for the annotations.
labels (numpy.ndarray): An array of category ids for the annotations (same length as ‘boxes’).
area (list of float): A list of areas for the bounding boxes annotations (same length as ‘boxes’).
iscrowd (numpy.ndarray): An array of zeros (same length as ‘boxes’). Currently not implemented.
image_id (numpy.ndarray): A single-value array containing the index of the tile.
- Return type:
tuple of (numpy.ndarray, dict)
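To make the shape of the annotation dictionary concrete, here is a small sketch that derives a bounding box from a segmentation polygon and assembles a dictionary with the keys listed above. It rests on assumptions: boxes are taken to be [x_min, y_min, x_max, y_max], polygons are flat COCO-style coordinate lists, and plain Python lists stand in for the numpy arrays the class returns. The helpers `polygon_to_box` and `box_area` are hypothetical.

```python
def polygon_to_box(segmentation):
    """Return [x_min, y_min, x_max, y_max] for a flat polygon [x1, y1, x2, y2, ...]."""
    xs = segmentation[0::2]
    ys = segmentation[1::2]
    return [min(xs), min(ys), max(xs), max(ys)]


def box_area(box):
    """Area of an axis-aligned box given as [x_min, y_min, x_max, y_max]."""
    x_min, y_min, x_max, y_max = box
    return (x_max - x_min) * (y_max - y_min)


polygons = [
    [10.0, 10.0, 30.0, 10.0, 30.0, 40.0, 10.0, 40.0],  # rectangle
    [5.0, 5.0, 15.0, 8.0, 9.0, 20.0],                  # triangle
]
boxes = [polygon_to_box(p) for p in polygons]

# Same keys as the dictionary documented above, with lists instead of arrays.
target = {
    "boxes": boxes,
    "labels": [1, 1],
    "area": [box_area(b) for b in boxes],
    "iscrowd": [0, 0],
    "image_id": [0],
}
```

Every per-annotation entry ('boxes', 'labels', 'area', 'iscrowd') has the same length, matching the description of the returned dictionary.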
- class geodataset.dataset.SegmentationLabeledRasterCocoDataset(fold, root_path, transform=None, force_binary_class=None, other_attributes_names_to_pass=None)[source]
Bases: BaseLabeledRasterCocoDataset
A dataset class that loads COCO datasets and their associated tiles (images). It will recursively search for COCO json files and .tif tiles in the specified root folder and its sub-folders. The COCO json files should follow the naming convention defined in the CocoNameConvention class. COCO jsons generated by this library should automatically follow this convention.
Can be used for semantic segmentation tasks, where the annotations are segmentations.
It can directly be used with a torch.utils.data.DataLoader.
- Parameters:
fold (str) – The dataset fold to load (e.g., ‘train’, ‘valid’, ‘test’…).
root_path (str or List[str] or pathlib.Path or List[pathlib.Path]) – The root directory of the dataset.
transform (albumentations.core.composition.Compose) – A composition of transformations to apply to the tiles and their associated annotations (applied in __getitem__).
other_attributes_names_to_pass (List[str]) – A list of names of other COCO annotation attributes to return when iterating over the dataset (like a global_id, confidence_score…).
- __getitem__(idx)[source]
Retrieves a tile and its annotations by index, applying the transform passed to the constructor of the class, if any. It also normalizes the tile data between 0 and 1.
- Parameters:
idx (int) – The index of the tile to retrieve
- Returns:
The transformed tile (image) data, normalized between 0 and 1, and a dictionary containing the annotations and metadata of the tile. The dictionary has the following keys:
masks (list of numpy.ndarray): A list of segmentation masks for the annotations.
labels (numpy.ndarray): An array of category ids for the annotations (same length as ‘masks’).
area (list of float): A list of areas for the segmentation masks annotations (same length as ‘masks’).
iscrowd (numpy.ndarray): An array of zeros (same length as ‘masks’). Currently not implemented.
image_id (numpy.ndarray): A single-value array containing the index of the tile.
- Return type:
tuple of (numpy.ndarray, dict)
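Since each item is an (image, dict) pair whose dictionary entries vary in length from tile to tile, batching usually means keeping the items side by side instead of stacking them. The sketch below shows one common way to do that; the function `collate_fn` is a hypothetical helper, not something confirmed to ship with geodataset, and strings stand in for image arrays.

```python
def collate_fn(batch):
    """Group a list of (image, target) pairs into (images, targets) tuples."""
    images, targets = zip(*batch)
    return images, targets


# Two fake items with a different number of annotations each.
batch = [
    ("image_0", {"labels": [1, 2]}),
    ("image_1", {"labels": [3]}),
]
images, targets = collate_fn(batch)
```

A function like this can be passed through the `collate_fn` argument of torch.utils.data.DataLoader; whether the labeled dataset classes require it depends on the transforms and model in use.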