Tilerizers
- class geodataset.tilerize.RasterTilerizer(raster_path, output_path, tile_size, tile_overlap, global_aoi=None, aois_config=None, ground_resolution=None, scale_factor=None, output_name_suffix=None, ignore_black_white_alpha_tiles_threshold=0.8, temp_dir='./tmp')[source]
Bases: BaseDiskRasterTilerizer
A standard tilerizer for raster data without annotations or labels. The generate_tiles method generates the tiles and saves them to disk.
- Parameters:
raster_path (str or pathlib.Path) – Path to the raster (.tif, .png…).
output_path (str or pathlib.Path) – Path to parent folder where to save the image tiles.
tile_size (int) – The size of the tiles in pixels (tile_size, tile_size).
tile_overlap (float) – The overlap between the tiles (0 <= overlap < 1).
global_aoi (str or pathlib.Path or geopandas.GeoDataFrame, optional) –
Path to the global AOI file, or a GeoDataFrame passed directly. If provided, only the tiles intersecting this AOI are kept, even if some tiles fall inside one of the AOIs in aois_config (if AOIFromPackageConfig).
This parameter is useful for creating a k-fold dataset in combination with an AOIGeneratorConfig such as:
    aois_config = AOIGeneratorConfig(
        aois={
            'zone1': {'percentage': 0.2, 'position': 1, 'actual_name': f'train{kfold_id}'},
            'zone2': {'percentage': 0.2, 'position': 2, 'actual_name': f'train{kfold_id}'},
            'zone3': {'percentage': 0.2, 'position': 3, 'actual_name': f'valid{kfold_id}'},
            'zone4': {'percentage': 0.2, 'position': 4, 'actual_name': f'train{kfold_id}'},
            'zone5': {'percentage': 0.2, 'position': 5, 'actual_name': f'train{kfold_id}'}
        },
        aoi_type='band'
    )
aois_config (AOIGeneratorConfig or AOIFromPackageConfig or None) – An instance of AOIConfig to use, or None if all tiles should be kept in a DEFAULT_AOI_NAME AOI.
ground_resolution (float) – The desired ground resolution, in meters per pixel, when loading the raster. Only one of ground_resolution and scale_factor can be set at the same time.
scale_factor (float) – Scale factor for rescaling the data (change pixel resolution). Only one of ground_resolution and scale_factor can be set at the same time.
output_name_suffix (str) – Suffix to add to the output file names.
ignore_black_white_alpha_tiles_threshold (float) – Threshold ratio of black, white or transparent pixels in a tile to skip it. Default is 0.8.
temp_dir (str or pathlib.Path) – Temporary directory in which to store the resampled raster if it is too big to fit in memory.
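The relationship between tile_size and tile_overlap can be illustrated with a small, library-independent sketch. It assumes that the step between consecutive tiles is tile_size * (1 - tile_overlap) and that edge tiles may extend past the raster bounds; the function name tile_origins is hypothetical, and the tilerizer's actual grid logic may differ.

```python
def tile_origins(raster_width, raster_height, tile_size, tile_overlap):
    """Return the (col_off, row_off) of each tile's top-left corner.

    Illustrative only: assumes a regular grid whose step is
    tile_size * (1 - tile_overlap). Not taken from the library.
    """
    if not 0 <= tile_overlap < 1:
        raise ValueError("tile_overlap must satisfy 0 <= overlap < 1")

    # Step between consecutive tiles, kept to at least one pixel.
    stride = max(1, int(tile_size * (1 - tile_overlap)))

    origins = []
    for row_off in range(0, raster_height, stride):
        for col_off in range(0, raster_width, stride):
            origins.append((col_off, row_off))
    return origins


# A 2048 x 1024 raster cut into 1024-pixel tiles with 50% overlap.
origins = tile_origins(2048, 1024, tile_size=1024, tile_overlap=0.5)
```

With these values the step is 512 pixels, so neighbouring tiles share half of their area.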
- class geodataset.tilerize.RasterTilerizerGDF(raster_path, tile_size, tile_overlap, aois_config=None, ground_resolution=None, scale_factor=None, output_name_suffix=None, ignore_black_white_alpha_tiles_threshold=0.8, temp_dir='./tmp')[source]
Bases: BaseRasterTilerizer
A standard tilerizer for raster data without annotations or labels. The generate_tiles_gdf method returns the tile extents as a GeoDataFrame.
- Parameters:
raster_path (str or pathlib.Path) – Path to the raster (.tif, .png…).
tile_size (int) – The size of the tiles in pixels (tile_size, tile_size).
tile_overlap (float) – The overlap between the tiles (0 <= overlap < 1).
aois_config (AOIGeneratorConfig or AOIFromPackageConfig or None) – An instance of AOIConfig to use, or None if all tiles should be kept in a DEFAULT_AOI_NAME AOI.
ground_resolution (float, optional) – The desired ground resolution, in meters per pixel, when loading the raster. Only one of ground_resolution and scale_factor can be set at the same time.
scale_factor (float, optional) – Scale factor for rescaling the data (change pixel resolution). Only one of ground_resolution and scale_factor can be set at the same time.
output_name_suffix (str, optional) – Suffix to add to the output file names.
ignore_black_white_alpha_tiles_threshold (float, optional) – Threshold ratio of black, white or transparent pixels in a tile to skip it. Default is 0.8.
temp_dir (str or pathlib.Path) – Temporary directory in which to store the resampled raster if it is too big to fit in memory.
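The rule that only one of ground_resolution and scale_factor may be set can be sketched as follows. This is an assumption-laden illustration, not the library's code: the helper resolve_scale_factor and its source_resolution argument are hypothetical, and it assumes a scale factor below 1 means fewer, coarser pixels.

```python
def resolve_scale_factor(source_resolution, ground_resolution=None, scale_factor=None):
    """Return the rescaling factor implied by the two optional parameters.

    source_resolution is the raster's native resolution in meters per pixel.
    Hypothetical helper: the convention (factor < 1 downsamples) is assumed.
    """
    if ground_resolution is not None and scale_factor is not None:
        raise ValueError("Set only one of ground_resolution and scale_factor")

    if ground_resolution is not None:
        # Going from 0.05 m/px to 0.10 m/px halves the pixel count per axis.
        return source_resolution / ground_resolution

    if scale_factor is not None:
        return scale_factor

    # Neither parameter set: keep the raster at its native resolution.
    return 1.0


factor = resolve_scale_factor(0.05, ground_resolution=0.10)
```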
- class geodataset.tilerize.LabeledRasterTilerizer(raster_path, labels_path, output_path, tile_size, tile_overlap, labels_gdf=None, global_aoi=None, aois_config=None, ground_resolution=None, scale_factor=None, output_name_suffix=None, ignore_black_white_alpha_tiles_threshold=0.8, use_rle_for_labels=True, min_intersection_ratio=0.9, ignore_tiles_without_labels=False, geopackage_layer_name=None, main_label_category_column_name=None, other_labels_attributes_column_names=None, coco_n_workers=5, coco_categories_list=None, temp_dir='./tmp')[source]
Bases: BaseDiskRasterTilerizer
This class creates image tiles from a raster, along with their associated labels read from a labels file (see labels_path for the supported formats). COCO JSON files are generated for each AOI (or for the DEFAULT_AOI_NAME AOI).
- Parameters:
raster_path (str or pathlib.Path) – Path to the raster (.tif, .png…).
labels_path (str or pathlib.Path or None) – Path to the labels. Supported formats are: .gpkg, .geojson, .shp, .xml, .csv.
output_path (str or pathlib.Path) – Path to parent folder where to save the image tiles and associated labels.
tile_size (int) – The size of the tiles in pixels (tile_size, tile_size).
tile_overlap (float) – The overlap between the tiles (0 <= overlap < 1).
labels_gdf (geopandas.GeoDataFrame, optional) – A GeoDataFrame containing the labels. If provided, labels_path must be None.
global_aoi (str or pathlib.Path or geopandas.GeoDataFrame, optional) –
Path to the global AOI file, or a GeoDataFrame passed directly. If provided, only the tiles intersecting this AOI are kept, even if some tiles fall inside one of the AOIs in aois_config (if AOIFromPackageConfig).
This parameter is useful for creating a k-fold dataset in combination with an AOIGeneratorConfig such as:
    aois_config = AOIGeneratorConfig(
        aois={
            'zone1': {'percentage': 0.2, 'position': 1, 'actual_name': f'train{kfold_id}'},
            'zone2': {'percentage': 0.2, 'position': 2, 'actual_name': f'train{kfold_id}'},
            'zone3': {'percentage': 0.2, 'position': 3, 'actual_name': f'valid{kfold_id}'},
            'zone4': {'percentage': 0.2, 'position': 4, 'actual_name': f'train{kfold_id}'},
            'zone5': {'percentage': 0.2, 'position': 5, 'actual_name': f'train{kfold_id}'}
        },
        aoi_type='band'
    )
aois_config (AOIGeneratorConfig or AOIFromPackageConfig or None) – An instance of AOIConfig to use, or None if all tiles should be kept in a DEFAULT_AOI_NAME AOI.
ground_resolution (float, optional) – The desired ground resolution, in meters per pixel, when loading the raster. Only one of ground_resolution and scale_factor can be set at the same time.
scale_factor (float, optional) – Scale factor for rescaling the data (change pixel resolution). Only one of ground_resolution and scale_factor can be set at the same time.
output_name_suffix (str, optional) – Suffix to add to the output file names.
ignore_black_white_alpha_tiles_threshold (float, optional) – Threshold ratio of black, white or transparent pixels in a tile to skip it. Default is 0.8.
use_rle_for_labels (bool, optional) – Whether to use RLE encoding for the labels. If False, the labels will be saved as polygons.
min_intersection_ratio (float, optional) – When finding the polygon labels associated with a tile, this ratio specifies the minimum intersection ratio (intersecting_polygon_area / polygon_area) required between a candidate polygon and the tile for the polygon to be kept as a label for that tile.
ignore_tiles_without_labels (bool, optional) – Whether to ignore (skip) tiles that don’t have any associated labels.
geopackage_layer_name (str, optional) – The name of the layer in the geopackage file to use as labels. Only used if the labels_path is a .gpkg, .geojson or .shp file. Only useful when the labels geopackage file contains multiple layers.
main_label_category_column_name (str, optional) – The name of the column in the labels file that contains the main category of the labels.
other_labels_attributes_column_names (list of str, optional) – The names of the columns in the labels file that contain other attributes of the labels, which should be kept as a dictionary in the COCO annotations data.
coco_n_workers (int, optional) – Number of workers to use when generating the COCO dataset. Useful when use_rle_for_labels=True, as RLE encoding is quite slow.
coco_categories_list (list of dict, optional) –
A list of category dictionaries in COCO format.
If provided, category ids for the annotations in the final COCO file will be determined by matching the category name of each polygon (defined by the main_label_category_column_name parameter) with the category names in coco_categories_list.
If a polygon has a category that is not in this list, its category_id will be set to None in its COCO annotation.
If main_label_category_column_name is not provided, but coco_categories_list is a single COCO category dictionary, then it will be used for all annotations automatically.
If coco_categories_list is None, the category ids will be automatically generated from the unique categories found in the main_label_category_column_name column.
To assign a category_id to a polygon, the code will check the 'name' and 'other_names' fields of the categories.
IMPORTANT: It is strongly advised to provide this list if you want to have consistent category ids across multiple COCO datasets.
Example of 2 categories, one being the parent of the other:
    [
        {"id": 1, "name": "Pinaceae", "other_names": [], "supercategory": null},
        {"id": 2, "name": "Picea", "other_names": ["PIGL", "PIMA", "PIRU"], "supercategory": 1}
    ]
temp_dir (str or pathlib.Path) – Temporary directory in which to store the resampled raster if it is too big to fit in memory.
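The category matching described for coco_categories_list can be sketched in plain Python. This is a minimal illustration of the documented behaviour (match on 'name' and 'other_names', fall back to None), not the library's implementation; the helper build_name_to_id and the sample label values are hypothetical.

```python
def build_name_to_id(coco_categories_list):
    """Map every category name and alias to its COCO category id."""
    lookup = {}
    for category in coco_categories_list:
        lookup[category["name"]] = category["id"]
        # Aliases listed under 'other_names' resolve to the same id.
        for alias in category.get("other_names", []):
            lookup[alias] = category["id"]
    return lookup


categories = [
    {"id": 1, "name": "Pinaceae", "other_names": [], "supercategory": None},
    {"id": 2, "name": "Picea", "other_names": ["PIGL", "PIMA", "PIRU"], "supercategory": 1},
]

lookup = build_name_to_id(categories)

# Hypothetical values read from the main_label_category_column_name column.
labels = ["Picea", "PIMA", "Pinaceae", "Abies"]

# Labels absent from the list get None, mirroring the documented fallback.
category_ids = [lookup.get(label) for label in labels]
```

Passing the same list to every dataset keeps ids stable because the mapping no longer depends on which categories happen to appear in a given labels file.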
- generate_additional_coco_dataset(labels_gdf, aoi_name_mapping, geopackage_layer_name=None, main_label_category_column_name=None, other_labels_attributes_column_names=None)[source]
Useful when you want to create a second dataset from another set of labels or predictions while reusing the exact same tiles, without having to generate and save them again. A mapping from original AOI names to new AOI names must be provided to avoid overwriting previous COCO datasets. Example: {'groundtruth': 'infer'} could be used if you want to first generate a ground truth COCO dataset and its associated tiles using the generate_coco_dataset method, and then generate an inference COCO dataset for the same tiles with generate_additional_coco_dataset, in order to run an evaluation script afterward.
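The role of aoi_name_mapping can be illustrated with a small standalone sketch. It only shows why the mapping prevents overwriting: each existing AOI name must map to a new name that is not already in use. The helper remap_aoi_names is hypothetical, and how the library itself handles missing or colliding names is not stated here.

```python
def remap_aoi_names(existing_aois, aoi_name_mapping):
    """Return {original_aoi: new_aoi}, refusing names that would collide.

    Illustrative only; the error handling shown is an assumption.
    """
    remapped = {}
    for aoi in existing_aois:
        if aoi not in aoi_name_mapping:
            raise KeyError(f"No new name provided for AOI '{aoi}'")

        new_name = aoi_name_mapping[aoi]
        if new_name in existing_aois:
            # Reusing an existing name would overwrite its COCO file.
            raise ValueError(f"'{new_name}' is already used by an existing AOI")

        remapped[aoi] = new_name
    return remapped


# Ground truth was generated first; predictions go to a new 'infer' AOI.
remapped = remap_aoi_names(["groundtruth"], {"groundtruth": "infer"})
```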