Utils
- geodataset.utils.read_raster(path, ground_resolution=None, scale_factor=None, temp_dir='./tmp')[source]
Open a raster file and return a view (WarpedVRT) that applies the given scaling.
- Parameters:
path (Path) – The path to the raster file.
ground_resolution (float, optional) – The desired ground resolution in meters.
scale_factor (float, optional) – The scale factor to apply to the raster.
- Returns:
vrt (WarpedVRT) – A virtual dataset that you can use to read windows on the fly.
profile (dict) – An updated profile of the raster reflecting the scaling.
x_scale_factor, y_scale_factor (float) – The scale factors applied in the x and y directions.
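The relationship between a target ground resolution and a scale factor can be sketched as below. This is only an illustration, not the implementation of read_raster: it assumes the factor is the ratio of the native pixel size to the requested one (so a value below 1 downsamples), and both helper names are hypothetical.

```python
def scale_factor_from_resolution(native_resolution_m, ground_resolution_m):
    """Hypothetical helper: factor that resamples native pixels to the target size.

    Assumption: factor = native pixel size / target pixel size.
    """
    if native_resolution_m <= 0 or ground_resolution_m <= 0:
        raise ValueError("resolutions must be positive")
    return native_resolution_m / ground_resolution_m


def scaled_shape(height, width, scale_factor):
    """Hypothetical helper: raster shape after applying the scale factor."""
    return round(height * scale_factor), round(width * scale_factor)


# 0.25 m pixels resampled to 0.5 m pixels: half as many pixels per axis.
factor = scale_factor_from_resolution(0.25, 0.5)
new_height, new_width = scaled_shape(1000, 800, factor)
```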
- geodataset.utils.polygon_to_mask(polygon, array_height, array_width)[source]
Encodes a Polygon or MultiPolygon object into a binary mask.
- Parameters:
polygon (Polygon or MultiPolygon) – The polygon to encode.
array_height (int) – The height of the array to encode the polygon into.
array_width (int) – The width of the array to encode the polygon into.
- Returns:
A binary mask of the polygon.
- Return type:
np.ndarray
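To show what encoding a polygon into a binary mask means, here is a minimal pure-Python sketch that marks a pixel as 1 when its center falls inside a single ring. It is an assumption-laden illustration (ray casting, pixel centers at +0.5, no holes, nested lists instead of np.ndarray) and not how polygon_to_mask is implemented; the function names are hypothetical.

```python
def point_in_ring(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the closed ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def ring_to_mask(ring, array_height, array_width):
    """Hypothetical helper: rasterize one ring into rows of 0/1 values."""
    return [
        [1 if point_in_ring(col + 0.5, row + 0.5, ring) else 0
         for col in range(array_width)]
        for row in range(array_height)
    ]


square = [(1, 1), (3, 1), (3, 3), (1, 3)]
mask = ring_to_mask(square, array_height=4, array_width=4)
```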
- geodataset.utils.mask_to_polygon(mask, simplify_tolerance=1.0, min_contour_points=3, remove_rings=False, remove_small_geoms=10)[source]
Converts a binary mask to simplified shapely Polygon(s).
- Parameters:
mask (np.ndarray) – The mask to convert, in HW (height, width) format.
simplify_tolerance (float) – The tolerance for simplifying polygons.
min_contour_points (int) – Minimum number of points required for a valid contour.
remove_rings (bool) – Whether to remove inner rings (holes) from the polygons.
remove_small_geoms (int or None) – Remove geometries with an area smaller than this value from the MultiPolygon.
- Returns:
Simplified polygon(s) representing the mask
- Return type:
Union[Polygon, MultiPolygon]
- geodataset.utils.polygon_to_coco_coordinates_segmentation(polygon)[source]
Encodes a polygon into a list of coordinates supported by COCO.
- Parameters:
polygon (shapely.Polygon or shapely.MultiPolygon) – The polygon to encode.
- Returns:
A list of coordinates in the format expected by COCO.
- Return type:
list
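For reference, a COCO polygon segmentation is a list of flat coordinate lists, one per polygon part, each laid out as [x1, y1, x2, y2, ...]. The sketch below builds that layout from plain coordinate tuples; the helper names are hypothetical and it only illustrates the format, not the library's code.

```python
def ring_to_coco(ring):
    """Flatten [(x, y), ...] into [x1, y1, x2, y2, ...]."""
    flat = []
    for x, y in ring:
        flat.extend([float(x), float(y)])
    return flat


def rings_to_coco_segmentation(exterior_rings):
    """Hypothetical helper: one flat list per polygon part."""
    return [ring_to_coco(ring) for ring in exterior_rings]


square = [(0, 0), (4, 0), (4, 3), (0, 3)]
segmentation = rings_to_coco_segmentation([square])
```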
- geodataset.utils.coco_coordinates_segmentation_to_bbox(segmentation)[source]
Calculates the bounding box from a polygon list of coordinates in COCO format.
- Parameters:
segmentation (list) – A list of coordinates in the format expected by COCO.
- Returns:
A shapely box representing the bounding box of the polygon.
- Return type:
shapely.box
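The bounding box of a COCO coordinate segmentation is the min and max of its x and y values across all parts. A minimal sketch, assuming the flat [x1, y1, x2, y2, ...] layout and returning a plain tuple rather than a shapely box (the function name is hypothetical):

```python
def coco_segmentation_bounds(segmentation):
    """Return (min_x, min_y, max_x, max_y) over every part of the segmentation."""
    xs, ys = [], []
    for flat in segmentation:
        xs.extend(flat[0::2])  # even positions hold x values
        ys.extend(flat[1::2])  # odd positions hold y values
    if not xs:
        raise ValueError("segmentation is empty")
    return min(xs), min(ys), max(xs), max(ys)


two_parts = [
    [0.0, 0.0, 4.0, 0.0, 4.0, 3.0],
    [6.0, 1.0, 9.0, 1.0, 9.0, 5.0],
]
bounds = coco_segmentation_bounds(two_parts)
```

The tuple is in the same (min_x, min_y, max_x, max_y) order that shapely's box constructor takes.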
- geodataset.utils.coco_coordinates_segmentation_to_polygon(segmentation)[source]
Converts a list of polygon coordinates in COCO format to a shapely Polygon or MultiPolygon.
- Parameters:
segmentation (list) – A list of coordinates in the format expected by COCO.
- Returns:
A shapely Polygon object representing the outer boundary of the polygon.
- Return type:
Polygon
- geodataset.utils.polygon_to_coco_rle_segmentation(polygon, tile_height, tile_width)[source]
Encodes a Polygon or MultiPolygon object into a COCO annotation RLE mask.
- Parameters:
polygon (Polygon or MultiPolygon) – The polygon to encode.
tile_height (int) – The height of the tile the polygon is in.
tile_width (int) – The width of the tile the polygon is in.
- Returns:
A COCO RLE mask segmentation.
- Return type:
dict
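To make the RLE idea concrete, the sketch below encodes a small binary mask as an uncompressed COCO-style RLE: runs are counted in column-major order and the first count always refers to zeros. This is an illustration under those assumptions, using nested lists and a hypothetical function name; the dict returned by the library may instead hold the compressed form produced by pycocotools.

```python
def mask_to_uncompressed_rle(mask):
    """Encode rows of 0/1 values as {"size": [h, w], "counts": [...]}."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    counts = []
    previous = 0  # runs alternate 0s, 1s, 0s, ... starting with 0s
    run = 0
    for col in range(width):          # column-major traversal
        for row in range(height):
            value = 1 if mask[row][col] else 0
            if value != previous:
                counts.append(run)
                run = 0
                previous = value
            run += 1
    counts.append(run)
    return {"size": [height, width], "counts": counts}


mask = [
    [0, 1],
    [0, 1],
    [1, 1],
]
rle = mask_to_uncompressed_rle(mask)
```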
- geodataset.utils.coco_rle_segmentation_to_mask(rle_segmentation)[source]
Decodes a COCO annotation RLE segmentation into a binary mask.
- Parameters:
rle_segmentation (dict) – The RLE segmentation of a Polygon or MultiPolygon to decode.
- Returns:
A binary mask of the segmentation.
- Return type:
np.ndarray
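Decoding reverses the run-length scheme. The following sketch assumes an uncompressed RLE dict (integer counts, column-major order, first run is zeros) and returns nested lists instead of an np.ndarray; it is not the library's implementation and the function name is hypothetical.

```python
def uncompressed_rle_to_mask(rle):
    """Decode {"size": [h, w], "counts": [...]} into rows of 0/1 values."""
    height, width = rle["size"]
    flat = []
    value = 0  # the first run always counts zeros
    for run in rle["counts"]:
        flat.extend([value] * run)
        value = 1 - value
    if len(flat) != height * width:
        raise ValueError("counts do not match the declared size")
    # flat is column-major, so pixel (row, col) sits at col * height + row.
    return [
        [flat[col * height + row] for col in range(width)]
        for row in range(height)
    ]


decoded = uncompressed_rle_to_mask({"size": [3, 2], "counts": [2, 4]})
```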
- geodataset.utils.coco_rle_segmentation_to_bbox(rle_segmentation)[source]
Calculates the bounding box from a COCO annotation RLE segmentation.
- Parameters:
rle_segmentation (dict) – The RLE segmentation to decode.
- Returns:
A shapely box representing the bounding box of the segmentation.
- Return type:
shapely.box
- geodataset.utils.coco_rle_segmentation_to_polygon(rle_segmentation, simplify_tolerance=1.0, min_contour_points=3)[source]
Decodes a COCO annotation RLE segmentation into a shapely Polygon or MultiPolygon.
- Parameters:
rle_segmentation (dict) – The RLE segmentation to decode.
simplify_tolerance (float) – The tolerance for simplifying polygons.
min_contour_points (int) – Minimum number of points required for a valid contour.
- Returns:
A shapely Polygon or MultiPolygon representing the segmentation.
- Return type:
Polygon or MultiPolygon
- class geodataset.utils.COCOGenerator(description, tiles_paths, polygons, scores, categories, other_attributes, output_path, use_rle_for_labels, n_workers, coco_categories_list)[source]
Bases: object
A class to generate a COCO dataset from a list of tiles and their associated polygons. After instantiating the class, call the generate_coco() method to generate and save the COCO dataset.
- Parameters:
description (str) – A description of the COCO dataset.
tiles_paths (List[Path]) – A list of paths to the tiles/images.
polygons (List[List[Polygon]]) – A list of lists of polygons associated with each tile.
scores (List[List[float or None]] or None) – A list of lists of scores associated with each polygon.
categories (List[List[str or int]] or None) – A list of lists of categories (str or int) associated with each polygon.
other_attributes (List[List[Dict]] or None) – A list of lists of dictionaries of other attributes associated with each polygon. Such a dict could be:
{ 'attribute1': value1, 'attribute2': value2 }
IMPORTANT: the ‘score’ attribute is reserved for the score associated with the polygon.
output_path (Path) – The path to save the COCO dataset JSON file (should have .json extension).
use_rle_for_labels (bool) – Whether to use RLE encoding for the labels. If False, the polygon’s exterior coordinates will be used. RLE encoding takes less space on disk but is slower to encode.
n_workers (int) – The number of workers to use for parallel processing.
coco_categories_list (List[dict] or None) – A list of category dictionaries in COCO format.
If provided, category ids for the annotations in the final COCO file will be determined by matching the category name (defined by the ‘main_label_category_column_name’ parameter) of each polygon with the category names in coco_categories_list.
If a polygon has a category that is not in this list, its category_id will be set to None in its COCO annotation.
If ‘main_label_category_column_name’ is not provided, but ‘coco_categories_list’ is a single COCO category dictionary, then it will be used for all annotations automatically.
If ‘coco_categories_list’ is None, the category ids will be automatically generated from the unique categories found in the ‘main_label_category_column_name’ column.
To assign a category_id to a polygon, the code will check the ‘name’ and ‘other_names’ fields of the categories.
IMPORTANT: It is strongly advised to provide this list if you want to have consistent category ids across multiple COCO datasets.
Example of 2 categories, one being the parent of the other:
[{ "id": 1, "name": "Pinaceae", "other_names": [], "supercategory": null }, { "id": 2, "name": "Picea", "other_names": ["PIGL", "PIMA", "PIRU"], "supercategory": 1 }]
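The name matching described for coco_categories_list can be sketched as a lookup over the ‘name’ and ‘other_names’ fields. This is a minimal illustration with hypothetical function names, not the code used by COCOGenerator; unmatched labels map to None, as the description states.

```python
def build_category_lookup(coco_categories_list):
    """Map every 'name' and every entry of 'other_names' to its category id."""
    lookup = {}
    for category in coco_categories_list:
        names = [category["name"], *category.get("other_names", [])]
        for name in names:
            lookup[name] = category["id"]
    return lookup


def category_id_for(label, lookup):
    """Return the matching category id, or None when the label is unknown."""
    return lookup.get(label)


categories = [
    {"id": 1, "name": "Pinaceae", "other_names": [], "supercategory": None},
    {"id": 2, "name": "Picea", "other_names": ["PIGL", "PIMA", "PIRU"],
     "supercategory": 1},
]
lookup = build_category_lookup(categories)
```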
- classmethod from_gdf(description, gdf, tiles_paths_column, polygons_column, scores_column, categories_column, other_attributes_columns, output_path, use_rle_for_labels, n_workers, coco_categories_list, tiles_paths_order=None)[source]
Instantiate a COCOGenerator from a GeoDataFrame.
- Parameters:
description (str) – A description for the COCO dataset.
gdf (gpd.GeoDataFrame) – A GeoDataFrame containing the annotations. Each row is expected to represent one polygon associated with a tile/image.
tiles_paths_column (str) – The name of the column in the GeoDataFrame that contains the tile/image path.
polygons_column (str) – The name of the column in the GeoDataFrame that contains the polygon geometry.
scores_column (str or None, optional) – The name of the column in the GeoDataFrame that contains the score for the polygon. If None, scores will not be provided.
categories_column (str or None, optional) – The name of the column in the GeoDataFrame that contains the category for the polygon. If None, categories will not be provided.
other_attributes_columns (List[str] or None, optional) – A list of column names in the GeoDataFrame whose values should be included as additional attributes for each polygon. If None, no additional attributes will be provided.
output_path (Path) – The path where the generated COCO JSON file will be saved.
use_rle_for_labels (bool) – Whether to use RLE encoding for the labels or not.
n_workers (int) – The number of workers to use for parallel processing.
coco_categories_list (List[dict] or None, optional) – A list of category dictionaries in COCO format. If provided, category ids for the annotations in the final COCO file will be determined by matching the category name of each polygon with the category names in coco_categories_list.
tiles_paths_order (List[Path] or None, optional) – The order in which the tiles should be stored in the COCO file. If None, the order will be determined by the order in which the tiles are encountered in the GeoDataFrame. This parameter is useful if you plan to use the same order for multiple COCO datasets (e.g., using pycocotools COCOeval between ground truth and predictions).
- Returns:
An instance of COCOGenerator initialized with data extracted from the GeoDataFrame.
- Return type:
COCOGenerator
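Since the constructor takes per-tile lists while the GeoDataFrame holds one polygon per row, the rows have to be grouped by tile. The sketch below shows that grouping on plain dictionaries; the ‘tile_path’ and ‘polygon’ keys and the function name are hypothetical, and the actual from_gdf logic may differ.

```python
def group_rows_by_tile(rows, tiles_paths_order=None):
    """Group one-polygon-per-row records into per-tile lists."""
    grouped = {}
    for row in rows:
        # dicts keep insertion order, so tiles appear in first-encounter order
        grouped.setdefault(row["tile_path"], []).append(row["polygon"])
    if tiles_paths_order is not None:
        tiles_paths = list(tiles_paths_order)
    else:
        tiles_paths = list(grouped)
    polygons = [grouped.get(path, []) for path in tiles_paths]
    return tiles_paths, polygons


rows = [
    {"tile_path": "b.tif", "polygon": "poly_1"},
    {"tile_path": "a.tif", "polygon": "poly_2"},
    {"tile_path": "b.tif", "polygon": "poly_3"},
]
default_order = group_rows_by_tile(rows)
fixed_order = group_rows_by_tile(rows, tiles_paths_order=["a.tif", "b.tif"])
```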
- geodataset.utils.create_coco_folds(train_coco_path, output_dir, num_folds=5, seed=0, predefined_image_folds=None)[source]
Create folds for a COCO dataset by splitting the images randomly or using predefined folds.
- Parameters:
train_coco_path (str or Path) – The path to the train COCO JSON file.
output_dir (str or Path) – The directory where the folds will be saved.
num_folds (int) – The number of folds to create.
seed (int or None) – The random seed for shuffling image IDs if predefined_image_folds is None.
predefined_image_folds (dict or None) – A dictionary mapping image IDs to fold IDs. If provided, this overrides random splitting.
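The two splitting modes can be sketched as follows: shuffle the image IDs with a seeded generator and deal them out round-robin, or use the predefined mapping when one is given. The function name and the round-robin assignment are assumptions made for illustration; create_coco_folds may split differently and also writes the fold files to output_dir, which this sketch does not do.

```python
import random


def assign_folds(image_ids, num_folds=5, seed=0, predefined_image_folds=None):
    """Return a dict mapping each image ID to a fold index."""
    if predefined_image_folds is not None:
        # A predefined mapping overrides random splitting.
        return {image_id: predefined_image_folds[image_id] for image_id in image_ids}
    shuffled = list(image_ids)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    return {image_id: index % num_folds for index, image_id in enumerate(shuffled)}


folds = assign_folds(range(10), num_folds=5, seed=0)
forced = assign_folds([1, 2], predefined_image_folds={1: 0, 2: 3})
```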
- geodataset.utils.coco_to_geopackage(coco_json_path, images_directory, convert_to_crs_coordinates, geopackage_output_path)[source]
Converts a COCO JSON dataset into a GeoDataFrame and optionally saves it as a GeoPackage file.
The resulting GeoDataFrame (or GeoPackage if saved) will have the following columns:
geometry: The polygon geometry
tile_id: The ID of the tile the polygon belongs to
tile_path: The path to the tile image
category_id: The ID of the category of the polygon
category_name: The name of the category of the polygon
any other attributes found in the ‘other_attributes’ field of the COCO JSON annotations
- Parameters:
coco_json_path (str) – The path to the COCO JSON dataset (.json).
images_directory (str) – The directory containing the images associated with the COCO dataset.
convert_to_crs_coordinates (bool) – Whether to convert the polygon pixel coordinates to a common CRS (uses the CRS of the first .tif tile).
geopackage_output_path (str or None) – The path to save the GeoPackage file. If None, the GeoPackage file will not be saved to the disk.
- Returns:
A GeoDataFrame containing the polygons from the COCO dataset
- Return type:
GeoDataFrame
- geodataset.utils.tiles_polygons_gdf_to_crs_gdf(dataframe)[source]
Converts a GeoDataFrame of polygons from multiple tiles to a common CRS. The dataframe passed must have a ‘tile_path’ column containing the path to the tile image, because the function reads each tile’s metadata to get its CRS.
- Parameters:
dataframe (GeoDataFrame) – The GeoDataFrame containing the polygons from multiple tiles.
- Returns:
A GeoDataFrame containing the polygons in a common CRS.
- Return type:
GeoDataFrame
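The first step of such a conversion, mapping pixel coordinates to a tile's own CRS, is an affine transform. The sketch below assumes the six coefficients (a, b, c, d, e, f) in the order used by rasterio's Affine and uses hypothetical function names; it does not cover reprojecting between different CRSs and is not the library's implementation.

```python
def pixel_to_crs(col, row, transform):
    """Apply an affine transform given as (a, b, c, d, e, f) to one pixel coordinate."""
    a, b, c, d, e, f = transform
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


def ring_to_crs(ring, transform):
    """Hypothetical helper: convert every (col, row) vertex of a ring."""
    return [pixel_to_crs(col, row, transform) for col, row in ring]


# North-up tile with 0.5 m pixels and its top-left corner at (500000, 5000000).
tile_transform = (0.5, 0.0, 500000.0, 0.0, -0.5, 5000000.0)
ring_crs = ring_to_crs([(0, 0), (10, 0), (10, 20)], tile_transform)
```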