Examples
========

Area of Interest
~~~~~~~~~~~~~~~~

In order to provide more flexibility when tilerizing a raster, geodataset supports areas of interest (AOI).
2 types of AOIs configs are supported: AOIGeneratorConfig and AOIFromPackageConfig.
These configs can then be passed to the tilerizers to divide the raster into groups of tiles for each AOI.

Specifying an AOI config for a Tilerizer is optional.
If no AOI config is passed to the Tilerizer, all the tiles will be kept in a single 'all' dataset.

.. code-block:: python

    from pathlib import Path
    from geodataset.aoi import AOIFromPackageConfig, AOIGeneratorConfig

    # Automatically creates AOIs based on the percentage of tiles that should be in each AOI.
    # The 'position' value can be None (random) or a unique value between 1 and n_aois, to force the AOIs to specific bands/corners.
    aoi_gen_config = AOIGeneratorConfig(
        aoi_type='band',     # currently supports 'band' and 'corner'
        aois={'train': {'percentage': 0.7, 'position': 2},
              'valid': {'percentage': 0.15, 'position': 1},
              'test': {'percentage': 0.15, 'position': 3}
              }
    )

    # AOIs are provided as polygons in geopackages (.gpkg, .geojson or .shp)
    aoi_gpkg_config = AOIFromPackageConfig(
        aois={'train': 'QGIS_projects/train_aoi.gpkg',
              'valid': 'QGIS_projects/valid_aoi.shp',
              'test': 'Data/raw/quebec_trees_dataset_2021-09-02/inference_zone.gpkg'
              }
    )

    # For AOIGeneratorConfig, you can also specify additional parameters to control the generation of the AOIs,
    # where the 'actual_name' separates a same AOI 'train' into 2 parts, allowing an other aoi 'valid' in the middle.
    # The 'priority_aoi' can be used on a single aoi to force its tiles to be whole and not partially blacked-out because
    # they overlap other aois tiles (this is useful for small aois compared to others, like when "percentage" = 0.01).
    aoi_gen_config = AOIGeneratorConfig(
    aoi_type="band",  # currently supports 'band' and 'corner'
    aois={"train1": {"percentage": 0.495, "position": 1, "actual_name": "train"},
          "valid": {"percentage": 0.01, "position": 2, "priority_aoi": True},
          "train2": {"percentage": 0.495, "position": 3, "actual_name": "train"}},
    )

Unlabeled Raster
~~~~~~~~~~~~~~~~

The class RasterTilerizer can tilerize a raster, without labels. The tiles are then stored in the output_path/tiles.

.. code-block:: python

    from pathlib import Path
    from geodataset.tilerize import RasterTilerizer

    tilerizer = RasterTilerizer(
        raster_path='/Data/raw/wwf_ecuador/RGB Orthomosaics/Carlos Vera Arteaga RGB.tif',
        output_path='/Data/pre_processed/test',
        tile_size=1024,
        tile_overlap=0.5,
        aois_config=aoi_gen_config,
        ground_resolution=0.05,                          # optional, scale_factor must be None if used.
        scale_factor=0.5,                                # optional, ground_resolution must be None if used.
        ignore_black_white_alpha_tiles_threshold=0.8     # optional
    )

    tilerizer.generate_tiles()

The class RasterTilerizerGDF can tilerize a raster, without labels, and return the tiles as boxes in a GeoDataFrame. It does not output anything to the disk.

.. code-block:: python

    from pathlib import Path
    from geodataset.tilerize import RasterTilerizerGDF

    tilerizer = RasterTilerizerGDF(
        raster_path='/Data/raw/wwf_ecuador/RGB Orthomosaics/Carlos Vera Arteaga RGB.tif',
        tile_size=1024,
        tile_overlap=0.5,
        aois_config=aoi_gen_config,
        ground_resolution=0.05,                          # optional, scale_factor must be None if used.
        scale_factor=0.5,                                # optional, ground_resolution must be None if used.
        ignore_black_white_alpha_tiles_threshold=0.8     # optional
    )

    tiles_boxes_gdf = tilerizer.generate_tiles_gdf()

Labeled Raster
~~~~~~~~~~~~~~

The class LabeledRasterTilerizer can tilerize a raster and its labels (.gpkg, .geojson, .shp, .csv and .xml).

.. code-block:: python

    from pathlib import Path
    from geodataset.tilerize import LabeledRasterTilerizer

    tilerizer = LabeledRasterTilerizer(
        raster_path='Data/raw/quebec_trees_dataset_2021-09-02/2021-09-02/zone1/2021-09-02-sbl-z1-rgb-cog.tif',
        labels_path='Data/raw/quebec_trees_dataset_2021-09-02/Z1_polygons.gpkg',
        output_path='Data/pre_processed/test',
        tile_size=1024,
        tile_overlap=0.5,
        labels_gdf=None,                                 # optional (useful if you have the gdf already loaded in memory)
        aois_config=aoi_gpkg_config,                     # can be omitted if no AOI is needed (everything will be in an 'all' dataset)
        ground_resolution=0.05,                          # optional, scale_factor must be None if used.
        scale_factor=0.5,                                # optional, ground_resolution must be None if used.
        use_rle_for_labels=True,                         # optional
        min_intersection_ratio=0.9,                      # optional
        ignore_tiles_without_labels=False,               # optional
        ignore_black_white_alpha_tiles_threshold=0.8,    # optional
        main_label_category_column_name='Label',         # optional
        other_labels_attributes_column_names=None        # optional
    )

    tilerizer.generate_coco_dataset()

Dataset
~~~~~~~

Geodataset provides the DetectionLabeledRasterCocoDataset and SegmentationLabeledRasterCocoDataset classes which given a single or a list of root folder(s), will recursively go into each subdirectory and parse the COCO json files matching a specific 'fold',
and the associated images paths.

There is also a DetectionUnlabeledRasterDataset class which only loads tiles (useful for inference, where we don't have labels, or for pre-training a model in a self-supervised manner).

These classes can then be directly used with a torch Dataloader.

You can also provide an albumentation transform (optional) to the dataset classes to augment the data when training a model.

.. code-block:: python

    from pathlib import Path
    from geodataset.dataset import DetectionLabeledRasterCocoDataset, SegmentationLabeledRasterCocoDataset, UnlabeledRasterDataset
    import albumentations as A

    augment_transform = A.Compose([
                A.HorizontalFlip(),
                A.VerticalFlip(),
            ],
                bbox_params=A.BboxParams(
                    format='pascal_voc',
                    label_fields=['labels'],
                    min_area=0.,
                    min_visibility=0.,
                ))

    # Labeled Detection Dataset
    detection_train_ds = DetectionLabeledRasterCocoDataset(
        root_path=['Data/pre_processed/subset_1',
                   'Data/pre_processed/subset_2'],
        fold="train",
        transform=augment_transform
    )

    # Labeled Segmentation Dataset
    segmentation_valid_ds = SegmentationLabeledRasterCocoDataset(
        root_path='Data/pre_processed/all_datasets',
        fold="valid",
        transform=None
    )

    # Unlabeled Dataset (useful for inference or unsupervised pre-training)
    unlabeled_infer_ds = UnlabeledRasterDataset(
        root_path='Data/pre_processed/inference_data',
        fold="infer",     # assuming the tiles were tilerized using an aoi 'infer' instead of 'train', 'valid'...
        transform=None
    )


Aggregator
~~~~~~~~~~

The Aggregator class can be used to apply Non-Maximum Suppression style algorithms to aggregate bounding box or instance
segmentation predictions from a model from multiple tiles/images, and then save the results in a COCO json file.

For aggregating detection bounding boxes, you should currently use the nms_algorithm='iou' option.
For aggregating instance segmentation polygons, you can use both 'iou' and 'ioa-disambiguate', depending on what you need.


.. code-block:: python

    from shapely.geometry import box, Polygon
    from geodataset.aggregator import Aggregator

    # aggregating detection bounding boxes from a coco file on the disk:
    aggregator = Aggregator.from_coco(
        output_path='your_output_path',
        tiles_folder_path='path_to_folder_containing_tiles',
        coco_json_path='path_to_coco_json_file',
        polygons=[[box(0, 0, 1, 1), box(1, 1, 2, 2)],
                  [box(0, 0, 1, 1), box(1, 1, 2, 2)]],
        scores_names=['detection_score'],
        classes_names=['detection_class'],
        score_threshold=0.3,
        nms_threshold=0.8,
        nms_algorithm='iou'
    )

    # aggregating detection bounding boxes from in-memory polygons:
    aggregator = Aggregator.from_polygons(
        output_path='your_output_path',
        tiles_paths=['tile_1_path', 'tile_2_path'],
        polygons=[[box(0, 0, 1, 1), box(1, 1, 2, 2)],
                  [box(0, 0, 1, 1), box(1, 1, 2, 2)]],
        scores=[[0.9, 0.8],
                [0.7, 0.85]],
        classes=[[1, 2],
                 [2, 1]],
        score_threshold=0.3,
        nms_threshold=0.8,
        nms_algorithm='iou'
    )

    # aggregating instance segmentation polygons from in-memory polygons, with 2 different sets of scores
    # (you can also only use 1 set of scores if you want):
    aggregator = Aggregator.from_polygons(
        output_path='your_output_path',
        tiles_paths=['tile_1_path', 'tile_2_path'],
        polygons=[[Polygon([(0, 0), (1, 0), (0, 1)]), Polygon([(1, 1), (2, 1), (1, 2)])],
                  [Polygon([(2, 2), (3, 2), (2, 3)]), Polygon([(3, 3), (4, 3), (3, 4)])]],
        scores={'detection_score': [[0.9, 0.8],
                                    [0.7, 0.85]],
                'segmentation_score': [[0.6, 0.5],
                                       [0.9, 0.3]]},
        classes=[[1, 2],
                 [2, 1]],
        scores_weights={'detection_score': 2,
                        'segmentation_score': 1},
        score_threshold=0.3,
        nms_threshold=0.8,
        nms_algorithm='ioa-disambiguate',
        best_geom_keep_area_ratio=0.5
    )