Skip to content

vector

Functions:

  • applyGeopandasMethod

    Applies an arbitrary function to the data frames provided.

  • countFeatures

    Returns the number of features found in the given source and within a

  • createDataFrameFromGeoDataFrame

    Creates a geokit-style dataframe from a geopandas geodataframe.

  • createGeoDataFrame

    Creates a gdf from an Reskit shape pd.DataFrame.

  • createGeoJson

    Convert a set of geometries to a geoJSON object.

  • createVector

    Create a vector on disk from geometries or a DataFrame with 'geom' column.

  • extractAndClipFeatures

    Extracts features from a source and clips them to the boundaries of a given geom.

  • extractAsDataFrame

    Convenience function calling extractFeatures and structuring the output as

  • extractFeature

    Convenience function calling extractFeatures which assumes there is only

  • extractFeatures

    Creates a generator which extract the features contained within the source.

  • filterLayer

    GeoKit internal.

  • listLayers

    Returns the layer names for each layer that is stored in a geopackage.

  • loadVector

    Load a vector dataset from a path to a file on disc.

  • loopFeatures

    Geokit internal.

  • mutateVector

    Process a vector dataset according to an arbitrary function.

  • ogrType

    Tries to determine the corresponding OGR type according to the input.

  • rasterize

    Rasterize a vector datasource onto a raster context.

  • vectorInfo

    Extract general information about a vector source.

applyGeopandasMethod

applyGeopandasMethod(geopandasMethod, *dfs, **kwargs)

Applies an arbitrary function to the data frames provided.

Convenience function to apply geopandas methods to a geokit-style dataframe with 'geom' column with osgeo.ogr.Geometry objects. NOTE: All arguments besides **kwargs must be passed as positional arguments.

geopandasMethod : str, executable Geopandas method to apply to the dataframe, either str-formatted method name or method as a callable function. dfs : pd.DataFrames One or multiple comma-separated pd.DataFrames with 'geom' column with osgeo.ogr.Geometry objects. Will be passed to the geopandas method as positional arguments, starting from the first position. *kwargs Will be passed on to the geopandas function.

Source code in geokit/core/vector.py
def applyGeopandasMethod(geopandasMethod, *dfs, **kwargs):
    """
    Applies an arbitrary function to the data frames provided.

    Convenience function to apply geopandas methods to a geokit-style
    dataframe with 'geom' column with osgeo.ogr.Geometry objects.
    NOTE: All arguments besides **kwargs must be passed as positional
    arguments.

    geopandasMethod : str, executable
        Geopandas method to apply to the dataframe, either str-formatted
        method name or method as a callable function.
    *dfs : pd.DataFrames
        One or multiple comma-separated pd.DataFrames with 'geom' column
        with osgeo.ogr.Geometry objects. Will be passed to the geopandas
        method as positional arguments, starting from the first position.
    **kwargs
        Will be passed on to the geopandas function.
    """
    # load geopandas
    # try:
    #     import geopandas as gpd
    # except:
    #     raise ImportError(
    #         "'geopandas' is required for geokit.vector.createGeoDataFrame() but is not installed in the current environment."
    #     )
    # get the method as callable
    if callable(geopandasMethod):
        # we have a callable function already, just make sure its gpd
        if not geopandasMethod.__module__.split(".")[0] == "geopandas":
            raise AttributeError(f"'{geopandasMethod}' is not a callable method from geopandas!")
    else:
        # not an executable yet, get the executable function via its name
        if not isinstance(geopandasMethod, str):
            # must be a str formatted name
            raise TypeError(f"'geopandasMethod' must be str-formatted geopandas method name if not callable.")
        geopandasMethod = getattr(gpd, geopandasMethod, None)

    # create geodataframes from input geokit dfs
    assert all([isinstance(_df, pd.DataFrame) and "geom" in _df.columns for _df in dfs]), (
        f"positional *dfs arguments must be geokit-style pd.DataFrames with 'geom' column."
    )
    gdfs = [createGeoDataFrame(dfGeokit=_df) for _df in dfs]

    # now apply geopandas method onto gdfs
    gdf = geopandasMethod(*gdfs, **kwargs)
    # now convert the result back to geokit style and return
    df = createDataFrameFromGeoDataFrame(gdf)
    return df

countFeatures

countFeatures(source, geom=None, where=None)

Returns the number of features found in the given source and within a given geometry and/or where-statement.

Parameters:

  • source

    (Anything acceptable by loadVector()) –

    The vector datasource to read from

  • geom

    (ogr.Geometry; optional, default: None ) –

    The geometry to search within * All features are extracted which touch this geometry

  • where

    (str; optional, default: None ) –

    An SQL-like where statement to apply to the source * Feature attribute name do not need quotes * String values should be wrapped in 'single quotes' Example: If the source vector has a string attribute called "ISO" and a integer attribute called "POP", you could use....

    where = "ISO='DEU' AND POP>1000"
    

Returns:

  • int
Source code in geokit/core/vector.py
def countFeatures(source, geom=None, where=None):
    """Returns the number of features found in the given source and within a
    given geometry and/or where-statement.

    Parameters
    ----------
    source : Anything acceptable by loadVector()
        The vector datasource to read from

    geom : ogr.Geometry; optional
        The geometry to search within
        * All features are extracted which touch this geometry

    where : str; optional
        An SQL-like where statement to apply to the source
        * Feature attribute name do not need quotes
        * String values should be wrapped in 'single quotes'
        Example: If the source vector has a string attribute called "ISO" and
                 a integer attribute called "POP", you could use....

            where = "ISO='DEU' AND POP>1000"

    Returns
    -------
    int
    """
    ds = loadVector(source)
    layer = ds.GetLayer()
    filterLayer(layer, geom, where)
    return layer.GetFeatureCount()

createDataFrameFromGeoDataFrame

createDataFrameFromGeoDataFrame(
    gdf: GeoDataFrame,
) -> DataFrame

Creates a geokit-style dataframe from a geopandas geodataframe.

Parameters:

  • gdf

    (DataFrame) –

    geopandas-style pd.DataFrame, need a 'geometry' column.

Returns:

  • DataFrame

    Same as the previous, just as an geokit-style dataframe with 'geom' column with osgeo.ogr.Geometry objects.

Source code in geokit/core/vector.py
def createDataFrameFromGeoDataFrame(gdf: gpd.GeoDataFrame) -> pd.DataFrame:
    """Creates a geokit-style dataframe from a geopandas geodataframe.

    Parameters
    ----------
    gdf : pd.DataFrame
        geopandas-style pd.DataFrame, need a 'geometry' column.

    Returns
    -------
    pd.DataFrame
        Same as the previous, just as an geokit-style dataframe with 'geom'
        column with osgeo.ogr.Geometry objects.
    """
    assert isinstance(gdf, pd.DataFrame)
    assert "geometry" in gdf.columns

    # # import the required external packages - these are not part of the requirements.yml and are possibly not installed
    # try:
    #     import geopandas as gpd
    # except:
    #     raise ImportError(
    #         "'geopandas' is required for geokit.vector.createGeoDataFrame() but is not installed in the current environment."
    #     )

    # get the CRS/SRS of the gdf
    crs_wkt = gdf.crs.to_wkt()
    srs = SRS.loadSRS(crs_wkt)
    # create WKT geometry and then convert to osgeo.ogr.Geometry
    gdf["geom"] = gdf["geometry"].apply(lambda x: GEOM.convertWKT(x.wkt, srs=srs))

    # remove geometry column and return geokit df
    df = gdf.drop(columns="geometry")
    return df

createGeoDataFrame

createGeoDataFrame(dfGeokit: DataFrame) -> GeoDataFrame

Creates a gdf from an Reskit shape pd.DataFrame.

Parameters:

  • dfGeokit

    (DataFrame) –

    Reskit shape pd.DataFrame, need a 'geom' column.

Returns:

  • GeoDataFrame

    Same as the previous, just as an GeodataFrame

Source code in geokit/core/vector.py
def createGeoDataFrame(dfGeokit: pd.DataFrame) -> gpd.GeoDataFrame:
    """Creates a gdf from an Reskit shape pd.DataFrame.

    Parameters
    ----------
    dfGeokit : pd.DataFrame
        Reskit shape pd.DataFrame, need a 'geom' column.

    Returns
    -------
    gpd.GeoDataFrame
        Same as the previous, just as an GeodataFrame
    """
    assert isinstance(dfGeokit, pd.DataFrame)
    assert "geom" in dfGeokit.columns

    # import the required external packages - these are not part of the requirements.yml and are possibly not installed
    try:
        import shapely
    except:
        raise ImportError(
            "'shapely' is required for geokit.vector.createGeoDataFrame() but is not installed in the current environment."
        )

    try:
        import geopandas as gpd
    except:
        raise ImportError(
            "'geopandas' is required for geokit.vector.createGeoDataFrame() but is not installed in the current environment."
        )

    # get values
    values = {}
    for col in dfGeokit.columns:
        # geoms need to be converted to shapely
        if col == "geom":
            values["geometry"] = [shapely.wkt.loads(g.ExportToWkt()) for g in dfGeokit.geom]  # this takes some time :/
        # other values are just stored as they are
        else:
            values[col] = list(dfGeokit[col])

    # get srs as Well known text
    crs = dfGeokit.geom.iloc[0].GetSpatialReference().ExportToWkt()

    # create gdf and set index
    gdf = gpd.GeoDataFrame(values, crs=crs)
    gdf = gdf.set_index(dfGeokit.index, drop=True)

    # over and out!
    return gdf

createGeoJson

createGeoJson(
    geoms, output=None, srs=4326, topo=False, fill=""
)

Convert a set of geometries to a geoJSON object.

Source code in geokit/core/vector.py
def createGeoJson(geoms, output=None, srs=4326, topo=False, fill=""):
    """Convert a set of geometries to a geoJSON object."""
    if srs:
        srs = SRS.loadSRS(srs)

    # arrange geom, index, and data
    if isinstance(geoms, ogr.Geometry):  # geoms is a single geometry
        finalGeoms = [
            geoms,
        ]
        data = None
        index = [
            0,
        ]

    elif isinstance(geoms, pd.Series):
        index = geoms.index
        finalGeoms = geoms.values
        data = None

    elif isinstance(geoms, pd.DataFrame):
        index = geoms.index
        finalGeoms = geoms.geom.values
        data = geoms.loc[:, geoms.columns != "geom"]
        data["_index"] = index
    else:
        finalGeoms = list(geoms)
        data = None
        index = list(range(len(finalGeoms)))

    if len(finalGeoms) == 0:
        raise GeoKitVectorError("Empty geometry list given")

    # Transform?
    if not srs is None:
        finalGeoms = GEOM.transform(finalGeoms, toSRS=srs)

    # Make JSON object
    from io import BytesIO

    if not output is None and not isinstance(output, str):
        if not output.writable():
            raise GeoKitVectorError("Output object is not writable")

        if topo:
            fo = BytesIO()
        else:
            fo = output
    elif isinstance(output, str) and not topo:
        fo = open(output, "wb")
    else:
        fo = BytesIO()

    fo.write(bytes('{"type":"FeatureCollection","features":[', encoding="utf-8"))

    for j, i, g in zip(range(len(index)), index, finalGeoms):
        fo.write(bytes('%s{"type":"Feature",' % ("" if j == 0 else ","), encoding="utf-8"))
        if data is None:
            fo.write(bytes('"properties":{"_index":%s},' % str(i), encoding="utf-8"))
        else:
            fo.write(
                bytes(
                    '"properties":%s,' % data.loc[i].fillna(fill).to_json(),
                    encoding="utf-8",
                )
            )

        fo.write(bytes('"geometry":%s}' % g.ExportToJson(), encoding="utf-8"))
        # fo.write(bytes('"geometry": {"type": "Point","coordinates": [125.6, 10.1] }}', encoding='utf-8'))
    fo.write(bytes("]}", encoding="utf-8"))
    fo.flush()

    # Put in the right format
    if topo:
        from io import TextIOWrapper

        from topojson import conversion

        fo.seek(0)
        topo = conversion.convert(TextIOWrapper(fo), object_name="primary")  # automatically closes fo
        topo = str(topo).replace("'", '"')

    # Done!
    if output is None:
        if topo:
            return topo
        else:
            fo.seek(0)
            geojson = fo.read()
            fo.close()
            return geojson.decode("utf-8")

    elif isinstance(output, str):
        if topo:
            with open(output, "w") as fo:
                fo.write(topo)
        else:
            pass  # we already wrote to the file!
        return output

    else:
        if topo:
            output.write(bytes(topo, encoding="utf-8"))
        else:
            pass  # We already wrote to the file!
        return None

createVector

createVector(
    geoms,
    output: str | None = None,
    srs=None,
    driverName: str = "ESRI Shapefile",
    layerName: str = "default",
    fieldVals=None,
    fieldDef=None,
    checkAllGeoms=False,
    overwrite: bool = True,
)

Create a vector on disk from geometries or a DataFrame with 'geom' column.

Parameters:

  • geoms

    (Geometry or [Geometry] or DataFrane) –

    The geometries to write into the vector file * If a DataFrame is given, it must have a column called 'geom' * All geometries must share the same type (point, line, polygon, etc...) * All geometries must share the same SRS * If geometry SRS differs from the 'srs' input, then all geometries will be projected to the input srs

  • output

    (str; optional, default: None ) –

    A path on disk to create the output vector * If output is None, the vector dataset will be created in memory * Assumed to be of "ESRI Shapefile" format * Will create a number of files with different extensions

  • srs

    (Anything acceptable to geokit.srs.loadSRS(); optional, default: None ) –

    The srs of the vector to create * If not given, the geometries' inherent srs is used * If srs does not match the inherent srs, all geometries will be transformed

  • driverName

    (str; optional, default: 'ESRI Shapefile' ) –

    The name of the driver to use when creating the vector. Currently supported options are: - ESRI Shapefile - GPKG

    For a list of all supported vector drivers by OGR, see: https://gdal.org/drivers/vector/index.html

  • layerName

    (str; optional, default: 'default' ) –

    The name of the layer to create within the vector. (Only applicable for GeoPackages) * If the layer already exists, it will be overwritten

  • fieldVals

    (dict of lists or pandas.DataFrame; optional, default: None ) –

    Explicit attribute values to assign to each geometry * If a dict is given, must be a dictionary of the attribute names and associated lists of the attribute values * If a DataFrame is given, the field names are taken from column names and attribute values from the corresponding column data * The order of each column/list will correspond to the order geometries are written into the dataset * The length of each column/list MUST match the number of geometries * All values in a single column/list must share the same type - Options are int, float, or str

  • fieldDef

    (dict; optional, default: None ) –

    A dictionary specifying the datatype of each attribute when written into the final dataset * Options are defined from ogr.OFT[...] - ex. Integer, Real, String * The ogrType() function can be used to map typical python and numpy types to appropriate ogr types

  • checkAllGeoms

    (bool, default: False ) –

    If True, all geoms will be asserted in object type and exact srs. Else, only the first geom in the geom column/iterable will be checked for performance reasons. By default False.

  • overwrite

    (bool; optional, default: True ) –

    Determines whether the preexisting files should be overwritten * Only used when output is not None

Returns:

  • * If 'output' is None: gdal.Dataset
  • * If 'output' is given: None
Source code in geokit/core/vector.py
def createVector(
    # geoms: ogr.Geometry | list[ogr.Geometry | gdal.Dataset | str] | pd.DataFrame | np.ndarray | str | gdal.Dataset,
    geoms,
    output: str | None = None,
    srs=None,
    driverName: str = "ESRI Shapefile",
    layerName: str = "default",
    fieldVals=None,
    fieldDef=None,
    checkAllGeoms=False,
    overwrite: bool = True,
):
    """
    Create a vector on disk from geometries or a DataFrame with 'geom' column.

    Parameters
    ----------
    geoms : ogr.Geometry or [ogr.Geometry, ] or pandas.DataFrane
        The geometries to write into the vector file
        * If a DataFrame is given, it must have a column called 'geom'
        * All geometries must share the same type (point, line, polygon, etc...)
        * All geometries must share the same SRS
        * If geometry SRS differs from the 'srs' input, then all geometries will
          be projected to the input srs

    output : str; optional
        A path on disk to create the output vector
        * If output is None, the vector dataset will be created in memory
        * Assumed to be of "ESRI Shapefile" format
        * Will create a number of files with different extensions

    srs : Anything acceptable to geokit.srs.loadSRS(); optional
        The srs of the vector to create
          * If not given, the geometries' inherent srs is used
          * If srs does not match the inherent srs, all geometries will be
            transformed

    driverName : str; optional
        The name of the driver to use when creating the vector.
        Currently supported options are:
            - ESRI Shapefile
            - GPKG

        For a list of all supported vector drivers by OGR, see: https://gdal.org/drivers/vector/index.html

    layerName : str; optional
        The name of the layer to create within the vector. (Only applicable for GeoPackages)
        * If the layer already exists, it will be overwritten

    fieldVals : dict of lists or pandas.DataFrame; optional
        Explicit attribute values to assign to each geometry
        * If a dict is given, must be a dictionary of the attribute names and
          associated lists of the attribute values
        * If a DataFrame is given, the field names are taken from column names
          and attribute values from the corresponding column data
        * The order of each column/list will correspond to the order geometries
          are written into the dataset
        * The length of each column/list MUST match the number of geometries
        * All values in a single column/list must share the same type
            - Options are int, float, or str

    fieldDef : dict; optional
        A dictionary specifying the datatype of each attribute when written into
        the final dataset
        * Options are defined from ogr.OFT[...]
          - ex. Integer, Real, String
        * The ogrType() function can be used to map typical python and numpy types
          to appropriate ogr types

    checkAllGeoms : bool, optional
        If True, all geoms will be asserted in object type and exact srs. Else, only
        the first geom in the geom column/iterable will be checked for performance reasons.
        By default False.

    overwrite : bool; optional
        Determines whether the preexisting files should be overwritten
        * Only used when output is not None

    Returns
    -------
    * If 'output' is None: gdal.Dataset
    * If 'output' is given: None
    """
    if srs:
        srs = SRS.loadSRS(srs)

    # make geom or wkt list into a list of ogr.Geometry objects
    finalGeoms = []
    geomIndex = None
    if isinstance(geoms, str) or isinstance(geoms, ogr.Geometry):  # geoms is a single geometry
        geoms = [
            geoms,
        ]
    elif isinstance(geoms, pd.Series):
        geomIndex = geoms.index
        geoms = geoms.values
    elif isinstance(geoms, pd.DataFrame):
        if not fieldVals is None:
            raise GeoKitVectorError("fieldVals must be None when geoms input is a DataFrame")

        fieldVals = geoms.copy()
        geoms = geoms.geom.values
        fieldVals.drop("geom", inplace=True, axis=1)

    if len(geoms) == 0:
        raise GeoKitVectorError("Empty geometry list given")

    # Test if the first geometry is an ogr-Geometry type
    if isinstance(geoms[0], ogr.Geometry):
        #  (Assume all geometries in the array have the same type)
        geomSRS = geoms[0].GetSpatialReference()
        if checkAllGeoms:
            # check if all other geoms have the same SRS
            assert all([geomSRS.IsSame(g.GetSpatialReference()) for g in geoms]), (
                f"Not all geoms have the same SRS, srs of first geom: {geomSRS}"
            )
        # Set up some variables for the upcoming loop
        doTransform = False
        setSRS = False

        if srs and geomSRS and not srs.IsSame(geomSRS):  # Test if a transformation is needed
            trx = osr.CoordinateTransformation(geomSRS, srs)
            doTransform = True
        # Test if an srs was NOT given, but the incoming geometries have an SRS already
        elif srs is None and geomSRS:
            srs = geomSRS
        # Test if an srs WAS given, but the incoming geometries do NOT have an srs already
        elif srs and geomSRS is None:
            # In which case, assume the geometries are meant to have the given srs
            setSRS = True

        # Create geoms
        for i in range(len(geoms)):
            # clone just in case the geometry is tethered outside of function
            finalGeoms.append(geoms[i].Clone())
            if doTransform:
                finalGeoms[i].Transform(trx)  # Do transform if necessary
            if setSRS:
                finalGeoms[i].AssignSpatialReference(srs)  # Set srs if necessary

    # Otherwise, the geometry array should contain only WKT strings
    elif isinstance(geoms[0], str):
        if srs is None:
            raise ValueError("srs must be given when passing wkt strings")

        # Create geoms
        finalGeoms = [GEOM.convertWKT(wkt, srs) for wkt in geoms]

    else:
        raise ValueError("Geometry inputs must be ogr.Geometry objects or WKT strings")

    geoms = finalGeoms  # Overwrite geoms array with the conditioned finalGeoms

    # Determine geometry types
    types = set()
    for g in geoms:
        # Get a set of all geometry type-names (POINT, POLYGON, etc...)
        types.add(g.GetGeometryName())

    if types.issubset({"POINT", "MULTIPOINT"}):
        geomType = ogr.wkbPoint
    elif types.issubset({"LINESTRING", "MULTILINESTRING"}):
        geomType = ogr.wkbLineString
    elif types.issubset({"POLYGON", "MULTIPOLYGON"}):
        geomType = ogr.wkbPolygon
    else:
        # geomType = ogr.wkbGeometryCollection
        raise RuntimeError("Could not determine output shape's geometry type")

    # Create a driver and datasource
    # driver = gdal.GetDriverByName("ESRI Shapefile")
    # dataSource = driver.Create(output, 0, 0)

    if output is None:
        # Create datasource if no output path is provided
        driver = gdal.GetDriverByName("Memory")

        # Using 'Create' from a Memory driver leads to an error. But creating
        #  a temporary shapefile driver (it does not actually produce a file, I think)
        #  and then using 'CreateCopy' seems to work
        tmp_driver = gdal.GetDriverByName("ESRI Shapefile")
        t = TemporaryDirectory()
        tmp_dataSource = tmp_driver.Create(t.name + "tmp.shp", 0, 0)

        dataSource = driver.CreateCopy("MEMORY", tmp_dataSource)
        t.cleanup()
        del tmp_dataSource, tmp_driver, t
    elif output is not None:
        # Create datasource if output path is provided
        if overwrite is True:
            # Search for directory
            if os.path.dirname(output) == "":  # If no directory is given, assume current directory
                output = os.path.join(os.getcwd(), output)

            elif not os.path.isdir(os.path.dirname(output)):  # If directory does not exist, raise error
                raise FileNotFoundError(f"Directory {os.path.dirname(output)} does not exist")

            # Remove file if it exists
            if os.path.isfile(output):
                os.remove(output)

            driver = ogr.GetDriverByName(driverName)
            dataSource = driver.CreateDataSource(output)

        elif overwrite is False:
            dataSource = ogr.Open(output, 1)
            assert dataSource is not None, f"Could not open {output}"
    # Wrap the whole writing function in a 'try' statement in case it fails
    try:
        # Create the layer
        if output is not None and overwrite is False:
            layerName = layerName
            assert layerName not in listLayers(output), (
                f"Layer name '{layerName}' already exists in {output}. Please Specify a new layer name or set overwrite=True."
            )

        else:
            layerName = layerName

        layer = dataSource.CreateLayer(layerName, srs, geomType)
        assert layer is not None, "Could not create layer!"

        # Setup fieldVals and fieldDef dicts
        if not fieldVals is None:
            # Ensure fieldVals is a dataframe
            if not isinstance(fieldVals, pd.DataFrame):
                # If not, try converting it
                fieldVals = pd.DataFrame(fieldVals)
            if not geomIndex is None:
                fieldVals = fieldVals.loc[geomIndex]

            # check if length is good
            if fieldVals.shape[0] != len(geoms):
                raise GeoKitVectorError("Values table length does not equal geometry count")

            # Ensure fieldDefs exists and is a dict
            if fieldDef is None:  # Try to determine data types
                fieldDef = {}
                for k, v in fieldVals.items():
                    fieldDef[k] = ogrType(v.dtype)

            elif isinstance(fieldDef, dict):  # Assume fieldDef is a dict mapping fields to inappropriate
                #   datatypes. Fix these to ogr indicators!
                for k in fieldVals.columns:
                    try:
                        fieldDef[k] = ogrType(fieldDef[k])
                    except KeyError as e:
                        print('"%s" not in attribute definition table' % k)
                        raise e

            else:  # Assume a single data type was given. Apply to all fields
                _type = ogrType(fieldDef)
                fieldDef = {}
                for k in fieldVals.keys():
                    fieldDef[k] = _type

            # Write field definitions to layer
            for fieldName, dtype in fieldDef.items():
                layer.CreateField(ogr.FieldDefn(str(fieldName), getattr(ogr, dtype)))

            # Ensure list lengths match geom length
            for k, v in fieldVals.items():
                if len(v) != len(geoms):
                    raise RuntimeError("'{}' length does not match geom list".format(k))

        # Create features
        for gi in range(len(geoms)):
            # Create a blank feature
            feature = ogr.Feature(layer.GetLayerDefn())

            # Fill the attributes, if required
            if not fieldVals is None:
                for fieldName, value in fieldVals.items():
                    _type = fieldDef[fieldName]

                    # cast to basic type
                    if _type == "OFTString":
                        val = str(value.iloc[gi])
                    elif _type == "OFTInteger" or _type == "OFTInteger64":
                        val = int(value.iloc[gi])
                    else:
                        val = float(value.iloc[gi])

                    # Write to feature
                    feature.SetField(str(fieldName), val)

            # Set the Geometry
            feature.SetGeometry(geoms[gi])

            # Create the feature
            layer.CreateFeature(feature)
            feature.Destroy()  # Free resources (probably not necessary here)

        # Finish
        if output:
            return output
        else:
            return dataSource

    # Delete the datasource in case it failed
    except Exception as e:
        dataSource = None
        raise e

extractAndClipFeatures

extractAndClipFeatures(
    source: load_vector_input | DataFrame,
    geom,
    where=None,
    srs=None,
    onlyGeom=False,
    indexCol=None,
    skipMissingGeoms=True,
    layerName=None,
    scaleAttrs=None,
    minShare=0.001,
    **kwargs,
) -> DataFrame | Series | Generator

Extracts features from a source and clips them to the boundaries of a given geom. Optionally scales numeric attribute values linearly to the overlapping area share.

Parameters:

  • source

    (Anything acceptable by loadVector()) –

    The vector data source to read from

  • geom

    (Geometry) –

    The geometry to search with * All features touching this geometry are extracted and clipped to the geometry boundaries.

  • where

    (str; optional, default: None ) –

    An SQL-like where statement to apply to the source * Feature attribute name do not need quotes * String values should be wrapped in 'single quotes' Example: If the source vector has a string attribute called "ISO" and a integer attribute called "POP", you could use....

    where = "ISO='DEU' AND POP>1000"
    
  • srs

    (Anything acceptable to geokit.srs.loadSRS(); optional, default: None ) –

    The srs of the geometries to extract * If not given, the geom's inherent srs is used * If srs does not match the inherent srs, all geometries will be transformed

  • onlyGeom

    (bool; optional, default: False ) –

    If True, only feature geometries will be returned

  • indexCol

    (str; optional, default: None ) –

    The feature identifier to use as the DataFrams's index * Only useful when as DataFrame is True

  • skipMissingGeoms

    (bool; optional, default: True ) –

    If True, then the parser will not read a feature which are missing a geometry

  • layerName

    (str; optional, default: None ) –

    The name of the layer to extract from the source vector dataset (only applicable in case of a geopackage).

  • scaleAttrs

    (str or list, default: None ) –

    attribute names of the source with numeric values. The values will be scaled linearly with the area share of the feature overlapping the geom.

  • minShare

    (float, default: 0.001 ) –

    The min. area share of a polygon that falls either inside or outside the clipping geom. Allows to deal with imperfect boundary alignments. 0 means that all clipped geoms, however small they may be, are considered. Example: If minShare=0.001 (0.1%), a polygon that overlaps with the clipping geom by 99.92% of its area will NOT be clipped. If another polygon also at minShare=0.001 overlaps by only 0.06% of its area, it will be disregarded. By default 0.001.

Returns:

  • * pandas.DataFrame or pandas.Series
Source code in geokit/core/vector.py
def extractAndClipFeatures(
    source: load_vector_input | pd.DataFrame,
    geom,
    where=None,
    srs=None,
    onlyGeom=False,
    indexCol=None,
    skipMissingGeoms=True,
    layerName=None,
    scaleAttrs=None,
    minShare=0.001,
    **kwargs,
) -> pd.DataFrame | pd.Series | Generator:
    """
    Extracts features from a source and clips them to the boundaries of a given geom.
    Optionally scales numeric attribute values linearly to the overlapping area share.

    Parameters
    ----------
    source : Anything acceptable by loadVector()
        The vector data source to read from

    geom : ogr.Geometry
        The geometry to search with
        * All features touching this geometry are extracted and clipped to the geometry boundaries.

    where : str; optional
        An SQL-like where statement to apply to the source
        * Feature attribute name do not need quotes
        * String values should be wrapped in 'single quotes'
        Example: If the source vector has a string attribute called "ISO" and
                 a integer attribute called "POP", you could use....

            where = "ISO='DEU' AND POP>1000"

    srs : Anything acceptable to geokit.srs.loadSRS(); optional
        The srs of the geometries to extract
          * If not given, the geom's inherent srs is used
          * If srs does not match the inherent srs, all geometries will be
            transformed

    onlyGeom : bool; optional
        If True, only feature geometries will be returned

    indexCol : str; optional
        The feature identifier to use as the DataFrams's index
        * Only useful when as DataFrame is True

    skipMissingGeoms : bool; optional
        If True, then the parser will not read a feature which are missing a geometry

    layerName : str; optional
        The name of the layer to extract from the source vector dataset (only applicable in case of a geopackage).

    scaleAttrs : str or list, optional
        attribute names of the source with numeric values. The values will be scaled linearly with the
        area share of the feature overlapping the geom.

    minShare : float
        The min. area share of a polygon that falls either inside or
        outside the clipping geom. Allows to deal with imperfect boundary
        alignments. 0 means that all clipped geoms, however small they
        may be, are considered. Example: If minShare=0.001 (0.1%), a
        polygon that overlaps with the clipping geom by 99.92% of its
        area will NOT be clipped. If another polygon also at
        minShare=0.001 overlaps by only 0.06% of its area, it will be
        disregarded. By default 0.001.

    Returns
    -------
    * pandas.DataFrame or pandas.Series
    """
    assert 0 <= minShare <= 1, f"minShare must be greater or equal to 0 and less or equal to 1.0"
    # assert and preprocess input source
    if isinstance(source, pd.DataFrame):
        # check validity of input dataframe
        if not "geom" in source.columns:
            raise AttributeError(f"source is given as a pd.DataFrame but has not 'geom' column.")
        if not isinstance(source.geom.iloc[0], ogr.Geometry):
            raise TypeError(
                f"source is given as a pd.DataFrame but value in 'geom' column is not of type osgeo.ogr.Geometry."
            )
        # return empty dataframe with empty expected "areaShare" column if no geometries contained since vector cannot be created without geometries
        if len(source) == 0:
            source["areaShare"] = None
            return source
        # generate a vector from source dataframe
        source = createVector(source)
    elif isinstance(source, str) or isinstance(source, pathlib.Path):
        if not os.path.isfile(source):
            raise FileNotFoundError(f"source is given as a string but is not an existing filepath: {source}")
        # load as vector file
        source = loadVector(source)
    elif not isinstance(source, gdal.Dataset):
        raise TypeError(
            f"source must either be a pd.DataFrame, a gdal.Dataset vector instance or a str formatted shapefile path."
        )

    # extract only the overlapping geoms, first define srs
    if srs is None:
        srs = geom.GetSpatialReference()
    else:
        geom = GEOM.transform(geom, toSRS=srs)
    df = extractFeatures(
        source=source,
        geom=geom,
        where=where,
        srs=srs,
        onlyGeom=onlyGeom,
        indexCol=indexCol,
        skipMissingGeoms=skipMissingGeoms,
        layerName=layerName,
        **kwargs,
    )
    if scaleAttrs is None:
        scaleAttrs = []
    elif isinstance(scaleAttrs, str):
        scaleAttrs = [scaleAttrs]
    else:
        assert isinstance(scaleAttrs, list), f"scaleAttrs must be a str or a list thereof if not None."
    if len(df) > 0:
        for _attr in scaleAttrs:
            if not _attr in list(df.columns):
                raise AttributeError(
                    f"'{_attr}' was given as scaleAttrs but is not an attribute of the source dataframe."
                )
            if not all([isinstance(_val, numbers.Number) for _val in df[_attr]]):
                raise TypeError(f"All values in column '{_attr}' in scaleAttrs must be numeric.")

    # check if we have any features to clip at all
    if len(df) == 0:
        # if not, add the mandatory areaShare column in case that it is not there already and return empty dataframe
        df["areaShare"] = None
        return df
    # else add the expected areaShare column
    assert not "areaShare" in list(df.columns), f"source data must not contain a 'areaShare' attribute."
    df["areaShare"] = 1.0
    # check if we need to clip the geometries at all
    if df.geom.iloc[0].GetGeometryName()[:5] == "POINT":
        # we have only points and no further clipping is needed
        return df

    # else we need to add an ID column and generate a new vector
    assert not "clippingID" in list(df.columns), f"source data must not contain a 'clippingID' attribute."
    df["clippingID"] = range(len(df))
    _vec = createVector(df)
    # extract only these features intersected by the outer geom boundary
    outer_df = extractFeatures(
        source=_vec,
        geom=geom.Boundary(),
        where=where,
        srs=srs,
        indexCol=indexCol,
        skipMissingGeoms=skipMissingGeoms,
        layerName=layerName,
        **kwargs,
    )
    del _vec
    if len(outer_df) == 0:
        # we have no features intersecting with the geom boundary, return all included features

        return df.drop(columns="clippingID")

    # else clip these features that are intersected by the geom
    _clippedIDs = list()
    _clippedGeoms = list()
    _areaShares = list()
    for i, feat in zip(outer_df.clippingID, outer_df.geom):
        _clipped = feat.Intersection(geom)
        _areaShare = _clipped.Area() / feat.Area()
        if _areaShare >= 1.0 - minShare:
            # the feature is only touched by the boundary (or protrudes minimally outside the geom edge) -> will not be reduced
            continue
        elif _areaShare <= 0.0 + minShare:
            # the feature is fully outside the geom and only touches the geom boundary (or overlaps only minimally with geom)
            # -> set clipped feature geometry to np.nan to filter out later
            _clipped = np.nan
        _clippedGeoms.append(_clipped)
        _areaShares.append(_areaShare)
        _clippedIDs.append(i)

    if len(_clippedIDs) == 0:
        # we have not clipped any feature at all, return df
        return df.drop(columns="clippingID")

    # else replace the original feature geometries with the clipped ones where needed and add area shares
    df.loc[df.clippingID.isin(_clippedIDs), "geom"] = _clippedGeoms
    df.loc[df.clippingID.isin(_clippedIDs), "areaShare"] = _areaShares
    for _attr in scaleAttrs:
        df[_attr] = df.apply(lambda x: x[_attr] * x.areaShare, axis=1)

    # drop nan geometries that do not (sufficiently) overlap filter geom
    df = df[~df.geom.isna()].reset_index(drop=True)

    # return the adapted dataframe
    return df.drop(columns="clippingID")

extractAsDataFrame

extractAsDataFrame(
    source: load_vector_input,
    indexCol: str | None = None,
    geom: None | Geometry = None,
    where=None,
    srs: srs_input | None = None,
    **kwargs,
)

Convenience function calling extractFeatures and structuring the output as a pandas DataFrame.

  • Geometries are written to a column called 'geom'
  • Attributes are written to a column of the same name

Parameters:

  • source

    (Anything acceptable by loadVector()) –

    The vector datasource to read from

  • indexCol

    (str; optional, default: None ) –

    The feature identifier to use as the DataFrams's index

  • geom

    (ogr.Geometry; optional, default: None ) –

    The geometry to search within * All features are extracted which touch this geometry

  • where

    (str; optional, default: None ) –

    An SQL-like where statement to apply to the source * Feature attribute name do not need quotes * String values should be wrapped in 'single quotes' Example: If the source vector has a string attribute called "ISO" and a integer attribute called "POP", you could use....

    where = "ISO='DEU' AND POP>1000"
    
  • srs

    (Anything acceptable to geokit.srs.loadSRS(); optional, default: None ) –

    The srs of the geometries to extract * If not given, the source's inherent srs is used * If srs does not match the inherent srs, all geometries will be transformed

Returns:

  • DataFrame
Source code in geokit/core/vector.py
def extractAsDataFrame(
    source: load_vector_input,
    indexCol: str | None = None,
    geom: None | ogr.Geometry = None,
    where=None,
    srs: srs_input | None = None,
    **kwargs,
):
    """Convenience function calling extractFeatures and structuring the output as
    a pandas DataFrame.

    * Geometries are written to a column called 'geom'
    * Attributes are written to a column of the same name

    Parameters
    ----------
    source : Anything acceptable by loadVector()
        The vector datasource to read from

    indexCol : str; optional
        The feature identifier to use as the DataFrams's index

    geom : ogr.Geometry; optional
        The geometry to search within
        * All features are extracted which touch this geometry

    where : str; optional
        An SQL-like where statement to apply to the source
        * Feature attribute name do not need quotes
        * String values should be wrapped in 'single quotes'
        Example: If the source vector has a string attribute called "ISO" and
                 a integer attribute called "POP", you could use....

            where = "ISO='DEU' AND POP>1000"

    srs : Anything acceptable to geokit.srs.loadSRS(); optional
        The srs of the geometries to extract
          * If not given, the source's inherent srs is used
          * If srs does not match the inherent srs, all geometries will be
            transformed

    Returns
    -------
    pandas.DataFrame
    """
    warnings.warn(
        "This function will be removed in favor of geokit.vector.extractFeatures",
        DeprecationWarning,
    )
    return extractFeatures(source=source, indexCol=indexCol, geom=geom, where=where, srs=srs, **kwargs)

extractFeature

extractFeature(
    source: load_vector_input,
    where: None | str | int = None,
    geom=None,
    srs=None,
    onlyGeom=False,
    onlyAttr=False,
) -> Feature | Geometry | dict

Convenience function calling extractFeatures which assumes there is only one feature to extract.

  • Will raise an error if multiple features are found

Parameters:

  • source

    (Anything acceptable by loadVector()) –

    The vector datasource to read from

  • where

    (str, int; optional, default: None ) –

    If string -> An SQL-like where statement to apply to the source If int -> The feature's ID within the dataset * Feature attribute name do not need quotes * String values should be wrapped in 'single quotes' Example: If the source vector has a string attribute called "ISO" and a integer attribute called "POP", you could use....

    where = "ISO='DEU' AND POP>1000"
    
  • geom

    (ogr.Geometry; optional, default: None ) –

    The geometry to search with * All features are extracted which touch this geometry

  • outputSRS

    (Anything acceptable to geokit.srs.loadSRS(); optional) –

    The srs of the geometries to extract * If not given, the source's inherent srs is used * If srs does not match the inherent srs, all geometries will be transformed

  • onlyGeom

    (bool; optional, default: False ) –

    If True, only feature geometries will be returned

  • onlyAttr

    (bool; optional, default: False ) –

    If True, only feature attributes will be returned

Returns:

  • * If onlyGeom and onlyAttr are False: namedtuple -> (geom, attr)
  • * If onlyGeom is True: ogr.Geometry
  • * If onlyAttr is True: dict
Source code in geokit/core/vector.py
def extractFeature(
    source: load_vector_input,
    where: None | str | int = None,
    geom=None,
    srs=None,
    onlyGeom=False,
    onlyAttr=False,
) -> UTIL.Feature | ogr.Geometry | dict:
    """Convenience function calling extractFeatures which assumes there is only
    one feature to extract.

    * Will raise an error if multiple features are found

    Parameters
    ----------
    source : Anything acceptable by loadVector()
        The vector datasource to read from

    where : str, int; optional
        If string -> An SQL-like where statement to apply to the source
        If int -> The feature's ID within the dataset
        * Feature attribute name do not need quotes
        * String values should be wrapped in 'single quotes'
        Example: If the source vector has a string attribute called "ISO" and
                 a integer attribute called "POP", you could use....

            where = "ISO='DEU' AND POP>1000"

    geom : ogr.Geometry; optional
        The geometry to search with
        * All features are extracted which touch this geometry

    outputSRS : Anything acceptable to geokit.srs.loadSRS(); optional
        The srs of the geometries to extract
          * If not given, the source's inherent srs is used
          * If srs does not match the inherent srs, all geometries will be
            transformed

    onlyGeom : bool; optional
        If True, only feature geometries will be returned

    onlyAttr : bool; optional
        If True, only feature attributes will be returned

    Returns
    -------
    * If onlyGeom and onlyAttr are False: namedtuple -> (geom, attr)
    * If onlyGeom is True: ogr.Geometry
    * If onlyAttr is True: dict
    """
    warnings.warn(message="extractFeature is deprecated use extractFeatures instead.", category=DeprecationWarning)
    if isinstance(where, int):
        ds = loadVector(source)
        lyr = ds.GetLayer()
        ftr = lyr.GetFeature(where)

        fGeom = ftr.GetGeometryRef().Clone()
        fItems = ftr.items().copy()

    else:
        getter = _extractFeatures(
            source=source,
            geom=geom,
            where=where,
            srs=srs,
            onlyGeom=onlyGeom,
            onlyAttr=onlyAttr,
            skipMissingGeoms=False,
        )

        # Get first result
        res = next(getter)

        if onlyGeom:
            fGeom = res
        elif onlyAttr:
            fItems = res
        else:
            (fGeom, fItems) = res

        # try to get a second result
        try:
            next(getter)
        except StopIteration:
            pass
        else:
            raise GeoKitVectorError("More than one feature found")

    if not srs is None and not onlyAttr:
        srs = SRS.loadSRS(srs)
        if not fGeom.GetSpatialReference().IsSame(srs):
            fGeom.TransformTo(srs)

    # Done!
    if onlyGeom:
        return fGeom
    elif onlyAttr:
        return fItems
    else:
        return UTIL.Feature(fGeom, fItems)

extractFeatures

extractFeatures(
    source,
    where=None,
    geom=None,
    srs=None,
    onlyGeom=False,
    onlyAttr=False,
    asPandas=True,
    indexCol=None,
    skipMissingGeoms=True,
    layerName=None,
    spatialPredicate: Literal[
        "Touches", "Overlaps", "CentroidWithin"
    ] = "Touches",
) -> DataFrame | Series | Generator

Creates a generator which extract the features contained within the source.

  • Iteratively returns (feature-geometry, feature-fields)
Note:

Be careful when filtering by a geometry as the extracted features may not necessarily be IN the given shape * Sometimes they may only overlap * Sometimes they are only in the geometry's envelope * To be sure an extracted geometry fits the selection criteria, you may still need to do further processing or use extractAndClipFeatures()

Parameters:

  • source

    (Anything acceptable by loadVector()) –

    The vector data source to read from

  • geom

    (ogr.Geometry; optional, default: None ) –

    The geometry to search with * All features are extracted which touch this geometry

  • where

    (str; optional, default: None ) –

    An SQL-like where statement to apply to the source * Feature attribute name do not need quotes * String values should be wrapped in 'single quotes' Example: If the source vector has a string attribute called "ISO" and a integer attribute called "POP", you could use....

    where = "ISO='DEU' AND POP>1000"
    
  • srs

    (Anything acceptable to geokit.srs.loadSRS(); optional, default: None ) –

    The srs of the geometries to extract * If not given, the source's inherent srs is used * If srs does not match the inherent srs, all geometries will be transformed

  • onlyGeom

    (bool; optional, default: False ) –

    If True, only feature geometries will be returned

  • onlyAttr

    (bool; optional, default: False ) –

    If True, only feature attributes will be returned

  • asPandas

    (bool; optional, default: True ) –

    Whether or not the result should be returned as a pandas.DataFrame (when onlyGeom is False) or pandas.Series (when onlyGeom is True)

  • indexCol

    (str; optional, default: None ) –

    The feature identifier to use as the DataFrams's index * Only useful when as DataFrame is True

  • skipMissingGeoms

    (bool; optional, default: True ) –

    If True, then the parser will not read a feature which are missing a geometry

  • layerName

    (str; optional, default: None ) –

    The name of the layer to extract from the source vector dataset (only applicable in case of a geopackage).

  • spatialPredicate

    (str, default: 'Touches' ) –

    Applies only in combination with given 'geom' filter. If "Touches", all geometries will be extracted that simply touch the filter geom. If "Overlaps", geometries to be extracted must overlap (for lines, this represents an "Intersect") either partially or completely (i.e. it includes "Within"), and if "CentroidWithin" the centroid of the extracted geom must be within or on the filter geom. By default "Touches". NOTE: When filter geom is a polygon, centroids exactly on the filter geom boundary will NOT be extracted.

Returns:

  • * If asPandas is True: pandas.DataFrame or pandas.Series
  • * If asPandas is False: generator
Source code in geokit/core/vector.py
def extractFeatures(
    source,
    where=None,
    geom=None,
    srs=None,
    onlyGeom=False,
    onlyAttr=False,
    asPandas=True,
    indexCol=None,
    skipMissingGeoms=True,
    layerName=None,
    spatialPredicate: Literal["Touches", "Overlaps", "CentroidWithin"] = "Touches",
) -> pd.DataFrame | pd.Series | Generator:
    """Creates a generator which extract the features contained within the source.

    * Iteratively returns (feature-geometry, feature-fields)

    Note:
    -----
    Be careful when filtering by a geometry as the extracted features may not
    necessarily be IN the given shape
    * Sometimes they may only overlap
    * Sometimes they are only in the geometry's envelope
    * To be sure an extracted geometry fits the selection criteria, you may
      still need to do further processing or use extractAndClipFeatures()

    Parameters
    ----------
    source : Anything acceptable by loadVector()
        The vector data source to read from

    geom : ogr.Geometry; optional
        The geometry to search with
        * All features are extracted which touch this geometry

    where : str; optional
        An SQL-like where statement to apply to the source
        * Feature attribute name do not need quotes
        * String values should be wrapped in 'single quotes'
        Example: If the source vector has a string attribute called "ISO" and
                 a integer attribute called "POP", you could use....

            where = "ISO='DEU' AND POP>1000"

    srs : Anything acceptable to geokit.srs.loadSRS(); optional
        The srs of the geometries to extract
          * If not given, the source's inherent srs is used
          * If srs does not match the inherent srs, all geometries will be
            transformed

    onlyGeom : bool; optional
        If True, only feature geometries will be returned

    onlyAttr : bool; optional
        If True, only feature attributes will be returned

    asPandas : bool; optional
        Whether or not the result should be returned as a pandas.DataFrame (when
        onlyGeom is False) or pandas.Series (when onlyGeom is True)

    indexCol : str; optional
        The feature identifier to use as the DataFrams's index
        * Only useful when as DataFrame is True

    skipMissingGeoms : bool; optional
        If True, then the parser will not read a feature which are missing a geometry

    layerName : str; optional
        The name of the layer to extract from the source vector dataset (only applicable in case of a geopackage).

    spatialPredicate : str, optional
        Applies only in combination with given 'geom' filter. If "Touches",
        all geometries will be extracted that simply touch the filter
        geom. If "Overlaps", geometries to be extracted must overlap (for
        lines, this represents an "Intersect") either partially or
        completely (i.e. it includes "Within"), and if "CentroidWithin"
        the centroid of the extracted geom must be within or on the
        filter geom. By default "Touches".
        NOTE: When filter geom is a polygon, centroids exactly on the
        filter geom boundary will NOT be extracted.

    Returns
    -------
    * If asPandas is True: pandas.DataFrame or pandas.Series
    * If asPandas is False: generator
    """
    # arrange output
    if not asPandas:
        return _extractFeatures(
            source=source,
            geom=geom,
            where=where,
            srs=srs,
            onlyGeom=onlyGeom,
            onlyAttr=onlyAttr,
            skipMissingGeoms=skipMissingGeoms,
            layerName=layerName,
            spatialPredicate=spatialPredicate,
        )
    else:
        fields = defaultdict(list)
        fields["geom"] = []

        for g, a in _extractFeatures(
            source=source,
            geom=geom,
            where=where,
            srs=srs,
            onlyGeom=False,
            onlyAttr=False,
            skipMissingGeoms=skipMissingGeoms,
            layerName=layerName,
            spatialPredicate=spatialPredicate,
        ):
            fields["geom"].append(g.Clone())
            for k, v in a.items():
                fields[k].append(v)

        df = pd.DataFrame(fields)
        if not indexCol is None:
            df.set_index(indexCol, inplace=True, drop=False)

        if onlyGeom:
            assert not onlyAttr, f"onlyGeom cannot be combined with onlyAttr."
            return df["geom"]
        elif onlyAttr:
            return df.drop("geom", axis=1)
        else:
            return df

filterLayer

filterLayer(layer, geom=None, where=None)

GeoKit internal.

Filters an ogr Layer object according to a geometry and where statement

Source code in geokit/core/vector.py
def filterLayer(layer, geom=None, where=None):
    """GeoKit internal.

    Filters an ogr Layer object according to a geometry and where statement
    """
    if not geom is None:
        if isinstance(geom, ogr.Geometry):
            if geom.GetSpatialReference() is None:
                raise GeoKitVectorError("Input geom must have a srs")
            if not geom.GetSpatialReference().IsSame(layer.GetSpatialRef()):
                geom = geom.Clone()
                geom.TransformTo(layer.GetSpatialRef())
            layer.SetSpatialFilter(geom)
        else:
            if isinstance(geom, tuple):  # maybe geom is a simple tuple
                xMin, yMin, xMax, yMax = geom
            else:
                try:  # maybe geom is an extent object
                    xMin, yMin, xMax, yMax = geom.castTo(layer.GetSpatialRef()).xyXY
                except:
                    raise GeoKitVectorError("Geom input not understood")
            layer.SetSpatialFilterRect(xMin, yMin, xMax, yMax)

    if not where is None:
        r = layer.SetAttributeFilter(where)
        if r != 0:
            raise GeoKitVectorError("Error applying where statement")

listLayers

listLayers(source)

Returns the layer names for each layer that is stored in a geopackage.

Parameters:

  • source

    (Anything acceptable by loadVector()) –

    The vector datasource to read from

Returns:

  • list

    A list of layer names for the source geopackage.

Source code in geokit/core/vector.py
def listLayers(
    source,
):
    """Returns the layer names for each layer that is stored in a geopackage.

    Parameters
    ----------
    source : Anything acceptable by loadVector()
        The vector datasource to read from

    Returns
    -------
    list
        A list of layer names for the source geopackage.
    """
    layer_names = []
    ds = loadVector(source)

    # Loop over the layers to get their names.
    for i in range(ds.GetLayerCount()):
        name = ds.GetLayer(i).GetName()
        layer_names.append(name)
    return layer_names

loadVector

loadVector(x: load_vector_input) -> Dataset

Load a vector dataset from a path to a file on disc.

Parameters:

  • source

    (str or Dataset) –
    • If a string is given, it is assumed as a path to a vector file on disc
    • If a gdal.Dataset is given, it is assumed to already be an open vector and is returned immediately

Returns:

  • Dataset
Source code in geokit/core/vector.py
def loadVector(x: load_vector_input) -> gdal.Dataset:
    """
    Load a vector dataset from a path to a file on disc.

    Parameters
    ----------
    source : str or gdal.Dataset
        * If a string is given, it is assumed as a path to a vector file on disc
        * If a gdal.Dataset is given, it is assumed to already be an open vector
          and is returned immediately

    Returns
    -------
    gdal.Dataset
    """
    if isinstance(x, str) or isinstance(x, pathlib.Path):
        if not os.path.exists(x):
            raise FileNotFoundError(f"Vector file, directory, or resource not found: {x}")

        # Since we know the path exists, try to open it.
        ds = gdal.OpenEx(x)
        if ds is None:
            # This error now clearly means path exists, but GDAL failed to open it or the file is corrupted.
            raise GeoKitVectorError(f"Could not load input dataSource: {x}")

    elif x is None:
        # Raise and error if 'None' is passed as the object
        raise GeoKitVectorError("Input dataSource cannot be None.")

    elif isinstance(x, gdal.Dataset):
        # If it is an already-opened GDAL Dataset object.
        ds = x

    else:
        # Handle any other invalid type
        raise TypeError(f"Invalid input type: Expected str, or gdal.Dataset, got {type(x)}")

    return ds

loopFeatures

loopFeatures(source: load_vector_input)

Geokit internal.

*Loops over an input layer's features * Will reset the reading counter before looping is initiated

Parameters:

  • source

    (Anything acceptable by loadVector()) –

    The vector datasource to read from

Source code in geokit/core/vector.py
def loopFeatures(source: load_vector_input):
    """Geokit internal.

    *Loops over an input layer's features
    * Will reset the reading counter before looping is initiated

    Parameters
    ----------
    source : Anything acceptable by loadVector()
        The vector datasource to read from
    """
    if isinstance(source, str) or isinstance(source, pathlib.Path):  # assume input source is a path to a datasource
        ds = ogr.Open(source)
        layer = ds.GetLayer()
    else:  # otherwise, assume input source is an ogr layer object
        # loop over all features
        layer = source
        layer.ResetReading()

    while True:
        ftr = layer.GetNextFeature()
        if ftr:
            yield ftr
        else:
            return

mutateVector

mutateVector(
    source,
    processor=None,
    srs=None,
    geom=None,
    where=None,
    fieldDef=None,
    output=None,
    keepAttributes=True,
    _slim: bool = False,
    **create_vector_kwargs,
) -> None | Dataset | str

Process a vector dataset according to an arbitrary function.

Note:

If this is called without a processor, it simply clips the vector source to the selection criteria given by 'geom' and 'where' as well as translates all geometries to 'srs'

Parameters:

  • source

    (Anything acceptable by loadVector()) –

    The vector datasource to read from

  • processor

    (function; optional, default: None ) –

    The function which mutates each feature * If no function is given, this stage is skipped * The function will take 1 arguments: a pandas.Series object containing one 'geom' key indicating the geometry and the other keys indicating attributes * The function must return something understandable by pd.Series containing the geometry under the index 'geom' and any other keys - These will be used to update the old geometries and attributes * The attribute dictionary's values should only be numerics and strings * See example below for more info

  • srs

    (Anything acceptable to geokit.srs.loadSRS(); optional, default: None ) –

    The srs of the vector to create * If not given, the source's inherent srs is used * If the given SRS is different from the source's SRS, all feature geometries will be cast to the given SRS before processing

  • geom

    (ogr.Geometry; optional, default: None ) –

    The geometry to search within * All features are extracted which touch this geometry

  • where

    (str; optional, default: None ) –

    An SQL-like where statement to apply to the source * Feature attribute name do not need quotes * String values should be wrapped in 'single quotes' Example: If the source vector has a string attribute called "ISO" and a integer attribute called "POP", you could use....

    where = "ISO='DEU' AND POP>1000"
    
  • fieldDef

    (dict; optional, default: None ) –

    A dictionary specifying the datatype of each attribute when written into the final dataset * Options are defined from ogr.OFT[...] - ex. Integer, Real, String * The ogrType() function can be used to map typical python and numpy types to appropriate ogr types

  • output

    (str; optional, default: None ) –

    A path on disk to create the output vector * If output is None, the vector dataset will be created in memory * Assumed to be of "ESRI Shapefile" format * Will create a number of files with different extensions

  • keepAttributes

    (bool; optional, default: True ) –

    If True, the old attributes will be kept in the output vector * Unless they are over written by the processor If False, only the newly specified attributes are kept

  • _slim

    (bool, default: False ) –

Returns:

  • * If 'output' is None: gdal.Dataset
  • * If 'output' is given: None
Example:

Say you had a vector source which contains point geometries, and where each feature also had an float-attribute called "value". You want to create a new vector set wherein you have circle geometries at the same locations as the original points and whose radius is equal to the original features' "value" attribute. Furthermore, let's say you only want to do this for feature's who's "value" attribute is greater than zero and less than 10. Do as such...

def growPoints( row ): # Create a new geom newGeom = row.geom.Buffer(row.radius)

# Return the new geometry/attribute set
return { 'geom':newGeom }

result = processVector( , where="value>0 AND value<10", processor=growPoints )

Source code in geokit/core/vector.py
def mutateVector(
    source,
    processor=None,
    srs=None,
    geom=None,
    where=None,
    fieldDef=None,
    output=None,
    keepAttributes=True,
    _slim: bool = False,
    **create_vector_kwargs,
) -> None | gdal.Dataset | str:
    """Process a vector dataset according to an arbitrary function.

    Note:
    -----
    If this is called without a processor, it simply clips the vector source to
    the selection criteria given by 'geom' and 'where' as well as translates
    all geometries to 'srs'

    Parameters
    ----------
    source : Anything acceptable by loadVector()
        The vector datasource to read from

    processor : function; optional
        The function which mutates each feature
        * If no function is given, this stage is skipped
        * The function will take 1 arguments: a pandas.Series object containing
          one 'geom' key indicating the geometry and the other keys indicating
          attributes
        * The function must return something understandable by pd.Series
          containing the geometry under the index 'geom' and any other keys
            - These will be used to update the old geometries and attributes
        * The attribute dictionary's values should only be numerics and strings
        * See example below for more info

    srs : Anything acceptable to geokit.srs.loadSRS(); optional
        The srs of the vector to create
          * If not given, the source's inherent srs is used
          * If the given SRS is different from the source's SRS, all feature
            geometries will be cast to the given SRS before processing

    geom : ogr.Geometry; optional
        The geometry to search within
        * All features are extracted which touch this geometry

    where : str; optional
        An SQL-like where statement to apply to the source
        * Feature attribute name do not need quotes
        * String values should be wrapped in 'single quotes'
        Example: If the source vector has a string attribute called "ISO" and
                 a integer attribute called "POP", you could use....

            where = "ISO='DEU' AND POP>1000"

    fieldDef : dict; optional
        A dictionary specifying the datatype of each attribute when written into
        the final dataset
        * Options are defined from ogr.OFT[...]
          - ex. Integer, Real, String
        * The ogrType() function can be used to map typical python and numpy types
          to appropriate ogr types

    output : str; optional
        A path on disk to create the output vector
        * If output is None, the vector dataset will be created in memory
        * Assumed to be of "ESRI Shapefile" format
        * Will create a number of files with different extensions

    keepAttributes : bool; optional
        If True, the old attributes will be kept in the output vector
            * Unless they are over written by the processor
        If False, only the newly specified attributes are kept

    _slim: bool

    Returns
    -------
    * If 'output' is None: gdal.Dataset
    * If 'output' is given: None

    Example:
    --------
    Say you had a vector source which contains point geometries, and where each
    feature also had an float-attribute called "value". You want to create a new
    vector set wherein you have circle geometries at the same locations as the
    original points and whose radius is equal to the original features' "value"
    attribute. Furthermore, let's say you only want to do this for feature's who's
    "value" attribute is greater than zero and less than 10. Do as such...

    >>> def growPoints( row ):
    >>>     # Create a new geom
    >>>     newGeom = row.geom.Buffer(row.radius)
    >>>
    >>>     # Return the new geometry/attribute set
    >>>     return { 'geom':newGeom }
    >>>
    >>> result = processVector( <source-path>, where="value>0 AND value<10",
    >>>                         processor=growPoints )
    >>>
    """
    # Extract filtered features
    geoms = extractFeatures(source, geom=geom, where=where, srs=srs)
    if geoms.size == 0:
        return None

    # Hold on to the SRS in case we need it
    if srs is None:
        vecds = loadVector(source)
        veclyr = vecds.GetLayer()
        srs = veclyr.GetSpatialRef()

    # Do processing
    if not processor is None:
        result = geoms.apply(lambda x: pd.Series(processor(x)), axis=1)
        if keepAttributes:
            for c in result.columns:
                geoms[c] = result[c].values
        else:
            geoms = result

        if not "geom" in geoms:
            raise GeoKitVectorError("There is no 'geom' field in the resulting vector table")

        # make sure the geometries have an srs
        if not geoms.geom[0].GetSpatialReference():
            srs = SRS.loadSRS(srs)
            geoms.geom.apply(lambda x: x.AssignSpatialReference(srs))

    # Create a new shapefile from the results

    if _slim is True:
        return createVector(geoms.geom)
    else:
        return createVector(geoms, srs=srs, output=output, fieldDef=fieldDef, **create_vector_kwargs)

ogrType

ogrType(s)

Tries to determine the corresponding OGR type according to the input.

Source code in geokit/core/vector.py
def ogrType(s):
    """Tries to determine the corresponding OGR type according to the input."""
    if isinstance(s, str):
        if hasattr(ogr, s):
            return s
        elif s.lower() in _ogrStrToType:
            return _ogrStrToType[s.lower()]
        elif hasattr(ogr, "OFT%s" % s):
            return "OFT%s" % s
        return "OFTString"

    elif s is str:
        return "OFTString"
    elif isinstance(s, np.dtype):
        return ogrType(str(s))
    elif isinstance(s, np.generic):
        return ogrType(s.dtype)
    elif s is bool:
        return "OFTInteger"
    elif s is int:
        return "OFTInteger64"
    elif isinstance(s, int):
        return _ogrIntToType[s]
    elif s is float:
        return "OFTReal"
    elif isinstance(s, Iterable):
        return ogrType(s[0])

    raise ValueError("OGR type could not be determined")

rasterize

rasterize(
    source: str | Path | Geometry | Dataset,
    pixelWidth: numeric,
    pixelHeight: numeric,
    srs: srs_input | None = None,
    bounds: tuple[numeric, numeric, numeric, numeric]
    | Extent
    | None = None,
    where: str | None = None,
    value: numeric | str = 1,
    output: str | None = None,
    dtype: geokit_c_data_types_literal | None = None,
    compress=True,
    noData=None,
    overwrite: bool = True,
    **kwargs,
) -> Dataset | str

Rasterize a vector datasource onto a raster context.

Note:

When creating an 'in memory' raster vs one which is saved to disk, a slightly different algorithm is used which can sometimes add an extra row of pixels. Be aware of this if you intend to compare value-matricies directly from rasters generated with this function.

Parameters:

  • source

    (str or Geometry) –

    If str, the path to the vector file to load If ogr.Geometry, an Polygon geometry - Will be immediately turned into a vector

  • pixelWidth

    (numeric) –

    The pixel width of the raster in the working srs * Is 'srs' is not given, these are the units of the source's inherent srs

  • pixelHeight

    (numeric) –

    The pixel height of the raster in the working srs * Is 'srs' is not given, these are the units of the source's inherent srs

  • srs

    (Anything acceptable to geokit.srs.loadSRS(); optional, default: None ) –

    The srs of the point to create * If 'bounds' is an Extent object, the bounds' internal srs will override this input

  • bounds

    ((xMin, yMix, xMax, yMax) or Extent; optional, default: None ) –

    The geographic extents spanned by the raster * If not given, the whole bounds spanned by the input is used

  • where

    (str; optional, default: None ) –

    An SQL-like where statement to use to filter the vector before rasterizing

  • value

    ((numeric, str), default: 1 ) –

    The values to burn into the raster * If a numeric is given, all pixels are burned with the specified value * If a string is given, then one the feature attribute names is expected

  • output

    (str; optional, default: None ) –

    A path to an output file * If output is None, the raster will be created in memory and a dataset handle will be returned * If output is given, the raster will be written to disk and nothing will be returned

  • dtype

    (str; optional, default: None ) –

    The datatype of the represented by the created raster's band * Options are: Byte, Int16, Int32, Int64, Float32, Float64 * If dtype is None and data is None, the assumed datatype is a 'Byte' * If dtype is None and data is not None, the datatype will be inferred from the given data

  • compress

    (bool, default: True ) –

    A flag instructing the output raster to use a compression algorithm * only useful if 'output' has been defined * "DEFLATE" used for Linux/Mac, "LZW" used for Windows

  • noData

    (numeric; optional, default: None ) –

    Specifies which value should be considered as 'no data' in the created raster * Must be the same datatype as the 'dtype' input (or that which is derived)

  • overwrite

    (bool, default: True ) –

    A flag to overwrite a pre-existing output file * If set to False and an 'output' is specified which already exists, an error will be raised

Returns:

  • * If 'output' is None: gdal.Dataset
  • * If 'output' is a string: The path to the output is returned (for easy opening)
Source code in geokit/core/vector.py
def rasterize(
    source: str | pathlib.Path | ogr.Geometry | gdal.Dataset,
    pixelWidth: numeric,
    pixelHeight: numeric,
    srs: srs_input | None = None,
    bounds: tuple[numeric, numeric, numeric, numeric] | Extent | None = None,
    where: str | None = None,
    value: numeric | str = 1,
    output: str | None = None,
    dtype: geokit_c_data_types_literal | None = None,
    compress=True,
    noData=None,
    overwrite: bool = True,
    **kwargs,
) -> gdal.Dataset | str:
    """Rasterize a vector datasource onto a raster context.

    Note:
    -----
    When creating an 'in memory' raster vs one which is saved to disk, a slightly
    different algorithm is used which can sometimes add an extra row of pixels. Be
    aware of this if you intend to compare value-matricies directly from rasters
    generated with this function.

    Parameters
    ----------
    source : str or ogr.Geometry
        If str, the path to the vector file to load
        If ogr.Geometry, an Polygon geometry
            - Will be immediately turned into a vector

    pixelWidth : numeric
        The pixel width of the raster in the working srs
        * Is 'srs' is not given, these are the units of the source's inherent srs

    pixelHeight : numeric
        The pixel height of the raster in the working srs
        * Is 'srs' is not given, these are the units of the source's inherent srs

    srs : Anything acceptable to geokit.srs.loadSRS(); optional
        The srs of the point to create
        * If 'bounds' is an Extent object, the bounds' internal srs will override
          this input

    bounds : (xMin, yMix, xMax, yMax) or Extent; optional
        The geographic extents spanned by the raster
        * If not given, the whole bounds spanned by the input is used

    where : str; optional
        An SQL-like where statement to use to filter the vector before rasterizing

    value : numeric, str
        The values to burn into the raster
        * If a numeric is given, all pixels are burned with the specified value
        * If a string is given, then one the feature attribute names is expected

    output : str; optional
        A path to an output file
        * If output is None, the raster will be created in memory and a dataset
          handle will be returned
        * If output is given, the raster will be written to disk and nothing will
          be returned

    dtype : str; optional
        The datatype of the represented by the created raster's band
        * Options are: Byte, Int16, Int32, Int64, Float32, Float64
        * If dtype is None and data is None, the assumed datatype is a 'Byte'
        * If dtype is None and data is not None, the datatype will be inferred
          from the given data

    compress : bool
        A flag instructing the output raster to use a compression algorithm
        * only useful if 'output' has been defined
        * "DEFLATE" used for Linux/Mac, "LZW" used for Windows

    noData : numeric; optional
        Specifies which value should be considered as 'no data' in the created
        raster
        * Must be the same datatype as the 'dtype' input (or that which is derived)

    overwrite : bool
        A flag to overwrite a pre-existing output file
        * If set to False and an 'output' is specified which already exists,
          an error will be raised

    Returns
    -------
    * If 'output' is None: gdal.Dataset
    * If 'output' is a string: The path to the output is returned (for easy opening)
    """
    # Normalize some inputs
    if isinstance(source, ogr.Geometry):
        source = createVector(source)
    else:
        source = loadVector(source)

    # Get the vector's info
    vecinfo = vectorInfo(source)

    if srs is None:
        srs = vecinfo.srs
        srsOkay = True
    else:
        srs = SRS.loadSRS(srs)
        if srs.IsSame(vecinfo.srs):
            srsOkay = True
        else:
            srsOkay = False

    # Look for bounds input
    if bounds is None:
        bounds = vecinfo.bounds
        if not srsOkay:
            bounds = GEOM.boundsToBounds(bounds, vecinfo.srs, srs)
    else:
        try:
            bounds = bounds.xyXY  # Get a tuple from an Extent object
        except:
            pass  # Bounds should already be a tuple

    bounds = UTIL.fitBoundsTo(bounds, pixelWidth, pixelHeight)

    # Collect rasterization options
    if output is None and not "bands" in kwargs:
        kwargs["bands"] = [1]

    list_of_data_types = []

    if isinstance(dtype, str):
        list_of_data_types.append(dtype)

    list_of_numbers = []

    if isinstance(value, str):
        kwargs["attribute"] = value
        data_type_of_field_as_string = vecinfo.attribute_data_types_str
        list_of_data_types.append(data_type_of_field_as_string[value])

    else:
        kwargs["burnValues"] = [
            value,
        ]
        list_of_numbers.append(value)

    if isinstance(noData, (numeric, bool)):
        list_of_numbers.append(noData)

    # minimum_data_type = dtype
    # just to raise error early if invalid
    # Do 'in memory' rasterization
    # We need to follow this path in both cases since the below fails when simultaneously rasterizing and writing to disk (I couldn't figure out why...)
    if output is None or not srsOkay:
        minimum_data_type_string = MinimumCDataTypeHandler.get_valid_gdal_data_type_as_string(
            list_of_numbers=list_of_numbers, minimum_gdal_type_list=list_of_data_types
        )
        # Create temporary output file
        outputDS = UTIL.quickRaster(
            bounds=bounds,
            srs=srs,
            dx=pixelWidth,
            dy=pixelHeight,
            dtype=minimum_data_type_string,
            noData=noData,
        )

        # Do rasterize
        tmp = gdal.Rasterize(outputDS, source, where=where, **kwargs)
        if tmp == 0:
            raise GeoKitRasterError("Rasterization failed!")
        outputDS.FlushCache()

        if output is None:
            return outputDS
        else:
            ri = RASTER.rasterInfo(outputDS)
            RASTER.createRasterLike(ri, output=output, data=RASTER.extractMatrix(outputDS))
            return output

    # Do a rasterization to a file on disk
    else:
        # Check for existing file
        if os.path.isfile(output):
            if overwrite == True:
                os.remove(output)
                if os.path.isfile(output + ".aux.xml"):  # Because QGIS....
                    os.remove(output + ".aux.xml")
            else:
                raise GeoKitRasterError("Output file already exists: %s" % output)

        # Arrange some inputs
        aligned = kwargs.pop("targetAlignedPixels", True)

        if not "creationOptions" in kwargs:
            if compress:
                creation_options = RASTER.COMPRESSION_OPTION
            else:
                creation_options = []
        else:
            creation_options = kwargs.pop("creationOptions")

        # Fix the bounds issue by making them  just a little bit smaller, which should be fixed by gdalwarp
        bounds = (
            bounds[0] + 0.001 * pixelWidth,
            bounds[1] + 0.001 * pixelHeight,
            bounds[2] - 0.001 * pixelWidth,
            bounds[3] - 0.001 * pixelHeight,
        )

        minimum_data_type_constant = MinimumCDataTypeHandler.get_valid_gdal_data_type_as_constant(
            list_of_numbers=list_of_numbers, minimum_gdal_type_list=list_of_data_types
        )
        # Do rasterize
        tmp = gdal.Rasterize(
            output,
            source,
            outputBounds=bounds,
            xRes=pixelWidth,
            yRes=pixelHeight,
            outputSRS=srs,
            noData=noData,
            where=where,
            creationOptions=creation_options,
            targetAlignedPixels=aligned,
            outputType=minimum_data_type_constant,
            **kwargs,
        )

        if not UTIL.isRaster(tmp):
            raise GeoKitRasterError("Rasterization failed!")

        return output

vectorInfo

vectorInfo(source) -> vecInfo

Extract general information about a vector source.

Determines:

Parameters:

  • source

    (Anything acceptable by loadVector()) –

    The vector datasource to read from

Returns:

  • namedtuple -> (srs : The source's SRS system,

    bounds : The source's boundaries (in the srs's units), xMin : The source's xMin boundaries (in the srs's units), yMin : The source's xMax boundaries (in the srs's units), xMax : The source's yMin boundaries (in the srs's units), yMax : The source's yMax boundaries (in the srs's units), count : The number of features in the source, attributes : The attribute titles for the source's features,)

Source code in geokit/core/vector.py
def vectorInfo(source) -> vecInfo:
    """Extract general information about a vector source.

    Determines:

    Parameters
    ----------
    source : Anything acceptable by loadVector()
        The vector datasource to read from

    Returns
    -------
    namedtuple -> (srs : The source's SRS system,
                   bounds : The source's boundaries (in the srs's units),
                   xMin : The source's xMin boundaries (in the srs's units),
                   yMin : The source's xMax boundaries (in the srs's units),
                   xMax : The source's yMin boundaries (in the srs's units),
                   yMax : The source's yMax boundaries (in the srs's units),
                   count : The number of features in the source,
                   attributes : The attribute titles for the source's features,)
    """
    info = {}

    vecDS = loadVector(source)
    vecLyr: ogr.Layer = vecDS.GetLayer()
    info["srs"] = vecLyr.GetSpatialRef()

    xMin, xMax, yMin, yMax = vecLyr.GetExtent()
    info["bounds"] = (xMin, yMin, xMax, yMax)
    info["xMin"] = xMin
    info["xMax"] = xMax
    info["yMin"] = yMin
    info["yMax"] = yMax

    info["count"] = vecLyr.GetFeatureCount()

    info["source"] = vecDS.GetDescription()

    info["attributes"] = []
    info["attribute_data_types_constant"] = {}
    info["attribute_data_types_str"] = {}
    layerDef: ogr.FeatureDefn = vecLyr.GetLayerDefn()
    for layer_number in range(layerDef.GetFieldCount()):
        field_definition: ogr.FieldDefn = layerDef.GetFieldDefn(layer_number)
        field_name = field_definition.GetName()
        field_type_constant = field_definition.GetType()
        field_type_constant_str = gdal.GetDataTypeName(field_type_constant)
        info["attributes"].append(field_name)
        info["attribute_data_types_constant"][field_name] = field_type_constant
        info["attribute_data_types_str"][field_name] = field_type_constant_str

    return vecInfo(**info)