Extract, modify, and write attribute data in vector datasets
Vector datasets combine geometries with attribute data stored in a table-like structure. This example shows how to extract, insert, and modify attribute data in vector datasets to prepare them for further analysis. As an illustration, it calculates the area of each federal state in Germany and assigns a random levelized cost of electricity (LCOE) to each state.
Import required packages
GeoKit is imported to provide the required functionality for working with vector data.
In [1]:
import geokit.core.vector
import geokit.core.geom
import geokit.core.srs
import numpy as np
import pandas as pd
import pathlib
from geokit.core.get_test_data import get_test_data
In [2]:
# Path to the shapefile
path_to_shape_file = get_test_data(
file_name="gadm36_DEU_1.shp",
)
# Load vector dataset from a Shapefile as a pandas DataFrame and reproject to EPSG:3035
data_frame_germany: pd.DataFrame = geokit.core.vector.extractFeatures(path_to_shape_file, srs=3035)
Add attributes to the vector dataset
In [3]:
# Generate a random average LCOE value for each row
n_rows = data_frame_germany.loc[:, "geom"].shape[0]
lcoe_values = np.random.uniform(low=30.0, high=80.0, size=n_rows)
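The random draw in this cell changes on every run, which is why the avg_LCOE values differ between the two dataset outputs in this example. A minimal sketch of a reproducible variant, assuming NumPy's `np.random.default_rng` generator; the seed value and the fixed `n_rows = 16` are illustrative assumptions and not part of the original notebook:

```python
import numpy as np

# Assumption: 16 rows, matching the 16 federal states in the dataset.
n_rows = 16
seed = 42  # hypothetical seed; any fixed integer works

# A seeded generator returns the same sequence on every run.
rng = np.random.default_rng(seed)
lcoe_values = rng.uniform(low=30.0, high=80.0, size=n_rows)

# A second generator with the same seed reproduces the same values.
lcoe_repeat = np.random.default_rng(seed).uniform(low=30.0, high=80.0, size=n_rows)
```

With a fixed seed, re-running the notebook should yield the same LCOE column each time, which makes the filter results further down easier to compare between runs.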
In [4]:
# Calculate the size of each geometry (area)
areas_m2 = data_frame_germany.loc[:, "geom"].apply(lambda g: g.GetArea())
# Add the new columns (LCOE and area) to the vector dataset
data_frame_germany.loc[:, "avg_LCOE"] = pd.Series(lcoe_values)
data_frame_germany.loc[:, "area_km2"] = pd.Series(areas_m2 / 1e6)
# Show updated dataset
data_frame_germany
Out[4]:
| | geom | GID_0 | NAME_0 | GID_1 | NAME_1 | VARNAME_1 | NL_NAME_1 | TYPE_1 | ENGTYPE_1 | CC_1 | HASC_1 | avg_LCOE | area_km2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | MULTIPOLYGON (((4223953.58970637 2731509.51891... | DEU | Germany | DEU.1_1 | Baden-Württemberg | None | None | Land | State | 08 | DE.BW | 50.405873 | 36074.255068 |
| 1 | POLYGON ((4301467.52857983 2715580.87627153,43... | DEU | Germany | DEU.2_1 | Bayern | Bavaria | None | Freistaat | None | 09 | DE.BY | 53.105215 | 70544.892566 |
| 2 | POLYGON ((4536790.77005341 3258960.86848328,45... | DEU | Germany | DEU.3_1 | Berlin | None | None | Land | State | 11 | DE.BE | 47.308120 | 891.862007 |
| 3 | MULTIPOLYGON (((4475872.02032226 3238148.49116... | DEU | Germany | DEU.4_1 | Brandenburg | None | None | Land | State | 12 | DE.BR | 64.756372 | 29654.223234 |
| 4 | MULTIPOLYGON (((4234577.75904604 3327007.62237... | DEU | Germany | DEU.5_1 | Bremen | None | None | Freie Hansestadt | State | 04 | DE.HB | 76.230227 | 399.805154 |
| 5 | MULTIPOLYGON (((4334970.2892255 3379166.259019... | DEU | Germany | DEU.6_1 | Hamburg | None | None | Freie und Hansestadt | State | 02 | DE.HH | 55.083314 | 736.380193 |
| 6 | MULTIPOLYGON (((4225654.7277815 2946381.549486... | DEU | Germany | DEU.7_1 | Hessen | Hesse | None | Land | State | 06 | DE.HE | 54.703907 | 21115.145261 |
| 7 | MULTIPOLYGON (((4419026.73328481 3437109.38907... | DEU | Germany | DEU.8_1 | Mecklenburg-Vorpommern | Mecklenburg-West Pomerania | None | Land | State | 13 | DE.MV | 42.634180 | 23376.544328 |
| 8 | MULTIPOLYGON (((4106781.29483516 3394915.76735... | DEU | Germany | DEU.9_1 | Niedersachsen | Lower Saxony | None | Land | State | 03 | DE.NI | 42.152628 | 47672.270673 |
| 9 | POLYGON ((4183914.44196412 3066634.81647491,41... | DEU | Germany | DEU.10_1 | Nordrhein-Westfalen | North Rhine-Westphalia | None | Land | State | 05 | DE.NW | 59.164844 | 34115.675424 |
| 10 | POLYGON ((4078665.76666303 2942934.90358417,40... | DEU | Germany | DEU.11_1 | Rheinland-Pfalz | Rhineland-Palatinate | None | Land | State | 07 | DE.RP | 31.529621 | 19856.379920 |
| 11 | POLYGON ((4090806.83436472 2898538.79416255,40... | DEU | Germany | DEU.12_1 | Saarland | None | None | Land | State | 10 | DE.SL | 34.020833 | 2571.007667 |
| 12 | POLYGON ((4658284.00379539 3090217.58617005,46... | DEU | Germany | DEU.14_1 | Sachsen | Saxony | None | Freistaat | State | 14 | DE.SN | 51.328672 | 18448.852315 |
| 13 | MULTIPOLYGON (((4421662.49154591 3121957.38272... | DEU | Germany | DEU.13_1 | Sachsen-Anhalt | Saxony-Anhalt | None | Land | State | 15 | DE.ST | 68.976334 | 20552.378984 |
| 14 | MULTIPOLYGON (((4289942.76376931 3396511.97997... | DEU | Germany | DEU.15_1 | Schleswig-Holstein | None | None | Land | State | 01 | DE.SH | 66.342641 | 15628.669632 |
| 15 | POLYGON ((4442016.26070027 3033676.79953545,44... | DEU | Germany | DEU.16_1 | Thüringen | Thuringia | None | Freistaat | State | 16 | DE.TH | 64.359434 | 16201.007412 |
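Assigning a `pd.Series` to a DataFrame column aligns on index labels, not on row position. This works in the cell above because the DataFrame shown has the default index 0 to 15. A minimal sketch of what could happen with a non-default index, using a hypothetical toy frame (the frame and its column names `aligned_by_label` and `assigned_by_position` are assumptions for illustration; the areas are rounded values from the table):

```python
import numpy as np
import pandas as pd

# Toy frame with a non-default index, e.g. left over after filtering rows.
frame = pd.DataFrame({"NAME_1": ["Berlin", "Bremen", "Hamburg"]}, index=[2, 4, 5])
values = np.array([891.9, 399.8, 736.4])  # rounded areas in km²

# pd.Series(values) carries the index 0, 1, 2. Assignment aligns on labels,
# so only label 2 matches and it receives the third value, not the first.
frame["aligned_by_label"] = pd.Series(values)

# Assigning the array directly fills the column by position instead.
frame["assigned_by_position"] = values
```

If the DataFrame has been filtered or re-indexed before new columns are added, assigning the NumPy array directly (or a Series built with `index=frame.index`) avoids silent misalignment.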
Inspect attributes
In [5]:
data_frame_germany[["avg_LCOE", "area_km2"]].describe()
Out[5]:
| | avg_LCOE | area_km2 |
|---|---|---|
| count | 16.000000 | 16.000000 |
| mean | 53.881389 | 22364.959365 |
| std | 12.546485 | 18717.109604 |
| min | 31.529621 | 399.805154 |
| 25% | 46.139635 | 12364.254140 |
| 50% | 53.904561 | 20204.379452 |
| 75% | 64.458668 | 30769.586282 |
| max | 76.230227 | 70544.892566 |
Filter attributes
In [6]:
# Select regions with low average LCOE
low_cost_regions = data_frame_germany[data_frame_germany["avg_LCOE"] < 50]
# Show selected regions
print("Regions with low average LCOE (< 50):")
print(low_cost_regions["NAME_1"])
# Select regions with an area larger than the average
average_area = data_frame_germany["area_km2"].mean()
large_regions = data_frame_germany[data_frame_germany["area_km2"] > average_area]
# Show selected regions
print("")
print("Regions with area larger than average:")
print(large_regions["NAME_1"])
Regions with low average LCOE (< 50):
2 Berlin
7 Mecklenburg-Vorpommern
8 Niedersachsen
10 Rheinland-Pfalz
11 Saarland
Name: NAME_1, dtype: object

Regions with area larger than average:
0 Baden-Württemberg
1 Bayern
3 Brandenburg
7 Mecklenburg-Vorpommern
8 Niedersachsen
9 Nordrhein-Westfalen
Name: NAME_1, dtype: object
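The two filters can also be combined into a single boolean mask. A minimal sketch, assuming a hypothetical toy frame built from five rows of the first dataset output; the fixed `average_area` is the mean reported by `describe()` in this example, and the names `toy`, `mask`, and `low_cost_large` are illustrative:

```python
import pandas as pd

# Toy frame with a few rows copied from the dataset output.
toy = pd.DataFrame(
    {
        "NAME_1": ["Bayern", "Berlin", "Mecklenburg-Vorpommern", "Niedersachsen", "Saarland"],
        "avg_LCOE": [53.105215, 47.308120, 42.634180, 42.152628, 34.020833],
        "area_km2": [70544.892566, 891.862007, 23376.544328, 47672.270673, 2571.007667],
    }
)

# Mean area of all 16 states, taken from the describe() output.
average_area = 22364.959365

# Combine both conditions element-wise; each comparison needs its own parentheses.
mask = (toy["avg_LCOE"] < 50) & (toy["area_km2"] > average_area)
low_cost_large = toy.loc[mask, "NAME_1"].tolist()
```

Here `low_cost_large` holds the states that appear in both printed lists. Because the LCOE values are random, the selection on the full dataset will vary from run to run.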
In [7]:
# Add average LCOE column with random values
# Generate random LCOE values for each row
n_rows = data_frame_germany.geom.shape[0]
lcoe_values = np.random.uniform(low=30.0, high=80.0, size=n_rows)
# Calculate the size of each geometry (area)
areas_m2 = data_frame_germany.geom.apply(lambda g: g.GetArea())
# Add the new columns (LCOE and area) to the vector dataset
data_frame_germany["avg_LCOE"] = pd.Series(lcoe_values)
data_frame_germany["area_km2"] = pd.Series(areas_m2 / 1e6)
# Show updated dataset
data_frame_germany
Out[7]:
| | geom | GID_0 | NAME_0 | GID_1 | NAME_1 | VARNAME_1 | NL_NAME_1 | TYPE_1 | ENGTYPE_1 | CC_1 | HASC_1 | avg_LCOE | area_km2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | MULTIPOLYGON (((4223953.58970637 2731509.51891... | DEU | Germany | DEU.1_1 | Baden-Württemberg | None | None | Land | State | 08 | DE.BW | 48.977513 | 36074.255068 |
| 1 | POLYGON ((4301467.52857983 2715580.87627153,43... | DEU | Germany | DEU.2_1 | Bayern | Bavaria | None | Freistaat | None | 09 | DE.BY | 64.640346 | 70544.892566 |
| 2 | POLYGON ((4536790.77005341 3258960.86848328,45... | DEU | Germany | DEU.3_1 | Berlin | None | None | Land | State | 11 | DE.BE | 38.102770 | 891.862007 |
| 3 | MULTIPOLYGON (((4475872.02032226 3238148.49116... | DEU | Germany | DEU.4_1 | Brandenburg | None | None | Land | State | 12 | DE.BR | 71.825262 | 29654.223234 |
| 4 | MULTIPOLYGON (((4234577.75904604 3327007.62237... | DEU | Germany | DEU.5_1 | Bremen | None | None | Freie Hansestadt | State | 04 | DE.HB | 72.808301 | 399.805154 |
| 5 | MULTIPOLYGON (((4334970.2892255 3379166.259019... | DEU | Germany | DEU.6_1 | Hamburg | None | None | Freie und Hansestadt | State | 02 | DE.HH | 66.691763 | 736.380193 |
| 6 | MULTIPOLYGON (((4225654.7277815 2946381.549486... | DEU | Germany | DEU.7_1 | Hessen | Hesse | None | Land | State | 06 | DE.HE | 56.282231 | 21115.145261 |
| 7 | MULTIPOLYGON (((4419026.73328481 3437109.38907... | DEU | Germany | DEU.8_1 | Mecklenburg-Vorpommern | Mecklenburg-West Pomerania | None | Land | State | 13 | DE.MV | 40.142658 | 23376.544328 |
| 8 | MULTIPOLYGON (((4106781.29483516 3394915.76735... | DEU | Germany | DEU.9_1 | Niedersachsen | Lower Saxony | None | Land | State | 03 | DE.NI | 79.209917 | 47672.270673 |
| 9 | POLYGON ((4183914.44196412 3066634.81647491,41... | DEU | Germany | DEU.10_1 | Nordrhein-Westfalen | North Rhine-Westphalia | None | Land | State | 05 | DE.NW | 59.416601 | 34115.675424 |
| 10 | POLYGON ((4078665.76666303 2942934.90358417,40... | DEU | Germany | DEU.11_1 | Rheinland-Pfalz | Rhineland-Palatinate | None | Land | State | 07 | DE.RP | 40.169873 | 19856.379920 |
| 11 | POLYGON ((4090806.83436472 2898538.79416255,40... | DEU | Germany | DEU.12_1 | Saarland | None | None | Land | State | 10 | DE.SL | 42.997241 | 2571.007667 |
| 12 | POLYGON ((4658284.00379539 3090217.58617005,46... | DEU | Germany | DEU.14_1 | Sachsen | Saxony | None | Freistaat | State | 14 | DE.SN | 33.709525 | 18448.852315 |
| 13 | MULTIPOLYGON (((4421662.49154591 3121957.38272... | DEU | Germany | DEU.13_1 | Sachsen-Anhalt | Saxony-Anhalt | None | Land | State | 15 | DE.ST | 39.990045 | 20552.378984 |
| 14 | MULTIPOLYGON (((4289942.76376931 3396511.97997... | DEU | Germany | DEU.15_1 | Schleswig-Holstein | None | None | Land | State | 01 | DE.SH | 44.795939 | 15628.669632 |
| 15 | POLYGON ((4442016.26070027 3033676.79953545,44... | DEU | Germany | DEU.16_1 | Thüringen | Thuringia | None | Freistaat | State | 16 | DE.TH | 49.970084 | 16201.007412 |
Visualizing vector attributes
In [8]:
import matplotlib.pyplot as plt
# Create two subplots for side-by-side comparison
fig, ax = plt.subplots(ncols=2, figsize=(10, 5))
# Left panel: colored by LCOE
ax_handle_lcoe = geokit.core.geom.drawGeoms(
data_frame_germany,
colorBy="avg_LCOE",
srs=3035,
figsize=(6, 4),
cbarTitle="Average LCOE [€/MWh]",
ax=ax[0],
)
# Right panel: colored by area
ax_handle_area = geokit.core.geom.drawGeoms(
data_frame_germany,
colorBy="area_km2",
srs=3035,
figsize=(6, 4),
cbarTitle="Area Size [km²]",
ax=ax[1],
)
Store the vector file
Finally, write the updated dataset, including the new attribute columns, to a shapefile.
In [9]:
# Write the DataFrame (geometries and attributes) to a new shapefile
geokit.core.vector.createVector(geoms=data_frame_germany, output="updated_vector_file.shp")
Out[9]:
'/home/docs/checkouts/readthedocs.org/user_builds/geokit/checkouts/latest/docs/Examples/_02_vector/updated_vector_file.shp'