How to install `ogr2ogr` on Google Colab (to write Parquet files)
`ogr2ogr` is a geospatial file conversion tool, part of GDAL. For example, you can use it to read in a directory of GML (geo XML) files and write them out to GeoPackage (`.gpkg`), or even GeoParquet.
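As a rough sketch of that GML-to-GeoPackage case, the Python snippet below only builds the `ogr2ogr` command lines; it does not run them. The file names are hypothetical, and it assumes you want every GML file merged into a single GeoPackage, which is why `-append` is added from the second file onward.

```python
import shlex


def build_gml_to_gpkg_commands(gml_paths, gpkg_path):
    """Return one ogr2ogr argument list per GML file.

    The first command creates the GeoPackage; later ones append to it.
    """
    commands = []
    for index, gml_path in enumerate(sorted(str(p) for p in gml_paths)):
        command = ["ogr2ogr", "-f", "GPKG"]
        if index > 0:
            # -append adds features to the existing output instead of recreating it
            command.append("-append")
        # ogr2ogr expects the destination first, then the source
        command += [str(gpkg_path), gml_path]
        commands.append(command)
    return commands


# Hypothetical file names, for illustration only
example = build_gml_to_gpkg_commands(["data/b.gml", "data/a.gml"], "merged.gpkg")
for command in example:
    print(shlex.join(command))
```

In a Colab cell, each printed line could then be run with a leading `!`.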
The short version (without Parquet support)
In Colab, the most straightforward way to install system packages is `apt-get`. So:
!apt-get install -y gdal-bin
Which will let you run `ogr2ogr`:
!ogr2ogr --version
# Returns:
# GDAL 3.6.4, released 2023/04/17
This gives you most GDAL drivers, such as GML, GPKG, etc.
However, if you wanted to write out (Geo)Parquet, the above would not be sufficient, as you can see:
!ogr2ogr --formats | grep -i parquet
# Returns nothing
With (Geo)Parquet support
The `libgdal-arrow-parquet` extension package that we need is not available via `apt`, but it can be installed from conda-forge. So let's first install Miniforge, the conda-forge installer [1] (and update `$PATH` [2]):
[1] The `curl` download link comes from conda-forge's Miniforge installation instructions on GitHub. If the link triggers a malicious content warning, navigate to it from https://github.com/conda-forge/miniforge instead.
[2] We edit environment variables in a Python cell, not a shell cell, so that the change persists across the other cells in the notebook.
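import os
# Download the latest Miniforge installer for this OS and CPU architecture
!curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
# Install in batch mode (-b) under the prefix /usr/local/miniforge (-p)
!bash Miniforge3-$(uname)-$(uname -m).sh -b -p /usr/local/miniforge
# Put the conda binaries first on PATH for all later cells
os.environ["PATH"] = "/usr/local/miniforge/bin:" + os.environ["PATH"]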
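If you re-run that cell, the same directory is prepended to `PATH` again. A minimal sketch of a helper that avoids the duplicates is shown below; `prepend_to_path` is a hypothetical name, not part of Colab or conda, and the demo runs on a plain dict so your real environment is left untouched.

```python
import os


def prepend_to_path(directory, environ=os.environ):
    """Put `directory` at the front of PATH without listing it twice."""
    entries = [e for e in environ.get("PATH", "").split(os.pathsep) if e]
    # Drop any earlier occurrence, then put the directory first
    entries = [directory] + [e for e in entries if e != directory]
    environ["PATH"] = os.pathsep.join(entries)
    return environ["PATH"]


# Demonstrated on a stand-in dict rather than the real os.environ
demo_env = {"PATH": os.pathsep.join(["/usr/bin", "/bin"])}
prepend_to_path("/usr/local/miniforge/bin", demo_env)
prepend_to_path("/usr/local/miniforge/bin", demo_env)  # second call changes nothing
print(demo_env["PATH"])
```

Calling `prepend_to_path("/usr/local/miniforge/bin")` with no second argument would modify `os.environ` itself, which is what makes the change visible to later `!` cells.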
Now we can add arrow/parquet support:
!conda install libgdal-arrow-parquet -y
# Tell PROJ where the conda-installed projection data (proj.db) lives
os.environ["PROJ_LIB"] = "/usr/local/miniforge/share/proj"
And that's it:
!ogr2ogr --formats | grep -i parquet
# Returns:
# Parquet -vector- (rw+v): (Geo)Parquet (*.parquet)
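If you would rather do this check from Python than with `grep`, something along these lines might work. It is a sketch that assumes each driver line of `ogr2ogr --formats` has the shape shown above (short name, then capabilities between dashes); the function names are made up for this example, and the sample text is illustrative, not a full listing.

```python
import re

# Matches lines such as "  Parquet -vector- (rw+v): (Geo)Parquet"
DRIVER_LINE = re.compile(r"^\s*(.+?) -[a-z, ]+- \(")


def list_driver_names(formats_output):
    """Extract driver short names from `ogr2ogr --formats` output."""
    names = []
    for line in formats_output.splitlines():
        match = DRIVER_LINE.match(line)
        if match:
            names.append(match.group(1))
    return names


def has_driver(formats_output, short_name):
    """Case-insensitive check for a driver short name."""
    wanted = short_name.lower()
    return any(name.lower() == wanted for name in list_driver_names(formats_output))


# Illustrative sample only; real output lists many more drivers
sample = """Supported Formats:
  GML -vector- (rw+v): Geography Markup Language (GML)
  GPKG -raster,vector- (rw+vs): GeoPackage
  Parquet -vector- (rw+v): (Geo)Parquet (*.parquet)
"""
print(has_driver(sample, "parquet"))
```

In the notebook, the real text could come from `subprocess.run(["ogr2ogr", "--formats"], capture_output=True, text=True).stdout`. Once the driver is listed, a conversion would typically use the short name as the output format, for example `ogr2ogr -f Parquet output.parquet input.gpkg` (file names hypothetical).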