How to install `ogr2ogr` on Google Colab (to write Parquet files)

ogr2ogr is a command-line tool for converting geospatial vector data between file formats, and is part of GDAL. For example, you can use it to read a directory of GML (Geography Markup Language, an XML-based format) files and write them out as GeoPackage (.gpkg), or even GeoParquet.

The short version (without Parquet support)

In Colab, the most straightforward way to install system packages is apt-get. So:

!apt-get install -y gdal-bin

This lets you run ogr2ogr:

!ogr2ogr --version
# Returns:
# GDAL 3.6.4, released 2023/04/17

This gives you most GDAL drivers, including GML and GPKG.

However, this is not sufficient if you want to write (Geo)Parquet, as the driver list shows:

!ogr2ogr --formats | grep -i parquet
# Returns nothing
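If you prefer to check for a driver from Python rather than with grep, the output of `ogr2ogr --formats` can be parsed. The following is a minimal sketch: the helper name `list_ogr_drivers` is hypothetical, and the regular expression assumes driver lines have the shape shown in this post (short name, then a `-vector-` or `-raster,vector-` tag, then capabilities in parentheses).

```python
import re
import shutil
import subprocess

# Assumed shape of a driver line, e.g. "  GPKG -raster,vector- (rw+vs): GeoPackage"
DRIVER_LINE = re.compile(r"^\s*(.+?)\s+-[a-z,]+-\s+\(")

def list_ogr_drivers(formats_output: str) -> set:
    """Return the lower-cased short names of drivers listed by `ogr2ogr --formats`."""
    drivers = set()
    for line in formats_output.splitlines():
        match = DRIVER_LINE.match(line)
        if match:
            drivers.add(match.group(1).lower())
    return drivers

# Only query the real binary if it is installed in this environment.
if shutil.which("ogr2ogr"):
    result = subprocess.run(["ogr2ogr", "--formats"], capture_output=True, text=True)
    print("Parquet driver available:", "parquet" in list_ogr_drivers(result.stdout))
```

Comparing lower-cased names avoids depending on how a particular GDAL version capitalises the driver name.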

With (Geo)Parquet support

The libgdal-arrow-parquet extension package that we need is not available via apt, but it can be installed from conda-forge. So let's first install Miniforge, the conda-forge project's minimal conda installer [1] (and update $PATH [2]):


  1. The curl download link comes from conda-forge's installation instructions on GitHub. If the link triggers a malicious-content warning, navigate to it from https://github.com/conda-forge/miniforge instead.

  2. We edit environment variables in a Python cell rather than a shell cell so that the change persists across cells in the notebook.

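import os

# Download the Miniforge installer that matches this machine's OS and architecture
!curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
# -b runs the installer non-interactively; -p sets the install prefix
!bash Miniforge3-$(uname)-$(uname -m).sh -b -p /usr/local/miniforge

# Put the conda-installed binaries ahead of the system ones
os.environ["PATH"] = "/usr/local/miniforge/bin:" + os.environ["PATH"]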
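Re-running the cell above prepends the same directory to PATH each time. If that bothers you, a small helper can make the update idempotent. This is only a sketch: `prepend_to_path` is a hypothetical name, and the directory is the install prefix used in this post.

```python
import os
import shutil

def prepend_to_path(directory: str, path: str) -> str:
    """Return `path` with `directory` as its first entry, without duplicates."""
    entries = [p for p in path.split(os.pathsep) if p and p != directory]
    return os.pathsep.join([directory] + entries)

# Safe to run more than once: the directory appears only once, at the front.
os.environ["PATH"] = prepend_to_path(
    "/usr/local/miniforge/bin", os.environ.get("PATH", "")
)

# Prints the path of the conda executable, or None if Miniforge is not installed.
print(shutil.which("conda"))
```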

Now we can add Arrow/Parquet support:

!conda install libgdal-arrow-parquet -y

# Tell PROJ where to find its data files (such as proj.db) in the conda install
os.environ["PROJ_LIB"] = "/usr/local/miniforge/share/proj"

And that's it:

!ogr2ogr --formats | grep -i parquet
# Returns:
#   Parquet -vector- (rw+v): (Geo)Parquet (*.parquet)
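With the driver in place, a conversion to (Geo)Parquet might look like the sketch below. The file names `input.gpkg` and `output.parquet` are placeholders, and `ogr2ogr_command` is a hypothetical helper. The only ogr2ogr features it relies on are the `-f` option for the output driver and the convention that the destination comes before the source.

```python
import os
import shutil
import subprocess

def ogr2ogr_command(src: str, dst: str, driver: str = "Parquet") -> list:
    """Build the argument list for converting `src` to `dst` with the given driver."""
    # ogr2ogr expects the destination dataset before the source dataset.
    return ["ogr2ogr", "-f", driver, dst, src]

cmd = ogr2ogr_command("input.gpkg", "output.parquet")
print(" ".join(cmd))

# Run the conversion only if ogr2ogr is installed and the placeholder input exists.
if shutil.which("ogr2ogr") and os.path.exists("input.gpkg"):
    subprocess.run(cmd, check=True)
```

In a notebook cell the same command can be run directly with the `!` prefix; building it as a list is convenient when looping over many input files.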