How to install `ogr2ogr` in a Databricks notebook (incl. Parquet support)

💡
This was tested on a Serverless notebook (Environment version 2), as well as on a classic notebook with DBR 15.4 LTS.

`ogr2ogr` is a geospatial file conversion tool that ships as part of GDAL. For example, you can use it to read a directory of GML (geo XML) files and write them out as GeoPackage (`.gpkg`), or even GeoParquet.
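As a taste of what that looks like, here is a minimal sketch that shells out to `ogr2ogr` from Python to convert a single GML file to GeoPackage. The file names are hypothetical placeholders, and the call is skipped when `ogr2ogr` is not on `PATH`:

```python
# Sketch: convert a GML file to GeoPackage with ogr2ogr.
# "input.gml" and "output.gpkg" are hypothetical placeholders.
import shutil
import subprocess

def gml_to_gpkg(src, dst):
    """Build the ogr2ogr command; run it only when ogr2ogr is on PATH."""
    # -f selects the output driver by its GDAL short name;
    # the destination comes before the source in ogr2ogr's argument order.
    cmd = ["ogr2ogr", "-f", "GPKG", dst, src]
    if shutil.which("ogr2ogr"):
        subprocess.run(cmd, check=True)
    return cmd

print(gml_to_gpkg("input.gml", "output.gpkg"))
```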

The `libgdal-arrow-parquet` extension package that we need is distributed via conda-forge. So let's first install Miniforge, a conda distribution preconfigured for conda-forge [1] (and update `$PATH` [2]):


  1. The curl download link comes from conda-forge's installation instructions on GitHub. If it triggers a malicious-content warning, navigate to it from https://github.com/conda-forge/miniforge instead.
  2. We edit the environment variable in a Python cell, not in a shell cell, so that the change persists across cells in the notebook.
import os
!curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
!bash Miniforge3-$(uname)-$(uname -m).sh -b -p ~/miniforge

# Note: a literal "~" in PATH is not tilde-expanded when the shell looks up
# executables, so expand it explicitly here.
os.environ["PATH"] = f"{os.path.expanduser('~')}/miniforge/bin:" + os.environ["PATH"]

Now we can add arrow/parquet support:

!conda install libgdal-arrow-parquet -y

os.environ["PROJ_LIB"] = f"{os.path.expanduser('~')}/miniforge/share/proj"
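The `PROJ_LIB` variable tells GDAL's PROJ dependency where to find its data files, notably `proj.db`, which coordinate transformations rely on; without it, reprojection tends to fail with "cannot find proj.db" style errors. A quick sanity check (the `~/miniforge` path matches the install location used above):

```python
# Sanity check (sketch): PROJ_LIB should point at a directory containing proj.db.
import os

proj_dir = os.path.join(os.path.expanduser("~"), "miniforge", "share", "proj")
print(os.path.isfile(os.path.join(proj_dir, "proj.db")))
```

This should print `True` once the conda install above has completed.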

And that's it:

!ogr2ogr --formats | grep parquet
# Returns:
#   Parquet -vector- (rw+v): (Geo)Parquet (*.parquet)
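With the Parquet driver in place, writing GeoParquet follows the same pattern as any other format. A hedged sketch (file names are hypothetical; the call is skipped when `ogr2ogr` is not on `PATH`):

```python
# Sketch: write GeoParquet now that the Parquet driver is available.
# "input.gml" and "output.parquet" are hypothetical placeholders.
import shutil
import subprocess

# "Parquet" is the GDAL driver short name reported by `ogr2ogr --formats`.
cmd = ["ogr2ogr", "-f", "Parquet", "output.parquet", "input.gml"]
if shutil.which("ogr2ogr"):
    subprocess.run(cmd, check=True)
print(" ".join(cmd))
```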