PostGIS Setup Guide

These steps will provide a PostgreSQL database with spatial capabilities through PostGIS, ready for developing location-aware scripts. PostGIS provides hundreds of spatial functions that can be used for complex geometry analysis directly within your database.

Prerequisites Installation

# macOS
brew install postgis
# Ubuntu/Debian
sudo apt install postgis

    

Database Setup

psql -d 
createdb census_spatial;

    
psql -d census_spatial
CREATE EXTENSION postgis;
CREATE EXTENSION postgis_topology;

    
psql -d census_spatial
CREATE TABLE county_boundaries (
    geoid VARCHAR(5) PRIMARY KEY,
    name VARCHAR(100),
    state_fips CHAR(2),
    county_fips CHAR(3),
    geom geometry(MultiPolygon, 4269)
);

    

Load Census Data

wget https://www2.census.gov/geo/tiger/TIGER2023/COUNTY/tl_2023_us_county.zip
unzip tl_2023_us_county.zip

    
shp2pgsql -s 4269 -I tl_2023_us_county.shp county_boundaries | psql -d census_spatial

    

Julia Setup

using Pkg
Pkg.add(["LibPQ", "DataFrames", "GeoInterface", "CairoMakie", "GeoMakie"])

    
using LibPQ, DataFrames
conn = LibPQ.Connection("dbname=census_spatial")
# Query example
counties = execute(conn, """
    SELECT geoid, name, ST_AsGeoJSON(geom) as geometry 
    FROM county_boundaries;
""") |> DataFrame

    

Creating Choropleth Maps

Here’s a basic example of creating a choropleth map using the data:

using Plots
using JSON
# Assuming you have a DataFrame 'data' with GEOID and values to plot
function create_choropleth(data, geom_df)
    # Merge geometry with data
    merged = leftjoin(geom_df, data, on=:geoid)
    # Parse GeoJSON and create plot
    plot()
    for row in eachrow(merged)
        geom = JSON.read(row.geometry)
        coords = geom["coordinates"][1][1]  # Assuming simple polygons
        x = [p[1] for p in coords]
        y = [p[2] for p in coords]
        plot!(x, y, fill_z = row.value, legend = false)
    end
    current()
end

    

Performance Optimization

Add spatial indices for better query performance:

psql -u **TK**
CREATE INDEX countyboundaries_geom_idx 
ON county_boundaries USING GIST (geom);

    

Consider adding additional indices based on your query patterns:

psql -u **TK**
CREATE INDEX countyboundaries_geoid_idx 
ON county_boundaries (geoid);

    

Maintenance

Regular maintenance tasks to keep the database performing well:

psql -d tiger
-- Analyze table statistics
ANALYZE county_boundaries;
-- Vacuum to reclaim space and update statistics
VACUUM ANALYZE county_boundaries;

    

Common Spatial Queries

Useful spatial queries for analysis:

-- Find adjacent counties
SELECT b.geoid, b.name 
FROM county_boundaries a 
JOIN county_boundaries b 
ON ST_Touches(a.geom, b.geom) 
WHERE a.geoid = '06037';  -- Los Angeles County

-- Calculate county areas
SELECT geoid, name, 
ST_Area(ST_Transform(geom, 3857))/1000000 as area_km2 
FROM county_boundaries;

-- Simplify geometries for faster rendering
SELECT geoid, name, 
ST_SimplifyPreserveTopology(geom, 0.01) as geom_simplified 
FROM county_boundaries;

    

Troubleshooting Tips

# On Ubuntu
sudo apt install proj-bin

    

Example of Usage and Comparative Speeds

Chapter 23 used the county-level shapefile and 2024 election data to find how small counties voted to produce a map showing the division among Democratic, Republican and third-party/blank votes/no shows. Not only is the PostGIS method simpler, it is almost 100 times faster than using pure Julia: 3 seconds compared to almost five minutes on a Mac Studio M3 Ultra with 96GB RAM. The difference will be much greater on less well-provisioned workstations.