JuliaMapping Documentation

Welcome to the JuliaMapping.jl documentation.

Overview

JuliaMapping is a Julia package designed for mapping and geospatial analysis tasks.

Getting Started

using JuliaMapping

API Reference

JuliaMapping.add_col_totals — Method

add_col_totals(df; total_row_name="Total", cols_to_sum=nothing)

Append a final row with per-column totals. Non-numeric columns get a type-compatible label.

JuliaMapping.add_row_totals — Method

add_row_totals(df; total_col_name="Total", cols_to_sum=nothing)

Add a column with per-row totals (skips missing). By default sums all numeric columns.

JuliaMapping.add_totals — Method

add_totals(df; total_row_name="Total", total_col_name="Total", cols_to_sum=nothing, format_commas=false)

Add both a row totals column and a bottom totals row (which also totals the new column). Optionally formats all numeric values with comma separators.

JuliaMapping.analyze_skewness — Method

analyze_skewness(df::DataFrame, column::Symbol)

Analyze the skewness of a dataset with multiple metrics.

Arguments

df: DataFrame containing the data
column: Column name to analyze

Returns

NamedTuple with skewness metrics

JuliaMapping.assess_data_spread — Function

assess_data_spread(df::DataFrame, col::Symbol, n_bins::Int=5)

Evaluate the distribution of data across equal-interval bins to determine if equal-interval binning is appropriate for the specified column.

Arguments

df::DataFrame: Input dataframe containing the data
col::Symbol: Column name to analyze
n_bins::Int=5: Number of bins to create for the analysis

Returns

Vector{Int}: Array containing the count of observations in each bin

Output

Prints a detailed bin distribution report and a recommended binning strategy. Warns if any bin contains fewer than 5% of the total observations and suggests alternatives such as Fisher-Jenks or quantile-based binning for better balance.

Example

df = DataFrame(value = rand(1000))
bin_counts = assess_data_spread(df, :value, 10)
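
The bin-count check can be illustrated with a minimal sketch on a plain vector. The helper `equal_interval_counts` and the sample values are hypothetical and are not part of JuliaMapping; the 5% threshold is the one described under Output.

```julia
# Hypothetical helper, not part of JuliaMapping: count values per equal-width bin.
function equal_interval_counts(values::AbstractVector{<:Real}, n_bins::Int=5)
    lo, hi = extrema(values)
    width = (hi - lo) / n_bins
    counts = zeros(Int, n_bins)
    for v in values
        # Map the value to a 1-based bin index; the maximum falls in the last bin.
        idx = width == 0 ? 1 : min(n_bins, floor(Int, (v - lo) / width) + 1)
        counts[idx] += 1
    end
    return counts
end

values = [1.0, 1.5, 2.0, 2.5, 3.0, 9.0, 10.0, 11.0]   # made-up sample
counts = equal_interval_counts(values, 5)

# Flag bins holding fewer than 5% of all observations.
sparse_bins = findall(c -> c / length(values) < 0.05, counts)
println(counts, " sparse bins: ", sparse_bins)
```

Empty or nearly empty bins, as in this sample, are the situation in which the report would point toward quantile or Fisher-Jenks breaks.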

JuliaMapping.assess_uniform_distribution — Method

assess_uniform_distribution(df::DataFrame, col::Symbol)

Analyze whether data follows a uniform distribution pattern, helping determine if equal-interval binning is suitable for the specified column.

Arguments

df::DataFrame: Input dataframe containing the data
col::Symbol: Column name to analyze for uniformity

Returns

NamedTuple{(:skewness, :interval_cv), Tuple{Float64, Float64}}: Tuple containing:
- skewness: Measure of distribution asymmetry (0 indicates symmetry)
- interval_cv: Coefficient of variation for quantile intervals

Output

Prints skewness, interval coefficient of variation, and a uniformity assessment. Also generates a histogram visualization via raw_hist().

Notes

Skewness near 0 and an interval CV below 0.3 suggest that equal intervals are suitable
Higher values suggest considering alternative binning methods

Example

df = DataFrame(value = rand(1000))
stats = assess_uniform_distribution(df, :value)

JuliaMapping.bullseye — Method

bullseye(capital::String, capital_coords::String)

Create an interactive HTML map with concentric circles (bullseye) centered on the specified capital city.

This function generates a Leaflet-based interactive map displaying concentric circles at fixed radii of 50, 100, 200, and 400 miles from the specified capital. The map includes a marker at the center point and a legend showing the distance bands with their corresponding colors.

Arguments

capital::String: The name of the capital city (used for the marker popup and output filename)
capital_coords::String: Coordinates in DMS (Degrees, Minutes, Seconds) format "DD° MM′ SS″ N/S, DD° MM′ SS″ E/W"

Details

Uses OpenStreetMap tiles for the base map
Concentric circles are drawn at 50, 100, 200, and 400 miles from center
Default color scheme: #D32F2F, #388E3C, #1976D2, #FBC02D, #7B1FA2
Requires the dms_to_decimal function to convert coordinates

Returns

Nothing. Creates an HTML file and opens it in the default web browser.

Output Files

Creates an HTML file named "{capital}.html" in the current working directory.

Example

bullseye("Nashville", "36° 09′ 44″ N, 86° 46′ 28″ W")
# Creates "Nashville.html" and opens it in the browser

Notes

The generated HTML file is self-contained and can be shared or hosted independently.

JuliaMapping.check_outlier_emphasis — Method

check_outlier_emphasis(df::DataFrame, col::Symbol)

Identify and quantify outliers in data to determine if equal-interval binning would appropriately highlight extreme values.

Arguments

df::DataFrame: Input dataframe containing the data
col::Symbol: Column name to analyze for outliers

Returns

Nothing (results are printed to console)

Output

Prints the percentage of outliers detected using the IQR method (1.5 × IQR rule). Provides recommendations based on outlier prevalence:

If >5% outliers: Confirms equal intervals will highlight these extremes
Suggests quantiles as alternative for balanced visualization

Method

Uses Tukey's fence method: outliers are values outside [Q1 - 1.5×IQR, Q3 + 1.5×IQR]
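
As a rough illustration of the fence rule, the following sketch computes the share of values outside [Q1 - 1.5×IQR, Q3 + 1.5×IQR] for a plain vector. The helper `outlier_share` and the data are hypothetical; the quantiles use the default definition in Julia's `Statistics` standard library, which may differ from what the package uses.

```julia
using Statistics

# Hypothetical helper, not part of JuliaMapping: share of values outside Tukey's fences.
function outlier_share(values::AbstractVector{<:Real})
    q1, q3 = quantile(values, [0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    n_out = count(v -> v < lower || v > upper, values)
    return n_out / length(values)
end

values = [collect(1.0:19.0); 100.0]   # 19 ordinary values plus one extreme value
share = outlier_share(values)
println("Outlier share: ", round(100 * share; digits=1), "%")
```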

Example

df = DataFrame(value = [rand(95); rand(5) .* 100])  # Data with outliers
check_outlier_emphasis(df, :value)

JuliaMapping.choose_binning_for_margins — Method

choose_binning_for_margins(df::DataFrame; k::Int=5) -> Nothing

Comprehensive analysis and recommendation for binning political margin data in choropleth maps.

This function integrates multiple diagnostic approaches specifically tailored for political margin/vote share data. It considers skewness, clustering patterns, bin width variation, and domain-specific characteristics (competitive vs. landslide districts) to provide an evidence-based binning recommendation.

Arguments

df::DataFrame: Input DataFrame containing margin data (must have :margin_pct column)
k::Int=5: Number of bins to create

Returns

Nothing: Function prints comprehensive analysis and recommendation

Details

The function performs the following analyses:

Skewness analysis: Evaluates distribution asymmetry
Clustering detection: Identifies natural breaks in the data
Quantile comparison: Assesses bin width variability
Domain analysis: Calculates percentages of competitive and landslide districts

Domain-Specific Thresholds

Competitive districts: |margin| < 10% (±0.1)
Landslide districts: |margin| > 30% (±0.3)
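
A minimal sketch of how these two shares could be computed from a vector of margins expressed as fractions. The sample values are made up, and the code only applies the thresholds listed above; it is not the package's implementation.

```julia
# Hypothetical margins expressed as fractions (0.05 means a 5-point margin).
margins = [-0.45, -0.08, -0.02, 0.03, 0.07, 0.12, 0.35, 0.09]

# Thresholds taken from the list above.
competitive_share = count(m -> abs(m) < 0.10, margins) / length(margins)
landslide_share   = count(m -> abs(m) > 0.30, margins) / length(margins)

println("Competitive: ", round(100 * competitive_share; digits=1), "%")
println("Landslide:   ", round(100 * landslide_share; digits=1), "%")
```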

Recommendation Logic

Recommends Fisher-Jenks if:

>30% of districts are competitive AND <10% are landslides (high clustering)
Quantile width CV > 2.0 (strong evidence of natural clusters)

Recommends Quantiles if:

Skewness > 1.5 (extreme skew requiring visual balance)
Data is more uniformly distributed

Example

choose_binning_for_margins(counties, k=5)
# === BINNING RECOMMENDATION FOR MARGIN DATA ===
# 
# [Skewness Analysis output]
# [Clustering Analysis output]
# [Quantile Comparison output]
# 
# === DOMAIN-SPECIFIC ANALYSIS ===
# Competitive districts (±10%): 1245 (39.8%)
# Landslide districts (>30%): 234 (7.5%)
# 
# === FINAL RECOMMENDATION ===
# ✓ Use FISHER-JENKS
#   Rationale: High concentration of competitive districts suggests
#             natural clustering that Jenks will reveal better than quantiles

Use Cases

Electoral data visualization (vote margins, partisan lean)
Policy analysis (approval ratings, opinion polls)
Any ratio/percentage data with potential clustering around central values

Prerequisites

Requires that the DataFrame has a :margin_pct column. For other margin columns, modify the function or create a standardized column first.

JuliaMapping.clip_rings_to_states — Method

clip_rings_to_states(rings, state_union)

Clip distance rings to state boundaries using spatial intersection.

Arguments

rings: Dictionary mapping distance values to ArchGDAL geometry objects representing distance rings
state_union: ArchGDAL geometry object representing the union of all state boundaries

Returns

Dictionary with the same keys as input rings, containing clipped geometry objects

Details

This function clips each distance ring to the state boundaries by computing the spatial intersection between each ring and the state union. If clipping fails for any ring, the original unclipped geometry is retained to ensure the workflow continues.

Example

using ArchGDAL

# Create distance rings and state union
rings = Dict(25 => ring_25_miles, 50 => ring_50_miles, 75 => ring_75_miles)
state_boundary = create_state_union(states)

# Clip rings to state boundaries
clipped_rings = clip_rings_to_states(rings, state_boundary)

Notes

Progress is reported for each ring being clipped
Errors during intersection operations are caught and reported as warnings
Original geometries are preserved if clipping fails
Returns geometries suitable for visualization and further spatial analysis

JuliaMapping.compare_quantile_vs_jenks — Method

compare_quantile_vs_jenks(df::DataFrame, col::Symbol; k::Int=5) -> NamedTuple

Compare quantile and Fisher-Jenks binning strategies by analyzing bin width variability.

This function computes quantile breaks and evaluates whether the resulting bins have consistent widths. High variability in bin widths suggests that data has natural clustering that Fisher-Jenks would capture more effectively.

Arguments

df::DataFrame: Input DataFrame containing the data
col::Symbol: Column name to analyze
k::Int=5: Number of bins to create

Returns

NamedTuple with fields:
- quantile_breaks: Vector of k+1 break points from quantile method
- width_cv: Coefficient of variation of quantile bin widths

Interpretation

width_cv < 1.0: Quantile widths are relatively uniform → Quantiles appropriate
1.0 ≤ width_cv ≤ 2.0: Moderate variation → Either method works, depends on goals
width_cv > 2.0: Highly variable widths → Data has clusters, Fisher-Jenks recommended

Details

The coefficient of variation (CV) is calculated as: CV = σ(bin_widths) / μ(bin_widths)

High CV indicates that some bins span large ranges while others span small ranges, which is a strong indicator of natural clustering in the data.
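
The width CV can be sketched on a plain vector as follows. The helper `quantile_width_cv` is hypothetical, and it assumes the default quantile definition and the sample standard deviation from Julia's `Statistics` standard library, which may not match the package exactly.

```julia
using Statistics

# Hypothetical helper, not part of JuliaMapping: CV of the widths of k quantile bins.
function quantile_width_cv(values::AbstractVector{<:Real}, k::Int=5)
    breaks = quantile(values, range(0, 1; length=k + 1))  # k + 1 break points
    bin_widths = diff(breaks)
    return std(bin_widths) / mean(bin_widths)
end

even_values = collect(1.0:100.0)                # evenly spread values
clustered_values = [fill(1.0, 50); 2.0:51.0]    # half the data piled at one value

println(quantile_width_cv(even_values))
println(quantile_width_cv(clustered_values))
```

Evenly spread values give nearly identical bin widths and a CV close to zero, while the clustered sample produces zero-width bins next to wide ones and a much larger CV.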

Example

result = compare_quantile_vs_jenks(counties, :margin_pct, k=5)
# === COMPARING QUANTILES VS FISHER-JENKS ===
# 
# QUANTILES (k=5):
#   Bin 1: [-0.547, -0.263] - width: 0.284 - count: ~20.0%
#   Bin 2: [-0.263, -0.109] - width: 0.154 - count: ~20.0%
#   Bin 3: [-0.109, 0.176] - width: 0.285 - count: ~20.0%
#   Bin 4: [0.176, 0.426] - width: 0.250 - count: ~20.0%
#   Bin 5: [0.426, 0.831] - width: 0.405 - count: ~20.0%
#   Width CV: 0.342
# 
# INTERPRETATION:
# → Moderate variation in quantile widths
#   Either method could work—depends on communication goals

Rationale

When quantile bins have very different widths, it means observations are unevenly distributed across the value range. This is exactly what Fisher-Jenks is designed to handle by finding optimal breakpoints between clusters.

Notes

This function analyzes quantiles only; for actual Fisher-Jenks breaks, use a dedicated package like NaturalBreaks.jl or implement the algorithm
By definition, quantile bins always have equal counts (~100/k percent each)
The diagnostic focuses on bin width variation as a proxy for clustering

JuliaMapping.compare_skewness — Method

compare_skewness(df::DataFrame, column::Symbol)

Compare skewness before and after log transformation.

Arguments

df: DataFrame containing the data
column: Column name to analyze

Returns

NamedTuple with original and log-transformed skewness
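
To illustrate the comparison, here is a minimal sketch using a moment-based skewness estimate. The helper `moment_skewness`, the sample data, and the choice of a natural log on strictly positive values are all assumptions; the package may use a different estimator or transform.

```julia
using Statistics

# Moment-based skewness (third standardized moment); a hypothetical
# stand-in for whatever estimator the package uses.
function moment_skewness(values::AbstractVector{<:Real})
    m = mean(values)
    s = sqrt(mean((values .- m) .^ 2))
    return mean(((values .- m) ./ s) .^ 3)
end

values = [1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 50.0, 400.0]   # right-skewed sample
original = moment_skewness(values)
logged = moment_skewness(log.(values))                  # assumes strictly positive data

println((original = original, log_transformed = logged))
```

For a right-skewed sample like this one, the log-transformed skewness is noticeably smaller than the original.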

JuliaMapping.compute_fixed_intervals — Function

compute_fixed_intervals(dfs::Vector{DataFrame}, col::Symbol, n_bins::Int=5) -> Vector{Float64}

Calculate fixed equal-interval breaks across multiple DataFrames for consistent map series comparisons.

When creating a series of choropleth maps (e.g., across time periods or regions), using consistent bin breaks enables meaningful visual comparison. This function computes global min/max across all datasets and creates equal-width intervals.

Arguments

dfs::Vector{DataFrame}: Vector of DataFrames to analyze
col::Symbol: Column name present in all DataFrames
n_bins::Int=5: Number of bins to create

Returns

Vector{Float64}: Bin break points (length = n_bins + 1)
- First element is global minimum
- Last element is global maximum
- Interior elements divide range into equal widths

Use Cases

Time series maps (comparing same region across years)
Atlas-style maps (comparing different regions)
Multi-panel comparisons where color scales must match

Example

# Compare election results across 2020 and 2024
breaks = compute_fixed_intervals([counties_2020, counties_2024], :margin_pct, 5)
# Use these breaks for both maps to enable direct comparison
# breaks will be something like: [-0.6, -0.36, -0.12, 0.12, 0.36, 0.6]
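
The underlying idea (global minimum and maximum, then equal-width breaks) can be sketched with plain vectors. The helper `shared_equal_breaks` and the sample values are hypothetical and only illustrate the calculation described above.

```julia
# Hypothetical helper, not part of JuliaMapping: shared equal-width breaks for several series.
function shared_equal_breaks(series::AbstractVector, n_bins::Int=5)
    lo = minimum(minimum, series)   # global minimum across every series
    hi = maximum(maximum, series)   # global maximum across every series
    return collect(range(lo, hi; length=n_bins + 1))
end

margins_a = [-0.6, -0.1, 0.2]   # made-up values standing in for one map's column
margins_b = [-0.3, 0.4, 0.6]    # made-up values standing in for another map's column
breaks = shared_equal_breaks([margins_a, margins_b], 5)
println(breaks)
```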

Notes

Ensures all maps use identical color-to-value mapping
May result in empty bins if data ranges differ substantially between DataFrames
Alternative to this approach: use quantiles computed on combined data

JuliaMapping.create_county_union — Function

create_county_union(counties::DataFrame, geometry_col=:geometry)

Create a union of all county geometries for use as a clipping boundary.

Arguments

counties: DataFrame containing county geometries, typically loaded from a shapefile
geometry_col: Symbol or string specifying the column name containing geometry data (default: :geometry)

Returns

An ArchGDAL geometry object representing the union of all county boundaries

Details

This function iteratively unions all county geometries to create a single boundary that can be used for clipping operations. The function includes progress reporting and error handling to manage large datasets with many counties.

Example

using GeoDataFrames, ArchGDAL

# Load county data
counties = GeoDataFrames.read("data/counties.shp")

# Create union for clipping
county_boundary = create_county_union(counties)

Notes

Progress is reported every 500 counties processed
Errors during union operations are caught and reported as warnings
The function assumes geometries are in the same coordinate reference system
Returns an ArchGDAL geometry suitable for spatial clipping operations

JuliaMapping.create_county_union — Method

create_county_union(df; geometry_col=:geometry)

Create a unified geometry representing the union of all county geometries in a DataFrame.

This function iteratively combines all county geometries using ArchGDAL's union operation, which is useful for creating a boundary for clipping or masking operations.

Arguments

df: DataFrame containing county geometry data
geometry_col: Symbol specifying the column name containing ArchGDAL geometry objects (default: :geometry)

Returns

An ArchGDAL geometry object representing the union of all input geometries

Details

Progress is printed every 500 counties for large datasets
Handles union errors gracefully with warning messages
The result can be used for clipping contours or other spatial operations

Example

county_union = create_county_union(county_df)
county_union = create_county_union(county_df, geometry_col=:geom)

JuliaMapping.create_filled_voting_contours! — Method

create_filled_voting_contours!(ga, df; geometry_col=:geometry, value_col=:republican_pct, resolution=150, colormap=:RdBu)

Create a filled contour plot (like a topographic map) showing spatial interpolation of a value across geographic regions.

Arguments

ga: An existing Makie plot axis/figure to add filled contours to
df: DataFrame containing geometry and value columns for interpolation
geometry_col::Symbol=:geometry: Column name containing ArchGDAL geometry objects (default: :geometry)
value_col::Symbol=:republican_pct: Column name containing numeric values to interpolate (default: :republican_pct). Can also be :democratic_pct, :third_party_pct, margin data, or any other numeric column
resolution::Int=150: Grid resolution for interpolation (higher = smoother but slower)
colormap::Symbol=:RdBu: Makie colormap for the filled regions. Use :RdBu for red-blue diverging data (good for margins), :viridis for sequential data, etc.

Returns

Tuple of (filled_contours, interpolated_grid_Z, x_grid, y_grid)

Details

Uses Gaussian kernel density weighting for smooth interpolation
Adaptive bandwidth based on data extent (slightly larger than create_voting_contours! for visibility)
Draws both filled regions and contour lines for dual representation
Filled regions have transparency (alpha=0.7) to show layer composition
Contour lines overlaid in black for clarity
Designed for political/electoral data visualization

Example

fig, ax, plt = scatter(counties_df.longitude, counties_df.latitude)
cf, Z, x_grid, y_grid = create_filled_voting_contours!(ax, counties_df, value_col=:margin_pct, colormap=:RdBu)

JuliaMapping.create_isopleth_rings — Function

create_isopleth_rings(centroids_geo, distances=[25, 50, 75, 100, 150])

Create nested isopleth rings around geographic centroids, similar to elevation contours.

Arguments

centroids_geo: Vector of Point2f objects representing geographic centroids (longitude, latitude)
distances: Vector of distances in miles for creating rings (default: [25, 50, 75, 100, 150])

Returns

Dictionary mapping distance values to ArchGDAL geometry objects representing nested rings

Details

This function creates concentric rings around each centroid at specified distances. The rings are created as nested zones where each ring represents the area between its distance and the previous distance (e.g., 25-50 miles, 50-75 miles, etc.). This creates an isopleth-like visualization similar to elevation contours on topographic maps.

The function:

Converts distances from miles to degrees (approximate conversion: 1° ≈ 69 miles)
Creates circles around each centroid at each distance
Unions circles at the same distance to create zones
Creates rings by taking the difference between consecutive zones

Example

using GeometryBasics, ArchGDAL

# Define centroids and distances
centroids = [Point2f(-74.0, 40.7), Point2f(-87.6, 41.9)]  # NYC, Chicago
distances = [50, 100, 150, 200]

# Create isopleth rings
rings = create_isopleth_rings(centroids, distances)

Notes

Distance conversion uses approximate factor of 69 miles per degree
First ring represents the area from 0 to the first distance
Subsequent rings represent areas between consecutive distances
Progress is reported for each distance being processed
Returns ArchGDAL geometries suitable for spatial operations and visualization

JuliaMapping.create_state_union — Function

create_state_union(states::DataFrame, geometry_col=:geometry)

Create a union of all state geometries for use as a clipping boundary.

Arguments

states: DataFrame containing state geometries, typically loaded from a shapefile
geometry_col: Symbol or string specifying the column name containing geometry data (default: :geometry)

Returns

An ArchGDAL geometry object representing the union of all state boundaries

Details

This function iteratively unions all state geometries to create a single boundary that can be used for clipping operations. The function includes progress reporting and error handling to manage large datasets with many states.

Example

using GeoDataFrames, ArchGDAL

# Load state data
states = GeoDataFrames.read("data/states.shp")

# Create union for clipping
state_boundary = create_state_union(states)

Notes

Progress is reported every 500 states processed
Errors during union operations are caught and reported as warnings
The function assumes geometries are in the same coordinate reference system
Returns an ArchGDAL geometry suitable for spatial clipping operations

JuliaMapping.create_voting_contours! — Method

create_voting_contours!(ga, df; geometry_col=:geometry, value_col=:republican_pct, resolution=200, levels=10)

Create smooth contour lines on an existing plot representing spatial interpolation of a value across geographic regions.

Arguments

ga: An existing Makie plot axis/figure to add contours to
df: DataFrame containing geometry and value columns for interpolation
geometry_col::Symbol=:geometry: Column name containing ArchGDAL geometry objects (default: :geometry)
value_col::Symbol=:republican_pct: Column name containing numeric values to interpolate (default: :republican_pct). Can also be :democratic_pct, :third_party_pct, or any other numeric column
resolution::Int=200: Grid resolution for interpolation (higher = smoother but slower)
levels::Int=10: Number of contour levels to draw, or a vector of specific levels

Returns

Tuple of (contour_lines, interpolated_grid_Z, x_grid, y_grid)

Details

Uses Gaussian kernel density weighting for smooth interpolation
Adaptive bandwidth based on data extent for automatic smoothing
Contour lines are labeled with values
Designed for political/electoral data (margins, vote shares, etc.)

Example

fig, ax, plt = scatter(counties_df.longitude, counties_df.latitude)
cs, Z, x_grid, y_grid = create_voting_contours!(ax, counties_df, value_col=:margin_pct)

JuliaMapping.detect_clustering — Method

detect_clustering(df::DataFrame, col::Symbol; n_bins::Int=5) -> Vector{Int}

Identify natural clusters and gaps in data to determine if Fisher-Jenks binning is appropriate.

This function analyzes the distribution of gaps between consecutive sorted values to detect whether the data contains natural clusters. Large gaps suggest natural breakpoints that Fisher-Jenks optimization would identify effectively.

Arguments

df::DataFrame: Input DataFrame containing the data
col::Symbol: Column name to analyze
n_bins::Int=5: Number of bins to create (used for recommendation threshold)

Returns

Vector{Int}: Indices of locations with large gaps (potential natural breaks)

Interpretation

If number of large gaps ≥ n_bins - 1: Strong clustering detected → Use Fisher-Jenks
If fewer large gaps: Weak clustering → Quantiles may be simpler
Large gaps are defined as those exceeding mean + 1.5 × standard deviation

Details

The function:

Sorts data and calculates gaps between consecutive values
Identifies "large" gaps using statistical threshold
Produces a histogram of gap sizes
Recommends binning strategy based on clustering strength
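
The gap test described above can be sketched on a plain vector. The helper `large_gap_indices` is hypothetical; it applies the mean + 1.5 × standard deviation threshold from the Interpretation section and uses the sample standard deviation from `Statistics`, which is an assumption.

```julia
using Statistics

# Hypothetical helper, not part of JuliaMapping: indices of unusually large gaps
# between consecutive sorted values.
function large_gap_indices(values::AbstractVector{<:Real})
    gaps = diff(sort(values))
    threshold = mean(gaps) + 1.5 * std(gaps)
    return findall(g -> g > threshold, gaps)
end

values = [1.0, 1.1, 1.2, 1.3, 5.0, 5.1, 5.2, 5.3, 5.4]   # two clusters with one wide gap
println(large_gap_indices(values))
```

Here the only large gap sits after the fourth sorted value, between the two clusters.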

Example

large_gaps = detect_clustering(counties, :margin_pct, n_bins=5)
# Prints analysis and shows gap distribution histogram
# Returns indices where natural breaks occur

JuliaMapping.dms_to_decimal — Method

dms_to_decimal(coords::AbstractString) -> String

Convert coordinates from degrees, minutes, seconds (DMS) format to decimal degrees (DD).

Arguments

coords::AbstractString: Coordinates in DMS format with flexible symbols and direction indicators

Returns

String: Coordinates in decimal degrees format "±DD.DDDD, ±DD.DDDD"

Format

Input format is flexible:

Degrees, minutes, and seconds can use °/′/″ symbols or be plain numbers
Direction can be N/North/n/north, S/South/s/south, E/East/e/east, W/West/w/west
Direction can appear before or after the coordinate
Latitude and longitude separated by comma
Spaces between components are optional

Example

# Various formats work
dms_to_decimal("42° 21′ 37″ N, 71° 03′ 28″ W")
dms_to_decimal("42 21 37 N, 71 03 28 W")
dms_to_decimal("North 42° 21′ 37″, West 71° 03′ 28″")
dms_to_decimal("42° 21′ 37″ north, 71° 03′ 28″ west")
dms_to_decimal("40° 26′ 46.302″ N, 79° 58′ 56.484″ W")
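
The arithmetic behind the conversion is degrees + minutes/60 + seconds/3600, negated for southern and western directions. A minimal sketch for a single, already parsed component follows; the helper `dms_component_to_decimal` is hypothetical and skips the string parsing that dms_to_decimal performs.

```julia
# Hypothetical helper, not part of JuliaMapping: convert one DMS component to decimal degrees.
function dms_component_to_decimal(deg::Real, minutes::Real, seconds::Real,
                                  direction::AbstractString)
    value = deg + minutes / 60 + seconds / 3600
    # Southern latitudes and western longitudes are negative by convention.
    return uppercase(first(direction)) in ('S', 'W') ? -value : value
end

lat = dms_component_to_decimal(42, 21, 37, "N")
lon = dms_component_to_decimal(71, 3, 28, "west")
println(round(lat; digits=4), ", ", round(lon; digits=4))
```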

Throws

ArgumentError: If input format is invalid or coordinates are out of range

JuliaMapping.dots — Method

dots(df::DataFrame, dots::Int)

Calculate dot density values for wheat production visualization.

Arguments

df::DataFrame: DataFrame containing wheat production data with wheat2017bu column
dots::Int: Number of bushels represented by each dot

Returns

Vector{Int}: Number of dots needed for each row based on production levels

Examples

wheat_df = DataFrame(wheat2017bu = [5000, 12000, 800])
dot_counts = dots(wheat_df, 1000)  # Each dot represents 1000 bushels
# Returns: [5, 12, 0] (dots per county)

Notes

Uses floor division to ensure whole dots only
Specifically designed for wheat production dot density maps
Part of the agricultural visualization workflow
Returns 0 for counties with production below the dot threshold
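
The floor-division step can be reproduced on a plain vector. This sketch uses Julia's built-in `fld` with the same numbers as the example above; it is not the package's code.

```julia
bushels = [5000, 12000, 800]      # same production values as the example above
bushels_per_dot = 1000

# Floored division keeps only whole dots; 800 bushels yields no dot.
dot_counts = fld.(bushels, bushels_per_dot)
println(dot_counts)
```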

JuliaMapping.extract_centroid — Method

extract_centroid(geometry)

Extract the centroid coordinates from a geometry object.

Arguments

geometry: An ArchGDAL geometry object

Returns

A tuple (y, x) containing the latitude and longitude coordinates of the centroid

Example

centroid_y, centroid_x = extract_centroid(geom)  # (latitude, longitude), matching the documented (y, x) order

JuliaMapping.format_breaks — Method

format_breaks(breaks::Vector{String}) -> Vector{String}

Formats a vector of string representations of ranges into a human-readable format.

Arguments

breaks::Vector{String}: A vector of strings, each representing a numerical range. Each string is expected to be in the format "start - end", where start and end are numerical values.

Returns

A Vector{String} where each element is a formatted range string in the form "start to end".
- The numerical values in the range are rounded to the nearest integer and formatted with commas for better readability.

Example

julia> breaks = ["1000 - 2000", "3000.5 - 4000.2", "500000 - 1000000"];

julia> format_breaks(breaks)
["1,000 to 2,000", "3,001 to 4,000", "500,000 to 1,000,000"]
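
A minimal sketch of the same idea, assuming labels of the form "start - end" with non-negative values. The helpers `add_commas` and `format_range` are hypothetical. Julia's default `round` sends exact halves to the nearest even integer, so tie cases may come out differently here than in the package.

```julia
# Hypothetical helpers, not part of JuliaMapping.
# Insert thousands separators into an integer.
add_commas(n::Integer) = replace(string(n), r"(?<=\d)(?=(\d{3})+$)" => ",")

# Turn "start - end" into "start to end" with rounded, comma-separated values.
function format_range(label::AbstractString)
    lo, hi = split(label, " - ")
    to_int(s) = round(Int, parse(Float64, s))
    return string(add_commas(to_int(lo)), " to ", add_commas(to_int(hi)))
end

println(format_range("1000 - 2000"))
println(format_range("500000 - 1000000"))
```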

JuliaMapping.format_table_as_text — Function

format_table_as_text(headers::Vector{String}, rows::Vector{Vector{String}}, padding::Int=2)

Format data as an ASCII table with borders and proper column alignment.

Arguments

headers::Vector{String}: Column header names
rows::Vector{Vector{String}}: Data rows, where each row is a vector of strings
padding::Int=2: Additional padding space around each cell (default: 2)

Returns

String: Formatted ASCII table with Unicode box-drawing characters

Examples

headers = ["Name", "Age", "City"]
rows = [["Alice", "25", "New York"], 
        ["Bob", "30", "Chicago"],
        ["Carol", "22", "Boston"]]
table = format_table_as_text(headers, rows)
# Returns a formatted table with borders and proper alignment

Notes

Uses Unicode box-drawing characters (┌─┬─┐│├─┼─┤└─┴─┘)
Automatically calculates column widths based on content
Each cell is padded for consistent spacing
Useful for creating publication-ready ASCII tables

JuliaMapping.get_gdp — Method

get_gdp(df::DataFrame, quarter::Int) -> DataFrame

Process a GDP dataset to extract state-level GDP data for a specified quarter.

This function cleans and processes a raw GDP DataFrame by:

Selecting the state names column and the specified quarter column
Removing header and footer rows that contain explanatory text
Converting GDP values from strings to integers (removing commas)
Filtering to include only valid US states
Sorting the results by state name

Arguments

df::DataFrame: Raw GDP data with state names in column 1 and quarterly data in subsequent columns
quarter::Int: Column index representing the desired quarter (e.g., 2 for Q1, 3 for Q2, etc.)

Returns

DataFrame: Cleaned dataset with columns :state and :gdp, sorted by state name

Notes

Assumes data starts at row 6 and state data ends 5 rows before the DataFrame end
Requires a global variable states containing valid state names for filtering
GDP values are converted to integers after removing comma separators
The first set of quarters is in current dollars and the second set is in constant dollars; which to use depends on the context of the analysis.
This approach is suitable for official data and other series whose formatting is very consistent from period to period; otherwise, the rows to trim could not be hard-coded.

Example

cleaned_data = get_gdp(raw_gdp_df, 14)  # Extract Q4 data (assuming column 14 contains Q4)

JuliaMapping.get_nth_table — Function

get_nth_table(url::String, n::Int=1) -> DataFrame

Get the nth HTML table from a webpage and return as a DataFrame.

Arguments

url::String: The URL of the webpage to scrape
n::Int=1: The table index to get (1-based indexing). Defaults to 1 for the first table.

Returns

DataFrame: The fetched table with cleaned text content

Throws

HTTP.ExceptionRequest.StatusError: If the webpage cannot be accessed
ArgumentError: If no tables are found on the page
BoundsError: If the requested table index n exceeds the number of tables found

Examples

# Get first table from Wikipedia or other web page
url = "https://en.wikipedia.org/wiki/List_of_European_countries_by_area"
df = get_nth_table(url)

# Get second table
df2 = get_nth_table(url, 2)

# Save to CSV
using CSV
CSV.write("table_data.csv", df)

Notes

Headers are automatically detected from <th> elements
Cell text is cleaned by removing extra whitespace and newlines
If rows have different numbers of columns, they are padded or truncated to match headers
Generic column names are created if no headers are found

JuliaMapping.get_sheet — Method

get_sheet(file_name::String, sheet::Int) -> DataFrame

Read a specific sheet from an Excel file and return it as a DataFrame.

Arguments

file_name::String: Path to the Excel file (.xlsx format)
sheet::Int: Sheet number to read (1-indexed)

Returns

DataFrame: The Excel sheet data converted to a DataFrame

Examples

# Read the first sheet from an Excel file
df = get_sheet("data.xlsx", 1)

# Read the third sheet
df = get_sheet("sales_report.xlsx", 3)

JuliaMapping.hard_wrap — Method

hard_wrap(text::String, width::Int)

Hard-wrap text at the specified width, breaking at word boundaries when possible. Each line is right-padded to the specified width.

Arguments

text::String: The text to wrap
width::Int: Maximum line width in characters

Returns

String: Text with line breaks inserted and each line padded to width

Examples

text = "This is a long sentence that will be wrapped at word boundaries."
wrapped = hard_wrap(text, 20)
# Returns text wrapped to 20 characters per line with padding

Notes

Right-pads each line to exactly the specified width
Attempts to break at word boundaries when possible
If a word is longer than the width, it will be broken mid-word
Useful for creating fixed-width text layouts

JuliaMapping.haversine_distance_km — Method

haversine_distance_km(lon1, lat1, lon2, lat2)

Calculate the great-circle distance in kilometers between two points, each given as longitude and latitude, using the Haversine formula.
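
For reference, a minimal sketch of the Haversine formula. It is not the package's implementation: the name `haversine_km_sketch`, the spherical Earth radius of 6371 km, and inputs in decimal degrees are all assumptions.

```julia
# Hypothetical sketch, not the package's implementation.
# Assumes a spherical Earth with mean radius 6371 km and inputs in decimal degrees.
function haversine_km_sketch(lon1, lat1, lon2, lat2; radius_km=6371.0)
    phi1, phi2 = deg2rad(lat1), deg2rad(lat2)
    dphi = phi2 - phi1
    dlambda = deg2rad(lon2 - lon1)
    a = sin(dphi / 2)^2 + cos(phi1) * cos(phi2) * sin(dlambda / 2)^2
    return 2 * radius_km * asin(min(1.0, sqrt(a)))
end

# One degree of longitude along the equator.
println(haversine_km_sketch(0.0, 0.0, 1.0, 0.0))
```

The `min(1.0, ...)` guard protects `asin` against values slightly above 1 caused by floating-point rounding for nearly antipodal points.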

JuliaMapping.inspect_shp — Method

inspect_shp(path::String)

Prints the structure and field names of a shapefile for inspection.

Arguments

path::String: Path to the shapefile (.shp file)

Details

This function reads a shapefile and prints:

Layer name
Feature count (number of records)
All field/column names available in the shapefile

Example

inspect_shp("/path/to/data.shp")
# Output:
# Layer name: data
# Feature count: 1234
# Fields:
#  - ID
#  - NAME
#  - geometry

JuliaMapping.is_font_available — Method

is_font_available(font_name::AbstractString) -> Bool

Return true if a font with the given name can be located by FreeTypeAbstraction.findfont, and false otherwise.

This function attempts to resolve font_name to an installed font using FreeTypeAbstraction’s font-discovery mechanism. If the font is found, true is returned. If findfont throws an exception—such as when the font is missing or not registered on the system—the function catches the error and returns false.

Examples

julia> is_font_available("Helvetica")
true

julia> is_font_available("NonexistentFontXYZ")
false

JuliaMapping.log_dist — Method

log_dist(df::DataFrame, col::Symbol)

Plot the log₁₀-transformed distribution of a numeric column with both histogram and kernel density overlay.

Arguments

df::DataFrame: Input DataFrame containing the numeric column.
col::Symbol: Column name to visualize.

Details

The function computes log10(x + 1) for all nonmissing values in df[!, col], then draws:

A normalized histogram (PDF scaling).
A smooth kernel density estimate overlay.

Returns

Displays an AlgebraOfGraphics plot showing the log-transformed distribution.

Example

log_dist(df, :population)

JuliaMapping.make_combined_table — Function

make_combined_table(data::DataFrame, half::String, formatted_breaks::Vector{String} = formatted_breaks)

Create a formatted summary table of population statistics by bins.

Generates a text-based table showing population totals, percentages, and cumulative statistics for either the first or second half of population bins. Useful for creating small multiple displays with separate tables for each half.

Arguments

data::DataFrame: DataFrame containing :bin and :population columns
half::String: Either "first" (bins 1-4) or "second" (bins 5-8)
formatted_breaks::Vector{String}: Vector of formatted range labels (default: formatted_breaks)

Returns

String containing a formatted text table with columns:
- Interval: The population range
- Population: Total population in the bin
- Percent: Percentage of national total
- Cumulative: Cumulative population
- Cumulative: Cumulative percentage

Details

Groups data by bin and sums population
Calculates percentages relative to national total
Computes cumulative population and percentages
Formats numbers with commas for readability
Uses PrettyTables with right-aligned columns

Example

table_text = make_combined_table(county_data, "first", formatted_breaks)
println(table_text)

Notes

The table is formatted as a string suitable for display or saving to file.

source

JuliaMapping.make_geographic_circle — Function

make_geographic_circle(center_geo, radius_deg, n_pts=64)

Create a circular polygon in geographic coordinates (longitude/latitude in degrees).

Arguments

center_geo: Tuple or vector containing (longitude, latitude) of the circle center in degrees
radius_deg: Radius of the circle in degrees
n_pts: Number of points to approximate the circle (default: 64)

Returns

Vector of Point2f objects representing the circle vertices

Details

This function creates a circular polygon by generating points around the center at regular angular intervals. The function accounts for latitude distortion by dividing the longitude offset by the cosine of the latitude, ensuring the circle appears approximately circular on a map projection.
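The latitude correction can be sketched as follows. This is an illustration only: geographic_circle_sketch is a hypothetical name, it returns plain (longitude, latitude) tuples rather than Point2f objects, and it assumes the ring is left unclosed, which may not match the package.

```julia
# Hypothetical sketch of the latitude correction, not the package's source.
function geographic_circle_sketch(center_geo, radius_deg, n_pts=64)
    lon0, lat0 = center_geo
    # A degree of longitude shrinks by cos(latitude), so widen the longitude offset
    lon_scale = cosd(lat0)
    # n_pts evenly spaced angles in [0, 2π), omitting the duplicate end point
    angles = range(0, 2π; length=n_pts + 1)[1:end-1]
    return [(lon0 + radius_deg * cos(θ) / lon_scale,
             lat0 + radius_deg * sin(θ)) for θ in angles]
end

# At 60° latitude, a 1-degree radius spans about 2 degrees of longitude
pts60 = geographic_circle_sketch((10.0, 60.0), 1.0, 8)
```

The correction grows without bound as the center approaches the poles, where cos(latitude) approaches zero, so a sketch like this is only sensible away from them.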

Example

# Create a circle centered at (0°N, 0°E) with 1-degree radius
circle_points = make_geographic_circle((0.0, 0.0), 1.0, 32)

Notes

The circle is created in geographic coordinates (WGS84, EPSG:4326)
Latitude distortion correction is applied to maintain circular appearance
Enter the π and θ symbols in the Julia REPL by typing \pi or \theta and then pressing Tab

source

JuliaMapping.make_marker — Method

make_marker(n::Int, size::Real, shape::String) -> BezierPath

Create a custom marker composed of multiple geometric shapes arranged in a grid pattern for Makie plots.

Arguments

n::Int: Number of shapes to include in the marker (arranged in a grid)
size::Real: Size of each individual shape in the marker
shape::String: Type of shape to draw. Supported values:
- "+": Plus sign (cross with horizontal and vertical bars)
- "_": Underscore (horizontal bar)
- "±": Plus-minus sign (horizontal bars with vertical bar on top)
- "#": Hash sign (two horizontal and two vertical bars forming a grid)
- "*": Asterisk (plus with two diagonal bars)
- "=": Equal sign (two horizontal bars)
- "|": Vertical bar
- ":": Colon (two dots vertically aligned)

Returns

BezierPath: A Makie BezierPath object containing all shapes arranged in a grid

Details

The shapes are arranged in a square grid pattern with automatic spacing. The grid size is calculated as ceil(sqrt(n)) to accommodate all n shapes. Line thickness is set to 20% of the specified size, and spacing between shapes is 3× the size parameter.
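The layout arithmetic can be sketched without Makie. The helper marker_grid_sketch is hypothetical, and the row-major ordering of offsets starting at the origin is an assumption; the package may order or center the shapes differently.

```julia
# Hypothetical sketch of the layout arithmetic only; it draws no shapes.
function marker_grid_sketch(n::Int, size::Real)
    grid = ceil(Int, sqrt(n))   # side length of the square grid
    spacing = 3 * size          # distance between neighboring shapes
    thickness = 0.2 * size      # line thickness, 20% of size
    # Assumed row-major placement, starting at the origin
    offsets = [(((k - 1) % grid) * spacing, ((k - 1) ÷ grid) * spacing) for k in 1:n]
    return (; grid, spacing, thickness, offsets)
end

layout = marker_grid_sketch(4, 10.0)
```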

Examples

using Makie, CairoMakie

# Create a marker with 4 plus signs
marker = make_marker(4, 10.0, "+")

# Use in a scatter plot
fig = Figure()
ax = Axis(fig[1, 1])
scatter!(ax, [1, 2, 3], [1, 2, 3], marker=marker, markersize=30)
fig

# Create markers with different shapes
plus_marker = make_marker(9, 5.0, "+")
underscore_marker = make_marker(6, 5.0, "_")
plusminus_marker = make_marker(4, 5.0, "±")
hash_marker = make_marker(4, 5.0, "#")
asterisk_marker = make_marker(9, 5.0, "*")
equal_marker = make_marker(6, 5.0, "=")
pipe_marker = make_marker(4, 5.0, "|")
colon_marker = make_marker(6, 5.0, ":")

Throws

error: If shape is not one of the supported values ("+", "_", "±", "#", "*", "=", "|", ":")

source

JuliaMapping.percent — Method

percent(x::Float64)

Convert a decimal value to a percentage string rounded to two decimal places.

Takes a decimal value (e.g., 0.5) and converts it to a formatted percentage string (e.g., "50.0%").

Arguments

x::Float64: A decimal value (typically between 0.0 and 1.0)

Returns

A string representation of the percentage, rounded to at most two decimal places (trailing zeros are not padded), with a "%" suffix

Example

percent(0.5)        # Returns "50.0%"
percent(0.7532)     # Returns "75.32%"
percent(0.123456)   # Returns "12.35%"
percent(1.0)        # Returns "100.0%"

Notes

Values are multiplied by 100 and rounded to 2 decimal places
Works with any Float64 value, not just those between 0 and 1
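One way to reproduce the documented behavior is sketched here. The name percent_sketch is hypothetical, and the package's implementation may differ.

```julia
# Hypothetical sketch: multiply by 100, round to two decimals, append "%".
# Julia prints 50.0 as "50.0", which is why trailing zeros are not padded.
percent_sketch(x::Float64) = string(round(x * 100; digits=2), "%")

percent_sketch(0.5)       # "50.0%"
percent_sketch(0.123456)  # "12.35%"
```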

source

JuliaMapping.pick_random_subset — Method

pick_random_subset(data, target_sum; rng=Random.default_rng())

Select a random subset of rows from a DataFrame where a count column sums to a target value.

Convenience wrapper around uniform_subset_sum_indices that works directly with DataFrames containing a :Count column.

Arguments

data: DataFrame with a :Count column containing integer values
target_sum: The desired sum of the :Count column
rng: Random number generator (default: Random.default_rng())

Returns

A subset of the input DataFrame (rows whose :Count values sum to target_sum)
Returns empty DataFrame if no valid subset exists

Example

using DataFrames
deaths = DataFrame(
    location = ["A", "B", "C", "D", "E"],
    Count = [5, 10, 15, 20, 25]
)
subset = pick_random_subset(deaths, 40)
# Returns rows that sum to exactly 40 deaths, e.g., rows with counts [5, 15, 20]

Notes

Assumes the DataFrame has a :Count column
Each valid subset has equal probability of selection
Useful for random sampling where the selected counts must add up to an exact total

source

JuliaMapping.plot_colorscheme_grid — Method

plot_colorscheme_grid(schemes; ncol=2)

Create a grid display of ColorScheme colorbars for visual comparison.

Arguments

schemes::Vector: Vector of ColorScheme names (as symbols) or ColorScheme objects to display

Keywords

ncol::Int=2: Number of columns in the grid layout

Returns

Figure: A CairoMakie Figure object containing the colorbar grid

Description

Displays multiple ColorSchemes as horizontal colorbars arranged in a grid layout. Each colorbar is labeled with its scheme name. Useful for comparing color palettes or selecting appropriate schemes for data visualization.

The figure height automatically adjusts based on the number of rows needed. Each colorbar is displayed horizontally with a fixed height of 30 pixels.

Example

# Display common sequential schemes
schemes = [:viridis, :plasma, :inferno, :magma]
f = plot_colorscheme_grid(schemes; ncol=2)
display(f)

# Compare diverging schemes
diverging = [:RdBu, :RdYlBu, :PiYG, :BrBG]
f = plot_colorscheme_grid(diverging; ncol=2)

# Display a larger set of schemes in three columns
sequential = [:viridis, :plasma, :inferno, :magma, :cividis, :twilight]
f = plot_colorscheme_grid(sequential; ncol=3)

See Also

ColorSchemes.colorschemes - Dictionary of all available ColorSchemes
Browse schemes at: https://juliagraphics.github.io/ColorSchemes.jl/stable/catalogue/

source

JuliaMapping.plot_county_interval — Method

plot_county_interval(data::DataFrame, f::Figure, brk::Int64, formatted_breaks::Vector{String})

Plot counties within a specific population bin on a small multiples figure.

Creates a single panel in a 2×2 grid showing counties that fall within a particular population bin, highlighted against a background of all counties. Part of a small multiples visualization.

Arguments

data::DataFrame: DataFrame containing county geometries and :bin classification
f::Figure: Makie Figure object to add the plot to
brk::Int64: Bin number to plot (1-8)
formatted_breaks::Vector{String}: Vector of formatted range labels for titles

Returns

Nothing. Modifies the figure in place by adding a GeoAxis panel.

Details

Creates a GeoAxis in EPSG:5070 projection (US Albers Equal Area)
Positions panels in a 2×2 grid (bins 5-8 map to positions 1-4 for second figure)
Row calculation: ceil(display_brk / 2)
Column calculation: odd bins → column 1, even bins → column 2
Background: All counties in white with thin gray borders
Foreground: Counties in current bin colored using YlGn (Yellow-Green) colorscheme
Title shows the formatted population range
Decorations (axes, ticks) are hidden
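The row and column rules above can be sketched as a small helper. The name panel_position_sketch is hypothetical, and the mapping of bins 5-8 onto positions 1-4 by subtracting 4 is an assumption based on the description.

```julia
# Hypothetical sketch of the 2×2 panel placement described above.
function panel_position_sketch(brk::Int)
    # Assumed: bins 5-8 reuse positions 1-4 on a second figure
    display_brk = brk > 4 ? brk - 4 : brk
    row = ceil(Int, display_brk / 2)
    col = isodd(display_brk) ? 1 : 2
    return row, col
end

panel_position_sketch(3)  # (2, 1)
panel_position_sketch(6)  # (1, 2)
```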

Example

f = Figure(size=(1200, 1000))  # on Makie versions before 0.20, use resolution=(1200, 1000)
for brk in 1:4
    plot_county_interval(county_data, f, brk, formatted_breaks)
end

Notes

Designed for creating small multiple displays showing population distribution patterns.

source

JuliaMapping.plot_named_color_groups — Method

plot_named_color_groups(title, names; ncol=6, cell=(140,80), gap=(8,8), figsize=(1600,900))

Create a visual grid display of named colors with their RGB values.

Arguments

title::AbstractString: The title to display at the top of the plot
names::Vector{<:AbstractString}: Vector of color names to display

Keywords

ncol::Int=6: Number of columns in the grid layout
cell::Tuple{Int,Int}=(140,80): Width and height of each color cell in pixels
gap::Tuple{Int,Int}=(8,8): Horizontal and vertical gap between cells in pixels
figsize::Tuple{Int,Int}=(1600,900): Overall figure size in pixels

Returns

Figure: A CairoMakie Figure object containing the color grid

Description

Each color swatch displays:

The color name (centered at top)
RGB values as decimal numbers (bottom left)
Text color automatically chosen (black/white) based on background brightness

Example

# Display a selection of red colors
reds = ["red", "crimson", "darkred", "firebrick"]
f = plot_named_color_groups("Red Colors", reds; ncol=4)
display(f)
# All reds
f = plot_named_color_groups("All Red Colors", :reds)

Throws

Error if any color name in names is not recognized by Colors.jl

source

JuliaMapping.polygon_to_archgdal — Method

polygon_to_archgdal(poly::Polygon)

Convert a GeometryBasics.Polygon to an ArchGDAL geometry object.

Arguments

poly: A Polygon object from GeometryBasics containing the polygon vertices

Returns

An ArchGDAL polygon geometry object that can be used for spatial operations

Details

This function extracts the exterior ring coordinates from a GeometryBasics.Polygon and creates a corresponding ArchGDAL polygon. The function automatically closes the ring if it's not already closed by duplicating the first point at the end.

Example

using GeometryBasics, ArchGDAL

# Create a simple polygon
points = [Point2f(0,0), Point2f(1,0), Point2f(1,1), Point2f(0,1)]
poly = Polygon(points)

# Convert to ArchGDAL geometry
gdal_poly = polygon_to_archgdal(poly)

Notes

The function handles both closed and unclosed polygon rings
Returns an ArchGDAL geometry suitable for spatial analysis operations
Coordinate order is preserved (longitude, latitude for geographic data)

source

JuliaMapping.pump_comparison_test — Function

pump_comparison_test(pump_1_mean, other_means, n_permutations=10000)

Perform a permutation test to compare one pump against the mean of other pumps.

Tests whether Pump 1 has a significantly lower mean than other pumps using a one-tailed permutation test. Inspired by John Snow's cholera analysis comparing the Broad Street pump to other water pumps in London.

Arguments

pump_1_mean: Mean value for the pump of interest (e.g., mean deaths)
other_means: Vector of mean values for comparison pumps
n_permutations: Number of permutations for the test (default: 10000)

Returns

A tuple (p_value, observed_diff) where:

p_value: Proportion of permutations with difference ≤ observed difference
observed_diff: Observed difference (Pump 1 mean - mean of others)

Details

Null hypothesis: All pumps come from the same distribution
Tests if Pump 1 is unusually low (one-tailed test)
Prints the observed difference to console
Randomly permutes pump labels and recalculates the test statistic
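The procedure can be sketched as a lower-tailed permutation test. The function permutation_test_sketch and its rng keyword are hypothetical; the sketch does not print anything and may differ from the package in how labels are permuted.

```julia
using Random, Statistics

# Hypothetical sketch of a lower-tailed permutation test, not the package's source.
function permutation_test_sketch(first_mean, other_means;
                                 n_permutations=10_000, rng=Random.default_rng())
    pooled = vcat(first_mean, other_means)
    observed = first_mean - mean(other_means)
    hits = 0
    for _ in 1:n_permutations
        # Relabel: the first shuffled value plays the role of the pump of interest
        shuffled = shuffle(rng, pooled)
        diff = shuffled[1] - mean(@view shuffled[2:end])
        hits += diff <= observed
    end
    return hits / n_permutations, observed
end

p_value, observed_diff = permutation_test_sketch(8.5, [3.2, 2.8, 3.5, 2.9, 3.1])
```

With only six pooled values, the pump of interest is the smallest in roughly one relabeling out of six, so a test of this form cannot produce p-values much below about 0.17 on such data.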

Example

pump_1 = 8.5  # Mean deaths near Broad Street pump
others = [3.2, 2.8, 3.5, 2.9, 3.1]  # Mean deaths near other pumps
p_value, diff = pump_comparison_test(pump_1, others, 10000)
println("P-value: $p_value")
# A small p_value would indicate that Pump 1's mean is unusually low.
# Here Pump 1 has the highest mean, so the lower-tailed p_value will be close to 1.

Notes

Small p-values suggest Pump 1's mean is unusually low relative to the other pumps (the test is one-tailed)
Based on resampling without replacement (permutation test)
Named after John Snow's 1854 cholera outbreak investigation

source

JuliaMapping.raw_hist — Method

raw_hist(df::DataFrame, column::Symbol; bins=20)

Create a histogram with negative values colored brown and positive values colored green.

Arguments

df: DataFrame containing the data
column::Symbol: Column name to plot
bins=20: Number of histogram bins (default: 20)

Returns

Figure object from AlgebraOfGraphics

Example

fg = raw_hist(df, :Population)
fg = raw_hist(df, :Population, bins=30)

source

JuliaMapping.ripleys_k — Method

ripleys_k(coords, distances; area=nothing)

Calculate Ripley's K-function to detect spatial clustering or dispersion of point patterns.

Ripley's K-function measures the spatial distribution of points at various distance scales. Values higher than expected indicate clustering; lower values indicate dispersion.

Arguments

coords: 2×n matrix where each column is a point [x, y]
distances: Vector of distance thresholds to evaluate
area: Optional study area size; if nothing, calculated as bounding box area

Returns

Vector of K-function values, one for each distance in distances

Formula

For distance d: K(d) = (A/n²) × 2 × (number of point pairs within distance d)

Where:

A = study area
n = number of points

Interpretation

K(d) ≈ πd² for complete spatial randomness (CSR)
K(d) > πd² suggests clustering at distance d
K(d) < πd² suggests dispersion/regularity at distance d
Often transformed to L(d) = √(K(d)/π) - d for easier interpretation
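The formula above can be sketched directly. The name ripleys_k_sketch is hypothetical; the sketch applies no edge correction and uses the bounding-box area when none is supplied, as described, but other details of the package's implementation may differ.

```julia
# Hypothetical sketch of K(d) = (A/n²) × 2 × (pairs within d); no edge correction.
function ripleys_k_sketch(coords::AbstractMatrix{<:Real}, distances; area=nothing)
    n = size(coords, 2)
    xs, ys = coords[1, :], coords[2, :]
    # Fall back to the bounding-box area when no study area is given
    A = area === nothing ? (maximum(xs) - minimum(xs)) * (maximum(ys) - minimum(ys)) : area
    k_values = zeros(Float64, length(distances))
    for (k, d) in enumerate(distances)
        pairs = 0
        # Count each unordered pair once; the factor 2 in the formula restores ordered pairs
        for i in 1:n-1, j in i+1:n
            pairs += hypot(xs[i] - xs[j], ys[i] - ys[j]) <= d
        end
        k_values[k] = A / n^2 * 2 * pairs
    end
    return k_values
end

# Corners of a unit square: four sides of length 1 and two diagonals of length √2
square = [0.0 1.0 0.0 1.0;
          0.0 0.0 1.0 1.0]
k_square = ripleys_k_sketch(square, [0.5, 1.0, 1.5])  # [0.0, 0.5, 0.75]
```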

Example

# Points in 2D space (cholera deaths)
coords = [1.0 2.0 3.0 5.0 5.5;
          1.0 1.5 2.0 5.0 5.2]
distances = [0.5, 1.0, 2.0, 3.0]
k_values = ripleys_k(coords, distances)

# Check for clustering
for (d, k) in zip(distances, k_values)
    expected = π * d^2
    println("Distance $d: K=$k (expected=$expected)")
end

Notes

Computational complexity: O(n² × length(distances))
Named after statistician Brian Ripley
Commonly used in spatial epidemiology and ecology
Can be applied to point data such as the death locations in John Snow's cholera analysis

source

JuliaMapping.scaled_dist — Method

scaled_dist(df::DataFrame, col::Symbol; bandwidth_pct=0.05, bins=30)

Plot the distribution of a numeric column with adaptive kernel bandwidth scaling.

Arguments

df::DataFrame: Input DataFrame.
col::Symbol: Column name to visualize.
bandwidth_pct::Float64=0.05: Fraction of the data range used as bandwidth for the kernel density estimate.
bins::Int=30: Number of histogram bins.

Details

Computes a histogram normalized as a PDF, and overlays a kernel density curve with bandwidth proportional to the column range (bandwidth = range * bandwidth_pct).

Returns

Displays an AlgebraOfGraphics plot showing the histogram and scaled density.

Example

scaled_dist(df, :income; bandwidth_pct=0.03, bins=40)

source

JuliaMapping.show_named_color_groups — Method

show_named_color_groups()

Print the available named color groups to the console.

Displays the names of predefined color categories that can be used with plot_named_color_groups(). These correspond to color arrays defined in the included "named_colors.jl" file.

Example

show_named_color_groups()
# Output: whites reds oranges yellows greens cyans blues purples pinks browns grays

source

JuliaMapping.split_string_into_n_parts — Method

split_string_into_n_parts(text::String, n::Int)

Split a string into n approximately equal parts, inserting newlines at word boundaries.

Arguments

text::String: The text string to split
n::Int: Number of parts to split the string into

Returns

String: The original text with newlines inserted to create n parts

Examples

text = "This is a long text that needs to be split into multiple parts for better formatting."
result = split_string_into_n_parts(text, 3)
# Returns text split into 3 parts with newlines at word boundaries

Notes

Attempts to break at word boundaries when possible
If a word is longer than the target part length, it will be broken mid-word
Each part will be approximately equal in length
Preserves original spacing between words

source

JuliaMapping.uniform_subset_sum_indices — Method

uniform_subset_sum_indices(counts::AbstractVector{<:Integer}, target::Integer; rng=Random.default_rng())

Find indices of elements that sum to a target value using uniform random selection via dynamic programming.

Uses a dynamic programming approach to uniformly sample from all possible subsets of counts that sum exactly to target. Each valid subset has an equal probability of being selected.

Arguments

counts::AbstractVector{<:Integer}: Vector of integer values (e.g., counts per category)
target::Integer: The desired sum
rng: Random number generator (default: Random.default_rng())

Returns

Vector of indices whose corresponding counts values sum to target
Returns empty Int[] if no valid subset exists

Algorithm

Builds a DP table counting all possible ways to achieve each sum
Uses BigInt arithmetic to handle large combinatorial counts
Backtracks through the DP table, randomly choosing to include/exclude each element
Inclusion probability = (ways to include) / (total ways from this state)
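The steps above can be sketched as follows. The function subset_sum_sample_sketch is hypothetical; it assumes nonnegative counts and an Int-sized target, and its table layout may differ from the package's.

```julia
using Random

# Hypothetical sketch of uniform subset-sum sampling; assumes nonnegative counts.
function subset_sum_sample_sketch(counts::AbstractVector{<:Integer}, target::Integer;
                                  rng=Random.default_rng())
    n = length(counts)
    target < 0 && return Int[]
    # ways[i, s + 1] = number of subsets of counts[i:n] that sum to s
    ways = zeros(BigInt, n + 1, target + 1)
    ways[n + 1, 1] = 1
    for i in n:-1:1, s in 0:target
        ways[i, s + 1] = ways[i + 1, s + 1]                    # exclude element i
        if counts[i] <= s
            ways[i, s + 1] += ways[i + 1, s - counts[i] + 1]   # include element i
        end
    end
    ways[1, target + 1] == 0 && return Int[]
    chosen = Int[]
    remaining = target
    for i in 1:n
        total = ways[i, remaining + 1]
        with_i = counts[i] <= remaining ? ways[i + 1, remaining - counts[i] + 1] : big(0)
        # Include element i with probability with_i / total
        if rand(rng, 1:total) <= with_i
            push!(chosen, i)
            remaining -= counts[i]
        end
    end
    return chosen
end

counts = [10, 20, 30, 40, 50]
indices = subset_sum_sample_sketch(counts, 80)
```

The table has (n + 1) × (target + 1) entries, which is one reason an approach like this becomes expensive for large targets.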

Example

counts = [10, 20, 30, 40, 50]
target = 80
indices = uniform_subset_sum_indices(counts, target)
# Might return [1, 3, 4] since counts[[1,3,4]] = [10,30,40] sum to 80
sum(counts[indices]) == target  # Always true if indices is non-empty

Notes

Useful for statistical sampling when you need exactly N items from categories
Computationally intensive for large vectors or large target values
Related to the subset sum problem, but with uniform random selection

source

JuliaMapping.with_commas — Method

with_commas(x)

Format numeric values with comma separators for thousands.

Converts numeric values to Int64 and adds comma separators using the Humanize.digitsep function, making large numbers more readable.

Arguments

x: A numeric value or array of numeric values

Returns

A string or array of strings with comma-separated digits

Example

with_commas(1000000)        # Returns "1,000,000"
with_commas([1234, 5678])   # Returns ["1,234", "5,678"]
with_commas(1234.56)        # Returns "1,235" (rounded to Int64)

Notes

Values are rounded to the nearest Int64, so any fractional part is dropped from the output
Uses the Humanize package for formatting
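For illustration, the same formatting can be sketched in plain Julia without the Humanize dependency. The name commas_sketch is hypothetical, and the package itself delegates to Humanize rather than using this code.

```julia
# Hypothetical dependency-free sketch: round to Int64, then group digits in threes.
function commas_sketch(x::Real)
    n = round(Int64, x)
    s = string(abs(n))
    parts = String[]
    # Peel off three digits at a time from the right
    while length(s) > 3
        pushfirst!(parts, s[end-2:end])
        s = s[1:end-3]
    end
    pushfirst!(parts, s)
    return (n < 0 ? "-" : "") * join(parts, ",")
end

commas_sketch(1000000)         # "1,000,000"
commas_sketch.([1234, 5678])   # ["1,234", "5,678"]
```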

source

JuliaMapping.@ensure_types — Macro

@ensure_types(df, type_specs...)

Ensure that specified columns in a DataFrame have the correct data types by performing automatic type conversions.

Arguments

df: The DataFrame to modify
type_specs...: Variable number of type specifications in the format column::Type

Type Specifications

Each type specification should be in the format column::Type where:

column is the column name (Symbol or String)
Type is the target Julia type (e.g., Int, Float64, String)

Supported Conversions

String to Integer: Uses parse() to convert string representations of numbers
String to Float: Uses parse() to convert string representations of floating-point numbers
Float to Integer: Uses round() to convert floating-point numbers to integers
Other conversions: Uses convert() for general type conversions
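The conversion rules above can be sketched as an ordinary function on a single column vector. The name convert_column_sketch is hypothetical; the macro's generated code, its messages, and its handling of missing values are not shown here and may differ.

```julia
# Hypothetical sketch of the per-column conversion rules, not the macro itself.
function convert_column_sketch(values::AbstractVector, ::Type{T}) where {T}
    if eltype(values) <: AbstractString && T <: Number
        return parse.(T, values)          # String to Integer or Float
    elseif eltype(values) <: AbstractFloat && T <: Integer
        return round.(T, values)          # Float to Integer
    else
        return convert(Vector{T}, values) # general conversion
    end
end

convert_column_sketch(["1", "20"], Int)   # [1, 20]
convert_column_sketch([1.6, 2.4], Int)    # [2, 2]
```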

Examples

# Convert Population to Int and Expend to Float64
@ensure_types df Population::Int Expend::Float64

# Convert multiple columns at once
@ensure_types df Deaths::Int Population::Int Expend::Float64

Notes

The macro modifies the DataFrame in-place
Prints progress messages for successful conversions
Issues warnings for columns that don't exist
Throws errors for conversion failures
Returns the modified DataFrame

source