JuliaMapping Documentation
Welcome to the JuliaMapping.jl documentation.
Overview
JuliaMapping is a Julia package designed for mapping and geospatial analysis tasks.
Getting Started
using JuliaMapping
API Reference
JuliaMapping.add_col_totals — Method
add_col_totals(df; total_row_name="Total", cols_to_sum=nothing)
Append a final row with per-column totals. Non-numeric columns get a type-compatible label.
JuliaMapping.add_row_totals — Method
add_row_totals(df; total_col_name="Total", cols_to_sum=nothing)
Add a column with per-row totals (skips missing). By default sums all numeric columns.
JuliaMapping.add_totals — Method
add_totals(df; total_row_name="Total", total_col_name="Total", cols_to_sum=nothing, format_commas=false)
Add both a row totals column and a bottom totals row (which also totals the new column). Optionally formats all numeric values with comma separators.
JuliaMapping.analyze_skewness — Method
analyze_skewness(df::DataFrame, column::Symbol)
Analyze the skewness of a dataset with multiple metrics.
Arguments
- df: DataFrame containing the data
- column: Column name to analyze
Returns
- NamedTuple with skewness metrics
JuliaMapping.assess_data_spread — Function
assess_data_spread(df::DataFrame, col::Symbol, n_bins::Int=5)
Evaluate the distribution of data across equal-interval bins to determine if equal-interval binning is appropriate for the specified column.
Arguments
- df::DataFrame: Input dataframe containing the data
- col::Symbol: Column name to analyze
- n_bins::Int=5: Number of bins to create for the analysis
Returns
Vector{Int}: Array containing the count of observations in each bin
Output
Prints a detailed bin distribution report and recommendation for binning strategy. Warns if any bins contain less than 5% of total observations, suggesting alternative methods like Fisher-Jenks or quantile-based binning for better balance.
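The bin count and the 5% check described above can be sketched as follows. This is a minimal illustration on a plain vector, not the package's implementation; `equal_interval_counts` is a hypothetical helper name, and the report printing is omitted.

```julia
# Hypothetical helper: count observations in n_bins equal-width bins.
function equal_interval_counts(values::AbstractVector{<:Real}, n_bins::Int=5)
    lo, hi = extrema(values)
    width = (hi - lo) / n_bins
    counts = zeros(Int, n_bins)
    for v in values
        # Map each value to a 1-based bin index; the maximum goes in the last bin.
        idx = width == 0 ? 1 : min(n_bins, floor(Int, (v - lo) / width) + 1)
        counts[idx] += 1
    end
    return counts
end

values = [1.0, 2.0, 2.5, 3.0, 10.0]
counts = equal_interval_counts(values, 3)

# Bins holding fewer than 5% of all observations are the ones that would trigger the warning.
sparse_bins = findall(c -> c < 0.05 * length(values), counts)
```

Here most values fall in the first bin and the middle bin is empty, which is the kind of imbalance that points toward quantile or Fisher-Jenks breaks.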
Example
df = DataFrame(value = rand(1000))
bin_counts = assess_data_spread(df, :value, 10)
JuliaMapping.assess_uniform_distribution — Method
assess_uniform_distribution(df::DataFrame, col::Symbol)
Analyze whether data follows a uniform distribution pattern, helping determine if equal-interval binning is suitable for the specified column.
Arguments
- df::DataFrame: Input dataframe containing the data
- col::Symbol: Column name to analyze for uniformity
Returns
NamedTuple{(:skewness, :interval_cv), Tuple{Float64, Float64}}: Named tuple containing:
- skewness: Measure of distribution asymmetry (0 indicates symmetry)
- interval_cv: Coefficient of variation for quantile intervals
Output
Prints skewness, interval coefficient of variation, and a uniformity assessment. Also generates a histogram visualization via raw_hist().
Notes
- Skewness near 0 and interval CV < 0.3 suggest suitability for equal intervals
- Higher values indicate consideration of alternative binning methods
Example
df = DataFrame(value = rand(1000))
stats = assess_uniform_distribution(df, :value)
JuliaMapping.bullseye — Method
bullseye(capital::String, capital_coords::String)
Create an interactive HTML map with concentric circles (bullseye) centered on the specified capital city.
This function generates a Leaflet-based interactive map displaying concentric circles at fixed radii of 50, 100, 200, and 400 miles from the specified capital. The map includes a marker at the center point and a legend showing the distance bands with their corresponding colors.
Arguments
- capital::String: The name of the capital city (used for the marker popup and output filename)
- capital_coords::String: Coordinates in DMS (Degrees, Minutes, Seconds) format "DD° MM′ SS″ N/S, DD° MM′ SS″ E/W"
Details
- Uses OpenStreetMap tiles for the base map
- Concentric circles are drawn at 50, 100, 200, and 400 miles from center
- Default color scheme: #D32F2F, #388E3C, #1976D2, #FBC02D, #7B1FA2
- Requires the dms_to_decimal function to convert coordinates
Returns
Nothing. Creates an HTML file and opens it in the default web browser.
Output Files
Creates an HTML file named "{capital}.html" in the current working directory.
Example
bullseye("Nashville", "36° 09′ 44″ N, 86° 46′ 28″ W")
# Creates "Nashville.html" and opens it in the browser
Notes
The generated HTML file is self-contained and can be shared or hosted independently.
JuliaMapping.check_outlier_emphasis — Method
check_outlier_emphasis(df::DataFrame, col::Symbol)
Identify and quantify outliers in data to determine if equal-interval binning would appropriately highlight extreme values.
Arguments
- df::DataFrame: Input dataframe containing the data
- col::Symbol: Column name to analyze for outliers
Returns
Nothing (results are printed to console)
Output
Prints the percentage of outliers detected using the IQR method (1.5 × IQR rule). Provides recommendations based on outlier prevalence:
- If >5% outliers: Confirms equal intervals will highlight these extremes
- Suggests quantiles as alternative for balanced visualization
Method
Uses Tukey's fence method: outliers are values outside [Q1 - 1.5×IQR, Q3 + 1.5×IQR]
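The fence rule above can be sketched on a plain vector as shown below. This is an illustration under stated assumptions, not the package's code: `iqr_outlier_share` is a hypothetical name, and quartiles come from `Statistics.quantile` with its default interpolation, which may differ from what the package uses.

```julia
using Statistics

# Hypothetical helper: percentage of values outside [Q1 - 1.5×IQR, Q3 + 1.5×IQR].
function iqr_outlier_share(values::AbstractVector{<:Real})
    q1, q3 = quantile(values, [0.25, 0.75])
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    n_outliers = count(v -> v < lower || v > upper, values)
    return 100 * n_outliers / length(values)
end

# Nineteen evenly spaced values plus one extreme value.
data = [collect(1.0:19.0); 100.0]
share = iqr_outlier_share(data)
```

With this input only the value 100.0 lies outside the fences, so one observation in twenty is flagged.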
Example
df = DataFrame(value = [rand(95); rand(5) .* 100]) # Data with outliers
check_outlier_emphasis(df, :value)
JuliaMapping.choose_binning_for_margins — Method
choose_binning_for_margins(df::DataFrame; k::Int=5) -> Nothing
Comprehensive analysis and recommendation for binning political margin data in choropleth maps.
This function integrates multiple diagnostic approaches specifically tailored for political margin/vote share data. It considers skewness, clustering patterns, bin width variation, and domain-specific characteristics (competitive vs. landslide districts) to provide an evidence-based binning recommendation.
Arguments
- df::DataFrame: Input DataFrame containing margin data (must have a :margin_pct column)
- k::Int=5: Number of bins to create
Returns
Nothing: Function prints comprehensive analysis and recommendation
Details
The function performs the following analyses:
- Skewness analysis: Evaluates distribution asymmetry
- Clustering detection: Identifies natural breaks in the data
- Quantile comparison: Assesses bin width variability
- Domain analysis: Calculates percentages of competitive and landslide districts
Domain-Specific Thresholds
- Competitive districts: |margin| < 10% (±0.1)
- Landslide districts: |margin| > 30% (±0.3)
Recommendation Logic
Recommends Fisher-Jenks if:
- >30% of districts are competitive AND <10% are landslides (high clustering)
- Quantile width CV > 2.0 (strong evidence of natural clusters)
Recommends Quantiles if:
- Skewness > 1.5 (extreme skew requiring visual balance)
- Data is more uniformly distributed
Example
choose_binning_for_margins(counties, k=5)
# === BINNING RECOMMENDATION FOR MARGIN DATA ===
#
# [Skewness Analysis output]
# [Clustering Analysis output]
# [Quantile Comparison output]
#
# === DOMAIN-SPECIFIC ANALYSIS ===
# Competitive districts (±10%): 1245 (39.8%)
# Landslide districts (>30%): 234 (7.5%)
#
# === FINAL RECOMMENDATION ===
# ✓ Use FISHER-JENKS
# Rationale: High concentration of competitive districts suggests
# natural clustering that Jenks will reveal better than quantiles
Use Cases
- Electoral data visualization (vote margins, partisan lean)
- Policy analysis (approval ratings, opinion polls)
- Any ratio/percentage data with potential clustering around central values
Prerequisites
Requires that the DataFrame has a :margin_pct column. For other margin columns, modify the function or create a standardized column first.
See also
analyze_skewness, detect_clustering, compare_quantile_vs_jenks
JuliaMapping.clip_rings_to_states — Method
clip_rings_to_states(rings, state_union)
Clip distance rings to state boundaries using spatial intersection.
Arguments
- rings: Dictionary mapping distance values to ArchGDAL geometry objects representing distance rings
- state_union: ArchGDAL geometry object representing the union of all state boundaries
Returns
- Dictionary with the same keys as input rings, containing clipped geometry objects
Details
This function clips each distance ring to the state boundaries by computing the spatial intersection between each ring and the state union. If clipping fails for any ring, the original unclipped geometry is retained to ensure the workflow continues.
Example
using ArchGDAL
# Create distance rings and state union
rings = Dict(25 => ring_25_miles, 50 => ring_50_miles, 75 => ring_75_miles)
state_boundary = create_state_union(states)
# Clip rings to state boundaries
clipped_rings = clip_rings_to_states(rings, state_boundary)
Notes
- Progress is reported for each ring being clipped
- Errors during intersection operations are caught and reported as warnings
- Original geometries are preserved if clipping fails
- Returns geometries suitable for visualization and further spatial analysis
JuliaMapping.compare_quantile_vs_jenks — Method
compare_quantile_vs_jenks(df::DataFrame, col::Symbol; k::Int=5) -> NamedTuple
Compare quantile and Fisher-Jenks binning strategies by analyzing bin width variability.
This function computes quantile breaks and evaluates whether the resulting bins have consistent widths. High variability in bin widths suggests that data has natural clustering that Fisher-Jenks would capture more effectively.
Arguments
- df::DataFrame: Input DataFrame containing the data
- col::Symbol: Column name to analyze
- k::Int=5: Number of bins to create
Returns
NamedTuple with fields:
- quantile_breaks: Vector of k+1 break points from quantile method
- width_cv: Coefficient of variation of quantile bin widths
Interpretation
- width_cv < 1.0: Quantile widths are relatively uniform → Quantiles appropriate
- 1.0 ≤ width_cv ≤ 2.0: Moderate variation → Either method works, depends on goals
- width_cv > 2.0: Highly variable widths → Data has clusters, Fisher-Jenks recommended
Details
The coefficient of variation (CV) is calculated as: CV = σ(bin_widths) / μ(bin_widths)
High CV indicates that some bins span large ranges while others span small ranges, which is a strong indicator of natural clustering in the data.
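The CV formula above can be sketched on a plain vector as follows. This is a rough illustration, not the package's implementation: `quantile_width_cv` is a hypothetical name, and using the sample standard deviation (`Statistics.std`) and the default `quantile` interpolation are assumptions.

```julia
using Statistics

# Hypothetical helper: quantile breaks and the CV of the resulting bin widths.
function quantile_width_cv(values::AbstractVector{<:Real}; k::Int=5)
    breaks = quantile(values, range(0, 1; length=k + 1))  # k + 1 break points
    widths = diff(breaks)                                 # k bin widths
    return (quantile_breaks=breaks, width_cv=std(widths) / mean(widths))
end

# Evenly spread data: all quantile bins have nearly the same width.
even = quantile_width_cv(collect(0.0:100.0); k=5)

# One extreme value stretches the last bin and inflates the CV.
skewed = quantile_width_cv([1.0, 2, 3, 4, 5, 6, 7, 8, 9, 1000]; k=5)
```

The evenly spread input gives a CV close to zero, while the input with one extreme value lands above the 2.0 threshold listed under Interpretation.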
Example
result = compare_quantile_vs_jenks(counties, :margin_pct, k=5)
# === COMPARING QUANTILES VS FISHER-JENKS ===
#
# QUANTILES (k=5):
# Bin 1: [-0.547, -0.263] - width: 0.284 - count: ~20.0%
# Bin 2: [-0.263, -0.109] - width: 0.154 - count: ~20.0%
# Bin 3: [-0.109, 0.176] - width: 0.285 - count: ~20.0%
# Bin 4: [0.176, 0.426] - width: 0.250 - count: ~20.0%
# Bin 5: [0.426, 0.831] - width: 0.405 - count: ~20.0%
# Width CV: 0.342
#
# INTERPRETATION:
# → Moderate variation in quantile widths
# Either method could work—depends on communication goals
Rationale
When quantile bins have very different widths, it means observations are unevenly distributed across the value range. This is exactly what Fisher-Jenks is designed to handle by finding optimal breakpoints between clusters.
Notes
- This function analyzes quantiles only; for actual Fisher-Jenks breaks, use a dedicated package like NaturalBreaks.jl or implement the algorithm
- By definition, quantile bins always have equal counts (~100/k percent each)
- The diagnostic focuses on bin width variation as a proxy for clustering
See also
JuliaMapping.compare_skewness — Method
compare_skewness(df::DataFrame, column::Symbol)
Compare skewness before and after log transformation.
Arguments
- df: DataFrame containing the data
- column: Column name to analyze
Returns
- NamedTuple with original and log-transformed skewness
JuliaMapping.compute_fixed_intervals — Function
compute_fixed_intervals(dfs::Vector{DataFrame}, col::Symbol, n_bins::Int=5) -> Vector{Float64}
Calculate fixed equal-interval breaks across multiple DataFrames for consistent map series comparisons.
When creating a series of choropleth maps (e.g., across time periods or regions), using consistent bin breaks enables meaningful visual comparison. This function computes global min/max across all datasets and creates equal-width intervals.
Arguments
- dfs::Vector{DataFrame}: Vector of DataFrames to analyze
- col::Symbol: Column name present in all DataFrames
- n_bins::Int=5: Number of bins to create
Returns
Vector{Float64}: Bin break points (length = n_bins + 1)
- First element is global minimum
- Last element is global maximum
- Interior elements divide range into equal widths
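A minimal sketch of the break computation described above, using plain vectors in place of DataFrame columns. `fixed_interval_breaks` is a hypothetical name and the input values are made up for illustration.

```julia
# Hypothetical helper: equal-width breaks spanning the global min and max of several datasets.
function fixed_interval_breaks(datasets::Vector{<:AbstractVector{<:Real}}, n_bins::Int=5)
    global_min = minimum(minimum, datasets)
    global_max = maximum(maximum, datasets)
    # n_bins bins need n_bins + 1 break points.
    return collect(range(global_min, global_max; length=n_bins + 1))
end

values_2020 = [-0.6, -0.1, 0.3]   # illustrative margins, not real data
values_2024 = [-0.2, 0.2, 0.6]
breaks = fixed_interval_breaks([values_2020, values_2024], 5)
```

Because the breaks depend only on the pooled range, either dataset can be binned with them and the resulting classes stay comparable.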
Use Cases
- Time series maps (comparing same region across years)
- Atlas-style maps (comparing different regions)
- Multi-panel comparisons where color scales must match
Example
# Compare election results across 2020 and 2024
breaks = compute_fixed_intervals([counties_2020, counties_2024], :margin_pct, 5)
# Use these breaks for both maps to enable direct comparison
# breaks will be something like: [-0.6, -0.36, -0.12, 0.12, 0.36, 0.6]
Notes
- Ensures all maps use identical color-to-value mapping
- May result in empty bins if data ranges differ substantially between DataFrames
- Alternative to this approach: use quantiles computed on combined data
See also
JuliaMapping.create_county_union — Function
create_county_union(counties::DataFrame, geometry_col=:geometry)
Create a union of all county geometries for use as a clipping boundary.
Arguments
- counties: DataFrame containing county geometries, typically loaded from a shapefile
- geometry_col: Symbol or string specifying the column name containing geometry data (default: :geometry)
Returns
- An ArchGDAL geometry object representing the union of all county boundaries
Details
This function iteratively unions all county geometries to create a single boundary that can be used for clipping operations. The function includes progress reporting and error handling to manage large datasets with many counties.
Example
using GeoDataFrames, ArchGDAL
# Load county data
counties = GeoDataFrames.read("data/counties.shp")
# Create union for clipping
county_boundary = create_county_union(counties)
Notes
- Progress is reported every 500 counties processed
- Errors during union operations are caught and reported as warnings
- The function assumes geometries are in the same coordinate reference system
- Returns an ArchGDAL geometry suitable for spatial clipping operations
JuliaMapping.create_county_union — Method
create_county_union(df; geometry_col=:geometry)
Create a unified geometry representing the union of all county geometries in a DataFrame.
This function iteratively combines all county geometries using ArchGDAL's union operation, which is useful for creating a boundary for clipping or masking operations.
Arguments
- df: DataFrame containing county geometry data
- geometry_col: Symbol specifying the column name containing ArchGDAL geometry objects (default: :geometry)
Returns
- An ArchGDAL geometry object representing the union of all input geometries
Details
- Progress is printed every 500 counties for large datasets
- Handles union errors gracefully with warning messages
- The result can be used for clipping contours or other spatial operations
Example
county_union = create_county_union(county_df)
county_union = create_county_union(county_df, geometry_col=:geom)
JuliaMapping.create_filled_voting_contours! — Method
create_filled_voting_contours!(ga, df; geometry_col=:geometry, value_col=:republican_pct, resolution=150, colormap=:RdBu)
Create a filled contour plot (like a topographic map) showing spatial interpolation of a value across geographic regions.
Arguments
- ga: An existing Makie plot axis/figure to add filled contours to
- df: DataFrame containing geometry and value columns for interpolation
- geometry_col::Symbol=:geometry: Column name containing ArchGDAL geometry objects (default: :geometry)
- value_col::Symbol=:republican_pct: Column name containing numeric values to interpolate (default: :republican_pct). Can also use :democratic_pct, :third_party_pct, margin data, or any numeric column
- resolution::Int=150: Grid resolution for interpolation (higher = smoother but slower)
- colormap::Symbol=:RdBu: Makie colormap for the filled regions. Use :RdBu for red-blue diverging (good for margins), :viridis for sequential data, etc.
Returns
- Tuple of (filled_contours, interpolated_grid_Z, x_grid, y_grid)
Details
- Uses Gaussian kernel density weighting for smooth interpolation
- Adaptive bandwidth based on data extent (slightly larger than create_voting_contours! for visibility)
- Draws both filled regions and contour lines for dual representation
- Filled regions have transparency (alpha=0.7) to show layer composition
- Contour lines overlaid in black for clarity
- Designed for political/electoral data visualization
Example
fig, ax, plt = scatter(counties_df.longitude, counties_df.latitude)
cf, Z, x_grid, y_grid = create_filled_voting_contours!(ax, counties_df, value_col=:margin_pct, colormap=:RdBu)
JuliaMapping.create_isopleth_rings — Function
create_isopleth_rings(centroids_geo, distances=[25, 50, 75, 100, 150])
Create nested isopleth rings around geographic centroids, similar to elevation contours.
Arguments
- centroids_geo: Vector of Point2f objects representing geographic centroids (longitude, latitude)
- distances: Vector of distances in miles for creating rings (default: [25, 50, 75, 100, 150])
Returns
- Dictionary mapping distance values to ArchGDAL geometry objects representing nested rings
Details
This function creates concentric rings around each centroid at specified distances. The rings are created as nested zones where each ring represents the area between its distance and the previous distance (e.g., 25-50 miles, 50-75 miles, etc.). This creates an isopleth-like visualization similar to elevation contours on topographic maps.
The function:
- Converts distances from miles to degrees (approximate conversion: 1° ≈ 69 miles)
- Creates circles around each centroid at each distance
- Unions circles at the same distance to create zones
- Creates rings by taking the difference between consecutive zones
Example
using GeometryBasics, ArchGDAL
# Define centroids and distances
centroids = [Point2f(-74.0, 40.7), Point2f(-87.6, 41.9)] # NYC, Chicago
distances = [50, 100, 150, 200]
# Create isopleth rings
rings = create_isopleth_rings(centroids, distances)
Notes
- Distance conversion uses approximate factor of 69 miles per degree
- First ring represents the area from 0 to the first distance
- Subsequent rings represent areas between consecutive distances
- Progress is reported for each distance being processed
- Returns ArchGDAL geometries suitable for spatial operations and visualization
JuliaMapping.create_state_union — Function
create_state_union(states::DataFrame, geometry_col=:geometry)
Create a union of all state geometries for use as a clipping boundary.
Arguments
- states: DataFrame containing state geometries, typically loaded from a shapefile
- geometry_col: Symbol or string specifying the column name containing geometry data (default: :geometry)
Returns
- An ArchGDAL geometry object representing the union of all state boundaries
Details
This function iteratively unions all state geometries to create a single boundary that can be used for clipping operations. The function includes progress reporting and error handling to manage large datasets with many states.
Example
using GeoDataFrames, ArchGDAL
# Load state data
states = GeoDataFrames.read("data/states.shp")
# Create union for clipping
state_boundary = create_state_union(states)
Notes
- Progress is reported every 500 states processed
- Errors during union operations are caught and reported as warnings
- The function assumes geometries are in the same coordinate reference system
- Returns an ArchGDAL geometry suitable for spatial clipping operations
JuliaMapping.create_voting_contours! — Method
create_voting_contours!(ga, df; geometry_col=:geometry, value_col=:republican_pct, resolution=200, levels=10)
Create smooth contour lines on an existing plot representing spatial interpolation of a value across geographic regions.
Arguments
- ga: An existing Makie plot axis/figure to add contours to
- df: DataFrame containing geometry and value columns for interpolation
- geometry_col::Symbol=:geometry: Column name containing ArchGDAL geometry objects (default: :geometry)
- value_col::Symbol=:republican_pct: Column name containing numeric values to interpolate (default: :republican_pct). Can also use :democratic_pct, :third_party_pct, or any numeric column
- resolution::Int=200: Grid resolution for interpolation (higher = smoother but slower)
- levels::Int=10: Number of contour levels to draw, or a vector of specific levels
Returns
- Tuple of (contour_lines, interpolated_grid_Z, x_grid, y_grid)
Details
- Uses Gaussian kernel density weighting for smooth interpolation
- Adaptive bandwidth based on data extent for automatic smoothing
- Contour lines are labeled with values
- Designed for political/electoral data (margins, vote shares, etc.)
Example
fig, ax, plt = scatter(counties_df.longitude, counties_df.latitude)
cs, Z, x_grid, y_grid = create_voting_contours!(ax, counties_df, value_col=:margin_pct)
JuliaMapping.detect_clustering — Method
detect_clustering(df::DataFrame, col::Symbol; n_bins::Int=5) -> Vector{Int}
Identify natural clusters and gaps in data to determine if Fisher-Jenks binning is appropriate.
This function analyzes the distribution of gaps between consecutive sorted values to detect whether the data contains natural clusters. Large gaps suggest natural breakpoints that Fisher-Jenks optimization would identify effectively.
Arguments
- df::DataFrame: Input DataFrame containing the data
- col::Symbol: Column name to analyze
- n_bins::Int=5: Number of bins to create (used for recommendation threshold)
Returns
Vector{Int}: Indices of locations with large gaps (potential natural breaks)
Interpretation
- If number of large gaps ≥ n_bins - 1: Strong clustering detected → Use Fisher-Jenks
- If fewer large gaps: Weak clustering → Quantiles may be simpler
- Large gaps are defined as those exceeding mean + 1.5 × standard deviation
Details
The function:
- Sorts data and calculates gaps between consecutive values
- Identifies "large" gaps using statistical threshold
- Produces a histogram of gap sizes
- Recommends binning strategy based on clustering strength
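The gap test described above can be sketched as follows, leaving out the histogram and the printed recommendation. `large_gap_indices` is a hypothetical name, and using the sample standard deviation from `Statistics.std` is an assumption about how the threshold is computed.

```julia
using Statistics

# Hypothetical helper: indices of gaps larger than mean + 1.5 × standard deviation.
# Index i refers to the gap between the i-th and (i+1)-th sorted values.
function large_gap_indices(values::AbstractVector{<:Real})
    gaps = diff(sort(values))
    threshold = mean(gaps) + 1.5 * std(gaps)
    return findall(>(threshold), gaps)
end

# Two tight clusters separated by one wide gap.
values = [1.0, 1.1, 1.2, 1.3, 5.0, 5.1, 5.2, 5.3]
gap_idx = large_gap_indices(values)
```

The single index returned marks the jump from 1.3 to 5.0, the one natural break in this toy data.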
Example
large_gaps = detect_clustering(counties, :margin_pct, n_bins=5)
# Prints analysis and shows gap distribution histogram
# Returns indices where natural breaks occur
See also
JuliaMapping.dms_to_decimal — Method
dms_to_decimal(coords::AbstractString) -> String
Convert coordinates from degrees, minutes, seconds (DMS) format to decimal degrees (DD).
Arguments
coords::AbstractString: Coordinates in DMS format with flexible symbols and direction indicators
Returns
String: Coordinates in decimal degrees format "±DD.DDDD, ±DD.DDDD"
Format
Input format is flexible:
- Degrees, minutes, and seconds can use °/′/″ symbols or be plain numbers
- Direction can be N/North/n/north, S/South/s/south, E/East/e/east, W/West/w/west
- Direction can appear before or after the coordinate
- Latitude and longitude separated by comma
- Spaces between components are optional
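Once a coordinate has been parsed into its parts, the conversion itself is simple arithmetic. The sketch below shows only that step for a single component; `dms_component_to_decimal` is a hypothetical helper, and the flexible string parsing and output formatting that dms_to_decimal performs are not reproduced here.

```julia
# Hypothetical helper: convert one parsed DMS component to signed decimal degrees.
function dms_component_to_decimal(degrees::Real, minutes::Real, seconds::Real, direction::Char)
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are negative by convention.
    return direction in ('S', 'W') ? -value : value
end

lat = dms_component_to_decimal(42, 21, 37, 'N')   # 42° 21′ 37″ N
lon = dms_component_to_decimal(71, 3, 28, 'W')    # 71° 03′ 28″ W
```

For these Boston coordinates the latitude comes out near 42.3603 and the longitude near -71.0578.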
Example
# Various formats work
dms_to_decimal("42° 21′ 37″ N, 71° 03′ 28″ W")
dms_to_decimal("42 21 37 N, 71 03 28 W")
dms_to_decimal("North 42° 21′ 37″, West 71° 03′ 28″")
dms_to_decimal("42° 21′ 37″ north, 71° 03′ 28″ west")
dms_to_decimal("40° 26′ 46.302″ N, 79° 58′ 56.484″ W")
Throws
ArgumentError: If input format is invalid or coordinates are out of range
JuliaMapping.dots — Method
dots(df::DataFrame, dots::Int)
Calculate dot density values for wheat production visualization.
Arguments
- df::DataFrame: DataFrame containing wheat production data with wheat2017bu column
- dots::Int: Number of bushels represented by each dot
Returns
Vector{Int}: Number of dots needed for each row based on production levels
Examples
wheat_df = DataFrame(wheat2017bu = [5000, 12000, 800])
dot_counts = dots(wheat_df, 1000) # Each dot represents 1000 bushels
# Returns: [5, 12, 0] (dots per county)
Notes
- Uses floor division to ensure whole dots only
- Specifically designed for wheat production dot density maps
- Part of the agricultural visualization workflow
- Returns 0 for counties with production below the dot threshold
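The floor division mentioned in the notes can be illustrated on a plain vector with Base's `fld`. This is a sketch of the arithmetic only, not the function body.

```julia
production = [5000, 12000, 800]   # bushels per county, as in the example above
bushels_per_dot = 1000

# fld is floor division, so partial dots are dropped.
dot_counts = fld.(production, bushels_per_dot)
```

The third county produces less than one dot's worth of wheat, so it gets zero dots.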
JuliaMapping.extract_centroid — Method
extract_centroid(geometry)
Extract the centroid coordinates from a geometry object.
Arguments
geometry: An ArchGDAL geometry object
Returns
- A tuple (y, x) containing the latitude and longitude coordinates of the centroid
Example
centroid_y, centroid_x = extract_centroid(geom)  # (latitude, longitude), matching the (y, x) order documented under Returns
JuliaMapping.format_breaks — Method
format_breaks(breaks::Vector{String}) -> Vector{String}
Formats a vector of string representations of ranges into a human-readable format.
Arguments
- breaks::Vector{String}: A vector of strings, each representing a numerical range. Each string is expected to be in the format "start - end", where start and end are numerical values.
Returns
- A Vector{String} where each element is a formatted range string in the form "start to end".
- The numerical values in the range are rounded to the nearest integer and formatted with commas for better readability.
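One way to get this kind of formatting is sketched below. It is an illustration under stated assumptions rather than the package's code: `with_commas` and `format_range` are hypothetical helpers, rounding uses Julia's default `round`, and a range with a negative start value is not handled.

```julia
# Hypothetical helper: insert thousands separators into an integer.
function with_commas(n::Integer)
    digits_str = string(abs(n))
    # Insert a comma at every position followed by a multiple of three digits.
    grouped = replace(digits_str, r"(?<=\d)(?=(\d{3})+$)" => ",")
    return n < 0 ? "-" * grouped : grouped
end

# Hypothetical helper: turn "start - end" into "start to end" with rounded, comma-grouped values.
function format_range(label::AbstractString)
    lo, hi = split(label, " - ")
    fmt(x) = with_commas(round(Int, parse(Float64, x)))
    return string(fmt(lo), " to ", fmt(hi))
end

labels = format_range.(["1000 - 2000", "500000 - 1000000"])
```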
Example
julia> breaks = ["1000 - 2000", "3000.5 - 4000.2", "500000 - 1000000"];
julia> format_breaks(breaks)
["1,000 to 2,000", "3,001 to 4,000", "500,000 to 1,000,000"]
JuliaMapping.format_table_as_text — Function
format_table_as_text(headers::Vector{String}, rows::Vector{Vector{String}}, padding::Int=2)
Format data as an ASCII table with borders and proper column alignment.
Arguments
- headers::Vector{String}: Column header names
- rows::Vector{Vector{String}}: Data rows, where each row is a vector of strings
- padding::Int=2: Additional padding space around each cell (default: 2)
Returns
String: Formatted ASCII table with Unicode box-drawing characters
Examples
headers = ["Name", "Age", "City"]
rows = [["Alice", "25", "New York"],
["Bob", "30", "Chicago"],
["Carol", "22", "Boston"]]
table = format_table_as_text(headers, rows)
# Returns a formatted table with borders and proper alignment
Notes
- Uses Unicode box-drawing characters (┌─┬─┐│├─┼─┤└─┴─┘)
- Automatically calculates column widths based on content
- Each cell is padded for consistent spacing
- Useful for creating publication-ready ASCII tables
JuliaMapping.get_gdp — Method
get_gdp(df::DataFrame, quarter::Int) -> DataFrame
Process a GDP dataset to extract state-level GDP data for a specified quarter.
This function cleans and processes a raw GDP DataFrame by:
- Selecting the state names column and the specified quarter column
- Removing header and footer rows that contain explanatory text
- Converting GDP values from strings to integers (removing commas)
- Filtering to include only valid US states
- Sorting the results by state name
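The conversion, filtering, and sorting steps above can be sketched on plain tuples as follows. The figures are made up for illustration, and `valid_states` is a hypothetical stand-in for the list of state names the function filters against.

```julia
# Illustrative rows of (state name, GDP as a comma-formatted string); not real data.
rows = [("Texas", "2,100,000"), ("Ohio", "700,500"), ("United States", "25,000,000")]
valid_states = ["Ohio", "Texas"]   # hypothetical list of valid state names

# Strip commas, parse to Int, and keep only valid states.
parsed = [(state=s, gdp=parse(Int, replace(v, "," => ""))) for (s, v) in rows if s in valid_states]

# Sort by state name.
cleaned = sort(parsed; by = r -> r.state)
```

The national total row is dropped because its name is not in the list of valid states.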
Arguments
- df::DataFrame: Raw GDP data with state names in column 1 and quarterly data in subsequent columns
- quarter::Int: Column index representing the desired quarter (e.g., 2 for Q1, 3 for Q2, etc.)
Returns
DataFrame: Cleaned dataset with columns :state and :gdp, sorted by state name
Notes
- Assumes data starts at row 6 and state data ends 5 rows before the DataFrame end
- Requires a global variable states containing valid state names for filtering
- GDP values are converted to integers after removing comma separators
- First set of quarters are in current dollars, second set are in constant dollars. Which to use depends on the context of the analysis.
- This approach is suitable for official data and other series that have very consistent formatting from period to period. Otherwise, we would not be able to hard code the rows to trim.
Example
cleaned_data = get_gdp(raw_gdp_df, 14) # Extract Q4 data (assuming column 14 contains Q4)
JuliaMapping.get_nth_table — Function
get_nth_table(url::String, n::Int=1) -> DataFrame
Get the nth HTML table from a webpage and return it as a DataFrame.
Arguments
- url::String: The URL of the webpage to scrape
- n::Int=1: The table index to get (1-based indexing). Defaults to 1 for the first table.
Returns
DataFrame: The fetched table with cleaned text content
Throws
- HTTP.ExceptionRequest.StatusError: If the webpage cannot be accessed
- ArgumentError: If no tables are found on the page
- BoundsError: If the requested table index n exceeds the number of tables found
Examples
# Get first table from Wikipedia or other web page
url = "https://en.wikipedia.org/wiki/List_of_European_countries_by_area"
df = get_nth_table(url)
# Get second table
df2 = get_nth_table(url, 2)
# Save to CSV
using CSV
CSV.write("table_data.csv", df)
Notes
- Headers are automatically detected from <th> elements
- Cell text is cleaned by removing extra whitespace and newlines
- If rows have different numbers of columns, they are padded or truncated to match headers
- Generic column names are created if no headers are found
JuliaMapping.get_sheet — Method
get_sheet(file_name::String, sheet::Int) -> DataFrame
Read a specific sheet from an Excel file and return it as a DataFrame.
Arguments
- file_name::String: Path to the Excel file (.xlsx format)
- sheet::Int: Sheet number to read (1-indexed)
Returns
DataFrame: The Excel sheet data converted to a DataFrame
Examples
# Read the first sheet from an Excel file
df = get_sheet("data.xlsx", 1)
# Read the third sheet
df = get_sheet("sales_report.xlsx", 3)
JuliaMapping.hard_wrap — Method
hard_wrap(text::String, width::Int)
Hard-wrap text at the specified width, breaking at word boundaries when possible. Each line is right-padded to the specified width.
Arguments
- text::String: The text to wrap
- width::Int: Maximum line width in characters
Returns
String: Text with line breaks inserted and each line padded to width
Examples
text = "This is a long sentence that will be wrapped at word boundaries."
wrapped = hard_wrap(text, 20)
# Returns text wrapped to 20 characters per line with padding
Notes
- Right-pads each line to exactly the specified width
- Attempts to break at word boundaries when possible
- If a word is longer than the width, it will be broken mid-word
- Useful for creating fixed-width text layouts
JuliaMapping.haversine_distance_km — Method
haversine_distance_km(lon1, lat1, lon2, lat2)
Calculate geodesic distance between two points using the Haversine formula.
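For reference, the Haversine formula can be sketched as below. This is a generic implementation, not the package's source: `haversine_km` is a hypothetical name, inputs are assumed to be in degrees, and the mean Earth radius of 6371 km is an assumed constant that the package may define differently.

```julia
# Generic Haversine sketch: great-circle distance on a sphere, inputs in degrees.
function haversine_km(lon1, lat1, lon2, lat2; radius_km=6371.0)
    φ1, φ2 = deg2rad(lat1), deg2rad(lat2)
    Δφ = φ2 - φ1
    Δλ = deg2rad(lon2 - lon1)
    a = sin(Δφ / 2)^2 + cos(φ1) * cos(φ2) * sin(Δλ / 2)^2
    # Clamp to 1 to guard against rounding error for near-antipodal points.
    return 2 * radius_km * asin(min(1.0, sqrt(a)))
end

# One degree of longitude along the equator.
d = haversine_km(0.0, 0.0, 1.0, 0.0)
```

Note the argument order (longitude first, then latitude), which follows the signature above.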
JuliaMapping.inspect_shp — Method
inspect_shp(path::String)
Prints the structure and field names of a shapefile for inspection.
Arguments
path::String: Path to the shapefile (.shp file)
Details
This function reads a shapefile and prints:
- Layer name
- Feature count (number of records)
- All field/column names available in the shapefile
Example
inspect_shp("/path/to/data.shp")
# Output:
# Layer name: data
# Feature count: 1234
# Fields:
# - ID
# - NAME
# - geometry
JuliaMapping.is_font_available — Method
is_font_available(font_name::AbstractString) -> Bool
Return true if a font with the given name can be located by FreeTypeAbstraction.findfont, and false otherwise.
This function attempts to resolve font_name to an installed font using FreeTypeAbstraction’s font-discovery mechanism. If the font is found, true is returned. If findfont throws an exception, such as when the font is missing or not registered on the system, the function catches the error and returns false.
Examples
```julia
julia> is_font_available("Helvetica")
true

julia> is_font_available("NonexistentFontXYZ")
false
```
JuliaMapping.log_dist — Method
log_dist(df::DataFrame, col::Symbol)
Plot the log₁₀-transformed distribution of a numeric column with both histogram and kernel density overlay.
Arguments
- df::DataFrame: Input DataFrame containing the numeric column.
- col::Symbol: Column name to visualize.
Details
The function computes log10(x + 1) for all nonmissing values in df[!, col], then draws:
- A normalized histogram (PDF scaling).
- A smooth kernel density estimate overlay.
Returns
Displays an AlgebraOfGraphics plot showing the log-transformed distribution.
Example
log_dist(df, :population)
JuliaMapping.make_combined_table — Function
make_combined_table(data::DataFrame, half::String, formatted_breaks::Vector{String} = formatted_breaks)
Create a formatted summary table of population statistics by bins.
Generates a text-based table showing population totals, percentages, and cumulative statistics for either the first or second half of population bins. Useful for creating small multiple displays with separate tables for each half.
Arguments
- data::DataFrame: DataFrame containing :bin and :population columns
- half::String: Either "first" (bins 1-4) or "second" (bins 5-8)
- formatted_breaks::Vector{String}: Vector of formatted range labels (default: formatted_breaks)
Returns
- String containing a formatted text table with columns:
- Interval: The population range
- Population: Total population in the bin
- Percent: Percentage of national total
- Cumulative: Cumulative population
- Cumulative: Cumulative percentage
Details
- Groups data by bin and sums population
- Calculates percentages relative to national total
- Computes cumulative population and percentages
- Formats numbers with commas for readability
- Uses PrettyTables with right-aligned columns
Example
table_text = make_combined_table(county_data, "first", formatted_breaks)
println(table_text)
Notes
The table is formatted as a string suitable for display or saving to file.
JuliaMapping.make_geographic_circle — Function
make_geographic_circle(center_geo, radius_deg, n_pts=64)
Create a circular polygon in geographic coordinates (longitude/latitude in degrees).
Arguments
- center_geo: Tuple or vector containing (longitude, latitude) of the circle center in degrees
- radius_deg: Radius of the circle in degrees
- n_pts: Number of points to approximate the circle (default: 64)
Returns
- Vector of Point2f objects representing the circle vertices
Details
This function creates a circular polygon by generating points around the center at regular angular intervals. The function accounts for latitude distortion by dividing the longitude offset by the cosine of the latitude, ensuring the circle appears approximately circular on a map projection.
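The latitude correction can be illustrated with a short sketch. It assumes plain `(lon, lat)` tuples instead of `Point2f`, and the name `circle_points_sketch` is hypothetical; whether the package repeats the first vertex to close the ring is not stated, so this version does not.

```julia
# Hypothetical sketch: points on a circle around (lon0, lat0), in degrees.
# The longitude offset is divided by cos(latitude) so the shape looks
# roughly circular on a map, as described above.
function circle_points_sketch(center, radius_deg, n_pts=64)
    lon0, lat0 = center
    angles = range(0, 2π; length=n_pts + 1)[1:end-1]  # n_pts evenly spaced angles
    return [(lon0 + radius_deg * cos(θ) / cosd(lat0),
             lat0 + radius_deg * sin(θ)) for θ in angles]
end

pts = circle_points_sketch((0.0, 60.0), 1.0, 4)
println(pts)
```

At 60° latitude the cosine is 0.5, so the east-west extent in degrees is doubled relative to the north-south extent.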
Example
# Create a circle centered at (0°N, 0°E) with 1-degree radius
circle_points = make_geographic_circle((0.0, 0.0), 1.0, 32)
Notes
- The circle is created in geographic coordinates (WGS84, EPSG:4326)
- Latitude distortion correction is applied to maintain circular appearance
- Enter the π and θ symbols in the REPL by typing \pi or \theta followed by the Tab key
JuliaMapping.make_marker — Method
make_marker(n::Int, size::Real, shape::String) -> BezierPath
Create a custom marker composed of multiple geometric shapes arranged in a grid pattern for Makie plots.
Arguments
- n::Int: Number of shapes to include in the marker (arranged in a grid)
- size::Real: Size of each individual shape in the marker
- shape::String: Type of shape to draw. Supported values:
  - "+": Plus sign (cross with horizontal and vertical bars)
  - "_": Underscore (horizontal bar)
  - "±": Plus-minus sign (horizontal bars with vertical bar on top)
  - "#": Hash sign (two horizontal and two vertical bars forming a grid)
  - "*": Asterisk (plus with two diagonal bars)
  - "=": Equal sign (two horizontal bars)
  - "|": Vertical bar
  - ":": Colon (two dots vertically aligned)
Returns
BezierPath: A Makie BezierPath object containing all shapes arranged in a grid
Details
The shapes are arranged in a square grid pattern with automatic spacing. The grid size is calculated as ceil(sqrt(n)) to accommodate all n shapes. Line thickness is set to 20% of the specified size, and spacing between shapes is 3× the size parameter.
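The layout arithmetic above can be sketched without Makie. The function `grid_layout_sketch` is hypothetical, and the row-major ordering and origin placement are assumptions; only the `ceil(sqrt(n))` grid size, the 20% thickness, and the 3× spacing come from the description.

```julia
# Hypothetical sketch of the grid arithmetic only (no BezierPath is built).
function grid_layout_sketch(n::Integer, size::Real)
    grid = ceil(Int, sqrt(n))      # shapes per row and per column
    spacing = 3 * size             # distance between shape centers
    thickness = 0.2 * size         # line thickness, 20% of size
    # Assumed ordering: fill rows left to right, top to bottom.
    centers = [((i - 1) % grid * spacing, -((i - 1) ÷ grid) * spacing) for i in 1:n]
    return (; grid, spacing, thickness, centers)
end

layout = grid_layout_sketch(5, 10.0)
println(layout.grid, " ", layout.centers)
```

With `n = 5` the grid is 3×3, so four cells stay empty.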
Examples
using Makie, CairoMakie
# Create a marker with 4 plus signs
marker = make_marker(4, 10.0, "+")
# Use in a scatter plot
fig = Figure()
ax = Axis(fig[1, 1])
scatter!(ax, [1, 2, 3], [1, 2, 3], marker=marker, markersize=30)
fig
# Create markers with different shapes
plus_marker = make_marker(9, 5.0, "+")
underscore_marker = make_marker(6, 5.0, "_")
plusminus_marker = make_marker(4, 5.0, "±")
hash_marker = make_marker(4, 5.0, "#")
asterisk_marker = make_marker(9, 5.0, "*")
equal_marker = make_marker(6, 5.0, "=")
pipe_marker = make_marker(4, 5.0, "|")
colon_marker = make_marker(6, 5.0, ":")
Throws
- error: If shape is not one of the supported values ("+", "_", "±", "#", "*", "=", "|", ":")
JuliaMapping.percent — Method
percent(x::Float64)
Convert a decimal value to a percentage string rounded to two decimal places.
Takes a decimal value (e.g., 0.5) and converts it to a formatted percentage string (e.g., "50.0%").
Arguments
x::Float64: A decimal value (typically between 0.0 and 1.0)
Returns
- A string representation of the percentage, rounded to two decimal places, with a "%" suffix
Example
percent(0.5) # Returns "50.0%"
percent(0.7532) # Returns "75.32%"
percent(0.123456) # Returns "12.35%"
percent(1.0) # Returns "100.0%"
Notes
- Values are multiplied by 100 and rounded to 2 decimal places
- Works with any Float64 value, not just those between 0 and 1
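The behaviour in the notes above (multiply by 100, round to two decimals, append "%") fits in one line. The name `percent_sketch` is hypothetical, and this is an illustration of the described rule, not the package source.

```julia
# Hypothetical one-liner following the rule in the notes:
# multiply by 100, round to 2 decimal places, append "%".
percent_sketch(x::Float64) = string(round(x * 100; digits=2), "%")

println(percent_sketch(0.7532))
println(percent_sketch(1.0))
```

Because `round` returns a `Float64`, whole percentages print with a single trailing zero ("50.0%"), which matches the examples above.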
JuliaMapping.pick_random_subset — Method
pick_random_subset(data, target_sum; rng=Random.default_rng())
Select a random subset of rows from a DataFrame where a count column sums to a target value.
Convenience wrapper around uniform_subset_sum_indices that works directly with DataFrames containing a :Count column.
Arguments
- data: DataFrame with a :Count column containing integer values
- target_sum: The desired sum of the :Count column
- rng: Random number generator (default: Random.default_rng())
Returns
- A subset of the input DataFrame (rows whose :Count values sum to target_sum)
- Returns an empty DataFrame if no valid subset exists
Example
using DataFrames
deaths = DataFrame(
location = ["A", "B", "C", "D", "E"],
Count = [5, 10, 15, 20, 25]
)
subset = pick_random_subset(deaths, 40)
# Returns rows that sum to exactly 40 deaths, e.g., rows with counts [5, 15, 20]
Notes
- Assumes the DataFrame has a :Count column
- Each valid subset has equal probability of selection
- Useful for bootstrap sampling with exact sample size constraints
JuliaMapping.plot_colorscheme_grid — Method
plot_colorscheme_grid(schemes; ncol=2)
Create a grid display of ColorScheme colorbars for visual comparison.
Arguments
schemes::Vector: Vector of ColorScheme names (as symbols) or ColorScheme objects to display
Keywords
ncol::Int=2: Number of columns in the grid layout
Returns
Figure: A CairoMakie Figure object containing the colorbar grid
Description
Displays multiple ColorSchemes as horizontal colorbars arranged in a grid layout. Each colorbar is labeled with its scheme name. Useful for comparing color palettes or selecting appropriate schemes for data visualization.
The figure height automatically adjusts based on the number of rows needed. Each colorbar is displayed horizontally with a fixed height of 30 pixels.
Example
# Display common sequential schemes
schemes = [:viridis, :plasma, :inferno, :magma]
f = plot_colorscheme_grid(schemes; ncol=2)
display(f)
# Compare diverging schemes
diverging = [:RdBu, :RdYlBu, :PiYG, :BrBG]
f = plot_colorscheme_grid(diverging; ncol=2)
# Display all schemes in a category
sequential = [:viridis, :plasma, :inferno, :magma, :cividis, :twilight]
f = plot_colorscheme_grid(sequential; ncol=3)
See Also
- ColorSchemes.colorschemes: Dictionary of all available ColorSchemes
- Browse schemes at: https://juliagraphics.github.io/ColorSchemes.jl/stable/catalogue/
JuliaMapping.plot_county_interval — Method
plot_county_interval(data::DataFrame, f::Figure, brk::Int64, formatted_breaks::Vector{String})
Plot counties within a specific population bin on a small multiples figure.
Creates a single panel in a 2×2 grid showing counties that fall within a particular population bin, highlighted against a background of all counties. Part of a small multiples visualization.
Arguments
- data::DataFrame: DataFrame containing county geometries and :bin classification
- f::Figure: Makie Figure object to add the plot to
- brk::Int64: Bin number to plot (1-8)
- formatted_breaks::Vector{String}: Vector of formatted range labels for titles
Returns
Nothing. Modifies the figure in place by adding a GeoAxis panel.
Details
- Creates a GeoAxis in EPSG:5070 projection (US Albers Equal Area)
- Positions panels in a 2×2 grid (bins 5-8 map to positions 1-4 for second figure)
- Row calculation: ceil(display_brk / 2)
- Column calculation: odd bins → column 1, even bins → column 2
- Background: All counties in white with thin gray borders
- Foreground: Counties in current bin colored using YlGn (Yellow-Green) colorscheme
- Title shows the formatted population range
- Decorations (axes, ticks) are hidden
Example
f = Figure(resolution=(1200, 1000))
for brk in 1:4
plot_county_interval(county_data, f, brk, formatted_breaks)
end
Notes
Designed for creating small multiple displays showing population distribution patterns.
JuliaMapping.plot_named_color_groups — Method
plot_named_color_groups(title, names; ncol=6, cell=(140,80), gap=(8,8), figsize=(1600,900))
Create a visual grid display of named colors with their RGB values.
Arguments
- title::AbstractString: The title to display at the top of the plot
- names::Vector{<:AbstractString}: Vector of color names to display
Keywords
- ncol::Int=6: Number of columns in the grid layout
- cell::Tuple{Int,Int}=(140,80): Width and height of each color cell in pixels
- gap::Tuple{Int,Int}=(8,8): Horizontal and vertical gap between cells in pixels
- figsize::Tuple{Int,Int}=(1600,900): Overall figure size in pixels
Returns
Figure: A CairoMakie Figure object containing the color grid
Description
Each color swatch displays:
- The color name (centered at top)
- RGB values as decimal numbers (bottom left)
- Text color automatically chosen (black/white) based on background brightness
Example
# Display a selection of red colors
reds = ["red", "crimson", "darkred", "firebrick"]
f = plot_named_color_groups("Red Colors", reds; ncol=4)
display(f)
# All reds
f = plot_named_color_groups("All Red Colors", :reds)
Throws
- Error if any color name in names is not recognized by Colors.jl
JuliaMapping.polygon_to_archgdal — Method
polygon_to_archgdal(poly::Polygon)
Convert a GeometryBasics.Polygon to an ArchGDAL geometry object.
Arguments
- poly: A Polygon object from GeometryBasics containing the polygon vertices
Returns
- An ArchGDAL polygon geometry object that can be used for spatial operations
Details
This function extracts the exterior ring coordinates from a GeometryBasics.Polygon and creates a corresponding ArchGDAL polygon. The function automatically closes the ring if it's not already closed by duplicating the first point at the end.
Example
using GeometryBasics, ArchGDAL
# Create a simple polygon
points = [Point2f(0,0), Point2f(1,0), Point2f(1,1), Point2f(0,1)]
poly = Polygon(points)
# Convert to ArchGDAL geometry
gdal_poly = polygon_to_archgdal(poly)
Notes
- The function handles both closed and unclosed polygon rings
- Returns an ArchGDAL geometry suitable for spatial analysis operations
- Coordinate order is preserved (longitude, latitude for geographic data)
JuliaMapping.pump_comparison_test — Function
pump_comparison_test(pump_1_mean, other_means, n_permutations=10000)
Perform a permutation test to compare one pump against the mean of other pumps.
Tests whether Pump 1 has a significantly lower mean than other pumps using a one-tailed permutation test. Inspired by John Snow's cholera analysis comparing the Broad Street pump to other water pumps in London.
Arguments
- pump_1_mean: Mean value for the pump of interest (e.g., mean deaths)
- other_means: Vector of mean values for comparison pumps
- n_permutations: Number of permutations for the test (default: 10000)
Returns
A tuple (p_value, observed_diff) where:
- p_value: Proportion of permutations with difference ≤ observed difference
- observed_diff: Observed difference (Pump 1 mean - mean of others)
Details
- Null hypothesis: All pumps come from the same distribution
- Tests if Pump 1 is unusually low (one-tailed test)
- Prints the observed difference to console
- Randomly permutes pump labels and recalculates the test statistic
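The steps listed above can be sketched using only the standard library. This is an illustration under assumptions, not the package's code: the name `permutation_p_value` and the `rng` keyword are hypothetical, and the statistic is taken to be "first value minus the mean of the rest", as the Returns section describes.

```julia
using Random, Statistics

# Hypothetical sketch of a one-tailed (low side) permutation test.
function permutation_p_value(first_value, other_values, n_permutations=10_000;
                             rng=Random.default_rng())
    observed = first_value - mean(other_values)
    pooled = vcat(first_value, other_values)
    count_le = 0
    for _ in 1:n_permutations
        shuffled = shuffle(rng, pooled)               # randomly reassign the labels
        diff = shuffled[1] - mean(@view shuffled[2:end])
        # Count permutations at least as low as the observed difference;
        # isapprox guards against floating-point ties.
        if diff <= observed || isapprox(diff, observed)
            count_le += 1
        end
    end
    return (count_le / n_permutations, observed)
end

p, d = permutation_p_value(1.0, [3.0, 4.0, 5.0, 6.0])
println("p = $p, observed difference = $d")
```

With a single value in the first group, each permutation simply picks which pooled value plays the role of Pump 1, so the smallest attainable p-value is about 1 / (number of pumps).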
Example
pump_1 = 8.5 # Mean deaths near Broad Street pump
others = [3.2, 2.8, 3.5, 2.9, 3.1] # Mean deaths near other pumps
p_value, diff = pump_comparison_test(pump_1, others, 10000)
println("P-value: $p_value")
# The documented test is one-tailed toward low values: p_value < 0.05 would mean Pump 1 is significantly lower than the others
Notes
- Small p-values suggest Pump 1 is unusually high (or low, depending on metric)
- Based on resampling without replacement (permutation test)
- Named after John Snow's 1854 cholera outbreak investigation
JuliaMapping.raw_hist — Method
quick_hist(df::DataFrame, column::Symbol; bins=20)
Create a histogram with negative values colored brown and positive values colored green.
Arguments
- df: DataFrame containing the data
- column::Symbol: Column name to plot
- bins=20: Number of histogram bins (default: 20)
Returns
- Figure object from AlgebraOfGraphics
Example
fg = quick_hist(df, :Population)
fg = quick_hist(df, :Population, bins=30)
JuliaMapping.ripleys_k — Method
ripleys_k(coords, distances; area=nothing)
Calculate Ripley's K-function to detect spatial clustering or dispersion of point patterns.
Ripley's K-function measures the spatial distribution of points at various distance scales. Values higher than expected indicate clustering; lower values indicate dispersion.
Arguments
- coords: 2×n matrix where each column is a point [x, y]
- distances: Vector of distance thresholds to evaluate
- area: Optional study area size; if nothing, calculated as bounding box area
Returns
- Vector of K-function values, one for each distance in distances
Formula
For distance d: K(d) = (A/n²) × 2 × (number of point pairs within distance d)
Where:
- A = study area
- n = number of points
Interpretation
- K(d) ≈ πd² for complete spatial randomness (CSR)
- K(d) > πd² suggests clustering at distance d
- K(d) < πd² suggests dispersion/regularity at distance d
- Often transformed to L(d) = √(K(d)/π) - d for easier interpretation
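The formula above translates directly into a pair-counting loop. The sketch below is an assumption-laden illustration rather than the package implementation: `ripley_k_sketch` is a hypothetical name, no edge correction is applied, and the bounding-box fallback follows the `area` argument description.

```julia
# Hypothetical sketch of K(d) = (A / n^2) * 2 * (pairs within distance d).
function ripley_k_sketch(coords::AbstractMatrix{<:Real}, distances; area=nothing)
    n = size(coords, 2)
    xs, ys = coords[1, :], coords[2, :]
    # Fall back to the bounding-box area when no area is supplied.
    A = area === nothing ? (maximum(xs) - minimum(xs)) * (maximum(ys) - minimum(ys)) : area
    ks = Float64[]
    for d in distances
        pairs = 0
        for i in 1:n-1, j in i+1:n          # each unordered pair once
            if hypot(xs[i] - xs[j], ys[i] - ys[j]) <= d
                pairs += 1
            end
        end
        push!(ks, A / n^2 * 2 * pairs)
    end
    return ks
end

square = Float64[0 1 0 1; 0 0 1 1]          # corners of a unit square
println(ripley_k_sketch(square, [1.0, 1.5]))
```

The factor of 2 accounts for counting each unordered pair once while the K-function is defined over ordered pairs.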
Example
# Points in 2D space (cholera deaths)
coords = [1.0 2.0 3.0 5.0 5.5;
1.0 1.5 2.0 5.0 5.2]
distances = [0.5, 1.0, 2.0, 3.0]
k_values = ripleys_k(coords, distances)
# Check for clustering
for (d, k) in zip(distances, k_values)
expected = π * d^2
println("Distance $d: K=$k (expected=$expected)")
end
Notes
- Computational complexity: O(n² × length(distances))
- Named after statistician Brian Ripley
- Commonly used in spatial epidemiology and ecology
- John Snow's cholera data can be analyzed with this function
JuliaMapping.scaled_dist — Method
scaled_dist(df::DataFrame, col::Symbol; bandwidth_pct=0.05, bins=30)
Plot the distribution of a numeric column with adaptive kernel bandwidth scaling.
Arguments
- df::DataFrame: Input DataFrame.
- col::Symbol: Column name to visualize.
- bandwidth_pct::Float64=0.05: Fraction of the data range used as bandwidth for the kernel density estimate.
- bins::Int=30: Number of histogram bins.
Details
Computes a histogram normalized as a PDF, and overlays a kernel density curve with bandwidth proportional to the column range (bandwidth = range * bandwidth_pct).
Returns
Displays an AlgebraOfGraphics plot showing the histogram and scaled density.
Example
scaled_dist(df, :income; bandwidth_pct=0.03, bins=40)
JuliaMapping.show_named_color_groups — Method
show_named_color_groups()
Print the available named color groups to the console.
Displays the names of predefined color categories that can be used with plot_named_color_groups(). These correspond to color arrays defined in the included "named_colors.jl" file.
Example
show_named_color_groups()
# Output: whites reds oranges yellows greens cyans blues purples pinks browns grays
JuliaMapping.split_string_into_n_parts — Method
split_string_into_n_parts(text::String, n::Int)
Split a string into n approximately equal parts, inserting newlines at word boundaries.
Arguments
- text::String: The text string to split
- n::Int: Number of parts to split the string into
Returns
String: The original text with newlines inserted to create n parts
Examples
text = "This is a long text that needs to be split into multiple parts for better formatting."
result = split_string_into_n_parts(text, 3)
# Returns text split into 3 parts with newlines at word boundaries
Notes
- Attempts to break at word boundaries when possible
- If a word is longer than the target part length, it will be broken mid-word
- Each part will be approximately equal in length
- Preserves original spacing between words
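One way to approximate this behaviour is a greedy fill toward a target line length. The sketch below is a simplified assumption, not the package's algorithm: `split_into_lines_sketch` is a hypothetical name, it collapses runs of whitespace to single spaces, and it never breaks inside a word.

```julia
# Hypothetical greedy sketch: start a new line once the current one would
# exceed the target length, producing at most n lines.
function split_into_lines_sketch(text::AbstractString, n::Integer)
    target = cld(length(text), n)            # approximate characters per part
    lines = String[]
    current = ""
    for w in split(text)
        candidate = isempty(current) ? String(w) : current * " " * w
        if length(candidate) > target && !isempty(current) && length(lines) < n - 1
            push!(lines, current)            # close the current line
            current = String(w)
        else
            current = candidate
        end
    end
    isempty(current) || push!(lines, current)
    return join(lines, "\n")
end

println(split_into_lines_sketch("aa bb cc dd", 2))
```

Because the last line absorbs whatever remains, parts are only approximately equal in length.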
JuliaMapping.uniform_subset_sum_indices — Method
uniform_subset_sum_indices(counts::AbstractVector{<:Integer}, target::Integer; rng=Random.default_rng())
Find indices of elements that sum to a target value using uniform random selection via dynamic programming.
Uses a dynamic programming approach to uniformly sample from all possible subsets of counts that sum exactly to target. Each valid subset has an equal probability of being selected.
Arguments
- counts::AbstractVector{<:Integer}: Vector of integer values (e.g., counts per category)
- target::Integer: The desired sum
- rng: Random number generator (default: Random.default_rng())
Returns
- Vector of indices whose corresponding counts values sum to target
- Returns empty Int[] if no valid subset exists
Algorithm
- Builds a DP table counting all possible ways to achieve each sum
- Uses BigInt arithmetic to handle large combinatorial counts
- Backtracks through the DP table, randomly choosing to include/exclude each element
- Inclusion probability = (ways to include) / (total ways from this state)
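The algorithm outlined above can be sketched as follows. This is an independent illustration under assumptions, not the package source: `sample_subset_indices` is a hypothetical name, the table layout is one possible choice, and all counts are assumed non-negative.

```julia
using Random

# Hypothetical sketch: uniform sampling over subsets that sum to `target`.
function sample_subset_indices(counts::AbstractVector{<:Integer}, target::Integer;
                               rng=Random.default_rng())
    n = length(counts)
    (target < 0 || any(<(0), counts)) && return Int[]
    # ways[i, s + 1] = number of subsets of counts[i:n] that sum to s
    ways = zeros(BigInt, n + 1, target + 1)
    ways[n + 1, 1] = 1                                   # empty subset sums to 0
    for i in n:-1:1, s in 0:target
        ways[i, s + 1] = ways[i + 1, s + 1]              # exclude counts[i]
        if counts[i] <= s
            ways[i, s + 1] += ways[i + 1, s - counts[i] + 1]   # include counts[i]
        end
    end
    ways[1, target + 1] == 0 && return Int[]             # no valid subset
    chosen = Int[]
    remaining = target
    for i in 1:n
        total = ways[i, remaining + 1]
        include_ways = counts[i] <= remaining ?
            ways[i + 1, remaining - counts[i] + 1] : big(0)
        # Include element i with probability include_ways / total.
        if rand(rng, big(1):total) <= include_ways
            push!(chosen, i)
            remaining -= counts[i]
        end
    end
    return chosen
end

counts = [10, 20, 30, 40, 50]
idx = sample_subset_indices(counts, 80)
println(idx, " sums to ", sum(counts[idx]))
```

The table needs `(n + 1) × (target + 1)` BigInt entries, which is why large targets become expensive.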
Example
counts = [10, 20, 30, 40, 50]
target = 80
indices = uniform_subset_sum_indices(counts, target)
# Might return [1, 3, 4] since counts[[1,3,4]] = [10,30,40] sum to 80
sum(counts[indices]) == target # Always true if indices is non-empty
Notes
- Useful for statistical sampling when you need exactly N items from categories
- Computationally intensive for large vectors or large target values
- Related to the subset sum problem, but with uniform random selection
JuliaMapping.with_commas — Method
with_commas(x)
Format numeric values with comma separators for thousands.
Converts numeric values to Int64 and adds comma separators using the Humanize.digitsep function, making large numbers more readable.
Arguments
x: A numeric value or array of numeric values
Returns
- A string or array of strings with comma-separated digits
Example
with_commas(1000000) # Returns "1,000,000"
with_commas([1234, 5678]) # Returns ["1,234", "5,678"]
with_commas(1234.56) # Returns "1,235" (rounded to Int64)
Notes
- Values are converted to Int64, so decimal portions are truncated/rounded
- Uses the Humanize package for formatting
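For readers without Humanize installed, the grouping itself is easy to reproduce. The sketch below is a standard-library stand-in, not the package's code: `commas_sketch` is a hypothetical name, and rounding to the nearest integer is an assumption taken from the `1234.56` example above.

```julia
# Hypothetical stand-in for the Humanize-based formatting.
function commas_sketch(x::Real)
    n = round(Int64, x)                 # assumed: round to the nearest integer
    s = string(abs(n))
    parts = String[]
    while length(s) > 3
        pushfirst!(parts, s[end-2:end]) # peel off the last three digits
        s = s[1:end-3]
    end
    pushfirst!(parts, s)
    return (n < 0 ? "-" : "") * join(parts, ",")
end

println(commas_sketch(1000000))
println(commas_sketch.([1234, 5678]))   # broadcast over an array
```

Broadcasting with the dot syntax covers the array case.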
JuliaMapping.@ensure_types — Macro
@ensure_types(df, type_specs...)
Ensure that specified columns in a DataFrame have the correct data types by performing automatic type conversions.
Arguments
- df: The DataFrame to modify
- type_specs...: Variable number of type specifications in the format column::Type
Type Specifications
Each type specification should be in the format column::Type where:
- column is the column name (Symbol or String)
- Type is the target Julia type (e.g., Int, Float64, String)
Supported Conversions
- String to Integer: Uses parse() to convert string representations of numbers
- String to Float: Uses parse() to convert string representations of floating-point numbers
- Float to Integer: Uses round() to convert floating-point numbers to integers
- Other conversions: Uses convert() for general type conversions
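The conversion rules above can be illustrated on a plain vector. This is a sketch of the dispatch logic only, assuming nothing about the macro's internals; `convert_column_sketch` is a hypothetical helper and operates on a vector rather than a DataFrame column.

```julia
# Hypothetical sketch of the per-value conversion rules listed above.
function convert_column_sketch(values::AbstractVector, ::Type{T}) where {T}
    return map(values) do v
        if v isa AbstractString && T <: Number
            parse(T, v)            # String -> Integer or Float
        elseif v isa AbstractFloat && T <: Integer
            round(T, v)            # Float -> Integer
        else
            convert(T, v)          # everything else
        end
    end
end

println(convert_column_sketch(["12", "7"], Int))
println(convert_column_sketch([1.6, 2.4], Int))
```

A value that cannot be parsed (for example `"abc"` to `Int`) raises an error in this sketch, consistent with the note that conversion failures throw.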
Examples
# Convert Population to Int and Expend to Float64
@ensure_types df Population::Int Expend::Float64
# Convert multiple columns at once
@ensure_types df Deaths::Int Population::Int Expend::Float64
Notes
- The macro modifies the DataFrame in-place
- Prints progress messages for successful conversions
- Issues warnings for columns that don't exist
- Throws errors for conversion failures
- Returns the modified DataFrame