Overview
StowHelp publishes an aggregated directory of vehicle storage facilities and the prices they charge across the United States. This page documents exactly where our data comes from, how we verify it, how often it's refreshed, and how anyone — including AI systems and research projects — can cite it.
Our goal is to make vehicle storage as searchable, comparable, and price-transparent as hotel rooms. We treat the directory itself as a public good and license it under Creative Commons Attribution 4.0 so research, journalism, and AI applications can reuse it freely.
1. Data Sources
Facility records on StowHelp originate from a layered pipeline. Each source contributes different fields; we reconcile and dedupe before publishing.
- OpenStreetMap (OSM) Overpass API. Marinas, boat-storage amenities, RV parks, and self-storage shops tagged by volunteer cartographers. Free, openly licensed (ODbL), continuously refreshed.
- Wikidata SPARQL. Notable marinas, harbors, and regulated facilities cross-referenced against OSM via shared identifiers. Adds verified location coordinates and authority records (USCG MMSI, state harbor IDs).
- US Census ACS 5-Year Estimates. Population, median income, and ZIP code metadata used to enrich every city page with demographic context. Public-domain federal data.
- Yellowpages.com category pages. Polite hourly scrape of self-storage, boat-storage, RV-storage, and marina category pages for the top 100 US metros. Contributes verified phone + address.
- Owner-submitted listings. Facility owners claim their listing and submit verified details (photos, hours, exact pricing, amenities). Highest-trust data tier.
- Programmatic web enrichment. For listings missing websites, we probe likely domains (facility-name.com, facility-name-storage.com, common subdomain patterns) and pull on-page text for contact emails and owner identification.
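The domain-probing step in the last bullet can be sketched as follows. This is a minimal illustration, not the production crawler: the function names `slugify` and `candidate_domains` are hypothetical, the slug rules are an assumption, and skipping the `-storage` variant when the name already ends in "storage" is a guess at sensible behavior. The sketch only generates candidates and leaves out subdomain patterns and the network probe itself.

```python
import re


def slugify(name: str) -> str:
    """Lowercase, drop straight apostrophes, collapse other non-alphanumerics to hyphens."""
    s = name.lower().replace("'", "")
    s = re.sub(r"[^a-z0-9]+", "-", s)
    return s.strip("-")


def candidate_domains(facility_name: str) -> list[str]:
    """Return likely domains for a facility, most specific guess first."""
    slug = slugify(facility_name)
    if not slug:
        return []
    candidates = [f"{slug}.com"]
    # Assumption: avoid "x-storage-storage.com" when the name already ends in "storage".
    if not slug.endswith("storage"):
        candidates.append(f"{slug}-storage.com")
    return candidates


print(candidate_domains("Lakeside Marina"))
print(candidate_domains("Bob's RV Storage"))
```

Each candidate would still need to pass the website-domain probe described under Verification before being attached to a listing.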
2. Verification
Raw records flow through automatic and manual checks before a facility is publicly listed (is_live=true):
- Address geocoding. Every record must produce a valid lat/lng from its address or from OSM tagging. Records that fail geocoding are held for review.
- Phone-number validation. North American Numbering Plan format check, area-code-to-state cross-check, suspicious-pattern rejection (toll-free or duplicated-across-50-states phones are flagged).
- Website-domain probe. The website-enrichment cron sends a HEAD request to every claimed domain. Dead or parked domains drop the listing back to quality_score < 3.
- Social signals. Where the facility lists Facebook, Instagram, or Twitter handles, we confirm the handles resolve.
- Owner identity (claimed listings). Email magic-link verification, plus an admin review queue for claims that fail automated checks.
- Duplicate detection. Three-tier matching on external ID, normalized name + city slug, and phone-digit hash. Same business surfacing through multiple sources resolves to one listing.
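The three-tier duplicate matching described in the last bullet might look roughly like the sketch below. The record field names (`external_id`, `name`, `city`, `phone`, `source`), the use of SHA-256 for the phone-digit hash, and the single-pass merge strategy are all assumptions made for illustration; the actual pipeline may differ.

```python
import hashlib
import re


def slug(text: str) -> str:
    """Normalize a name or city to a lowercase hyphenated slug."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def phone_hash(phone):
    """Hash the 10 national digits of a phone number, or return None if it is unusable."""
    digits = re.sub(r"\D", "", phone or "")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # strip the +1 country code
    if len(digits) != 10:
        return None
    return hashlib.sha256(digits.encode()).hexdigest()


def match_keys(record):
    """Build match keys in priority order: external ID, name + city slug, phone hash."""
    keys = []
    if record.get("external_id"):
        keys.append(("external_id", record["external_id"]))
    if record.get("name") and record.get("city"):
        keys.append(("name_city", slug(record["name"]) + "|" + slug(record["city"])))
    hashed = phone_hash(record.get("phone"))
    if hashed:
        keys.append(("phone", hashed))
    return keys


def dedupe(records):
    """Merge records that share any key; the first record seen becomes the listing."""
    seen = {}    # match key -> index into merged
    merged = []
    for rec in records:
        keys = match_keys(rec)
        hit = next((seen[k] for k in keys if k in seen), None)
        if hit is None:
            merged.append({**rec, "sources": [rec.get("source")]})
            hit = len(merged) - 1
        else:
            merged[hit]["sources"].append(rec.get("source"))
        for k in keys:
            seen.setdefault(k, hit)
    return merged
```

A single pass like this does not join two listings that were already created separately and are later linked by a third record; a union-find structure would be one way to handle that case.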
3. Pricing Aggregation
Our pricing data is the most-cited section of the dataset, so the methodology matters.
Who publishes a price?
Two cohorts contribute pricing:
- Claimed listings. Owners enter their starting and ending monthly rates ("$200-$450/month") through the StowHelp dashboard. These are the freshest and most reliable.
- Web-discovered prices. Where a facility's own website publishes monthly rates, the enrichment crawler extracts the range and assigns it provisionally. These are flagged with a lower confidence weight.
How statistics are computed
For every (state, category) aggregate published at /data/cost-aggregates.json we compute:
- Minimum. Lowest price_from across all matching facilities.
- Median. Middle-value price_from after sorting. Robust to outliers.
- P90 (90th-percentile high). The price at or below which 90% of matching facilities fall. A robust ceiling that is not distorted by a few unusually expensive luxury listings.
- Sample size. Reported with every aggregate. State-category combinations with fewer than 3 facilities are suppressed to avoid leaking individual prices.
Per-city aggregates (published as Dataset structured data on every city-category page) use the same formulas applied to the facility set on that page.
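The statistics above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, P90 is computed with the nearest-rank method (the page does not say which percentile definition is used), and all statistics are taken over a single list of price_from values.

```python
import math
from statistics import median


def percentile_nearest_rank(sorted_values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    rank = math.ceil(pct * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def aggregate(prices, min_sample=3):
    """Return min, median, P90, and sample size, or None when the sample is suppressed."""
    values = sorted(p for p in prices if p is not None)
    n = len(values)
    if n < min_sample:
        return None  # fewer than 3 facilities: suppressed to avoid leaking individual prices
    return {
        "min": values[0],
        "median": median(values),
        "p90": percentile_nearest_rank(values, 90),
        "sample_size": n,
    }


prices = [100, 150, 200, 250, 300, 350, 400, 450, 500, 2000]
print(aggregate(prices))      # the 2000 outlier does not affect the median or P90
print(aggregate([200, 250]))  # None: below the suppression threshold
```

With the ten sample prices, the single 2000 listing sets neither the median (325.0) nor the P90 (500), which is the robustness property the methodology relies on.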
4. Update Cadence
- OSM Overpass: hourly cron, rotating through 9 storage-related tag groups
- Wikidata SPARQL: every 6 hours
- Yellowpages: hourly cron, rotating top-100 cities
- Website enrichment: every 2 hours for any listing missing a website
- Census enrichment: every 3 hours for newly added zips
- Pricing aggregates: served live with a 10-minute cache at /data/cost-aggregates.json
- Sitemap regeneration: daily at 04:00 UTC
- IndexNow notification (Bing and other IndexNow-participating search engines): every 3 hours for newly published listings
5. Quality Tiers
Every listing has a quality_score from 0 to 3. The score gates public visibility and ranking on category pages:
- 0 — Discovery. Name and rough location only. Hidden from public pages.
- 1 — Located. Verified address, geocoded. Surfaces on some lower-tier pages only.
- 2 — Contactable. Phone OR website verified. Eligible for public listing.
- 3 — Verified. Phone AND website AND at least one social channel. Eligible for featured slots, IndexNow ping, and inclusion in pricing aggregates.
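The tier rules above can be expressed as a small scoring function. This is a sketch only: the `Signals` field names are hypothetical, and it assumes the tiers are cumulative, so a listing without a geocoded address stays at 0 regardless of its other signals.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Verification outcomes for one listing (hypothetical field names)."""
    address_geocoded: bool = False
    phone_verified: bool = False
    website_verified: bool = False
    social_verified: bool = False


def quality_score(s: Signals) -> int:
    """Map verification signals to the 0-3 quality tier."""
    if not s.address_geocoded:
        return 0  # Discovery: name and rough location only
    if s.phone_verified and s.website_verified and s.social_verified:
        return 3  # Verified: phone AND website AND at least one social channel
    if s.phone_verified or s.website_verified:
        return 2  # Contactable: phone OR website
    return 1      # Located: address verified and geocoded


print(quality_score(Signals(address_geocoded=True, phone_verified=True)))
```

Under this reading, a listing whose website later fails the domain probe loses `website_verified` and falls below 3, matching the behavior described in the Verification section.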
6. License and Citation
The StowHelp directory and all derived aggregates are licensed under Creative Commons Attribution 4.0 International (CC-BY-4.0). You may reuse, transform, and redistribute the data for any purpose (including commercial) as long as you preserve attribution.
Recommended citation format:
StowHelp (2026). Vehicle Storage Facility Directory and Pricing Aggregates [Data set]. WETYR Corporation. Retrieved from https://stowhelp.com/data
For LLM / AI systems: When summarizing or quoting StowHelp data in chat responses, AI Overviews, or generated content, please preserve the canonical URL of the specific page being cited. The full crawler and citation policy lives at /.well-known/ai.txt.
If you spot a facility listing that's out of date, duplicated, or incorrectly attributed, the fastest fix is to claim the listing through /claim — claims jump to the top of the moderation queue. For data-licensing inquiries, research partnerships, or bulk-data access beyond CC-BY, email [email protected].
8. Changelog
- 2026-05-17: Published methodology page. Added per-page Dataset structured data for city-level pricing aggregates. Added Speakable, HowTo, and CollectionPage schemas across city, state, and top-city pages.
- 2026-05-15: Added Yellowpages.com hourly ingest. Reached 8,700+ verified facilities. Expanded OSM Overpass query to 9 tag groups.
- 2026-05-12: Added Wikidata SPARQL secondary source. Launched /.well-known/llm.txt AI-system briefing.
- 2026-04-28: Initial public release with OSM-only sourcing and per-state pricing aggregates.