A self-updating personal geolocation intelligence dashboard built on a fully static architecture — no server, no database, no runtime. From a Foursquare check-in to a live deployed dashboard in under 5 minutes.
A personal analytics dashboard that ingests every Foursquare/Swarm check-in and renders it into ten pages: a main analytics dashboard; a trip journal with per-trip maps; a companions tracker; a full check-in feed with historical weather; a tips explorer with country/city tabs, closed/deleted-venue badges, view counts, and filter buttons; a venue loyalty explorer; a world cities map; a full-text search page; a photo gallery with 21 000+ images; a stats overview; plus this engineering write-up — all committed to git and served via the Cloudflare Pages CDN.
Beyond rendering, the system maintains a data integrity pipeline: a manual archive
workflow snapshots the full check-in history, diffs it against the previous snapshot to detect renamed
or moved venues, and propagates those changes into the tips dataset — all without extra API calls.
Duplicate rows and check-ins that the API silently stops returning are detected on every full re-fetch
and accumulated in an incremental anomaly log (checkins_anomalies.json), giving a
permanent auditable history of every data quality event.
The entire system runs on free-tier infrastructure: Cloudflare Pages, Cloudflare Workers, Cloudflare KV, Cloudflare D1 (SQLite at the edge), and GitHub Actions. There are no servers and no containers. The build pipeline is 100% Python 3.9+; the frontend is vanilla JS with Leaflet and Chart.js. Dynamic features (full-text search, the paginated check-in feed) are served by Cloudflare Pages Functions querying a D1 database — keeping runtime cost at zero while enabling millisecond-latency queries against the full 65 000-row dataset.
The system is designed around a push-on-change philosophy: nothing runs unless there is new data. A Cloudflare Worker acts as the real-time sensor, keeping the whole pipeline reactive while consuming negligible resources when idle.
The Worker runs on a 1-minute cron trigger, fetching the most recent check-in from
the Foursquare API and comparing its Unix timestamp against the last-seen value stored in
Cloudflare KV. If the timestamp is newer, the Worker writes the new value to KV
(idempotent — prevents double-triggering on retries) and fires a workflow_dispatch
event to GitHub Actions via the REST API.
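The Worker's decision logic — compare timestamps, write KV first for idempotency, then dispatch — can be sketched in Python (the Worker itself is JavaScript; `kv_put` and `dispatch` here are stand-ins for the KV binding and the GitHub REST call):

```python
def maybe_trigger(latest_ts: int, last_seen: int, kv_put, dispatch) -> bool:
    """Fire the GitHub Actions workflow only when a newer check-in exists.

    Writing KV *before* dispatching makes retries idempotent: a retried
    run sees the updated last-seen value, compares equal, and does nothing.
    """
    if latest_ts <= last_seen:
        return False                      # no new data: pipeline stays idle
    kv_put("last_seen_ts", latest_ts)     # record first (idempotency guard)
    dispatch()                            # then fire workflow_dispatch
    return True
```

On the real platform, `dispatch()` would POST to the repository's `workflow_dispatch` endpoint with a token stored as a Worker secret.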
GitHub Actions then fetches the updated check-in data from the private data repository,
rebuilds all ten HTML pages, commits and pushes to main. Cloudflare Pages
detects the push and deploys automatically — no build command needed, since the HTML is
already pre-built.
Coordinate-based timezone lookup (via timezonefinder) fails for
countries that don't observe Daylight Saving Time. For example, Belarus geographically
maps to UTC+2 in winter, but politically observes UTC+3 year-round. A purely geometric
lookup therefore puts check-in local times off by one hour for half the year.
metrics.py maintains a _COUNTRY_TZ dictionary mapping
country names to authoritative IANA timezone IDs. This takes precedence over the
coordinate-based lookup. Europe/Minsk is always UTC+3, regardless of what
the geometric timezone boundary says — because that's what the clocks actually show.
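A sketch of the precedence rule (the dictionary excerpt and function signature are illustrative; in the real pipeline the fallback is timezonefinder's `timezone_at`):

```python
# Excerpt of the authoritative country -> IANA timezone table.
_COUNTRY_TZ = {
    "Belarus": "Europe/Minsk",   # UTC+3 year-round, no DST
}

def resolve_tz(country: str, lat: float, lng: float, coord_lookup) -> str:
    """Political override first; geometric lookup only as a fallback."""
    tz = _COUNTRY_TZ.get(country)
    if tz is not None:
        return tz
    return coord_lookup(lat=lat, lng=lng)  # e.g. TimezoneFinder().timezone_at
```

The override wins even when the geometric boundary disagrees, matching what the clocks actually show.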
checkins.csv contains full location history: GPS coordinates,
venue IDs, timestamps, and companion names across years of travel. Committing this
to a public repo exposes structured personal data to anyone crawling GitHub.
The raw CSV therefore lives in a separate private repository (foursquare-data). A fine-grained
Personal Access Token scoped exclusively to that repository allows GitHub Actions to check it out at build time.
The public repository never sees the raw CSV — only the generated HTML output, which
contains only the aggregated, display-ready data already visible on the site.
The PAT has Contents: read/write and nothing else.
index.html.tmpl and gen_worldcities.py implement a
CTRY_CONT JavaScript dictionary mapping every country
to its continent. The matchVisited() function rejects a world-cities
database match unless the candidate city's country falls on the same continent as
the visited check-in's country. This guard is maintained in sync across both files —
a documented invariant noted in CLAUDE.md.
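In Python terms, the guard is essentially a same-continent check (the map excerpt and function name below are assumptions about the shape of the JS original):

```python
# Excerpt of the CTRY_CONT country -> continent map.
CTRY_CONT = {
    "Belarus": "Europe", "Poland": "Europe", "Morocco": "Africa",
}

def same_continent(candidate_country: str, visited_country: str) -> bool:
    """Accept a world-cities match only when both countries resolve to
    the same, known continent; unknown countries never match."""
    a = CTRY_CONT.get(candidate_country)
    b = CTRY_CONT.get(visited_country)
    return a is not None and a == b
```

This rejects database homonyms in distant countries without needing exact city-name disambiguation.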
metrics.py runs an 8-pass pipeline over each candidate trip window
to progressively widen its boundaries until they reflect the actual journey.
Airport rows are matched by a category substring test
("airport" in cat.lower()) rather than an exact string, catching all
variants emitted by Foursquare: International Airport, Airport
Terminal, Airport Gate, Airport Service. A
prev_end_idx guard prevents the backward scan from crossing into a
preceding trip's arrival rows. Where the heuristics fall short, three JSON config
files provide surgical overrides: trip_start_overrides
and trip_end_overrides pin exact boundary timestamps,
and trip_names.json / trip_tags.json
attach human-readable names and activity tags (bicycle, camping, etc.) keyed by the
final resolved start timestamp.
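The backward scan with its prev_end_idx guard might look like this (function name and row shape are illustrative):

```python
def widen_start(rows: list, start_idx: int, prev_end_idx: int) -> int:
    """Scan backwards from the detected trip start, absorbing
    departure-airport rows, but never crossing into the previous
    trip's arrival rows (indices <= prev_end_idx)."""
    i = start_idx
    while i - 1 > prev_end_idx and "airport" in rows[i - 1]["category"].lower():
        i -= 1
    return i
```

The substring test picks up every airport category variant; the index guard is what keeps back-to-back trips from bleeding into each other.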
The /users/self/tips endpoint silently omits any tip written on
a venue that has since been closed or deleted. With no error, no flag, and no indication
in the response, a full fetch of 1 782 tips appeared complete — until a per-venue sweep
revealed 25 additional tips that existed only on closed venues.
A secondary problem: the API returns country names in the local language
(Беларусь, Republica Moldova, المغرب…) rather than
a consistent English form, breaking grouping, flag lookup, and tab rendering.
A separate bulk data export surfaced a viewCount field entirely absent from the API response. The export also
revealed that presence in checkins.csv is not a reliable proxy for
venue activity: a venue can appear in historical check-ins and still be closed on Foursquare
today. Determining true closed/deleted status required fetching each venue page individually.
Tips on closed venues are now flagged closed=True in tips.json.
The 25 pre-existing sweep tips were identified retroactively by comparing the initial
1 782-tip commit in the data repo against HEAD (1 807 tips); the 25-ID delta was
patched directly.
Closed status was verified by fetching foursquare.com/v/{id} with browser session cookies
(the public page embeds "closed":true in its __NEXT_DATA__ JSON
or in the raw HTML for closed venues). 95 of 100 venues were confirmed closed on-page;
the remaining 5 loaded on the legacy app.foursquare.com renderer with no closed
marker, indicating they are still active — and their tips were found to have been
deleted by moderators rather than lost to venue closure.
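A minimal sketch of the closed-marker check, assuming a simple pattern match is sufficient (the real sweep may parse the __NEXT_DATA__ JSON properly):

```python
import re

# Tolerates optional whitespace around the colon in the embedded JSON.
_CLOSED_RE = re.compile(r'"closed"\s*:\s*true')

def page_marks_closed(html: str) -> bool:
    """True if the venue page embeds "closed":true anywhere —
    covers both the __NEXT_DATA__ blob and raw-HTML occurrences."""
    return bool(_CLOSED_RE.search(html))
```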
The export also supplied viewCount for all tips — a field the API never
returns. fetch_tips.py was updated to capture viewCount going
forward; historical counts from the export were backfilled into tips.json
in one pass. View counts are refreshed on each full re-fetch (--full);
incremental runs only touch tips newer than the latest known timestamp.
Country names are normalised through a CTRY_NORM dictionary
in gen_tips.py mapping every local-language variant to its English form.
City names reuse the existing city_merge.yaml pipeline. Both normalised
values are stored as nc (country) and nci (city) on each tip
record and propagated into the recent-30 tips slice embedded in index.html.
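The country half of that normalisation reduces to a lookup table plus a passthrough default (dictionary targets here are assumptions; city normalisation goes through the separate city_merge.yaml pipeline):

```python
# Excerpt: local-language variants -> canonical English form.
CTRY_NORM = {
    "Беларусь": "Belarus",
    "Republica Moldova": "Moldova",
    "المغرب": "Morocco",
}

def normalise_tip(tip: dict) -> dict:
    """Store the normalised country as `nc`; unknown values pass through
    unchanged so a missing mapping never breaks rendering."""
    tip["nc"] = CTRY_NORM.get(tip["country"], tip["country"])
    return tip
```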
Tip cards display a red CLOSED badge, a purple
DELETED badge, and a 👁 view count
in the footer. Three dedicated filter buttons — By Date, Closed only,
and Deleted only — let the reader surface each data quality category directly.
A venue ID recorded in checkins.csv three years ago may now resolve to
a different name, city, or coordinates — and tips.json, which duplicates
that venue metadata per tip, can drift out of sync independently.
Additionally, full re-fetches occasionally surface a second problem: some check-ins that
existed in the old CSV are simply absent from the API response (deleted or merged
venues), while other rows appear duplicated within the historical data with no indication of
the double-entry.
Both silent drift and silent data loss are impossible to detect without an explicit comparison.
sync_venue_changes.py compares the two snapshots on six fields per
venue_id: venue, city, country,
lat, lng, category. For each changed venue it
patches every matching tip in tips.json in-place — converting lat/lng to
float rounded to 5 dp to match the tips schema — and logs a numbered summary of every
updated tip. The patched tips.json is committed alongside the fresh CSV in
the same atomic commit, keeping both files permanently in sync without any additional
API quota.
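The core of sync_venue_changes.py can be sketched as two small functions (names and snapshot shape are illustrative; the six compared fields and the 5 dp rounding come from the text):

```python
FIELDS = ("venue", "city", "country", "lat", "lng", "category")

def diff_venues(old: dict, new: dict) -> dict:
    """Return {venue_id: {field: new_value}} for every changed field."""
    changes = {}
    for vid, row in new.items():
        prev = old.get(vid)
        if prev is None:
            continue
        delta = {f: row[f] for f in FIELDS if row[f] != prev[f]}
        if delta:
            changes[vid] = delta
    return changes

def patch_tips(tips: list, changes: dict) -> int:
    """Apply venue changes to tips in-place; coordinates are converted
    to float and rounded to 5 dp to match the tips schema."""
    patched = 0
    for tip in tips:
        delta = changes.get(tip["venue_id"])
        if not delta:
            continue
        for field, value in delta.items():
            tip[field] = round(float(value), 5) if field in ("lat", "lng") else value
        patched += 1
    return patched
```

Both outputs land in the same commit as the fresh CSV, so the two files can never diverge between deploys.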
Two classes of anomaly are recorded in checkins_anomalies.json.
Duplicate rows are check-ins whose (venue_id, date) key appears more
than once — identical double-entries from early Swarm usage. They are intentionally
preserved in the CSV rather than silently removed; the anomaly file provides
visibility without data loss. A duplicate_checkins.csv sidecar is also written
for direct inspection.
Missing rows are check-ins present in the existing CSV but absent from the
API response — venues that Foursquare deleted or merged. These too are preserved and recorded
so the count discrepancy is explained and auditable. Both lists accumulate across runs:
new entries are merged in, existing entries are never removed, giving a permanent history
of every data quality event the re-fetch has ever observed.
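The accumulate-never-remove merge might look like this (the checkin_id key field is a hypothetical stand-in for the real anomaly key):

```python
def merge_anomalies(existing: list, found: list, key=("checkin_id",)) -> list:
    """Accumulate anomalies across runs: newly found entries are added,
    previously recorded entries are never removed."""
    seen = {tuple(e[k] for k in key) for e in existing}
    merged = list(existing)
    for row in found:
        if tuple(row[k] for k in key) not in seen:
            merged.append(row)
    return merged
```

Because the merge is append-only, the JSON file doubles as a permanent audit trail of every data quality event the re-fetch has ever seen.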
Companion and overlap data is only available through the per-check-in /v2/checkins/{id} endpoint and is no longer
surfaced through any current API. With ~65 000 historical check-ins, a one-time
enrichment run was the only way to recover this data before it disappeared entirely.
An early version of the enrichment script treated rate-limit responses and genuine
HTTP 403 responses identically — marking the row as a permanent skip
("-") — so quota-exhausted rows were silently discarded alongside genuinely
inaccessible ones, with no way to tell them apart after the fact.
The overlap data itself also needed cleaning: overlaps_name frequently duplicated names
already present in with_name / created_by_name. 408 rows were affected.
A --only-ids-file flag accepts a plain text file of one
checkin_id per line. Before building the work queue it resets any row in
that set whose overlaps_id was incorrectly finalised (back to ""),
ensuring the IDs are always re-processed regardless of prior run state.
Sleep was increased from 0.35 s to 1.5 s per call to stay safely below
the quota ceiling for the full 65 000-row run.
A cleanup pass scrubbed overlaps_name / overlaps_id of any
name already present in with_name or created_by_name, reducing
entries to "-" where nothing genuine remained. The surviving
38 genuine overlaps — people who happened to be at the same place at the
same time, entirely independently — were committed to the data repo and rendered on the
companions page.
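The scrub rule reduces to a small set operation (the list-based signature and the creator parameter are assumptions about the row shape):

```python
def scrub_overlaps(overlaps: list, with_names: list, creator: str) -> list:
    """Keep only genuine overlap names: drop anyone already known as a
    companion or as the check-in's creator. Return ["-"] (the 'no
    overlap' sentinel) when nothing genuine remains."""
    known = set(with_names) | {creator}
    genuine = [name for name in overlaps if name not in known]
    return genuine or ["-"]
```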
The photo index (photos.json) grows over time as new check-ins are added.
A naive run would re-probe every un-indexed check-in on every CI run — including the
~51 000 that were already confirmed to have no photos — wasting quota and time.
Some indexed photos resolved under the /item/{id} path rather than /checkin/{id}
— their IDs matched tip IDs, not check-in IDs. They had been silently mis-classified.
photos.html gallery renders 21 000+ images lazily in batches of 300,
with a country/city accordion filter (countries collapsed by default, cities as pill
buttons), a separate tip photos section with its own lightbox mode, and a hero count
that includes both check-in and tip photos with an anchor link to the tip section.
The multi-photo badge on index.html recent-check-in cards shows the first
photo plus a +N overlay when a check-in has multiple images; clicking
navigates through all photos for that check-in in the inline lightbox.
aws s3 sync uploads only new files (those not yet
in the bucket), making each incremental deploy fast regardless of total gallery size.
The --pix-url flag keeps the build fully decoupled: local builds use a
file:/// URI; the deployed site uses the R2 public URL — no code changes
needed between environments.
The /users/self/venueokays and
/users/self/venuedislikes return HTTP 402 on every
request, regardless of authentication or token scope. There is no error message
explaining why, no workaround in the documentation, and no indication this will
change. A naive implementation would treat 402 as a transient failure and retry
indefinitely, blocking the CI run.
A second problem: some liked venues lack the createdAt timestamp field entirely — the field the ratings page uses
to assign check-ins to years. Without it, those venues sort to the bottom with no
year grouping.
fetch_ratings.py was updated to treat 402 as a permanent
skip, not a retry-able error. The script logs a one-time warning and moves
on — no hang, no false failure. Only the likes endpoint (/users/self/venuelikes)
reliably returns data and is actively fetched.
Missing createdAt timestamps on likes are handled by a backfill pass: it reads
checkins.csv and, for each liked venue, finds the earliest
check-in timestamp at that venue. This is stored as first_ts
in venueRatings.json and used as a proxy creation date when
createdAt is absent. The ratings page year-grouping logic falls through
to first_ts automatically, so all likes are assigned to a year with no
manual intervention.
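The backfill logic, sketched under the assumption of simple venue_id/ts row dicts:

```python
def backfill_first_ts(likes: list, checkins: list) -> None:
    """For each liked venue missing createdAt, store the earliest
    check-in timestamp at that venue as first_ts (proxy creation date)."""
    earliest = {}
    for c in checkins:
        vid, ts = c["venue_id"], c["ts"]
        if vid not in earliest or ts < earliest[vid]:
            earliest[vid] = ts
    for like in likes:
        if not like.get("createdAt") and like["venue_id"] in earliest:
            like["first_ts"] = earliest[like["venue_id"]]
```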
The original feed page embedded a const ALL=[...] JSON blob directly
inside the HTML — roughly 11 MB of inline JavaScript. This created
three compounding problems:
The full-text search index (search-index.json) was generated at build time, committed to the repo
(~1.5 MB), and served as a static file. Any search therefore reflected data as of the
last deploy, not the current D1 state. The static file also bloated the repo and the
CDN response for every search query.
The rebuilt feed fetches rows from /api/feed into a single client-side array (ALL); position
computation (buildPos) maps each row to a pixel offset using a gap-row
strategy. The first 200 items render immediately on first API response; the remainder
are fetched in the background while the user scrolls, appended to ALL
without disrupting the viewport.
Month boundaries come from a lightweight ?ym=1 prefetch,
so all month anchors are clickable before the background pagination
completes. An isFiltered flag suppresses viewport re-renders while a
search is active, preventing the background fetch from resetting the user's current
view. The feed page went from an 11 MB HTML file that blocked paint to a
26 KB shell that renders the first screen in under one second.
The D1 sync step accepts explicit change flags (--tips-changed,
--ratings-changed, --lists-changed) that default to
false — so a manual local sync never accidentally rewrites unchanged tables.
CI passes the actual fetch-step output for each flag.
The build system is intentionally minimal. build.py is the single orchestrator:
it loads YAML/JSON config, calls transform.py to normalise city and country names,
calls metrics.py to compute all aggregations and trip detection, then renders
two Jinja-free template files using simple {{PLACEHOLDER}} substitution.
The generator scripts (gen_companions.py, gen_venues.py,
gen_worldcities.py) each embed their complete HTML template as a
base64-encoded string (_TMPL_B64). This makes each
generator fully self-contained — no external template files, no path resolution issues
regardless of working directory. To modify a page's design, you base64-decode the
string, edit the HTML/CSS/JS, re-encode.
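The pattern in miniature (the real _TMPL_B64 literals are full pages; {{TITLE}} here is a hypothetical placeholder):

```python
import base64

# In the real generators _TMPL_B64 is a long pre-encoded literal;
# it is built inline here only so the sketch is self-contained.
_TMPL_B64 = base64.b64encode(b"<h1>{{TITLE}}</h1>").decode("ascii")

def render(title: str) -> str:
    """Decode the embedded template and fill its placeholder."""
    html = base64.b64decode(_TMPL_B64).decode("utf-8")
    return html.replace("{{TITLE}}", title)
```

Because the template travels inside the script, the generator works from any working directory with no template-path resolution at all.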
gen_feed.py is an exception: after the migration to D1-backed pagination
(see Challenge 13), it no longer embeds any data. It simply renders the template shell
with the {{SWARM_USER_ID}} placeholder substituted — the actual check-in
data is fetched at runtime from /api/feed. Similarly, the full-text search
index (search-index.json) is no longer generated or committed; search is
served live by functions/api/search.js querying D1 directly.
Trip detection in metrics.py runs a single-pass scan over the sorted
check-in sequence: any consecutive run of check-ins where city ≠ home_city
and the run length exceeds min_checkins (configurable) is declared a trip.
Trip names are auto-generated from the most-visited countries and cities in that sequence.
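A minimal sketch of that single-pass scan (row shape and the at-least-min_checkins threshold semantics are assumptions):

```python
def detect_trips(checkins: list, home_city: str, min_checkins: int = 3) -> list:
    """Single pass over time-sorted check-ins: every maximal run of
    non-home check-ins of length >= min_checkins becomes a trip,
    returned as (start_idx, end_idx) inclusive."""
    trips, run_start = [], None
    for i, c in enumerate(checkins):
        away = c["city"] != home_city
        if away and run_start is None:
            run_start = i                      # run begins
        elif not away and run_start is not None:
            if i - run_start >= min_checkins:  # run ends; long enough?
                trips.append((run_start, i - 1))
            run_start = None
    if run_start is not None and len(checkins) - run_start >= min_checkins:
        trips.append((run_start, len(checkins) - 1))  # run reaches the end
    return trips
```

The detected windows are then refined by the 8-pass boundary widening and the JSON override files described earlier.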
Every choice optimises for zero operational overhead and long-term maintainability — no framework churn, no node_modules in the build pipeline, no containers to patch.
To build locally, run python scripts/build.py and serve the output from any static host.
Where an invariant spans two files (index.html.tmpl and gen_worldcities.py),
the constraint is explicitly documented in CLAUDE.md — so future
AI-assisted edits know to update both files together.