A self-updating personal geolocation intelligence dashboard built on a fully static architecture — no server, no database, no runtime. From a Foursquare check-in to a live deployed dashboard in under 5 minutes.
A personal analytics dashboard that ingests every Foursquare/Swarm check-in and renders it into fourteen pages: a main analytics dashboard, a trip journal with per-trip maps, a companions tracker, a full check-in feed with historical weather, a tips explorer with country/city tabs and closed/deleted-venue badges, a venue ratings page (likes / okays / dislikes), a venue loyalty explorer, a world cities map, a Foursquare lists page, a full-text search page, a photo gallery with 21 000+ images, a shouts archive (free-text comments attached to check-ins, with year / country filters and live search), a stats overview with Hour × Category and DOW × Category heatmaps plus shout text mining, a live travel guide that reads your last 48 hours and suggests nearby venues based on your historical preferences, and this engineering write-up — all committed to git and served via Cloudflare Pages CDN.
Beyond rendering, the system maintains a data integrity pipeline: a manual archive
workflow snapshots the full check-in history, diffs it against the previous snapshot to detect renamed
or moved venues, and propagates those changes into the tips dataset — all without extra API calls.
Duplicate rows and check-ins that the API silently stops returning are detected on every full re-fetch
and accumulated in an incremental anomaly log (checkins_anomalies.json), giving a
permanent auditable history of every data quality event.
The entire system runs on free-tier infrastructure: Cloudflare Pages, Cloudflare Workers, Cloudflare KV, Cloudflare D1 (SQLite at the edge), and GitHub Actions. There are no servers, no containers. The build pipeline is 100% Python 3.9+, the frontend is vanilla JS with Leaflet and Chart.js. Dynamic features (full-text search, the paginated check-in feed) are served by Cloudflare Pages Functions querying a D1 database — keeping runtime cost at zero while enabling millisecond-latency queries against the full 65 000-row dataset.
The system is designed around a push-on-change philosophy: nothing runs unless there is new data. A Cloudflare Worker acts as the real-time sensor, keeping the whole pipeline reactive while consuming negligible resources when idle.
The Worker runs on a 1-minute cron trigger, fetching the most recent check-in from
the Foursquare API and comparing its Unix timestamp against the last-seen value stored in
Cloudflare KV. If the timestamp is newer, the Worker writes the new value to KV
(idempotent — prevents double-triggering on retries) and fires a workflow_dispatch
event to GitHub Actions via the REST API.
GitHub Actions then fetches the updated check-in data from the private data repository,
rebuilds all fourteen HTML pages, commits and pushes to main. Cloudflare Pages
detects the push and deploys automatically — no build command needed, since the HTML is
already pre-built.
The full path from a check-in tap in Swarm to a deployed page on the public site, showing every component and the data that flows between them. Three external services (Foursquare API, Cloudflare R2, Open-Meteo) are sources; everything else runs on the Cloudflare + GitHub free tiers.
user taps "check in" in Swarm
│
▼
┌──────────────────────┐
│ Foursquare API v2 │ ◄──── data source
└──────────┬───────────┘
│ /users/self/checkins?limit=1
│ every 60 s
▼
┌──────────────────────┐ ┌──────────────────┐
│ Cloudflare Worker │ ───────► │ Cloudflare KV │
│ checkin-poller │ │ last-seen ts │
└──────────┬───────────┘ └──────────────────┘
│ workflow_dispatch (REST)
▼
┌──────────────────────────────────────────────────┐
│ GitHub Actions (update-dashboard.yml) │
│ │
│ fetch_checkins.py ─► private-data/checkins.csv│
│ fetch_tips.py ─► private-data/tips.json │
│ fetch_photos.py ─► private-data/pix/*.jpg │
│ fetch_ratings.py ─► private-data/venue…json │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────┐ │
│ │ scripts/build.py (orchestrator) │ │
│ │ │ │
│ │ transform.py ── 4-layer city/cntry │ │
│ │ metrics.py ── aggregations, trips, │ │
│ │ companions, shouts, │ │
│ │ cross-dim heatmaps │ │
│ │ gen_*.py × 14 ── templates → *.html │ │
│ │ POST-PROCESS ── {{CTRY_CODE_JSON}}, │ │
│ │ {{CAT_ICON_JSON}} │ │
│ └──────────────────────────────────────────┘ │
│ │ │
│ aws s3 sync ──► Cloudflare R2 (new photos only) │
│ sync_to_d1.py ──► Cloudflare D1 (incremental) │
│ │ │
│ git push ──► public repo (HTML pre-built) │
└──────────────────────────┼───────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ Cloudflare Pages (4sq.pages.dev) │
│ │
│ ┌────────────────────────────┐ ┌─────────────────────────┐ │
│ │ STATIC HTML (14 pages) │ │ PAGES FUNCTIONS │ │
│ │ ────────────────────── │ │ ──────────────────── │ │
│ │ index · trips · feed │ │ /api/feed │ │
│ │ companions · tips │ │ /api/search │ │
│ │ ratings · shouts │ │ /api/search-venues │ │
│ │ venues · world_cities │ │ /api/venue-tips │ │
│ │ lists · photos · stats │ │ /api/custom-list │ │
│ │ search · guide │ │ │ │
│ │ + trip-{N}.html × 160 │ │ cache: 60 s browser / │ │
│ │ │ │ 1 h CDN, _v= cache-bust│ │
│ └────────────────────────────┘ └────────────┬────────────┘ │
│ │ │
│ │ SQL │
│ ▼ │
│ ┌─────────────────────────┐ │
│ │ Cloudflare D1 │ │
│ │ (SQLite, edge-served) │ │
│ │ ───────────────────── │ │
│ │ checkins (65 k rows) │ │
│ │ venues (~33 k) │ │
│ │ tips (~1.9 k) │ │
│ │ ratings (~3.8 k) │ │
│ │ lists / list_venues │ │
│ │ trips (~160) │ │
│ │ venue_changes (audit) │ │
│ └─────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
│
▼
CDN edge → user (any device, any country)
+ browser fetches historical weather per check-in
from Open-Meteo archive (no API key, free)
Three different cache lifetimes serve three different patterns:
the static HTML is immutable per commit (CDN edge caches indefinitely
via content hash), the D1-backed feed/search APIs use a 60 s browser /
1 h edge cache with a _v= query-param cache-buster for schema-shape changes,
and the R2 photo CDN uses long-lived caching since photo filenames are
content-addressed and never change.
Canonical normalization layer. Three pure-data JSON files act as the single source of truth for cross-cutting lookups that used to live inline in nine HTML templates and one Python generator:
config/country_aliases.json raw native name → English
"Беларусь":"Belarus"
"Тоҷикистон":"Tajikistan"
39 entries
│
│ loaded by gen_tips.py at import
▼
CTRY_NORM dict (used by build.py, gen_guide,
tip normalisation on index)
config/country_flags.json English → ISO 3166-1 alpha-2
"Belarus":"by", "Macao":"mo"
112 entries
│
│ loaded by build.py once;
│ substituted into {{CTRY_CODE_JSON}}
│ on EVERY generated HTML file as a
▼ single post-process pass
flagHtml(country) → <span class="fi fi-by"></span>
config/category_icons.json category → [emoji, hex color]
"Kebab Restaurant":["🌯","#E67E22"]
559 entries
│
│ substituted into {{CAT_ICON_JSON}}
▼ on every page in the same post-process pass
catIcon(cat) → <div class="cat-icon">🌯</div>
Adding a new country / category / native-name alias is a one-line JSON edit.
No code changes anywhere. The post-process pass in build.py sweeps every
output file once after generators run — generators don't need to thread these lookups
through their own signatures.
timezonefinder) fails for
countries that don't observe Daylight Saving Time. For example, Belarus geographically
maps to UTC+2 in summer, but politically observes UTC+3 year-round. This causes
check-in local times to be off by one hour for half the year.
metrics.py maintains a _COUNTRY_TZ dictionary mapping
country names to authoritative IANA timezone IDs. This takes precedence over the
coordinate-based lookup. Europe/Minsk is always UTC+3, regardless of what
the geometric timezone boundary says — because that's what the clocks actually show.
checkins.csv contains full location history: GPS coordinates,
venue IDs, timestamps, and companion names across years of travel. Committing this
to a public repo exposes structured personal data to anyone crawling GitHub.
foursquare-data). A fine-grained Personal Access Token scoped
exclusively to that repository allows GitHub Actions to check it out at build time.
The public repository never sees the raw CSV — only the generated HTML output, which
contains only the aggregated, display-ready data already visible on the site.
The PAT has Contents: read/write and nothing else.
index.html.tmpl and gen_worldcities.py implement a
CTRY_CONT JavaScript dictionary mapping every country
to its continent. The matchVisited() function rejects a world-cities
database match unless the candidate city's country falls on the same continent as
the visited check-in's country. This guard is maintained in sync across both files —
a documented invariant noted in CLAUDE.md.
metrics.py runs an 8-pass pipeline over each candidate trip window
to progressively widen its boundaries until they reflect the actual journey:
"airport" in cat.lower()) rather than an exact string, catching all
variants emitted by Foursquare: International Airport, Airport
Terminal, Airport Gate, Airport Service. A
prev_end_idx guard prevents the backward scan from crossing into a
preceding trip's arrival rows. Where the heuristics fall short, three JSON config
files provide surgical overrides: trip_start_overrides
and trip_end_overrides pin exact boundary timestamps,
and trip_names.json / trip_tags.json
attach human-readable names and activity tags (bicycle, camping, etc.) keyed by the
final resolved start timestamp.
/users/self/tips endpoint silently omits any tip written on
a venue that has since been closed or deleted. With no error, no flag, and no indication
in the response, a full fetch of 1 782 tips appeared complete — until a per-venue sweep
revealed 25 additional tips that existed only on closed venues.
A secondary problem: the API returns country names in the local language
(Беларусь, Republica Moldova, المغرب…) rather than
a consistent English form, breaking grouping, flag lookup, and tab rendering.
viewCount field entirely absent from the API response. The export also
revealed that presence in checkins.csv is not a reliable proxy for
venue activity: a venue can appear in historical check-ins and still be closed on Foursquare
today. Determining true closed/deleted status required fetching each venue page individually.
closed=True in tips.json.
The 25 pre-existing sweep tips were identified retroactively by comparing the initial
1 782-tip commit in the data repo against HEAD (1 807 tips); the 25-ID delta was
patched directly.
foursquare.com/v/{id} with browser session cookies
(the public page embeds "closed":true in its __NEXT_DATA__ JSON
or in the raw HTML for closed venues). 95 of 100 venues were confirmed closed on-page;
the remaining 5 loaded on the legacy app.foursquare.com renderer with no closed
marker, indicating they are still active — and their tips were found to have been
deleted by moderators rather than lost to venue closure.
viewCount for all tips — a field the API never
returns. fetch_tips.py was updated to capture viewCount going
forward; historical counts from the export were backfilled into tips.json
in one pass. View counts are refreshed on each full re-fetch (--full);
incremental runs only touch tips newer than the latest known timestamp.
CTRY_NORM dictionary
in gen_tips.py mapping every local-language variant to its English form.
City names reuse the existing city_merge.yaml pipeline. Both normalised
values are stored as nc (country) and nci (city) on each tip
record and propagated into the recent-30 tips slice embedded in index.html.
Tip cards display a red CLOSED badge, a purple
DELETED badge, and a 👁 view count
in the footer. Three dedicated filter buttons — By Date, Closed only,
and Deleted only — let the reader surface each data quality category directly.
checkins.csv three years ago may now point to a venue
with a different name, city, or coordinates — and tips.json, which duplicates
that venue metadata per tip, can drift out of sync independently.
Additionally, full re-fetches occasionally surface a second problem: some check-ins that
existed in the old CSV are simply absent from the API response (deleted or merged
venues), while other rows appear duplicated within the historical data with no indication of
the double-entry.
Both silent drift and silent data loss are impossible to detect without an explicit comparison.
sync_venue_changes.py compares the two snapshots on six fields per
venue_id: venue, city, country,
lat, lng, category. For each changed venue it
patches every matching tip in tips.json in-place — converting lat/lng to
float rounded to 5 dp to match the tips schema — and logs a numbered summary of every
updated tip. The patched tips.json is committed alongside the fresh CSV in
the same atomic commit, keeping both files permanently in sync without any additional
API quota.
checkins_anomalies.json:
(venue_id, date) key appears more
than once — identical double-entries from early Swarm usage. They are intentionally
preserved in the CSV rather than silently removed; the anomaly file provides
visibility without data loss. A duplicate_checkins.csv sidecar is also written
for direct inspection.
Missing rows are check-ins present in the existing CSV but absent from the
API response — venues that Foursquare deleted or merged. These too are preserved and recorded
so the count discrepancy is explained and auditable. Both lists accumulate across runs:
new entries are merged in, existing entries are never removed, giving a permanent history
of every data quality event the re-fetch has ever observed.
/v2/checkins/{id} endpoint and is no longer
surfaced through any current API. With ~65 000 historical check-ins, a one-time
enrichment run was the only way to recover this data before it disappeared entirely.
HTTP 403 responses identically — marking the row as a permanent skip
("-") — so quota-exhausted rows were silently discarded alongside genuinely
inaccessible ones, with no way to tell them apart after the fact.
overlaps_name and
with_name / created_by_name. 408 rows were affected.
--only-ids-file flag accepts a plain text file of one
checkin_id per line. Before building the work queue it resets any row in
that set whose overlaps_id was incorrectly finalised (back to ""),
ensuring the IDs are always re-processed regardless of prior run state.
Sleep was increased from 0.35 s to 1.5 s per call to stay safely below
the quota ceiling for the full 65 000-row run.
overlaps_name / overlaps_id scrubbed of any
name already present in with_name or created_by_name, with
entries reduced to "-" where nothing genuine remained. The surviving
38 genuine overlaps — people who happened to be at the same place at the
same time, entirely independently — were committed to the data repo and rendered in the
companions page.
photos.json) grows over time as new check-ins are added.
A naive run would re-probe every un-indexed check-in on every CI run — including the
~51 000 that were already confirmed to have no photos — wasting quota and time.
/item/{id} path rather than /checkin/{id}
— matching tip IDs, not check-in IDs. They had been silently mis-classified.
photos.html gallery renders 21 000+ images lazily in batches of 300,
with a country/city accordion filter (countries collapsed by default, cities as pill
buttons), a separate tip photos section with its own lightbox mode, and a hero count
that includes both check-in and tip photos with an anchor link to the tip section.
The multi-photo badge on index.html recent-check-in cards shows the first
photo plus a +N overlay when a check-in has multiple images; clicking
navigates through all photos for that check-in in the inline lightbox.
aws s3 sync uploads only new files (those not yet
in the bucket), making each incremental deploy fast regardless of total gallery size.
The --pix-url flag keeps the build fully decoupled: local builds use a
file:/// URI; the deployed site uses the R2 public URL — no code changes
needed between environments.
/users/self/venueokays and
/users/self/venuedislikes return HTTP 402 on every
request, regardless of authentication or token scope. There is no error message
explaining why, no workaround in the documentation, and no indication this will
change. A naive implementation would treat 402 as a transient failure and retry
indefinitely, blocking the CI run.
createdAt timestamp field entirely — the field the ratings page uses
to assign check-ins to years. Without it, those venues sort to the bottom with no
year grouping.
fetch_ratings.py was updated to treat 402 as a permanent
skip, not a retry-able error. The script logs a one-time warning and moves
on — no hang, no false failure. Only the likes endpoint (/users/self/venuelikes)
reliably returns data and is actively fetched.
createdAt timestamps on likes: a backfill pass reads
checkins.csv and, for each liked venue, finds the earliest
check-in timestamp at that venue. This is stored as first_ts
in venueRatings.json and used as a proxy creation date when
createdAt is absent. The ratings page year-grouping logic falls through
to first_ts automatically, so all likes are assigned to a year with no
manual intervention.
const ALL=[...] JSON blob directly
inside the HTML — roughly 11 MB of inline JavaScript. This created
three compounding problems:
search-index.json) was generated at build time, committed to the repo
(~1.5 MB), and served as a static file. Any search therefore reflected data as of the
last deploy, not the current D1 state. The static file also bloated the repo and the
CDN response for every search query.
ALL); position
computation (buildPos) maps each row to a pixel offset using a gap-row
strategy. Month and year counts are served from a static feed_meta.json
pre-built at CI time — zero D1 reads for the calendar. The first 50 items render
immediately; the remainder are fetched in the background via cursor pagination,
each page O(1) because the query uses WHERE date < cursor on an
indexed column rather than OFFSET N.
feed_meta.json
immediately on load — all month anchors are clickable before any D1 query fires.
An isFiltered flag suppresses viewport re-renders while a search is
active; monthMode suppresses them during single-month views.
The feed page went from an 11 MB HTML file that blocked paint to a
56 KB shell that renders the first screen in under one second.
--tips-changed,
--ratings-changed, --lists-changed) that default to
false — so a manual local sync never rewrites unchanged tables accidentally.
CI passes the actual fetch-step output for each flag.
checkins
and venues update on every new check-in; tips,
ratings, and lists only change after explicit fetch runs.
A single sync script must handle all five without accidentally overwriting unchanged
tables when run locally or in CI.
INSERT OR IGNORE keyed on
(venue_id, date). New rows append; existing rows are silently skipped.
A full 65 000-row pass takes ~650 batches but is still O(N) reads with zero
unnecessary writes — D1 discards duplicates at the index level.
INSERT OR REPLACE only for venue IDs that
appear in the current fetch payload. A check-in fetch returns at most a few dozen
new venue IDs, so the venue sync is always a tiny subset of the full table.
--tips-changed, --ratings-changed,
--lists-changed). Each flag defaults to false; CI injects
the actual output of the preceding fetch step. This means a routine check-in sync
that produces no tip changes never touches the tips table — the batch calls simply
don't happen.
--force-* flag that issues a DELETE FROM <table>
followed by a full re-insert, bypassing the change gates. This is also available as a
GitHub Actions workflow with checkboxes per table, so a selective resync never requires
touching local credentials.
totalH ≈ 8.5 million px). Everything between the two batches was an
empty gap that filled from both ends as the user scrolled. This produced three
cascading failures:
AVG_ITEM_H = 10 560 px — about
0.12 % of the total canvas. In practice the prefetch fired only sporadically, leaving
a visible blank region every time the user scrolled faster than the loader.
totalH shrank on every batch.
Each 50-item append grew the loaded height by 50 × 126 px = 6 300 px but reduced
the estimated remainder by 50 × 132 px = 6 600 px — a net loss of 300 px per page.
Near the end of the dataset the browser clamped scrollTop because the
sizer was shrinking under the user, causing the viewport to jump upward.
goYMD used a two-phase approach: resolve a cursor via ?resolve=TS,
then load items around that timestamp. After each splice, rebuildAfterSplice
patched activeIdx to compensate for prepended items — but the correction
was based on array indices, not pixel positions, so any concurrent load could
desynchronise it. Jumping to a date near the bottom of a fresh load left the viewport
pointing at the wrong item.
ALL array
that grows strictly by append (older items) or prepend (newer items):
_loadGen), assigns a new ALL slice, and resets
totalH = 0 before calling buildPos — ensuring
Math.max starts from zero rather than retaining a stale large value:
YM_IDX[ym] from the static
feed_meta.json directly in renderCal, rather than an
accumulated local counter (ymCnts). This prevents double-counting when
the same items are loaded again after a state reset (e.g., Latest → Oldest → Latest).
The result: the calendar sidebar always shows authoritative counts, and all month
anchors are clickable before any scroll event fires.
The build system is intentionally minimal. build.py is the single orchestrator:
it loads YAML/JSON config, calls transform.py to normalise city and country names,
calls metrics.py to compute all aggregations and trip detection, then renders
two Jinja-free template files using simple {{PLACEHOLDER}} substitution.
Each per-page generator (gen_companions.py, gen_venues.py,
gen_tips.py, gen_ratings.py, gen_shouts.py, …)
reads a plain templates/*.tmpl file and substitutes its own placeholders.
After every generator runs, build.py does a single post-process pass
on every output HTML file, substituting two cross-cutting placeholders —
{{CTRY_CODE_JSON}} and {{CAT_ICON_JSON}} — from
config/country_flags.json and config/category_icons.json.
This means every page shares one canonical 112-entry country→flag map and one
559-entry category→icon map; adding an entry is a one-line JSON edit, no code changes
across the nine templates that consume them.
gen_feed.py is an exception: after the migration to D1-backed pagination
(see Challenge 13), it no longer embeds any data. It simply renders the template shell
with the {{SWARM_USER_ID}} placeholder substituted — the actual check-in
data is fetched at runtime from /api/feed. Similarly, the full-text search
index (search-index.json) is no longer generated or committed; search is
served live by functions/api/search.js querying D1 directly.
Trip detection in metrics.py runs a single-pass scan over the sorted
check-in sequence: any consecutive run of check-ins where city ≠ home_city
and the run length exceeds min_checkins (configurable) is declared a trip.
Trip names are auto-generated from the most-visited countries and cities in that sequence.
Every choice optimises for zero operational overhead and long-term maintainability — no framework churn, no node_modules in the build pipeline, no containers to patch.
python scripts/build.py and serve from any static host.
index.html.tmpl and gen_worldcities.py),
the constraint is explicitly documented in CLAUDE.md — so future
AI-assisted edits know to update both files together.