Permit Enrichment Agent
Autonomous multi-tier enrichment pipeline: SQL joins, Haiku fuzzy match, Sonnet deep analysis, cross-node sync. Cost-tracked, anomaly-monitored, SMS-reported.
Turns raw permit records into qualified sales leads automatically. It joins property sales, matches contractors and applicants (exactly first, then fuzzily), scores leads, and pushes only verified enrichments to the primary node. The cost-tracked, anomaly-gated design keeps LLM spend bounded and catches silent regressions.
The agent is a standalone orchestrator at /home/will/permit-enrichment-agent on R730-2 that runs three tiers of enrichment over the permit graph.
Tier 1 is pure SQL: address normalization, exact-match joins against property_sales and contractor_licenses, PostgreSQL trigram fuzzy matching with auto-accept above 0.7 and an escalation queue between 0.4 and 0.7. A typical run enriches ~4,600 leads in under 10 minutes.
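The Tier-1 threshold rule can be sketched as a small routing function. This is a minimal illustration, not the agent's actual code: the names `Route` and `route_trigram_match` are hypothetical, the boundary handling (exactly 0.7 escalates, exactly 0.4 escalates) is an assumption, and so is the treatment of scores under 0.4 as non-matches, since the description above does not say what happens to them.

```python
from enum import Enum


class Route(Enum):
    ACCEPT = "accept"        # write the match without further review
    ESCALATE = "escalate"    # queue the pair for Tier 2
    NO_MATCH = "no_match"    # assumption: dropped as a non-match


# Thresholds taken from the Tier-1 description.
AUTO_ACCEPT = 0.7
ESCALATE_FLOOR = 0.4


def route_trigram_match(similarity: float) -> Route:
    """Route a candidate pair by its trigram similarity score (0.0 to 1.0)."""
    if similarity > AUTO_ACCEPT:
        return Route.ACCEPT
    if similarity >= ESCALATE_FLOOR:
        return Route.ESCALATE
    return Route.NO_MATCH
```

Keeping the thresholds as named constants makes it easy to retune them without touching the routing logic.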
Tier 2 escalates the trigram-ambiguous matches to Claude Haiku in batches of 20; Haiku accepts above a tuned confidence of 0.85 and rejects below 0.4. Anything in between escalates again to Claude Sonnet with a 0.7 accept threshold. Every LLM call is logged with token counts and dollar cost in a SQLite ledger (costs.db) for per-day and per-month budget reporting.
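A rough sketch of the Tier-2 mechanics, assuming nothing about the real code beyond the numbers above: batching in groups of 20, the Haiku accept/reject/escalate rule, and a SQLite cost ledger. The table name `llm_calls`, its columns, all function names, and the per-million-token prices passed to `log_call` are hypothetical; actual pricing has to come from the provider's current rate card.

```python
import sqlite3
from datetime import date
from typing import Dict, Iterator, Sequence

BATCH_SIZE = 20       # batch size from the Tier-2 description
HAIKU_ACCEPT = 0.85   # Haiku accepts above this confidence
HAIKU_REJECT = 0.4    # Haiku rejects below this confidence


def batches(items: Sequence, size: int = BATCH_SIZE) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def route_haiku(confidence: float) -> str:
    """Return 'accept', 'reject', or 'sonnet' (escalate) for a Haiku verdict."""
    if confidence > HAIKU_ACCEPT:
        return "accept"
    if confidence < HAIKU_REJECT:
        return "reject"
    return "sonnet"


def open_ledger(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the ledger; the schema here is illustrative only."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS llm_calls (
               day TEXT NOT NULL,            -- ISO date, e.g. 2026-04-24
               model TEXT NOT NULL,
               input_tokens INTEGER NOT NULL,
               output_tokens INTEGER NOT NULL,
               cost_usd REAL NOT NULL
           )"""
    )
    return conn


def log_call(conn: sqlite3.Connection, day: date, model: str,
             input_tokens: int, output_tokens: int,
             usd_per_mtok_in: float, usd_per_mtok_out: float) -> float:
    """Record one LLM call and return its dollar cost."""
    cost = (input_tokens * usd_per_mtok_in
            + output_tokens * usd_per_mtok_out) / 1_000_000
    conn.execute(
        "INSERT INTO llm_calls VALUES (?, ?, ?, ?, ?)",
        (day.isoformat(), model, input_tokens, output_tokens, cost),
    )
    conn.commit()
    return cost


def spend_by_day(conn: sqlite3.Connection) -> Dict[str, float]:
    """Total spend per ISO day."""
    rows = conn.execute(
        "SELECT day, SUM(cost_usd) FROM llm_calls GROUP BY day ORDER BY day"
    )
    return dict(rows.fetchall())


def spend_by_month(conn: sqlite3.Connection) -> Dict[str, float]:
    """Total spend per YYYY-MM month."""
    rows = conn.execute(
        "SELECT substr(day, 1, 7) AS month, SUM(cost_usd) "
        "FROM llm_calls GROUP BY month ORDER BY month"
    )
    return dict(rows.fetchall())
```

Storing the day as an ISO string keeps the per-day and per-month rollups to a plain `GROUP BY`, with no date functions required.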
Tier 3 is rich Sonnet analysis on the highest-value leads, plus a Haiku-only description miner that extracts septic-tank size and inspection-status hints from free-text permit descriptions. The agent also runs an anomaly detector that compares today's Tier-1 stats to the 7-day median and sends a RingCentral SMS to the operator on underflow. This check already caught a regression on 2026-04-24, when today's count was 0 against a median of 1.
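The underflow check might look roughly like the following. The exact rule the agent uses is not stated, so the `UNDERFLOW_RATIO` of 0.5 is an assumption chosen so that the documented case (today=0 against a median of 1) fires. The function names are hypothetical, and `send_sms` is an injected callback standing in for the RingCentral call, which is not shown.

```python
from statistics import median
from typing import Callable, Sequence

UNDERFLOW_RATIO = 0.5  # assumption: alert when today is under half the median


def is_underflow(today: int, last_7_days: Sequence[int],
                 ratio: float = UNDERFLOW_RATIO) -> bool:
    """True when today's count falls well below the 7-day median."""
    if not last_7_days:
        return False  # no baseline yet, so nothing to compare against
    baseline = median(last_7_days)
    return baseline > 0 and today < ratio * baseline


def check_and_alert(metric: str, today: int, last_7_days: Sequence[int],
                    send_sms: Callable[[str], None]) -> bool:
    """Send one alert via `send_sms` if the metric underflowed."""
    if not is_underflow(today, last_7_days):
        return False
    send_sms(
        f"[permit-enrichment] {metric} underflow: "
        f"today={today}, 7-day median={median(last_7_days)}"
    )
    return True
```

Using the median rather than the mean keeps a single unusually large or small day in the window from shifting the baseline.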
The nightly sync pushes enriched rows from R730-2 to T430 behind a regression gate: if any field has lower coverage on R730-2 than on T430, the sync aborts and alerts. Twilio validates the top-priority phone numbers; a website finder enumerates contractor URLs and classifies them. The whole agent is flock-guarded so cron drift can't start two instances at once.
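As an illustration of the regression gate, here is a minimal sketch that compares per-field coverage on the two nodes and refuses to sync on any drop. It assumes "coverage" means the fraction of rows with a non-null value for a field, which the description does not define. `CoverageRegression`, `regressed_fields`, `gate_sync`, and the field names in the usage are hypothetical.

```python
from typing import Dict, Mapping, Tuple


class CoverageRegression(Exception):
    """Raised when the source node covers a field worse than the target."""


def regressed_fields(source_cov: Mapping[str, float],
                     target_cov: Mapping[str, float]
                     ) -> Dict[str, Tuple[float, float]]:
    """Map each regressed field to (source coverage, target coverage).

    A field missing on the source counts as 0.0 coverage.
    """
    return {
        field: (source_cov.get(field, 0.0), target)
        for field, target in target_cov.items()
        if source_cov.get(field, 0.0) < target
    }


def gate_sync(source_cov: Mapping[str, float],
              target_cov: Mapping[str, float]) -> None:
    """Abort (raise) if any field would lose coverage; otherwise return."""
    bad = regressed_fields(source_cov, target_cov)
    if bad:
        details = ", ".join(
            f"{field}: {src:.2%} < {tgt:.2%}"
            for field, (src, tgt) in sorted(bad.items())
        )
        raise CoverageRegression(f"sync aborted, coverage dropped for {details}")


# Usage: R730-2 is the source, T430 the target; the values are made up.
r730_2 = {"sale_price": 0.92, "contractor_id": 0.61}
t430 = {"sale_price": 0.90, "contractor_id": 0.61}
gate_sync(r730_2, t430)  # no exception, so the sync may proceed
```

Raising an exception rather than returning a flag means a caller that forgets to check the result cannot accidentally proceed with the sync.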
Cron has been paused since 2026-04-24, when a Tier-2 batch returned 2,250 consecutive Anthropic API errors. The Tier-1 SQL pipeline works independently and is ready to re-enable; Tier-2 needs API stability work or a fallback to the on-prem Sovereign LLM cluster (Qwen 3.5 122B) to bypass external API risk.
- > 4,653 property-sales joins in the last live run (2026-04-24)
- > ~2,000 contractor + 250 applicant trigram escalations per run
- > 3-stage match escalation: trigram → Haiku → Sonnet with tuned thresholds
- > SQLite-backed cost ledger with per-day and per-month budget tracking
- > RingCentral SMS alerts to +1 979-236-1958 on anomaly underflow
- > Cross-node sync with regression gate (R730-2 → T430)
- > flock-protected single-instance guarantee
- → Re-enable Tier-1 SQL cron (works standalone, no external API)
- → Migrate Tier-2 fuzzy matching to local Qwen 3.5 122B (Sovereign LLM Cluster) to eliminate external API risk
- → Productize as a per-vertical enrichment SKU once stable
- ! Tier-2 Anthropic API instability caused a 2,250-error cascade on 2026-04-24 → cron paused
- ! No git remote — code lives only on R730-2; back up or move to wburns02/permit-enrichment-agent