Apollo-level business intelligence from 100% public data. Round-robin discovery across 5 states, 65 categories, and 4 sources, with inline enrichment that pushes data to your Google Sheet in real time. 9 email extraction methods, WHOIS age scoring, and crash recovery are built in. It is the pipeline agencies charge thousands to build, and it is yours in one command.
v6.1's key innovation: inline enrichment. Data flows into your Sheet continuously as each chunk completes — no waiting hours for all discovery to finish first. Discover → enrich → push → rotate to next category.
Round-robin across 5 states × 4 sources × 54 cities. Shuffled order every run. Global dedup on every insert — zero duplicates across all states and sources.
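As a rough illustration of the shuffled round-robin and global dedup described above, here is a minimal Python sketch. It is not the engine's actual code: the names `build_schedule`, `dedup_key`, and `GlobalDedup`, the task ordering, and the choice of name plus phone as the dedup key are all assumptions made for the example.

```python
import random


def build_schedule(states, sources, categories, seed=None):
    """Return every (category, source, state) task once, in a shuffled order."""
    rng = random.Random(seed)
    states, sources, categories = list(states), list(sources), list(categories)
    # Shuffle each axis so every run walks the grid in a different order.
    for axis in (states, sources, categories):
        rng.shuffle(axis)
    # States rotate fastest, then sources, then categories (assumed ordering).
    return [(cat, src, st) for cat in categories for src in sources for st in states]


def dedup_key(record):
    """Assumed key: normalized business name plus the last 10 phone digits."""
    name = "".join(ch for ch in record.get("name", "").lower() if ch.isalnum())
    phone = "".join(ch for ch in record.get("phone", "") if ch.isdigit())[-10:]
    return (name, phone)


class GlobalDedup:
    """One shared set of keys, checked on every insert across states and sources."""

    def __init__(self):
        self._seen = set()

    def insert(self, record):
        """Return True if the record is new, False if it is a duplicate."""
        key = dedup_key(record)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

Because the dedup set is shared by all tasks, a business found in one source is rejected when another source returns it later under slightly different formatting.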
Each chunk: find website → visit → extract contacts → 9 email methods → social profiles → WHOIS age → MX verify. Data flows immediately.
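The per-chunk flow above can be sketched as a chain of enrichment steps whose results are pushed as soon as each record is done. This is an illustrative Python sketch, not the engine's implementation: `enrich`, `process_chunk`, the step functions, and the `push` callback are hypothetical names.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
Step = Callable[[Record], Record]


def enrich(record: Record, steps: List[Step]) -> Record:
    """Run each step in order; a failing step is logged and the rest still run."""
    result = dict(record)
    errors = []
    for step in steps:
        try:
            # Each step returns only the fields it discovered.
            result.update(step(result))
        except Exception as exc:
            errors.append(f"{step.__name__}: {exc}")
    result["errors"] = errors
    return result


def process_chunk(chunk: Iterable[Record], steps: List[Step],
                  push: Callable[[Record], None]) -> None:
    """Enrich and push one record at a time so data flows immediately."""
    for record in chunk:
        push(enrich(record, steps))
```

Isolating each step means one failed lookup (for example a WHOIS timeout) costs a single field rather than the whole record.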
Google Sheets dashboard updated every 30 seconds. Discovery counts, enrichment progress, error rates, per-source breakdowns, and per-state results.
Per-category checkpoints, signal handlers (SIGTERM/SIGINT/SIGHUP), and a global dedup rebuild on resume. Re-run the same command and the run picks up where it left off.
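One way per-category checkpoints and signal handlers can fit together is sketched below in Python. The checkpoint file format, the `Checkpoint` class, and the `install_handlers` and `run` helpers are assumptions for illustration. The sketch does not cover the dedup rebuild on resume.

```python
import json
import os
import signal
import tempfile


class Checkpoint:
    """Tracks finished categories in a small JSON file (assumed format)."""

    def __init__(self, path):
        self.path = path
        self.done = set()
        if os.path.exists(path):
            with open(path) as fh:
                self.done = set(json.load(fh)["done"])

    def save(self):
        # Write to a temp file, then swap it in, so a crash never leaves a torn file.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as fh:
            json.dump({"done": sorted(self.done)}, fh)
        os.replace(tmp, self.path)

    def mark_done(self, category):
        self.done.add(category)
        self.save()


def install_handlers(checkpoint):
    """Save the checkpoint before exiting on SIGTERM, SIGINT, or SIGHUP."""
    def handler(signum, frame):
        checkpoint.save()
        raise SystemExit(128 + signum)

    for name in ("SIGTERM", "SIGINT", "SIGHUP"):
        sig = getattr(signal, name, None)  # SIGHUP does not exist on Windows
        if sig is not None:
            signal.signal(sig, handler)


def run(categories, checkpoint, work):
    """Process categories in order, skipping any already checkpointed."""
    for category in categories:
        if category in checkpoint.done:
            continue
        work(category)
        checkpoint.mark_done(category)
```

Re-running `run` with the same checkpoint path skips completed categories, which is the behavior the "re-run the same command" promise relies on.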
Each source uses Puppeteer with stealth plugin for JavaScript rendering. Smart retry with exponential backoff, automatic proxy rotation, and human-like scrolling behavior.
Puppeteer + stealth plugin. Extracts name, phone, address, website. Pagination support with configurable limits.
Automatically skips the source after 5 consecutive blocked requests. Longer delays for anti-bot evasion. Full business profile extraction.
React SPA rendered via headless Chrome. Accredited business data with ratings and complaint history markers.
New in v5.0. Aria-label extraction from search results. /maps/search/ URL pattern for reliable results.
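The retry and auto-skip behavior mentioned for the sources can be sketched in Python as follows. This is an assumption-heavy illustration: `BlockedError`, `fetch_with_backoff`, `SourceGuard`, the delay values, and the jitter are invented for the example, and the real engine drives Puppeteer rather than a plain `fetch` callable.

```python
import random
import time


class BlockedError(Exception):
    """Raised by the (hypothetical) fetch callable when a source blocks a request."""


def fetch_with_backoff(fetch, url, retries=4, base_delay=1.0, sleep=time.sleep):
    """Retry a blocked fetch, doubling the wait each attempt and adding jitter."""
    for attempt in range(retries):
        try:
            return fetch(url)
        except BlockedError:
            if attempt == retries - 1:
                raise  # out of attempts: let the caller count this as a block
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)


class SourceGuard:
    """Flags a source as skipped after `limit` consecutive blocks."""

    def __init__(self, limit=5):
        self.limit = limit
        self.consecutive = 0

    @property
    def skipped(self):
        return self.consecutive >= self.limit

    def record(self, blocked):
        # Any successful request resets the streak.
        self.consecutive = self.consecutive + 1 if blocked else 0
```

Counting only consecutive blocks keeps a source alive through occasional failures while still dropping it quickly once it starts refusing every request.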
The engine crawls every discovered business website and applies 9 extraction methods in sequence. Found emails are checked with a DNS MX record lookup, which confirms that the email's domain is set up to receive mail.
Parses structured data for contact info
Extracts from href="mailto:" patterns
Scans data-email, data-contact attrs
Parses team/about pages for contacts
Comprehensive email pattern matching
Decodes [at], (at), etc. patterns
Targeted extraction from page footers
Checks meta tags for contact emails
Falls back to phone when no email found
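Three of the methods listed above (mailto links, general pattern matching, and decoding of obfuscated addresses) can be sketched in a few lines of Python. The regular expressions and the `deobfuscate` and `extract_emails` names are assumptions for illustration, not the patterns the engine actually uses.

```python
import re
from urllib.parse import unquote

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
MAILTO_RE = re.compile(r'href=["\']mailto:([^"\'?]+)', re.IGNORECASE)
OBFUSCATED_AT = re.compile(r"\s*[\[\(]\s*at\s*[\]\)]\s*", re.IGNORECASE)
OBFUSCATED_DOT = re.compile(r"\s*[\[\(]\s*dot\s*[\]\)]\s*", re.IGNORECASE)


def deobfuscate(text):
    """Turn 'name [at] example (dot) com' style text back into 'name@example.com'."""
    text = OBFUSCATED_AT.sub("@", text)
    return OBFUSCATED_DOT.sub(".", text)


def extract_emails(html):
    """Collect emails from mailto links first, then from a scan of the decoded page."""
    found = [unquote(address).strip() for address in MAILTO_RE.findall(html)]
    found.extend(EMAIL_RE.findall(deobfuscate(html)))
    # Lowercase and drop repeats while keeping first-seen order.
    unique = []
    for email in found:
        email = email.lower()
        if email not in unique:
            unique.append(email)
    return unique
```

Running the methods in sequence and deduplicating at the end means a later, broader method can only add addresses and never produces repeats of what an earlier one found.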
Plus: email pattern inference from known contacts (first.last@, flast@, firstl@, etc.) generates likely emails for staff listed without one. Inferred addresses are checked against DNS MX records.
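Pattern inference of this kind can be sketched as two small functions: one detects which naming pattern a known contact's email follows, the other applies that pattern to a colleague. The Python below covers only the three patterns named above. `PATTERNS`, `detect_pattern`, and `infer_email` are hypothetical names, and the MX check is left out.

```python
PATTERNS = {
    "first.last": lambda first, last: f"{first}.{last}",
    "flast": lambda first, last: f"{first[0]}{last}",
    "firstl": lambda first, last: f"{first}{last[0]}",
}


def detect_pattern(first, last, email):
    """Return the pattern name a known contact's email follows, or None."""
    first, last = first.strip().lower(), last.strip().lower()
    if not first or not last or "@" not in email:
        return None
    local_part = email.split("@", 1)[0].lower()
    for name, build in PATTERNS.items():
        if build(first, last) == local_part:
            return name
    return None


def infer_email(first, last, pattern, domain):
    """Apply a detected pattern to a staff member who has no email on record."""
    first, last = first.strip().lower(), last.strip().lower()
    if not first or not last:
        return None
    return f"{PATTERNS[pattern](first, last)}@{domain}"
```

An inferred address is a guess about a mailbox name, so it is best treated as lower confidence than an address found on the page.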
Pre-configured categories spanning local services, retail, food & beverage, health, professional services, and home services. Run one category at a time or combine them.
The full list of 65 includes retail, ecommerce, food & beverage, health & wellness, professional services, and home services categories.
5 pre-configured states (54 cities): AZ (15 cities), NV (10 cities), OH (12 cities), ID (10 cities), WA (12 cities). Use --state ALL for round-robin across all states, or --state AZ for a single state. Add --max 500 and --chunk 100 to set custom limits.
Every discovered business gets a complete profile pushed to Google Sheets in real-time batches of 10.
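Pushing in batches of 10 can be sketched with a small buffer that flushes whenever it fills. In this Python sketch, `BatchPusher` and the `push_rows` callback are hypothetical; `push_rows` stands in for whatever Google Sheets append call the engine makes.

```python
class BatchPusher:
    """Buffers rows and hands them to `push_rows` in groups of `batch_size`."""

    def __init__(self, push_rows, batch_size=10):
        self.push_rows = push_rows
        self.batch_size = batch_size
        self._buffer = []

    def add(self, row):
        self._buffer.append(row)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send whatever is buffered; call once at the end to push the remainder."""
        if self._buffer:
            self.push_rows(self._buffer)
            self._buffer = []
```

Small batches keep the sheet close to real time while making far fewer API calls than a push per row would.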
| Column | Description | Source |
|---|---|---|
| First Name | Contact first name | Website crawl |
| Last Name | Contact last name | Website crawl |
| Email | Verified email address | 9 extraction methods + MX verify |
| Title | Contact job title | Staff card parsing |
| Company Name | Business name | Discovery sources |
| Location | City, State | Discovery sources |
| Website | Business URL | Discovery + DuckDuckGo fallback |
| Phone | Normalized phone number | Discovery sources |
| Facebook | Facebook profile URL | Website crawl |
| Instagram | Instagram profile URL | Website crawl |
| LinkedIn | LinkedIn profile URL | Website crawl |
| Twitter/X | X profile URL | Website crawl |
| Source | Which discovery source found it | Yellow Pages / Yelp / BBB / Maps |
| Confidence | Data quality score | Computed |
| Biz Age | Business age label | WHOIS/RDAP lookup |
| Year Founded | Domain registration year | WHOIS/RDAP lookup |
| Industry | Searched category | Input parameter |
| Date | Discovery timestamp | Auto-generated |
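For the Year Founded column, a domain's registration year can be read from an RDAP response, which lists dated events including one whose `eventAction` is `registration`. The Python sketch below parses an already-fetched RDAP JSON object. The function names are hypothetical, and how the engine turns the year into a Biz Age label is not stated here, so only the age in years is computed.

```python
def registration_year(rdap):
    """Return the year of the RDAP 'registration' event, or None if absent."""
    for event in rdap.get("events", []):
        if event.get("eventAction") == "registration":
            # eventDate is an RFC 3339 timestamp such as '2009-06-15T12:00:00Z'.
            return int(event["eventDate"][:4])
    return None


def domain_age_years(rdap, current_year):
    """Return whole years since registration, or None if the year is unknown."""
    year = registration_year(rdap)
    if year is None:
        return None
    return max(0, current_year - year)
```

Domain age is only a proxy for business age: a long-established business may have registered or changed its domain recently.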
Same playbook as the Google Ads API Agent. The entire engine — 1,763 lines in a single file — is MIT licensed. v6.1 adds round-robin discovery, inline enrichment, crash recovery, and --state ALL. Extend sources, add categories, or plug in your own CRM.
Part of the Google Ads Agent ecosystem. Built by John Williams — Senior Paid Media Specialist at Seer Interactive with 15+ years managing $48M+ in digital advertising.