---
name: web-recon
description: "Reverse-engineer web flows: SPA/API reconnaissance, checkout/form capture, promo/offer investigation, restaurant ordering, and server-side emulation. Use when the user needs to understand, verify, or replay any JavaScript-heavy web flow where direct curl fails."
category: software-development
metadata:
  hermes:
    emoji: "🔍"
---

# Web Reconnaissance & Flow Emulation

General methodology for reverse-engineering web-based flows: investigate landing pages, discover hidden APIs, capture network traffic via browser automation, and emulate API calls server-side. Use this for SPA checkouts, promo/offer verification, restaurant ordering, membership validations, or any JS-heavy web flow.

## Core workflow

### Phase 1: Reconnaissance
1. **Open the target URL** — extract with `web_extract` or load in browser. Preserve what it actually returns, even if broken.
2. **Strip the claim to concrete terms** — identify merchant, issuer, product, benefit hook, campaign IDs, URL params (`utm_*`, `fbclid`, `c=`, `o=`).
3. **Search authoritative sources** — official pages, terms, FAQs, APIs. Prefer official over blogs:
   - `"<brand>" "<benefit>" "<merchant>"`
   - `site:<domain> <term>`
   - Campaign identifiers from URL params
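
Step 2's URL-param inspection can be sketched in a few lines. This is a minimal illustration using only the standard library; the `campaign_params` helper, the `CAMPAIGN_KEYS` set, and the example URL are assumptions for demonstration, not part of any real target.

```python
from urllib.parse import urlsplit, parse_qs

# Parameter names worth flagging (assumed list; extend per target).
CAMPAIGN_KEYS = {"fbclid", "c", "o"}

def campaign_params(url):
    """Return query params that look like campaign or tracking identifiers."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return {
        key: values
        for key, values in query.items()
        if key in CAMPAIGN_KEYS or key.startswith("utm_")
    }

# Hypothetical landing URL, for illustration only.
url = "https://example.com/promo?c=123&o=45&utm_source=ig&fbclid=abc&lang=es"
print(campaign_params(url))
```

The returned values are lists because a parameter may repeat in a query string; feed the interesting ones into the searches from step 3.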

### Phase 2: Dynamic page / hidden API discovery
If extraction returns "Loading…" or a bare SPA shell:
1. Fetch raw HTML with `requests`/`curl`
2. Extract JS/CSS bundle URLs from `src`/`href`
3. Search bundles for merchant names, campaign IDs, API base URLs, and terms:
   ```python
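   import re

   # `js` holds the text of one downloaded bundle.
   for term in ['beneficio', 'promocion', 'tope', 'vigencia', 'merchant', 'campaign']:
       m = re.search(term, js, re.I)
       if m:
           # Show 300 characters before and 700 after the first hit for context.
           print(js[max(0, m.start()-300):m.start()+700])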
   ```
4. Look for official API endpoints embedded in bundles
5. For paginated APIs, start with a small `pageSize` and `page=1`; fall back to `page=0` only if the API turns out to be zero-based
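
Step 2 (extracting bundle URLs) can be done without third-party parsers. The sketch below uses the standard library's `HTMLParser`; the `BundleFinder` class, the `find_bundles` helper, and the sample shell HTML are hypothetical names and data introduced here for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class BundleFinder(HTMLParser):
    """Collect <script src> and <link href> URLs that point at JS/CSS files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.bundles = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "script":
            ref = attr_map.get("src")
        elif tag == "link":
            ref = attr_map.get("href")
        else:
            return
        # Ignore any query string when checking the file extension.
        if ref and ref.split("?", 1)[0].endswith((".js", ".css")):
            self.bundles.append(urljoin(self.base_url, ref))

def find_bundles(html, base_url):
    """Return absolute bundle URLs found in the raw HTML, in document order."""
    finder = BundleFinder(base_url)
    finder.feed(html)
    return finder.bundles

# Hypothetical SPA shell, for illustration only.
shell = (
    '<html><head><link rel="stylesheet" href="/static/main.9f.css"></head>'
    '<body><div id="root">Loading…</div>'
    '<script src="/static/app.3c1.js?v=2"></script></body></html>'
)
print(find_bundles(shell, "https://example.com/promo"))
```

Each returned URL can then be downloaded and searched with the term loop from step 3.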

### Phase 3: Network capture (for form/checkout flows)
Use Playwright with full XHR/fetch interception:
```python
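network_log = []  # shared list of captured request/response records

def log_request(request):
    if request.resource_type in ('xhr', 'fetch', 'preflight'):
        network_log.append({
            'type': 'request', 'url': request.url,
            'method': request.method, 'post_data': request.post_data,
        })

def log_response(response):
    if response.request.resource_type in ('xhr', 'fetch', 'preflight'):
        try:
            body = response.text()[:5000]
        except Exception:
            body = None  # redirects and some responses have no readable body
        network_log.append({
            'type': 'response', 'url': response.url,
            'status': response.status, 'body': body,
        })

# Attach to a Playwright page: page.on('request', log_request); page.on('response', log_response)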
```
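
For completeness, here is one way the two handlers might be wired into a capture run and the result saved for later replay. It assumes the Playwright sync API; `run_capture`, `save_log`, the `settle_ms` default, and the output filename are hypothetical names and values chosen for this sketch.

```python
import json

def run_capture(url, log_request, log_response, settle_ms=5000):
    """Load `url` in headless Chromium with both handlers attached."""
    # Imported lazily so this module still loads where Playwright is absent.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.on("request", log_request)
        page.on("response", log_response)
        page.goto(url, wait_until="networkidle")
        page.wait_for_timeout(settle_ms)  # extra settle time for late XHR
        browser.close()

def save_log(network_log, path="network_log.json"):
    """Persist captured entries so the emulation phase can replay them."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(network_log, fh, indent=2, ensure_ascii=False)
    return path
```

A typical run would call `run_capture(target_url, log_request, log_response)` and then `save_log(network_log)`.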

### Phase 4: Form interaction
- Text inputs: `page.fill()` or character-by-character `page.type()` for stubborn SPAs
- MUI/React dropdowns: click component → wait → click `li[data-value="..."]`
- Checkboxes: `page.check()`
- SPA rendering delay: add a `wait_for_timeout()` of 4000-8000 ms after `networkidle`
- **Screenshot after each step** to catch client-side validation errors
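
The interaction rules above can be bundled into one helper per form step. This is a sketch against the Playwright sync `Page` API; the `fill_step` name, its argument shapes, and every selector you pass in are assumptions to adapt per site.

```python
def fill_step(page, text_fields, dropdowns, checkboxes, shot_path):
    """Fill one form step on a Playwright page, then screenshot it.

    text_fields: {selector: value}
    dropdowns:   {trigger_selector: data_value}
    checkboxes:  [selector, ...]
    """
    for selector, value in text_fields.items():
        page.fill(selector, value)
    for trigger, data_value in dropdowns.items():
        page.click(trigger)                     # open the dropdown menu
        option = f'li[data-value="{data_value}"]'
        page.wait_for_selector(option)          # wait for the options to mount
        page.click(option)
    for selector in checkboxes:
        page.check(selector)
    page.wait_for_timeout(4000)                 # low end of the 4000-8000 ms range
    page.screenshot(path=shot_path, full_page=True)
```

If `page.fill()` does not register on a stubborn SPA, swapping it for `page.type()` in the first loop is the fallback described above.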

### Phase 5: API emulation
Replay captured endpoints with Python `requests`:
```python
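import requests  # BASE, slug, and order come from the captured traffic
payload = {"form": {...}, "order": order}  # fill "form" with the captured field values
# The double slash is intentional: replicate captured paths exactly.
r = requests.post(f"{BASE}//order/{slug}", json=payload,
                  headers={"Content-Type": "application/json"})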
```

## Domain-specific subskills

### Promo/offer investigation
When evaluating ads, bank promos, or retail benefits:
- Lead with the practical conclusion, not process notes
- Distinguish ad hook from verified terms
- Inspect URL params (`c=`, `o=`) for hidden CMS content (Banco Patagonia pattern)
- Control API pagination with `pageSize`; try `page=1` before `page=0`
- For membership/venue promos, do the usage math: weigh the monthly cost against a realistic number of visits
- Answer format: Conclusión → Verificado → No pude confirmar → Mi lectura
- **Key pitfall:** do not stop at a broken landing page; continue through CMS/API bundles
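
The usage math for membership/venue promos is simple enough to script. The sketch below is one possible reading of it: compare the discounted monthly fee with what the same visits would cost at a walk-in price. The `usage_math` helper and all figures are hypothetical, not taken from any real promo.

```python
def usage_math(monthly_fee, discount_pct, visits_per_month, walk_in_price):
    """Compare a discounted membership against paying per visit."""
    if visits_per_month <= 0:
        raise ValueError("visits_per_month must be positive")
    discounted_fee = monthly_fee * (1 - discount_pct / 100)
    pay_as_you_go = walk_in_price * visits_per_month
    return {
        "discounted_fee": discounted_fee,
        "cost_per_visit": discounted_fee / visits_per_month,
        "pay_as_you_go": pay_as_you_go,
        "membership_wins": discounted_fee < pay_as_you_go,
    }

# Hypothetical figures, for illustration only.
print(usage_math(monthly_fee=40000, discount_pct=25,
                 visits_per_month=8, walk_in_price=5000))
```

Running it with a pessimistic visit count as well as an optimistic one shows how sensitive the conclusion is to actual usage.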

### Restaurant ordering (MasDelivery pattern)
Many Argentine restaurant sites on `pedidos.masdelivery.com` are API-driven:
- POST JSON action arrays to `front-api.php`
- Recon actions: `CheckSlug`, `GetLocations`, `GetLocationMenu`, `GetProductsDetails`
- **Danger endpoint:** `SaveOrder` finalizes a real order — never call without confirmation
- Passwordless login available via `SendPasswordlessLoginEmail` / `PasswordlessLogin`
- Persist confirmed favorites to memory for repeat orders
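
Because `SaveOrder` has real-world side effects, it helps to put a guard in front of whatever code sends actions. The sketch below checks action names only and makes no assumption about the payload shape sent to `front-api.php`; `guard_actions` and `DANGER_ACTIONS` are hypothetical names introduced here.

```python
# Actions that place or finalize a real order.
DANGER_ACTIONS = {"SaveOrder"}

def guard_actions(action_names, user_confirmed=False):
    """Return the action names unchanged, or raise if a dangerous one is unconfirmed."""
    blocked = [name for name in action_names if name in DANGER_ACTIONS]
    if blocked and not user_confirmed:
        raise PermissionError(
            f"{', '.join(blocked)} places a real order; ask the user first"
        )
    return list(action_names)

# Read-only recon actions pass straight through.
print(guard_actions(["CheckSlug", "GetLocations", "GetLocationMenu"]))
```

Call the guard immediately before every POST so a recon script cannot place an order by accident.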

### SPA/checkout flow capture
When the user needs to understand form submissions or multi-step checkouts:
- Filter captured XHR by domain (ignore GA/GTM/ad noise)
- Double-slash in paths: replicate exactly, don't normalize
- Province/city dropdowns: test common values; verify with screenshots
- Delivery pattern: create minimal HTML that auto-replays captured API calls, deploy via `pipo-deploy`
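
Filtering the captured log by domain might look like the following. It assumes each log entry is a dict with a `url` key, as in the Phase 3 capture; the `filter_by_host` helper and the example hosts are hypothetical.

```python
from urllib.parse import urlsplit

def filter_by_host(network_log, api_hosts):
    """Keep entries whose URL host equals, or is a subdomain of, an API host."""
    kept = []
    for entry in network_log:
        host = urlsplit(entry["url"]).hostname or ""
        if any(host == h or host.endswith("." + h) for h in api_hosts):
            kept.append(entry)
    return kept

# Hypothetical capture, for illustration only.
log = [
    {"type": "request", "url": "https://www.google-analytics.com/collect"},
    {"type": "request", "url": "https://api.example.com//order/abc"},
    {"type": "response", "url": "https://api.example.com//order/abc"},
]
print(filter_by_host(log, ["example.com"]))
```

Note that the entries are returned untouched, so a double slash in a captured path survives the filtering.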

### Conditional form field discovery (Marketo / HubSpot / similar)
Many modern marketing forms render a visible parent form AND a hidden iframe. Marketo Forms 2.0 is the most common case, and an iframe URL containing `mktoweb.com/index.php/form/XDFrame` is the giveaway. The parent shows the initial state; the iframe holds the full schema, including conditional fields that stay `display: none` until a trigger fires. **A static snapshot or `web_extract` of the page returns only the visible fields, so you can miss required fields entirely.**

When a form has a checkbox / radio that "asks for a justification" or "reasons you need X" but the textarea isn't visible until you check the box, treat it as a hidden required field. Discovery recipe:

1. `browser_cdp` → `Target.getTargets`. Find the entry whose `type` is `iframe` and `url` contains `mktoweb.com` (or `hubspot.com`, `marketo.com`, etc.) — note the `targetId`.
2. `browser_cdp` → `Runtime.evaluate` with `target_id=<that targetId>` and `expression="JSON.stringify(Array.from(document.querySelectorAll('input,textarea,select')).map(i => ({name:i.name,id:i.id,value:i.value,required:i.required,visible:i.offsetParent!==null,label:(i.closest('.mktoFormRow')?.querySelector('.mktoLabel')?.innerText||'').trim()})))"`. Note: `browser_cdp` deserialization can mangle the `returnByValue` flag at certain param positions — wrap the expression in an IIFE and `JSON.stringify` the result to force a plain string return.
3. Fields with `visible:false` AND `label` containing `*` are conditional required fields. The form has more questions than the visible DOM suggests.
4. If `browser_console` is more convenient than `browser_cdp`, it works too — it executes in whatever page/frame the browser tool is currently focused on. Just verify you're pointed at the right target (the iframe, not the parent) by re-navigating first.
5. Alternative: trigger simulation. In the visible parent form, programmatically set the trigger checkbox's `checked=true` and dispatch `new Event('change',{bubbles:true})` — then re-snapshot. The parent React form should render the new field. `browser_click` on the visible label sometimes doesn't fire React's onChange; toggling the underlying input directly is more reliable.
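
Once step 2 has returned the JSON string, step 3's rule (`visible:false` and a `*` in the label) can be applied mechanically. The sketch below assumes the field keys produced by the step 2 expression; the `hidden_required` helper and the sample dump are hypothetical.

```python
import json

def hidden_required(fields_json):
    """Return fields that are not visible but carry a required marker in the label."""
    fields = json.loads(fields_json)
    return [
        f for f in fields
        if not f.get("visible") and "*" in f.get("label", "")
    ]

# Hypothetical dump from Runtime.evaluate, for illustration only.
dump = json.dumps([
    {"name": "Email", "id": "Email", "value": "", "required": True,
     "visible": True, "label": "* Email"},
    {"name": "justification", "id": "justification", "value": "", "required": False,
     "visible": False, "label": "* Reasons you need access"},
    {"name": "utm_source", "id": "", "value": "", "required": False,
     "visible": False, "label": ""},
])
for field in hidden_required(dump):
    print(field["name"], "->", field["label"])
```

Hidden tracking inputs such as the third entry are excluded because their label carries no `*`.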

Practical impact: any time the human says "this form has a question I didn't see", or you see a checkbox with text like "I'm interested in X" where checking X *seems* like it should ask follow-ups, **always enumerate the full schema before drafting answers**. The visible page is a lie about the form's true length.

## Common pitfalls

1. **Stopping at broken pages.** Campaign URLs can fail while APIs, CMS folders, or legal text still exist. Continue with URL-param inspection, bundle analysis, and hidden CMS paths.
2. **Treating ad copy as terms.** Ads say "beneficios" without caps or eligibility. The answer must distinguish hook from verified facts.
3. **Over-certifying vague benefits.** If exact terms can't be confirmed, say so and explain practical implications.
4. **Dumping raw JSON.** Extract only benefit, cap, dates, eligibility, and recommendation.
5. **SPA rendering delay.** Even after `networkidle`, components may not be mounted. Add extra timeouts and check DOM state.
6. **Client-side validation blocking.** Forms may silently block submission. Check screenshots for red outlines.
7. **Google Analytics noise.** 90% of captured XHR may be GA/GTM/ads. Filter by actual API domain.

## Verification checklist

- [ ] Original URL result checked
- [ ] Official sources searched
- [ ] Dynamic bundles or APIs inspected if extraction was incomplete
- [ ] Benefit/cap/dates/eligibility identified or explicitly marked unconfirmed
- [ ] Final answer starts with practical conclusion, not process notes
- [ ] For form flows: screenshot at each step, API payloads documented
- [ ] SaveOrder/submit endpoints NOT called without explicit user confirmation

## References

- `references/sportclub-checkout.md` — SportClub Argentina checkout API: endpoints, payload structure, DNI split technique, tested promotions (ACA 25%, BAC 50%)
- `references/banco-patagonia-sportclub.md` — Banco Patagonia hidden CMS `/DM/RC<o>.CARPETA` pattern for recovering promo legal text from broken landing pages
- `references/masdelivery-sushitime.md` — MasDelivery/Sushi Time Belgrano: concrete API action IDs, product option IDs, ordering notes

## Skills absorbed

This umbrella merges the following former skills (archived):
- `promo-offer-research` — promo/offer investigation methodology
- `restaurant-ordering` — restaurant order preparation via MasDelivery API
- `web-api-flow-capture` — SPA/checkout API flow capture and emulation
- `booking-scraper` — Booking.com stealth scraper (Camoufox + DataDome bypass). The reusable script moved to `scripts/booking_search.py`; the recipe moved to `references/booking-com-stealth-scraper.md`. The "Verified Hotel Catalog" table was intentionally dropped (session-specific data, not reusable knowledge).
- `personal-browser-cdp` — how to use the user's real Chrome via the existing CDP connection. Preserved as `references/personal-browser-cdp.md` (with deployment-specific SSH/Tailscale details dropped; the user replaces those with their own).

## Anti-bot browser scraping (Camoufox pattern)

When a site blocks standard Playwright/Chromium and DataDome is involved, the Camoufox stealth-browser pattern documented in `references/booking-com-stealth-scraper.md` (with reusable script in `scripts/booking_search.py`) is the working approach:
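- Camoufox (a modified Firefox build) applies its stealth patches inside the browser binary rather than through injected JavaScript, which is how it gets past DataDome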
- GeoIP-matched timezone/locale/language via `camoufox[geoip]`
- Real WebGL/canvas/navigator fingerprints (no `navigator.webdriver`)
- Progressive human-like scroll to trigger lazy loading

This is the third tool in the recon toolbox after `curl` reconnaissance and Playwright network-capture — use it when both fail at the anti-bot layer.
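
The progressive scroll from the list above can be sketched as a small helper. It assumes a Playwright-compatible `page` object (which Camoufox is understood to expose) and uses `page.mouse.wheel` and `page.wait_for_timeout`; the `progressive_scroll` name, step counts, and pixel and pause ranges are illustrative assumptions, not values taken from the referenced script.

```python
import random

def progressive_scroll(page, steps=10, min_px=300, max_px=700,
                       pause_ms=(400, 1200), rng=None):
    """Scroll down in uneven increments with uneven pauses; return total pixels."""
    rng = rng or random.Random()
    total = 0
    for _ in range(steps):
        delta = rng.randint(min_px, max_px)
        page.mouse.wheel(0, delta)                      # vertical wheel scroll
        page.wait_for_timeout(rng.randint(*pause_ms))   # let lazy content load
        total += delta
    return total
```

Passing a seeded `random.Random` as `rng` makes a run reproducible when debugging which scroll depth triggers a given lazy-loaded block.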

## Using the user's live browser (CDP)

When the recon target needs the user's existing session (logged into Gmail, GitHub, a bank dashboard, etc.) and a fresh Playwright context would hit a login wall, use the user's real Chrome via the existing CDP connection. Reference: `references/personal-browser-cdp.md`, absorbed from the former `personal-browser-cdp` standalone skill. The original 44-line SKILL.md is preserved there as a clean reference, without the hardcoded SSH-tunnel incantation and Tailscale IP, which were tied to one specific deployment.
