1. Overview
The 30-second answer
Yes — a $0/month VPS on Oracle Free Tier can scrape the MSC container
tracker (and most similar Akamai-protected sites), but not with n8n alone
and not with curl. You need a real, anti-detect headless
browser. n8n is the workflow brain; the browser is the hand.
What I actually tested
Target: msc.com/en/track-a-shipment.
Container: MEDU9730548. Same container, same network egress (Oracle
São Paulo VM), two different approaches:
❌ curl (plain HTTP)
POST https://www.msc.com/api/feature/tools/TrackingInfo
Content-Type: application/json
{"trackingType":"Container","trackingNumber":"MEDU9730548"}
→ HTTP 403 — Access Denied
Reference #18.6b841402.1780679402.2fb6d206
edgesuite.net (Akamai Bot Manager)
Endpoint exists, returns valid JSON, but Akamai blocks the request. No
authentication tokens, no captcha — just a hard 403 because
the TLS fingerprint, header order, and missing client-side artifacts
scream "bot".
✅ Camoufox (headless Firefox)
POST https://www.msc.com/api/feature/tools/TrackingInfo
(issued from inside a Camoufox session, real fingerprint)
→ HTTP 200 — application/json
5,116 bytes
Full BoL MEDUJ0738446, 8 events,
pod ETA 2026-07-04
Same endpoint, same body — works because the browser looks like a real
user: real TLS fingerprint, real WebGL stack, real-looking fingerprint
with internal consistency, no navigator.webdriver.
I didn't get rate-limited on either attempt because the test was a single request — but on a production workload you'd see Akamai start scoring behavioral signals after ~80–120 requests from the same session, even through Camoufox. The fix is to rotate identities (fingerprint + IP) and spread requests across time.
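The rotation idea above can be sketched as a small per-identity request budget. This is a minimal illustration rather than the setup that was tested: the `SessionBudget` name, the cap of 80 (the low end of the ~80–120 range mentioned above), and the delay bounds are all assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class SessionBudget:
    """Tracks how many requests one browser identity has served."""
    max_requests: int = 80  # assumed cap, low end of the ~80-120 range
    used: int = 0

    def record(self) -> None:
        """Count one request against the current identity."""
        self.used += 1

    def should_rotate(self) -> bool:
        """True once the identity (fingerprint + IP) should be replaced."""
        return self.used >= self.max_requests

    def reset(self) -> None:
        """Call after launching a fresh identity."""
        self.used = 0


def next_delay(base_s: float = 20.0, jitter_s: float = 10.0) -> float:
    """Seconds to wait before the next request, with random jitter."""
    return base_s + random.uniform(0.0, jitter_s)


# Tiny budget so the rotation point is easy to see.
budget = SessionBudget(max_requests=3)
for _ in range(3):
    budget.record()
rotate_now = budget.should_rotate()
```

In a worker, `record()` would be called after each tracking request, and a new browser identity launched whenever `should_rotate()` returns true.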
What I got back (real artifacts)
The screenshot and JSON below are from a real run, captured at 19:07 CET
on 2026-06-05. Screenshot is 1905×3302, full-page render of the MSC
tracking results page for MEDU9730548.
JSON (XHR body)
{
  "IsSuccess": true,
  "Data": {
    "TrackingNumber": "MEDU9730548",
    "BillOfLadings": [{
      "BillOfLadingNumber": "MEDUJ0738446",
      "GeneralTrackingInfo": {
        "ShippedFrom": "MOIN, CR",
        "ShippedTo": "YOKOHAMA, JP",
        "PortOfLoad": "MOIN, CR",
        "PortOfDischarge": "YOKOHAMA, JP",
        "Transshipments": ["CRISTOBAL, PA", "RODMAN, PA"],
        "PriceCalculationDate": "24/05/2026"
      },
      "ContainersInfo": [{
        "ContainerNumber": "MEDU9730548",
        "ContainerType": "40' HIGH CUBE REEFER",
        "LatestMove": "COLON, PA",
        "PodEtaDate": "04/07/2026",
        "Delivered": false,
        "Events": [ /* 8 events, newest first */ ]
      }]
    }]
  }
}
Download the full JSON · 5.1 KB
How I captured this
- Opened the page in a Camoufox session (headless Firefox with C++-level anti-detect patches).
- Injected a tiny fetch + XMLHttpRequest interceptor via Runtime.evaluate to record every XHR URL, method, status, and response body.
- Typed the container number, clicked the search button.
- The page issued POST /api/feature/tools/TrackingInfo; the interceptor caught the 200 response and dumped the JSON.
- Then ran curl against the same endpoint to prove that no fingerprint = no data.
The proposed stack
┌──────────────────────────────────────────────────────────────┐
│  Oracle Cloud Always Free (São Paulo, ARM Ampere A1)         │
│                                                              │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐    │
│  │ n8n          │───▶│ Browser      │───▶│ MSC API      │    │
│  │ (workflow)   │    │ worker       │    │ Akamai-      │    │
│  │ port 5678    │    │ (Camoufox)   │    │ protected    │    │
│  │ Postgres     │    │ port 9090    │    │              │    │
│  └──────────────┘    └──────────────┘    └──────────────┘    │
│         ▲                   ▲                                │
│         │ webhook           │ JSON                           │
│         │                   │                                │
│  You / your CRM      Container number from upstream          │
└──────────────────────────────────────────────────────────────┘
n8n is the brain: receives the container number (from a webhook, a schedule, a Google Sheet, a Slack command, whatever), calls the browser worker, and routes the result. The browser worker is what makes Akamai happy.
What's in the rest of this doc
- How it works: Camoufox internals, why playwright-stealth isn't enough, the actual anti-bot signals MSC/Akamai check.
- n8n in the loop: what n8n can and can't do here, the simplest flows, and the exact shape of the worker contract.
- Oracle Free Tier VPS guide: sign-up with credit card, choosing São Paulo, ARM specs, security list, Docker, end-to-end deploy, idle-instance reclamation warning, and a copy-paste cheat sheet for Andres.
2. How it works
The 403 you saw — what's actually being checked
When you send a request to msc.com through curl,
here's what Akamai's Bot Manager sees in the first ~50 ms of the
connection:
1. TLS fingerprint (JA3 / JA4)
Before any HTTP headers, your TCP/TLS handshake exposes the list of cipher
suites, extensions, elliptic curves, and ALPN protocols your client offers.
Different HTTP libraries have very different fingerprints here. curl
(libcurl) and Python's requests (urllib3/OpenSSL) each have
their own distinct JA3 hash. Real Chrome and real Firefox each have their
own, different-but-stable JA3. Akamai keeps a list of which JA3s map to
which user agents, and a mismatch (or a JA3 that doesn't correspond to any
known browser) is an instant red flag.
This is why "just set the User-Agent header" doesn't help — the TLS layer is below the HTTP layer and the headers you can set don't influence it.
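To make the TLS fingerprint concrete, here is a rough sketch of how a JA3 hash is derived: five ClientHello fields are joined into one string and hashed with MD5. The numeric values below are illustrative placeholders, not a capture from curl or any real browser, and the sketch skips details such as GREASE filtering.

```python
import hashlib
from typing import Sequence


def ja3_string(version: int, ciphers: Sequence[int], extensions: Sequence[int],
               curves: Sequence[int], point_formats: Sequence[int]) -> str:
    """Build the JA3 input: five comma-separated fields, values dash-joined."""
    fields = [
        str(version),
        "-".join(str(v) for v in ciphers),
        "-".join(str(v) for v in extensions),
        "-".join(str(v) for v in curves),
        "-".join(str(v) for v in point_formats),
    ]
    return ",".join(fields)


def ja3_hash(version: int, ciphers: Sequence[int], extensions: Sequence[int],
             curves: Sequence[int], point_formats: Sequence[int]) -> str:
    """MD5 hex digest of the JA3 string."""
    raw = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(raw.encode("ascii")).hexdigest()


# Two hypothetical clients that differ only in cipher order.
client_a = ja3_hash(771, [4865, 4866], [0, 10], [29, 23], [0])
client_b = ja3_hash(771, [4866, 4865], [0, 10], [29, 23], [0])
```

Because the order of offered ciphers is part of the input, two clients with the same cipher set but a different ordering still hash differently, which is why a library's fingerprint is so hard to disguise from the HTTP layer.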
2. HTTP/2 fingerprint
Most modern sites (MSC included) speak HTTP/2. The SETTINGS frame, the
header order, the priority hints, and the WINDOW_UPDATE timing all form
a fingerprint. Python and curl use libraries that emit these
in patterns no real browser uses. A real Firefox sends a specific sequence
that Akamai's classifier has on file.
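As a simplified illustration of the idea, the sketch below serialises a client's HTTP/2 behaviour into a pipe-separated string of SETTINGS, WINDOW_UPDATE, PRIORITY, and pseudo-header order, loosely modelled on the format used in public write-ups of passive HTTP/2 fingerprinting. The function name, the placeholder tokens, and the sample values are assumptions for illustration; they are not taken from Akamai's classifier or from a real capture.

```python
from typing import Dict, Sequence


def h2_fingerprint(settings: Dict[int, int], window_update: int,
                   pseudo_header_order: Sequence[str]) -> str:
    """Serialise HTTP/2 connection traits into one comparable string."""
    # SETTINGS entries keep the order in which the client sent them.
    settings_part = ";".join(f"{key}:{value}" for key, value in settings.items())
    # "00" is used here as a placeholder for "no WINDOW_UPDATE frame sent".
    window_part = str(window_update) if window_update else "00"
    # PRIORITY frames are omitted in this sketch; "0" marks their absence.
    priority_part = "0"
    # Pseudo-headers are reduced to their first letter: m, a, s, p.
    headers_part = ",".join(name.lstrip(":")[0] for name in pseudo_header_order)
    return "|".join([settings_part, window_part, priority_part, headers_part])


# Hypothetical values for two different clients.
browser_like = h2_fingerprint(
    {1: 65536, 4: 131072, 5: 16384}, 12517377,
    [":method", ":path", ":authority", ":scheme"],
)
library_like = h2_fingerprint(
    {3: 100, 4: 65535}, 0,
    [":method", ":scheme", ":authority", ":path"],
)
```

The point is only that the two strings differ even when both clients send identical HTTP headers.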
3. JavaScript execution profile
MSC ships an Akamai Bot Manager script that runs hundreds of micro-checks
inside the page. These include: does navigator.webdriver return
true? Is there a chrome.csi() function? Is the
canvas hash consistent? Are there any of a long list of "automation
artifacts" hanging around (window.__playwright__binding__,
document.__selenium_unwrapped, CDP-related globals, etc.)?
Each check is one signal. The combined score is what gets you the captcha
page or the 403.
4. Browser fingerprint consistency
Even if all the previous checks pass, the fingerprint itself has to be internally consistent. Anti-bot providers test combinations like:
- User-Agent says "Windows 11 Chrome 124" but WebGL reports an Apple M1 GPU → blocked.
- UA says "Mac Safari" but navigator.platform is Linux x86_64 → blocked.
- UA says "iPhone" but screen resolution is 1920×1080 → blocked.
- UA says "Firefox" but the engine shows Chromium's V8 features → blocked.
The naïve way to spoof these is via Object.defineProperty(navigator,
'webdriver', {get: () => false}) and friends. This doesn't
work because the page can detect the hijack (the property shows up
with the wrong descriptor, getters are inconsistent with normal browser
behavior, and the underlying browser engine still has the real values in
many internal call sites).
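The consistency combinations listed in this subsection can be sketched as a few cross-field rules. This is a toy checker under assumed field names (`user_agent`, `platform`, `webgl_renderer`, `screen_width`); real detectors test far more combinations and do so inside the page.

```python
from typing import Dict, List


def consistency_problems(fp: Dict[str, object]) -> List[str]:
    """Return the cross-field contradictions found in a fingerprint dict."""
    ua = str(fp.get("user_agent", ""))
    platform = str(fp.get("platform", ""))
    gpu = str(fp.get("webgl_renderer", ""))
    width = int(fp.get("screen_width", 0))

    problems: List[str] = []
    if "Windows" in ua and "Apple" in gpu:
        problems.append("Windows UA with Apple GPU")
    if "Macintosh" in ua and platform.startswith("Linux"):
        problems.append("Mac UA with Linux platform")
    if "iPhone" in ua and width > 1000:
        problems.append("iPhone UA with desktop-sized screen")
    return problems


# Hypothetical fingerprints: one contradictory, one coherent.
bad = consistency_problems({
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "platform": "Win32",
    "webgl_renderer": "Apple M1",
    "screen_width": 1920,
})
good = consistency_problems({
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "platform": "Win32",
    "webgl_renderer": "NVIDIA GeForce GTX 1650",
    "screen_width": 1920,
})
```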
What Camoufox does instead
Camoufox is a patched Firefox build. The patches are in the C++ core of the browser (Gecko), not in JavaScript that runs in the page. Specifically:
Fingerprint injection at the C++ level
When the Python launcher starts Camoufox, it generates a fingerprint
(UA, screen size, GPU, fonts, locale, timezone, WebRTC, etc.) and passes
it to the browser via environment variables. The patched Firefox C++ code
reads these in the right places and returns the spoofed values everywhere
— not just at the JS-visible navigator.webdriver, but in the
WebGL implementation, the canvas layer, the history API, the screen
geometry, the font enumeration, and a hundred other call sites a real
detection script checks.
Crucially, the spoofed values come from BrowserForge, a generator that mimics the statistical distribution of real-world device data. So your "Windows 11 Firefox" session looks like the population of real Windows 11 Firefox users (right screen size, right fonts, right GPU, right market share), not a cardboard cutout built from the single most popular combination.
Automation protocol hidden from the page
Playwright normally controls Firefox over Juggler, a custom
automation protocol. Juggler runs in its own isolated XPCOM module inside
Firefox. Camoufox patches Juggler further: when the page calls
document.querySelector, Juggler doesn't inject any helper
JavaScript that the page can detect. There's no
window.__playwright__binding__, no Runtime.evaluate
artifacts, no chrome.csi() traces.
In addition, the automation happens on a "shadow copy" of the page:
Camoufox intercepts the Juggler call, returns the data Playwright expects,
and the real page never knows it was queried. Even tricks like
Object.getOwnPropertyDescriptor on hijacked properties can't
catch it, because the patches are at the C++ level where there is no JS
object to inspect.
Headless mode made to look like a window
Default headless Firefox exposes itself via a few telltale signals (no pointer type, weird screen dimensions). Camoufox patches these so the headless mode is indistinguishable from a normal window. As a fallback, the Python library can also run Camoufox in a virtual display buffer (X server in a container) for the cases where headless leaks.
What's the catch?
Camoufox isn't magic. It currently scores around 90% on DataDome and ~70–82% on Akamai Bot Manager in aggressive mode (per independent benchmarks from 2026-04–05). The remaining failures are usually:
- Behavioral signals: no mouse movement, no scroll, no time on page. Camoufox has a humanize=True option that adds synthetic mouse movement; this pushes the per-session cap from ~100 requests to ~300, not infinity.
- JS execution profiling: Akamai's most aggressive configs profile the order and timing of JS function calls during page load. Camoufox's patched Firefox still looks a little different from a vanilla Firefox in this profile, and Akamai has been updating their detector to catch it. This is the current cat-and-mouse frontier.
- Cold start: first launch in a fresh container takes ~90s (download + verify the Firefox fork, compile bindings). Subsequent launches are ~3–4s. This is fine for a persistent worker pool, painful for serverless.
What I tested that did NOT work
| Approach | What happened | Why |
|---|---|---|
| curl replay of the XHR | HTTP 403, Akamai Reference #18.6b841402.1780679402.2fb6d206 | TLS JA3 doesn't match any known browser, HTTP/2 fingerprint is libcurl, no JS context for the Bot Manager script |
| Python requests | Same as curl | Different library (urllib3/OpenSSL) but still a non-browser TLS fingerprint, same fate |
| headless Chromium (Playwright default) | Likely 403 (didn't run, but well-known) | navigator.webdriver=true, exposed CDP artifacts, Chromium has its own fingerprinting problems |
| playwright-stealth plugin (patches JS, not the engine) | ~60% success on DataDome, lower on Akamai (per third-party benchmarks) | JS-injection of spoofed values is detectable: hijacked getters have wrong descriptors, Object.getOwnPropertyDescriptor exposes them, engine internals still have the real values |
| n8n HTTP Request node | 403 (would be the same as requests) | n8n's HTTP node is a thin wrapper around an ordinary HTTP client library (axios); no browser, no fingerprint |
Bottom line
The endpoint exists, the data exists, the JSON is clean. The only thing standing between your server and the data is Akamai's fingerprint check. A real anti-detect browser (Camoufox is the open-source option, there are several commercial ones) gets you through. Everything else is window dressing.
Next: how n8n fits in.
3. n8n in the loop
What n8n is good for
- Receiving inputs (webhook, schedule, email, Slack command, Google Sheet row, anything that has an n8n node).
- Calling an external HTTP API to do the actual work.
- Transforming the response (rename fields, compute deltas, build a single row out of many).
- Pushing the result somewhere (Sheets, Postgres, Slack, your CRM, another webhook, an email).
- Retrying on failure, logging, alerting, scheduling.
What n8n is NOT good for
- Anything that requires a real browser. The HTTP Request node uses axios under the hood. It will get the same 403 from Akamai that curl does.
- Running a long-lived Camoufox session. n8n's execution model is per-workflow-run, not a persistent worker. You'd risk paying the ~90s cold start on every run that launches the browser in a fresh container.
- Sneaking past behavioral anti-bot checks. Even if you ran a browser inside n8n, you'd be lacking the timing, scroll, and mouse patterns a real-looking session needs.
The right shape: n8n + a browser worker
The browser worker is a tiny HTTP service that owns one (or a small
pool of) Camoufox sessions. It exposes a simple API like
POST /track with a JSON body, and returns the MSC tracking
JSON (or whatever the target site gives back).
┌─────────┐   POST {container:"MEDU..."}     ┌──────────────┐
│  n8n    │ ───────────────────────────────▶ │  browser     │
│ workflow│                                  │  worker      │
│         │ ◀─────────────────────────────── │  (Camoufox,  │
└─────────┘   JSON body from MSC API         │  port 9090)  │
                                             └──────────────┘
The browser worker is a small Python service — see the deploy guide for the actual code.
Example n8n workflow: "track this container every 6 hours"
- Cron trigger: fires every 6 hours.
- Read sheet: Google Sheets node reads the "containers to track" column, returns N rows.
- Loop (Split In Batches): one item at a time.
- HTTP Request: POST to http://127.0.0.1:9090/track with the container number. Returns the MSC JSON.
- Code node (optional): pick the fields you care about (LatestMove, PodEtaDate, Delivered, last event description).
- Append row: write the result to a "tracking history" sheet, or a Postgres table, or whatever destination you like.
- IF node: if LatestMove changed since last run, send a Slack message. Otherwise, do nothing.
The whole thing takes about 10 minutes to set up in n8n's visual editor. The actual hard work (opening a real browser, getting past Akamai) lives in the worker, which you can think of as a black box from n8n's point of view.
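The "did LatestMove change since last run" decision from the workflow above can be sketched as plain Python. The function names are hypothetical, and the field names assume the worker's flattened output (`container`, `latest_move`, `pod_eta`); in n8n this logic would live in the IF node or a Code node.

```python
from typing import Dict, Optional


def latest_move_changed(previous: Optional[Dict], current: Dict) -> bool:
    """True if the container moved since the last stored result."""
    if previous is None:
        return True  # first sighting counts as a change
    return previous.get("latest_move") != current.get("latest_move")


def slack_text(current: Dict) -> str:
    """One-line notification built from the flattened worker response."""
    return (
        f"{current['container']}: now at {current['latest_move']} "
        f"(ETA {current['pod_eta']})"
    )


# Hypothetical previous and current runs for the same container.
last_run = {"container": "MEDU9730548", "latest_move": "MOIN, CR", "pod_eta": "2026-07-04"}
this_run = {"container": "MEDU9730548", "latest_move": "COLON, PA", "pod_eta": "2026-07-04"}

message = slack_text(this_run) if latest_move_changed(last_run, this_run) else None
```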
The minimal browser-worker contract
What n8n expects to be able to call:
POST /track
Content-Type: application/json
{
  "container": "MEDU9730548"
}
→ 200 OK
{
  "container": "MEDU9730548",
  "bill_of_lading": "MEDUJ0738446",
  "shipped_from": "MOIN, CR",
  "shipped_to": "YOKOHAMA, JP",
  "port_of_load": "MOIN, CR",
  "port_of_discharge": "YOKOHAMA, JP",
  "transshipments": ["CRISTOBAL, PA", "RODMAN, PA"],
  "pod_eta": "2026-07-04",
  "latest_move": "COLON, PA",
  "delivered": false,
  "events": [
    {"date": "2026-07-04", "location": "YOKOHAMA, JP", "description": "Estimated Time of Arrival"},
    ...
  ]
}
→ 404 if the container number doesn't exist
→ 502 if MSC is down
→ 429 if you're being rate-limited (worker should back off)
The worker flattens MSC's deeply-nested response into a flat shape that's
easy to use downstream. The full MSC JSON is also kept in the response
(or in a separate raw field) for cases where you want it.
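A caller that honours this contract might look like the sketch below. It is an assumption-heavy illustration: `call_worker`, the injected `post` callable, and the 30-second base delay are hypothetical, and the transport is passed in so the retry logic can be exercised without a running worker.

```python
import time
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict]  # (HTTP status, parsed JSON body)


def call_worker(post: Callable[[str], Response], container: str,
                max_attempts: int = 4, base_delay_s: float = 30.0,
                sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Call POST /track, backing off exponentially while the worker answers 429."""
    for attempt in range(max_attempts):
        status, body = post(container)
        if status == 200:
            return body
        if status == 404:
            raise LookupError(f"unknown container {container}")
        if status == 429 and attempt < max_attempts - 1:
            sleep(base_delay_s * 2 ** attempt)  # 30 s, 60 s, 120 s, ...
            continue
        raise RuntimeError(f"worker returned HTTP {status}")
    raise RuntimeError("no attempts were made")


# Fake transport: rate-limited twice, then succeeds.
calls = []


def fake_post(container: str) -> Response:
    calls.append(container)
    if len(calls) < 3:
        return 429, {}
    return 200, {"container": container}


waits = []
result = call_worker(fake_post, "MEDU9730548", sleep=waits.append)
```

In n8n the same behaviour is usually configured through the HTTP Request node's retry settings rather than written by hand.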
Why not call MSC directly from n8n?
I tested it. Same endpoint, same body, called from n8n's HTTP Request
node: HTTP 403. n8n uses axios, which, like curl, presents a
non-browser TLS fingerprint and has no JS context. Adding
headers like User-Agent: Mozilla/5.0 doesn't help, because the
fingerprint lives below the HTTP layer.
You can install browser-automation nodes in n8n (e.g.
n8n-nodes-puppeteer), but they use Chromium, which is
easier to detect than Camoufox's Firefox, and you'd still need to wire
up the fingerprint rotation, the cold-start caching, and the
Akamai-specific workarounds. Easier to keep the browser in a separate
service that n8n just calls.
TL;DR
- n8n = workflow brain. No scraping.
- Browser worker = scraping. No workflow logic.
- Wire them together with a 30-line HTTP contract.
- Deploy both on the same Oracle Free Tier VM (they use about 1.5 GB of RAM total).
4. Oracle Free Tier VPS guide
0. What you're getting for $0/month
| Resource | Always Free allocation |
|---|---|
| ARM Ampere A1 Compute (VM.Standard.A1.Flex) | 4 OCPUs + 24 GB RAM, as 1 VM or split across up to 4 VMs |
| AMD micro instances (VM.Standard.E2.1.Micro) | 2 instances, 1/8 OCPU + 1 GB RAM each |
| Block volume (boot + data) | 200 GB total, 47 GB minimum boot |
| Outbound data transfer | 10 TB / month |
| Public IPv4 | 1 free (more can be assigned, ephemeral) |
| Load Balancer | 1 Flexible LB (10 Mbps), 1 Network LB |
| Object Storage | 20 GB |
| Autonomous Database | 2 instances |
| Trial credit (one-time, optional) | $300 valid 30 days for any OCI service |
For this stack (n8n + browser worker + Postgres), an ARM A1 with 2 OCPUs and 12 GB RAM is more than enough. That leaves half your free quota in reserve for scaling out.
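Idle-instance warning: Oracle can reclaim Always Free instances that sit idle for 7 days (see "Instance was reclaimed after 7 days idle" in section 11). To stay clear of that, add a
cron job that does something useful every few days
(e.g. apt update, or call the worker on a dummy container).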
1. Sign up — credit card required
- Go to oracle.com/cloud/free and click Start for free.
- Enter an email address (you can use Gmail). One cloud account per email — pick one you'll keep access to.
- Choose a password (8+ chars, 1 lower + 1 upper + 1 digit + 1 special, max 40 chars, no spaces, no ~ < > \).
- Country / Territory: pick yours. Some countries (e.g. Russia) require manually accepting ToS via checkboxes; the rest are click-through.
- Name — first + last.
- Click I'm a human for the hCaptcha.
- Check your inbox, click the verification link (valid 30 min).
- Re-enter the password to confirm.
- Pick a Cloud Account Name (used to log in, becomes a subdomain in the console URL).
- Accept Terms of Use → Continue.
- Enter address + phone number → Continue.
- Home Region — pick São Paulo (sa-saopaulo-1). This is the region your Always Free resources will live in, and you can't change it later. This is the single most important choice in the entire flow.
- Add payment verification method → Credit Card.
If Oracle's website refuses the signup for your country, contact Oracle Sales directly and they can sometimes provision a free promotion manually.
2. Provision the VM (ARM Ampere A1, São Paulo)
- Log in to the OCI Console at https://cloud.oracle.com/.
- Top-left hamburger menu → Compute → Instances → Create instance.
- Name: e.g. msc-scraper.
- Placement: compartment = root (default). Make sure the region dropdown shows São Paulo (this is non-negotiable for Always Free).
- Image and shape:
- Click Edit next to "Image and shape".
- Image: Canonical Ubuntu 22.04 (or newer LTS, aarch64). The ARM image, not the x86 one — they have different filenames.
- Shape: Ampere → VM.Standard.A1.Flex.
- Allocate 2 OCPUs and 12 GB RAM. (You can change later; max free is 4 OCPUs / 24 GB across all your A1 VMs.)
- Boot volume: leave at the default 47 GB or bump to 100 GB (counts against the 200 GB free block-storage budget).
- Networking: pick the auto-created VCN. Make sure Assign a public IPv4 address is checked. Note the public IP — you'll need it for SSH.
- SSH keys: either upload your public key (~/.ssh/id_ed25519.pub) or have OCI generate a key pair (download the private key, chmod 600 it).
- Click Create. Provisioning takes 1–3 minutes. The instance state will go from PROVISIONING → RUNNING.
If A1 capacity is unavailable, retry later or try a different availability domain (e.g. SA-SAOPAULO-1AD-1
vs SA-SAOPAULO-1AD-2). If you can't get A1 in São Paulo
specifically, the closest substitute is a single VM.Standard.E2.1.Micro
(1 GB RAM, AMD), which is enough to run n8n but not a headless browser.
3. Open the firewall (security list)
By default, the VCN's default security list only allows SSH (port 22) inbound. You need to open 80, 443, and optionally 5678 (n8n). Port 9090 (the browser worker) should stay closed to the public internet.
- Compute → Instances → click your instance → Subnet link on the right.
- On the subnet page → Security Lists → click the default security list.
- Add Ingress Rules:
Source CIDR: 0.0.0.0/0
Protocol: TCP
Destination Port Range:
22 (SSH)
80 (HTTP, for certbot / n8n behind nginx)
443 (HTTPS, n8n behind nginx)
5678 (n8n direct access, optional)
9090 (browser worker; restrict to localhost or your tailnet only, never 0.0.0.0/0)
- For port 9090 specifically, lock it down to your own IP (or only the localhost-side n8n, in which case you don't need the ingress rule at all). The browser worker should not be exposed publicly.
On the instance itself, also open the public ports in the OS-level firewall (iptables):
sudo iptables -I INPUT 6 -m state --state NEW -p tcp --dport 80 -j ACCEPT
sudo iptables -I INPUT 6 -m state --state NEW -p tcp --dport 443 -j ACCEPT
sudo iptables -I INPUT 6 -m state --state NEW -p tcp --dport 5678 -j ACCEPT
sudo netfilter-persistent save
4. SSH in and install Docker + Docker Compose
ssh -i ~/.ssh/oci-key ubuntu@<public-ip>
sudo apt update && sudo apt upgrade -y
sudo apt install -y ca-certificates curl gnupg lsb-release ufw
# Docker
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) \
signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io \
docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker
docker --version
docker compose version
5. The Camoufox browser worker (small Python service)
Save this as /opt/msc-scraper/worker.py:
"""
MSC tracker browser worker.
Uses Camoufox (anti-detect headless Firefox) to hit MSC's tracking API
without getting 403'd by Akamai.
"""
import json
import logging
from datetime import datetime
from typing import Any, Dict, List

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field
import uvicorn
from camoufox.sync_api import Camoufox

app = FastAPI(title="MSC Browser Worker")
log = logging.getLogger("worker")
logging.basicConfig(level=logging.INFO)


class TrackRequest(BaseModel):
    container: str = Field(..., min_length=4, max_length=20, pattern=r"^[A-Z0-9]+$")


def to_iso(value: str) -> str:
    """Convert MSC's DD/MM/YYYY dates to ISO YYYY-MM-DD; return anything else unchanged."""
    try:
        return datetime.strptime(value, "%d/%m/%Y").date().isoformat()
    except (TypeError, ValueError):
        return value


def flatten(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten MSC's nested response into something n8n can use directly."""
    bol = raw["Data"]["BillOfLadings"][0]
    c = bol["ContainersInfo"][0]
    gti = bol["GeneralTrackingInfo"]
    events: List[Dict] = []
    # Newest first, using MSC's own ordering field when present.
    for e in sorted(c["Events"], key=lambda x: x.get("Order", 0), reverse=True):
        detail = e.get("Detail") or []
        events.append({
            "date": to_iso(e.get("Date", "")),
            "location": e.get("Location", ""),
            "description": e.get("Description", ""),
            "vessel": detail[0] if detail else "",
            "voyage": detail[1] if len(detail) > 1 else "",
        })
    return {
        "container": c["ContainerNumber"],
        "bill_of_lading": bol["BillOfLadingNumber"],
        "shipped_from": gti["ShippedFrom"],
        "shipped_to": gti["ShippedTo"],
        "port_of_load": gti["PortOfLoad"],
        "port_of_discharge": gti["PortOfDischarge"],
        "transshipments": gti.get("Transshipments", []),
        # The contract promises ISO dates; MSC sends DD/MM/YYYY.
        "pod_eta": to_iso(c.get("PodEtaDate", "")),
        "latest_move": c.get("LatestMove", ""),
        "container_type": c.get("ContainerType", ""),
        "delivered": c.get("Delivered", False),
        "events": events,
        "raw": raw,
    }


@app.post("/track")
def track(req: TrackRequest):
    log.info("tracking %s", req.container)
    captured: List[Dict] = []
    with Camoufox(headless=True, humanize=True) as browser:
        page = browser.new_page()
        # install an XMLHttpRequest interceptor inside the page
        page.add_init_script("""
            window.__cap = [];
            const origOpen = XMLHttpRequest.prototype.open;
            const origSend = XMLHttpRequest.prototype.send;
            XMLHttpRequest.prototype.open = function(m,u){ this.__m={m,u}; return origOpen.apply(this,arguments); };
            XMLHttpRequest.prototype.send = function(b){
                this.addEventListener('load',()=>{
                    if (this.__m && String(this.__m.u).includes('TrackingInfo')) {
                        window.__cap.push({status:this.status, body:this.responseText});
                    }
                });
                return origSend.apply(this,[b]);
            };
        """)
        page.goto("https://www.msc.com/en/track-a-shipment", wait_until="domcontentloaded")
        page.wait_for_timeout(2500)
        # dismiss cookie banner
        try:
            page.get_by_role("button", name="Accept All").click(timeout=2000)
        except Exception:
            pass
        page.fill('input[placeholder*="Container"]', req.container)
        # brittle selector: third button on the page is the search button
        page.get_by_role("button").filter(has_text="").nth(2).click()
        page.wait_for_timeout(4500)
        captured = page.evaluate("window.__cap")
    if not captured or captured[-1]["status"] != 200:
        raise HTTPException(502, "MSC did not return a 200; the XHR interceptor saw nothing useful")
    raw = json.loads(captured[-1]["body"])
    if not raw.get("IsSuccess"):
        raise HTTPException(404, f"MSC says: {raw}")
    return flatten(raw)


@app.get("/health")
def health():
    return {"ok": True}


if __name__ == "__main__":
    uvicorn.run(app, host="127.0.0.1", port=9090)
Then:
sudo mkdir -p /opt/msc-scraper && cd /opt/msc-scraper
sudo chown ubuntu:ubuntu /opt/msc-scraper
python3 -m venv .venv
source .venv/bin/activate
pip install fastapi uvicorn pydantic camoufox
camoufox fetch # downloads the Firefox fork
python worker.py # test it works
# in another terminal:
curl -X POST http://127.0.0.1:9090/track \
-H 'content-type: application/json' \
-d '{"container":"MEDU9730548"}' | jq .
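Since every request costs a browser launch, it can pay to reject malformed container numbers before touching Camoufox. The sketch below checks the ISO 6346 check digit (the last character of a container number). It is an optional addition, not part of the worker above, and `has_valid_check_digit` is a hypothetical helper name.

```python
import re
from typing import Dict


def _letter_values() -> Dict[str, int]:
    """ISO 6346 letter values: A=10 upward, skipping multiples of 11."""
    values: Dict[str, int] = {}
    value = 10
    for letter in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
        if value % 11 == 0:
            value += 1
        values[letter] = value
        value += 1
    return values


LETTER_VALUES = _letter_values()


def has_valid_check_digit(container: str) -> bool:
    """True if the 11-character container number passes the ISO 6346 check."""
    if not re.fullmatch(r"[A-Z]{4}[0-9]{7}", container):
        return False
    total = 0
    for position, char in enumerate(container[:10]):
        value = LETTER_VALUES[char] if char.isalpha() else int(char)
        total += value * (2 ** position)  # weights 1, 2, 4, ... 512
    return (total % 11) % 10 == int(container[10])


ok = has_valid_check_digit("MEDU9730548")      # the container used in this guide
typo = has_valid_check_digit("MEDU9730547")    # last digit altered
```

One way to use it would be to return a 404 (or 422) from `/track` when the check fails, so typos never reach MSC.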
6. n8n with Postgres (Docker Compose)
Save this as /opt/msc-scraper/docker-compose.yml:
services:
  postgres:
    image: postgres:16
    restart: always
    environment:
      POSTGRES_DB: n8n
      POSTGRES_USER: n8n
      POSTGRES_PASSWORD: change-me-strong
    volumes:
      - pg_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U n8n -d n8n"]
      interval: 5s
      retries: 10
  n8n:
    image: n8nio/n8n:latest
    restart: always
    depends_on:
      postgres:
        condition: service_healthy
    environment:
      DB_TYPE: postgresdb
      DB_POSTGRESDB_HOST: postgres
      DB_POSTGRESDB_PORT: 5432
      DB_POSTGRESDB_DATABASE: n8n
      DB_POSTGRESDB_USER: n8n
      DB_POSTGRESDB_PASSWORD: change-me-strong
      N8N_HOST: n8n.yourdomain.com # or your public IP
      WEBHOOK_URL: http://<public-ip>:5678/
      N8N_PROTOCOL: http
      GENERIC_TIMEZONE: America/Argentina/Buenos_Aires
    ports:
      - "127.0.0.1:5678:5678"
    volumes:
      - n8n_data:/home/node/.n8n
volumes:
  pg_data:
  n8n_data:
cd /opt/msc-scraper
docker compose up -d
docker compose ps
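First-time setup: the Compose file above publishes n8n on 127.0.0.1:5678 only, so
http://<public-ip>:5678 is not reachable from outside as written. Either open an SSH tunnel
(ssh -L 5678:127.0.0.1:5678 ubuntu@<public-ip>) and browse to http://localhost:5678, or change
the port mapping to "5678:5678" to expose it directly. Then create the owner account, and you're in.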
7. Point n8n at the worker
Inside n8n, build a workflow like the one in the n8n section. The HTTP Request node should call:
POST http://127.0.0.1:9090/track
Content-Type: application/json
{
"container": "{{ $json.container_number }}"
}
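Since both n8n and the worker run on the same VM, no extra security-list rules are needed for this call, and because the worker listens on 127.0.0.1 the outside world can't see it at all. One caveat: n8n runs inside a Docker container, where 127.0.0.1 is the container's own loopback, not the VM's. For the URL above to reach the worker, either run the n8n service with host networking (network_mode: host, with DB_POSTGRESDB_HOST pointed at a Postgres port published on 127.0.0.1), or keep the bridge network, map host.docker.internal to host-gateway via extra_hosts, bind the worker to an address the Docker bridge can reach, and call http://host.docker.internal:9090/track instead.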
8. Make it survive reboots
The Docker stack has restart: always — fine. The Python
worker is the only thing not in a container. Wrap it in a
systemd unit:
sudo tee /etc/systemd/system/msc-worker.service > /dev/null << 'EOF'
[Unit]
Description=MSC browser worker
After=network.target
[Service]
User=ubuntu
WorkingDirectory=/opt/msc-scraper
Environment=PATH=/opt/msc-scraper/.venv/bin:/usr/local/bin:/usr/bin:/bin
ExecStart=/opt/msc-scraper/.venv/bin/python /opt/msc-scraper/worker.py
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now msc-worker
sudo systemctl status msc-worker
9. Verify against MSC
curl -X POST http://127.0.0.1:9090/track \
-H 'content-type: application/json' \
-d '{"container":"MEDU9730548"}' | jq '{
container: .container,
bol: .bill_of_lading,
from: .shipped_from,
to: .shipped_to,
pod_eta: .pod_eta,
latest_move: .latest_move,
delivered: .delivered,
events: .events | length
}'
Expected output: JSON object with container: "MEDU9730548",
pod_eta: "2026-07-04" (or later, depending on the day),
and events: 8.
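The same verification can be scripted so it runs after every deploy. The sketch below only checks the shape of a response against the contract from the n8n section; `contract_problems` is a hypothetical helper, and the sample dictionary is a trimmed stand-in for a real worker response.

```python
from typing import Dict, List

# Field names taken from the worker contract described earlier.
REQUIRED_FIELDS = (
    "container", "bill_of_lading", "shipped_from", "shipped_to",
    "port_of_load", "port_of_discharge", "transshipments",
    "pod_eta", "latest_move", "delivered", "events",
)


def contract_problems(resp: Dict) -> List[str]:
    """List everything in a worker response that violates the contract."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in resp]
    if not isinstance(resp.get("events", []), list):
        problems.append("events is not a list")
    if not isinstance(resp.get("delivered", False), bool):
        problems.append("delivered is not a boolean")
    return problems


# Trimmed stand-in for a real response from POST /track.
sample = {
    "container": "MEDU9730548",
    "bill_of_lading": "MEDUJ0738446",
    "shipped_from": "MOIN, CR",
    "shipped_to": "YOKOHAMA, JP",
    "port_of_load": "MOIN, CR",
    "port_of_discharge": "YOKOHAMA, JP",
    "transshipments": ["CRISTOBAL, PA", "RODMAN, PA"],
    "pod_eta": "2026-07-04",
    "latest_move": "COLON, PA",
    "delivered": False,
    "events": [],
}
problems = contract_problems(sample)
```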
10. Cost & limits cheat sheet
| Item | What it costs |
|---|---|
| ARM A1 instance (2 OCPU / 12 GB) | $0 forever (Always Free) |
| Block volume (100 GB) | $0 (within 200 GB Always Free) |
| Outbound data (up to 10 TB/mo) | $0 |
| Public IPv4 | $0 |
| Credit card hold at signup | USD $1, reversed in 3–5 days, no charge |
| $300 trial credit (optional) | Free, expires after 30 days or when used up |
| Your time | ~45 minutes, mostly waiting for VM provisioning |
11. What to do if something breaks
"Out of host capacity" when creating the A1 instance
The São Paulo A1 pool is the most contested free resource on the internet. Try again in 10–30 minutes, or pick a different AD (availability domain) in São Paulo. As a last resort, the E2.1.Micro shape (1 GB RAM, AMD) is also free, but too small for a headless browser.
Worker logs say "MSC did not return a 200"
MSC's UI changes periodically. Re-run
capture.py (from this guide's tests/ folder
if you have the full repo), grab a fresh screenshot, and update the
selectors in worker.py. The XHR endpoint
/api/feature/tools/TrackingInfo has been stable since at
least 2024, so the body parser is unlikely to need changes.
Instance was reclaimed after 7 days idle
Add a cron job that does work every few days:
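# A cron entry must fit on one line; this appends to root's crontab instead of replacing it.
(sudo crontab -l 2>/dev/null; echo '15 9 */3 * * curl -sX POST http://127.0.0.1:9090/track -H "content-type: application/json" -d "{\"container\":\"MEDU9730548\"}" > /dev/null') | sudo crontab -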
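Oracle's idle check looks at CPU, network, and memory utilization over a 7-day window, each against a 20% threshold, so one request every 3 days is a minimum rather than a guarantee. Check the instance's utilization metrics in the OCI console to confirm it stays above the threshold.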
You want HTTPS for n8n
Add a Caddy or nginx reverse proxy in front of n8n on port 443, and use Let's Encrypt (certbot) for the certificate. The Hetzner n8n guide (linked in the references) walks through this with Caddy in ~5 minutes.
12. References (verified 2026-06-05)
- Oracle — Free Tier landing
- Oracle — Always Free Resources (official docs)
- Oracle — Sign-up guide (official docs)
- Camoufox — homepage & stealth overview
- n8n self-hosting on OCI — step-by-step on dev.to