fix(model-metadata): guard against models.dev underreports at step 5

The resolution order in get_model_context_length() trusts the models.dev
lookup at step 5 over the curated DEFAULT_CONTEXT_LENGTHS table at step 8.
When models.dev reports a stale or incorrectly low value, resolution
stops at step 5 and the correct curated default is never reached.

Concrete case: models.dev reports `minimax-m3-free` on the opencode
provider as 200K context, but `DEFAULT_CONTEXT_LENGTHS['minimax-m3']` is
1M. Without this guard, the agent's effective context window is 5x too
small: Hermes auto-compresses at 72% of 200K (~144K) instead of at 72%
of the real 1M window (~720K).
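
The arithmetic behind those figures, as a minimal sketch. The 72%
threshold and both context lengths come from the description above;
`compression_trigger` is a hypothetical name used only for illustration
and is not a function in the codebase.

```python
# Threshold from the commit text: auto-compression starts at 72% of the window.
COMPRESSION_THRESHOLD_PCT = 72


def compression_trigger(context_length: int) -> int:
    """Token count at which auto-compression would start (integer math)."""
    return context_length * COMPRESSION_THRESHOLD_PCT // 100


reported = 200_000   # value models.dev reports for minimax-m3-free
curated = 1_000_000  # DEFAULT_CONTEXT_LENGTHS['minimax-m3']

print(compression_trigger(reported))  # 144000
print(compression_trigger(curated))   # 720000
print(curated // reported)            # 5
```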

Complements upstream PR #36726, which drops stale <=204,800 cache
entries for the M3 family at step 1. That PR catches the symptom (stale
cache from pre-catalog builds); this patch catches the root cause: any
future fresh-lookup underreport, not just M3.

Mirrors the existing Kimi guard in the OpenRouter path (step 6 below):
same pattern, generalised to any curated-vs-live drift. Adds the
`_curated_context_length` helper that mirrors the longest-key-first
substring match used by step 8 so the guard can compare apples to apples.

Tests:
- 3 helper tests (M3 family -> 1M, M2.5 -> 204,800, unknown -> None)
- 3 step-5 guard tests (underreport rejected, larger value accepted,
  equal value accepted)
- 1 end-to-end live-resolution test for the original bug

Did NOT add a generic step-1 cache invalidation: cached values are
persistent user data, and a `curated > cached` heuristic cannot reliably
distinguish a known underreport from a legitimate provider-specific cap
(Codex gpt-5.5 is 272K vs curated 1.05M; Nous qwen3.6-plus is cached at
1M vs curated 1,048,576). The fresh-lookup guard at step 5 covers the
in-the-wild underreport case without false-positive risk.
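
To illustrate the false-positive risk, here is a small sketch of the
rejected heuristic applied to the two examples above.
`naive_cache_is_stale` is a hypothetical name for the idea that was NOT
implemented, and 1,050,000 is an approximation of the "1.05M" curated
figure.

```python
# (cached, curated) pairs taken from the examples in this message.
cases = {
    "Codex gpt-5.5": (272_000, 1_050_000),        # legitimate provider cap
    "Nous qwen3.6-plus": (1_000_000, 1_048_576),  # rounding-level difference
}


def naive_cache_is_stale(cached: int, curated: int) -> bool:
    """The rejected heuristic: treat any cached value below curated as stale."""
    return curated > cached


for name, (cached, curated) in cases.items():
    # Both entries would be flagged, although neither is an underreport.
    print(name, naive_cache_is_stale(cached, curated))
```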