# Risks & Failure Modes

## Risk Matrix

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| Hevy changes notification format | Medium | High (bridge stops reading) | Multi-layer fallback; monitor Hevy updates |
| Hevy UI resource IDs change | High | Medium (accessibility breaks) | Text-based fallback; coordinate fallback |
| Hevy removes live notification | Low | Critical | Accessibility fallback; internal API |
| Fitbit companion killed by Android | Medium | Medium (watch loses live data) | fitbit-asap retry; watch local state |
| Bridge APK killed by Android | Medium | High (no state updates) | Foreground service; battery exemption |
| Internal API changes/blocked | High | Low (stretch feature) | Not relied on for MVP |
| Google blocks Accessibility use | Low | Medium (if published on Play) | Sideload; don't publish on Play |
| Network failure (phone offline) | Medium | Medium (no backend sync) | Watch local state; sync when reconnected |
| Tailscale/VPS unreachable | Low | High (full bridge down) | Health checks; local fallback |
| Hevy Wear OS sync interferes | Low | Low (unlikely conflict) | Test with both active |
| Race conditions: watch + phone both logging | Medium | Medium | Backend as source of truth; last-write-wins |

## Detailed Failure Modes

### F1: Hevy Notification Format Change

**Scenario**: Hevy updates their app and changes the notification layout. The parsed fields (`%antext`, extras) contain different data or different formatting.

**Detection**:
- Bridge APK logs parse failures
- Backend sees empty/null state fields
- Health check endpoint reports `state_quality: degraded`

**Recovery**:
1. **Automatic fallback**: If notification parse fails, try AccessibilityService
2. **Alert**: Backend detects degraded state, notifies Pipo (logs/Telegram)
3. **Fix**: Update parsing logic to match new format
4. **Prevention**: Pinning the Hevy APK version is not practical, because the app needs Play Store updates

**Severity**: High. This is the primary data source. Mitigation is the multi-layer approach.
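The multi-layer approach can be sketched as an ordered list of readers that are tried until one yields a state. This is a minimal sketch, not the bridge's actual code: the `read_state` helper, the reader callables, and the `ok` and `unavailable` quality values are assumptions for illustration; only `degraded` comes from the health check described above.

```python
from typing import Callable, Optional

# A reader returns a parsed state dict, or None when it cannot parse.
# Real readers (notification, accessibility) are hypothetical here.
Reader = Callable[[], Optional[dict]]


def read_state(layers: list[tuple[str, Reader]]) -> dict:
    """Try each layer in order and report which one produced the state."""
    for index, (name, reader) in enumerate(layers):
        try:
            state = reader()
        except Exception:
            state = None  # a crashing layer counts as a parse failure
        if state:
            # Anything other than the primary layer is reported as degraded.
            quality = "ok" if index == 0 else "degraded"
            return {"source": name, "state_quality": quality, "state": state}
    # Assumed value for "no layer worked"; not defined in the source design.
    return {"source": None, "state_quality": "unavailable", "state": None}
```

Listing the notification reader first and the accessibility reader second would reproduce the automatic fallback in step 1, and the returned `state_quality` can feed the backend alert in step 2.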

### F2: Bridge APK Killed by Android

**Scenario**: Android's battery optimization kills the bridge APK during a workout. No state updates reach the backend. Watch shows stale data.

**Detection**:
- Backend heartbeat: bridge POSTs every N seconds; if missing for 2× interval, declare bridge down
- Watch companion receives stale data and can show "Bridge offline" indicator
- On devices where Android auto-restarts foreground services, a bridge restart mid-workout is itself a sign that the process was killed
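The heartbeat rule (declare the bridge down after 2× the interval without a POST) could look roughly like this on the backend. It is a sketch under assumptions: the `HeartbeatMonitor` class and its method names are hypothetical, and the interval value is left to the caller because the source only specifies "N seconds".

```python
import time
from typing import Optional


class HeartbeatMonitor:
    """Tracks the last bridge heartbeat and flags the bridge as down."""

    def __init__(self, interval_s: float, missed_factor: float = 2.0):
        self.interval_s = interval_s
        self.missed_factor = missed_factor  # 2x interval, per the rule above
        self.last_seen: Optional[float] = None

    def beat(self, now: Optional[float] = None) -> None:
        """Record a heartbeat; call this from the POST handler."""
        self.last_seen = time.monotonic() if now is None else now

    def is_down(self, now: Optional[float] = None) -> bool:
        """True when no heartbeat arrived within missed_factor * interval."""
        if self.last_seen is None:
            return True  # never heard from the bridge
        now = time.monotonic() if now is None else now
        return (now - self.last_seen) > self.interval_s * self.missed_factor
```

The optional `now` argument exists only so the timing logic can be tested without sleeping; in normal use both methods fall back to `time.monotonic()`, which is unaffected by wall-clock changes.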

**Recovery**:
1. **Foreground Service**: Keep bridge alive with persistent notification
2. **Battery exemption**: Prompt user to exempt from battery optimization
3. **Auto-restart**: Return `START_STICKY` from `onStartCommand()` so Android recreates the service after killing it
4. **Watch self-sufficiency**: Watch keeps local state; can continue logging without bridge
5. **Reconnection**: On bridge restart, re-read Hevy notification state, push to backend

**Severity**: High. Core bridge functionality depends on this process being alive.

### F3: Network Partition (Phone Offline)

**Scenario**: Phone loses internet (gym basement, data dead zone). Bridge can't reach Pipo backend. Watch can't get live state.

**Detection**:
- Bridge APK detects `ConnectivityManager` network loss
- Backend detects missing heartbeats
- Watch companion sees fetch failures

**Recovery**:
1. **Watch local mode**: Watch has workout plan cached; continues logging locally
2. **Bridge queues state**: Buffer state changes, flush when network returns
3. **Companion retry**: Exponential backoff on fetch to backend
4. **No data loss**: Watch events are queued and synced when connectivity returns
5. **Post-workout sync**: When network returns, flush all queued data, create Hevy workout

**Severity**: Medium. This is a known constraint; the MVP's watch-local-state design handles it.
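The queue-and-flush behaviour and the exponential backoff from the recovery steps can be illustrated together. This is a sketch only: `OfflineQueue`, `backoff_delay`, the `send` callable, and the 1 s base and 60 s cap are assumptions, and the real bridge or companion would implement the same idea in its own language.

```python
import random
from collections import deque
from typing import Callable, Deque


def backoff_delay(attempt: int, base_s: float = 1.0, cap_s: float = 60.0) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2^attempt)]."""
    return random.uniform(0, min(cap_s, base_s * (2 ** attempt)))


class OfflineQueue:
    """Buffers events in order and flushes them once sending succeeds."""

    def __init__(self, send: Callable[[dict], bool]):
        self.send = send  # returns True when the backend accepted the event
        self.pending: Deque[dict] = deque()

    def enqueue(self, event: dict) -> None:
        self.pending.append(event)

    def flush(self) -> int:
        """Send queued events oldest-first; stop at the first failure."""
        sent = 0
        while self.pending:
            if not self.send(self.pending[0]):
                break  # keep the event queued for the next attempt
            self.pending.popleft()
            sent += 1
        return sent
```

An event is removed only after `send` reports success, so a failed flush leaves the queue intact and in order. The caller would wait `backoff_delay(attempt)` seconds between failed flushes and reset `attempt` after a successful one.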

### F4: Hevy Not Running / Workout Not Started

**Scenario**: Chicho forgets to start Hevy workout on phone before starting watch app. No Hevy notification exists.

**Detection**:
- Bridge APK polls `getActiveNotifications()` — no Hevy notification found
- Backend state shows `hevy_workout_active: false`

**Recovery**:
1. **Watch prompt**: Show "Start Hevy workout on phone" message
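2. **Bridge can launch Hevy**: Via a launch intent from the bridge (e.g. `PackageManager.getLaunchIntentForPackage()`) or `am start` over adb; opening the workout screen directly depends on discovering the correct intent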
3. **Accessibility can start workout**: If we automate the Hevy UI flow
4. **Standalone mode**: Watch can function as standalone workout logger; create Hevy workout at end

**Severity**: Low. User action needed, clear feedback on watch.

### F5: Race Condition — Watch and Phone Both Log Sets

**Scenario**: Chicho logs a set on the watch AND in Hevy on the phone simultaneously. Duplicate set recorded.

**Detection**:
- Backend receives two set-complete events with similar timestamps
- Set count mismatch between watch state and Hevy state

**Recovery**:
1. **Design choice**: In the recommended architecture, the **watch is the primary input device**. Hevy on the phone is read-only during the workout.
2. **Backend deduplication**: If both sources produce events, merge by timestamp + exercise + set index; flag duplicates
3. **Last-write-wins**: Backend is source of truth; when two writes conflict on the same set, the most recent one is kept. When the workout is finished, the backend creates the final Hevy workout

**Severity**: Medium. This is a design/UX problem, not a technical failure. Clear workflow addresses it.
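Backend deduplication by timestamp, exercise, and set index might be sketched as follows. The `SetEvent` fields, the `dedupe` function, and the 10-second window are assumptions chosen for illustration; preferring the watch event follows the design choice that the watch is the primary input device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SetEvent:
    source: str      # "watch" or "hevy" (hypothetical labels)
    exercise: str
    set_index: int
    ts: float        # seconds since epoch


def dedupe(events: list[SetEvent], window_s: float = 10.0) -> tuple[list[SetEvent], list[SetEvent]]:
    """Return (kept, flagged_duplicates), preferring watch events."""
    kept: dict[tuple[str, int], SetEvent] = {}
    duplicates: list[SetEvent] = []
    for ev in sorted(events, key=lambda e: e.ts):
        key = (ev.exercise, ev.set_index)
        prev = kept.get(key)
        if prev is None:
            kept[key] = ev
        elif abs(ev.ts - prev.ts) <= window_s:
            # Same set reported twice within the window: keep the watch event.
            if ev.source == "watch" and prev.source != "watch":
                kept[key] = ev
                duplicates.append(prev)
            else:
                duplicates.append(ev)
        else:
            kept[key] = ev  # later edit of the same set: last write wins
    return list(kept.values()), duplicates
```

Flagged duplicates are returned rather than dropped, so the backend can log them or surface them for review before creating the final Hevy workout.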

### F6: Hevy Accessibility UI Changes

**Scenario**: Hevy redesigns their workout screen. Resource IDs change, text labels change, button positions move. Accessibility-based automation breaks.

**Detection**:
- Accessibility service fails to find expected nodes
- `findAccessibilityNodeInfosByViewId()` returns empty
- Actions time out

**Recovery**:
1. **Fallback strategy**: Try by text: `findAccessibilityNodeInfosByText("Finish Set")`
2. **Coordinate fallback**: Pre-recorded tap coordinates (fragile but works)
3. **Notification fallback**: If accessibility fails, rely on notification-only mode
4. **Alert and fix**: Log failures, update resource IDs

**Severity**: Medium-High for the Accessibility layer specifically. The notification layer is independent.

### F7: Hevy Internal API Token Expiry

**Scenario**: The internal auth token expires mid-workout. API calls fail with 401.

**Detection**:
- HTTP 401 response from internal API calls
- Bridge detects auth failure

**Recovery**:
1. **Token refresh**: Use `/refresh_token` endpoint if discovered
2. **Re-login**: Re-authenticate with stored credentials
3. **Fallback**: Drop to notification layer immediately
4. **Pre-fetch**: Refresh token before workout starts

**Severity**: Low. This only affects the stretch-goal internal API approach, which the MVP does not rely on.
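The recovery order (refresh, then re-login, then drop to the notification layer) can be expressed as a retry wrapper around an API call. This is a sketch under assumptions: `AuthExpired`, `UseNotificationLayer`, `call_with_reauth`, and the renewer callables are all hypothetical names, and nothing here reflects the real shape of Hevy's internal API.

```python
from typing import Callable


class AuthExpired(Exception):
    """Raised by the (hypothetical) API client on an HTTP 401."""


class UseNotificationLayer(Exception):
    """Signals the caller to drop to notification-only mode."""


def call_with_reauth(call: Callable[[str], dict], token: str,
                     renewers: list[Callable[[], str]]) -> tuple[dict, str]:
    """Run call(token); on a 401, try each renewer once, then give up."""
    try:
        return call(token), token
    except AuthExpired:
        pass
    for renew in renewers:  # e.g. [refresh_token, relogin], in that order
        try:
            new_token = renew()
            return call(new_token), new_token
        except Exception:
            continue  # this renewer failed; try the next one
    raise UseNotificationLayer("internal API auth could not be recovered")
```

Returning the new token alongside the result lets the caller store it for later calls; catching `UseNotificationLayer` is where the bridge would switch to the notification layer.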

### F8: Android OS Update Breaks Permissions

**Scenario**: Android 15/16 changes notification listener or accessibility behavior. Permissions are reset or APIs change.

**Detection**:
- `onListenerConnected()` not called
- Accessibility service not bound
- `SecurityException` on API calls

**Recovery**:
1. **Permission check on startup**: Verify permissions are still granted
2. **Guide user to re-enable**: Show settings intent
3. **Android version detection**: Adjust behavior per API level
4. **Test on target OS**: The Pixel 9a shipped with Android 15; test the bridge on each new Android version before updating the phone

**Severity**: Low-Medium. Infrequent, but can be disruptive.

## Risk Acceptance

For a **personal, single-user tool**:
- We can accept higher fragility than a consumer product
- Chicho is the developer AND the user — he can debug and fix issues
- Fast iteration is more important than bulletproof reliability
- The watch-local-state design ensures **no workout data is lost** even if the bridge fails completely

The bridge is a **convenience layer**, not a critical path for data integrity.
