# 2026 Health Data Backfill Investigation

## Current DB state (2026)

`daily_summary` has rows for 2026-01-01 through 2026-05-19 (139 days), but only 32 of those days have step data.

Monthly step coverage:

| Month | Days | Days with steps | Step sum | Avg nonzero steps |
|---|---:|---:|---:|---:|
| 2026-01 | 31 | 0 | 0 | - |
| 2026-02 | 28 | 0 | 0 | - |
| 2026-03 | 31 | 0 | 0 | - |
| 2026-04 | 30 | 13 | 114706 | 8823 |
| 2026-05 | 19 | 19 | 247719 | 13037 |

Interpretation: the Health Connect local store has real step data from 2026-04-18 onward. Earlier dates were queried successfully, but Health Connect returned zero steps for them.
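The coverage table can be reproduced from per-day rows with a small helper. This is a minimal sketch, assuming each `daily_summary` row can be read as an `(iso_date, steps)` pair; the function name and row shape are illustrative, not taken from the Health Bridge code. The table's averages are consistent with integer division (`114706 // 13 == 8823`), so the sketch uses it too.

```python
from collections import defaultdict


def monthly_step_coverage(rows):
    """Summarise (iso_date, steps) rows per month.

    Returns {month: (days, days_with_steps, step_sum, avg_nonzero_steps)};
    the average is None when no day in the month has steps.
    """
    buckets = defaultdict(list)
    for iso_date, steps in rows:
        # Key on "YYYY-MM"; a NULL/None step value is treated as zero.
        buckets[iso_date[:7]].append(steps or 0)

    summary = {}
    for month, values in sorted(buckets.items()):
        nonzero = [v for v in values if v > 0]
        total = sum(nonzero)
        # Integer division, matching the rounding seen in the table.
        avg = total // len(nonzero) if nonzero else None
        summary[month] = (len(values), len(nonzero), total, avg)
    return summary
```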

## Health Connect conclusion

Health Connect has already been queried for the whole 2026 range via the Android app. Re-querying will not recover Jan-Mar if those records are not in the local Health Connect store. The app hit both the API quota and `TransactionTooLargeException` during naive full-history approaches; the current implementation, which reads in yearly chunks with `aggregateGroupByPeriod`, is the correct Health Connect method.
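To illustrate only the chunking half of that approach, here is a sketch of how a long range can be split at calendar-year boundaries so that each aggregate request stays bounded. It is written in Python for illustration; the real app uses the Health Connect client on Android, and `yearly_chunks` is a hypothetical name.

```python
from datetime import date


def yearly_chunks(start: date, end: date):
    """Yield (chunk_start, chunk_end) pairs covering [start, end),
    split at every January 1 so no chunk spans more than one year."""
    cursor = start
    while cursor < end:
        next_year = date(cursor.year + 1, 1, 1)
        chunk_end = min(next_year, end)
        yield cursor, chunk_end
        cursor = chunk_end
```

Each yielded pair would then drive one per-day aggregate request instead of a single full-history read.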

## Best sources for complete 2026

### 1. Google Fit REST API — best candidate for Google-account historical steps

Docs:
- https://developers.google.com/fit/scenarios/read-daily-step-total
- https://developers.google.com/fit/rest/v1/reference/users/dataset/aggregate

Endpoint:

```http
POST https://www.googleapis.com/fitness/v1/users/me/dataset:aggregate
```

Request body for daily step totals, using the `estimated_steps` data source so the totals match what the Google Fit app displays:

```json
{
  "aggregateBy": [{
    "dataSourceId": "derived:com.google.step_count.delta:com.google.android.gms:estimated_steps"
  }],
  "bucketByTime": {
    "period": {"type": "day", "value": 1, "timeZoneId": "America/Argentina/Buenos_Aires"}
  },
  "startTimeMillis": 1767236400000,
  "endTimeMillis": 1779246000000
}
```
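The time window is easiest to get right by deriving it from local midnights rather than typing epoch values by hand. The sketch below builds the same request body for an inclusive date range. It assumes Buenos Aires stays at UTC-3 with no DST, so a fixed offset stands in for the named zone; the helper names are illustrative.

```python
from datetime import date, datetime, timedelta, timezone

# Assumption: Buenos Aires is UTC-3 all year, so a fixed offset is used here.
# zoneinfo.ZoneInfo("America/Argentina/Buenos_Aires") is the general tool.
BUENOS_AIRES = timezone(timedelta(hours=-3))

ESTIMATED_STEPS = (
    "derived:com.google.step_count.delta:com.google.android.gms:estimated_steps"
)


def local_midnight_millis(day: date, tz: timezone = BUENOS_AIRES) -> int:
    """Epoch milliseconds of 00:00 local time on the given day."""
    midnight = datetime(day.year, day.month, day.day, tzinfo=tz)
    return int(midnight.timestamp() * 1000)


def build_aggregate_body(first_day: date, last_day: date) -> dict:
    """Request body covering first_day..last_day inclusive, one bucket per day."""
    return {
        "aggregateBy": [{"dataSourceId": ESTIMATED_STEPS}],
        "bucketByTime": {
            "period": {
                "type": "day",
                "value": 1,
                "timeZoneId": "America/Argentina/Buenos_Aires",
            }
        },
        "startTimeMillis": local_midnight_millis(first_day),
        # The end is exclusive, so use midnight of the day after last_day.
        "endTimeMillis": local_midnight_millis(last_day + timedelta(days=1)),
    }
```

For reference, midnight on 2026-01-01 at UTC-3 is `1767236400000`, and midnight on 2026-05-20 is `1779246000000`.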

Scope required:

```txt
https://www.googleapis.com/auth/fitness.activity.read
```
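With an access token carrying that scope, sending the request and flattening the response into per-day totals might look like the sketch below. It assumes the response nests as `bucket` → `dataset` → `point` → `value[].intVal`, with `startTimeMillis` returned as a string, and that token acquisition happens elsewhere; the function names are hypothetical.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

AGGREGATE_URL = "https://www.googleapis.com/fitness/v1/users/me/dataset:aggregate"
UTC_MINUS_3 = timezone(timedelta(hours=-3))  # assumption: fixed Buenos Aires offset


def fetch_aggregate(access_token: str, body: dict) -> dict:
    """POST the aggregate request body and return the decoded JSON response."""
    request = urllib.request.Request(
        AGGREGATE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def daily_steps(aggregate_response: dict, tz=UTC_MINUS_3) -> dict:
    """Map each bucket's local start date (ISO string) to its summed step count."""
    totals = {}
    for bucket in aggregate_response.get("bucket", []):
        start = datetime.fromtimestamp(int(bucket["startTimeMillis"]) / 1000, tz)
        steps = sum(
            value.get("intVal", 0)
            for dataset in bucket.get("dataset", [])
            for point in dataset.get("point", [])
            for value in point.get("value", [])
        )
        # Buckets with no points yield 0, which is kept so gaps stay visible.
        totals[start.date().isoformat()] = steps
    return totals
```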

Caveat: the Google Fit APIs are deprecated, with end of service scheduled for 2026, and new developer sign-ups have been blocked since 2024. If an existing OAuth client can still request the scope and has the Fitness API enabled, this remains the cleanest route.

### 2. Fitbit Web API — best candidate if the data shown in Fitbit app is authoritative

Docs:
- https://dev.fitbit.com/build/reference/web-api/activity-timeseries/get-activity-timeseries-by-date-range

Endpoint:

```http
GET https://api.fitbit.com/1/user/-/activities/steps/date/2026-01-01/2026-05-19.json
```

Scope:

```txt
activity
```

The maximum range for the steps resource is 1095 days, so all of 2026 fits in one call.
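A sketch of that single call and its parsing follows. It assumes the response carries an `activities-steps` array of `{"dateTime", "value"}` objects with the step count encoded as a string, and that an OAuth access token with the `activity` scope is already available; the helper names are illustrative.

```python
import json
import urllib.request


def fitbit_steps_url(start: str, end: str) -> str:
    """URL for the daily step time series between two ISO dates, inclusive."""
    return f"https://api.fitbit.com/1/user/-/activities/steps/date/{start}/{end}.json"


def fetch_fitbit_steps(access_token: str, start: str, end: str) -> dict:
    """GET the time series and return the decoded JSON payload."""
    request = urllib.request.Request(
        fitbit_steps_url(start, end),
        headers={"Authorization": f"Bearer {access_token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def parse_fitbit_steps(payload: dict) -> dict:
    """Map ISO date -> integer step count."""
    return {
        entry["dateTime"]: int(entry["value"])
        for entry in payload.get("activities-steps", [])
    }
```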

### 3. Google Takeout — fallback if APIs/OAuth are blocked

Use a Google Takeout export of the Fit data, then parse the JSON/CSV files locally. This avoids the Fit API registration problems but is manual: the user must request the export and download the archive.
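Local parsing could be as small as the sketch below. The layout of the export is an assumption here: it presumes a daily-metrics CSV with a `Date` column and a `Step count` column, so both column names are parameters and should be checked against the actual archive before use.

```python
import csv
import io


def parse_daily_steps_csv(text: str, date_column: str = "Date",
                          steps_column: str = "Step count") -> dict:
    """Map ISO date -> step count from CSV text.

    Column names are assumptions about the export layout; rows with an
    empty step cell are skipped rather than recorded as zero.
    """
    totals = {}
    for row in csv.DictReader(io.StringIO(text)):
        raw = (row.get(steps_column) or "").strip()
        if not raw:
            continue
        # Tolerate values written as "5321" or "5321.0".
        totals[row[date_column]] = int(float(raw))
    return totals
```

Skipping empty cells keeps "no data" distinct from a recorded zero, which matters when these rows are later compared against zero-valued Health Connect rows.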

## Implementation plan

1. Try Google Fit REST OAuth with `fitness.activity.read`.
2. If blocked by API availability/client verification, use Fitbit OAuth if the Fitbit account has the Jan-Mar step history.
3. If both OAuth routes are blocked or impractical, ask the user for a Google Takeout Fit export and parse it locally.
4. Ingest imported rows into Health Bridge as `data_type='daily_aggregate'`, `source='google_fit_rest'` or `source='fitbit_api'`, with record keys like:
   - `daily_aggregate:chicho:google_fit_rest:2026-01-01`
   - `daily_aggregate:chicho:fitbit_api:2026-01-01`
5. Rebuild summaries idempotently, preferring the cloud historical source over zero-valued Health Connect rows for the same date.
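Steps 4 and 5 can be sketched as a key builder plus a merge rule. This is a minimal illustration, assuming daily totals are held as `{iso_date: steps}` dictionaries; the function names are hypothetical and the real Health Bridge schema may differ.

```python
def record_key(user: str, source: str, day: str) -> str:
    """Deterministic key, so re-importing the same day overwrites, not duplicates."""
    return f"daily_aggregate:{user}:{source}:{day}"


def merge_daily_steps(health_connect: dict, cloud: dict) -> dict:
    """Prefer the cloud value only where Health Connect is missing or zero."""
    merged = dict(health_connect)
    for day, steps in cloud.items():
        if not merged.get(day):  # absent, None, or 0 in Health Connect
            merged[day] = steps
    return merged
```

Because the merge never replaces a nonzero Health Connect value and the keys are deterministic, running the rebuild twice over the same inputs yields the same result.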
