# WtLossSurrogate-Kor (웨이트로스서로게이트코어)

> ⚠️ **For research and reference use only; not for clinical decision-making.**
> All bundled effect sizes are **illustrative / synthetic** values loosely inspired by the
> public literature. They are **NOT** official trial readouts and must never be cited as real
> data or used for any patient-level or regulatory decision.

**Domain**: Obesity and metabolic disease
**Category**: Research-hypothesis generation

A standalone, **offline** tool that ingests trial-level `(% weight-loss, Δ hard-outcome)`
effect-size pairs from anti-obesity RCTs and computes, by drug class, the **trial-level
surrogacy** of % body-weight change for hard clinical outcomes. It separates the
weight-mediated from the weight-independent component of benefit, auto-flags
under-validated surrogate–outcome–class pairs, and generates validation-study hypotheses
with suggested sample sizes.

---

## The 5 core features

1. **By-class trial-level surrogacy** — inverse-variance weighted meta-regression of the
   hard-outcome treatment effect (log-HR) on the surrogate effect (% weight loss) within a
   drug class. Reports **R²_trial** + 95% CI (Fisher-z) and a strength **grade**
   (strong ≥0.70 / moderate 0.50 to <0.70 / weak <0.50 / invalid).
2. **Surrogate Threshold Effect (STE)** — the smallest % weight-loss at which the upper 95%
   prediction band of the hard-outcome effect crosses the null, i.e. the weight-loss
   magnitude needed before a hard benefit becomes statistically credible.
3. **Dose–response surrogacy** — bins trials by weight-loss magnitude and fits linear vs
   quadratic WLS to test whether "more weight loss = more hard benefit" holds linearly or
   **plateaus / curves**.
4. **PTE (proportion of treatment effect explained = weight-mediated fraction)** — a simple
   mediation framing, `PTE = 1 − β_adjusted / β_unadjusted`, quantifying what fraction of the
   hard benefit is mediated by weight loss vs a **weight-independent (direct) effect**
   (the SELECT debate). Implausible (<0 or >1) values are clamped and flagged.
5. **Gap mining → validation hypotheses** — scans the class × outcome grid, flags cells with
   too few trials, weak surrogacy, or a **surrogate paradox** (surrogate improves but hard
   outcome worsens → grade = invalid), and emits prioritized validation-study hypotheses with
   Schoenfeld-style per-arm sample sizes.
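
As a rough illustration of feature 1, the sketch below fits an inverse-variance weighted straight line in plain Python and reports a weighted R². The function name `wls_r2`, the choice of `1 / se²` of the hard-outcome effect as the weight, and the four demo points are assumptions made for illustration; they are not taken from `main.py` or from any trial.

```python
def wls_r2(x, y, se):
    """Weighted simple regression of y on x with weights 1/se^2.

    x  : surrogate effects (placebo-adjusted % weight change, negative = loss)
    y  : hard-outcome effects (log-HR, negative = benefit)
    se : standard errors of y
    Returns (intercept, slope, weighted R^2).
    """
    if not (len(x) == len(y) == len(se)) or len(x) < 3:
        raise ValueError("need at least 3 trials with matching x, y, se")
    w = [1.0 / s ** 2 for s in se]
    sw = sum(w)
    # Weighted means of surrogate and hard-outcome effects
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Weighted sums of squares and cross-products
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in x or y; R^2 is undefined")
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    r2 = sxy ** 2 / (sxx * syy)  # squared weighted correlation
    return intercept, slope, r2


# Hypothetical, hand-made numbers (NOT trial data)
demo_x = [-4.0, -9.0, -15.0, -20.0]
demo_y = [-0.05, -0.12, -0.15, -0.26]
demo_se = [0.10, 0.08, 0.09, 0.12]
intercept, slope, r2 = wls_r2(demo_x, demo_y, demo_se)
print(f"intercept={intercept:.3f} slope={slope:.4f} R2_trial={r2:.2f}")
```

With a single predictor, the weighted R² equals the squared weighted correlation, which is why the function never needs the fitted residuals. The intercept it returns is the quantity the PTE feature treats as the residual direct effect.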

---

## Run commands

All commands assume you are inside the project directory:

```bash
cd "projects/2026-05-22-2-wtloss-surrogate-kor"

python3 main.py --help            # usage
python3 main.py                   # bare → useful default summary
python3 main.py --surrogacy       # R²_trial / STE / PTE / grade table (best single command)
python3 main.py --dose-response   # weight-loss bin vs hard benefit (non-linearity)
python3 main.py --paradox         # surrogate-paradox flags
python3 main.py --gaps            # gap map of under-validated / weak pairs
python3 main.py --hypotheses      # validation hypotheses + suggested sample sizes
python3 main.py --all             # everything
python3 main.py --data my.csv --top 10   # user CSV; limit rows/hypotheses
```

Optional Streamlit UI (not required; mirrors the CLI):

```bash
streamlit run app.py
```

**Dependencies**: Python 3.9+, `numpy`, `scipy`, `pandas` (and `matplotlib` + `streamlit`
for the optional UI). `statsmodels` is **not** required — the meta-regression is implemented
from scratch in numpy/scipy. No network access, no external/paid APIs.

---

## Input data schema (`data/demo_trials.csv`)

CSV with the columns below. Lines beginning with `#` are treated as comments (the demo file
carries its schema inline). Each row holds one trial's effect-size pair (surrogate effect,
hard-outcome effect) for a single drug class and hard outcome.

| column                | type  | meaning |
|-----------------------|-------|---------|
| `trial`               | str   | short trial identifier |
| `drug_class`          | str   | `GLP1RA` \| `GIP_GLP` \| `TRIPLE` \| `AMYLIN_COMBO` \| `ORAL_GLP1` \| `MC4R` |
| `agent`               | str   | investigational agent (optional, informational) |
| `pct_weight_loss`     | float | placebo-adjusted % body-weight change (surrogate); **negative = loss** |
| `pct_weight_loss_se`  | float | SE of the % weight-loss estimate |
| `hard_outcome`        | str   | `MACE` \| `HF_HOSP` \| `INCIDENT_T2D` \| `MASH` \| `ALL_CAUSE_DEATH` \| `QOL_PHYS` |
| `loghr`               | float | log hazard/risk ratio for the hard outcome; **negative = benefit** |
| `loghr_se`            | float | SE of `loghr` |
| `n`                   | int   | illustrative trial size (informational) |
| `note`                | str   | free text |
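
To make the schema concrete, here is a minimal loader sketch using only the standard library. It assumes the column names in the table above and skips lines that begin with `#`; the function name `load_trials`, the choice of which columns to treat as required, and the one-row sample are hypothetical and do not describe how `main.py` actually parses its input.

```python
import csv

REQUIRED = ("trial", "drug_class", "pct_weight_loss", "pct_weight_loss_se",
            "hard_outcome", "loghr", "loghr_se")
FLOAT_COLS = ("pct_weight_loss", "pct_weight_loss_se", "loghr", "loghr_se")


def load_trials(text):
    """Parse CSV text into a list of dict rows, ignoring '#' comment lines."""
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    reader = csv.DictReader(lines)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    rows = []
    for i, row in enumerate(reader, start=1):
        for col in FLOAT_COLS:
            try:
                row[col] = float(row[col])
            except (TypeError, ValueError):
                raise ValueError(f"row {i}: {col!r} is not a number") from None
        # Standard errors must be positive to serve as inverse-variance weights
        if row["pct_weight_loss_se"] <= 0 or row["loghr_se"] <= 0:
            raise ValueError(f"row {i}: standard errors must be > 0")
        rows.append(row)
    return rows


# Hypothetical one-row sample in the documented layout (synthetic values)
SAMPLE = """\
# trial,drug_class,agent,... schema comment line
trial,drug_class,agent,pct_weight_loss,pct_weight_loss_se,hard_outcome,loghr,loghr_se,n,note
DEMO-1,GLP1RA,agent-x,-12.0,0.8,MACE,-0.20,0.07,1000,synthetic
"""
rows = load_trials(SAMPLE)
print(rows[0]["trial"], rows[0]["pct_weight_loss"], rows[0]["loghr"])
```

Raising `ValueError` with the row number keeps a malformed file from silently producing a wrong weight; a CLI wrapper could catch it and exit with a non-zero status.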

---

## Methodology brief

Trial-level (Buyse–Molenberghs / Daniels–Hughes) surrogacy via **weighted least squares**:

- **R²_trial**: WLS of `loghr` on `pct_weight_loss` across trials in a class, weights =
  `1 / loghr_se²` (inverse-variance). Weighted R² with 95% CI from a Fisher-z transform of
  `r = √R²`.
- **STE**: scan % weight-loss from 0 toward the most extreme observed loss; return the first
  point where the **upper 95% prediction band** of the predicted `loghr` falls below 0.
  The prediction band uses the meta-regression dispersion `φ = WRSS/dof` for line
  uncertainty plus a representative new-trial sampling variance.
- **Dose–response**: WLS linear slope vs a quadratic term (curvature + its t-test) and a
  bin-to-bin plateau heuristic.
- **PTE**: `1 − β_adjusted / β_unadjusted`, where `β_unadjusted` is the inverse-variance
  weighted mean hard-outcome effect and `β_adjusted` is the WLS intercept (residual direct
  effect at zero surrogate change). Clamped to [0, 1] with an explicit flag on implausible
  raw values.
- **Surrogate paradox**: if the weighted mean `loghr` is above roughly 0 (the hard outcome
  worsens) among trials with meaningful weight loss, the grade is forced to **invalid**.
- **Sample size**: Schoenfeld events `d = (z_{α/2}+z_β)² / (0.25·loghr²)`, where the 0.25
  factor assumes 1:1 allocation, then `n_per_arm ≈ (d / event_rate) / 2` (defaults: 6% event
  rate, 80% power, α=0.05).
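
The PTE and sample-size formulas above can be checked by hand with a few lines of standard-library Python. This is a sketch under stated assumptions: the function names `pte_clamped` and `schoenfeld_per_arm` are hypothetical, rounding up to whole events and participants is a choice made here, and the example inputs are illustrative rather than taken from the demo dataset.

```python
import math
from statistics import NormalDist


def pte_clamped(beta_adjusted, beta_unadjusted):
    """PTE = 1 - beta_adjusted / beta_unadjusted, clamped to [0, 1].

    Returns (pte, flagged); flagged is True when the raw value was
    outside [0, 1] and had to be clamped.
    """
    if beta_unadjusted == 0:
        raise ValueError("beta_unadjusted is 0; PTE is undefined")
    raw = 1.0 - beta_adjusted / beta_unadjusted
    clamped = min(1.0, max(0.0, raw))
    return clamped, raw != clamped


def schoenfeld_per_arm(loghr, event_rate=0.06, power=0.80, alpha=0.05):
    """Schoenfeld required events and approximate per-arm sample size."""
    if loghr == 0:
        raise ValueError("loghr is 0; no effect to detect")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    events = (z_alpha + z_beta) ** 2 / (0.25 * loghr ** 2)
    n_per_arm = (events / event_rate) / 2
    return math.ceil(events), math.ceil(n_per_arm)


# Illustrative inputs only: total effect -0.20, intercept (direct effect) -0.05
print(pte_clamped(-0.05, -0.20))
# Illustrative target: hazard ratio 0.80 with the documented defaults
print(schoenfeld_per_arm(math.log(0.80)))
```

For a hazard ratio of 0.80 the defaults give roughly 630 events and a little over 5,000 participants per arm, which shows how quickly the required size grows as the target effect shrinks (events scale with `1 / loghr²`).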

**Limitations**: trial-level surrogacy alone (no individual-patient data → no individual-level
R²); small per-cell trial counts widen CIs substantially; PTE here is a coarse mediation
proxy, not a formal causal-mediation estimator; demo effect sizes are synthetic.
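
To see how strongly the trial count drives interval width, the sketch below applies a Fisher-z interval to `r = √R²` for two cell sizes. It assumes the textbook standard error `1 / √(k − 3)` for `k` trials and maps a negative lower bound on `r` to an R² lower bound of 0; both are assumptions for illustration, and `fisher_z_ci_r2` is a hypothetical name, not a function from `main.py`.

```python
import math
from statistics import NormalDist


def fisher_z_ci_r2(r2, k, level=0.95):
    """Approximate CI for R^2 from a Fisher-z interval on r = sqrt(R^2)."""
    if k <= 3:
        raise ValueError("Fisher-z interval needs more than 3 trials")
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("r2 must lie in [0, 1]")
    r = min(math.sqrt(r2), 1.0 - 1e-12)  # atanh is undefined at r = 1
    z = math.atanh(r)
    half = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(k - 3)
    lo_r = math.tanh(z - half)
    hi_r = math.tanh(z + half)
    # If the interval for r crosses 0, the smallest compatible R^2 is 0
    return max(lo_r, 0.0) ** 2, hi_r ** 2


# Same point estimate, two hypothetical cell sizes
for k in (5, 30):
    lo, hi = fisher_z_ci_r2(0.60, k)
    print(f"k={k:2d}  R2=0.60  95% CI=({lo:.2f}, {hi:.2f})")
```

With only a handful of trials the interval spans most of [0, 1], so a point estimate that lands in the "moderate" band cannot by itself rule out either a weak or a strong surrogate.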

---

## Data sources (described, not fetched — this tool is offline)

To populate a real dataset, a researcher would manually curate trial-level effect sizes from:

- **ClinicalTrials.gov** — registered anti-obesity / CVOT trial designs and posted results
  (STEP, SUSTAIN, SELECT, SURMOUNT/SURMOUNT-MMO, SURPASS, SUMMIT, retatrutide, CagriSema,
  orforglipron, etc.).
- **Published RCTs and meta-analyses** — peer-reviewed primary readouts and pooled analyses
  reporting % weight change and hard-outcome HRs with confidence intervals.
- **FDA / EMA review documents** — regulatory statistical reviews and advisory-committee
  briefing books with effect-size tables.

The tool itself performs **no** network calls; data is provided by the user as a CSV.

---

## QA checklist

- [ ] `python3 -c "import ast; ast.parse(open('main.py').read())"` passes (and the same check for `app.py`).
- [ ] `python3 main.py --help` runs cleanly.
- [ ] `python3 main.py --surrogacy` → every R²_trial ∈ [0, 1]; grade/STE/PTE print correctly.
- [ ] `python3 main.py --dose-response` → prints per-bin mean_wl / mean_logHR and the linear vs non-linear verdict.
- [ ] `python3 main.py --paradox` → detects the MC4R MACE paradox flag.
- [ ] `python3 main.py --hypotheses` → produces validation hypotheses with sample sizes.
- [ ] `data/demo_trials.csv` parses correctly (27 trials, 6 classes, 4 outcomes).
- [ ] Malformed CSV → clear error message + non-zero exit.
- [ ] The research/reference-use disclaimer appears in the CLI header and the README.

See `QA.md` for the full verification log.