# QA / Verification Log — WtLossSurrogate-Kor

> ⚠️ For research/reference use only; not for clinical decision-making.
> Demo effect sizes are illustrative/synthetic, NOT official trial readouts.

**Date**: 2026-05-22
**Env**: macOS (darwin 25.5.0) · Python 3.9.6 · numpy 2.0.2 · scipy 1.13.1 · pandas 2.3.3 ·
matplotlib 3.9.4 · streamlit 1.50.0 · **statsmodels NOT installed** (meta-regression
implemented from scratch in numpy/scipy, as required).

Overall status: **✅ PASS**

---

## V1 — Syntax (ast.parse) of all modules

```
$ python3 -c "import ast; ast.parse(open('main.py').read()); ast.parse(open('app.py').read()); ast.parse(open('surrogacy.py').read()); ast.parse(open('dataio.py').read())"
PASS: all 4 modules parse
```
Result: ✅ `main.py`, `app.py`, `surrogacy.py`, `dataio.py` all parse.
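The one-liner above stops at the first failing file. The sketch below is a small generalization that reports every module that fails to parse, or cannot be read, in a single pass. The `check_syntax` helper is hypothetical; only the module list comes from this log.

```python
import ast
from pathlib import Path

# Module list taken from the V1 command above.
MODULES = ["main.py", "app.py", "surrogacy.py", "dataio.py"]


def check_syntax(paths):
    """Return {path: reason} for every file that cannot be read or fails ast.parse.

    An empty dict means every file parsed (the V1 PASS condition).
    """
    failures = {}
    for p in paths:
        try:
            source = Path(p).read_text(encoding="utf-8")
            ast.parse(source, filename=str(p))
        except SyntaxError as exc:
            failures[str(p)] = f"line {exc.lineno}: {exc.msg}"
        except OSError as exc:  # missing or unreadable file
            failures[str(p)] = f"unreadable: {exc}"
    return failures
```

Run from the project root, `check_syntax(MODULES)` should return `{}` when V1 passes.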

## V2 — `main.py --help`

```
$ python3 main.py --help     →  exit 0
```
Lists every option (--surrogacy, --dose-response, --paradox, --gaps, --hypotheses,
--all, --data, --top) and shows the research/reference-only disclaimer in the epilog.
Result: ✅
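For orientation, here is a minimal `argparse` layout consistent with the options and epilog that V2 checks. It is a sketch, not `main.py`'s actual code. The help strings, the `--top` type, and its default of 10 are assumptions.

```python
import argparse

DISCLAIMER = "Research/reference use only; not for clinical decision-making."


def build_parser():
    """Sketch of a CLI exposing the flags listed in V2 (types/defaults are assumed)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="WtLossSurrogate-Kor: weight loss as a surrogate for hard outcomes.",
        epilog=DISCLAIMER,  # shown at the bottom of --help, as V2 requires
    )
    parser.add_argument("--surrogacy", action="store_true", help="R2_trial / STE / PTE table")
    parser.add_argument("--dose-response", action="store_true", help="linear vs quadratic fits")
    parser.add_argument("--paradox", action="store_true", help="surrogate-paradox cells")
    parser.add_argument("--gaps", action="store_true", help="evidence gaps")
    parser.add_argument("--hypotheses", action="store_true", help="ranked validation hypotheses")
    parser.add_argument("--all", action="store_true", help="run every report")
    parser.add_argument("--data", metavar="CSV", help="custom trial CSV (default: demo data)")
    parser.add_argument("--top", type=int, default=10, help="number of hypotheses to show")
    return parser
```

`argparse` exits with status 0 after printing `--help`, which is the behavior V2 records. A dispatcher that finds no flag set could fall back to the summary that a bare `python3 main.py` prints.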

## V3 — Demo CSV parses + numeric invariants

```
$ python3 -c "import dataio, surrogacy as sg ..."
rows 28 classes 6 outcomes 4
PASS: all R2 in [0,1], CI brackets R2, PTE in [0,1]
grades present: ['insufficient', 'invalid', 'strong']
paradox cells: [('MC4R', 'MACE')]
hypotheses mined: 21 | top reason: SURROGATE PARADOX: weight improves but hard outcome trends worse
```
Checks performed (all assert-passed):
- `data/demo_trials.csv` parses to 28 rows spanning 6 drug classes and 4 hard outcomes.
- Every computed `R²_trial ∈ [0, 1]`.
- Every `R²_trial` lies within its reported 95% CI.
- Every computed `PTE ∈ [0, 1]` (raw out-of-range values are clamped + flagged).
- Surrogate paradox correctly detected for the seeded MC4R→MACE cell.
- 21 validation hypotheses mined; highest priority is the paradox cell.

Result: ✅
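The invariant checks above can be written as plain assertions over per-cell results. In this sketch the cell keys `r2`, `ci_lo`, `ci_hi` and `pte_raw` are hypothetical. `clamp_pte` imitates the "clamped + flagged" behavior. The `implausible_over1` flag name matches the V4 output, and `implausible_below0` is an assumed counterpart.

```python
import math


def clamp_pte(raw):
    """Clamp a raw PTE into [0, 1]; return (value, flag). Empty flag = in range."""
    if raw is None or math.isnan(raw):
        return None, "undefined"
    if raw > 1.0:
        return 1.0, "implausible_over1"
    if raw < 0.0:
        return 0.0, "implausible_below0"  # hypothetical flag name
    return raw, ""


def check_invariants(cells):
    """Assert the V3 invariants on a list of per-(class, outcome) result dicts."""
    for c in cells:
        r2 = c.get("r2")
        if r2 is None:  # e.g. insufficient trials: no estimate to check
            continue
        assert 0.0 <= r2 <= 1.0, f"R2 out of range: {c}"
        assert c["ci_lo"] <= r2 <= c["ci_hi"], f"CI does not bracket R2: {c}"
        pte, _ = clamp_pte(c.get("pte_raw"))
        assert pte is None or 0.0 <= pte <= 1.0, f"PTE out of range: {c}"
```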

## V4 — `--surrogacy` sample output

```
class         outcome       k  R2_trial  95%CI        STE     PTE                                grade   flag
GIP_GLP       MACE          4  0.99      [0.57,1.00]  —       1.00 (implausible_over1(clamped))  STRONG
GLP1RA        INCIDENT_T2D  4  0.98      [0.41,1.00]  -5.5%   0.88                               STRONG
GLP1RA        MACE          5  0.94      [0.36,1.00]  -13.4%  0.86                               STRONG
MC4R          MACE          2  —         —            —       — (insufficient_trials)            INVALID  PARADOX
...

SELECT-style spotlight (GLP1RA → MACE):
  R²_trial=0.94  grade=STRONG  STE=-13.4%  PTE(weight-mediated)=0.86
```
Sanity:
- GLP1RA→MACE: strong trial-level surrogacy, but ~13.4% weight loss (the STE) is needed
  before a credible MACE benefit is predicted, and PTE 0.86 implies that ~14% of the effect
  is weight-INDEPENDENT, matching the SELECT narrative.
- GLP1RA→INCIDENT_T2D: STE is only -5.5% (modest loss suffices), consistent with its STRONG
  grade (R²_trial 0.98).
- GIP_GLP→MACE: the raw PTE value >1 is correctly clamped to 1.00 and flagged.
Result: ✅
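To make the R²_trial and STE columns concrete, the sketch below fits an inverse-variance WLS of log-HR on % weight change and scans for the STE. Here the STE is the smallest weight change (negative = loss) at which the upper 95% prediction bound for log-HR drops below 0. This is not the tool's implementation, and several choices are assumptions. Weights use `loghr_se` only and ignore `pct_weight_loss_se`. The band uses a normal quantile. A new trial is given the average weight. The CI for R²_trial is not reproduced.

```python
import math
from statistics import NormalDist


def wls_fit(x, y, se):
    """Inverse-variance WLS of y (log-HR) on x (% weight change); needs k >= 3 trials."""
    k = len(x)
    if k < 3:
        raise ValueError("need at least 3 trials")
    w = [1.0 / s**2 for s in se]
    W = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / W
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / W
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    ss_res = sum(wi * (yi - (a + b * xi)) ** 2 for wi, xi, yi in zip(w, x, y))
    ss_tot = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))
    return {"a": a, "b": b, "W": W, "xbar": xbar, "sxx": sxx, "k": k,
            "phi": ss_res / (k - 2),        # weighted residual variance
            "r2": 1.0 - ss_res / ss_tot}    # weighted R2_trial


def upper_pred(fit, x0, level=0.95):
    """Upper prediction bound for log-HR at weight change x0 (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    var = fit["phi"] * (1.0 / fit["W"]
                        + (x0 - fit["xbar"]) ** 2 / fit["sxx"]
                        + fit["k"] / fit["W"])  # assumed: new trial gets the average weight
    return fit["a"] + fit["b"] * x0 + z * math.sqrt(var)


def ste(fit, x_min=-50.0, step=0.01):
    """Scan from 0% toward x_min; return the first x0 whose upper bound is < 0, else None."""
    for i in range(int(round(-x_min / step)) + 1):
        x0 = -i * step
        if upper_pred(fit, x0) < 0:
            return x0
    return None
```

The `k >= 3` guard mirrors why a two-trial cell such as MC4R→MACE gets no R²_trial estimate.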

## V5 — `--dose-response` sample output

```
[GLP1RA → INCIDENT_T2D]  verdict: non-linear (curvature detected)
  linear slope (Δlog-HR per +1% loss) = 0.0669   quadratic curvature = -0.00429   nonlinearity p = 0.025
  wl_bin     mean_wl  mean_logHR  k
  -7..0%     -5.6     -0.482      2
  -14..-7%   -12.4    -0.880      1
  ...
[GIP_GLP → MACE]  verdict: approximately linear (dose-response consistent)
```
Result: ✅ — binning, linear & quadratic fits, and non-linearity verdicts all render.
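The linear-vs-quadratic verdict can be illustrated with an inverse-variance quadratic WLS and a test on the x² coefficient. The sketch below assumes the per-trial SEs are known, as in a fixed-effect meta-regression, so coefficient SEs come straight from (X'WX)⁻¹ and a z-test applies. The tool's actual test and its 0.05 cutoff are assumptions, and the coefficients reported are from the quadratic model rather than the separate linear fit shown above.

```python
import numpy as np
from statistics import NormalDist


def curvature_test(x, y, se, alpha=0.05):
    """Quadratic inverse-variance WLS of log-HR on % weight change; z-test of the x^2 term."""
    x, y, se = (np.asarray(v, dtype=float) for v in (x, y, se))
    w = 1.0 / se**2
    X = np.column_stack([np.ones_like(x), x, x**2])
    XtW = X.T * w                   # X'W with W = diag(w), via broadcasting
    cov = np.linalg.inv(XtW @ X)    # coefficient covariance when the SEs are known
    beta = cov @ (XtW @ y)
    z = beta[2] / np.sqrt(cov[2, 2])
    p = 2.0 * (1.0 - NormalDist().cdf(abs(float(z))))
    verdict = "non-linear (curvature detected)" if p < alpha else "approximately linear"
    return {"intercept": float(beta[0]), "linear": float(beta[1]),
            "quadratic": float(beta[2]), "p": p, "verdict": verdict}
```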

## V6 — `--paradox` sample output

```
class  outcome  k  R2_trial  grade
MC4R   MACE     2  —         INVALID
```
Result: ✅ — the seeded paradox cell (weight improves, log-HR trends adverse) is flagged
and graded INVALID.
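One way a paradox flag can override an R²-based grade is sketched below. The paradox rule is an assumption: mean weight change is negative, so weight improves, while the inverse-variance pooled log-HR is above 0. The grade cutoffs in `GradeThresholds` are placeholders, because the log says the thresholds are configurable but does not give their values. Mapping R² below the weak cutoff to "invalid" is also assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GradeThresholds:
    # Hypothetical defaults; the tool's configurable values are not given in this log.
    strong: float = 0.8
    moderate: float = 0.5
    weak: float = 0.2
    min_trials: int = 3


def is_paradox(wl, loghr, se):
    """Weight improves on average (negative % change) but pooled log-HR is adverse (> 0)."""
    w = [1.0 / s**2 for s in se]
    pooled = sum(wi * yi for wi, yi in zip(w, loghr)) / sum(w)
    return sum(wl) / len(wl) < 0 and pooled > 0


def grade(r2, k, paradox, t=GradeThresholds()):
    """Paradox overrides everything; otherwise grade by R2_trial against the cutoffs."""
    if paradox:
        return "invalid"
    if r2 is None or k < t.min_trials:
        return "insufficient"
    if r2 >= t.strong:
        return "strong"
    if r2 >= t.moderate:
        return "moderate"
    if r2 >= t.weak:
        return "weak"
    return "invalid"
```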

## V7 — `--hypotheses` sample output

```
H1  [priority 1.10]  Is %weight-loss a valid trial-level surrogate for MACE in MC4R? (SURROGATE PARADOX ...)
    class=MC4R  outcome=MACE  k(existing)=2  R²_trial=—
    suggested validation: ~2 more trial(s); per-arm size ≈ 6,734/arm (Schoenfeld, 6% event rate, 80% power)
```
Result: ✅ — prioritized hypotheses with Schoenfeld-based per-arm sample sizes.
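The per-arm sizes rest on Schoenfeld's event-count approximation, D = (z₁₋α/₂ + z₁₋β)² / (π(1−π)·(ln HR)²), where π is the allocation fraction. The sketch converts events to a per-arm size under two assumptions: 1:1 allocation, and the same event rate in both arms. The log does not state the target hazard ratio, so the HR here is a free parameter, and the values below are not expected to reproduce the 6,734/arm figure.

```python
import math
from statistics import NormalDist


def schoenfeld_events(hr, alpha=0.05, power=0.80, alloc=0.5):
    """Total events to detect hazard ratio `hr` (two-sided alpha), Schoenfeld approximation."""
    z = NormalDist().inv_cdf
    num = (z(1 - alpha / 2) + z(power)) ** 2
    return num / (alloc * (1 - alloc) * math.log(hr) ** 2)


def per_arm_size(hr, event_rate, alpha=0.05, power=0.80):
    """Per-arm n, assuming 1:1 allocation and the same event rate in both arms."""
    events = math.ceil(schoenfeld_events(hr, alpha, power))
    return math.ceil(events / (2 * event_rate))
```

For example, with an illustrative HR of 0.80 at a 6% event rate, about 631 events are needed, or 5,259 per arm.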

## V8 — Custom CSV via `--data` + error handling

```
$ python3 main.py --data /tmp/wtloss_custom.csv --surrogacy
trials loaded : 3  (1 class, 1 outcome) → surrogacy computed.    ✅

$ python3 main.py --data /tmp/wtloss_bad.csv     # missing columns
ERROR: missing required column(s): ['pct_weight_loss_se', 'hard_outcome', 'loghr', 'loghr_se'] ...
exit=2                                                            ✅
```
Result: ✅ — user CSV honored; malformed CSV yields a clear message + non-zero exit.
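A header check that produces the V8 behavior (clear message plus exit status 2) can be as small as the sketch below. `REQUIRED_COLUMNS` lists only the four names visible in the logged error message, because the rest of that message is elided, so the real list is longer. Writing to stderr and using the `csv` module instead of pandas are also assumptions.

```python
import csv
import sys

# Only the four names visible in the V8 error message; the tool's full list is longer.
REQUIRED_COLUMNS = ["pct_weight_loss_se", "hard_outcome", "loghr", "loghr_se"]


def missing_columns(header, required=REQUIRED_COLUMNS):
    """Return required column names absent from the CSV header, in required order."""
    present = {h.strip() for h in header}
    return [c for c in required if c not in present]


def load_or_exit(path):
    """Load rows as dicts, or print an error and exit with status 2 on a bad schema."""
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        missing = missing_columns(reader.fieldnames or [])
        if missing:
            print(f"ERROR: missing required column(s): {missing}", file=sys.stderr)
            sys.exit(2)  # non-zero exit, matching exit=2 in the log
        return list(reader)
```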

## V9 — Streamlit `app.py` runtime (headless AppTest)

```
$ python3 -c "from streamlit.testing.v1 import AppTest; at=AppTest.from_file('app.py'); at.run(); print(at.exception)"
exception: ElementList()        # (empty = no exception)
title: WtLossSurrogate-Kor (웨이트로스서로게이트코어)
tabs: 6 | dataframes: 2 | metrics: [('Trials','27'), ('Drug classes','6'), ('Hard outcomes','4')]
```
Result: ✅ — app runs end-to-end with no exception; 6 tabs (surrogacy table, R²_trial
scatter/STE, dose–response, PTE bar, paradox, hypotheses) render. (The "missing
ScriptRunContext" notice is the expected bare-mode warning.) The deprecated
`use_container_width` argument was replaced with `width="stretch"`.

---

## Intent-conformance check (vs the build spec)

| Required | Status |
|----------|--------|
| `main.py` CLI with --surrogacy/--dose-response/--paradox/--gaps/--hypotheses/--data/--top | ✅ |
| Bare `python3 main.py` prints a useful summary | ✅ |
| R²_trial + 95% CI via inverse-variance WLS | ✅ |
| STE (prediction-band null crossing) | ✅ |
| Dose–response surrogacy (linear vs quadratic / plateau) | ✅ |
| PTE = weight-mediated fraction, clamped + flagged | ✅ |
| Surrogate-paradox flag → invalid grade | ✅ |
| Strength grade strong/moderate/weak/invalid (configurable thresholds) | ✅ |
| Gap mining → validation hypotheses + sample sizes | ✅ |
| `app.py` Streamlit UI (imports cleanly, mirrors CLI) | ✅ |
| `data/` curated demo CSV with documented schema, marked synthetic | ✅ |
| `README.md` (purpose/domain/category/features/run/methodology/sources/disclaimer/checklist) | ✅ |
| Pure stdlib+numpy+scipy+pandas (statsmodels NOT used) | ✅ |
| Offline, no network/paid APIs | ✅ |
| Research/reference-only disclaimer in README + CLI/app headers | ✅ |

No deviations from the spec. No additional out-of-scope features added.