# QA / Verification log — GlyceSurrogate-Kor

> ⚠️ RESEARCH / REFERENCE USE ONLY — NOT FOR CLINICAL DECISION-MAKING.
> Demo numbers are synthetic/illustrative, not official trial readouts.

**Build date:** 2026-05-22
**Environment:** macOS (Darwin 25.5.0), Python 3.9.6
**Installed packages used:** numpy 2.0.2, scipy 1.13.1, pandas 2.3.3, matplotlib 3.9.4, streamlit 1.50.0
**statsmodels:** NOT installed — meta-regression (WLS, prediction bands, STE) implemented from first principles in `surrogacy.py`.
**Constraints honored:** offline only, no network calls, no external/paid APIs, no global installs.

Overall status: ✅ **PASS** (all required checks green; no FAILED items).

---

## 1. Syntax check — `ast.parse`

```
$ python3 -c "import ast; ast.parse(open('projects/2026-05-22-1-glyce-surrogate-kor/main.py').read()); print('main.py OK')"
main.py OK
$ python3 -c "import ast; ast.parse(open('projects/2026-05-22-1-glyce-surrogate-kor/app.py').read()); print('app.py OK')"
app.py OK
# also surrogacy.py:
PASS: main.py, app.py, surrogacy.py all parse
```
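
The per-file one-liners above can be folded into a single pass. This is a minimal sketch, not the script that produced the log; the helper name `syntax_check` is illustrative and the file list assumes the command is run from the project directory.

```python
import ast
from pathlib import Path


def syntax_check(paths):
    """Map each path to None if it parses, else to a short error string."""
    results = {}
    for path in paths:
        try:
            source = Path(path).read_text(encoding="utf-8")
            ast.parse(source, filename=str(path))
            results[str(path)] = None
        except SyntaxError as exc:
            results[str(path)] = f"line {exc.lineno}: {exc.msg}"
    return results


if __name__ == "__main__":
    # Only files that exist in the current directory are checked.
    files = [f for f in ("main.py", "app.py", "surrogacy.py") if Path(f).exists()]
    for name, err in syntax_check(files).items():
        print(f"{name} OK" if err is None else f"FAILED: {name}: {err}")
```

`ast.parse` only proves the file is syntactically valid; it does not import the module, so missing dependencies or runtime errors are not caught at this step.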

## 2. CLI `--help`

```
$ python3 main.py --help        # exits 0
usage: main.py [-h] [--data PATH] [--surrogacy] [--paradox] [--gaps]
               [--hypotheses] [--all] [--top N] [--csv-out PATH]
... (full help renders; all options documented) ...
PASS: --help exit 0
```
Also confirmed that `python3 projects/2026-05-22-1-glyce-surrogate-kor/main.py --help` runs from the
workspace root: the CLI resolves the bundled demo by absolute path, so it works from any directory.

## 3. Demo CSV parses + invariants

```
PASS: CSV parsed 47 rows, 16 trials, cols ok
PASS: all finite R2 in [0,1] (11 fitted cells)
PASS: all reported PTE in [0,1]
PASS: grades assigned -> ['indeterminate', 'invalid', 'strong', 'weak']
PASS: 60 hypotheses generated
PASS: 6 paradox flags (expect ACCORD/VADT present)
PASS: ACCORD paradox detected (clinical lesson reproduced)
```
- Required-column validation works: a CSV with wrong columns → `exit 2` + clear message.
- Missing file → `exit 2` + clear message.
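
The two failure modes above (wrong columns, missing file) could be handled along these lines. This is a sketch under assumptions: the column names in `REQUIRED_COLUMNS` are hypothetical stand-ins, and the project's actual loader and messages are not shown in this log.

```python
import csv
import sys

# Hypothetical column names; the real schema is defined by the project's loader.
REQUIRED_COLUMNS = ("trial", "drug_class", "surrogate", "delta_surrogate", "outcome", "hr")


def load_rows(path, required=REQUIRED_COLUMNS):
    """Read a CSV into a list of dicts; raise ValueError if columns are missing."""
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        header = reader.fieldnames or []
        missing = [col for col in required if col not in header]
        if missing:
            raise ValueError(f"{path}: missing required column(s): {', '.join(missing)}")
        return list(reader)


def main(path):
    """Return a process exit code: 0 on success, 2 on unusable input."""
    try:
        rows = load_rows(path)
    except FileNotFoundError:
        print(f"error: file not found: {path}", file=sys.stderr)
        return 2
    except ValueError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 2
    print(f"PASS: CSV parsed {len(rows)} rows")
    return 0
```

A caller would wire this up with `sys.exit(main(path))`, so that both failure modes surface as exit code 2 with a message on stderr.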

## 4. Sample real output

### `python3 main.py --surrogacy` (excerpt)
```
[ TRIAL-LEVEL SURROGACY BY CLASS x SURROGATE x OUTCOME ]
------------------------------------------------------------------------------
class        surr  outcome            n     R2         R2_CI      STE    PTE grade
------------------------------------------------------------------------------
DPP4i        HbA1c MACE               3  0.995       [--,--]   -0.390   1.00 invalid      <-- PARADOX
GLP1RA       HbA1c all_cause_death    3  0.921       [--,--]       --   0.00 strong
SGLT2i       HbA1c HF_hosp            5  0.274   [0.00,0.92]       --   0.00 weak
SGLT2i       HbA1c MACE               5  0.117   [0.00,0.88]       --   0.00 weak
SU_Intensive HbA1c all_cause_death    3  0.566       [--,--]       --   1.00 invalid      <-- PARADOX
```
R²_trial in [0,1]; grades assigned; SGLT2i HbA1c→hard-outcome surrogacy is **weak**
(benefit independent of HbA1c magnitude — intended clinical lesson reproduced).

### `python3 main.py --paradox`
```
[ SURROGATE-PARADOX FLAGS ]
trial         class         surr     Δsurr outcome              HR
TECOS         DPP4i         HbA1c    -0.29 MACE               1.01
SAVOR         DPP4i         HbA1c    -0.30 HF_hosp            1.28
EXAMINE       DPP4i         HbA1c    -0.36 HF_hosp            1.19
ACCORD        SU_Intensive  HbA1c    -1.01 all_cause_death    1.22
VADT          SU_Intensive  HbA1c    -1.16 all_cause_death    1.07
ACCORD        SU_Intensive  FPG     -25.00 all_cause_death    1.22
  6 paradox flag(s).
```
ACCORD/VADT intensive-control → improved HbA1c but **worse** all-cause death (HR>1) → flagged. ✅
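
The flagging rule implied by the table can be sketched as follows. It assumes a row is flagged whenever the surrogate improves while the point-estimate HR exceeds 1, which is consistent with TECOS (HR 1.01) appearing above; the project's real rule may also use confidence intervals. The `Row` type and `LOWER_IS_BETTER` set are illustrative names, and the `EXAMPLE` row is made up.

```python
from typing import NamedTuple


class Row(NamedTuple):
    trial: str
    surrogate: str
    delta_surr: float  # change in the surrogate, treatment vs control
    outcome: str
    hr: float          # hazard ratio for the hard outcome


# Assumption: for these surrogates a lower value is the improvement.
LOWER_IS_BETTER = {"HbA1c", "FPG"}


def is_paradox(row: Row) -> bool:
    """True when the surrogate improved but the hazard ratio is above 1."""
    if row.surrogate in LOWER_IS_BETTER:
        improved = row.delta_surr < 0
    else:
        improved = row.delta_surr > 0
    return improved and row.hr > 1.0


rows = [
    Row("ACCORD", "HbA1c", -1.01, "all_cause_death", 1.22),  # from the table above
    Row("TECOS", "HbA1c", -0.29, "MACE", 1.01),              # from the table above
    Row("EXAMPLE", "HbA1c", -0.50, "MACE", 0.90),            # made-up non-paradox row
]
flags = [r.trial for r in rows if is_paradox(r)]
print(flags)  # ['ACCORD', 'TECOS']
```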

### `python3 main.py --gaps --top 6` (excerpt)
```
H6. [WEAK CELL] GLP1RA | TIR -> MACE  (grade=indeterminate, n=2)
    Hypothesis : Is TIR a valid trial-level surrogate for MACE in GLP1RA?
                 (current evidence: only 2 trial(s) (<3), grade=indeterminate, implausible/undetermined PTE)
    Suggested  : +6 trial(s); per-arm n≈16,914 (target |logHR|=0.15, assumed event rate=0.041, ~1396 events).
```
Hypotheses generated with sample-size suggestions; matches the spec's example phrasing. ✅
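
The event count in the suggestion can be reproduced with Schoenfeld's approximation for a 1:1 trial. The formula choice, the two-sided α of 0.05, and the 80% power are assumptions made for this sketch; the log does not state which formula `main.py` uses.

```python
import math
from statistics import NormalDist


def schoenfeld_events(log_hr, alpha=0.05, power=0.80):
    """Total events needed to detect |log HR| with 1:1 allocation."""
    z = NormalDist().inv_cdf
    return math.ceil(4 * (z(1 - alpha / 2) + z(power)) ** 2 / log_hr ** 2)


def per_arm_n(events, event_rate):
    """Participants per arm so that the expected total event count is `events`."""
    return math.ceil(events / event_rate / 2)


events = schoenfeld_events(0.15)
print(events)                    # 1396
print(per_arm_n(events, 0.041))  # 17025
```

The 1396 events match the excerpt. The per-arm figure from this sketch (17,025 at a rate of exactly 0.041) sits slightly above the 16,914 shown, which would be consistent with the displayed event rate being rounded, though that is an inference rather than something the log confirms.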

## 5. STE methodology spot-check

The STE direction guard was verified: STE is only returned when the fitted slope points
toward benefit. On a clean synthetic beneficial-slope cell (R²=0.998), STE solved to
≈ −1.38, meaning an HbA1c reduction of about 1.38 percentage points is needed before the
upper 95% prediction band drops below the null. On paradox / wrong-direction cells, STE
correctly returns `--` (None).

A scale fix was applied to the prediction band: the new-trial sampling variance uses
`s²/w_med` (median observed inverse-variance weight) rather than a unit weight, keeping the
band on the same scale as the inverse-variance-weighted fit.
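
The effect of the scale fix can be seen in a small closed-form WLS sketch. It is not the code in `surrogacy.py`: the function names and the data are synthetic illustrations, and the t quantile is passed in as a constant to keep the block dependency-free (in the environment above it could come from `scipy.stats.t.ppf(0.975, n - 2)`).

```python
import math
from statistics import median


def wls_fit(x, y, w):
    """Closed-form weighted least squares for y = b0 + b1 * x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    # Weighted residual variance (dof = n - 2); it lives on the weighted scale.
    rss = sum(wi * (yi - b0 - b1 * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    s2 = rss / (len(x) - 2)
    return {"b0": b0, "b1": b1, "s2": s2, "sw": sw, "xbar": xbar, "sxx": sxx}


def upper_band(fit, x0, t_crit, w_new):
    """Upper prediction limit at x0 for a new trial carrying weight w_new."""
    var_mean = fit["s2"] * (1 / fit["sw"] + (x0 - fit["xbar"]) ** 2 / fit["sxx"])
    var_new = fit["s2"] / w_new  # equals s2 / w_med when w_new is the median weight
    return fit["b0"] + fit["b1"] * x0 + t_crit * math.sqrt(var_mean + var_new)


# Synthetic illustration: x = HbA1c change, y = log(HR), w = 1 / SE(y)^2.
x = [-0.3, -0.6, -0.9, -1.2, -1.5]
y = [-0.05, -0.16, -0.24, -0.37, -0.44]
w = [50.0, 80.0, 60.0, 100.0, 70.0]
T_CRIT = 3.182  # two-sided 95% t quantile for dof = 3

fit = wls_fit(x, y, w)
unit = upper_band(fit, -1.0, T_CRIT, w_new=1.0)
scaled = upper_band(fit, -1.0, T_CRIT, w_new=median(w))
print(f"unit weight: {unit:+.3f}   s2/w_med: {scaled:+.3f}")
```

With inverse-variance weights well above 1, treating `s²` as an unweighted variance inflates the new-trial term by a factor of roughly `w_med`, which is why the unit-weight upper limit sits far above the rescaled one.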

## 6. Streamlit app (`app.py`)

- `ast.parse` OK.
- Imports cleanly without a Streamlit runtime; `scatter_fig` / `ste_curve_fig` produce
  matplotlib figures headlessly (`matplotlib.use("Agg")`).
- Booted headless: `streamlit run app.py --server.headless true` → served on localhost
  without error (only a harmless macOS LibreSSL/urllib3 warning).

## 7. Issues found & fixed during build

1. **STE never solvable on demo data.** Small per-cell n made the t-critical value huge and
   the prediction bands enormous (with n=3, dof = n − 2 = 1; the two-sided 95% t quantile at
   dof=1 is about 12.7). A second cause was a scale mismatch: the band added the
   *weighted* residual variance `s²` as if it were an unweighted new-observation variance.
   **Fix:** new-trial variance = `s²/w_med`; STE now solvable on adequately-powered cells.
2. **STE returned for non-beneficial slopes** (e.g. positive-slope/paradox cells).
   **Fix:** added a slope-direction guard (`sign·β1 < 0` required).
3. **Summary "strongest surrogacy" picked a paradox cell** with high R² but `invalid` grade.
   **Fix:** "Strongest (valid)" now excludes paradox cells.
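
The third fix can be illustrated with the rows from the `--surrogacy` excerpt. This sketch assumes paradox cells can be identified by their `invalid` grade, as in that excerpt; the function name `strongest_valid` is illustrative and the project's own selection logic is not shown here.

```python
# (class, outcome, R2, grade), copied from the --surrogacy excerpt in section 4.
cells = [
    ("DPP4i", "MACE", 0.995, "invalid"),
    ("GLP1RA", "all_cause_death", 0.921, "strong"),
    ("SGLT2i", "HF_hosp", 0.274, "weak"),
    ("SGLT2i", "MACE", 0.117, "weak"),
    ("SU_Intensive", "all_cause_death", 0.566, "invalid"),
]


def strongest_valid(rows):
    """Highest-R2 cell among those not graded 'invalid', or None if there is none."""
    valid = [row for row in rows if row[3] != "invalid"]
    return max(valid, key=lambda row: row[2]) if valid else None


naive = max(cells, key=lambda row: row[2])
print(naive[0], naive[1])    # DPP4i MACE (the paradox cell the old summary picked)
best = strongest_valid(cells)
print(best[0], best[1])      # GLP1RA all_cause_death
```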

No remaining failures. No `FAILED:` items.
