# NITSurrogate-Kor (엔아이티서로게이트코어)

> ⚠️ **Research / reference use only. NOT for clinical
> decision-making.** All bundled effect sizes are **ILLUSTRATIVE / SYNTHETIC**
> curated values, **not** official trial readouts. Do not cite the numbers.

- **Domain:** MASLD / MASH (metabolic liver disease)
- **Category:** research-hypothesis generation
- **Mode:** 100% offline. Dependencies: Python stdlib, numpy, scipy, and pandas
  for the CLI; matplotlib and streamlit for the optional UI. No network, no
  external/paid APIs, no EMR.

---

## What it does (one line)

Ingests trial-level effect-size pairs along the **NIT → histology → hard
hepatic outcome** chain from MASH RCTs and computes **stage-by-stage trial-level
surrogacy** (R²_trial, surrogate threshold effect [STE], proportion of treatment
effect [PTE] mediation) via inverse-variance weighted meta-regression. It then
identifies which chain stages are unvalidated (especially the sparse
histology→hard stage) and generates validation-study hypotheses with the
required outcome-trial size and follow-up.

## Why this matters (background the tool encodes)

The **3-layer surrogacy chain** in MASH drug development:

| Layer | Role | Members in this tool |
|---|---|---|
| **NIT** | surrogate | FIB-4, VCTE liver stiffness (LSM), MRI-PDFF, ELF, MRE |
| **Histology** | intermediate | MASH resolution, fibrosis improvement ≥1 stage, ΔNAS |
| **Hard hepatic outcome** | endpoint | cirrhosis progression / decompensation, variceal bleeding, HCC, transplant, liver-related death, all-cause death |

**Resmetirom (Rezdiffra)** received FDA accelerated approval in 2024 on the
basis of a **histologic surrogate** (MASH resolution / fibrosis improvement).
Two open questions drive this tool:

1. Is the **histologic surrogate valid for hard outcomes** (the very premise of
   accelerated approval)? → the **histology→hard** stage.
2. Can **NIT changes validly replace biopsy** (NIT→histology→hard)? → the
   **NIT→histology** and **NIT→hard** stages.

**LITMUS / NIMBLE** are pursuing NIT qualification, but the trial-level
surrogacy chain is **not yet fully validated**. Hard outcomes take years to
accrue, so only cirrhosis-stage outcome trials report them, which leaves
**(Δhistology, Δhard) data sparse**. This tool treats **sparsity as a
first-class, flagged concept**.

---

## The 5 core features

1. **Stage-by-stage trial-level surrogacy** (`--chain`): R²_trial (+95% CI),
   STE, PTE mediation, and surrogacy grade for each of NIT→histology,
   histology→hard, NIT→hard.
2. **Per-NIT comparison & ranking** (`--compare-nit`): ranks FIB-4 vs VCTE vs
   MRI-PDFF vs ELF vs MRE by surrogacy strength.
3. **Surrogate-paradox detection** (`--paradox`): flags trials where the
   upstream layer improves but the downstream layer worsens → grade INVALID.
4. **Unvalidated-stage mining** (`--gaps`): flags stages with too few trials
   (especially histology→hard), wide CIs, weak surrogacy, or paradox.
5. **Validation-hypothesis generation** (`--hypotheses`): emits validation-study
   questions with required outcome-trial **events / sample size / follow-up**
   (Schoenfeld) and NIT-qualification sub-studies.
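
The sizing arithmetic behind feature 5 can be sketched as follows. This is a minimal illustration, not the tool's own code: the function names, the target HR of 0.70, and the 20% control-arm cumulative event rate are all hypothetical. The conversion from events to sample size assumes proportional hazards and 1:1 allocation, which the tool may or may not use.

```python
import math
from scipy.stats import norm


def schoenfeld_events(hr, alpha=0.05, power=0.80, alloc=0.5):
    """Required events for a two-sided log-rank test (Schoenfeld formula)."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return (z_alpha + z_beta) ** 2 / (alloc * (1 - alloc) * math.log(hr) ** 2)


def total_sample_size(events, hr, control_event_rate, alloc=0.5):
    """Total N so that the expected number of events reaches `events`.

    Assumption: the treated-arm cumulative incidence over follow-up follows
    from proportional hazards, 1 - (1 - p_control) ** HR.
    """
    treated_event_rate = 1 - (1 - control_event_rate) ** hr
    pooled_rate = alloc * treated_event_rate + (1 - alloc) * control_event_rate
    return events / pooled_rate


# Hypothetical targets, chosen only to illustrate the arithmetic.
events = schoenfeld_events(hr=0.70)
n_total = total_sample_size(events, hr=0.70, control_event_rate=0.20)
print(math.ceil(events), math.ceil(n_total))
```

Because the required events scale with 1 / (ln HR)², modest target effects on hard outcomes quickly push the event count (and therefore follow-up) upward, which is one reason the histology→hard stage stays sparse.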

---

## Run commands

From inside `projects/2026-05-22-3-nit-surrogate-kor/`:

```bash
# Self-help
python3 main.py --help

# Bare invocation: prints a 3-stage summary
python3 main.py

# Core analyses
python3 main.py --chain            # per-stage R²_trial / STE / PTE / grade
python3 main.py --compare-nit      # rank NITs by surrogacy strength
python3 main.py --paradox          # surrogate-paradox scan
python3 main.py --gaps             # mine unvalidated / data-sparse stages
python3 main.py --hypotheses       # validation hypotheses + required N/follow-up

# Combine, limit, tune
python3 main.py --gaps --hypotheses --top 5
python3 main.py --chain --alpha 0.10 --grade-strong 0.8 --grade-moderate 0.55
python3 main.py --data path/to/your_trials.csv --chain

# Optional Streamlit UI (mirrors the CLI; not required)
streamlit run app.py
```

**Best single demo command:** `python3 main.py --chain`

---

## Methodology brief (Buyse–Molenberghs / Daniels–Hughes)

Trial-level surrogacy is estimated **per chain stage** via **inverse-variance
weighted least squares**, implemented in pure numpy/scipy (statsmodels is *not*
a dependency):

- **R²_trial**: weighted regression of the downstream treatment effect on the
  upstream treatment effect across trials, with weights = 1/SE²(downstream).
  The 95% CI uses the Fisher z-transform. For the **pooled** NIT stages, each
  NIT is z-standardized **within metric** before pooling so that
  incommensurable scales (MRI-PDFF % vs FIB-4 points vs LSM kPa) do not distort
  the relationship; single-NIT runs stay on native units so that the STE
  remains interpretable.
- **STE (surrogate threshold effect)**: the minimum upstream benefit at which
  the **lower 95% prediction band** of the downstream effect crosses the null,
  so that any larger upstream benefit still predicts a nonzero downstream
  benefit. A smaller STE means a more useful surrogate; if the band never
  clears the null, the STE is unreachable and the surrogate is weak or
  unusable.
- **PTE mediation**: the proportion of the NIT→hard treatment effect that is
  mediated through histology, via a difference-of-coefficients decomposition
  (PTE = (b_total − b_direct)/b_total). Raw and clamped values are both shown;
  a PTE outside the 0 to 1 range is flagged as unstable, which is expected with
  sparse hard-outcome data.
- **Surrogate paradox**: any trial with upstream benefit but downstream harm
  forces the stage grade to **INVALID**.
- **Surrogacy grade**: strong (R² ≥ 0.70) / moderate (0.50 ≤ R² < 0.70) / weak
  (R² < 0.50) / invalid (paradox); thresholds are configurable via
  `--grade-strong` / `--grade-moderate`.
- **Trial sizing**: required events via **Schoenfeld** (HR target, α, power);
  sample size from a baseline cumulative event rate; default follow-up 4 yr.
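
The R²_trial bullet above can be sketched in plain numpy/scipy as follows. This is an illustration under stated assumptions rather than the tool's actual implementation: the function name `r2_trial_wls` and the example numbers are hypothetical, effects are assumed to be sign-normalized already (positive = benefit), and the Fisher z interval treats the trials as `n` independent points with standard error 1/√(n − 3).

```python
import numpy as np
from scipy.stats import norm


def r2_trial_wls(upstream, downstream, downstream_se, alpha=0.05):
    """Weighted R² of downstream on upstream effects, with a Fisher z CI."""
    x = np.asarray(upstream, dtype=float)
    y = np.asarray(downstream, dtype=float)
    w = 1.0 / np.asarray(downstream_se, dtype=float) ** 2
    n = x.size
    if n < 4:
        raise ValueError("Fisher z interval needs at least 4 trials")

    # Weighted least squares: scale rows by sqrt(weight), then solve OLS.
    X = np.column_stack([np.ones(n), x])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

    # Weighted R² = 1 - weighted SSE / weighted SST (about the weighted mean).
    resid = y - X @ beta
    y_bar = np.average(y, weights=w)
    r2 = 1.0 - np.sum(w * resid**2) / np.sum(w * (y - y_bar) ** 2)
    r2 = float(min(max(r2, 0.0), 1.0))

    # Fisher z interval on the signed correlation r = sign(slope) * sqrt(R²).
    r = np.sign(beta[1]) * np.sqrt(r2)
    z = np.arctanh(np.clip(r, -0.999999, 0.999999))
    half = norm.ppf(1 - alpha / 2) / np.sqrt(n - 3)
    r_lo, r_hi = np.tanh(z - half), np.tanh(z + half)

    # Map the interval for r back to the R² scale.
    r2_lo = 0.0 if r_lo <= 0.0 <= r_hi else min(r_lo**2, r_hi**2)
    r2_hi = max(r_lo**2, r_hi**2)
    return {"slope": float(beta[1]), "r2": r2, "ci": (float(r2_lo), float(r2_hi))}


# Made-up effect pairs for six hypothetical trials (not real trial data).
upstream = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
downstream = [0.10, 0.18, 0.35, 0.33, 0.55, 0.58]
downstream_se = [0.08, 0.10, 0.07, 0.12, 0.09, 0.10]
result = r2_trial_wls(upstream, downstream, downstream_se)
print(result)
```

With only a handful of trials the interval is wide even when the point estimate is high, which is the situation the histology→hard stage is expected to be in.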

---

## Data

`data/masld_surrogacy_demo.csv` — curated **synthetic/illustrative** demo set
(comment header starts with `#`). One row = one (trial × NIT-metric) contrast.

**Schema:** `trial, drug, drug_class, nit_metric, delta_nit, delta_nit_se,
histo_metric, histo_effect, histo_se, hard_outcome, loghr, loghr_se,
stage_status`.

- Sign conventions are normalized internally so "more upstream improvement" →
  "more downstream benefit" before regression.
- The dataset deliberately reflects **histology→hard sparsity**: most trials are
  histology-endpoint trials with empty `loghr`; only the cirrhosis-stage
  outcome programs carry hard-outcome data. One row (STELLAR-4 / simtuzumab) is
  a planted **surrogate-paradox** stress test.
- Drugs/programs represented (illustrative): resmetirom (MAESTRO),
  pegozafermin, efruxifermin, survodutide, lanifibranor, semaglutide-MASH,
  tirzepatide-MASH, plus obeticholic acid and simtuzumab as contrast cases.
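
The sign normalization and the paradox stress test described above can be illustrated for the NIT→hard stage. This sketch rests on assumptions that the source does not confirm: that `delta_nit < 0` means NIT improvement and `loghr < 0` means fewer hard events, and that a paradox is judged on point estimates alone. The function name `flag_paradox`, the derived column names, and the rows are hypothetical placeholders.

```python
import pandas as pd


def flag_paradox(df):
    """Flag rows with upstream (NIT) benefit but downstream (hard) harm."""
    out = df.copy()
    # Assumed conventions: lower NIT value and lower log-HR are both "better",
    # so flipping the sign makes positive = benefit on both layers.
    out["nit_benefit"] = -out["delta_nit"]
    out["hard_benefit"] = -out["loghr"]
    # Comparisons with NaN are False, so rows without hard-outcome data
    # are never flagged.
    out["paradox"] = (out["nit_benefit"] > 0) & (out["hard_benefit"] < 0)
    return out


# Placeholder rows, not real trial results.
rows = pd.DataFrame(
    {
        "trial": ["TRIAL-A", "TRIAL-B", "TRIAL-C"],
        "delta_nit": [-0.40, -0.25, -0.30],
        "loghr": [-0.20, 0.15, float("nan")],
    }
)
flagged = flag_paradox(rows)
print(flagged.loc[flagged["paradox"], "trial"].tolist())
```

Only the second placeholder row combines an NIT improvement with a log-HR above zero, so it is the only one flagged; the third row has no hard-outcome data and is skipped.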

### Bring your own data
Pass any CSV with the same schema via `--data`. Comment lines (`#`) and blank
lines are ignored; empty `loghr`/`loghr_se` cells mark "no hard outcome
reported".
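
A CSV in this schema can be read with pandas roughly as shown below. The rows are made-up placeholders (including the `stage_status` values, whose real vocabulary is not documented here), and the in-memory `csv_text` stands in for a file path such as the one passed to `--data`.

```python
import io

import pandas as pd

# Placeholder rows for illustration only; names and numbers are made up.
csv_text = """\
# demo header comment
trial,drug,drug_class,nit_metric,delta_nit,delta_nit_se,histo_metric,histo_effect,histo_se,hard_outcome,loghr,loghr_se,stage_status
TRIAL-A,drug-x,class-1,MRI-PDFF,-30.0,4.0,MASH resolution,0.20,0.05,,,,placeholder

# another comment line
TRIAL-B,drug-y,class-1,FIB-4,-0.2,0.1,fibrosis improvement,0.12,0.04,,,,placeholder
TRIAL-C,drug-z,class-2,FIB-4,-0.3,0.1,fibrosis improvement,0.10,0.04,decompensation,-0.25,0.15,placeholder
"""

# comment="#" drops comment lines; blank lines are skipped by default.
df = pd.read_csv(io.StringIO(csv_text), comment="#", skip_blank_lines=True)

# Empty loghr / loghr_se cells are parsed as NaN, i.e. "no hard outcome reported".
has_hard = df["loghr"].notna() & df["loghr_se"].notna()
print(df.loc[has_hard, ["trial", "hard_outcome", "loghr", "loghr_se"]])
print(f"{(~has_hard).sum()} of {len(df)} rows report no hard outcome")
```

Keeping the missing cells as NaN, rather than filling them with zero, lets a downstream step count how many trials actually inform the histology→hard stage.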

### Data sources (described, not fetched — offline tool)
This tool performs **no network access**. The curated values were hand-built to
mirror, at a structural level, evidence described in:
- **ClinicalTrials.gov** registrations for the named MASH programs.
- Published **meta-analyses** of NIT ↔ histology ↔ outcome associations.
- **FDA / EMA** resmetirom (Rezdiffra) accelerated-approval review materials.
- **LITMUS** (EU) and **NIMBLE** (US) NIT biomarker-qualification consortia.
- Clinical-practice context: **AASLD MASLD 2023** practice guidance and the
  **EASL–EASD–EASO MASLD 2024** clinical practice guidelines.

None of the above were queried at runtime; they are cited only to document the
provenance of the modeling assumptions.

---

## Verification checklist

- [ ] `python3 -c "import ast; ast.parse(open('main.py').read())"` → OK
- [ ] `python3 -c "import ast; ast.parse(open('app.py').read())"` → OK
- [ ] `python3 main.py --help` runs and lists all flags
- [ ] `python3 main.py` (bare) prints a 3-stage summary
- [ ] `--chain` prints R²_trial ∈ [0,1] + 95% CI + STE + PTE + grade per stage
- [ ] `--compare-nit` ranks the 5 NITs with grades
- [ ] `--paradox` detects the planted STELLAR-4 / simtuzumab paradox
- [ ] `--gaps` flags the histology→hard structural sparsity per drug class
- [ ] `--hypotheses` emits validation questions with required events / N / follow-up
- [ ] demo CSV parses with pandas (`comment='#'`)
- [ ] `app.py` imports cleanly and its plotting helpers run headlessly
- [ ] no RuntimeWarnings on the default runs
- [ ] research/reference-only disclaimer appears in README + CLI header/footer + app

See `QA.md` for the executed verification log.

---

## ⚠️ Disclaimer (repeated)

**Research / reference use only. NOT for clinical
decision-making.** The bundled effect sizes are illustrative/synthetic and must
not be interpreted as real trial results or used to guide patient care,
regulatory submissions, or drug-development decisions.