# QA Log — GlpHypoMine MVP

Date: 2026-04-25
Builder: autonomous build pass (no clarifying Qs)
Python: system `python3`

## Checks

| # | Check | Command | Result |
|---|-------|---------|--------|
| 1 | Syntax | `python3 -c "import ast; ast.parse(open('main.py').read())"` | OK (`SYNTAX_OK`) |
| 2 | CLI help | `python3 main.py --help` | OK — usage block printed |
| 3 | Rank | `python3 main.py rank --top 5` | OK — 5 ranked cards with disclaimer |
| 4 | Domain filter | `python3 main.py rank --domain cardiovascular` | OK — 1 card (semaglutide × cardiovascular, novelty=explored) |
| 5 | Stats | `python3 main.py stats` | OK — 8 drugs / 36 organs / 18 abstracts / 50 FAERS / 12 trials, 46 FAERS signals |
| 6 | Data files | json+csv load roundtrip | OK — all 5 files load |
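
Check 6 does not record its exact command. A minimal sketch of an equivalent load check follows, assuming the JSON files hold a top-level list or dict and the CSV has a header row. The helper name `load_counts` is hypothetical and is not taken from `main.py`.

```python
import csv
import json
import os


def load_counts(data_dir):
    """Load every .json/.csv file in data_dir and return {filename: record count}.

    Assumption: JSON files hold a top-level list or dict; CSV files have a header row.
    Any parse error propagates, so a clean return means every file loaded.
    """
    counts = {}
    for name in sorted(os.listdir(data_dir)):
        path = os.path.join(data_dir, name)
        if name.endswith(".json"):
            with open(path, encoding="utf-8") as fh:
                counts[name] = len(json.load(fh))
        elif name.endswith(".csv"):
            with open(path, newline="", encoding="utf-8") as fh:
                # DictReader consumes the header, so this counts data rows only.
                counts[name] = sum(1 for _ in csv.DictReader(fh))
    return counts
```

If those assumptions hold, `load_counts("data")` would return one entry per data file, which can be compared by eye against the counts in the Stats row.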

## Cards observed (top-5 from `rank --top 5`)

1. tirzepatide × metabolic_glucose (score 0.677, unexplored, FAERS-only)
2. dulaglutide × renal (0.612, unexplored, PubMed+FAERS)
3. exenatide × pancreatic (0.603, unexplored)
4. liraglutide × pancreatic (0.569, unexplored)
5. liraglutide × biliary (0.568, unexplored, with mechanism candidate)

The cardiovascular domain card has `novelty=explored` because semaglutide × cardiovascular appears in `clinicaltrials_sample.json` (NCT05000001 / NCT05000008), so the in-trial penalty is applied as intended. This confirms the feature 4 cross-reference logic.
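
The cross-reference described above can be sketched roughly as follows. This is an illustration only, not the code in `main.py`: the trial keys `drug` and `organ_system`, both function names, and the 0.5 penalty factor are all assumptions made for the example.

```python
def novelty_for(drug, organ, trials):
    """Return 'explored' if any trial covers the (drug, organ) pair, else 'unexplored'.

    Assumption: each trial is a dict with hypothetical 'drug' and 'organ_system' keys.
    """
    for trial in trials:
        if trial.get("drug") == drug and trial.get("organ_system") == organ:
            return "explored"
    return "unexplored"


def apply_trial_penalty(score, novelty, penalty=0.5):
    """Scale the score down for explored pairs; the 0.5 factor is illustrative only."""
    return score * penalty if novelty == "explored" else score
```

Under these assumptions, a pair found in the trials file is labeled `explored` and its score is reduced, while a pair with no matching trial keeps its score unchanged.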

## Disclaimer printed

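Every CLI output and the top of `README.md` carry:
"WARNING: 본 도구는 연구·참고용입니다. 임상 의사결정에 사용 금지. Generated hypotheses require expert validation before grant submission or experimental design."

(English gloss of the Korean portion: "This tool is for research and reference use only. Do not use it for clinical decision-making.")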

## Files created

- `README.md`
- `main.py`
- `QA.md` (this file)
- `data/pubmed_sample.json` (18 records)
- `data/faers_sample.csv` (50 rows)
- `data/clinicaltrials_sample.json` (12 trials)
- `data/glp1_drugs.json` (8 drugs)
- `data/organ_systems.json` (36 organ systems)

## Constraints respected

- No network calls. No external installs. Stdlib only (`argparse`, `csv`, `json`, `math`, `collections`, `os`, `sys`).
- No paid APIs. No global pip installs.
- Disclaimer in CLI output and README.
- Built only the 5 listed features.
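
The stdlib-only constraint could be checked mechanically rather than by inspection. A minimal sketch, assuming the allowlist is exactly the modules named above; the function names are hypothetical and this check was not part of the logged QA run.

```python
import ast

# Allowlist copied from the constraint list in this log.
ALLOWED = {"argparse", "csv", "json", "math", "collections", "os", "sys"}


def imported_modules(source):
    """Return the set of top-level module names imported anywhere in source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as "from . import x".
            found.add(node.module.split(".")[0])
    return found


def disallowed_imports(source, allowed=ALLOWED):
    """Return imports in source that fall outside the allowed set, sorted."""
    return sorted(imported_modules(source) - set(allowed))
```

Running `disallowed_imports(open("main.py").read())` would be expected to return an empty list if the constraint holds. Note that this only inspects static `import` statements, not dynamic imports via `importlib` or `__import__`.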

## Known limitations (logged, not failures)

- In the synthetic data, drug-event pairs with very low FAERS counts (n=1) dominate the top of the ROR ranking. This is expected for the MVP; counts would be far larger with real quarterly FAERS dumps.
- `extract_tuples()` is a passthrough mock; it is the documented integration point for swapping in a production extractor.
- `ORGAN_PRIOR` weights are author heuristics, not literature-derived priors.
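
To illustrate why n=1 counts make the ROR ranking unstable, here is a sketch of the standard reporting odds ratio with its approximate 95% confidence interval from a 2x2 table. Whether `main.py` computes an interval at all is not recorded in this log, so treat the function name and cell layout as assumptions.

```python
import math


def ror_with_ci(a, b, c, d, z=1.96):
    """Reporting odds ratio and approximate 95% CI from a 2x2 table.

    a: target drug with event, b: target drug without event,
    c: other drugs with event, d: other drugs without event.
    All four cells must be positive for this formula.
    """
    ror = (a * d) / (b * c)
    # Standard error of ln(ROR); small cells inflate it sharply.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - z * se)
    upper = math.exp(math.log(ror) + z * se)
    return ror, lower, upper
```

With made-up cell counts `(1, 5, 2, 42)` the ROR is high but the interval includes 1, so the signal is not distinguishable from noise. Multiplying every cell by 10 leaves the ROR unchanged and narrows the interval enough to exclude 1.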

## Result: PASS