# QA log — InHospGlyWard-Kor

> This tool is for reference and research use only. It is not intended for real clinical decision-making.

Generated: 2026-05-27 (build idx=1, slug `in-hosp-gly-ward-kor`)

## 1. Python syntax check (`python3 -c "import ast; ast.parse(open(f).read())"`)

| File | Result |
|---|---|
| `main.py` | OK |
| `app.py` | OK |
| `modules/__init__.py` | OK |
| `modules/ingest.py` | OK |
| `modules/tir.py` | OK |
| `modules/regimen.py` | OK |
| `modules/episode.py` | OK |
| `modules/subgroup.py` | OK |
| `modules/report.py` | OK |
| `modules/transition.py` | OK |
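
The one-liner in the heading checks a single file `f`. A minimal sketch of running the same `ast.parse` check over a list of files is shown below; the helper name `check_syntax` and the result strings are illustrative assumptions, not part of the project.

```python
import ast
from pathlib import Path


def check_syntax(paths):
    """Parse each file with ast.parse and return {path: "OK" or a failure note}.

    Hypothetical helper: it only checks that the source parses; nothing is executed.
    """
    results = {}
    for path in paths:
        try:
            source = Path(path).read_text(encoding="utf-8")
            ast.parse(source, filename=str(path))
            results[str(path)] = "OK"
        except (OSError, SyntaxError) as exc:
            # Missing or unreadable files and syntax errors are reported, not raised.
            results[str(path)] = f"FAIL: {exc.__class__.__name__}: {exc}"
    return results
```

Calling `check_syntax(["main.py", "app.py", "modules/tir.py"])` from the project root would return one entry per file, which can be rendered as the table above.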

## 2. CLI smoke tests

| Command | Result |
|---|---|
| `python3 main.py --help` | OK: full help printed using argparse alone |
| `python3 main.py --gen-data --n 320 --seed 42` | OK: N patients=320 POCT=17611 orders=2906 episodes=45 |
| `python3 main.py --analyze --top 8` | OK: ward TIR ranking + regimen mix + episode + subgroup + transition + KM head |
| `python3 main.py --report --lang ko --out reports/inhospglyward_ko.docx` | OK: md (4.2 KB) + docx (38.7 KB) generated |
| `python3 main.py --report --lang en --out reports/inhospglyward_en.docx` | OK: md (3.9 KB) + docx (38.4 KB) generated |

## 3. Data file load test (`csv.DictReader`)

| CSV | rows | columns |
|---|---|---|
| `data/patients.csv` | 320 | 15 |
| `data/poct_bg.csv` | 17,611 | 4 |
| `data/insulin_orders.csv` | 2,906 | 7 |
| `data/episodes.csv` | 45 | 13 |
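
A load test of this kind can be sketched as follows. The expected shapes are copied from the table above; the function names `csv_shape` and `find_mismatches` are hypothetical, and the sketch assumes each CSV has a single header row and UTF-8 encoding.

```python
import csv

# Shapes reported in the table above: path -> (rows, columns).
EXPECTED_SHAPES = {
    "data/patients.csv": (320, 15),
    "data/poct_bg.csv": (17611, 4),
    "data/insulin_orders.csv": (2906, 7),
    "data/episodes.csv": (45, 13),
}


def csv_shape(path):
    """Return (data_row_count, column_count) for a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = sum(1 for _ in reader)  # header row is not counted
        columns = len(reader.fieldnames or [])
    return rows, columns


def find_mismatches(expected):
    """Return {path: (expected_shape, actual_shape)} for files whose shape differs."""
    mismatches = {}
    for path, shape in expected.items():
        actual = csv_shape(path)
        if actual != shape:
            mismatches[path] = (shape, actual)
    return mismatches
```

An empty result from `find_mismatches(EXPECTED_SHAPES)`, run from the project root after `--gen-data`, would correspond to the row and column counts logged here.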

## 4. Dependency policy

- `argparse`, `csv`, `hashlib`, `os`, `sys`, `datetime`, `collections`,
  `dataclasses`, `textwrap`: the **standard library alone** is enough to run
  the full `main.py --help` / `--gen-data` / `--analyze` / `--report` flow
  (for KM, a fallback step-wise estimator runs when lifelines is not installed).
- When heavy deps (streamlit, plotly, pandas, lifelines, scikit-learn,
  statsmodels, matplotlib, python-docx) are missing, a friendly message is printed and the process exits with code 0.
- No external network calls, no paid external APIs, no global package installs.
- All dependencies are pinned in `requirements.txt`.
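
For readers unfamiliar with what a step-wise KM fallback involves, here is a minimal standard-library sketch of the Kaplan-Meier product-limit estimator. It is not the project's actual fallback code; the function name `kaplan_meier` and its return format are assumptions made for illustration.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival)] steps of the Kaplan-Meier product-limit estimate.

    durations: follow-up time per subject.
    events: truthy if the event was observed at that time, falsy if censored.
    A step is emitted only at times with at least one observed event.
    """
    deaths = Counter()  # observed events per time point
    exits = Counter()   # subjects leaving the risk set per time point
    for t, e in zip(durations, events):
        exits[t] += 1
        if e:
            deaths[t] += 1

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths[t]
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set after time t.
        at_risk -= exits[t]
    return curve
```

With durations `[1, 2, 3, 4]` and events `[1, 0, 1, 1]`, the survival steps are 0.75 at t=1, 0.375 at t=3, and 0.0 at t=4; the censored subject at t=2 shrinks the risk set without producing a step.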

## 5. Medical safety

- The README, CLI banner, and report markdown/docx headers and footers all carry the disclaimer "For reference and research use only. Not for clinical decision".
- All data are synthetic surrogates (no real PHI).
- De-identification audit trail: `IngestReport` in `modules/ingest.py`
  - SHA-256 truncated surrogate patient ID (salt: "INHOSP-GLY-WARD-KOR")
  - Dates are stored as a study-day index (date-shift)
  - Free-text fields are removed; only coded enums are kept
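
The first two de-identification steps can be illustrated with a short sketch. Only the salt string comes from this log; the way the salt is combined with the ID, the 12-character truncation length, and the function names `surrogate_id` and `study_day` are assumptions, and the real logic lives in `modules/ingest.py`.

```python
import hashlib
from datetime import date

SALT = "INHOSP-GLY-WARD-KOR"  # salt string quoted in this log


def surrogate_id(raw_id, salt=SALT, length=12):
    """Return a truncated SHA-256 hex digest of the salted raw ID.

    Assumptions: "salt:raw_id" concatenation and a 12-character truncation.
    """
    digest = hashlib.sha256(f"{salt}:{raw_id}".encode("utf-8")).hexdigest()
    return digest[:length]


def study_day(event_date, index_date):
    """Return the day offset of event_date relative to index_date (0 on the index day)."""
    return (event_date - index_date).days
```

For example, `study_day(date(2026, 1, 3), date(2026, 1, 1))` returns `2`, so the calendar date itself never needs to be stored. A shorter truncation makes collisions between surrogate IDs more likely, which is one reason the length is left as a parameter here.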

## 6. Spec conformance check

| Spec | Implementation |
|---|---|
| 5 core features | TIR(`tir.py`) / regimen(`regimen.py`) / episode(`episode.py`) / subgroup(`subgroup.py`) / transition+report(`transition.py`+`report.py`): each feature maps to its own module (the fifth spans two files) |
| Streamlit dashboard | `app.py` with 6 tabs (Ward TIR / Regimen mix / Episodes / Subgroups / Discharge & readmit / Report) |
| CLI `main.py` (a/b/c) | 4 modes: `--gen-data` / `--analyze` / `--report` / `--all` |
| Data folder (200 to 500 patients, Korean wards, varied DM types) | 320 patients, 8 wards (MICU/SICU/CCU/NICU/ER/GW-IM/GW-SG/GW-OG), 6 DM types (T1/T2/GDM/Steroid-DM/Stress-Hyper/No-DM) |
| Analysis logic in modules | TIR calculation, basal-bolus classification, DKA/HHS episode analysis, KM survival, KPI report: all implemented |
| At least one sample in reports/ | 4 files: `reports/inhospglyward_ko.docx` + `.md` + `inhospglyward_en.docx` + `.md` |
| requirements.txt pinned | streamlit 1.36.0 / pandas 2.2.2 / plotly 5.22.0 / lifelines 0.28.0 / python-docx 1.1.2 and the rest, all pinned |
| Medical disclaimer | Included in all outputs |
| de-identification audit trail | `IngestReport.deid_method` + `notes` |
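
As a rough illustration of the TIR calculation listed in the table, the sketch below computes the share of POCT readings that fall inside a target range. The 70 to 180 mg/dL bounds, the reading-count definition (rather than a time-weighted one), and the function name `time_in_range` are all assumptions; this log does not state how `tir.py` defines the metric.

```python
def time_in_range(values_mg_dl, low=70.0, high=180.0):
    """Return the fraction of readings with low <= value <= high.

    Assumed definition: share of readings in range, not time-weighted.
    Returns None when there are no readings.
    """
    readings = list(values_mg_dl)
    if not readings:
        return None
    in_range = sum(1 for value in readings if low <= value <= high)
    return in_range / len(readings)
```

For the readings `[60, 100, 150, 200]` this gives `0.5`, since two of the four values lie within the assumed range. Returning `None` for an empty input keeps wards with no readings distinguishable from wards with 0% in range.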

## 7. Known limitations / future improvements

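- The synthetic data contain few episodes (DKA=2, HHS=3), so variance is large when stratifying by a single ward.
  Estimates should stabilize on real site EDW data (hundreds of admissions per month).
- `_count_persistent_hyper` is a heuristic based on an hour-jump assumption.
  Its accuracy will need improvement once real POCT timestamps at minute resolution are ingested.
- For the statsmodels mixed-effects analysis (ward random effect), only a hook is prepared in the module.
  Full inference is deferred to a later build.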

## 8. How to reproduce

```bash
cd projects/2026-05-27-1-in-hosp-gly-ward-kor
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
python3 main.py --all              # gen -> analyze -> report
streamlit run app.py               # dashboard
```

**Verification result: PASS (all syntax / CLI / CSV / report checks OK)**