EndoSysScore (ESS)

Synthetic data-augmented machine-learning model for 30-day mortality risk after cardiac surgery for infective endocarditis
External AUC 0.882
6/6 DeLong superior
30/30 seeds robust
Open source · MIT

1. Key results

In external validation on a held-out single-centre cohort (n=597 patients, 93 in-hospital deaths), the SYS Score / EndoSysScore (ESS) achieved a significantly higher AUC (DeLong test) than each of the six contemporary IE-specific surgical risk scores it was compared with. The result was reproduced across 30 random seeds and 1,000 bootstrap resamples.

External AUC: 0.882 (95% CI 0.847–0.917)
DeLong superiority: 6/6 (all p<0.005 vs comparator scores)
Multi-seed stability: 30/30 (100% replication at 6/6 DeLong)
Bootstrap stability: 100% (SYS beats all 6 comparators across 1,000 bootstrap resamples)

Head-to-head — External validation

Score                 AUC (95% CI)           ΔAUC vs ESS   DeLong p
EndoSysScore (ESS)    0.882 (0.847–0.917)    reference
EuroSCORE II          0.824 (0.783–0.866)    +0.058        p<0.001
EndoSCORE             0.849 (0.810–0.887)    +0.033        p=0.001
RISK-E                0.859 (0.817–0.897)    +0.023        p=0.002
AEPEI                 0.834 (0.787–0.873)    +0.049        p<0.001
APORTEI               0.830 (0.784–0.874)    +0.052        p<0.001
STS-IE                0.840 (0.796–0.880)    +0.042        p<0.001

2. Interactive calculator

⚠ Research tool — not for clinical decision-making. ESS must not be the sole basis for clinical decisions. Always integrate with clinical judgement and locally validated tools. All computation runs client-side in your browser: no patient data leaves your device.

3. How the EndoSysScore was built

Complete pipeline, fully reproducible from the open Zenodo deposit.

Step 1

Raw multi-centre cohort

GIROC registry — 24 Italian cardiac surgery centres, 2010–2023. n=5,403 patients undergoing surgery for native or prosthetic infective endocarditis.

Step 2 · TIMA-1

Clinical data cleaning

31 expert-defined clinical-consistency rules applied to harmonise the real cohort (e.g. dialysis implies elevated creatinine; pre-op intubation excludes ambulatory NYHA class).
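
As a minimal sketch of how such consistency rules could be encoded, the two examples above can be written as predicates over a patient record. The field names, the creatinine cut-off, and the reading of "ambulatory" as NYHA class I or II are assumptions made for illustration; the actual 31 rules are defined in the Zenodo deposit.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]

# Assumed cut-off for "elevated" creatinine, for illustration only.
CREATININE_ELEVATED_MG_DL = 1.5


def dialysis_implies_elevated_creatinine(r: Record) -> bool:
    """Holds unless a dialysis patient is recorded with a normal creatinine."""
    return (not r["dialysis"]) or r["creatinine_mg_dl"] >= CREATININE_ELEVATED_MG_DL


def intubation_excludes_ambulatory_nyha(r: Record) -> bool:
    """Holds unless an intubated patient is coded NYHA I or II (assumed 'ambulatory')."""
    return (not r["preop_intubation"]) or r["nyha_class"] >= 3


# Hypothetical rule registry; the real pipeline has 31 entries.
RULES: Dict[str, Callable[[Record], bool]] = {
    "dialysis_implies_elevated_creatinine": dialysis_implies_elevated_creatinine,
    "intubation_excludes_ambulatory_nyha": intubation_excludes_ambulatory_nyha,
}


def violated_rules(r: Record) -> List[str]:
    """Return the names of all rules that a record breaks."""
    return [name for name, rule in RULES.items() if not rule(r)]
```

A record with an empty violation list is treated as clinically consistent; any other record would be flagged for review or correction.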

Step 3 · CTGAN

Generative model training

Conditional Tabular GAN trained on the cleaned cohort. Three checkpoints (300, 600 and 1000 epochs) were generated and tested; the 1000-epoch checkpoint was selected on downstream model performance.

Step 4 · TIMA-2

Plausibility constraint filtering

The same 31 clinical rules were applied to the synthetic output and implausible synthetic records were removed. About 85% of synthetic records were retained.

Step 5 · TIMA-3

Blind realism test with 10 cardiac surgeons

10 senior cardiac surgeons each classified 200 randomly mixed real/synthetic profiles. Pooled accuracy = 52%, p=0.45 vs chance, AUROC 0.54, κ=0.04. Synthetic patients statistically indistinguishable from real ones.
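
The relation between the reported accuracy and κ can be checked with a short calculation. The sketch below assumes that κ measures agreement between the surgeons' labels and the true origin of each profile (it could instead be inter-rater agreement), and that the 2,000 pooled classifications were balanced between real and synthetic profiles. The confusion-matrix counts are hypothetical, chosen only to match a pooled accuracy of 52%.

```python
def cohens_kappa(tp: int, fp: int, fn: int, tn: int) -> float:
    """Cohen's kappa for a 2x2 table of true origin vs. surgeon's label."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n                  # pooled accuracy
    p_true_syn = (tp + fn) / n                # share of truly synthetic profiles
    p_said_syn = (tp + fp) / n                # share labelled synthetic
    expected = p_true_syn * p_said_syn + (1 - p_true_syn) * (1 - p_said_syn)
    return (observed - expected) / (1 - expected)


# Hypothetical counts: 10 surgeons x 200 profiles, half synthetic, 52% correct.
kappa = cohens_kappa(tp=520, fp=480, fn=480, tn=520)
print(round(kappa, 2))  # 0.04
```

Under these assumptions chance agreement is 0.5, so an accuracy of 0.52 corresponds to κ = (0.52 − 0.50) / (1 − 0.50) = 0.04.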

Step 6 · Model

Bagged XGBoost ensemble

10 gradient-boosted models (n_estimators=500, max_depth=3, lr=0.03, strong L1/L2 regularisation), trained on an oversampled set of 88,000 events and 12,000 non-events drawn from the 1000-epoch synthetic pool. Blend recalibration (50% isotonic + 50% Platt) was fitted on real development data only.
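
A minimal sketch of the scoring path described above: the ensemble members are averaged, then the averaged score is passed through two calibrators whose outputs are blended 50/50. It assumes that bagging means a plain average and that the blend is taken on the probability scale. The Platt coefficients and isotonic knots are hypothetical stand-ins for calibrators that would be fitted on the real development data.

```python
import math
from bisect import bisect_right
from typing import Callable, Sequence


def bagged_score(member_probs: Sequence[float]) -> float:
    """Average the predictions of the ensemble members for one patient."""
    return sum(member_probs) / len(member_probs)


def make_platt(a: float, b: float) -> Callable[[float], float]:
    """Platt scaling: a logistic curve applied to the raw score."""
    return lambda s: 1.0 / (1.0 + math.exp(-(a * s + b)))


def make_isotonic(knots: Sequence[float], values: Sequence[float]) -> Callable[[float], float]:
    """Step-function stand-in for a fitted isotonic map (values non-decreasing)."""
    def calibrate(s: float) -> float:
        i = bisect_right(knots, s) - 1
        return values[max(i, 0)]
    return calibrate


def blend_recalibrate(s: float, isotonic, platt, w_iso: float = 0.5) -> float:
    """Weighted mean of the two calibrated probabilities."""
    return w_iso * isotonic(s) + (1.0 - w_iso) * platt(s)


# Hypothetical fitted calibrators, for illustration only.
platt = make_platt(a=4.0, b=-3.0)
isotonic = make_isotonic(knots=[0.0, 0.2, 0.5, 0.8], values=[0.02, 0.10, 0.35, 0.70])

raw = bagged_score([0.55, 0.62, 0.63])  # three members shown instead of ten
calibrated = blend_recalibrate(raw, isotonic, platt)
```

Blending a step-wise isotonic map with a smooth logistic curve keeps the output monotone in the raw score while softening the plateaus that isotonic regression produces on small calibration sets.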

Step 7 · Ensemble

Transparent 0.60 / 0.40 with RISK-E

Final ESS = 0.60 × SYS-XGB + 0.40 × RISK-E. The blend weighting was selected on development data by grid search maximising DeLong superiority; stability verified across 30 seeds.
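
The blend itself is a one-line weighted mean. The sketch below assumes both inputs are predicted 30-day mortality probabilities on the 0 to 1 scale; the function and argument names are illustrative and are not taken from the released code.

```python
def ess_probability(sys_xgb_prob: float, risk_e_prob: float, w_sys: float = 0.60) -> float:
    """Blend the recalibrated SYS-XGB probability with the RISK-E probability."""
    for p in (sys_xgb_prob, risk_e_prob):
        if not 0.0 <= p <= 1.0:
            raise ValueError("inputs must be probabilities in [0, 1]")
    return w_sys * sys_xgb_prob + (1.0 - w_sys) * risk_e_prob


# Example: SYS-XGB predicts 30% and RISK-E predicts 20%.
print(round(ess_probability(0.30, 0.20), 3))  # 0.26
```

With the default weight, 0.60 × 0.30 + 0.40 × 0.20 = 0.26, that is, a blended predicted mortality of 26%.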

Step 8 · Validation

Leave-one-centre-out external validation

Held-out Centre 12 (n=597, 93 deaths). External AUC = 0.882 (95% CI 0.847–0.917). 6/6 DeLong superiority over EuroSCORE II, EndoSCORE, RISK-E, AEPEI, APORTEI, STS-IE (all p<0.005). 100% multi-seed and bootstrap robustness.
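
For readers unfamiliar with the DeLong comparison, the following is a compact pure-Python sketch of the test for two correlated AUCs computed on the same patients. It is an illustration of the standard method rather than the code used for the reported results, and it assumes binary outcomes coded 1 (death) and 0 (survival) with at least two of each.

```python
import math
from typing import List, Sequence, Tuple


def _psi(x: float, y: float) -> float:
    """Mann-Whitney kernel: 1 if the death outranks the survivor, 0.5 on ties."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _placements(pos: Sequence[float], neg: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Per-death and per-survivor placement values for one score."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a: Sequence[float], b: Sequence[float]) -> float:
    """Unbiased sample covariance."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(y: Sequence[int], score_a: Sequence[float], score_b: Sequence[float]):
    """Return (auc_a, auc_b, z, two-sided p) for two scores on the same patients."""
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    a10, a01 = _placements([score_a[i] for i in pos], [score_a[i] for i in neg])
    b10, b01 = _placements([score_b[i] for i in pos], [score_b[i] for i in neg])
    auc_a, auc_b = sum(a10) / len(a10), sum(b10) / len(b10)
    # Variance of the AUC difference from the two placement components.
    var = ((_cov(a10, a10) + _cov(b10, b10) - 2 * _cov(a10, b10)) / len(pos)
           + (_cov(a01, a01) + _cov(b01, b01) - 2 * _cov(a01, b01)) / len(neg))
    if var <= 0:  # identical rankings: no evidence of a difference
        return auc_a, auc_b, 0.0, 1.0
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, z, p


# Toy data, not from the study.
y = [1, 1, 1, 0, 0, 0]
auc_a, auc_b, z, p = delong_test(y, [0.9, 0.8, 0.4, 0.5, 0.3, 0.2],
                                    [0.6, 0.3, 0.5, 0.7, 0.4, 0.2])
```

This version costs O(deaths × survivors) comparisons per score, which is negligible for a cohort of 597 patients; faster rank-based variants exist for large datasets.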

4. Code, data & manuscript

Everything required to reproduce ESS — code, trained model artefacts, de-identified predictions, multi-seed validation results, the ablation study, and the manuscript draft — is openly deposited and citable.

Zenodo deposit (DOI): 10.5281/zenodo.20327975
Source code (GitHub): sandro2462/sys-score
Manuscript: European Heart Journal, submitted 2026; companion paper at EHJ-DH
License: MIT (code and model freely re-usable)