The Ensemble
Instead of picking one model, I combined both.
Using Bayesian-style weighted averaging, I treated each model’s validation accuracy as evidence strength:
$$P_{\text{final}} = W_{\text{cnn}} \cdot P_{\text{cnn}} + W_{\text{xgb}} \cdot P_{\text{xgb}}$$

with

$$W_{\text{cnn}} = \frac{\text{Acc}_{\text{cnn}}}{\text{Acc}_{\text{cnn}} + \text{Acc}_{\text{xgb}}}$$

and $W_{\text{xgb}}$ defined symmetrically, so the two weights sum to 1.
That gave the CNN slightly more influence (98% vs. 97% validation accuracy).
The ensemble returned probabilities rather than discrete labels.
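A minimal sketch of that combination in Python; the accuracies come from the text, but the function and variable names are illustrative assumptions, not the original code:

```python
import numpy as np

# Validation accuracies quoted in the text
acc_cnn, acc_xgb = 0.98, 0.97

# Normalize accuracies into weights that sum to 1
w_cnn = acc_cnn / (acc_cnn + acc_xgb)
w_xgb = acc_xgb / (acc_cnn + acc_xgb)

def ensemble_proba(p_cnn: np.ndarray, p_xgb: np.ndarray) -> np.ndarray:
    """Weighted average of per-class probabilities from both models.

    p_cnn, p_xgb: arrays of shape (n_samples, n_classes), e.g. from
    cnn.predict(X) and xgb_model.predict_proba(X).
    """
    return w_cnn * p_cnn + w_xgb * p_xgb

# Keep the probabilities for reporting; take argmax only when a
# discrete label is actually required:
# p_final = ensemble_proba(cnn.predict(X_val), xgb_model.predict_proba(X_val))
# y_pred = p_final.argmax(axis=1)
```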
Performance rose to 98.06% accuracy, with high precision and balanced recall. More importantly, the probabilities were well calibrated: the ensemble's confidence scores tracked its actual error rates.
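Calibration can be checked with a reliability curve. The sketch below continues the placeholder names from the previous block (`y_val`, `p_final`) and assumes a binary task; for multi-class, apply it one-vs-rest per class:

```python
from sklearn.calibration import calibration_curve

# Bin predictions by confidence and compare predicted vs. observed rates
prob_true, prob_pred = calibration_curve(y_val, p_final[:, 1], n_bins=10)

# Well-calibrated probabilities keep prob_true close to prob_pred:
# among samples given ~0.8 confidence, ~80% should actually be positive.
```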
Lesson 4: probabilities communicate better than absolutes.
