Abstract:
Artificial intelligence (AI), particularly deep neural networks (DNNs), has driven major advances across various sectors, including education, transportation, finance, entertainment, and communication. Yet in high-stakes domains such as healthcare, adoption remains limited because the black-box nature of these models obscures their decision-making processes and hinders human interpretation. This lack of transparency undermines trust, restricts clinical use, and prevents clinicians from validating predictions.
To mitigate this issue, post-hoc attribution methods attempt to generate visual explanations that approximate model reasoning. However, these approaches are often unreliable in medical imaging: they can fail to reflect the model's true decision process faithfully and are vulnerable to spurious correlations.
Inherently interpretable, or self-explainable, models embed explanations directly into their architecture, but they often sacrifice accuracy, offer limited transparency, and lack generalizability or quantitative evaluation. Thus, transforming black-box models into self-explainable systems without sacrificing classification performance remains a key challenge.
This thesis addresses these challenges through three main contributions. First, we introduce Sparse BagNet, a self-explainable DNN built upon BagNet, which already provides patchwise local explanations. Sparse BagNet further enhances transparency by removing the average pooling layer and replacing the classification layer with a convolutional layer. This modification produces class evidence maps that preserve spatial information, while a lasso penalty enforces sparse explanations. In a retrospective clinical study, Sparse BagNet's explanations improved ophthalmologists' diagnostic accuracy by 17% while reducing their decision time by approximately 25%.
Second, to extend local explanations toward global interpretability, we develop ProtoBagNet, which combines BagNet’s small receptive fields with prototype learning. ProtoBagNet provides both local explanations through prototype similarity maps and global explanations via learned prototypes. Its dissimilarity loss encourages diverse and non-redundant prototypes, overcoming limitations of prior prototype-based models and producing more precise, faithful explanations that better capture the model's underlying reasoning.
Finally, we generalize Sparse BagNet into SoftCAM, a protocol for converting standard convolutional neural networks (CNNs) into self-explainable models. Like Sparse BagNet, SoftCAM systematically replaces the average pooling and fully connected layers with a convolutional classifier, but it extends the sparsity regularization from a lasso penalty to an ElasticNet penalty, allowing explanations to adapt to dataset-specific characteristics.
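As a rough illustration of the conversion described above, and not the thesis implementation, the following sketch uses plain Python lists in place of a deep learning framework. It assumes that the convolutional classifier is a 1x1 convolution, that class logits are obtained by averaging each class evidence map over space, and that the sparsity penalty is applied to the evidence maps themselves. The function names and the parameters `lam` and `alpha` are hypothetical.

```python
def conv1x1(features, weights, bias):
    """Apply a 1x1 convolutional classifier (assumed form).

    features: [C][H][W] feature maps, weights: [K][C], bias: [K].
    Returns K class evidence maps of shape [H][W].
    """
    channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    return [
        [
            [
                bias[k] + sum(weights[k][c] * features[c][i][j] for c in range(channels))
                for j in range(width)
            ]
            for i in range(height)
        ]
        for k in range(len(weights))
    ]


def spatial_mean(maps):
    """Average each [H][W] map over its spatial positions."""
    return [sum(sum(row) for row in m) / (len(m) * len(m[0])) for m in maps]


def gap_then_linear(features, weights, bias):
    """Black-box style head: global average pooling, then a linear layer."""
    pooled = spatial_mean(features)
    return [
        bias[k] + sum(w * p for w, p in zip(weights[k], pooled))
        for k in range(len(weights))
    ]


def elastic_net_penalty(maps, lam, alpha):
    """lam * (alpha * mean|a| + (1 - alpha) * mean a^2) over all map entries.

    Assumption: the penalty acts on the evidence maps; alpha = 1 gives a
    lasso-style L1 penalty, alpha = 0 a purely quadratic one.
    """
    values = [v for m in maps for row in m for v in row]
    l1 = sum(abs(v) for v in values) / len(values)
    l2 = sum(v * v for v in values) / len(values)
    return lam * (alpha * l1 + (1.0 - alpha) * l2)


# Toy input: 2 feature channels on a 2x2 grid, 2 classes.
features = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, -1.0], [1.0, 0.0]]]
weights = [[1.0, 0.5], [-1.0, 2.0]]
bias = [0.0, 0.1]

maps = conv1x1(features, weights, bias)   # spatial class evidence maps
logits = spatial_mean(maps)               # class scores from the maps
penalty = elastic_net_penalty(maps, lam=0.01, alpha=0.5)
```

Because the 1x1 convolution is linear, averaging the evidence maps gives the same logits as pooling first and classifying afterwards, which is why, under these assumptions, the head can be swapped without changing the class scores while the maps themselves become the explanation. Setting `alpha` to 1.0 reduces the penalty to a lasso-style L1 term.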
Evaluated on several medical imaging datasets against established post-hoc attribution methods, SoftCAM consistently produced more precise and faithful explanations while maintaining performance comparable to black-box baselines.
Building on this framework, we further designed a fully convolutional hybrid CNN-Transformer architecture for retinal disease detection, combining the locality of convolution with the long-range dependency modeling of transformers while preserving inherent interpretability.
Together, these contributions advance the development of transparent, trustworthy, and clinically useful AI systems, while establishing rigorous standards for evaluating model explainability, with principles that can be extended beyond medical imaging to other high-stakes vision tasks.