Abstract:
Graphs are a powerful data representation for modeling complex interactions between entities. They are ubiquitous across various domains, such as social networks, molecular biology, and traffic modeling. Message-passing Graph Neural Networks (GNNs) have recently emerged as effective tools for graph learning, demonstrating state-of-the-art performance in applications including drug discovery, weather forecasting, and recommender systems. Nevertheless, they suffer from both theoretical and practical limitations. Theoretical analyses have revealed that the expressive power of GNN architectures is constrained by the WL-1 isomorphism test [99, 156], indicating an inability to distinguish certain graph structures. Their expressivity is also limited by issues such as oversmoothing [121] and oversquashing [5, 31], which in practice impose limitations on architecture complexity, for instance, the number of layers. Additionally, theoretical and empirical results, including those in this thesis and in previous works, show that the performance of GNNs depends on the task at hand and the nature of the graph. Practical challenges further limit the efficacy of GNNs, including limited data availability, local optima in stochastic gradient descent, and a vast hyperparameter space. Consequently, GNNs often do not obtain successful results out of the box and require extensive customization for specific tasks, which explains the existence of dozens of architectures and training procedures tailored to individual settings. Practitioners must therefore invest significant computational resources and time into cross-validating GNN architectures and hyperparameters to achieve competitive performance. All of this highlights the important role of *explicitly incorporating inductive biases* into GNN algorithms to tailor solutions to specific tasks.
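To make the WL-1 limitation concrete, the following self-contained example runs WL-1 color refinement on two non-isomorphic 2-regular graphs, two disjoint triangles versus a single hexagon, and shows that they receive identical color histograms; any message-passing GNN bounded by WL-1 therefore cannot tell them apart. The code illustrates the test itself, not any model from this thesis.

```python
from collections import Counter

def wl1_histogram(edges, num_nodes, iters=3):
    """Run WL-1 color refinement and return the final color histogram."""
    adj = {v: [] for v in range(num_nodes)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    colors = {v: 0 for v in range(num_nodes)}  # uniform initial coloring
    for _ in range(iters):
        # New color = (own color, multiset of neighbor colors), relabeled to integers.
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in range(num_nodes)
        }
        relabel = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {v: relabel[signatures[v]] for v in range(num_nodes)}
    return Counter(colors.values())

# Two disjoint triangles vs. one 6-cycle: non-isomorphic, but both 2-regular.
two_triangles = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]
hexagon = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]

print(wl1_histogram(two_triangles, 6) == wl1_histogram(hexagon, 6))  # True
```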
This thesis introduces novel designs of inductive biases in GNNs to address practical challenges such as computational complexity, out-of-distribution generalization, and interpretability. The proposed models are backed by theory and/or extensive empirical evaluation. Moreover, we provide open-source implementations.
First, we propose L-CAT to address the computational complexity associated with architecture cross-validation. We show, for the first time in the GNN community, that learning to interpolate reduces the need for cross-validation of GNN architectures. The proposed approach introduces minimal overhead, with only two extra learnable parameters per layer, and is robust to network initialization and input noise. These properties make L-CAT a preferable choice in practice.
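To give an intuition of what "learning to interpolate" can look like, the sketch below implements a dense message-passing layer whose neighbor weighting is blended between a convolutional (degree-normalized) scheme and an attention scheme by two learnable gates, `lam1` and `lam2`. The gating scheme, names, and dense formulation are simplifications assumed for illustration and do not reproduce the exact L-CAT parametrization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InterpolatedLayer(nn.Module):
    """Illustrative layer with two learnable gates blending convolutional and
    attentional neighbor weighting (a simplified sketch, not L-CAT itself)."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim, bias=False)
        self.att_src = nn.Linear(out_dim, 1, bias=False)
        self.att_dst = nn.Linear(out_dim, 1, bias=False)
        # The two extra learnable parameters (squashed to [0, 1] via a sigmoid).
        self.lam1 = nn.Parameter(torch.zeros(1))
        self.lam2 = nn.Parameter(torch.zeros(1))

    def forward(self, x, adj):
        # x: (N, in_dim) node features; adj: (N, N) dense 0/1 adjacency.
        adj = (adj + torch.eye(adj.size(0), device=adj.device)).clamp(max=1)  # self-loops
        h = self.lin(x)
        lam1, lam2 = torch.sigmoid(self.lam1), torch.sigmoid(self.lam2)

        # Attention logits: lam2 gates how much the target node's own embedding
        # contributes to the score; logits[i, j] scores source j for target i.
        logits = self.att_src(h).T + lam2 * self.att_dst(h)
        logits = logits.masked_fill(adj == 0, float("-inf"))
        attn = torch.softmax(F.leaky_relu(logits), dim=-1)

        # Convolutional weights: simple row-normalized adjacency.
        conv = adj / adj.sum(dim=-1, keepdim=True).clamp(min=1)

        # lam1 interpolates between convolution (lam1 -> 0) and attention (lam1 -> 1).
        weights = (1 - lam1) * conv + lam1 * attn
        return weights @ h
```

With such gates, choosing between convolution- and attention-style layers becomes part of standard gradient-based training rather than a discrete cross-validation decision.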
Next, we introduce VACA, a novel model based on graph variational autoencoders, to perform approximate causal inference with GNNs. We first show that off-the-shelf GNN architectures are not suitable for causal inference tasks. However, by leveraging the causal graph to impose constraints on the GNN architecture, causal inference queries can be approximated. As a specific application, we illustrate how VACA can be used to assess counterfactual fairness and to train a counterfactually fair classifier.
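The following schematic sketch illustrates one way a causal graph can constrain a GNN: messages are only passed from causal parents to children, and the number of message-passing steps is tied to the longest directed path so that interventional information can reach all descendants. The layer and its names are hypothetical and only illustrate the general principle, not the exact VACA design.

```python
import torch
import torch.nn as nn

class CausallyMaskedLayer(nn.Module):
    """Message passing restricted to the edges of a directed causal graph."""

    def __init__(self, dim):
        super().__init__()
        self.self_lin = nn.Linear(dim, dim)
        self.parent_lin = nn.Linear(dim, dim)

    def forward(self, h, causal_adj):
        # h: (num_nodes, dim) node embeddings.
        # causal_adj[i, j] = 1 iff j -> i in the causal graph, so node i
        # only receives messages from its causal parents.
        return torch.relu(self.self_lin(h) + causal_adj @ self.parent_lin(h))

# Toy chain X1 -> X2 -> X3.
adj = torch.tensor([[0., 0., 0.],
                    [1., 0., 0.],
                    [0., 1., 0.]])
h = torch.randn(3, 8)
layer = CausallyMaskedLayer(8)
# A decoder stacking such layers needs at least as many steps as the longest
# directed path (here 2) so that an intervention on X1 can reach X3.
for _ in range(2):
    h = layer(h, adj)
```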
Finally, we introduce CORES to tackle inherently interpretable graph classification. In particular, we aim to achieve interpretability through *sparsity* in the input graph: we argue that the less input information the model receives, the easier its predictions are to understand. To find minimal informative subgraphs, we design a reinforcement learning pipeline that integrates conformal prediction into the reward function, improving the quality of the rewards and thus easing the optimization. CORES then jointly optimizes for sparsity and performance in a bi-level fashion. We show empirically that CORES achieves competitive performance while using smaller input graphs than competing methods.
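As an illustration of how conformal prediction can shape a reward, the sketch below calibrates a split-conformal threshold on held-out class probabilities and rewards candidate subgraphs whose prediction sets are small (confident) and whose edge sets are sparse. The function names and the specific reward shape are assumptions made for this example and may differ from the reward used by CORES.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal calibration; nonconformity = 1 - probability of the true class."""
    scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    n = len(scores)
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, q_level, method="higher")

def prediction_set_size(probs, threshold):
    """Number of classes whose nonconformity score falls below the threshold."""
    return int(np.sum((1.0 - probs) <= threshold))

def reward(probs, threshold, kept_edges, total_edges, num_classes, beta=0.5):
    """Hypothetical reward: small (confident) prediction sets and sparse subgraphs
    are both rewarded; the trade-off used by CORES may differ."""
    set_size = prediction_set_size(probs, threshold)
    confidence_term = 1.0 - (set_size - 1) / (num_classes - 1)  # 1 for a singleton set
    sparsity_term = 1.0 - kept_edges / total_edges              # 1 for an empty subgraph
    return confidence_term + beta * sparsity_term

# Calibrate once on held-out predictions, then score candidate subgraphs during RL.
cal_probs = np.array([[0.5, 0.3, 0.2], [0.2, 0.6, 0.2]])
cal_labels = np.array([0, 1])
tau = conformal_threshold(cal_probs, cal_labels, alpha=0.1)
r = reward(np.array([0.6, 0.3, 0.1]), tau, kept_edges=4, total_edges=20, num_classes=3)
```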
Collectively, the work presented in this thesis highlights the importance of explicit inductive biases in GNNs to narrow the search space and guide optimization towards solutions that achieve competitive performance on target tasks. Our proposed GNN-based methodologies contribute to advancing the field across three domains crucial for real-world applications with potential socio-economic impact.