Variation graph applications

Citable link (URI): http://hdl.handle.net/10900/177445
http://nbn-resolving.org/urn:nbn:de:bsz:21-dspace-1774456
http://dx.doi.org/10.15496/publikation-118769
Document type: Dissertation
Publication date: 2026-03-23
Language: English
Faculty: 7 Mathematisch-Naturwissenschaftliche Fakultät (Faculty of Science)
Department: Computer Science
Reviewer: Weigel, Detlef (Prof. Dr.)
Date of oral examination: 2025-10-28
DDC classification: 004 - Computer science
License: http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=de http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=en

Abstract:

Variation graphs provide a powerful solution to overcome the limitations of linear reference genomes, especially in representing the diversity and structural complexity within species. As genome sequencing becomes more accessible and datasets grow in both quality and scope, it is increasingly clear that traditional reference-based analyses fall short in capturing large-scale variation, population structure, and genomic complexity. However, the practical interpretation and use of genome graphs remain an open challenge. Both graph construction and downstream analysis require new tools that can operate at scale, preserve biological interpretability, and offer meaningful metrics to describe the underlying structure. In this thesis, I present a set of tools developed to address key challenges in variation graph analysis. The core contribution is gretl, a fast and flexible framework for computing graph- and path-based statistics. It enables systematic comparisons across parameter settings and graph construction methods, and has been used to analyze graphs built from multiple species, including a yeast dataset and the 1001 Genomes Arabidopsis pangenome. The framework reveals how parameters such as segment length and alignment thresholds strongly affect graph structure and interpretability. I also introduce gfa2bin, a graph-to-GWAS bridge that supports association testing directly from graph node coverage. This method demonstrates the potential of graph-based GWAS to detect both known and novel signals of trait associations. In addition, I develop a novel variation detection approach based on bifurcation events between paths, offering a complementary alternative to standard bubble detection algorithms. Together, these tools enable direct statistical exploration and biological analysis of genome graphs at both global and sample-specific levels.
Applied to the Arabidopsis dataset, they reveal population structure, patterns of pangenome expansion, and the role of private and structural variation across diverse accessions. While challenges remain in variant extraction, graph augmentation, and performance scaling, this work demonstrates that genome graphs can be used not only to store variation, but also to interpret and analyze it in meaningful ways. The tools and methods presented here are a step toward more flexible, interpretable, and biologically aware graph-based genomics.
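
As a rough illustration of what "path-based statistics" and "node coverage" can mean in practice, the sketch below reads a tiny graph in GFA 1 text form and derives, per path, its length in base pairs and a node presence/absence matrix of the kind an association test could take as genotype input. It is not the gretl or gfa2bin implementation: the function names (`parse_gfa`, `path_lengths`, `presence_matrix`) and the toy graph are hypothetical, and only the `S` (segment) and `P` (path) record layouts are taken from the GFA 1 format.

```python
from collections import Counter


def parse_gfa(text):
    """Collect segment lengths and path walks from GFA 1 text (S and P records only)."""
    seg_len = {}
    paths = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":
            # S <name> <sequence>; "*" means the sequence is not stored
            seq = fields[2]
            seg_len[fields[1]] = 0 if seq == "*" else len(seq)
        elif fields[0] == "P":
            # P <name> <seg+,seg-,...> <overlaps>; drop the trailing orientation sign
            paths[fields[1]] = [step[:-1] for step in fields[2].split(",")]
    return seg_len, paths


def path_lengths(seg_len, paths):
    """Total sequence length (bp) traversed by each path."""
    return {name: sum(seg_len[n] for n in walk) for name, walk in paths.items()}


def presence_matrix(seg_len, paths):
    """Rows are nodes, columns are paths; 1 if the path visits the node at least once."""
    nodes = sorted(seg_len)
    samples = sorted(paths)
    coverage = {s: Counter(paths[s]) for s in samples}
    matrix = [[1 if coverage[s][n] > 0 else 0 for s in samples] for n in nodes]
    return nodes, samples, matrix


# Hypothetical toy graph: two samples that differ at one bubble (node 2 vs node 3).
toy_gfa = "\n".join([
    "\t".join(["S", "1", "ACGT"]),
    "\t".join(["S", "2", "G"]),
    "\t".join(["S", "3", "TT"]),
    "\t".join(["S", "4", "CCA"]),
    "\t".join(["P", "sampleA", "1+,2+,4+", "*"]),
    "\t".join(["P", "sampleB", "1+,3+,4+", "*"]),
])

seg_len, paths = parse_gfa(toy_gfa)
lengths = path_lengths(seg_len, paths)
nodes, samples, matrix = presence_matrix(seg_len, paths)
```

In this toy case the rows for nodes 2 and 3 are the only ones that differ between the two samples, so they are the only rows that could carry an association signal. A real pipeline would also have to handle orientation, repeated node visits, and graphs far too large for plain Python lists.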
