Abstract:
Plants are the major nutritional component of the human diet and provide us with
shelter, fuel, and enjoyment. Plant diseases caused by bacterial, fungal, and oomycete
pathogens lead to substantial yield losses. Plants have an elaborate innate immune
system to fight threatening pathogens, relying to a great extent on highly variable
resistance (R) genes. R genes often encode intracellular nucleotide-binding leucine-rich
repeat receptors (NLRs) that directly or indirectly recognize pathogens by the presence
or the activity of effector proteins in the plants’ cells. NLRs contain variable N-terminal
domains, a central nucleotide-binding (NB) domain, and C-terminal leucine-rich repeats
(LRRs). The N-terminal domains can be used to distinguish between the evolutionarily
conserved NLR classes TNL (with a Toll/interleukin-1 receptor homology (TIR) domain),
CNL (with a coiled-coil (CC) domain), and RNL (with an RPW8 domain). The architectural
diversity is increased by additional integrated domains (IDs) found in different
positions. Plant species have between a few dozen and several hundred NLRs. The
intraspecific R gene diversity is also high, and the few NLRs known so far to confer
long-term resistance are often accession-specific. Intraspecific NLR studies to date
suffer from several shortcomings: The pan-NLR’ome (the collection of all NLR genes
and alleles occurring in a species) often cannot be comprehensively described because
too few accessions are analyzed, and NLR detection is essentially always guided by
reference genomes, which biases the detection of novel genes and alleles. In addition,
inappropriate or immature bioinformatics analysis pipelines may miss NLRs during the
assembly or annotation phase, or result in erroneous NLR annotations. Knowing the
pan-NLR’ome of a plant species is key to obtaining novel resistant plants in the future. I
created an extensive and reliable database that defines the near-complete pan-NLR’ome
of the model plant Arabidopsis thaliana. I focused on a panel of 65 diverse
accessions and applied state-of-the-art targeted long-read sequencing (SMRT RenSeq).
My analysis pipeline was designed to include optimized methods that could be applied to
any SMRT RenSeq project. In the first part of my thesis I set quality control standards
for the assembly of NLR-coding genomic fragments. I further introduce a novel and
thorough gene annotation pipeline, supported by careful manual curation. In the second
part, I present the manuscript reporting the saturated near-complete A. thaliana
pan-NLR’ome. The species-wide high NLR diversity is revealed at the level of domain
architecture, and the use of novel IDs is highlighted. The core NLR complement is defined
and presence-absence polymorphisms in non-core NLRs are described. Furthermore,
haplotype saturation is shown, selective forces are quantified, and evolutionarily coupled,
co-evolving NLRs are detected. The method optimization results show that final NLR
assembly quality is mainly influenced by the amount and the quality of input sequencing
data. The results further show that manual curation of automated NLR predictions is
crucial to prevent frequently occurring misannotations. The saturation of an NLR’ome
has not been shown in any plant species so far; thus, this study provides an unprecedented
view of intraspecific NLR variation, the core NLR complement, and the evolutionary
trajectories of NLRs. IDs are used more frequently than previously known, suggesting a
pivotal role of noncanonical NLRs in plant-pathogen interactions. This work sets new
standards for the analysis of gene families at the species level. Future NLR’ome projects
applied to important crop species will profit from my results and the easy-to-adopt
analysis pipeline. Ultimately, this will extend our knowledge of intraspecific NLR diversity
beyond a few reference species or genomes, and will facilitate the detection of functional
NLRs to be used in disease resistance breeding programs.