Identification of Novel Gene Variants in Patients with Alzheimer’s Disease by Whole Exome Sequencing

Introduction
Alzheimer's Disease (AD) affects millions of elderly people, many of whom have partially or completely lost the capability to live independently [1][2][3]. Limited progress has been made over the past decades in designing interventions that could effectively delay disease progression, and novel leads are urgently needed. Next Generation DNA sequencing (NGS) technology has been widely used in basic biomedical research and in clinical molecular diagnosis. Few NGS studies in AD have been reported, and those reported focused on rare variants in a few genes, such as APP, PSEN1, PSEN2, SORL1, and TREM2 [4][5][6]. Variants accumulated in these genes account for the genetics of only a small fraction of AD patients. Recently, results from some Whole Exome Sequencing (WES) studies of specimens from AD patients were reported, but a general variant landscape is still missing [7][8][9]. Genome-Wide Association Studies (GWAS) have identified several dozen AD-associated genes, but most of the associations are weak [10][11][12]. We performed a WES study of specimens from AD patients and identified several dozen novel gene variants. These novel variants could potentially be causative mutations for AD or variants associated with AD.

Materials and methods
All subjects enrolled in this study were outpatients or hospitalized patients at the Jiangsu Province Geriatric Hospital from 2015 to 2018. Specimens from 36 AD patients were analyzed in this WES study. The clinicopathological data are summarized in Table S1. AD was diagnosed according to the NINCDS-ADRDA criteria [13,14]. No cerebrospinal fluid analysis or PET imaging data were used for diagnosis or analysis in this study, and no post-mortem data were used. Most patients had late-onset AD, and about 20% of patients were younger than 65 years old. Informed consent was obtained from all subjects. The study was approved by the Ethical Committee. Blood specimens were collected in EDTA tubes and stored at -80 °C before DNA extraction. Genomic DNA extraction, library construction, human exome capture, and NGS sequencing were performed as described previously [15,16].
Initial data processing and variant calling were performed by Novogene Co. Ltd in Beijing. The human genome hg19 was used as the reference sequence for BWA mapping. GATK and SAMtools were used for variant calling. VCF files were annotated using ANNOVAR [15]. Variants meeting all of the following criteria were selected for further analysis: (1) the variant lies in a protein-coding region; (2) the variant is not listed in the dbSNP database (https://www.ncbi.nlm.nih.gov/snp/, dbSNP152), or is listed with a minor allele frequency below 1%; (3) no frequency information is available from ExAC (http://exac.broadinstitute.org/gene) or CMDB; (4) the variant is recurrent, or the same gene carries multiple variants; (5) the variant is predicted to be damaging or possibly damaging to protein function by at least two of the three programs SIFT, PolyPhen-2, and MutationTaster, or the variant is a stop-gain mutation; and (6) the gene is reported to be AD-associated or related to brain development and function.
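The six selection criteria combine per-variant checks with one cohort-level check (recurrence). The sketch below shows one way to express that logic; it is not the authors' pipeline. It assumes a hypothetical `Variant` record whose fields stand in for ANNOVAR annotation columns (real ANNOVAR output uses its own column names), assumes the per-variant filters are applied before recurrence is counted (the text does not state the order), and uses placeholder gene names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Variant:
    # Hypothetical record layout, not ANNOVAR's actual columns.
    sample: str
    gene: str
    key: str                     # e.g. "chr1:12345:A>G"; same key = same variant
    coding: bool                 # lies in a protein-coding region
    dbsnp_maf: Optional[float]   # None if the variant is absent from dbSNP
    population_freq_known: bool  # True if ExAC or CMDB reports a frequency
    damaging_votes: int          # SIFT / PolyPhen-2 / MutationTaster "damaging" calls, 0-3
    stopgain: bool


def passes_site_filters(v: Variant, genes_of_interest: Set[str]) -> bool:
    """Criteria (1), (2), (3), (5), (6): checked one variant at a time."""
    rare = v.dbsnp_maf is None or v.dbsnp_maf < 0.01
    predicted_damaging = v.damaging_votes >= 2 or v.stopgain
    return (v.coding and rare and not v.population_freq_known
            and predicted_damaging and v.gene in genes_of_interest)


def select_candidates(variants: List[Variant],
                      genes_of_interest: Set[str]) -> List[Variant]:
    """Apply site filters, then criterion (4): keep variants seen in >= 2 samples
    or located in a gene that carries >= 2 distinct variants."""
    kept = [v for v in variants if passes_site_filters(v, genes_of_interest)]
    samples_per_key = defaultdict(set)
    keys_per_gene = defaultdict(set)
    for v in kept:
        samples_per_key[v.key].add(v.sample)
        keys_per_gene[v.gene].add(v.key)
    return [v for v in kept
            if len(samples_per_key[v.key]) >= 2 or len(keys_per_gene[v.gene]) >= 2]
```

Splitting the site-level and cohort-level checks keeps each criterion testable on its own; the recurrence step has to see the whole cohort, so it runs last.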

Results
Sequencing of each specimen generated 3-7 Gb of data.
Overall, more than 97% of the sequenced bases reached Q20 and more than 92% reached Q30. Variants with sequencing quality below Q20 were not used. SNP genotyping results from MassARRAY were compared with those from WES, which showed a variant detection specificity of 100% and a sensitivity of 80.5%.
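The quality and concordance figures above rest on two standard definitions: the Phred score Q = -10·log10(p), where p is the base-call error probability (so Q20 means a 1% error rate and Q30 means 0.1%), and sensitivity = TP/(TP+FN), specificity = TN/(TN+FP). The sketch below illustrates both. It assumes FASTQ quality strings in Phred+33 encoding and treats MassARRAY genotypes as the reference; the read strings and counts are toy values, not data from this study, and the exact comparison procedure used here is not described in the text.

```python
from typing import Iterable


def phred_to_error_prob(q: float) -> float:
    """Invert Q = -10 * log10(p): Q20 -> 0.01, Q30 -> 0.001."""
    return 10 ** (-q / 10)


def fraction_at_or_above(quality_strings: Iterable[str], threshold: int,
                         offset: int = 33) -> float:
    """Share of bases whose Phred score is >= threshold (assumes Phred+33 encoding)."""
    total = 0
    passing = 0
    for qs in quality_strings:
        for ch in qs:
            total += 1
            if ord(ch) - offset >= threshold:
                passing += 1
    return passing / total if total else 0.0


def sensitivity(tp: int, fn: int) -> float:
    """Share of reference-confirmed variant sites that WES also called."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of reference-negative sites that WES correctly left uncalled."""
    return tn / (tn + fp)


reads = ["IIII5", "?+II#"]              # toy quality strings, not study data
print(fraction_at_or_above(reads, 20))  # 0.8
print(fraction_at_or_above(reads, 30))  # 0.7
```

A specificity of 100% corresponds to zero false-positive calls (FP = 0) among the compared sites, while a sensitivity below 100% means some reference-confirmed variants were missed by WES.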
We identified novel variants in 28 of the most relevant genes (Tables 1,2). These genes are not listed in ClinVar (https://www.ncbi.nlm.nih.gov/clinvar) as AD pathogenic genes. The genes in Table 1, or their products, have been studied in AD according to the literature. No AD studies of the genes in Table 2 were found in the literature, but their products are related to brain development and function, so their variants may still contribute to AD development. Of the 36 AD patients, 26 (72.2%) carried variants in genes listed in Tables 1,2, and some carried more than 10 variants. Some of the gene variants were recurrent, and some genes were frequently affected (5 or more occurrences among the 36 specimens). Most of the genes in Tables 1,2 are located on chromosomes 1, 4, 7, or 11. Details of the gene variants, functional analysis of the variants, and the clinicopathological data are summarized in Table S1. Variants of genes not apparently related to AD or to brain development and function are summarized in Table S2. Some gene variants appeared to show characteristic distribution patterns, being much more common in male patients or in patients with advanced disease stages (Table S3). These distribution patterns are preliminary given the limited sample size.
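The carrier rate and the "frequently affected gene" threshold are simple cohort tallies (for example, 26/36 = 72.2%). The sketch below shows one way to compute them. It assumes a hypothetical mapping from patient ID to the set of genes in which that patient carries a candidate variant, and it counts an "occurrence" as one patient carrying at least one variant in the gene; the text does not specify whether occurrences are counted per patient or per variant. Gene names in the example are placeholders.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def carrier_summary(calls: Dict[str, Set[str]], genes: Set[str],
                    n_patients: int,
                    min_occurrence: int = 5) -> Tuple[float, Dict[str, int]]:
    """Return (fraction of patients carrying >= 1 variant in `genes`,
    {gene: number of affected patients} for genes hit in >= min_occurrence patients).

    `calls` may omit patients with no candidate variants, so the cohort size is
    passed separately as `n_patients`."""
    carriers = sum(1 for hit in calls.values() if hit & genes)
    per_gene = Counter(g for hit in calls.values() for g in hit & genes)
    frequent = {g: n for g, n in per_gene.items() if n >= min_occurrence}
    return carriers / n_patients, frequent


# Toy cohort of 5 patients (placeholder genes); p5 has no candidate variants.
calls = {"p1": {"GENE_A"}, "p2": {"GENE_A", "GENE_B"}, "p3": set(), "p4": {"GENE_X"}}
frac, frequent = carrier_summary(calls, {"GENE_A", "GENE_B"}, n_patients=5,
                                 min_occurrence=2)
print(frac, frequent)  # 0.4 {'GENE_A': 2}
```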

Discussion
The amyloid cascade hypothesis has been central to models of AD pathogenesis and has served as the basis for the development of therapeutics [1].
Inflammation, especially innate immunity, has also been recognized as a key process in the pathology of the disease [2].

Table S1: Gene variants and clinicopathological data.