The identification of gene ontologies and candidate genes for digital dermatitis in beef cattle from a genome-wide association study

Bovine Digital Dermatitis (DD) is a worldwide infectious disease that causes severe lameness in cattle across all production systems [1-4]. The consequences of DD are decreased animal welfare and economic losses due to reduced milk production, decreased reproductive performance and premature culling [5,6]. DD has a multifactorial etiology with the primary causative infectious agent being Treponema spp. and the pathogenesis of DD is not completely understood [7,8]. DD occurs in different clinical manifestations and recurrence after therapy is high, resulting in problems for prevention and control [1,9]. Moderate to high heritability estimates for DD traits based on the M-stage system and varying immune responses supports the infl uence of host genetic factors [1014]. Genome wide association and gene expression studies have been performed to identify potential candidate genes with infl uence on DD, however causative mutations have yet to be defi ned based on the current literature [15-17]. Gene-set enrichment and pathway-based analyses have been described to investigate the polygenic background of complex traits, such as leucosis, bull fertility, and beef cattle meat quality [18-20]. DAVID 6.8 is a web based tool to discover enriched functionally related gene groups for large gene lists and for their interpretation in biological context [21]. Gene-set enrichment directs the focus from a single gene-oriented Abstract


Introduction
Bovine Digital Dermatitis (DD) is a worldwide infectious disease that causes severe lameness in cattle across all production systems [1][2][3][4]. The consequences of DD are decreased animal welfare and economic losses due to reduced milk production, decreased reproductive performance and premature culling [5,6]. DD has a multifactorial etiology with the primary causative infectious agent being Treponema spp. and the pathogenesis of DD is not completely understood [7,8]. DD occurs in different clinical manifestations and recurrence after therapy is high, resulting in problems for prevention and control [1,9]. Moderate to high heritability estimates for DD traits based on the M-stage system and varying immune responses supports the infl uence of host genetic factors [10][11][12][13][14]. Genome wide association and gene expression studies have been performed to identify potential candidate genes with infl uence on DD, however causative mutations have yet to be defi ned based on the current literature [15][16][17]. Gene-set enrichment and pathway-based analyses have been described to investigate the polygenic background of complex traits, such as leucosis, bull fertility, and beef cattle meat quality [18][19][20]. DAVID 6.8 is a web based tool to discover enriched functionally related gene groups for large gene lists and for their interpretation in biological context [21]. Gene-set

Material and methods
The experimental procedures for the trial were approved by the Institutional Animal Care and Use Committee (IACUC V01525) at the University of Wisconsin-Madison. While cattle were in the chute, their hind feet were evaluated for the presence of DD and scored based on DD lesion type using the M-stage DD classifi cation system described by Döpfer and Berry [23,24]. Based on this classifi cation system cattle with: normal skin appearance observed on the foot were classifi ed as M0; active ulcerative DD lesions ≥ 20 mm in diameter were classifi ed as M2 lesions and chronic lesions were classifi ed as M4. Further signs of chronicity were classifi ed as: none = smooth skin without thickening and no extra tissue beyond the surface of the normal level of the skin; hyperkeratotic = thickened callous skin; and proliferative = diffuse, fi lamentous or mass-like overly grown epidermal tissues. If both hind legs were affected, or more than one lesion was present during scoring, the more severe lesion was documented (M2 > M4 > M0 in severity, proliferative lesions were considered more severe than hyperkeratotic lesions). The previously described DD Check App was used for data documentation [25]. Collection of data was conducted by two trained investigators. All hind feet of cattle were evaluated during single scoring events, three in the chute and three during alley checks, for which where groups of 3 to 5 cattle were herded into an alleyway.

Phenotype and genotype data
The phenotype data set contained 9,653 observations of 2,197 cattle from farms A and B. The cattle that had at least two observations for DD-status with the majority of cattle (85.3%) having four and more observations were used for analysis. Table 1 shows an overview of characteristics and average prevalence of M-Stages and signs of chronicity per farm. Binary traits were defi ned for the distinction between affected (1) and unaffected (0) cattle, regarding the general DD-status (DD AFFECTED), acute disease events (DD ACUTE), visible signs of chronicity (DD CHRONICITY) and proliferation of the skin (DD PROLIFERATION). Table 2 shows an overview of trait defi nitions and overall DD prevalence.
DNA was extracted from approximately 30 hair follicles from each animal using the DNeasy blood and tissue kit (Qiagen, Valencia, CA) according to the manufacturer's instructions. Although hair follicle samples for genotyping were collected from 2,197 animals unfortunately only 307 samples yielded the required concentration of DNA for analysis. The genotype analysis was performed after the completion of the study

Statistical analysis and modeling
Computation of a genomic relationship matrix (G-matrix) as well as a principal component analysis (PCA) and a cluster analysis using a Bray Curtis distance matrix and UPGMA clustering were performed with the packages "rrBLUP" and "vegan" in R version 3.3.3 [28][29][30].
Model testing was performed for the fi xed effects (farm, coat color, treatment, and cluster) and covariates (3 principal components and initial body weight) as well as their interactions. The simplest model regarding fi xed effects with the lowest Akaike information criterion (AIC) was chosen: where p ijkl = P(Y ijkl = 1) is the probability of occurrence, Y ijkl is Square Estimates were calculated for initial body weight as well as Least Square Means (LSMeans) and odds ratios (OR) for farm, treatment, and coat color using a logit link function in the "GLIMMIX" Procedure in SAS version 9.4 [31].

Genome-wide association study, identifi cation of candidate genes and gene-set enrichment analysis
A GWAS was performed for each binary trait using the "easyGWAS" [32] package in R version 3.3.3 applying the following model: Where μ is the population mean, v is the vector of fi xed effects,  i is the effect of the ith SNP, u is the vector of the polygenic effects, e is the residual, W, X, and Z are the incidence matrices for v,  i , and u. SNPs with a p-value < 0.01 were assumed to be statistically signifi cant and a list of these SNPs was generated for every single trait. For enrichment analysis statistically signifi cant SNPs were assigned to genes within 20kb upstream and downstream of the gene's start and end position as well as within the coding region of the gene using the "biomaRt" Package in R version 3.3.3 [33,34]. The distance of 20kb was chosen to include proximal functional and regulatory regions in the analysis. If a SNP was assigned to more than one gene, all assigned genes remained for further analysis. The Bos taurus UMD3.1: Release 90 -August 2017 from Ensembl was used as reference genome [35,36]. For enrichment analysis a list of the genes identifi ed by statistically signifi cant SNPs was submitted to DAVID 6.8 for each trait separately [21,37]. The Bos taurus gene list of DAVID 6.8 was used as the reference genome.
To determine functional categories the Gene Ontology (GO) database was used [38,39]. The GO database offers biological descriptors (GO terms) for genes based on the characteristics of encoded proteins. The terms are assigned to three categories: biological process (BP), Molecular Function (MF) and Cellular Component (CC) [38,40]. A Fisher's exact test was performed to search for an overrepresentation of signifi cantly associated genes among all the genes in the given GO term. Finally only functional enriched terms with a Fisher's exact test <0.001 were considered statistically signifi cant and used for further interpretation. GO Plots were created in R for all binary traits using the "GO:" Database and the documented ancestors of the GO terms to show connections between the functional categories. Description of single GO terms was provided by the database AmiGO 2 version 2.4.26 [41]. For identifi cation of candidate genes the lists of statistically signifi cant genes per trait from enrichment analysis were used to perform an extensive literature search in the National Center for Biotechnology Information (NCBI) database for the following gene involvement terms: infl ammation, adhesion, skin, keratinocyte, wound healing, hyperkeratosis, proliferation, and immune response.

Statistical analysis
Estimates and confi dence intervals for the fi xed effects of farm, coat color and OTM treatment for the binary traits are presented in Table 3. Higher values were obtained for farm A as well as for Treatment CON for all traits except DD PROLIFERATION, with the higher values being statistically signifi cant for farm A and the traits DD AFFECTED and DD CHRONICITY. For coat color, estimates were found for black coated animals in DD CHRONICITY and DD PROLIFERATION. The Least Square Estimate for the regression on initial body weight for the probability of the binary traits is presented in Figure 1. As expected, all binary traits showed a signifi cant increase in probability as initial body weight increased, except for DD PROLIFERATION for which only a slight increase of probability was observed. These associations are indicated by the positive slope in each binary trait panel of Figure 1, while the shaded area depicts the 95% confi dence interval of the estimated slope. Signifi cant odds ratios (OR) were calculated for the infl uence of farm as well as a gain in 200 lbs./91 kg in initial body weight for the trait DD AFFECTED (Table 4). In addition, Table 4  As can be seen in Table 4, the farm had the greatest infl uence on the occurrence of DD in this study. Farm differences are understandable, because DD has a multifactorial etiology, highly infl uenced by management and hygiene [42]. Furthermore the results indicate that coat color was associated with the occurrence of DD, with a higher prevalence of chronic and proliferative lesions in black coated cattle (see coat color vs. DD PROLIFERATION in Table 4). If we consider coat color as a proxy for breed, this agrees with the infl uence of breed on DD, which has been previously described [42][43][44]. Several other studies described the importance of animal genetic factors and breed on DD as well [12,45]. In the current study it was not possible to determine the exact breed because no pedigree information was available.

PCA and cluster analysis
Principal Component (PC) analysis and cluster analysis resulted in three principal components and three clusters based on the genomic relationship matrix. The fi rst three principal components explained 26.5% of the genomic variation in the data set. Figure 2 illustrates a 3D plot for the site scores of the fi rst three principal components containing the three previously generated clusters. Clusters 1, 2, and 3 are shown as purple, yellow, and green clusters of different sized data points where the size of the data points simply represents the different clusters. The 3D plot indicates that the three clusters are mostly determined by PC1. Since the AIC of GWAS model was lowest when including the three principal components from Figure 2, those three principal components were chosen as predictors for the fi nal GWAS model, not the cluster numbers.

Genome-wide association study and gene mapping
Manhattan plots illustrate the genome wide association for the binary traits in Figure 3. The genome wide suggestive level was calculated to be 1.7×10 -6 corresponding to a Bonferroni-adjusted p-value of 0.2. Only one SNP (rs110651789, BTA22:5833750) showed a genome wide suggestive association to the trait DD PROLIFERATION (p-value < 1.7×10 -6 ). No candidate genes were identifi ed in a 20kb window surrounding this SNP. The low number of associated SNPs was expected due to the limited sample size. Therefore, for the gene set enrichment analysis the p-value of < 0.01 was used as signifi cance criterion to ensure the necessary required number of signifi cant SNPs for the ongoing analysis, as it was stated that gene set enrichment should include 100 to 2000 genes [37]. Table 5 shows the number of statistically signifi cant SNPs and the number of assigned signifi cant genes per binary trait based on the p-value (< 0.01). An SNP can be assigned to more than one gene therefore the number of signifi cant SNPs assigned to genes is higher than the number of (unique) signifi cant mapped genes. The 708 -849 unique signifi cant genes (depending on the trait) could be used for enrichment and pathway based analysis. Unfortunately, not all of these unique signifi cant genes could be mapped in DAVID 6.8 resulting in a reduced number of signifi cant mapped genes. For all binary traits 2,637 genes were identifi ed to be within 20kb of a signifi cant SNP. Due to the limited sample size we did not include a correction for multiple testing in this analysis and the results were analyzed based on the raw p-values to maximize the data available from this study. Although the high number of expected false positive SNPs from the GWAS without correction for multiple testing reduces the strength of the analysis, this is the fi rst described genomic study for DD in a beef cattle data set.
We consider our work to be an example for follow-up studies about the genetics of DD in beef cattle.

Candidate genes and enrichment analysis
A Fisher's exact test was performed to search for an overrepresentation of signifi cantly associated genes within each GO term. Only functional enriched terms with a Fisher's exact test <0.001 were considered signifi cant and used for   Citation: Kopke Table   6. The genes will be numbered 1 to 25 in the following text.
For each gene ontology category (BP, CC, and MF) we will list the statistically signifi cant enriched GO terms for the binary traits and describe associated genes from literature followed by briefl y discussing their general link to the pathogenesis of DD.

GO terms signifi cantly enriched for the category biological process (BP)
In

GO term group -localization and transport
The term localization (GO: 0051179) is signifi cantly  Table 7 lists the function of these fi ve genes.      (Table 7). Any dysfunction in these genes could lead to an impaired epidermal barrier and allow for signs of DD to occur and to persist [24].

GO terms signifi cantly enriched for the category cellular components (CC)
For the category CC, four GO terms showed signifi cant overrepresentation of genes associated with general DDstatus (DD AFFECTED). In addition 2, 4, and 1 GO terms were signifi cantly enriched for the traits DD ACUTE, DD CHRONICITY, and DD PROLIFERATION, respectively (Online Resource 2).

GO term group -plasma membrane
The term integral component of plasma membrane GO:  (Table 7).
Theses candidate genes are involve in integrin-mediated cell adhesion and cell migration and could affect DD prognosis and pathogenesis.

GO terms signifi cantly enriched for the category molecular function (MF)
With regards to the category MF, fi ve GO terms showed Transmembrane transport plays a pivotal role in the communication between cells. Cells must be able to communicate across their membrane barriers to exchange materials with the environment [49]. Calcium dynamics play a key role in keratinocyte differentiation and epidermal homeostasis [50]. Impairment of these genes could interrupt epidermal homeostasis allowing for the more rapid progression of DD. The following relevant genes were included in the term category molecular function and associated with transmembrane transport/channel, (21) RYR1 and (22) TRPV4 (Table 7).

GO term group -cellular enzyme and receptor activity
The term transmembrane-ephrin receptor activity GO: 0005005 was signifi cantly enriched for the trait DD PROLIFERATION and is related to protein kinase activity GO: 0004672 which was signifi cantly enriched for the trait DD AFFECTED and DD PROLIFERATION. The ontologies related to transferase activity (GO: 0016740 and GO: 0016772) and the following subtypes: phosphotransferase activity (GO: 0016773) kinase activity (GO: 0016301) were also signifi cantly enriched for the trait DD PROLIFERATION. The previously mentioned terms are in close hierarchical relationship to phosphoric ester hydrolase activity (GO: 0042578) and hydrolase activity (GO: 0016788) which were signifi cantly enriched for the traits DD AFFECTED and DD ACUTE. Furthermore, the term collagen receptor activity (GO: 0038064) that was signifi cantly enriched for the trait DD AFFECTED also showed close relationship to the previously mentioned terms.
Epidermal development, homeostasis, and repair are complex processes requiring regulation of cell proliferation, migration, and differentiation. Several of the signals essential for epidermal development and repair are tranduced by the epidermal growth factor receptor (EGFR) and integrins [51]. Two genes were represented in the cellular enzyme and receptor activity term of the category molecular function, (23) RAF1 and (24) ROCK1 (Table 7). These genes could infl uence the barrier function and integrity of the skin as well as proliferation of the skin in chronic DD lesions.

GO term group -enzyme regulator activity
Ontologies related to the terms GTPase regulator activity GO: 0030695, nucleoside-triphosphatase regulator activity GO: 0060589, and protein kinase activator activity GO: 0030295 which are subtypes of molecular function regulator GO: 0098772 were found signifi cantly enriched for the traits DD ACUTE, DD CHRONICITY, and DD PROLIFERATION. The RAS pathway is essential to epidermal homeostasis. Epidermal function requires the balance between keratinocyte proliferation, adhesion to ECM Citation: Kopke (Table 7).
The incidence of DD is a complex trait controlled by multiple genes. Therefore this approach of searching the signifi cantly enriched GO terms and identifying 25 functional genes highly involved in cellular and membrane function pertaining to adhesion, migration and proliferation which could contribute to DD traits provides a unique opportunity to better understand the genetic architecture of this complex disease. Further pathway analysis research is needed to explore the associated biochemical, molecular, and cellular pathways to link the predominant pathways with the 25 candidate genes. This could reveal factors that contribute simultaneously to the complex trait of DD. 12 Serine and arginine rich splicing factor 6 (SRSF6) Important for wound healing, the regulation of keratinocyte differentiation and proliferation as well as tissue homeostasis in skin. [66] 13 Platelet derived growth factor subunit A (PDGFA) Promotes myofi broblast differentiation. [67] 14 Epidermal growth factor receptor (EGFR) Regulates fundamental functions in mammalian cells including migration and proliferation. [68] 15 Integrin subunit alpha 1 (ITGA1) Important for initiating infl ammation as well as maintenance of chronic infl ammatory responses. [69] 16 Integrin subunit alpha 2 (ITGA2) Expressed during wound healing and in combination with ITGA1 it facilitates cellular attachment to collagen and migration. [70,71] 17 Integrin subunit beta 5 (ITGB5) Required for TGFβ-induced epithelial-mesenchymal transition, anchoring cells at the specifi c cell-matrix adhesions that mediate cell stretching and lead to the disruption of cell-cell contacts without downregulation of E-cadherin. [72] 18 Integrin subunit alpha 11 (ITGA11) Exclusively expressed on fi broblasts and involved with would repair. [73,74] 19 Integrin subunit alpha v (ITGAV) Plays an important role in regulating osteoclasts, tumor proliferation and angiogenesis. [75] 20 Prostaglandin E receptor 4 (PTGER4) Plays an important role in the initiation of antigen-specifi c immune response in the skin by stimulating Langerhans cell mobilization, migration and maturation. [76] 21 Ryanodine receptor (RYR1) Expressed in epidermal keratinocytes and is associated with keratinocyte differentiation and epidermal permeability barrier homeostasis. [77] 22 Transient receptor potential cation channel subfamily V member 4 (TRPV4) Involved in controlling homeostasis of the skin permeability barrier, acting as an osmotic pressure detector.

Conclusion
In this study a genomic analysis was performed for DD incidence in beef cattle. Digital dermatitis data were analyzed combining phenotype and genetic marker information into genomic models for the assessment of genetic parameters and for the detection of genomic regions associated with the disease. The GWAS results revealed DD incidence as a complex trait, where multiple genes control or regulate minor effects, with no specifi c genomic region highly contributing to the genetic variance. Although our sample size was limited, using gene set enrichment analysis (GSEA) signifi cantly enriched gene ontologies associated with DD traits were identifi ed. The signifi cantly enriched GO terms from the ontology categories, biological process and cellular components, and molecular function, were highly involved in cellular and membrane function, revealing genes pertaining to adhesion, migration and proliferation. The processes are considered to be crucial for the pathogenesis of DD. This study provides a foundation for future studies in understanding the complex pathogenesis and genetic background of DD in beef cattle.

Data availability
The data that support the fi ndings of this study are available from the corresponding author upon reasonable request.