Cite this asKopke G, Anklam K, Kulow M, Baker L, Swalve HH, et al. (2020) The identification of gene ontologies and candidate genes for digital dermatitis in beef cattle from a genome-wide association study. Int J Vet Sci Res 6(1): 027-037. DOI: 10.17352/ijvsr.000050
Bovine Digital Dermatitis (DD) is an infectious disease causing severe lameness in cattle. The aim of this study was to perform a Genome Wide Association Study (GWAS) and a Gene-Set Enrichment Analysis (GSEA) to identify candidate genes, instead of an individual Single Nucleotide Polymorphism (SNP), associated with DD traits in beef cattle. Beef cattle (n= 307) were genotyped with the Illumina GGP-HD bovine 150K SNP chip. The M-scores of the cattle over the observation period were used to define the DD traits with different complexities, the distinction between affected (1) and unaffected (0) cattle, regarding the general DD-status (DD AFFECTED), acute disease events (DD ACUTE), visible signs of chronicity (DD CHRONICITY) and proliferation of the skin (DD PROLIFERATION). The gene-set enrichment analysis revealed 30 Gene Ontology (GO) terms associated with the DD AFFECTED trait and 17, 31, and 16 GO terms were associated with DD ACUTE, DD CHRONICITY, and DD PROLIFERATION traits, respectively. By searching the significantly enriched GO terms from the ontology categories, biological process and cellular components, and molecular function, 25 functional genes were identified that were highly involved in cellular and membrane function pertaining to adhesion, migration and proliferation which could contribute to DD traits. These results could provide insight into the genetic framework of this complex trait and disease in beef cattle to aid the development of potential genetic therapies as well as selective breeding strategies to decrease DD prevalence in cattle.
Bovine Digital Dermatitis (DD) is a worldwide infectious disease that causes severe lameness in cattle across all production systems [1-4]. The consequences of DD are decreased animal welfare and economic losses due to reduced milk production, decreased reproductive performance and premature culling [5,6]. DD has a multifactorial etiology with the primary causative infectious agent being Treponema spp. and the pathogenesis of DD is not completely understood [7,8]. DD occurs in different clinical manifestations and recurrence after therapy is high, resulting in problems for prevention and control [1,9]. Moderate to high heritability estimates for DD traits based on the M-stage system and varying immune responses supports the influence of host genetic factors [10-14]. Genome wide association and gene expression studies have been performed to identify potential candidate genes with influence on DD, however causative mutations have yet to be defined based on the current literature [15-17]. Gene-set enrichment and pathway-based analyses have been described to investigate the polygenic background of complex traits, such as leucosis, bull fertility, and beef cattle meat quality [18-20]. DAVID 6.8 is a web based tool to discover enriched functionally related gene groups for large gene lists and for their interpretation in biological context . Gene-set enrichment directs the focus from a single gene-oriented view to a gene-group-based analysis. Findings of these analyses can contribute to an improved understanding about the genetic and biological architecture of complex traits, such as DD. In addition, these findings could be useful for future genetic prevention strategies against DD and an improved management of cows susceptible to DD would benefit from gene-set enrichment analysis. The objective of this study was to perform a Genome Wide Association Study (GWAS) for DD in beef cattle and to identify candidate genes using a gene-set enrichment based analysis.
The experimental procedures for the trial were approved by the Institutional Animal Care and Use Committee (IACUC V01525) at the University of Wisconsin-Madison.
Data were collected between June 2014 and November 2015 during randomized field trials to evaluate the effect of an Organic Trace Mineral (OTM) (Availa Plus; Zinpro Performance Minerals, Eden Prairie, MN) program on the prevalence of DD in two commercial feedlot farms (Midwest, US) . The producers had observed increasing numbers of outbreaks of DD over the past five years on these farms. The highest occurrence of DD outbreaks were observed during the summer months 60 days prior to slaughter. Cattle were housed in barns covered by monoslopes on concrete, grooved flooring and a bedded pack in the center. Cattle were randomly split over 19 pens, 7 control pens receiving a diet supplemented with trace minerals of inorganic sources and 12 OTM pens receiving the same diet supplemented with organic trace minerals (OTM). A total of 1,120 heifers (farm A) and 1,077 steers (farm B) were enrolled in the study. Cattle were mixed breeds, and were assigned into a “breed” category based on coat color and were sourced from multiple locations in North America. Pens of cattle selected to be in the study were processed for enrollment using a restraint chute (Silencer, Moly Manufacturing Inc., Lorraine, KS). Individual weights were recorded for all cattle enrolled and tail hair follicle samples were taken from 2,197 animals for genotyping.
While cattle were in the chute, their hind feet were evaluated for the presence of DD and scored based on DD lesion type using the M-stage DD classification system described by Döpfer and Berry [23,24]. Based on this classification system cattle with: normal skin appearance observed on the foot were classified as M0; active ulcerative DD lesions ≥ 20 mm in diameter were classified as M2 lesions and chronic lesions were classified as M4. Further signs of chronicity were classified as: none = smooth skin without thickening and no extra tissue beyond the surface of the normal level of the skin; hyperkeratotic = thickened callous skin; and proliferative = diffuse, filamentous or mass-like overly grown epidermal tissues. If both hind legs were affected, or more than one lesion was present during scoring, the more severe lesion was documented (M2 > M4 > M0 in severity, proliferative lesions were considered more severe than hyperkeratotic lesions). The previously described DD Check App was used for data documentation . Collection of data was conducted by two trained investigators. All hind feet of cattle were evaluated during single scoring events, three in the chute and three during alley checks, for which where groups of 3 to 5 cattle were herded into an alleyway.
The phenotype data set contained 9,653 observations of 2,197 cattle from farms A and B. The cattle that had at least two observations for DD-status with the majority of cattle (85.3%) having four and more observations were used for analysis. Table 1 shows an overview of characteristics and average prevalence of M-Stages and signs of chronicity per farm. Binary traits were defined for the distinction between affected (1) and unaffected (0) cattle, regarding the general DD-status (DD AFFECTED), acute disease events (DD ACUTE), visible signs of chronicity (DD CHRONICITY) and proliferation of the skin (DD PROLIFERATION). Table 2 shows an overview of trait definitions and overall DD prevalence.
DNA was extracted from approximately 30 hair follicles from each animal using the DNeasy blood and tissue kit (Qiagen, Valencia, CA) according to the manufacturer’s instructions. Although hair follicle samples for genotyping were collected from 2,197 animals unfortunately only 307 samples yielded the required concentration of DNA for analysis. The genotype analysis was performed after the completion of the study therefore we were unable to recollect the samples. Genotyping was performed on the DNA by GeneSeek (Neogen Genomics, Lincoln, NE) utilizing the GGP-HD Bovine Illumina 150K Chip a genome-wide genotyping array. The genotype data were edited using PLINK v. 1.07 . Single nucleotide polymorphisms (SNPs) with a minor allele frequency < 0.05, not in Hardy Weinberg equilibrium (p < 0.0000001) and with missing call rates exceeding 5% were removed. As the data contained both steers and heifers, the X chromosome was excluded from further analysis. A total of 117,479 SNP on Bos taurus autosome (BTA) 1 to 29 were available for final analysis. All the missing SNP information on these SNPs (0.27%) was imputed using FImpute in R .
Computation of a genomic relationship matrix (G-matrix) as well as a principal component analysis (PCA) and a cluster analysis using a Bray Curtis distance matrix and UPGMA clustering were performed with the packages “rrBLUP” and “vegan” in R version 3.3.3 [28-30].
Model testing was performed for the fixed effects (farm, coat color, treatment, and cluster) and covariates (3 principal components and initial body weight) as well as their interactions. The simplest model regarding fixed effects with the lowest Akaike information criterion (AIC) was chosen:
logit(pijkl) = μ + farmi + coatclj + treatmentk + β1(InitialWtl) + β2(PC1l) + β3(PC2l) + β4(PC3l) (1)
where pijkl = P(Yijkl = 1) is the probability of occurrence, Yijkl is the DD-trait (DD AFFECTED, DD ACUTE, DD CHRONICITY, DD PROLIFERATION), μ is the overall mean, farmi describes the fixed effect of farm i(i = A or B); coatclj is the fixed effect of coat color j (j = black or others), treatmentk is the fixed effect of treatment k (k = CON or OTM), β1(InitialWt1) is the regression coefficient for initial body weight of animal l and β2(PC1l), β3(PC2l), β4(PC3l) are the regression coefficients for the first three principal components from PCA of all SNPs. Least Square Estimates were calculated for initial body weight as well as Least Square Means (LSMeans) and odds ratios (OR) for farm, treatment, and coat color using a logit link function in the “GLIMMIX” Procedure in SAS version 9.4 .
A GWAS was performed for each binary trait using the “easyGWAS”  package in R version 3.3.3 applying the following model:
y = μ+Wv+Xβi+Zu+e,
Where μ is the population mean, v is the vector of fixed effects, βi is the effect of the ith SNP, u is the vector of the polygenic effects, e is the residual, W, X, and Z are the incidence matrices for v, βi, and u. SNPs with a p-value < 0.01 were assumed to be statistically significant and a list of these SNPs was generated for every single trait. For enrichment analysis statistically significant SNPs were assigned to genes within 20kb upstream and downstream of the gene’s start and end position as well as within the coding region of the gene using the “biomaRt” Package in R version 3.3.3 [33,34]. The distance of 20kb was chosen to include proximal functional and regulatory regions in the analysis. If a SNP was assigned to more than one gene, all assigned genes remained for further analysis. The Bos taurus UMD3.1: Release 90 - August 2017 from Ensembl was used as reference genome [35,36]. For enrichment analysis a list of the genes identified by statistically significant SNPs was submitted to DAVID 6.8 for each trait separately [21,37]. The Bos taurus gene list of DAVID 6.8 was used as the reference genome. To determine functional categories the Gene Ontology (GO) database was used [38,39]. The GO database offers biological descriptors (GO terms) for genes based on the characteristics of encoded proteins. The terms are assigned to three categories: biological process (BP), Molecular Function (MF) and Cellular Component (CC) [38,40]. A Fisher’s exact test was performed to search for an overrepresentation of significantly associated genes among all the genes in the given GO term. Finally only functional enriched terms with a Fisher’s exact test <0.001 were considered statistically significant and used for further interpretation. GO Plots were created in R for all binary traits using the “GO:” Database and the documented ancestors of the GO terms to show connections between the functional categories. Description of single GO terms was provided by the database AmiGO 2 version 2.4.26 . For identification of candidate genes the lists of statistically significant genes per trait from enrichment analysis were used to perform an extensive literature search in the National Center for Biotechnology Information (NCBI) database for the following gene involvement terms: inflammation, adhesion, skin, keratinocyte, wound healing, hyperkeratosis, proliferation, and immune response.
Estimates and confidence intervals for the fixed effects of farm, coat color and OTM treatment for the binary traits are presented in Table 3. Higher values were obtained for farm A as well as for Treatment CON for all traits except DD PROLIFERATION, with the higher values being statistically significant for farm A and the traits DD AFFECTED and DD CHRONICITY. For coat color, estimates were found for black coated animals in DD CHRONICITY and DD PROLIFERATION. The Least Square Estimate for the regression on initial body weight for the probability of the binary traits is presented in Figure 1. As expected, all binary traits showed a significant increase in probability as initial body weight increased, except for DD PROLIFERATION for which only a slight increase of probability was observed. These associations are indicated by the positive slope in each binary trait panel of Figure 1, while the shaded area depicts the 95% confidence interval of the estimated slope. Significant odds ratios (OR) were calculated for the influence of farm as well as a gain in 200 lbs./91 kg in initial body weight for the trait DD AFFECTED (Table 4). In addition, Table 4 shows the significantly different OR calculated for the influence of all fixed effects and initial body weight on DD CHRONICITY and coat color on DD PROLIFERATION.
As can be seen in Table 4, the farm had the greatest influence on the occurrence of DD in this study. Farm differences are understandable, because DD has a multifactorial etiology, highly influenced by management and hygiene . Furthermore the results indicate that coated color was associated with the occurrence of DD, with a higher prevalence of chronic and proliferative lesions in black coated cattle (see coat color vs. DD PROLIFERATION in Table 4). If we consider coat color as a proxy for breed, this agrees with the influence of breed on DD, which has been previously described [42-44]. Several other studies described the importance of animal genetic factors and breed on DD as well [12,45]. In the current study it was not possible to determine the exact breed because no pedigree information was available.
Principal Component (PC) analysis and cluster analysis resulted in three principal components and three clusters based on the genomic relationship matrix. The first three principal components explained 26.5% of the genomic variation in the data set. Figure 2 illustrates a 3D plot for the site scores of the first three principal components containing the three previously generated clusters. Clusters 1, 2, and 3 are shown as purple, yellow, and green clusters of different sized data points where the size of the data points simply represents the different clusters. The 3D plot indicates that the three clusters are mostly determined by PC1. Since the AIC of GWAS model was lowest when including the three principal components from Figure 2, those three principal components were chosen as predictors for the final GWAS model, not the cluster numbers.
Manhattan plots illustrate the genome wide association for the binary traits in Figure 3. The genome wide suggestive level was calculated to be 1.7×10-6 corresponding to a Bonferroni-adjusted p-value of 0.2. Only one SNP (rs110651789, BTA22:5833750) showed a genome wide suggestive association to the trait DD PROLIFERATION (p-value < 1.7×10-6). No candidate genes were identified in a 20kb window surrounding this SNP. The low number of associated SNPs was expected due to the limited sample size. Therefore, for the gene set enrichment analysis the p-value of < 0.01 was used as significance criterion to ensure the necessary required number of significant SNPs for the ongoing analysis, as it was stated that gene set enrichment should include 100 to 2000 genes . Table 5 shows the number of statistically significant SNPs and the number of assigned significant genes per binary trait based on the p-value (< 0.01). An SNP can be assigned to more than one gene therefore the number of significant SNPs assigned to genes is higher than the number of (unique) significant mapped genes. The 708 - 849 unique significant genes (depending on the trait) could be used for enrichment and pathway based analysis. Unfortunately, not all of these unique significant genes could be mapped in DAVID 6.8 resulting in a reduced number of significant mapped genes. For all binary traits 2,637 genes were identified to be within 20kb of a significant SNP. Due to the limited sample size we did not include a correction for multiple testing in this analysis and the results were analyzed based on the raw p-values to maximize the data available from this study. Although the high number of expected false positive SNPs from the GWAS without correction for multiple testing reduces the strength of the analysis, this is the first described genomic study for DD in a beef cattle data set. We consider our work to be an example for follow-up studies about the genetics of DD in beef cattle.
A Fisher’s exact test was performed to search for an overrepresentation of significantly associated genes within each GO term. Only functional enriched terms with a Fisher’s exact test <0.001 were considered significant and used for further interpretation. A total of 13,824 background genes were assigned to the Biological Process (BP) GO term category, 15,461 background genes to the Cellular Component (CC) GO term category, and 13,094 background genes to the Molecular Function (MF) GO term category. All background genes were tested for significant enrichment of genes connected with DD in beef cattle. For the binary trait DD AFFECTED 478, 526, and 444 genes from the list of significant genes were mapped to the BP, CC, and MF gene ontology categories respectively. Slightly different numbers of genes were mapped to the BP, CC, and MF gene ontology categories for the other binary traits, DD ACUTE (580, 589, 503), DD CHRONICITY (472, 520, 431), and DD PROLIFERATION (431, 463, 380) (Online Resources 1, 2, 3). To include all possibly involved ontologies we decided to not exclude rare and widely distributed GO terms from our analysis. This is in agreement with previously published reports about GSEA in cattle . Although these categories seem to be functionally broad and encompass up to 4,149 genes of the Bos taurus genome, genes within 20kb distance to a significant SNP and with functions in relation to cows individual DD-status are found within these categories. From the significantly enriched genes 25 highly relevant genes were identified by their function described in literature. Their relations to the significant SNP from GWAS and identified significantly enriched gene ontologies are provided in Table 6. The genes will be numbered 1 to 25 in the following text. For each gene ontology category (BP, CC, and MF) we will list the statistically significant enriched GO terms for the binary traits and describe associated genes from literature followed by briefly discussing their general link to the pathogenesis of DD.
In the category BP 21 GO terms showed significant overrepresentation of genes associated with general DD-status (DD AFFECTED). Furthermore 8, 17, and 6 GO terms were significantly enriched for the binary traits DD ACUTE, DD CHRONICITY, and DD PROLIFERATION, respectively (Online Resource 1).
The term localization (GO: 0051179) is significantly (Fisher’s Exact < 0.001) enriched for three different traits (DD AFFECTED, DD ACUTE, and DD CHRONICITY). Two subtypes of this term are significantly enriched in this analysis for the trait DD AFFECTED: single-organism localization (GO: 1902578) and establishment of localization (GO: 0051234). These terms are connected to localization of cells, substances, or cellular entity (protein complex or organelle) and show a close hierarchy with ontologies related to transport (GO: 0006810, GO: 0044765). The transport function is the directed movement of substances or cellular components within a cell, or between cells, or within a multicellular organism by a transporter, pore or motor protein. Cellular localization and transport play a critically important role in regulating cellular interactions. Five candidate genes were represented in this biological process and may play a role in DD pathogenesis: (1) ATG4C, (2) DDR1, (3) DOCK1, (4) ,LASP1, and (5) LIMD1. Table 7 lists the function of these five genes.
Ontologies related to the regulation of differentiation of adipose cells (GO: 0090335), GO: 0045598) and the differentiation hair cells (GO: 0035315) were significantly enriched for the trait DD AFFECTED. Further negative regulation of differentiation of keratinocytes (GO: 0045617) showed significant enrichment for the trait DD PROLIFERATION as well as a close hierarchical connection to the two previously mentioned differentiation terms. Ancestors of this term show significant enrichment for the trait DD CHRONICITY including developmental process (GO: 0032502) as well as the development of anatomical structure (GO:0048856), and nervous system (GO: 0007399). Keratinocyte differentiation is crucial for the formation of the epidermal barrier to the environment. Disturbances of the epidermal barrier formation and the dysregulation of keratinocyte differentiation are involved in the pathophysiology of several skin diseases including DD . The following nine genes are involved in keratinocyte or myofibroblast differentiation: (6) DSC3, (7) CSTA, (8) COL11A1, (9) COL13A1, (10) TRAM2, (11) PRTN3, (12) SRSF6, (13) PDGFA, and (14) EGFR (Table 7). Any dysfunction in these genes could lead to an impaired epidermal barrier and allow for signs of DD to occur and to persist .
For the category CC, four GO terms showed significant overrepresentation of genes associated with general DD-status (DD AFFECTED). In addition 2, 4, and 1 GO terms were significantly enriched for the traits DD ACUTE, DD CHRONICITY, and DD PROLIFERATION, respectively (Online Resource 2).
The term integral component of plasma membrane GO: 0005887 was significantly enriched for the DD AFFECTED trait and is a subtype of GO: 0031226 intrinsic component of plasma membrane which was significantly enriched for the traits DD AFFECTED and DD CHRONICITY. GO: 0031226 is a subtype of GO: 0044459 plasma membrane part which is also significantly enriched for the DD AFFECTED trait. These terms are in close hierarchical relationship to the term GO: 0031225 anchored component of membrane which is significantly enriched for the trait DD CHRONICITY.
Plasma membrane receptors are essential for signaling many important biological events, including cell migration . Motility requires adhesive interactions of cells with the extracellular matrix (ECM). These interactions are partly mediated by integrins, a family of cell surface adhesion receptors. Integrin-mediated cell adhesion regulates a multitude of cellular responses, including proliferation, survival, and cross-talk between different cellular pathways. Integrins confer different cell adhesive properties, particularly with respect to focal adhesion formation and motility . Six genes are represented in the component plasma membrane term of the category cellular component: (15) ITGA1, (16) ITGA2, (17) ITGB5, (18) ITGA11, (19) ITGAV, and (20) PTGER4 (Table 7). Theses candidate genes are involve in integrin-mediated cell adhesion and cell migration and could affect DD prognosis and pathogenesis.
With regards to the category MF, five GO terms showed significant overrepresentation of genes associated with general DD-status (DD AFFECTED). Furthermore 7, 10, and 9 GO terms were significantly enriched for the traits DD ACUTE, DD CHRONICITY, and DD PROLIFERATION, respectively (Online Resource 3).
Ontologies related to transmembrane transport activity (GO:0022857) included the following subtypes: metal ion (GO: 0046873), inorganic cation (GO: 0022890), cation (GO: 0008324), ion (GO: 0015075), and substrate-specific (GO: 0022891 and GO: 0022892) and the ontologies were significantly enriched for the traits DD CHRONICITY and partly for DD AFFECTED.
Transmembrane transport plays a pivotal role in the communication between cells. Cells must be able to communicate across their membrane barriers to exchange materials with the environment . Calcium dynamics play a key role in keratinocyte differentiation and epidermal homeostasis . Impairment of these genes could interrupt epidermal homeostasis allowing for the more rapid progression of DD. The following relevant genes were included in the term category molecular function and associated with transmembrane transport/channel, (21) RYR1 and (22) TRPV4 (Table 7).
The term transmembrane-ephrin receptor activity GO: 0005005 was significantly enriched for the trait DD PROLIFERATION and is related to protein kinase activity GO: 0004672 which was significantly enriched for the trait DD AFFECTED and DD PROLIFERATION. The ontologies related to transferase activity (GO: 0016740 and GO: 0016772) and the following subtypes: phosphotransferase activity (GO: 0016773) kinase activity (GO: 0016301) were also significantly enriched for the trait DD PROLIFERATION. The previously mentioned terms are in close hierarchical relationship to phosphoric ester hydrolase activity (GO: 0042578) and hydrolase activity (GO: 0016788) which were significantly enriched for the traits DD AFFECTED and DD ACUTE. Furthermore, the term collagen receptor activity (GO: 0038064) that was significantly enriched for the trait DD AFFECTED also showed close relationship to the previously mentioned terms.
Epidermal development, homeostasis, and repair are complex processes requiring regulation of cell proliferation, migration, and differentiation. Several of the signals essential for epidermal development and repair are tranduced by the epidermal growth factor receptor (EGFR) and integrins . Two genes were represented in the cellular enzyme and receptor activity term of the category molecular function, (23) RAF1 and (24) ROCK1 (Table 7). These genes could influence the barrier function and integrity of the skin as well as proliferation of the skin in chronic DD lesions.
Ontologies related to the terms GTPase regulator activity GO: 0030695, nucleoside-triphosphatase regulator activity GO: 0060589, and protein kinase activator activity GO: 0030295 which are subtypes of molecular function regulator GO: 0098772 were found significantly enriched for the traits DD ACUTE, DD CHRONICITY, and DD PROLIFERATION. The RAS pathway is essential to epidermal homeostasis. Epidermal function requires the balance between keratinocyte proliferation, adhesion to ECM proteins and terminal differentiation be tightly regulated. This process of balancing proliferation, adhesion, and differentiation is profoundly disrupted during pathogenesis of DD. One gene was represented in the enzyme regulator activity term of the category molecular function, (25) DAB2IP (Table 7).
The incidence of DD is a complex trait controlled by multiple genes. Therefore this approach of searching the significantly enriched GO terms and identifying 25 functional genes highly involved in cellular and membrane function pertaining to adhesion, migration and proliferation which could contribute to DD traits provides a unique opportunity to better understand the genetic architecture of this complex disease. Further pathway analysis research is needed to explore the associated biochemical, molecular, and cellular pathways to link the predominant pathways with the 25 candidate genes. This could reveal factors that contribute simultaneously to the complex trait of DD.
In this study a genomic analysis was performed for DD incidence in beef cattle. Digital dermatitis data were analyzed combining phenotype and genetic marker information into genomic models for the assessment of genetic parameters and for the detection of genomic regions associated with the disease. The GWAS results revealed DD incidence as a complex trait, where multiple genes control or regulate minor effects, with no specific genomic region highly contributing to the genetic variance. Although our sample size was limited, using gene set enrichment analysis (GSEA) significantly enriched gene ontologies associated with DD traits were identified. The significantly enriched GO terms from the ontology categories, biological process and cellular components, and molecular function, were highly involved in cellular and membrane function, revealing genes pertaining to adhesion, migration and proliferation. The processes are considered to be crucial for the pathogenesis of DD. This study provides a foundation for future studies in understanding the complex pathogenesis and genetic background of DD in beef cattle.
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Compliance with ethical standards: The experimental procedures for the trial were approved by the Institutional Animal Care and Use Committee (IACUC V01525) at the University of Wisconsin-Madison, ensuring the ethical and sensitive care and use of animals in research, teaching and testing.
Subscribe to our articles alerts and stay tuned.