A Preliminary Analysis of Potential Allergens in GMO Rice: A Bioinformatics Approach

Allergies are frequently triggered by large protein molecules called allergens. An allergic response involves a series of intrinsic and extrinsic reactions that contribute to triggering the symptoms (Somvanshi, et al., 2008). A typical allergy (a Type I hypersensitivity reaction) is induced by allergens that elicit specific IgE antibodies, which may cross-react with common homologous allergens from different sources. Allergy symptoms include asthma, atopic dermatitis, and rhinitis, but severe reactions such as anaphylactic shock, which can be fatal, may also occur.

insertion (Haslberger, 2003). Comprehensive analyses, such as bioinformatics, may be useful tools for assessing the allergenicity of newly expressed proteins in GMOs versus conventionally bred crops. Such analyses draw on various bioinformatics methods, including allergen databases, algorithms that search for sequence identity between newly expressed proteins and known allergens, and approaches for assessing the relevance of the alignments observed (Kaiserlian, et al., 2010).
Chemical comparison of a new protein with known allergens is a useful method for assessing a protein's allergenic potential. Proteins are made up of long chains of amino acids, but their allergenicity is determined by only a few residues that serve as binding sites for antibodies. The ability of a protein to induce both humoral and cellular Th2 immune responses, leading to the release of allergen-specific IgE and Th2 cytokines, is a measure of its allergenicity [1]. This property is usually assessed by in vitro and in vivo tests. No protein can be classified as allergenic without, at minimum, evidence of its ability to bind in vitro the IgE from sera of sensitized individuals.
If antibodies can attach to a new protein and elicit a hypersensitivity cascade, there is a high chance that an allergic reaction will occur. The amino acid sequence of a new protein is therefore compared with those of known allergens to determine homology.
A minimal requirement for sequence homology with a known allergen is 35% sequence identity over a window of at least 80 amino acids [2]. GM crops undergo rigorous assessment for food, feed, and environmental safety before commercialization. However, detecting potential allergens in vivo or in vitro using molecular biology techniques is challenging, time-consuming, and costly [3]. Additionally, the immune response is very complicated, as the body responds to allergens by inducing many processes [4]. Further challenges of these methods include the possible development of life-threatening allergic reactions and a lack of sensitivity due to genetic factors.
These limitations make in silico (bioinformatics) analysis an acceptable, albeit preliminary, approach for identifying cross-reactive epitopes and potential allergenicity [5].
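As a rough illustration of the 35%-over-80-amino-acids homology criterion, the following sketch slides an 80-residue window along a query and an allergen sequence and reports the best percent identity. It is a simplification under stated assumptions: tools such as Allermatch align windows with FASTA, which tolerates gaps, whereas this brute-force version compares windows position by position without gaps. The function names are hypothetical and the sequences are toy data.

```python
"""Ungapped sliding-window identity screen (simplified sketch, toy data)."""

WINDOW = 80        # window length (amino acids) from the criterion in the text
THRESHOLD = 35.0   # percent identity treated as the minimal requirement


def ungapped_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length segments, compared position by position."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def best_window_identity(query: str, allergen: str, window: int = WINDOW) -> float:
    """Highest ungapped identity between any query window and any allergen window.

    Unlike FASTA-based tools, no gaps are allowed; every pair of windows is
    compared by brute force, which is slow for long sequences.
    """
    if len(query) < window or len(allergen) < window:
        return 0.0  # no full window available, so the criterion cannot be applied
    best = 0.0
    for i in range(len(query) - window + 1):
        q_win = query[i:i + window]
        for j in range(len(allergen) - window + 1):
            best = max(best, ungapped_identity(q_win, allergen[j:j + window]))
    return best


def flags_allergen(query: str, allergen: str,
                   window: int = WINDOW, threshold: float = THRESHOLD) -> bool:
    """True if any window pair reaches the identity threshold."""
    return best_window_identity(query, allergen, window) >= threshold


if __name__ == "__main__":
    # Toy sequences: 28 of 80 positions identical, i.e. exactly 35% identity.
    toy_query = "A" * 80
    toy_allergen = "A" * 28 + "C" * 52
    print(best_window_identity(toy_query, toy_allergen))  # 35.0
    print(flags_allergen(toy_query, toy_allergen))        # True
```

The threshold is applied as "at least 35%" to match the wording above; some guidelines phrase it as strictly greater than 35%, which would only change the comparison operator.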
The allergenicity assessment for GM plants includes two approaches: (1) the assessment of the entire GM plant, and (2) the evaluation of the newly expressed proteins [6]. Following the latter approach, this study demonstrates in silico allergenicity screening as a quick and straightforward way to identify potential allergens among newly expressed proteins encoded by genes that are differentially expressed in a transgenic rice variety.

Microarray data information
The protein sequences used in this study were obtained from the differentially expressed genes in the rice microarray data of Batista et al. [7]. Briefly, the gene expression profiles of GSE12069 were downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/). The dataset was based on the GPL2025 platform (Affymetrix Rice Genome Array, Thermo Fisher Scientific, Inc., Waltham, MA, USA). GSE12069 included data from a well-characterized transgenic rice line (cv. Bengal) and its non-transgenic mother plant as a control (submission date, 10 July 2008). The stable transgenic line is in its third generation of self-pollination after transformation. The plant expresses an scFv antibody (ScFvT84.66) against the carcinoembryonic antigen [7].
We used the GEO2R tool (http://www.ncbi.nlm.nih.gov/geo/geo2r/) to identify differentially expressed mRNAs from the GEO series [8]. Each sample within the GEO series was first classified as either the normal or the mutant variety, and the defined groups were then entered into GEO2R. GEO2R provided a list of differentially expressed genes (DEGs) ranked by differential expression level. DEGs that were up-regulated more than 2.0-fold with p < 0.01 were collected. Finally, the sequences were extracted from the David
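The fold-change and p-value filter described above can be sketched as a simple table filter. This assumes rows shaped like GEO2R's limma-based result table, with `logFC` as a log2 fold change (positive meaning higher in the transgenic group; this depends on the group order chosen in GEO2R) and `P.Value` as the unadjusted p-value; whether the study used raw or adjusted p-values is not stated, and the probe IDs and values below are hypothetical.

```python
import math

FOLD_CUTOFF = 2.0   # minimum linear fold change (transgenic vs. control)
P_CUTOFF = 0.01     # p-value cutoff


def select_upregulated(rows, fold_cutoff=FOLD_CUTOFF, p_cutoff=P_CUTOFF):
    """Return probe IDs up-regulated more than fold_cutoff with p < p_cutoff.

    Assumes each row is a dict with 'ID', 'logFC' (log2 fold change, positive
    meaning higher in the transgenic group) and 'P.Value'.
    """
    log2_cutoff = math.log2(fold_cutoff)  # a 2.0-fold cutoff becomes logFC > 1.0
    hits = [r for r in rows
            if r["logFC"] > log2_cutoff and r["P.Value"] < p_cutoff]
    hits.sort(key=lambda r: r["logFC"], reverse=True)  # largest fold change first
    return [r["ID"] for r in hits]


# Hypothetical rows, not values from GSE12069.
EXAMPLE_ROWS = [
    {"ID": "probe_A", "logFC": 3.2, "P.Value": 0.001},   # ~9-fold up, significant
    {"ID": "probe_B", "logFC": 0.8, "P.Value": 0.0001},  # only ~1.7-fold
    {"ID": "probe_C", "logFC": 1.5, "P.Value": 0.03},    # p too large
    {"ID": "probe_D", "logFC": -2.5, "P.Value": 0.001},  # down-regulated
    {"ID": "probe_E", "logFC": 1.1, "P.Value": 0.005},   # ~2.1-fold up, significant
]

if __name__ == "__main__":
    print(select_upregulated(EXAMPLE_ROWS))  # ['probe_A', 'probe_E']
```

Converting the linear cutoff with `math.log2` keeps the threshold readable in the same units as the text (2.0-fold) while comparing against log-scale values.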

In silico prediction of allergenicity
The protein sequences were screened for allergenicity using AlgPred and Allermatch. AlgPred is an online tool that predicts allergens based on the similarity of known epitopes to any region of a query protein (http://www.imtech.res.in/raghava/algpred/). Three modules were used: the SVM module based on amino acid composition, the MEME/MAST motif prediction approach, and
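Composition-based SVM modules take a fixed-length numeric vector as input rather than the raw sequence. The sketch below computes the two common encodings, 20 amino acid percentages and 400 dipeptide percentages, to show what such a feature vector looks like. It is an illustration of the encoding only, not AlgPred's own code, and the trained SVM is not reproduced; residues outside the 20 standard amino acids are simply skipped, which is an assumption of this sketch.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, fixed order
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def aa_composition(seq: str) -> list[float]:
    """Percentage of each standard amino acid (20 values)."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for ch in seq.upper():
        if ch in counts:  # non-standard letters (e.g. X) are ignored
            counts[ch] += 1
    total = sum(counts.values())
    return [100.0 * counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> list[float]:
    """Percentage of each overlapping dipeptide (400 values)."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    return [100.0 * counts[d] / total if total else 0.0 for d in DIPEPTIDES]


if __name__ == "__main__":
    toy = "AACD"  # toy sequence, not a rice protein
    comp = aa_composition(toy)
    print(comp[AMINO_ACIDS.index("A")])  # 50.0
    print(len(dipeptide_composition(toy)))  # 400
```

Dipeptide composition captures local residue order that plain composition discards, which is one reason both encodings are offered as separate SVM inputs.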

Homology modeling and visualization
Homology models are of great value for interpreting the biological properties of proteins and designing experiments when no experimental three-dimensional structures are available, especially for novel proteins. The 3D protein structure was modeled in the SWISS-MODEL workspace using the Alignment Mode [12] and visualized using PyMOL [13].

Results and discussion
Following our preliminary screening of the 50 rice proteins with AlgPred and Allermatch, we found five proteins that were considered potentially allergenic (Table 1). A cysteine proteinase precursor (Q7F3A8) was predicted to be the most allergenic. It contains the IgE epitope VKNSWGTAWGEGGYI, which is homologous to known allergens from short ragweed (Ambrosia artemisiifolia), kiwi (Actinidia deliciosa), papaya (Carica papaya), pineapple (Ananas comosus), and soybean (Glycine max). A homology model of the protein is shown in Figure 1, with the IgE cross-reactive epitope highlighted in red. Cysteine proteinases, one of the major groups of plant proteases, have been thoroughly studied because of their crucial role in senescence and programmed cell death [14]. Interestingly, the potential allergen is a member of one of the structurally related protein superfamilies, the prolamins [15]. These protein families contain structural domains of plant food allergens known to trigger reactions via the gastrointestinal tract [16].
Using the SVM method based on dipeptide composition, AlgPred predicted a putative germin A protein (Q6YZA4) to be allergenic. It should be noted that the sensitivity, specificity, accuracy, and Matthews correlation coefficient (MCC) of AlgPred's dipeptide-composition SVM are 88.8%, 88.2%, 88.5%, and 0.770, respectively [10,17]. Germins and the related germin-like proteins (GLPs) are glycoproteins expressed in many plants in response to biotic and abiotic stress. In an immunoblotting assay, 24 of 82 tested sera (29.27%) from allergic patients showed IgE binding to germins. Germins likely bind IgE antibodies via their carbohydrate moieties [18].
Like germins, the subtilisin-like serine proteinase (Q0E256) is involved in protective signaling mechanisms. ALE1, a gene homologous to subtilisin-like serine proteases, is expressed in specific endosperm cells adjacent to the embryo and regulates the formation of the cuticle on embryos and juvenile plants [19]. Interestingly, its biological role in modifying proteins in plants is mirrored in the nematode C. elegans, in which a homologous subtilisin-like serine protease is involved in cuticle formation and is essential for early development and adult morphology [20]. The subtilisin-like serine proteinase (Q0E256) in rice is homologous to the allergen Cuc m 1 from muskmelon. The thermally stable Cuc m 1 is the only plant food allergen belonging to the serine protease family.
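Epitope mapping of this kind amounts to scanning the query for the known IgE epitope, optionally tolerating a few substitutions. The sketch below does an ungapped scan with a mismatch budget and turns each hit into a PyMOL `color` command of the kind used to highlight an epitope. The helper names are hypothetical, the toy sequence is not the Q7F3A8 sequence, and the mapping to `resi` assumes the model's residue numbering matches 1-based numbering of the query sequence, which should be checked for any real model.

```python
EPITOPE = "VKNSWGTAWGEGGYI"  # IgE epitope reported for Q7F3A8 in the text


def find_epitope(seq: str, epitope: str = EPITOPE, max_mismatches: int = 0):
    """Return (start, end, mismatches) for each ungapped match, 1-based inclusive."""
    hits = []
    k = len(epitope)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        mismatches = sum(1 for a, b in zip(window, epitope) if a != b)
        if mismatches <= max_mismatches:
            hits.append((i + 1, i + k, mismatches))
    return hits


def pymol_color_command(start: int, end: int, color: str = "red") -> str:
    """PyMOL command text coloring residues start..end (numbering assumed to match)."""
    return f"color {color}, resi {start}-{end}"


if __name__ == "__main__":
    toy = "MA" + EPITOPE + "KL"  # hypothetical sequence with the epitope at 3-17
    for start, end, mm in find_epitope(toy):
        print(start, end, mm, pymol_color_command(start, end))
```

Allowing one or two mismatches widens the search to near-identical variants; the exact matching rules used by AlgPred's epitope module may differ from this simple Hamming-distance scan.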
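The four reported performance figures are mutually consistent if the benchmark has equal numbers of allergens and non-allergens: accuracy is then the mean of sensitivity and specificity, and the MCC follows from the confusion matrix. The sketch below computes the standard metrics from a confusion matrix; the balanced counts (1,000 per class) are hypothetical and are not AlgPred's actual benchmark sizes.

```python
import math


def classification_metrics(tp: int, fn: int, tn: int, fp: int):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    sensitivity = tp / (tp + fn)  # fraction of allergens correctly predicted
    specificity = tn / (tn + fp)  # fraction of non-allergens correctly predicted
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return sensitivity, specificity, accuracy, mcc


if __name__ == "__main__":
    # Hypothetical balanced benchmark: 1,000 allergens and 1,000 non-allergens.
    sens, spec, acc, mcc = classification_metrics(tp=888, fn=112, tn=882, fp=118)
    print(round(sens, 3), round(spec, 3), round(acc, 3), round(mcc, 3))
    # 0.888 0.882 0.885 0.77
```

With these counts the MCC works out to about 0.770, matching the reported value, which suggests the published figures come from a roughly balanced evaluation set.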
Most allergens from this family are fungal allergens from the subfamilies of alkaline or vacuolar serine proteases [21].
The glucan endo-1,3-beta-glucosidase 14 (Q0JDD4), a poly- [22]. Recently, a novel allergen, PR-1a, was cloned. PR-1a is a causative agent of peach tree pollen sensitization and is similar to glucan endo-1,3-beta-glucosidase 14. IgE from subjects with peach tree pollen allergy living in areas where this tree is widely cultivated was found to recognize a glucan endo-1,3-beta-glucosidase-like protein [23].

Table 1: Up-regulated genes in GM rice with allergenic potential, based on differences between the microarray profiles of T1 NipponBare GM and control plants. The GM rice is a T1 generation of an Agrobacterium-transformed transgenic line [7]. Allergens with similarity within an 80-amino-acid window of the up-regulated protein are shown in bold. * Information on allergen name, species, and % identical amino acids. ** No allergen name; allergen ID provided.

Lastly, we found an unnamed, potentially allergenic protein (Q7XVQ2) that was overexpressed up to 19- PIII from Lolium perenne (ryegrass) [24]. These grass pollens are widely implicated in people's cross-sensitivity to house dust mites, which can trigger severe asthma or allergic rhinitis [25].
From 1992 to 2017, more than 1,300 separate assessments by regulatory agencies around the world reviewed the safety data on various GM crops, and every report concluded that the GM crop is as safe as conventionally developed crops.
Based on 20 years of accumulated data, it has been concluded that the application of transgenic methods does not affect the levels of allergenic proteins native to a particular crop [6]. While motif- and sequence homology-based screening is part of the pipeline for the allergenicity assessment of novel foods, we emphasize that in silico analysis should be backed up or validated by empirical research. A significant limitation of this study is that the potential allergens were data-mined from transcriptome data. The food allergens Ara h 1 (peanut), Pru p 3 (peach), and Gly m 5 (soybean) are usually present at relatively high levels, from 1,000 to 10,000 ppm [6,26].
By contrast, the putative allergenic proteins identified in this study were associated with only an 8- to 15-fold increase in mRNA expression relative to the non-GMO counterpart. Because a fold change describes relative rather than absolute abundance, their protein levels may be orders of magnitude lower than the sensitizing levels of typical food allergens. It is also possible that post-translational modifications occur downstream and that the potentially allergenic proteins are susceptible to gastric digestion. Of note, the ability of a protein's epitopes to survive gastric digestion is an essential characteristic of food allergens, and in vitro pepsin resistance tests remain an integral part of the weight-of-evidence approach to assessing the allergenic potential of any novel protein [27]. Further research could study these potential allergens in the food matrix, since the context of ingestion can modify conformational epitopes. This issue is significant and deserves consideration in the safety assessment of novel foods.
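To make the abundance argument concrete, assume (as a simplification that the caveats above already question) that protein levels scale in proportion to the mRNA fold change $F$, so that the concentration in the GM line, $c_{\mathrm{GM}}$, equals $F$ times the baseline concentration $c_{0}$ in the non-GM parent. The symbols are introduced here for illustration only; no protein concentrations were measured in this study.

```latex
\begin{aligned}
c_{\mathrm{GM}} &= F\, c_{0}, \\
c_{\mathrm{GM}} \ge 1000~\mathrm{ppm} \;&\Longrightarrow\; c_{0} \ge \frac{1000~\mathrm{ppm}}{F}, \\
F = 15:&\quad c_{0} \ge \tfrac{1000}{15}~\mathrm{ppm} \approx 66.7~\mathrm{ppm}, \\
F = 8:&\quad c_{0} \ge \tfrac{1000}{8}~\mathrm{ppm} = 125~\mathrm{ppm}.
\end{aligned}
```

Under this assumption, a candidate protein would already need to be present at roughly 67 to 125 ppm in conventional rice for an 8- to 15-fold increase to reach the lower bound of the range reported for major food allergens; the baseline abundances of the candidates cannot be determined from transcriptome data alone.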

Conclusions and perspectives
In conclusion, an in silico method may be a useful tool for the initial prediction of potentially allergenic proteins, provided that gene expression data are available. Five potential allergens have been identified for a transgenic rice variety. As recent studies recommend, modern bioinformatics tools could serve as a preliminary yet robust approach to identifying allergenic proteins in food [28][29][30][31][32]. Since this is a preliminary study, further tests are needed to establish the likelihood that these proteins are allergenic.