Protein profiling as a tool for identifying environmental aerobic endospore-forming bacteria

Species of the genus Bacillus and related genera are collectively designated Aerobic Endospore-Forming Bacteria (AEFB). Within the phylum Firmicutes, these species are allocated to the class Bacilli, order Bacillales, in which seven of ten families harbour endospore-formers: Alicyclobacillaceae, Bacillaceae, Paenibacillaceae, Pasteuriaceae, Planococcaceae, Sporolactobacillaceae and Thermoactinomycetaceae [1,2]. AEFB are widely distributed in nature, including extreme environments, and soil is considered their main repository [3]. Bacillus anthracis and B. cereus are known for infecting humans. The ecological and economic importance of some AEFB strains is reflected in a wide range of properties, including nitrogen fixation; plant growth promotion; activity against insects, nematodes, and fungi; soil phosphorus solubilisation; and production of exopolysaccharides, a high diversity of hydrolytic enzymes, antibiotics, and cytokinins, among other bioproducts [1,2].

draw the largest frontiers in the prokaryotic classification system [5]. However, phenotype can influence the depth of a hierarchical line consistency and is necessary to generate useful characterization [6,7].
Among the phenotype-based methods for the identification of microorganisms, the use of matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) has increased dramatically [8,9]. Analyses by MALDI-TOF MS do not require lengthy biochemical reactions and are faster than other conventional phenotypic identification methods, with similar or even superior reliability [8]. In addition, the tolerance of varying growth conditions and the high reproducibility of this technique have led to the development of standard protocols [10,11]. Indeed, clinical laboratories have been successfully using MALDI-TOF MS to identify microorganisms at the species level, which has allowed most clinically relevant pathogens to be rapidly included in the spectra database [12-14]. The efficacy of the method relies on the stability of the mass spectral patterns generated, since some of the cell components routinely used in the analyses are ubiquitous, highly conserved, integral, and abundant in living cells [15,16]. Mass spectra obtained from whole cells or protein extracts are compared with reference spectra available in commercial databases, which are based mainly on clinical strains.
The more similar the mass spectral patterns are, the closer the phylogenetic relationships. Given the predominance of ribosomal and regulatory proteins in the spectra, these biomarkers are useful not only for clinical diagnosis but also for taxonomic studies of bacteria.
Using MALDI-TOF MS, we generated spectra from 64 environmental AEFB samples isolated from Brazilian soils and designated SDF (Solo do Distrito Federal) strains [17]. The predictive molecular relationship of protein profiling obtained for these environmental AEFB was further compared with classification based on the reference method for taxonomic assignment of prokaryotes, 16S rRNA gene sequencing.
to determine best quality regions. Consensus sequences (550-600 nucleotides) were created using BioEdit 7.2.6 software and deposited at NCBI (see Tables 1-5 for accession numbers).

Material and methods
Similarities of 95%-96% and ≥97% were considered the threshold values for identification at the genus and species levels, respectively. [24].
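The similarity thresholds stated above can be expressed as a small decision rule. The following is only a minimal sketch: the function name and the "unassigned" label are hypothetical, and treating values between 96% and 97% as genus-level is an assumption, since the text does not say how that interval is handled.

```python
def rank_from_16s_similarity(similarity_pct: float) -> str:
    """Return the deepest taxonomic rank supported by a 16S rRNA similarity (%).

    Thresholds follow the text: >=97% for species, 95%-96% for genus.
    Assumption: values between 96% and 97% are treated as genus-level.
    """
    if not 0.0 <= similarity_pct <= 100.0:
        raise ValueError("similarity must be a percentage between 0 and 100")
    if similarity_pct >= 97.0:
        return "species"
    if similarity_pct >= 95.0:
        return "genus"
    # Below the genus threshold no rank is assigned by this rule alone.
    return "unassigned"


if __name__ == "__main__":
    for value in (99.7, 95.5, 94.0):
        print(f"{value:.1f}% -> {rank_from_16s_similarity(value)}")
```

A species-level call under this rule only means the threshold was met; it does not by itself separate closely related species that share nearly identical 16S rRNA sequences.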

Fresh cells from 4 single colonies per SDF
Correspondingly, Bacillus megaterium/aryabhattai are also among the many pairs of distinct AEFB taxa that bear an extremely close evolutionary relationship, sharing 99.7% similarity of 16S rRNA sequences [25].
Currently, MALDI-TOF MS is well established as a fast and reliable technique for identifying microorganism species in clinical laboratories [13]. However, application of this technique in other fields of microbiology, whose reference databases cover only a small portion of the vast range of microbial diversity, has been limited [26,27]. Even so, protein profiling has been found useful in discriminating many closely related Bacillus spp. [28-33].
In this work, we compared MALDI-TOF MS analysis of 64
As for the results discussed above, these discrepancies are most likely due to the insufficient coverage of bacterial species in the databases. Indeed, at the time these analyses were performed, most environmental species studied here were underrepresented, with one or few spectra in the reference library.
With respect to the 31 of 64 SDF strains whose MALDI BioTyper identifications reached scores of >2.000 (species identification), 19 (61.29%) were concordant (Table 3), and SDF0014 and SDF0108 (6.45%) were identified only at the genus level by 16S rRNA gene sequencing (Table 5). Nevertheless, the remaining 10 (32.26%) strains were also identified at the species level by both methods (
Our classification based on 16S rRNA gene sequences is a preliminary determination of genera or species. Thus, when 16S rRNA gene profiling placed these strains within these Bacillus sp. groups, the sample analysed may belong to two or more alternative species within the same affiliation cluster. Therefore, in these instances 16S rRNA gene sequencing can identify these sets of bacteria but, because of its low discrimination capacity, cannot assign them accurately to a particular species. Even so, our assignments are useful because they clearly identify the genera and restrict the identity of SDF strains to one or two possible species in the genera described. Since strains SDF0108 and SDF0139 presented a similarity of 94%, the 16S rRNA gene-sequencing tool failed to classify either even at the genus level (Tables 5 and 1, respectively; highlighted in grey). Although the use of these sequences as a single marker is sometimes not enough to delineate species, low gene sequence similarity may give the first indication that a novel species has been isolated [34]. However, the description of new species is beyond the scope of this study.
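The shares reported above for the 31 strains with MALDI BioTyper scores >2.000 can be re-derived from the raw counts. This is a minimal arithmetic sketch: the counts are taken from the text, while the dictionary labels and the helper name are illustrative assumptions.

```python
def percent(part: int, total: int) -> float:
    """Share of `part` in `total`, as a percentage rounded to two decimals."""
    return round(100.0 * part / total, 2)


TOTAL = 31  # strains with MALDI BioTyper scores > 2.000

# Counts as reported in the text; the labels are illustrative only.
counts = {
    "concordant with 16S rRNA": 19,
    "genus level only by 16S rRNA": 2,
    "remaining, species level by both methods": 10,
}

# The three groups should account for every strain in the subset.
assert sum(counts.values()) == TOTAL

shares = {label: percent(n, TOTAL) for label, n in counts.items()}

if __name__ == "__main__":
    for label, share in shares.items():
        print(f"{label}: {counts[label]}/{TOTAL} = {share:.2f}%")
```

Rounding each share independently to two decimals can leave the total a hundredth above or below 100%, so the check on the raw counts is the more reliable consistency test.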
The results obtained here demonstrate that both techniques used for the identification of SDF strains had good resolution at the genus level. However, 16S rRNA gene sequencing showed a superior capacity to identify these environmental AEFB at the species level when compared with the MALDI-TOF MS method.
Both tools lacked the efficiency to discriminate closely related species. Nevertheless, this initial outline clarified the genetic interrelationships of these environmental strains.
Hence, sequence similarity values and score ranges were complementary to each other and can aid identification, provided that comprehensive, high-quality reference datasets are available.
Considering that in the present study less than 50% of the SDF strains were identified at the species level using MALDI-
This study also supports the need to combine phenotypic and genotypic methods in polyphasic approaches to the taxonomy of AEFB diversity.