The results suggest that the non-coding trnH-psbA intergenic spacer remains the most viable candidate for a single-locus barcode for land plants [8]. In the expanded sampling of loci and taxa, the trnH-psbA spacer continued to address successfully the trade-off between universal application and high sequence divergence. The combination of PCR priming sites within highly conserved flanking coding sequences with a non-coding region that exhibits high sequence divergence among species, as well as diagnostic insertion/deletion mutations, makes the trnH-psbA spacer highly suitable as a plant barcode. A possible limitation is the significant length variation in trnH-psbA caused by insertions, deletions, and simple sequence repeats, as well as the genomic rearrangement of the inverted repeat in some monocots [19]. Non-coding spacers can be difficult to align, which limits their utility in phylogenetic studies at higher taxonomic levels [20]. However, this issue has minimal effect on barcoding, because the primary goal is species identification rather than phylogenetic reconstruction, which does require correct alignments. As demonstrated here for trnH-psbA, GenBank BLASTn searches can find the correct match despite sequence length variation and gaps, and thus tolerate indels in a target barcode sequence. The local alignment algorithm currently used in BLASTn searches should nevertheless be improved by substituting a global alignment algorithm, such as the one used in the Barcode of Life Data System (BOLD) [21], which is more efficient at aligning sequences with significant length variation and therefore more successful at matching them within a database of known sequences. Search algorithms that use indels as characters should then have greater power to discriminate, because they can exclude sequences that do not align and thereby reduce the set of database sequences against which the query is compared [22].
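To make the local-versus-global distinction concrete, the Python sketch below implements textbook Needleman-Wunsch global alignment and uses it to pick the closest reference for a query that carries a two-base deletion in a poly-T run, the kind of simple-sequence-repeat length variation described above for trnH-psbA. It is not the algorithm used by BOLD or BLASTn; the scoring values (match +1, mismatch -1, linear gap -2), the percent-identity definition, and the toy sequences and names (`species_A`, `species_B`) are assumptions chosen for illustration.

```python
from typing import Dict, Tuple


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> Tuple[int, str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty (assumed scores).

    Returns (score, aligned_a, aligned_b), with gaps written as '-'.
    """
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell so both sequences align end to end.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(x: str, y: str) -> float:
    """Identical, ungapped columns as a percentage of the alignment length."""
    same = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * same / len(x)


def best_match(query: str, references: Dict[str, str]) -> Tuple[str, float]:
    """Return the reference name with the highest global-alignment identity."""
    best_name, best_pid = "", -1.0
    for name, ref in references.items():
        _, q_aln, r_aln = global_align(query, ref)
        pid = percent_identity(q_aln, r_aln)
        if pid > best_pid:
            best_name, best_pid = name, pid
    return best_name, best_pid


# Hypothetical toy references; not real trnH-psbA sequences.
refs = {
    "species_A": "ACGTACGTTTTTTGCA",
    "species_B": "ACGAACCTTTTTTGGA",
}
query = "ACGTACGTTTTGCA"  # species_A with two T's missing from the poly-T run
print(best_match(query, refs))
```

Because the alignment is forced to span both sequences end to end, the two missing bases in the query are scored as gaps rather than truncating the match, so the query still resolves to `species_A` with 87.5% identity (14 identical columns out of 16). A production search would also need affine gap penalties and an indexed database, both omitted here.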
The trnH-psbA spacer is the most promising single locus for a land plant barcode according to the criteria of universal application and high sequence divergence among species. The intent of the present study was to use these criteria to compare the trnH-psbA spacer with other suggested barcode loci across land plants. Several of the plastid genes (matK, rbcL, rpoB2, and rpoC1), as well as the nuclear ITS region, exhibit features that would make each a possible candidate for a plant barcode (Table 1). However, each of these loci also has one or more significant flaws that make it less suitable: low PCR amplification success, low levels of sequence divergence, limited utility in non-angiosperms, and/or absence from some land plant lineages. For example, rpoB2 had a high mean sequence divergence value (2.05%) but poor PCR success in non-angiosperms (it failed in all tested gymnosperms and ferns and in all but one moss), whereas rpoC1 had better PCR success (83.3%) than rpoB2 but lower mean sequence divergence (1.38%). The locus matK, which has been shown to be quite variable in numerous phylogenetic studies [20], [23], had the lowest amplification success (39.3%) of all loci tested in this study. Further development of primers for matK and the other loci may improve amplification success, but none of these genes has highly conserved sites near its most variable regions, so sufficiently universal primers are unlikely to be developed. Interestingly, rbcL-a in some cases proved better as a barcode than the other coding loci. Although rbcL-a ranked only sixth in mean percent sequence divergence, it exceeded all other loci except ITS1 and trnH-psbA in the percentage of genera in which species pairs could be differentiated (69.8%). PCR success for rbcL-a was also very high (92.7%).
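The three statistics used above to compare loci (PCR amplification success, mean percent sequence divergence, and the percentage of genera whose species pairs could be differentiated) can be computed from one congeneric species pair per genus. The Python sketch below is a minimal illustration under stated assumptions: sequences are already aligned, `None` marks a failed PCR, divergence is an uncorrected p-distance over ungapped columns, a pair counts as differentiated if its aligned sequences differ at any position (substitution or indel), and the differentiated percentage is taken over genera in which both species amplified. The function names and toy data are hypothetical, not the study's pipeline.

```python
from statistics import mean
from typing import Dict, Optional, Tuple

# One aligned sequence per species; None marks a failed PCR (assumption).
Pair = Tuple[Optional[str], Optional[str]]


def p_distance(x: str, y: str) -> float:
    """Uncorrected p-distance (%) over columns where neither sequence has a gap."""
    cols = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    if not cols:
        return 0.0
    return 100.0 * sum(p != q for p, q in cols) / len(cols)


def summarize_locus(pairs: Dict[str, Pair]) -> Dict[str, float]:
    """Summarize one locus over congeneric species pairs keyed by genus."""
    samples = [s for pair in pairs.values() for s in pair]
    n_amplified = sum(s is not None for s in samples)
    # Only genera in which both species amplified can be compared.
    complete = [(a, b) for a, b in pairs.values() if a is not None and b is not None]
    divergences = [p_distance(a, b) for a, b in complete]
    # Any substitution or indel counts as differentiating the pair (assumption).
    n_differentiated = sum(a != b for a, b in complete)
    return {
        "pcr_success_pct": 100.0 * n_amplified / len(samples),
        "mean_divergence_pct": mean(divergences) if divergences else 0.0,
        "genera_differentiated_pct": (
            100.0 * n_differentiated / len(complete) if complete else 0.0
        ),
    }


# Toy example (hypothetical genera and sequences).
toy = {
    "GenusA": ("ACGTACGT", "ACGTACGA"),  # one substitution: 12.5% divergence
    "GenusB": ("ACGTACGT", "ACGTACGT"),  # invariant pair
    "GenusC": ("ACGT-CGT", None),        # second species failed to amplify
}
print(summarize_locus(toy))
```

Keeping failed amplifications in the sample count but out of the divergence calculation mirrors how a locus can score well on divergence while still performing poorly on universality, as rpoB2 did here.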
ITS1, which was earlier suggested as a possible barcode for flowering plants [8], proved less favorable in this study because of low primer success across land plants (60.4%). In addition, because of its multicopy nature, ITS exhibits high levels of within-species and even within-individual sequence differentiation [24], further reducing its value as a barcode. Three of the tested genes have been shown to be absent from some major groups of land plants (accD from grasses, ndhJ from pines, and ycf5 from bryophytes [25]), which disqualifies them as widely applicable plant barcodes.
Six of the 48 genera in our sample (Citrus, Encephalartos, Ludisia, Magnolia, Raphanus, and Sabal) were invariant at all nine loci in the species pairs tested. Some of these genera belong to families known to show low levels of interspecific sequence divergence (e.g., Arecaceae [26], Cycadaceae [27]) and were included in this study for that reason. There are several possible explanations for the lack of sequence variation: exceptionally low rates of sequence evolution in these taxa, taxonomic misidentification, or experimental error. If these six genera reflect genuinely low rates of sequence divergence, then effective barcoding of such taxa will be difficult no matter which locus is selected. If instead the lack of variation is due to taxonomic misidentification (i.e., the supposedly different species of a pair are actually the same species) or to experimental error (e.g., faulty sequencing techniques), a significant increase in identification success should be possible in the future.
Despite the promise of trnH-psbA as a candidate land plant barcode, the results reported here suggest that a single locus may not differentiate more than 80% of plant species. If discriminatory power greater than 80% is required, then two or more loci will be needed for maximal species identification in land plants. We focused on a two-locus rather than a three-or-more-locus approach because it is the most expedient system, requiring less cost and effort while achieving the desired results. Indeed, in the present study three-locus systems showed little or no gain over two-locus systems in the proportion of species pairs that could be differentiated.
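The comparison of one-, two-, and three-locus systems above amounts to a set-union calculation: a combination of loci differentiates a species pair if at least one of its loci does. The hypothetical Python sketch below searches all k-locus combinations for the one that differentiates the most genera. The locus-to-genera table is invented for illustration and does not reproduce the study's data; it is built so that adding a third locus brings no gain over the best pair.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def best_combination(resolved: Dict[str, Set[str]], n_genera: int,
                     k: int) -> Tuple[Tuple[str, ...], float]:
    """Return the k-locus combination that differentiates the most genera.

    A genus counts as differentiated by a combination if at least one locus
    in the combination distinguishes its species pair. Ties are broken by the
    alphabetical order of locus names.
    """
    best: Tuple[Tuple[str, ...], float] = ((), -1.0)
    for combo in combinations(sorted(resolved), k):
        covered = set().union(*(resolved[locus] for locus in combo))
        pct = 100.0 * len(covered) / n_genera
        if pct > best[1]:
            best = (combo, pct)
    return best


# Hypothetical table: genera (G1..G10) whose species pair each locus differentiates.
resolved = {
    "trnH-psbA": {"G1", "G2", "G3", "G4", "G5", "G6", "G7"},
    "rbcL-a": {"G1", "G2", "G8"},
    "rpoC1": {"G1", "G3", "G8"},
    "matK": {"G2"},
}
for k in (1, 2, 3):
    print(k, best_combination(resolved, n_genera=10, k=k))
```

With these toy inputs the best single locus is trnH-psbA (70%), the best pair is rbcL-a plus trnH-psbA (80%), and no three-locus set exceeds 80%, because genera G9 and G10 are differentiated by none of the loci, much like the invariant genera discussed above.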
A two-locus combinatorial method has been suggested previously [7]–[8], [28] but had not been satisfactorily tested. Both the new test sequences generated across land plants (Table 4) and the data mined from GenBank (Table 3) demonstrate the utility of this approach. The chosen loci should complement each other, both in the lineages within which each can discriminate and in balancing type I (incorrect species assignment) and type II (falsely rejecting a correct assignment) errors. Combining the non-coding trnH-psbA spacer with one of three coding regions, rbcL-a, rpoB2, or rpoC1, promises the highest universality and the greatest ability to differentiate species pairs in our sample. Pairing a rapidly evolving locus such as the trnH-psbA spacer with a more conservative locus such as the coding rbcL gene can minimize both type I errors (the conservative locus robustly assigns sequences at least to the correct genus) and type II errors (the higher sequence divergence of the variable locus discriminates among closely allied species in highly speciose genera). Thus rbcL, with its proven ease of amplification using broadly applicable primers across land plants and its proven ability to identify taxa at the genus and family levels, is the most appropriate choice to pair with trnH-psbA in a two-locus barcode.
The balance of within- and between-species sequence variation is an important aspect of barcode identification [1]–[2], [29] and should be taken into account when developing a barcode for any group of organisms. Because the present study did not include multiple samples per species, the level of intraspecific sequence variation for each locus could not be ascertained; such trials are now underway. However, prior reports demonstrate that both rbcL [30] and trnH-psbA [28] show significantly lower levels of genetic divergence within species than between species.
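One common way to check the balance of within- and between-species variation is to compare, for each species, the largest intraspecific distance with the smallest distance to any other species; a positive difference indicates a "barcode gap". The Python sketch below illustrates that comparison on pre-aligned, equal-length toy sequences using uncorrected p-distances. It is an illustration under those assumptions, not the analysis used in [28] or [30], and the species labels and sequences are hypothetical.

```python
from itertools import combinations
from typing import Dict, List


def p_dist(x: str, y: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


def barcode_gaps(samples: Dict[str, List[str]]) -> Dict[str, float]:
    """Per species: min interspecific distance minus max intraspecific distance.

    A positive value means the largest within-species distance is smaller than
    the smallest between-species distance. Requires at least two species.
    """
    gaps: Dict[str, float] = {}
    for species, seqs in samples.items():
        intra = [p_dist(a, b) for a, b in combinations(seqs, 2)]
        inter = [
            p_dist(a, b)
            for other, other_seqs in samples.items()
            if other != species
            for a in seqs
            for b in other_seqs
        ]
        gaps[species] = min(inter) - (max(intra) if intra else 0.0)
    return gaps


# Hypothetical samples: two individuals per species.
toy = {
    "sp1": ["AAAAAAAAAA", "AAAAAAAAAT"],
    "sp2": ["CCAAAAAAAA", "CCAAAAAAAT"],
}
print(barcode_gaps(toy))
```

Here each species varies at one site in ten internally but differs from the other species at two or more sites, so both gaps are positive (0.1). Sampling several individuals per species, as in the trials now underway, is what makes the intraspecific term measurable at all.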
In conclusion, we recommend a two-locus barcode that combines a portion of the coding rbcL gene (rbcL-a) with the non-coding trnH-psbA spacer. The rbcL-a region provides a strong recognition anchor that will place an unidentified specimen into a family, a genus, and sometimes a species; the highly variable trnH-psbA spacer will then narrow the identification to the correct species where rbcL-a lacks discriminating power, especially in species-rich genera of angiosperms. Standard primers are currently available for both loci, making them universally amplifiable with the least effort across the broadest range of land plants. This two-locus plant barcode is now being applied to build a library of over 700 species of the world's most important medicinal plants [31; Kress and Erickson, unpubl.]. This barcode library can then be used to test the identity and purity of plant-based medicines and herbal products, such as ginseng, ginkgo, echinacea, and St. John's wort, sold in commercial markets and used by consumers. The results of this effort will add to the range of DNA barcode applications with substantial economic and social value.
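The division of labor described above, with rbcL-a as a recognition anchor and trnH-psbA resolving what rbcL-a cannot, can be sketched as a two-tier lookup. The Python example below is a hypothetical illustration rather than a published identification protocol: it assumes pre-aligned, equal-length toy sequences scored by simple mismatch counts, whereas real trnH-psbA sequences vary in length and would first need alignment. The `Reference` record, the genus and species labels, and the sequences are all invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reference:
    species: str
    genus: str
    rbcl_a: str     # toy rbcL-a sequence, pre-aligned (assumption)
    trnh_psba: str  # toy trnH-psbA sequence, pre-aligned (assumption)


def mismatches(x: str, y: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    return sum(a != b for a, b in zip(x, y))


def identify(q_rbcl: str, q_psba: str, refs: List[Reference]) -> str:
    # Tier 1: rbcL-a anchors the query to its closest reference record(s).
    best = min(mismatches(q_rbcl, r.rbcl_a) for r in refs)
    anchored = [r for r in refs if mismatches(q_rbcl, r.rbcl_a) == best]
    if len({r.species for r in anchored}) == 1:
        return anchored[0].species
    # Tier 2: trnH-psbA discriminates among species that rbcL-a could not separate.
    best2 = min(mismatches(q_psba, r.trnh_psba) for r in anchored)
    hits = sorted({r.species for r in anchored
                   if mismatches(q_psba, r.trnh_psba) == best2})
    if len(hits) == 1:
        return hits[0]
    genera = "/".join(sorted({r.genus for r in anchored}))
    return "unresolved within " + genera + ": " + ", ".join(hits)


refs = [
    Reference("Genus_A sp. 1", "Genus_A", "ACGTACGTAA", "TTAGCATTAG"),
    Reference("Genus_A sp. 2", "Genus_A", "ACGTACGTAA", "TTAGGATTAC"),
    Reference("Genus_B sp. 1", "Genus_B", "TCGAACGTTA", "AAAGCTTTAG"),
]
print(identify("ACGTACGTAA", "TTAGGATTAC", refs))  # rbcL-a ties; trnH-psbA decides
print(identify("TCGAACGTTA", "AAAGCTTTAG", refs))  # rbcL-a alone suffices
```

Restricting the second tier to the records that tie at the first is one simple way to let the conservative locus guard against type I errors while the variable locus addresses type II errors; a real system would use distance thresholds and handle queries for which one locus failed to amplify.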