Phylogenetic placement and contamination screening of Amoebozoa genomic data from the Protist 10,000 Genomes (P10K) Database


Abstract

Background: Genomic data are essential for uncovering the evolutionary history, ecological roles, and diversity of life. Yet microbial eukaryotes such as Amoebozoa, an ancient and morphologically diverse lineage, remain critically underrepresented in genomic repositories, which has limited our ability to address fundamental questions in eukaryotic evolution. The Protist 10,000 Genomes (P10K) initiative seeks to fill this gap by generating and compiling genome- and transcriptome-level data for a wide range of microbial eukaryotes. To ensure the reliability of these resources, accurate taxonomic identification and contamination screening are vital. In this study, we aimed to assess the taxonomic consistency and integrity of the P10K database with a phylogeny-based approach, using Amoebozoa as a case study.

Results: Through SSU rDNA/rRNA and COI phylogenetic reconstructions, we confirmed several initial taxonomic identifications provided in the P10K database, resolved ambiguities at higher taxonomic levels, and corrected misassignments among morphologically similar but phylogenetically distant taxa. Moreover, contamination screening using SSU rDNA/rRNA revealed several amoebozoan datasets containing sequences from other eukaryotic taxa, indicating contaminated genomic assemblies.

Conclusion: Phylogenetic placement coupled with contamination screening enabled us to distinguish the higher-quality Amoebozoa datasets currently available in the P10K database from those requiring decontamination or additional sequencing before downstream use. These findings serve as a reference for the future use of these data and as a guide for further sequencing efforts aimed at expanding the taxonomic diversity of Amoebozoa represented at the genomic level. By applying a phylogenetic survey to the Amoebozoa data, we present a framework that can be extended to other microbial eukaryote lineages. Addressing imprecise taxonomic identifications and contamination in certain P10K datasets, as well as data reproducibility, will further enhance the value of this unprecedented genomic resource for protists, with significant potential to illuminate the evolution and diversification of eukaryotic life.
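
As a rough illustration of the screening logic summarized above, the following minimal Python sketch flags a dataset when any of its SSU sequences is placed outside the expected lineage. It is not the authors' pipeline. The input format, the function name `screen_datasets`, the sample identifiers, and the "any foreign sequence flags the dataset" rule are all assumptions made for illustration.

```python
from collections import defaultdict

TARGET_CLADE = "Amoebozoa"  # lineage each dataset is expected to belong to


def screen_datasets(placements, target=TARGET_CLADE):
    """Group SSU placements by dataset and flag sequences outside the target.

    placements: iterable of (dataset_id, sequence_id, placed_clade) tuples.
    This is a hypothetical summary of where each SSU sequence fell in a
    reference tree; it is not a format defined by P10K.
    """
    foreign = defaultdict(list)
    seen = set()
    for dataset_id, sequence_id, placed_clade in placements:
        seen.add(dataset_id)
        if placed_clade != target:
            foreign[dataset_id].append((sequence_id, placed_clade))
    # Assumed rule: one foreign SSU sequence is enough to flag a dataset.
    return {
        dataset_id: {
            "status": "flagged" if foreign[dataset_id] else "clean",
            "foreign_sequences": foreign[dataset_id],
        }
        for dataset_id in sorted(seen)
    }


# Hypothetical placements for two made-up datasets.
example_placements = [
    ("sample_A", "ssu_1", "Amoebozoa"),
    ("sample_B", "ssu_1", "Amoebozoa"),
    ("sample_B", "ssu_2", "Fungi"),
]
report = screen_datasets(example_placements)
```

In this sketch a flagged dataset is only a candidate for decontamination or resequencing. A real screen would also need to decide how to treat sequences whose placement is poorly supported.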
