Date: 2024-01-16
A recent update has released a comprehensive dataset that combines genomic, transcript, and protein data. The dataset is organized into directories that offer both the complete collection and subsets divided by logical category, so users can fetch only the portion they need. The update is part of an ongoing effort to increase the dataset's detail and usefulness for research and analysis.
NCBI's Eukaryotic Genome Annotation Pipeline
As part of the update, the eukaryotic genome annotation pipeline of the National Center for Biotechnology Information (NCBI) has added annotations for 33 additional species. This expands the dataset and gives researchers a broader base for genomic studies.
Revamping the Naming Convention
Another notable change is a revised naming convention for files within the dataset. Specifically, files previously named *.nonredundant_protein* have been renamed to reflect more accurately that they belong to the prokaryote WP protein dataset. The updated file names follow a structured format, which makes files easier to search for and retrieve.
RefSeq Protein Data and SUCNR1
The RefSeq protein data naming conventions have been revised as well. Particular attention is drawn to the SUCNR1 protein, formerly named G protein-coupled receptor 91 (GPR91). SUCNR1 is a receptor activated by succinate, a metabolite in the citric acid cycle, and has been found to have varying effects on physiological and pathological functions across diverse cell types. Researchers who need a comprehensive set of RefSeq proteins across all taxa are advised to use the *complete.protein.* files, which cover the full range of available protein data.
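The file-name patterns mentioned above can be applied to a directory listing with ordinary shell-style matching. The following is a minimal sketch only: it treats the two patterns as literal globs exactly as they are written in the text, and the file names in `listing` are hypothetical, made up for illustration. Actual release file names may differ (for example, they may include volume numbers), so check them against the real directory listing before relying on either pattern.

```python
from fnmatch import fnmatchcase

# Glob patterns copied literally from the text above (an assumption:
# real release file names may need a different pattern).
LEGACY_WP_PATTERN = "*.nonredundant_protein*"
COMPLETE_PROTEIN_PATTERN = "*complete.protein.*"


def select_files(names, pattern):
    """Return the names matching a shell-style glob, in input order."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical directory listing, for illustration only.
listing = [
    "complete.protein.faa.gz",
    "complete.protein.gpff.gz",
    "complete.nonredundant_protein.faa.gz",
    "complete.genomic.fna.gz",
]

# Files covering the full set of RefSeq proteins across all taxa.
complete_protein = select_files(listing, COMPLETE_PROTEIN_PATTERN)

# Files still carrying the old name, e.g. to spot references that
# existing scripts may need to update after the rename.
legacy_named = select_files(listing, LEGACY_WP_PATTERN)

print(complete_protein)
print(legacy_named)
```

`fnmatchcase` is used rather than `fnmatch` so that matching is case-sensitive on every platform, which keeps the selection reproducible across operating systems.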