An algorithm for annotation and classification of T. cruzi MASP sequences: towards a better understanding of the parasite genetic variability
Background: Trypanosoma cruzi, the protozoan causing Chagas disease, is responsible for a neglected tropical disease affecting millions in Latin America. Its genome contains rapidly evolving multigene families, such as mucins (TcMUC), trans-sialidases (TS), and mucin-associated surface proteins (MAS...
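| Main author: | Cepeda Dean, Aldana Alexandra |
|---|---|
| Other authors: | Berná, Luisa; Robello Porto, Carlos; Buscaglia, Carlos A.; Balouz, Virginia |
| Format: | article |
| Language: | English |
| Published: | 2025 |
| Subjects: | Trypanosoma cruzi; Mucin-associated surface proteins; Hidden Markov Models; Molecular signatures; Genome annotation |
| Online access: | https://hdl.handle.net/20.500.12008/53922 |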
| _version_ | 1868890230315024384 |
|---|---|
| author | Cepeda Dean, Aldana Alexandra |
| author2 | Berná, Luisa Robello Porto, Carlos Buscaglia, Carlos A. Balouz, Virginia |
| author2_role | author author author author |
| author_browse | Balouz, Virginia Berná, Luisa Buscaglia, Carlos A. Cepeda Dean, Aldana Alexandra Robello Porto, Carlos |
| author_facet | Cepeda Dean, Aldana Alexandra Berná, Luisa Robello Porto, Carlos Buscaglia, Carlos A. Balouz, Virginia |
| author_role | author |
| collection | COLIBRI |
| dc.contributor.none.fl_str_mv | Cepeda Dean Aldana Alexandra Berná Luisa, Universidad de la República (Uruguay). Facultad de Ciencias. Instituto de Biología. Robello Porto Carlos, Universidad de la República (Uruguay). Facultad de Medicina. Buscaglia Carlos A. Balouz Virginia |
| dc.creator.none.fl_str_mv | Cepeda Dean, Aldana Alexandra Berná, Luisa Robello Porto, Carlos Buscaglia, Carlos A. Balouz, Virginia |
| dc.date.none.fl_str_mv | 2025 2026-03-18T14:16:33Z 2026-03-18T14:16:33Z |
| dc.format.none.fl_str_mv | 17 h application/pdf |
| dc.identifier.none.fl_str_mv | Cepeda Dean, A, Berná, L, Robello Porto, C [y otros autores]. "An algorithm for annotation and classification of T. cruzi MASP sequences: towards a better understanding of the parasite genetic variability". BMC Genomics. [en línea] 2025, 26: 194. 17 h. DOI: 10.1186/s12864-025-11384-5 1471-2164 https://hdl.handle.net/20.500.12008/53922 10.1186/s12864-025-11384-5 |
| dc.language.none.fl_str_mv | en eng |
| dc.publisher.none.fl_str_mv | Springer |
| dc.relation.none.fl_str_mv | BMC Genomics, 2025, 26: 194 |
| dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess Licencia Creative Commons Atribución - No Comercial - Sin Derivadas (CC - By-NC-ND 4.0) |
| dc.source.none.fl_str_mv | reponame:COLIBRI instname:Universidad de la República instacron:Universidad de la República |
| dc.subject.none.fl_str_mv | Trypanosoma cruzi Mucin-associated surface proteins Hidden Markov Models Molecular signatures Genome annotation |
| dc.title.none.fl_str_mv | An algorithm for annotation and classification of T. cruzi MASP sequences: towards a better understanding of the parasite genetic variability |
| dc.type.none.fl_str_mv | Artículo info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion |
| description | Background: Trypanosoma cruzi, the protozoan causing Chagas disease, is responsible for a neglected tropical disease affecting millions in Latin America. Its genome contains rapidly evolving multigene families, such as mucins (TcMUC), trans-sialidases (TS), and mucin-associated surface proteins (MASP), which are essential for parasite transmission and disease mechanisms. However, methodological challenges in genome assembly and annotation have limited the characterization of these gene families, particularly MASPs. Results: We developed a bioinformatic pipeline for the automatic identification, characterization, and annotation of MASPs directly from T. cruzi genome assemblies. This algorithm, based on a manually curated MASP database and HMM-based identification of MASP diagnostic motifs, enables the robust classification of these molecules into canonical MASPs, MASP-related molecules (mostly pseudogenes), and chimeric sequences combining MASPs and TcMUC/TS genes. Validation against a rigorously annotated dataset demonstrated high accuracy, and allowed us to reclassify misannotated sequences and, more crucially, to accurately identify previously unrecognized canonical MASPs and MASP chimeras. This algorithm was then used to explore the MASP repertoire in the genomes of 13 parasite strains from different evolutionary lineages, revealing patterns of diversity. For instance, TcI and TcII strains exhibited higher ratios of canonical MASP/MASP-related molecules and a greater abundance of MASP chimeras, suggesting that their genomes are under strong selective pressures towards maintaining a broader panel of full-length MASP genes at the expense of pseudogenes. On the contrary, structural features of canonical MASPs, MASP-related sequences, and MASP-chimeras were largely conserved across parasite genomes. Conclusions: This novel pipeline automates the annotation of MASPs, a key surface protein family unique to T. cruzi, improving genome annotation and enabling robust comparative analyses. It provides an essential tool for exploring the evolutionary dynamics of multigene families in T. cruzi and could be extended to other gene families. |
| eu_rights_str_mv | openAccess |
| format | article |
| id | anni_f18dea9c23a4e84456728ea121bfa7b4 |
| identifier_str_mv | Cepeda Dean, A, Berná, L, Robello Porto, C [y otros autores]. "An algorithm for annotation and classification of T. cruzi MASP sequences: towards a better understanding of the parasite genetic variability". BMC Genomics. [en línea] 2025, 26: 194. 17 h. DOI: 10.1186/s12864-025-11384-5 1471-2164 10.1186/s12864-025-11384-5 |
| instacron_str | Universidad de la República |
| institution | Universidad de la República |
| instname_str | Universidad de la República |
| language | eng |
| language_invalid_str_mv | en |
| network_acronym_str | anni |
| network_name_str | oai-lr-anni |
| oai_identifier_str | oai:colibri.udelar.edu.uy:20.500.12008/53922 |
| publishDate | 2025 |
| publishDateSort | 2025 |
| publisher.none.fl_str_mv | Springer |
| reponame_str | COLIBRI |
| repository.mail.fl_str_mv | |
| repository.name.fl_str_mv | |
| repository_id_str | |
| rights_invalid_str_mv | Licencia Creative Commons Atribución - No Comercial - Sin Derivadas (CC - By-NC-ND 4.0) |
| spelling | An algorithm for annotation and classification of T. cruzi MASP sequences: towards a better understanding of the parasite genetic variability Cepeda Dean, Aldana Alexandra Berná, Luisa Robello Porto, Carlos Buscaglia, Carlos A. Balouz, Virginia Trypanosoma cruzi Mucin-associated surface proteins Hidden Markov Models Molecular signatures Genome annotation Background: Trypanosoma cruzi, the protozoan causing Chagas disease, is responsible for a neglected tropical disease affecting millions in Latin America. Its genome contains rapidly evolving multigene families, such as mucins (TcMUC), trans-sialidases (TS), and mucin-associated surface proteins (MASP), which are essential for parasite transmission and disease mechanisms. However, methodological challenges in genome assembly and annotation have limited the characterization of these gene families, particularly MASPs. Results: We developed a bioinformatic pipeline for the automatic identification, characterization, and annotation of MASPs directly from T. cruzi genome assemblies. This algorithm, based on a manually curated MASP database and HMM-based identification of MASP diagnostic motifs, enables the robust classification of these molecules into canonical MASPs, MASP-related molecules (mostly pseudogenes), and chimeric sequences combining MASPs and TcMUC/TS genes. Validation against a rigorously annotated dataset demonstrated high accuracy, and allowed us to reclassify misannotated sequences and, more crucially, to accurately identify previously unrecognized canonical MASPs and MASP chimeras. This algorithm was then used to explore the MASP repertoire in the genomes of 13 parasite strains from different evolutionary lineages, revealing patterns of diversity. For instance, TcI and TcII strains exhibited higher ratios of canonical MASP/MASP-related molecules and a greater abundance of MASP chimeras, suggesting that their genomes are under strong selective pressures towards maintaining a broader panel of full-length MASP genes at the expense of pseudogenes. On the contrary, structural features of canonical MASPs, MASP-related sequences, and MASP-chimeras were largely conserved across parasite genomes. Conclusions: This novel pipeline automates the annotation of MASPs, a key surface protein family unique to T. cruzi, improving genome annotation and enabling robust comparative analyses. It provides an essential tool for exploring the evolutionary dynamics of multigene families in T. cruzi and could be extended to other gene families. Springer Cepeda Dean Aldana Alexandra Berná Luisa, Universidad de la República (Uruguay). Facultad de Ciencias. Instituto de Biología. Robello Porto Carlos, Universidad de la República (Uruguay). Facultad de Medicina. Buscaglia Carlos A. Balouz Virginia 2026-03-18T14:16:33Z 2026-03-18T14:16:33Z 2025 Artículo info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion 17 h application/pdf Cepeda Dean, A, Berná, L, Robello Porto, C [y otros autores]. "An algorithm for annotation and classification of T. cruzi MASP sequences: towards a better understanding of the parasite genetic variability". BMC Genomics. [en línea] 2025, 26: 194. 17 h. DOI: 10.1186/s12864-025-11384-5 1471-2164 https://hdl.handle.net/20.500.12008/53922 10.1186/s12864-025-11384-5 reponame:COLIBRI instname:Universidad de la República instacron:Universidad de la República en eng BMC Genomics, 2025, 26: 194 Las obras depositadas en el Repositorio se rigen por la Ordenanza de los Derechos de la Propiedad Intelectual de la Universidad de la República.(Res. Nº 91 de C.D.C. de 8/III/1994 – D.O. 7/IV/1994) y por la Ordenanza del Repositorio Abierto de la Universidad de la República (Res. Nº 16 de C.D.C. de 07/10/2014) info:eu-repo/semantics/openAccess Licencia Creative Commons Atribución - No Comercial - Sin Derivadas (CC - By-NC-ND 4.0) oai:colibri.udelar.edu.uy:20.500.12008/53922 2026-04-14T10:10:59Z |
| spellingShingle | An algorithm for annotation and classification of T. cruzi MASP sequences: towards a better understanding of the parasite genetic variability Cepeda Dean, Aldana Alexandra Trypanosoma cruzi Mucin-associated surface proteins Hidden Markov Models Molecular signatures Genome annotation |
| status_str | publishedVersion |
| title | An algorithm for annotation and classification of T. cruzi MASP sequences: towards a better understanding of the parasite genetic variability |
| title_full | An algorithm for annotation and classification of T. cruzi MASP sequences: towards a better understanding of the parasite genetic variability |
| title_fullStr | An algorithm for annotation and classification of T. cruzi MASP sequences: towards a better understanding of the parasite genetic variability |
| title_full_unstemmed | An algorithm for annotation and classification of T. cruzi MASP sequences: towards a better understanding of the parasite genetic variability |
| title_short | An algorithm for annotation and classification of T. cruzi MASP sequences: towards a better understanding of the parasite genetic variability |
| title_sort | An algorithm for annotation and classification of T. cruzi MASP sequences: towards a better understanding of the parasite genetic variability |
| topic | Trypanosoma cruzi Mucin-associated surface proteins Hidden Markov Models Molecular signatures Genome annotation |
| url | https://hdl.handle.net/20.500.12008/53922 |
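
The abstract in this record describes sorting sequences into canonical MASPs, MASP-related molecules, and MASP chimeras based on diagnostic motif hits, and then comparing the ratio of canonical to MASP-related sequences across strains. The sketch below is only an illustration of that kind of three-way classification, not the published pipeline: the motif labels (`masp_n_term`, `masp_c_term`, `tcmuc`, `ts`), the decision rules, and the function names are all hypothetical, since the abstract names neither the actual motifs nor the thresholds. It assumes motif hits have already been computed (for example by an HMM search tool) and collected as a set of labels per sequence.

```python
from collections import Counter

# Hypothetical motif labels: the abstract does not name the real diagnostic motifs.
MASP_MOTIFS = frozenset({"masp_n_term", "masp_c_term"})
OTHER_FAMILY_MOTIFS = frozenset({"tcmuc", "ts"})


def classify(hits):
    """Return a class label for one sequence, given the set of motif labels it hit.

    Assumed toy rules (not taken from the paper):
      - no MASP motif at all              -> "non_masp"
      - MASP motif plus a TcMUC/TS motif  -> "chimera"
      - every MASP motif present          -> "canonical"
      - only some MASP motifs present     -> "masp_related"
    """
    hits = set(hits)
    masp_hits = hits & MASP_MOTIFS
    if not masp_hits:
        return "non_masp"
    if hits & OTHER_FAMILY_MOTIFS:
        return "chimera"
    if masp_hits == MASP_MOTIFS:
        return "canonical"
    return "masp_related"


def canonical_to_related_ratio(labels):
    """Ratio of canonical to MASP-related sequences, or None if there are no related ones."""
    counts = Counter(labels)
    related = counts["masp_related"]
    if related == 0:
        return None
    return counts["canonical"] / related


# Toy input: sequence name -> motif labels found in that sequence (made-up data).
toy_hits = {
    "seq1": {"masp_n_term", "masp_c_term"},
    "seq2": {"masp_n_term"},
    "seq3": {"masp_c_term", "ts"},
    "seq4": {"tcmuc"},
    "seq5": {"masp_n_term", "masp_c_term"},
}

labels = {name: classify(hits) for name, hits in toy_hits.items()}
ratio = canonical_to_related_ratio(labels.values())
```

With the toy data, `seq1` and `seq5` are labelled canonical, `seq2` MASP-related, `seq3` a chimera, and `seq4` is left out of the MASP repertoire, so `ratio` is 2.0. Returning `None` when no MASP-related sequence is present avoids a division by zero and makes the undefined case explicit to the caller.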