An algorithm for annotation and classification of T. cruzi MASP sequences: towards a better understanding of the parasite genetic variability

Background: Trypanosoma cruzi, the protozoan causing Chagas disease, is responsible for a neglected tropical disease affecting millions in Latin America. Its genome contains rapidly evolving multigene families, such as mucins (TcMUC), trans-sialidases (TS), and mucin-associated surface proteins (MAS...

Bibliographic details
Main author: Cepeda Dean, Aldana Alexandra (author)
Other authors: Berná, Luisa (author), Robello Porto, Carlos (author), Buscaglia, Carlos A. (author), Balouz, Virginia (author)
Format: article
Language: English
Published: 2025
Subjects:
Online access: https://hdl.handle.net/20.500.12008/53922
_version_ 1868890230315024384
author Cepeda Dean, Aldana Alexandra
author2 Berná, Luisa
Robello Porto, Carlos
Buscaglia, Carlos A.
Balouz, Virginia
author2_role author
author
author
author
author_browse Balouz, Virginia
Berná, Luisa
Buscaglia, Carlos A.
Cepeda Dean, Aldana Alexandra
Robello Porto, Carlos
author_facet Cepeda Dean, Aldana Alexandra
Berná, Luisa
Robello Porto, Carlos
Buscaglia, Carlos A.
Balouz, Virginia
author_role author
collection COLIBRI
dc.contributor.none.fl_str_mv Cepeda Dean Aldana Alexandra
Berná Luisa, Universidad de la República (Uruguay). Facultad de Ciencias. Instituto de Biología.
Robello Porto Carlos, Universidad de la República (Uruguay). Facultad de Medicina.
Buscaglia Carlos A.
Balouz Virginia
dc.creator.none.fl_str_mv Cepeda Dean, Aldana Alexandra
Berná, Luisa
Robello Porto, Carlos
Buscaglia, Carlos A.
Balouz, Virginia
dc.date.none.fl_str_mv 2025
2026-03-18T14:16:33Z
2026-03-18T14:16:33Z
dc.format.none.fl_str_mv 17 leaves
application/pdf
dc.identifier.none.fl_str_mv Cepeda Dean, A, Berná, L, Robello Porto, C [and other authors]. "An algorithm for annotation and classification of T. cruzi MASP sequences: towards a better understanding of the parasite genetic variability". BMC Genomics. [online] 2025, 26: 194. 17 leaves. DOI: 10.1186/s12864-025-11384-5
1471-2164
https://hdl.handle.net/20.500.12008/53922
10.1186/s12864-025-11384-5
dc.language.none.fl_str_mv en
eng
dc.publisher.none.fl_str_mv Springer
dc.relation.none.fl_str_mv BMC Genomics, 2025, 26: 194
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
Creative Commons Attribution - NonCommercial - NoDerivatives License (CC BY-NC-ND 4.0)
dc.source.none.fl_str_mv reponame:COLIBRI
instname:Universidad de la República
instacron:Universidad de la República
dc.subject.none.fl_str_mv Trypanosoma cruzi
Mucin-associated surface proteins
Hidden Markov Models
Molecular signatures
Genome annotation
dc.title.none.fl_str_mv An algorithm for annotation and classification of T. cruzi MASP sequences: towards a better understanding of the parasite genetic variability
dc.type.none.fl_str_mv Article
info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
description Background: Trypanosoma cruzi, the protozoan causing Chagas disease, is responsible for a neglected tropical disease affecting millions in Latin America. Its genome contains rapidly evolving multigene families, such as mucins (TcMUC), trans-sialidases (TS), and mucin-associated surface proteins (MASP), which are essential for parasite transmission and disease mechanisms. However, methodological challenges in genome assembly and annotation have limited the characterization of these gene families, particularly MASPs. Results: We developed a bioinformatic pipeline for the automatic identification, characterization, and annotation of MASPs directly from T. cruzi genome assemblies. This algorithm, based on a manually curated MASP database and HMM-based identification of MASP diagnostic motifs, enables the robust classification of these molecules into canonical MASPs, MASP-related molecules (mostly pseudogenes), and chimeric sequences combining MASPs and TcMUC/TS genes. Validation against a rigorously annotated dataset demonstrated high accuracy and allowed us to reclassify misannotated sequences and, more crucially, to accurately identify previously unrecognized canonical MASPs and MASP chimeras. This algorithm was then used to explore the MASP repertoire in the genomes of 13 parasite strains from different evolutionary lineages, revealing patterns of diversity. For instance, TcI and TcII strains exhibited higher ratios of canonical MASPs to MASP-related molecules and a greater abundance of MASP chimeras, suggesting that their genomes are under strong selective pressures towards maintaining a broader panel of full-length MASP genes at the expense of pseudogenes. In contrast, structural features of canonical MASPs, MASP-related sequences, and MASP chimeras were largely conserved across parasite genomes. Conclusions: This novel pipeline automates the annotation of MASPs, a key surface protein family unique to T. cruzi, improving genome annotation and enabling robust comparative analyses. It provides an essential tool for exploring the evolutionary dynamics of multigene families in T. cruzi and could be extended to other gene families.
eu_rights_str_mv openAccess
format article
id anni_f18dea9c23a4e84456728ea121bfa7b4
identifier_str_mv Cepeda Dean, A, Berná, L, Robello Porto, C [and other authors]. "An algorithm for annotation and classification of T. cruzi MASP sequences: towards a better understanding of the parasite genetic variability". BMC Genomics. [online] 2025, 26: 194. 17 leaves. DOI: 10.1186/s12864-025-11384-5
1471-2164
10.1186/s12864-025-11384-5
instacron_str Universidad de la República
institution Universidad de la República
instname_str Universidad de la República
language eng
language_invalid_str_mv en
network_acronym_str anni
network_name_str oai-lr-anni
oai_identifier_str oai:colibri.udelar.edu.uy:20.500.12008/53922
publishDate 2025
publishDateSort 2025
publisher.none.fl_str_mv Springer
reponame_str COLIBRI
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv Creative Commons Attribution - NonCommercial - NoDerivatives License (CC BY-NC-ND 4.0)
spelling Works deposited in the Repository are governed by the Ordinance on Intellectual Property Rights of the Universidad de la República (Res. No. 91 of the C.D.C., 8/III/1994; D.O. 7/IV/1994) and by the Ordinance of the Open Repository of the Universidad de la República (Res. No. 16 of the C.D.C., 07/10/2014). oai:colibri.udelar.edu.uy:20.500.12008/53922 2026-04-14T10:10:59Z
spellingShingle An algorithm for annotation and classification of T. cruzi MASP sequences: towards a better understanding of the parasite genetic variability
Cepeda Dean, Aldana Alexandra
Trypanosoma cruzi
Mucin-associated surface proteins
Hidden Markov Models
Molecular signatures
Genome annotation
status_str publishedVersion
title An algorithm for annotation and classification of T. cruzi MASP sequences: towards a better understanding of the parasite genetic variability
title_full An algorithm for annotation and classification of T. cruzi MASP sequences: towards a better understanding of the parasite genetic variability
title_fullStr An algorithm for annotation and classification of T. cruzi MASP sequences: towards a better understanding of the parasite genetic variability
title_full_unstemmed An algorithm for annotation and classification of T. cruzi MASP sequences: towards a better understanding of the parasite genetic variability
title_short An algorithm for annotation and classification of T. cruzi MASP sequences: towards a better understanding of the parasite genetic variability
title_sort An algorithm for annotation and classification of T. cruzi MASP sequences: towards a better understanding of the parasite genetic variability
topic Trypanosoma cruzi
Mucin-associated surface proteins
Hidden Markov Models
Molecular signatures
Genome annotation
url https://hdl.handle.net/20.500.12008/53922