Gene function prediction in five model eukaryotes exclusively based on gene relative location through machine learning
The function of most genes is unknown. The best results in automated function prediction are obtained with machine learning-based methods that combine multiple data sources, typically sequence-derived features, protein structure and interaction data. Even though there is ample evidence showing that...
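| Main Author: | Pazos Obregón, Flavio |
|---|---|
| Other Authors: | Silvera, Diego; Cantera, Rafael; Yankilevich, Patricio; Guerberoff, Gustavo; Soto, Pablo |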
| Format: | article |
| Language: | English |
| Published: | 2022 |
| Subjects: | Bioinformatics; Comparative genomics; Machine learning; Protein function predictions |
| Online Access: | https://hdl.handle.net/20.500.12008/39141 |
| _version_ | 1868890219101552640 |
|---|---|
| author | Pazos Obregón, Flavio |
| author2 | Silvera, Diego; Cantera, Rafael; Yankilevich, Patricio; Guerberoff, Gustavo; Soto, Pablo |
| author2_role | author author author author author |
| author_browse | Cantera, Rafael; Guerberoff, Gustavo; Pazos Obregón, Flavio; Silvera, Diego; Soto, Pablo; Yankilevich, Patricio |
| author_facet | Pazos Obregón, Flavio; Silvera, Diego; Cantera, Rafael; Yankilevich, Patricio; Guerberoff, Gustavo; Soto, Pablo |
| author_role | author |
| collection | COLIBRI |
| dc.contributor.none.fl_str_mv | Pazos Obregón Flavio, Universidad de la República (Uruguay). Facultad de Ciencias. Instituto de Biología.; Silvera Diego, IIBCE; Cantera Rafael, IIBCE; Yankilevich Patricio; Guerberoff Gustavo, Universidad de la República (Uruguay). Facultad de Ingeniería.; Soto Pablo, IIBCE |
| dc.creator.none.fl_str_mv | Pazos Obregón, Flavio; Silvera, Diego; Cantera, Rafael; Yankilevich, Patricio; Guerberoff, Gustavo; Soto, Pablo |
| dc.date.none.fl_str_mv | 2022; 2023-08-10T12:24:40Z; 2023-08-10T12:24:40Z |
| dc.format.none.fl_str_mv | 11 leaves; application/pdf |
| dc.identifier.none.fl_str_mv | Pazos Obregón, F, Silvera, D, Cantera, R, [and other authors]. "Gene function prediction in five model eukaryotes exclusively based on gene relative location through machine learning". Scientific Reports. [online] 2022, 12: 11655. 11 leaves. DOI: 10.1038/s41598-022-15329-w; 2045-2322; https://hdl.handle.net/20.500.12008/39141; 10.1038/s41598-022-15329-w |
| dc.language.none.fl_str_mv | en_US; eng |
| dc.publisher.none.fl_str_mv | Springer Nature |
| dc.relation.none.fl_str_mv | Scientific Reports, 2022, 12: 11655. |
| dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess; Creative Commons Attribution License (CC - By 4.0) |
| dc.source.none.fl_str_mv | reponame:COLIBRI; instname:Universidad de la República; instacron:Universidad de la República |
| dc.subject.none.fl_str_mv | Bioinformatics; Comparative genomics; Machine learning; Protein function predictions |
| dc.title.none.fl_str_mv | Gene function prediction in five model eukaryotes exclusively based on gene relative location through machine learning |
| dc.type.none.fl_str_mv | Article; info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion |
| description | The function of most genes is unknown. The best results in automated function prediction are obtained with machine learning-based methods that combine multiple data sources, typically sequence-derived features, protein structure and interaction data. Even though there is ample evidence showing that a gene's function is not independent of its location, the few available examples of gene function prediction based on gene location rely on sequence identity between genes of different organisms and are thus subject to the limitations of the relationship between sequence and function. Here we predict thousands of gene functions in five model eukaryotes (Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Mus musculus and Homo sapiens) using machine learning models exclusively trained with features derived from the location of genes in the genomes to which they belong. Our aim was not to obtain the best-performing method for automated function prediction but to explore the extent to which a gene's location can predict its function in eukaryotes. We found that our models outperform BLAST when predicting terms from the Biological Process and Cellular Component ontologies, showing that, at least in some cases, gene location alone can be more useful than sequence to infer gene function. |
| eu_rights_str_mv | openAccess |
| format | article |
| id | anni_e6010287c0fbe0fa11fc718a10a6ff69 |
| identifier_str_mv | Pazos Obregón, F, Silvera, D, Cantera, R, [and other authors]. "Gene function prediction in five model eukaryotes exclusively based on gene relative location through machine learning". Scientific Reports. [online] 2022, 12: 11655. 11 leaves. DOI: 10.1038/s41598-022-15329-w; 2045-2322; 10.1038/s41598-022-15329-w |
| instacron_str | Universidad de la República |
| institution | Universidad de la República |
| instname_str | Universidad de la República |
| language | eng |
| language_invalid_str_mv | en_US |
| network_acronym_str | anni |
| network_name_str | oai-lr-anni |
| oai_identifier_str | oai:colibri.udelar.edu.uy:20.500.12008/39141 |
| publishDate | 2022 |
| publishDateSort | 2022 |
| publisher.none.fl_str_mv | Springer Nature |
| reponame_str | COLIBRI |
| rights_invalid_str_mv | Creative Commons Attribution License (CC - By 4.0) |
| spelling | Gene function prediction in five model eukaryotes exclusively based on gene relative location through machine learning; Pazos Obregón, Flavio; Silvera, Diego; Cantera, Rafael; Yankilevich, Patricio; Guerberoff, Gustavo; Soto, Pablo; Bioinformatics; Comparative genomics; Machine learning; Protein function predictions; The function of most genes is unknown. The best results in automated function prediction are obtained with machine learning-based methods that combine multiple data sources, typically sequence-derived features, protein structure and interaction data. Even though there is ample evidence showing that a gene's function is not independent of its location, the few available examples of gene function prediction based on gene location rely on sequence identity between genes of different organisms and are thus subject to the limitations of the relationship between sequence and function. Here we predict thousands of gene functions in five model eukaryotes (Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Mus musculus and Homo sapiens) using machine learning models exclusively trained with features derived from the location of genes in the genomes to which they belong. Our aim was not to obtain the best-performing method for automated function prediction but to explore the extent to which a gene's location can predict its function in eukaryotes. We found that our models outperform BLAST when predicting terms from the Biological Process and Cellular Component ontologies, showing that, at least in some cases, gene location alone can be more useful than sequence to infer gene function.; ANII: FSDA_1_2017_1_14242; Springer Nature; Pazos Obregón Flavio, Universidad de la República (Uruguay). Facultad de Ciencias. Instituto de Biología.; Silvera Diego, IIBCE; Cantera Rafael, IIBCE; Yankilevich Patricio; Guerberoff Gustavo, Universidad de la República (Uruguay). Facultad de Ingeniería.; Soto Pablo, IIBCE; 2023-08-10T12:24:40Z; 2023-08-10T12:24:40Z; 2022; Article; info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion; 11 leaves; application/pdf; Pazos Obregón, F, Silvera, D, Cantera, R, [and other authors]. "Gene function prediction in five model eukaryotes exclusively based on gene relative location through machine learning". Scientific Reports. [online] 2022, 12: 11655. 11 leaves. DOI: 10.1038/s41598-022-15329-w; 2045-2322; https://hdl.handle.net/20.500.12008/39141; 10.1038/s41598-022-15329-w; reponame:COLIBRI; instname:Universidad de la República; instacron:Universidad de la República; en_US; eng; Scientific Reports, 2022, 12: 11655.; Works deposited in the Repository are governed by the Intellectual Property Rights Ordinance of the Universidad de la República (Res. No. 91 of the C.D.C. of 8/III/1994, D.O. 7/IV/1994) and by the Open Repository Ordinance of the Universidad de la República (Res. No. 16 of the C.D.C. of 07/10/2014); info:eu-repo/semantics/openAccess; Creative Commons Attribution License (CC - By 4.0); oai:colibri.udelar.edu.uy:20.500.12008/39141; 2026-04-14T10:10:12Z |
| spellingShingle | Gene function prediction in five model eukaryotes exclusively based on gene relative location through machine learning; Pazos Obregón, Flavio; Bioinformatics; Comparative genomics; Machine learning; Protein function predictions |
| status_str | publishedVersion |
| title | Gene function prediction in five model eukaryotes exclusively based on gene relative location through machine learning |
| topic | Bioinformatics; Comparative genomics; Machine learning; Protein function predictions |
| url | https://hdl.handle.net/20.500.12008/39141 |
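
The abstract says the models are trained only on features derived from where genes sit in their own genome, but this record does not say which features the authors used. As a purely hypothetical illustration of what a location-derived feature could look like, the sketch below computes, for each gene, the fraction of its nearest chromosomal neighbors that already carry a given annotation term. The function name `neighbor_term_fraction`, the window size `k`, the toy gene coordinates, and the term labels `TERM_A` and `TERM_B` are all assumptions made for this example and are not taken from the paper.

```python
from collections import defaultdict


def neighbor_term_fraction(genes, annotations, term, k=2):
    """Hypothetical location-only feature (not the paper's method).

    genes:        dict gene_id -> (chromosome, start_position)
    annotations:  dict gene_id -> set of known function terms
    term:         the function term to score
    k:            number of neighbors taken on each side of the gene

    Returns dict gene_id -> fraction of neighboring genes annotated with term.
    The gene's own annotation is never used, only its neighbors'.
    """
    # Group genes by chromosome so neighbors are always on the same chromosome.
    by_chrom = defaultdict(list)
    for gene_id, (chrom, start) in genes.items():
        by_chrom[chrom].append((start, gene_id))

    features = {}
    for items in by_chrom.values():
        items.sort()  # order genes along the chromosome by start position
        order = [gene_id for _, gene_id in items]
        for i, gene_id in enumerate(order):
            # Up to k genes upstream and up to k genes downstream.
            neighbors = order[max(0, i - k):i] + order[i + 1:i + 1 + k]
            if not neighbors:
                features[gene_id] = 0.0
                continue
            hits = sum(1 for n in neighbors if term in annotations.get(n, set()))
            features[gene_id] = hits / len(neighbors)
    return features


# Toy data: invented coordinates and placeholder term labels.
genes = {
    "g1": ("chrI", 100),
    "g2": ("chrI", 500),
    "g3": ("chrI", 900),
    "g4": ("chrI", 1500),
    "g5": ("chrII", 200),
}
annotations = {"g1": {"TERM_A"}, "g3": {"TERM_A"}, "g4": {"TERM_B"}}

features = neighbor_term_fraction(genes, annotations, "TERM_A", k=1)
for gene_id in sorted(features):
    print(gene_id, features[gene_id])
```

In a setup like the one the abstract outlines, values of this kind would be one column among many in a feature matrix passed to a classifier for each function term. Here the unannotated gene `g2` scores highest because both of its immediate neighbors carry `TERM_A`.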