Gene function prediction in five model eukaryotes exclusively based on gene relative location through machine learning

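The function of most genes is unknown. The best results in automated function prediction are obtained with machine learning-based methods that combine multiple data sources, typically sequence-derived features, protein structure and interaction data. Even though there is ample evidence showing that a gene's function is not independent of its location, the few available examples of gene function prediction based on gene location rely on sequence identity between genes of different organisms and are thus subject to the limitations of the relationship between sequence and function. Here we predict thousands of gene functions in five model eukaryotes (Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Mus musculus and Homo sapiens) using machine learning models exclusively trained with features derived from the location of genes in the genomes to which they belong. Our aim was not to obtain the best-performing method for automated function prediction but to explore the extent to which a gene's location can predict its function in eukaryotes. We found that our models outperform BLAST when predicting terms from Biological Process and Cellular Component Ontologies, showing that, at least in some cases, gene location alone can be more useful than sequence to infer gene function.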

Bibliographic Details
Main Author: Pazos Obregón, Flavio (author)
Other Authors: Silvera, Diego (author), Cantera, Rafael (author), Yankilevich, Patricio (author), Guerberoff, Gustavo (author), Soto, Pablo (author)
Format: article
Language: English
Published: 2022
Subjects: Bioinformatics; Comparative genomics; Machine learning; Protein function predictions
Online Access: https://hdl.handle.net/20.500.12008/39141
_version_ 1868890219101552640
author Pazos Obregón, Flavio
author2 Silvera, Diego
Cantera, Rafael
Yankilevich, Patricio
Guerberoff, Gustavo
Soto, Pablo
author2_role author
author
author
author
author
author_browse Cantera, Rafael
Guerberoff, Gustavo
Pazos Obregón, Flavio
Silvera, Diego
Soto, Pablo
Yankilevich, Patricio
author_facet Pazos Obregón, Flavio
Silvera, Diego
Cantera, Rafael
Yankilevich, Patricio
Guerberoff, Gustavo
Soto, Pablo
author_role author
collection COLIBRI
dc.contributor.none.fl_str_mv Pazos Obregón Flavio, Universidad de la República (Uruguay). Facultad de Ciencias. Instituto de Biología.
Silvera Diego, IIBCE
Cantera Rafael, IIBCE
Yankilevich Patricio
Guerberoff Gustavo, Universidad de la República (Uruguay). Facultad de Ingeniería.
Soto Pablo, IIBCE
dc.creator.none.fl_str_mv Pazos Obregón, Flavio
Silvera, Diego
Cantera, Rafael
Yankilevich, Patricio
Guerberoff, Gustavo
Soto, Pablo
dc.date.none.fl_str_mv 2022
2023-08-10T12:24:40Z
2023-08-10T12:24:40Z
dc.format.none.fl_str_mv 11 h.
application/pdf
dc.identifier.none.fl_str_mv Pazos Obregón, F, Silvera, D, Cantera, R, [y otros autores]. "Gene function prediction in five model eukaryotes exclusively based on gene relative location through machine learning". Scientific Reports. [en línea] 2022, 12: 11655. 11 h. DOI: 10.1038/s41598-022-15329-w
2045-2322
https://hdl.handle.net/20.500.12008/39141
10.1038/s41598-022-15329-w
dc.language.none.fl_str_mv en_US
eng
dc.publisher.none.fl_str_mv Springer Nature
dc.relation.none.fl_str_mv Scientific Reports, 2022, 12: 11655.
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
Creative Commons Attribution License (CC BY 4.0)
dc.source.none.fl_str_mv reponame:COLIBRI
instname:Universidad de la República
instacron:Universidad de la República
dc.subject.none.fl_str_mv Bioinformatics
Comparative genomics
Machine learning
Protein function predictions
dc.title.none.fl_str_mv Gene function prediction in five model eukaryotes exclusively based on gene relative location through machine learning
dc.type.none.fl_str_mv Article
info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
description The function of most genes is unknown. The best results in automated function prediction are obtained with machine learning-based methods that combine multiple data sources, typically sequence-derived features, protein structure and interaction data. Even though there is ample evidence showing that a gene's function is not independent of its location, the few available examples of gene function prediction based on gene location rely on sequence identity between genes of different organisms and are thus subject to the limitations of the relationship between sequence and function. Here we predict thousands of gene functions in five model eukaryotes (Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Mus musculus and Homo sapiens) using machine learning models exclusively trained with features derived from the location of genes in the genomes to which they belong. Our aim was not to obtain the best-performing method for automated function prediction but to explore the extent to which a gene's location can predict its function in eukaryotes. We found that our models outperform BLAST when predicting terms from Biological Process and Cellular Component Ontologies, showing that, at least in some cases, gene location alone can be more useful than sequence to infer gene function.
eu_rights_str_mv openAccess
format article
id anni_e6010287c0fbe0fa11fc718a10a6ff69
identifier_str_mv Pazos Obregón, F, Silvera, D, Cantera, R, [y otros autores]. "Gene function prediction in five model eukaryotes exclusively based on gene relative location through machine learning". Scientific Reports. [en línea] 2022, 12: 11655. 11 h. DOI: 10.1038/s41598-022-15329-w
2045-2322
10.1038/s41598-022-15329-w
instacron_str Universidad de la República
institution Universidad de la República
instname_str Universidad de la República
language eng
language_invalid_str_mv en_US
network_acronym_str anni
network_name_str oai-lr-anni
oai_identifier_str oai:colibri.udelar.edu.uy:20.500.12008/39141
publishDate 2022
publishDateSort 2022
publisher.none.fl_str_mv Springer Nature
reponame_str COLIBRI
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv Creative Commons Attribution License (CC BY 4.0)
spelling ANII: FSDA_1_2017_1_14242
Works deposited in the Repository are governed by the Ordinance on Intellectual Property Rights of the Universidad de la República (Res. No. 91 of the C.D.C. of 8/III/1994, D.O. 7/IV/1994) and by the Ordinance of the Open Repository of the Universidad de la República (Res. No. 16 of the C.D.C. of 07/10/2014).
2026-04-14T10:10:12Z
status_str publishedVersion
title Gene function prediction in five model eukaryotes exclusively based on gene relative location through machine learning
topic Bioinformatics
Comparative genomics
Machine learning
Protein function predictions
url https://hdl.handle.net/20.500.12008/39141