RFPDR: a random forest approach for plant disease resistance protein prediction

Background: Plant innate immunity relies on a broad repertoire of receptor proteins that can detect pathogens and trigger an effective defense response. Bioinformatic tools based on conserved domains and sequence similarity are among the most popular strategies for protein identification and charact...

Bibliographic Details
Main Author: Simón, Diego (author)
Other Authors: Borsani, Omar (author), Filippi, Carla V. (author)
Format: article
Language: English
Published: 2022
Subjects: Disease resistance; Plant immunity; Defense response; Machine learning; Random forest
Online Access: https://hdl.handle.net/20.500.12008/43411
_version_ 1868890095587688448
author Simón, Diego
author2 Borsani, Omar
Filippi, Carla V.
author2_role author
author
author_browse Borsani, Omar
Filippi, Carla V.
Simón, Diego
author_facet Simón, Diego
Borsani, Omar
Filippi, Carla V.
author_role author
collection COLIBRI
dc.contributor.none.fl_str_mv Simón Diego, Universidad de la República (Uruguay). Facultad de Ciencias. Centro de Investigaciones Nucleares.
Borsani Omar, Universidad de la República (Uruguay). Facultad de Agronomía.
Filippi Carla V., Universidad de la República (Uruguay). Facultad de Agronomía
dc.creator.none.fl_str_mv Simón, Diego
Borsani, Omar
Filippi, Carla V.
dc.date.none.fl_str_mv 2022
2024-04-11T12:33:24Z
2024-04-11T12:33:24Z
dc.format.none.fl_str_mv 20 h.
application/pdf
dc.identifier.none.fl_str_mv Simón, D, Borsani, O y Filippi, C. "RFPDR: a random forest approach for plant disease resistance protein prediction". PeerJ. [en línea] 2022, 10: e11683. 20 h. DOI: 10.7717/peerj.11683.
2167-8359
https://hdl.handle.net/20.500.12008/43411
10.7717/peerj.11683
dc.language.none.fl_str_mv en
eng
dc.publisher.none.fl_str_mv PeerJ
dc.relation.none.fl_str_mv PeerJ, 2022, 10: e11683.
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
Creative Commons Attribution License (CC BY 4.0)
dc.source.none.fl_str_mv reponame:COLIBRI
instname:Universidad de la República
instacron:Universidad de la República
dc.subject.none.fl_str_mv Disease resistance
Plant immunity
Defense response
Machine learning
Random forest
dc.title.none.fl_str_mv RFPDR: a random forest approach for plant disease resistance protein prediction
dc.type.none.fl_str_mv Article
info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
description Background: Plant innate immunity relies on a broad repertoire of receptor proteins that can detect pathogens and trigger an effective defense response. Bioinformatic tools based on conserved domains and sequence similarity are among the most popular strategies for protein identification and characterization. However, the multi-domain nature, high sequence diversity and complex evolutionary history of disease resistance (DR) proteins make their prediction a real challenge. Here we present RFPDR, which pioneers the application of Random Forest (RF) for Plant DR protein prediction. Methods: A recently published collection of experimentally validated DR proteins was used as the positive dataset, while 10x10 nested datasets, ranging from 400 to 4,000 non-DR proteins, were used as negative datasets. A total of 9,631 features were extracted from each protein sequence and included in a full-dimension (FD) RFPDR model. Sequence selection was performed to generate a reduced-dimension (RD) RFPDR model. Model performance was evaluated using an 80/20 (training/testing) partition with 10-fold cross-validation, and compared to baseline, sequence-based and state-of-the-art strategies. To gain insight into the underlying biology, the most discriminatory sequence-based features in the RF classifier were identified. Results and Discussion: RD-RFPDR proved sensitive (86.4 ± 4.0%) and specific (96.9 ± 1.5%) in identifying DR proteins, while remaining robust to data imbalance. Its high performance and robustness, together with the valuable information it provides on the underlying properties of DR proteins, make RD-RFPDR an interesting approach for DR protein prediction that complements state-of-the-art strategies.
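For readers unfamiliar with the two metrics reported in the abstract, the following is a minimal sketch of how sensitivity and specificity can be computed per evaluation fold and summarized as mean ± spread. It is not the authors' code: the function names, the toy labels, and the reading of "±" as a sample standard deviation across folds are all assumptions made for illustration.

```python
from statistics import mean, stdev


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Assumed encoding (not from the record): 1 = DR protein, 0 = non-DR protein.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # DR predicted as DR
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # DR missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-DR predicted as non-DR
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-DR flagged as DR
    sensitivity = tp / (tp + fn)  # fraction of true DR proteins recovered
    specificity = tn / (tn + fp)  # fraction of non-DR proteins correctly rejected
    return sensitivity, specificity


def summarize(values):
    """Return (mean, sample standard deviation) over per-fold values."""
    return mean(values), stdev(values)


# Hypothetical toy folds: (true labels, predicted labels). Not real data.
folds = [
    ([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 0]),
    ([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 1, 0, 0, 0, 1]),
]

per_fold = [sensitivity_specificity(t, p) for t, p in folds]
sens_mean, sens_sd = summarize([s for s, _ in per_fold])
spec_mean, spec_sd = summarize([s for _, s in per_fold])
print(f"sensitivity: {sens_mean:.3f} +/- {sens_sd:.3f}")
print(f"specificity: {spec_mean:.3f} +/- {spec_sd:.3f}")
```

Reporting both metrics matters when negatives greatly outnumber positives, as with the 400 to 4,000 non-DR proteins mentioned above: overall accuracy can look high even if few DR proteins are recovered, whereas sensitivity exposes that failure directly.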
eu_rights_str_mv openAccess
format article
id anni_7335a0b1d71bc985c33b9580cd7ebbb6
identifier_str_mv Simón, D, Borsani, O y Filippi, C. "RFPDR: a random forest approach for plant disease resistance protein prediction". PeerJ. [en línea] 2022, 10: e11683. 20 h. DOI: 10.7717/peerj.11683.
2167-8359
10.7717/peerj.11683
instacron_str Universidad de la República
institution Universidad de la República
instname_str Universidad de la República
language eng
language_invalid_str_mv en
network_acronym_str anni
network_name_str oai-lr-anni
oai_identifier_str oai:colibri.udelar.edu.uy:20.500.12008/43411
publishDate 2022
publishDateSort 2022
publisher.none.fl_str_mv PeerJ
reponame_str COLIBRI
rights_invalid_str_mv Creative Commons Attribution License (CC BY 4.0)
spelling Works deposited in the Repository are governed by the Ordinance on Intellectual Property Rights of the Universidad de la República (Res. No. 91 of the C.D.C., 8/III/1994; D.O. 7/IV/1994) and by the Ordinance of the Open Repository of the Universidad de la República (Res. No. 16 of the C.D.C., 07/10/2014).
oai:colibri.udelar.edu.uy:20.500.12008/43411
2026-04-14T10:10:38Z
spellingShingle RFPDR: a random forest approach for plant disease resistance protein prediction
Simón, Diego
Disease resistance
Plant immunity
Defense response
Machine learning
Random forest
status_str publishedVersion
title RFPDR: a random forest approach for plant disease resistance protein prediction
title_full RFPDR: a random forest approach for plant disease resistance protein prediction
title_fullStr RFPDR: a random forest approach for plant disease resistance protein prediction
title_full_unstemmed RFPDR: a random forest approach for plant disease resistance protein prediction
title_short RFPDR: a random forest approach for plant disease resistance protein prediction
title_sort RFPDR: a random forest approach for plant disease resistance protein prediction
topic Disease resistance
Plant immunity
Defense response
Machine learning
Random forest
url https://hdl.handle.net/20.500.12008/43411