RFPDR: a random forest approach for plant disease resistance protein prediction
Background: Plant innate immunity relies on a broad repertoire of receptor proteins that can detect pathogens and trigger an effective defense response. Bioinformatic tools based on conserved domain and sequence similarity are among the most popular strategies for protein identification and charact...
| Main Author: | Simón, Diego |
|---|---|
| Other Authors: | Borsani, Omar; Filippi, Carla V. |
| Format: | article |
| Language: | English |
| Published: | 2022 |
| Subjects: | Disease resistance; Plant immunity; Defense response; Machine learning; Random forest |
| Online Access: | https://hdl.handle.net/20.500.12008/43411 |
| _version_ | 1868890095587688448 |
|---|---|
| author | Simón, Diego |
| author2 | Borsani, Omar; Filippi, Carla V. |
| author2_role | author author |
| author_browse | Borsani, Omar; Filippi, Carla V.; Simón, Diego |
| author_facet | Simón, Diego; Borsani, Omar; Filippi, Carla V. |
| author_role | author |
| collection | COLIBRI |
| dc.contributor.none.fl_str_mv | Simón Diego, Universidad de la República (Uruguay). Facultad de Ciencias. Centro de Investigaciones Nucleares. Borsani Omar, Universidad de la República (Uruguay). Facultad de Agronomía. Filippi Carla V., Universidad de la República (Uruguay). Facultad de Agronomía |
| dc.creator.none.fl_str_mv | Simón, Diego; Borsani, Omar; Filippi, Carla V. |
| dc.date.none.fl_str_mv | 2022 2024-04-11T12:33:24Z 2024-04-11T12:33:24Z |
| dc.format.none.fl_str_mv | 20 h. application/pdf |
| dc.identifier.none.fl_str_mv | Simón, D, Borsani, O and Filippi, C. "RFPDR: a random forest approach for plant disease resistance protein prediction". PeerJ. [online] 2022, 10: e11683. 20 h. DOI: 10.7717/peerj.11683. 2167-8359 https://hdl.handle.net/20.500.12008/43411 10.7717/peerj.11683 |
| dc.language.none.fl_str_mv | en eng |
| dc.publisher.none.fl_str_mv | PeerJ |
| dc.relation.none.fl_str_mv | PeerJ, 2022, 10: e11683. |
| dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess Creative Commons Attribution License (CC BY 4.0) |
| dc.source.none.fl_str_mv | reponame:COLIBRI instname:Universidad de la República instacron:Universidad de la República |
| dc.subject.none.fl_str_mv | Disease resistance; Plant immunity; Defense response; Machine learning; Random forest |
| dc.title.none.fl_str_mv | RFPDR: a random forest approach for plant disease resistance protein prediction |
| dc.type.none.fl_str_mv | Article info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion |
| description | Background: Plant innate immunity relies on a broad repertoire of receptor proteins that can detect pathogens and trigger an effective defense response. Bioinformatic tools based on conserved domain and sequence similarity are among the most popular strategies for protein identification and characterization. However, the multi-domain nature, high sequence diversity and complex evolutionary history of disease resistance (DR) proteins make their prediction a real challenge. Here we present RFPDR, which pioneers the application of Random Forest (RF) to plant DR protein prediction. Methods: A recently published collection of experimentally validated DR proteins was used as the positive dataset, while 10x10 nested datasets, ranging from 400 to 4,000 non-DR proteins, were used as negative datasets. A total of 9,631 features were extracted from each protein sequence and included in a full-dimension (FD) RFPDR model. Sequence selection was performed to generate a reduced-dimension (RD) RFPDR model. Model performance was evaluated using an 80/20 (training/testing) partition with 10-fold cross-validation, and compared to baseline, sequence-based and state-of-the-art strategies. To gain insight into the underlying biology, the most discriminatory sequence-based features in the RF classifier were identified. Results and Discussion: RD-RFPDR proved sensitive (86.4 ± 4.0%) and specific (96.9 ± 1.5%) for identifying DR proteins, while remaining robust to data imbalance. Its high performance and robustness, together with the valuable information it provides on the underlying properties of DR proteins, make RD-RFPDR an interesting approach for DR protein prediction that complements the state-of-the-art strategies. |
| eu_rights_str_mv | openAccess |
| format | article |
| id | anni_7335a0b1d71bc985c33b9580cd7ebbb6 |
| identifier_str_mv | Simón, D, Borsani, O and Filippi, C. "RFPDR: a random forest approach for plant disease resistance protein prediction". PeerJ. [online] 2022, 10: e11683. 20 h. DOI: 10.7717/peerj.11683. 2167-8359 10.7717/peerj.11683 |
| instacron_str | Universidad de la República |
| institution | Universidad de la República |
| instname_str | Universidad de la República |
| language | eng |
| language_invalid_str_mv | en |
| network_acronym_str | anni |
| network_name_str | oai-lr-anni |
| oai_identifier_str | oai:colibri.udelar.edu.uy:20.500.12008/43411 |
| publishDate | 2022 |
| publishDateSort | 2022 |
| publisher.none.fl_str_mv | PeerJ |
| reponame_str | COLIBRI |
| repository.mail.fl_str_mv | |
| repository.name.fl_str_mv | |
| repository_id_str | |
| rights_invalid_str_mv | Creative Commons Attribution License (CC BY 4.0) |
| spellingShingle | RFPDR: a random forest approach for plant disease resistance protein prediction Simón, Diego Disease resistance Plant immunity Defense response Machine learning Random forest |
| status_str | publishedVersion |
| title | RFPDR: a random forest approach for plant disease resistance protein prediction |
| title_full | RFPDR: a random forest approach for plant disease resistance protein prediction |
| title_fullStr | RFPDR: a random forest approach for plant disease resistance protein prediction |
| title_full_unstemmed | RFPDR: a random forest approach for plant disease resistance protein prediction |
| title_short | RFPDR: a random forest approach for plant disease resistance protein prediction |
| title_sort | RFPDR: a random forest approach for plant disease resistance protein prediction |
| topic | Disease resistance; Plant immunity; Defense response; Machine learning; Random forest |
| url | https://hdl.handle.net/20.500.12008/43411 |
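
The abstract reports performance as sensitivity and specificity, each given as a value ± a spread. As a minimal sketch of how such figures are typically computed, the Python below derives both metrics from binary predictions and summarizes repeated evaluations as mean and sample standard deviation. The function names, the toy labels and the reading of "±" as a standard deviation are assumptions made for illustration; none of this is the authors' code or data.

```python
from statistics import mean, stdev


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels: 1 = DR, 0 = non-DR."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # DR correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # DR missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-DR correctly rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-DR wrongly flagged
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def summarize(values):
    """Mean and sample standard deviation over repeated evaluations (needs >= 2 values)."""
    return mean(values), stdev(values)


# Toy labels, invented purely for illustration (not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Because each metric is computed within a single class (sensitivity over DR proteins only, specificity over non-DR proteins only), neither is mechanically inflated by enlarging the negative set, which makes the pair a natural choice when the number of negatives varies across datasets.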