Enhancing web application attack detection using machine learning
| Main Author: | |
|---|---|
| Format: | masterThesis |
| Language: | English |
| Published: | 2019 |
| Online Access: | https://hdl.handle.net/20.500.12008/29278 |

Summary:

Despite all the efforts of the security community, for example initiatives such as the OWASP Top 10, it is well known that web applications are constantly exposed to attacks that exploit their vulnerabilities. Some web application vulnerabilities can only be discovered through a process of trial and error carried out by an attacker. Identifying and characterizing user behavior with attack detection techniques therefore becomes crucial. These techniques help prevent attackers from successfully identifying or verifying the existence of vulnerabilities in an application, and they help keep the number of false positives (non-malicious activity flagged as malicious) low. One technological option for real-time attack analysis is a Web Application Firewall (WAF): a system that intercepts and inspects all traffic between the web server and its clients, searching the content of the communication for attacks. Most WAFs rely on a set of static rules defined to identify attacks. In this thesis, we analyze the use of machine learning techniques to enhance web application attack detection in MODSECURITY, an open-source WAF that has become a de facto standard implementation. We first characterize the problem by defining different scenarios. These depend on whether application-specific or generic data is available for training, and on whether valid traffic, attack traffic, or both are available. We also analyze existing datasets that can be used in this context, and we created our own dataset by capturing real traffic to a real-life application. Finally, we present two supervised machine learning solutions. The first is a classic discrimination approach between two classes (valid traffic and attacks). The second is a one-class classification solution for the more realistic scenario in which only valid data is available.
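
To make the static-rule baseline concrete, here is a minimal sketch of signature matching with anomaly scoring. It is loosely modeled on the anomaly-scoring mode of the OWASP Core Rule Set. The rule IDs, regular expressions, scores, and the `INBOUND_THRESHOLD` value are hypothetical simplifications chosen for illustration. They are not actual CRS rules or ModSecurity configuration, nor the setup evaluated in the thesis.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A static detection rule: a signature pattern plus its anomaly score."""
    rule_id: int
    pattern: re.Pattern
    score: int


# Hypothetical, heavily simplified signatures for illustration only;
# real OWASP Core Rule Set rules are far more elaborate.
RULES = [
    Rule(1001, re.compile(r"(?i)\bunion\b.+\bselect\b"), 5),    # SQL injection
    Rule(1002, re.compile(r"(?i)<script\b"), 5),                 # cross-site scripting
    Rule(1003, re.compile(r"\.\./"), 5),                         # path traversal
    Rule(1004, re.compile(r"(?i)'\s*or\s+'?1'?\s*=\s*'?1"), 5),  # tautology-based SQLi
]

INBOUND_THRESHOLD = 5  # assumed blocking threshold


def anomaly_score(payload: str) -> tuple[int, list[int]]:
    """Add up the scores of all rules whose pattern occurs in the payload."""
    matched = [rule for rule in RULES if rule.pattern.search(payload)]
    return sum(rule.score for rule in matched), [rule.rule_id for rule in matched]


def is_blocked(payload: str) -> bool:
    """Block the request once its accumulated anomaly score reaches the threshold."""
    score, _ = anomaly_score(payload)
    return score >= INBOUND_THRESHOLD


for payload in [
    "q=shoes&page=2",
    "q=1 UNION SELECT password FROM users",
    "file=../../etc/passwd",
    "comment=how do I use <script> tags?",
]:
    print(payload, anomaly_score(payload), is_blocked(payload))
```

Only the first request passes. The last one is a benign comment that merely mentions `<script>` tags, yet it is blocked too. That is the kind of false positive that fixed signatures produce. Conversely, an attack phrased to avoid every pattern would pass unchecked, which is the gap learned models aim to close.
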
In the one-class classification approach, it is assumed that one of the classes (in our case, valid traffic) can be properly modeled using data from the training set. The other class (in our problem, attacks) cannot be modeled because training samples for it are totally or partially lacking. We present results for both approaches and compare them with MODSECURITY configured out of the box with the OWASP Core Rule Set, the most widely deployed rule set.
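
The one-class scenario can be illustrated with a simple novelty detector that is fitted on valid requests only and flags anything too far from that profile. The technique is an assumption chosen for illustration (the fraction of previously unseen character trigrams) and is not necessarily the model used in the thesis. The class name `NgramNoveltyDetector`, the trigram size, the 0.3 threshold, and the sample requests are all hypothetical.

```python
from collections.abc import Iterable


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Return the overlapping character n-grams of a request string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NgramNoveltyDetector:
    """Hypothetical one-class model: it is fitted on valid requests only."""

    def __init__(self, n: int = 3, threshold: float = 0.3) -> None:
        self.n = n
        # Assumed value; in practice it would be tuned on held-out valid traffic.
        self.threshold = threshold
        self.known: set[str] = set()

    def fit(self, valid_requests: Iterable[str]) -> "NgramNoveltyDetector":
        # Memorize every n-gram occurring in valid traffic; no attack samples are used.
        for request in valid_requests:
            self.known.update(char_ngrams(request, self.n))
        return self

    def score(self, request: str) -> float:
        """Fraction of the request's n-grams never seen in valid traffic."""
        grams = char_ngrams(request, self.n)
        if not grams:  # shorter than n characters: nothing to compare
            return 0.0
        unseen = sum(1 for g in grams if g not in self.known)
        return unseen / len(grams)

    def is_attack(self, request: str) -> bool:
        # Anything too far from the valid-traffic profile is treated as an attack.
        return self.score(request) > self.threshold


valid_traffic = [
    "q=shoes&page=1",
    "q=boots&page=2",
    "q=red+shoes&page=1",
    "q=sandals&page=3",
    "id=42&action=view",
    "id=7&action=edit",
]
detector = NgramNoveltyDetector().fit(valid_traffic)

for request in ["q=boots&page=3", "q=red+boots&page=2",
                "q=<script>alert(1)</script>", "id=7 OR 1=1"]:
    print(f"{request!r}: score={detector.score(request):.3f} "
          f"attack={detector.is_attack(request)}")
```

With this toy data, the two search requests score 0.000 and 0.125 and are accepted. The script injection scores 1.000 and the SQL tautology scores 0.778, so both are flagged, even though the model never saw an attack. The trade-off is visible in the second request: legitimate but novel input raises the score. The threshold and the amount of valid training traffic therefore determine the false-positive rate.
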