Welcome to the SP.A.D.A. (SPark-based Anomaly Detection Ace) project.
SPADA is a fast network anomaly detection system, which efficiently detects anomalous activities in high-speed networks. In order to provide a near real-time response, we exploit the computational capabilities of a Big Data Analytic framework (i.e. Apache Spark).
This page contains the results of SPADA applied to MAWI Archive traffic traces. SPADA is a flow-feature-based anomaly detector designed to detect several kinds of network anomalies, namely Net Scan, Port Scan, and DoS.
About the Algorithm
An IP address is considered as a source of an anomalous activity if
The most important contribution to detecting an anomaly is the ratio between the number of flows generated by the IP address and the number of flows received by the same IP address. SPADA then adds to or subtracts from this contribution other quantities based on:
- Average number of packets per flow (weighted by α): if α > 0, this contribution helps detect scanning anomalies that use few packets per flow; if α < 0, it helps avoid false DoS detections for traffic with few packets per flow.
- Average number of bytes per packet (weighted by γ): if γ > 0, this contribution helps avoid false scanning detections for traffic whose packets carry a large payload; if γ < 0, it helps detect DoS anomalies that use packets with a large payload.
- Ratio between the number of flows generated by the IP address and the number of IP addresses it contacts (weighted by β): this contribution helps avoid false network scan detections.
SPADA compares this value to a threshold every 30 seconds. α, β, γ and the threshold are the algorithm parameters. SPADA does not yet handle the destination addresses of anomalies, nor ICMP-based anomalies, because of the filtering applied to ICMP packets. Parameter tuning and performance evaluation are carried out by comparing our results with the MAWILab anomaly reports.
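The per-address quantities listed above can be gathered with a simple aggregation over flow records. The following is a minimal single-machine sketch, not SPADA's Spark implementation: the `Flow` record, its field names, and the choice to average only over the flows an address generates are assumptions made for illustration. How SPADA combines these quantities with α, β and γ is not reproduced here.

```python
from collections import defaultdict
from dataclasses import dataclass

WINDOW_SECONDS = 30  # evaluation interval stated in the text


@dataclass(frozen=True)
class Flow:
    """Hypothetical flow record; the field names are assumptions."""
    start: float     # flow start time, in seconds
    src: str         # source IP address
    dst: str         # destination IP address
    n_packets: int
    n_bytes: int


def window_features(flows):
    """Return {(window_index, ip): features} for every IP that generated flows."""
    acc = defaultdict(
        lambda: {"out": 0, "in": 0, "pkts": 0, "bytes": 0, "peers": set()}
    )
    for f in flows:
        w = int(f.start // WINDOW_SECONDS)
        s = acc[(w, f.src)]
        s["out"] += 1
        s["pkts"] += f.n_packets
        s["bytes"] += f.n_bytes
        s["peers"].add(f.dst)
        acc[(w, f.dst)]["in"] += 1  # count the flow as received by dst

    features = {}
    for key, s in acc.items():
        if s["out"] == 0:
            continue  # addresses that only received flows are not sources
        features[key] = {
            "flows_out": s["out"],
            "flows_in": s["in"],
            "packets_per_flow": s["pkts"] / s["out"],
            "bytes_per_packet": s["bytes"] / s["pkts"] if s["pkts"] else 0.0,
            "flows_per_contacted_ip": s["out"] / len(s["peers"]),
        }
    return features
```

A source that opens many single-packet flows towards distinct destinations shows a high `flows_out`, a low `flows_in`, a `packets_per_flow` close to 1 and a `flows_per_contacted_ip` close to 1.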
To evaluate SPADA's performance, we compare it with the MAWILab detectors. MAWILab processes MAWI traces and records the anomalies it detects in two different files: the anomalous/suspicious report and the notice report. MAWILab combines the results of four different detection algorithms: when only one of these detectors raises an alert, the IP address activity is annotated in the notice report; when more than one does, it is annotated in the anomalous/suspicious report. More details are available in the MAWILab Documentation. We use both the anomalous/suspicious and the notice reports in order to compare SPADA with each MAWILab detector.
We perform three comparisons:
- First, we consider the results of MAWILab as ground truth. This comparison does not properly evaluate SPADA's performance, since the MAWILab reports also contain anomalies of other taxonomies. We call this comparison "raw".
- Second, we compare only the anomalies with Port Scan, Net Scan or DoS (except small and ICMP-based) taxonomies, which are the ones SPADA has been designed for. In notice reports, we consider only the rows with the "attack" heuristic; those with the "traffic" heuristic are normal communications. We call this comparison "filtered".
- Finally, after performing the previous comparison, we apply the following rule to false positives: if the address contacts more than 20 different IP addresses in the same subnet, it is considered actually anomalous and moved into the true positives (these were also manually inspected for further verification). We call this comparison "post-processed".
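The post-processing rule can be sketched as follows. This is an illustration under stated assumptions rather than the project's code: the text does not specify the subnet size, so a /24 IPv4 prefix is assumed, and the function names and the `contacts` mapping are hypothetical.

```python
import ipaddress
from collections import defaultdict

MIN_CONTACTS = 20   # "more than 20 different IP addresses", per the rule above
SUBNET_PREFIX = 24  # assumption: the subnet size is not given in the text


def contacts_many_in_one_subnet(contacted_ips, prefix=SUBNET_PREFIX,
                                min_contacts=MIN_CONTACTS):
    """True if more than min_contacts distinct IPv4 addresses share one subnet."""
    per_subnet = defaultdict(set)
    for ip in contacted_ips:
        # strict=False masks the host bits, yielding the enclosing subnet
        net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        per_subnet[net].add(ip)
    return any(len(ips) > min_contacts for ips in per_subnet.values())


def post_process(true_pos, false_pos, contacts):
    """Move false positives that match the rule into the true positives.

    contacts maps a source IP to the destination IPs it contacted.
    Returns the updated (true_pos, false_pos) sets.
    """
    promoted = {
        ip for ip in false_pos
        if contacts_many_in_one_subnet(contacts.get(ip, ()))
    }
    return true_pos | promoted, false_pos - promoted
```

With these assumptions, an address that contacts 21 hosts in 10.0.0.0/24 is promoted, while one that contacts 30 hosts spread over 30 different /24 subnets stays a false positive.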
The following boxplots show SPADA's performance measured on 100 traffic traces with different values of the parameters. Different sets of parameters maximize recall, precision, or both.
Last 7/30 days SPADA Performance.
For each traffic trace, there are:
- The link to the MAWI traffic trace.
- The set of IP addresses that are sources of anomalous activity according to SPADA, annotated with the corresponding MAWILab taxonomy. When an anomaly is detected by the post-processing rule explained above but does not appear in any MAWILab report (or at least not with a Net Scan, Port Scan or DoS taxonomy), it is annotated with the "our_ntsc_signature" taxonomy. If instead the anomaly is not detected by the post-processing rule and does not appear in any MAWILab report (or at least not with a Net Scan, Port Scan or DoS taxonomy), it is annotated with the "false_positive" taxonomy.
- The unfiltered MAWILab anomalous/suspicious and notice reports merged into a single CSV file (not always available). When neither report is available, we cannot make any comparison, so we provide only our anomalies without the performance report. When only the MAWILab anomalous/suspicious report is available (and the notice report is not), we still make the three comparisons, but the performance cannot be properly evaluated. A special label indicates which of the two reports each anomaly comes from.
- For each of the three comparisons, a performance report giving the number of true positives, the number of false positives, the number of false negatives, precision and recall.
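The figures in each performance report follow the standard definitions: precision = TP / (TP + FP) and recall = TP / (TP + FN). A minimal sketch, assuming that SPADA's detections and the reference anomalies are both available as sets of source IP addresses (the function name and the dictionary keys are hypothetical):

```python
def performance_report(detected, ground_truth):
    """Compare detected source IPs against the reference set."""
    detected, ground_truth = set(detected), set(ground_truth)
    tp = len(detected & ground_truth)   # flagged and in the reference
    fp = len(detected - ground_truth)   # flagged but not in the reference
    fn = len(ground_truth - detected)   # in the reference but missed
    # Fall back to 0.0 when a ratio is undefined (empty denominator)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "true_positives": tp,
        "false_positives": fp,
        "false_negatives": fn,
        "precision": precision,
        "recall": recall,
    }
```

The same function applies to the "raw", "filtered" and "post-processed" comparisons; only the reference set (and, for the last one, the adjusted true and false positives) changes.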