Advanced passive operating system fingerprinting using machine learning and deep learning
Journal article, Peer reviewed
Accepted version
Permanent lenke
https://hdl.handle.net/10642/10001Utgivelsesdato
2020-09-30Metadata
Vis full innførselSamlinger
Originalversjon
Hagos, Løland MV, Yazidi, Kure, Engelstad. Advanced passive operating system fingerprinting using machine learning and deep learning. Computer Communications and Networks. 2020:1-11 https://doi.org/10.1109/ICCCN49398.2020.9209694Sammendrag
Securing and managing large, complex enterprise network infrastructure requires capturing and analyzing network traffic traces in real-time. An accurate passive Operating System (OS) fingerprinting plays a critical role in effective network management and cybersecurity protection. Passive fingerprinting doesn't send probes that introduce extra load to the network and hence it has a clear advantage over active fingerprinting since it also reduces the risk of triggering false alarms. This paper proposes and evaluates an advanced classification approach to passive OS fingerprinting by leveraging state-of-the-art classical machine learning and deep learning techniques. Our controlled experiments on benchmark data, emulated and realistic traffic is performed using two approaches. Through an Oracle-based machine learning approach, we found that the underlying TCP variant is an important feature for predicting the remote OS. Based on this observation, we develop a sophisticated tool for OS fingerprinting that first predicts the TCP flavor using passive traffic traces and then uses this prediction as an input feature for another machine learning algorithm for predicting the remote OS from passive measurements. This paper takes the passive fingerprinting problem one step further by introducing the underlying predicted TCP variant as a distinguishing feature. In terms of accuracy, we empirically demonstrate that accurately predicting the TCP variant has the potential to boost the evaluation performance from 84% to 94% on average across all our validation scenarios and across different types of traffic sources. We also demonstrate a practical example of this potential, by increasing the performance to 91.3% on average using a tool for TCP variant prediction in an emulated setting. To the best of our knowledge, this is the first study that explores the potential for using the knowledge of the TCP variant to significantly boost the accuracy of passive OS fingerprinting.