A Machine Learning-based Tool for Passive OS Fingerprinting with TCP Variant as a Novel Feature
Peer reviewed, Journal article
MetadataShow full item record
Original versionD. H. Hagos, A. Yazidi, Ø. Kure and P. E. Engelstad, "A Machine-Learning-Based Tool for Passive OS Fingerprinting With TCP Variant as a Novel Feature," in IEEE Internet of Things Journal, vol. 8, no. 5, pp. 3534-3553, 1 March1, 2021, doi: 10.1109/JIOT.2020.3024293. https://doi.org/10.1109/JIOT.2020.3024293
With the emergence of Internet of Things (IoT), securing and managing large, complex enterprise network infrastructure requires capturing and analyzing network traffic traces in real-time. An accurate passive Operating System (OS) fingerprinting plays a critical role in effective network management and cybersecurity protection. Passive fingerprinting doesn’t send probes that introduce extra load to the network and hence it has a clear advantage over active fingerprinting since it also reduces the risk of triggering false alarms. This paper proposes and evaluates an advanced classification approach to passive OS fingerprinting by leveraging state-of-the-art classical machine learning and deep learning techniques. Our controlled experiments on benchmark data, emulated and realistic traffic is performed using two approaches. Through an Oracle-based machine learning approach, we found that the underlying TCP variant is an important feature for predicting the remote OS. Based on this observation, we develop a sophisticated tool for OS fingerprinting that first predicts the TCP flavor using passive traffic traces and then uses this prediction as an input feature for another machine learning algorithm for predicting the remote OS from passive measurements. This paper takes the passive fingerprinting problem one step further by introducing the underlying predicted TCP variant as a distinguishing feature. In terms of accuracy, we empirically demonstrate that accurately predicting the TCP variant has the potential to boost the evaluation performance from 84% to 94% on average across all our validation scenarios and across different types of traffic sources. We also demonstrate a practical example of this potential, by increasing the performance to 91.3% and 95.22% on average using a tool for loss-based and a combination of loss and delay-based TCP variant prediction in an emulated setting. To the best of our knowledge, this is the first study that explores the potential for using the knowledge of the TCP variant to significantly boost the accuracy of passive OS fingerprinting.