Identifying the presence of VPN software is crucial for a variety of real-world use cases. Governments may wish to monitor VPN usage to identify cyber criminals or protect their digital borders, while content providers and ISPs may want to detect VPN usage to enforce license agreements or regional content restrictions.
An enterprise VPN detection system can be built on several strategies, ranging from traditional network traffic analysis to ML-based methods. Of the 47 studies in Table 1, all but two focus on passive detection; the remaining two use active probing (i.e., sending probes to elicit information about the VPN software).
VPN Detection System for Enterprises: Safeguarding Corporate Networks
The majority of these studies rely on block-listing known VPN servers or data center IPs. This approach is insufficient, as VPN providers frequently change their IPs, and users often resort to residential proxies or distributed VPN services. Furthermore, it overlooks that many VPN software solutions can also be deployed behind NAT (i.e., in the LAN environment).
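The limitation of block-listing can be illustrated with a minimal sketch. It assumes a hypothetical block-list (`BLOCKED_NETWORKS`) populated with reserved documentation ranges rather than real VPN server addresses, and a hypothetical helper `is_blocklisted`; it is not taken from any of the surveyed studies.

```python
import ipaddress

# Hypothetical block-list entries. These are reserved documentation ranges
# (RFC 5737), used here only as stand-ins for known VPN server networks.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_blocklisted(address: str) -> bool:
    """Return True if the address falls inside any block-listed network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in BLOCKED_NETWORKS)


# An address inside a listed range is flagged.
print(is_blocklisted("203.0.113.7"))
# An address outside every listed range is not flagged, even if it is in
# fact a VPN exit (e.g., a rotated server IP or a residential proxy).
print(is_blocklisted("192.0.2.10"))
```

The check only reflects what the list contains at lookup time, so any VPN endpoint that is not yet listed passes unnoticed.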
Data imbalance is another common challenge in VPN detection systems. In a typical scenario, a target user generates many distinct network flows while not using a VPN, but only a single flow when a VPN is active on their system, since the tunnel carries their traffic. This behavior skews any dataset collected on the topic toward non-VPN samples and, in turn, can lead to inflated and non-representative evaluation metrics. It is therefore essential to address this issue when designing VPN detection solutions for real-world applications.
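The effect of imbalance on evaluation metrics can be made concrete with a small sketch. The 95/5 class split and the trivial "always non-VPN" classifier below are illustrative assumptions, not figures from the surveyed studies; the helper names are likewise hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of flows whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        indices = [i for i, t in enumerate(y_true) if t == cls]
        correct = sum(1 for i in indices if y_pred[i] == cls)
        recalls.append(correct / len(indices))
    return sum(recalls) / len(recalls)


# Assumed toy dataset: 95 non-VPN flows (label 0) and 5 VPN flows (label 1).
y_true = [0] * 95 + [1] * 5
# A trivial classifier that never predicts "VPN".
y_pred = [0] * 100

print(accuracy(y_true, y_pred))           # 0.95
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

Plain accuracy rewards the classifier for the majority class even though it detects no VPN flow at all, whereas the class-averaged metric exposes the failure. Reporting per-class or balanced metrics is one possible way to avoid the inflated figures described above.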