CERT
 
 

Port and Payload Agnostic Application Identification

Problem Addressed

In order to properly understand the threats that a network faces, network administrators need access to current and accurate information about the composition of their networks. One solution to this problem is to continuously audit the network by monitoring traffic. However, deep packet examination is expensive; a method that can identify applications without relying on payload is therefore desirable.

This logistical problem is complicated by evasion: both attackers and certain users have an active interest in hiding their network activities. The most notable example of this evasive behavior involves peer-to-peer applications. Several implementations of peer-to-peer tools incorporate encryption in order to evade detection [1].

The focus of this effort is application identification on a network without reliance on payload or dependence on the port number. By developing behavioral tests, we can create accurate methods for classifying traffic and inventorying the network’s behavior, even when the payload or port number is untrustworthy. In addition, these techniques may lead to engineering insights about how to construct a network that is unfriendly to attack.

Technical Approach

The premise of this approach is that an application’s design has definite goals that influence its behavior over a network. For example, given that bandwidth is a finite resource and often constrained at the endpoints of a network, an application will not use more bandwidth than is necessary to accomplish its task. Therefore, a chat application (such as AOL Instant Messaging, Internet Relay Chat, or Zephyr Online) will not use a 64 KB packet to send a 10-character message.

For our analyses, we use network flow data. The flows are divided into various flow classes using the classification tree in Figure 1; these classes define what type of information we can reasonably acquire from a flow. For example: in a Short Flow, there is little, if any, payload, but the flag combinations are less ambiguous than in a message or a file transfer.
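As a rough illustration of how flows might be sorted into such classes, the following Python sketch assigns each flow to one of three classes. The `Flow` fields, the class labels, and both thresholds are assumptions made for this example; the actual tree in Figure 1 is not reproduced here and may use different criteria.

```python
from dataclasses import dataclass

# Hypothetical thresholds chosen for illustration only; they are not
# taken from the classification tree in Figure 1.
SHORT_FLOW_MAX_PACKETS = 3
FILE_TRANSFER_MIN_BYTES = 1_000_000


@dataclass(frozen=True)
class Flow:
    """A simplified flow record (field names are assumptions)."""
    src: str
    dst: str
    dst_port: int
    packets: int
    payload_bytes: int


def classify_flow(flow: Flow) -> str:
    """Return a coarse flow class: 'short', 'message', or 'file_transfer'."""
    # Very few packets: little or no payload, mostly TCP flag information.
    if flow.packets <= SHORT_FLOW_MAX_PACKETS:
        return "short"
    # Large payload volume: treat as a bulk file transfer.
    if flow.payload_bytes >= FILE_TRANSFER_MIN_BYTES:
        return "file_transfer"
    # Everything else carries a modest amount of payload.
    return "message"
```

Note that the destination port is carried in the record but never consulted, in keeping with the goal of classifying without depending on the port number.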

The classified flows are fed into various behavioral tests that are calibrated to identify existing applications. For example, such a test might determine the number of client attempts to access a service that result in a failed connection (that is, a SYN packet without any payload-bearing interaction). Automated connections for services such as Simple Mail Transfer Protocol (SMTP) or BitTorrent are more likely to generate failed connections than a user-driven application such as web browsing.
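The failed-connection test described above can be sketched in Python as follows. This is a minimal illustration, not the tooling used in the study: the `FlowRecord` layout and the function names are hypothetical, and a "failed connection" is approximated as a flow that carries a SYN flag but no payload bytes.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, NamedTuple


class FlowRecord(NamedTuple):
    """A simplified flow record (layout is an assumption)."""
    src: str
    dst: str
    tcp_flags: FrozenSet[str]  # flag names seen in the flow, e.g. {"SYN", "ACK"}
    payload_bytes: int


def is_failed_connection(rec: FlowRecord) -> bool:
    """Approximation: a SYN was sent but no payload was ever exchanged."""
    return "SYN" in rec.tcp_flags and rec.payload_bytes == 0


def failed_connection_rate(records: Iterable[FlowRecord]) -> Dict[str, float]:
    """Return, per source address, the fraction of connection attempts that failed."""
    attempts: Dict[str, int] = defaultdict(int)
    failures: Dict[str, int] = defaultdict(int)
    for rec in records:
        if "SYN" not in rec.tcp_flags:
            continue  # not a connection attempt; ignore
        attempts[rec.src] += 1
        if is_failed_connection(rec):
            failures[rec.src] += 1
    return {src: failures[src] / attempts[src] for src in attempts}
```

A source whose rate is unusually high would be a candidate for an automated service such as BitTorrent or SMTP; the cutoff that separates "high" from "normal" would have to be calibrated on real traffic.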

Figure 1: Flow Classification Tree

Figure 2 shows this behavior in depth, comparing the probability that a high rate of failed connections indicates BitTorrent traffic rather than FTP, HTTP, or SMTP traffic. As the figure shows, BitTorrent is easily distinguished from both FTP and HTTP traffic, while SMTP traffic is not as easily distinguished.

Figure 2: Percentage of Failed Connections in Several Common Protocols vs. BitTorrent

2006 Accomplishments

In 2006, CERT developed a small behavioral dictionary for identifying BitTorrent clients using this methodology [2]. The resulting system was tested on netflow data collected on a large (multiple /8) network instrumented by CERT. The resulting technique was highly effective.

Figure 3 shows the results of this decision mechanism in depth. It compares the results of using a multiple-vote classification mechanism for distinguishing BitTorrent from a limited set of applications: email, HTTP, and FTP. Email and HTTP were chosen on the grounds that they were the two most common services crossing the client’s network. FTP was chosen on the assumption that, as a protocol primarily concerned with transferring files, it would closely resemble BitTorrent. The tests examined the estimated bandwidth during large file transfers (BitTorrent should presumably be slower than the single-server protocols), the number of failed connections (as above), a comparison of message sizes, and the maximum amount of data transferred in a single session.

As this figure shows, the resulting mechanism is capable of distinguishing between BitTorrent and other applications with a high degree of success: 90% of BitTorrent applications are identified by at least two tests, while nearly 100% of the other applications are eliminated after two affirmative tests. This mechanism, which involves no payload and which does not rely on the port number to identify the application, is consequently a feasible method for distinguishing applications.
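A multiple-vote decision of this kind can be sketched in Python as below. The sketch is illustrative only: the statistic names and threshold values are hypothetical, and only the two tests whose direction is stated in the text (slower transfers, more failed connections) are shown. The message-size and maximum-transfer tests would be added to the list in the same way.

```python
from typing import Callable, Dict, Sequence

HostStats = Dict[str, float]
BehaviorTest = Callable[[HostStats], bool]

# Hypothetical thresholds; real values would be calibrated on labeled traffic.
FAILED_CONN_RATE_MIN = 0.4
TRANSFER_KBPS_MAX = 100.0


def high_failed_connections(stats: HostStats) -> bool:
    """Vote 'BitTorrent' when many connection attempts fail."""
    return stats["failed_conn_rate"] > FAILED_CONN_RATE_MIN


def slow_bulk_transfer(stats: HostStats) -> bool:
    """Vote 'BitTorrent' when large transfers are slower than expected."""
    return stats["transfer_kbps"] < TRANSFER_KBPS_MAX


def count_votes(stats: HostStats, tests: Sequence[BehaviorTest]) -> int:
    """Number of tests that vote affirmatively for BitTorrent."""
    return sum(1 for test in tests if test(stats))


def looks_like_bittorrent(
    stats: HostStats, tests: Sequence[BehaviorTest], min_votes: int = 2
) -> bool:
    """Flag a host only when at least `min_votes` tests agree."""
    return count_votes(stats, tests) >= min_votes


TESTS = [high_failed_connections, slow_bulk_transfer]
```

Requiring two affirmative votes mirrors the operating point discussed above: a single test firing is treated as insufficient evidence, which trades a small loss in detection for far fewer false positives.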

CERT has developed similar behavioral tests for identifying and distinguishing spam. The behavioral methodology is very attractive for identifying spam sources as such efforts involve distinguishing abuse of a service from the service itself, meaning that the port number is no longer a relevant detection mechanism.

Figure 3: Integrated ROC Curve for Identifying BitTorrent

2007 Plans

In 2007, the CERT Network Situational Awareness Group plans to expand this methodology in two ways. First, we will expand the behavioral “dictionary” by examining and testing other forms of classification mechanisms. The methods used so far in this work identify file-transfer protocols (e.g., BitTorrent, FTP, SMTP, and HTTP). Other protocols (e.g., AIM and RealVideo) will not be as easily identified using the current body of tests. In addition, as more tests are developed they will naturally overlap and contradict each other; consequently, a more sophisticated decision-making mechanism will be required to identify applications.

A second avenue for research is examination of the impact that these behavioral filtering mechanisms will have on application design. For example, identifying and filtering attributes that are part of the BitTorrent protocol will naturally encourage developers to modify the protocol in order to evade detection. It is conceivable that certain behaviors may be constrained to the point that the protocol is no longer attractive for its target audience. This is a particularly relevant question with spam, where a definite profit motive can be attached to sending spam emails.

References

[1] TorrentFreak Weblog. Encrypting BitTorrent to take out Traffic Shapers. http://torrentfreak.com/encrypting-BitTorrent-to-take-out-traffic-shapers (2006).

[2] Collins, M. & Reiter, M. “Finding Peer-To-Peer File-sharing Using Coarse Network Behaviors.” Proceedings of the 2006 European Symposium on Research in Computer Security (ESORICS) Conference. Hamburg, Germany, Sept. 18-20, 2006. Lecture Notes in Computer Science 4189, Berlin/Heidelberg, Germany: Springer, 2006. http://www.springerlink.com/content/u265q75n3568.



Last updated May 5, 2007