Botnet paper in NDSS

Saturday, March 1st, 2008

Another paper from this year’s NDSS is being discussed in blogs. It looks like academic conferences do get attention from industry and writers. Moreover, these researchers have released their tool for anybody to try out. This rarely happens, and it is definitely a good thing. The software integrates into an existing IDS setup.

Can botnets be beaten?

New Internet attacks

Saturday, March 1st, 2008

It looks like the DNS infrastructure is the next threat to the Internet. New attacks that leverage DNS are being disclosed at an enormous pace. Here is an NDSS paper titled

Corrupted DNS Resolution Paths: The Rise of a Malicious Resolution Authority

Paper reading – dataflow attacks

Sunday, February 24th, 2008

Application security is changing its focus now that the buffer overflow attack problem has been solved. Indeed, most up-to-date operating systems have a defense mechanism, such as address space randomization in Red Hat Linux and non-executable pages in Windows Vista. A typical buffer overflow attack aims at hijacking control of an application. On the other hand, there are dataflow attacks that do not overwrite any control-sensitive data structures. Instead, they modify the application’s state so that it continues its normal execution but in a new environment. For example, if a dataflow attack updates the variable that holds the user authentication status, then the program will think that the attacker has logged in successfully.
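To make the authentication example concrete, here is a minimal simulation of such an attack. It is only a sketch: the memory layout (an 8-byte input buffer followed by a one-byte flag) and the names make_memory, unsafe_copy and is_authenticated are hypothetical, and a bytearray stands in for the process memory that a real C program would expose.

```python
BUF_SIZE = 8  # assumed size of the input buffer


def make_memory():
    # Bytes 0..7 hold the input buffer, byte 8 holds the "authenticated" flag.
    # Everything starts as zero, so the user is not logged in.
    return bytearray(BUF_SIZE + 1)


def unsafe_copy(memory, data):
    # Mimics a copy that does not check the buffer size: bytes beyond
    # BUF_SIZE spill into whatever is stored next to the buffer.
    for i, byte in enumerate(data):
        if i < len(memory):
            memory[i] = byte


def is_authenticated(memory):
    # The program trusts the flag; no return address or function pointer
    # is involved, so control flow is never hijacked.
    return memory[BUF_SIZE] != 0


memory = make_memory()
unsafe_copy(memory, b"alice")        # fits in the buffer, flag untouched
print(is_authenticated(memory))      # False
unsafe_copy(memory, b"A" * 9)        # one byte too long, flag overwritten
print(is_authenticated(memory))      # True
```

The program keeps running its normal code path after the overflow, which is why defenses that only watch control data do not notice anything.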

The claim is that these attacks have not been seen in the wild, but how would we know? There is no practical method to detect such attacks. The approaches in the papers that I am reviewing in this post have a high overhead. Buffer overflow attacks taught us an important lesson: a research prototype with a high overhead will not make it to the market. You have to improve its performance, really. As a result, the proposed tools for signature generation, program rollback, etc. are in the research phase, whereas low-overhead attack detection systems are in production.

  • S. Chen, J. Xu, E. C. Sezer, P. Gauriar, and R. K. Iyer. Non-Control-Data Attacks Are Realistic Threats.
  • S. Bhatkar, A. Chaturvedi, R. Sekar. Dataflow Anomaly Detection.
  • M. Castro, M. Costa, T. Harris. Securing software by enforcing data-flow integrity.

The first paper is a visionary explanation of looming threats. It gives a number of examples of dataflow attacks:

  1. Format String Attack against User Identity Data
  2. Heap Corruption Attacks against Configuration Data – this attack overwrites the name of a utility defined in a config file. The program invokes that utility at a certain time. Modifying the name allows the attacker to execute an arbitrary program.
  3. Stack Buffer Overflow Attack against User Input Data – very nice example! A popular HTTP daemon, ghttpd, that we have also used in our research is attacked to overwrite a stack value that gets loaded into a register in the function epilogue.
  4. Integer Overflow Attack against Decision-Making Data

Sekar et al. address the dataflow attack problem using behavior models. In a few words, they first learn which arguments are used in syscalls during normal execution. When the program uses unexpected arguments, an attack is detected. For example, if the program has been using a utility called /usr/bin/login and out of nowhere it tries to execute /bin/sh, then this is certainly an attack.
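The learning idea can be sketched in a few lines. This is not the model from the paper, which is considerably richer; it is a simplified illustration that only remembers the exact argument values seen per syscall. The class name SyscallArgumentModel and its methods are hypothetical.

```python
from collections import defaultdict


class SyscallArgumentModel:
    """Remembers which argument values each syscall used during training."""

    def __init__(self):
        # syscall name -> set of argument values observed in normal runs
        self.seen = defaultdict(set)

    def learn(self, syscall, argument):
        # Training phase: record what normal execution looks like.
        self.seen[syscall].add(argument)

    def is_anomalous(self, syscall, argument):
        # Detection phase: anything never seen in training is flagged.
        return argument not in self.seen.get(syscall, set())


model = SyscallArgumentModel()
model.learn("execve", "/usr/bin/login")
print(model.is_anomalous("execve", "/usr/bin/login"))  # False
print(model.is_anomalous("execve", "/bin/sh"))         # True
```

A model this strict would raise false alarms for any legitimate argument missing from the training runs, which is one reason a practical system has to generalize over what it observes.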

I did not realize the power of learning methods until reading this paper.

The third paper aims at detecting dataflow attacks using static analysis methods. As always, static analysis shows its limitations. At compile time, each memory location is modeled and the source code statements that access it are recorded. That is, the analysis builds a map from each memory location to the addresses of the instructions that access it. At run time, the instrumented program checks that this relationship holds. If it does not, then this is an attack. However, this approach is inferior to the learning approach, as it does not detect the register attack, in which an overwritten value is popped off the stack into a register.
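The run-time check can be pictured roughly as follows. This is a simplified sketch based only on the description above, not the instrumentation from the paper; the map contents, the instruction labels such as "login:42", and the class DataFlowIntegrityMonitor are all hypothetical.

```python
class DataFlowIntegrityMonitor:
    """Checks writes against a map produced ahead of time by static analysis."""

    def __init__(self, allowed_writers):
        # memory location -> set of instructions allowed to write to it
        self.allowed_writers = allowed_writers
        self.violations = []

    def check_write(self, location, instruction):
        # A write is legitimate only if the static analysis predicted it.
        allowed = instruction in self.allowed_writers.get(location, set())
        if not allowed:
            self.violations.append((location, instruction))
        return allowed


# Hypothetical result of the compile-time analysis.
allowed = {
    "input_buffer": {"read_input:10"},
    "is_admin": {"login:42"},
}
monitor = DataFlowIntegrityMonitor(allowed)
print(monitor.check_write("input_buffer", "read_input:10"))  # True
print(monitor.check_write("is_admin", "read_input:10"))      # False
```

The second call models an overflow in which the input-reading code writes to the authentication flag, a write the static map never allowed.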

Research library update

Saturday, February 23rd, 2008

While reading new research papers, I have realized that the papers I reviewed last year are still very relevant. The reviews were posted on NearTime, a web 2.0 communications site, which started charging a fee. I had archived the content, and my account got deleted soon after that. I have restored my reviews, this time in CiteULike, a free public library.

It is quite interesting how digitizing information affects people’s lives. When I was a graduate student, I was reading a lot of papers. I printed them, then wrote notes on them with a pen. When I was relocating, I left that beautiful collection in the lab. Could I have taken a hundred papers with me? I doubt I could. In a few words, those reviews are lost. Here is an overview of that paper-based library. The papers were categorized using a number of tags.

The conclusion is that digitizing thoughts, that is, blogging, is essential for a researcher. Build digital libraries rather than paper-based ones.

Web users – beware

Monday, February 18th, 2008

A follow-up to a recent post on DNS rebinding attacks, which are able to infect an unsuspecting user watching a Flash ad. Here is a Google paper describing other threats.

InformationWeek

Collection of security and dependability papers

Sunday, February 17th, 2008

I have found a very comprehensive collection of security and dependability papers at Ohio State. I decided to use Exhibit to represent this collection. Here is the result. I am going to add papers that I have read. Also, it is an interesting experiment in merging different but similar services together, in this case Exhibit and CiteULike. Basically, Exhibit is a reference desk, whereas CiteULike is a social framework for exchanging opinions.

I guess D. E. Knuth’s description of the differences between humanities and engineering libraries is appropriate. A library has the two main components mentioned above. The difference is in the relative position of these components: working desks are placed around the reference desk, or vice versa. A computer scientist works with a large number of recent articles, whereas people in the humanities work with a few reliable books. In order to find out a book’s worthiness, you have to read reviews. This is when you need CiteULike, which has a lot of reviews. A computer scientist needs to find relevant papers in a given area of concentration, for example Host Security. This is when Exhibit is really useful.

Knuth’s theory matches real life perfectly. CiteULike is popular among people in traditional sciences such as Math and Chemistry. It does have a few Computer Science works, but this area is not the most active one in CiteULike. On the other hand, Exhibit is very convenient for a computer scientist.

Paper reading – Application security

Saturday, February 16th, 2008
  • W. Cui, M. Peinado, H. J. Wang, M. E. Locasto. ShieldGen: Automatic Data Patch Generation for Unknown Vulnerabilities with Informed Probing.
  • Y. Younan, D. Pozza, F. Piessens, W. Joosen. Extended protection against stack smashing attacks without performance loss.
  • H. Wang, S. Jha, V. Ganapathy. NetSpy: Automatic Generation of Spyware Signatures for NIDS.
  • H. Chen, F. Hsu, J. Li, T. Ristenpart, Z. Su. Back to the Future: A Framework for Automatic Malware Removal and System Repair.

The first two papers discuss stand-alone binaries whereas the remaining papers describe how to deal with malicious browser plugins.

W. Cui et al. discuss how to build a signature using knowledge of the protocol format. The idea is that, given an attack instance, it is useful to generate its variations and see whether each of them is an attack as well. Protocol knowledge makes it possible to experiment with individual fields. After enough probes are generated, a generalizing signature is created. The prerequisite of this work is a protocol description created either manually or automatically. I guess this is an interesting application for the CMU work on automatic protocol format generation.
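The probing loop can be sketched as follows. This is a rough illustration of the idea rather than the algorithm from the paper: the message is modeled as a dictionary of already-parsed fields, the candidate values and the is_attack oracle are hypothetical, and a field is generalized to a wildcard only when every probed variation is still an attack.

```python
def generalize_by_probing(attack, candidates, is_attack):
    """Vary one field at a time and keep only the fields that matter."""
    signature = {}
    for field, value in attack.items():
        # Build probes that differ from the attack in this field only.
        probes = [
            {**attack, field: other}
            for other in candidates.get(field, [])
            if other != value
        ]
        if probes and all(is_attack(probe) for probe in probes):
            signature[field] = "*"      # the field does not affect the attack
        else:
            signature[field] = value    # the field is part of the signature
    return signature


# Hypothetical oracle: the flaw triggers on a long body sent to /vuln.
def oracle(message):
    return message["path"] == "/vuln" and message["length"] > 1024


attack = {"method": "GET", "path": "/vuln", "length": 5000}
candidates = {
    "method": ["GET", "POST"],
    "path": ["/vuln", "/index"],
    "length": [5000, 10],
}
print(generalize_by_probing(attack, candidates, oracle))
```

With this toy oracle, only the method field turns into a wildcard; a real system would presumably describe the length field with a condition rather than the exact value.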

The buffer overflow attack paper tries to prevent attacks using a separate stack. However, there was previous work on separating data and control stacks. Finally, address space randomization (ASR) is a generalization of these two approaches. Both of them have low overhead.

The malware papers follow the industry standard, I guess. That is, there is already a huge number of anti-malware programs out there. Do these papers improve any of those programs? The automatic signature generation paper takes a different approach toward signature generation. It records packets while the system is working normally; if it later finds a difference, an attack has been detected. Instead of using just one attack instance as a signature, it tries to build a generalized signature. To achieve this, it takes a number of samples and applies the LCS algorithm to find commonalities. For example, if malware logs the URLs that the user has entered, its packets would look as follows:

GET http://track.malware.com/index.php?url=http://google.com

That is, the user is trying to load Google and the malware has captured this. If the user loads another page, the malware will issue a packet with a different url parameter. The generalization algorithm will replace the user input with a wildcard, thus generating the final signature:

GET http://track.malware.com/index.php?url=*
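The generalization step can be approximated with the standard library. This is only a sketch: difflib.SequenceMatcher stands in for the LCS step (it finds long common runs, not a true longest common subsequence), the min_run threshold and the function names are hypothetical, and packets are assumed not to contain a literal asterisk.

```python
import re
from difflib import SequenceMatcher

WILDCARD = "*"


def generalize_pair(a, b, min_run=4):
    """Keep the long runs shared by both packets, wildcard everything else."""
    blocks = SequenceMatcher(None, a, b, autojunk=False).get_matching_blocks()
    signature = []
    end_a = end_b = 0  # where the last kept run ended in each packet
    for block in blocks:
        if block.size < min_run:
            continue  # short accidental matches stay inside a wildcard
        if block.a > end_a or block.b > end_b:
            signature.append(WILDCARD)
        signature.append(a[block.a:block.a + block.size])
        end_a, end_b = block.a + block.size, block.b + block.size
    if end_a < len(a) or end_b < len(b):
        signature.append(WILDCARD)
    return "".join(signature)


def matches(signature, packet):
    # Translate the wildcard signature into a regular expression.
    pattern = ".*".join(re.escape(part) for part in signature.split(WILDCARD))
    return re.fullmatch(pattern, packet) is not None


first = "GET http://track.malware.com/index.php?url=http://google.com"
second = "GET http://track.malware.com/index.php?url=http://example.org"
signature = generalize_pair(first, second)
print(signature)
print(matches(signature, "GET http://track.malware.com/index.php?url=http://wikipedia.org"))
```

Because both samples happen to share the http:// scheme of the logged URL, the sketch keeps that prefix in the signature; more varied samples would be needed to push the wildcard all the way back to the url parameter.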

I am wondering if our protection tool could supply a URL consisting of a wildcard to the malware, as if the user had entered this URL. After that, the malware packet will have this wildcard character in place of the user input. Thus, we will get wildcards in the proper places, which gives us a generalized signature immediately.
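This idea can be sketched as follows. It is a thought experiment in code, not a tested defense: observe_packet is a hypothetical hook that feeds a URL to the monitored system and returns the packet the malware sends, and the marker URL is an arbitrary value assumed never to occur in normal traffic.

```python
MARKER = "http://marker-7f3a9c.invalid/"  # hypothetical, unlikely to occur naturally


def signature_by_probing(observe_packet, marker=MARKER):
    """Feed a recognizable URL to the malware and wildcard it in the output."""
    packet = observe_packet(marker)
    if marker not in packet:
        # The malware encoded or dropped the input; this trick does not apply.
        return None
    return packet.replace(marker, "*")


# Stand-in for real malware: it reports every URL the user enters.
def fake_malware(url):
    return "GET http://track.malware.com/index.php?url=" + url


print(signature_by_probing(fake_malware))
# GET http://track.malware.com/index.php?url=*
```

A single probe is enough here, whereas the sample-based approach needs several packets; the obvious weakness is malware that transforms the URL before sending it.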

Paper reading: web vulnerabilities

Friday, February 15th, 2008
  • D. Balzarotti, M. Cova, V. V. Felmetsger, and G. Vigna. Multi-Module Vulnerability Analysis of Web-based Applications.
  • A. Moshchuk, T. Bragin, D. Deville, S. D. Gribble, and H. M. Levy. SpyProxy: Execution-based Detection of Malicious Web Content.
  • S. Chen, D. Ross, Y.-M. Wang. An Analysis of Browser Domain-Isolation Bugs and A Light-Weight Transparent Defense Mechanism.

These papers are of interest not just for researchers but for Internet users in general. Similar to the DNS rebinding attacks discussed in the previous post, there are pitfalls that unsuspecting people are likely to fall into.

It looks like the Internet is becoming a dangerous place to hang out. Years ago, when applications were distributed on CDs and floppies, there were a number of safety issues such as viruses. However, anti-virus software was available. If you used it, you were on the safe side. The threats were more obvious, though. For example, if a program tried to format your hard drive, that was a virus for sure.

The goal of today’s malware is stealing information. Because of that, modern viruses are called spyware. DNS rebinding tries to access documents on your intranet, while a domain-isolation attack tries to read private information displayed in another frame of the browser. It is often impossible to tell whether information is leaking or the user is simply typing a password to log in to a legitimate site. The protection techniques are different as well. Filtering out viruses was signature-based, whereas it is impossible to generate a signature that covers every password. Thus, behavior-based methods are used. In practice, a system implementing these techniques becomes paranoid, which makes it a tough sell. It looks like most vendors decided to play nice with customers, leaving them unprotected against these new threats.

What scares me, though, is the low level of public awareness of these new attacks. Without reading these high-tech papers written for academic researchers, I would hardly have become aware of these new threats either.

The first two papers improve on previous work that the authors published last year. In addition, these papers elaborate on how malware behaves. It often exploits vulnerabilities in additional software, such as an image rendering library, rather than in the browser itself. In one example, as the image is rendered in the browser, SpyProxy detects an unauthorized creation of ten helper processes. Just think that you can visit a social web site, view a profile and get infected. It is worse than the flu.

Another example of a vulnerable target is a complex Web application such as a content management system. It is possible to skip authentication and jump directly to a user’s private page.

The domain isolation attacks exploit the tabbed browsing implementation. Imagine a user opening a bank’s web page in one frame and a malicious site in another. The latter can execute JavaScript in the former’s frame to steal information from that window, for example.

Despite the high-tech nature of these threats, there are rules of thumb that will help you stay on the safe side. First, do not waste your time on social sites, let alone look at other people’s profiles! Also, do not use tabbed browsing. Open a web site, do what you need to do, close it, then move on to the next. If you found an interesting link on that web page, write it down on a piece of paper. Who said that our society is paper-free? Of course, everybody needs those pesky NoteIt stickers.

Paper reading – DNS rebinding and pharming

Wednesday, February 13th, 2008

Here are a number of very interesting papers that describe DNS rebinding and pharming attacks. Surprisingly, this type of attack has eluded my attention. These attacks look very dangerous, though. In a few words, DNS rebinding allows an attacker to get a resource from behind the firewall if a victim merely visits a web site hosting a malicious advertisement, without even clicking it. That is, the attacker can craft a Flash advertisement, sign a contract with an ad network and off (s)he goes.

  • C. Jackson, A. Barth, A. Bortz, W. Shao, D. Boneh. Protecting Browsers from DNS Rebinding Attacks.
  • C. K. Karlof, U. Shankar, D. Tygar, D. Wagner. Dynamic pharming attacks and the locked same-origin policies for web browsers.
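One way to picture the rebinding problem is as a hostname that first resolves to a public address and later to an internal one. The sketch below flags that pattern using Python's ipaddress module. It illustrates the general idea only and is not the defense proposed in either paper; the function names and the sample addresses are hypothetical.

```python
import ipaddress


def is_internal(address):
    # Private, loopback and link-local ranges all count as "behind the firewall".
    ip = ipaddress.ip_address(address)
    return ip.is_private or ip.is_loopback or ip.is_link_local


def rebinding_suspected(resolutions):
    """resolutions: addresses one hostname resolved to, in chronological order."""
    seen_public = False
    for address in resolutions:
        if is_internal(address):
            if seen_public:
                return True  # a public name now points into the intranet
        else:
            seen_public = True
    return False


print(rebinding_suspected(["8.8.8.8", "192.168.1.10"]))   # True
print(rebinding_suspected(["8.8.8.8", "8.8.4.4"]))        # False
print(rebinding_suspected(["10.0.0.5", "10.0.0.6"]))      # False
```

The last call shows a purely internal hostname, which is left alone because it never appeared as a public name.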

Reading papers from Carnegie Mellon

Monday, February 11th, 2008
  • J. Caballero, H. Yin, Z. Liang, D. Song. Polyglot: Automatic Extraction of Protocol Message Format using Dynamic Binary Analysis.
  • H. Yin, D. Song, M. Egele, C. Kruegel, and E. Kirda. Panorama: Capturing System-wide Information Flow for Malware Detection and Analysis.
  • D. Brumley, J. Caballero, Z. Liang, J. Newsome, D. Song. Towards Automatic Discovery of Deviations in Binary Implementations with Applications to Error Detection and Fingerprint Generation.
  • J. Caballero, S. Venkataraman, P. Poosankam, M. G. Kang, D. Song, A. Blum. FiG: Automatic Fingerprint Generation.

These papers address the following questions: given a significant number of implementations of a network protocol, for example HTTP, DNS, etc., what are the differences between these implementations? Are these implementations complete? How can one tell remotely which software is on the other side?

The answers are quite comprehensive, but first I would like to evaluate their practical impact. It is well known that different implementations do not look the same. Moreover, there are proprietary versions of the same specification. Discovering this simple truth is not the goal of those papers. What is the goal then? In addition, I see nothing wrong with an implementation skipping certain extensions. A protocol specification is a lengthy document, whereas a piece of software addresses the needs of a specific community. Do you expect MiniHTTP to fully implement the protocol? As a protocol evolves, optional extensions are added. Do you expect Apache to adopt those extensions immediately?

There are two parts to protocol analysis. The first is studying individual messages, as the CMU group does. The second is trying to build a finite state automaton, and this is what Ras Bodik is doing at Berkeley. This latter part looks more important to me.
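To show what the automaton side might look like, here is a toy sketch that learns which message may follow which from observed sessions. It has nothing to do with the actual Berkeley work; the state is simply the last message type seen, the SMTP-like message names are an assumed example, and build_automaton and accepts are hypothetical names.

```python
def build_automaton(traces, start="START"):
    """Learn allowed transitions; the state is the previous message type."""
    transitions = {}
    for trace in traces:
        state = start
        for message in trace:
            transitions.setdefault(state, set()).add(message)
            state = message
    return transitions


def accepts(transitions, trace, start="START"):
    # A session is accepted if every step was seen in some training session.
    state = start
    for message in trace:
        if message not in transitions.get(state, set()):
            return False
        state = message
    return True


sessions = [
    ["HELO", "MAIL", "RCPT", "DATA", "QUIT"],
    ["HELO", "MAIL", "RCPT", "RCPT", "DATA", "QUIT"],
]
automaton = build_automaton(sessions)
print(accepts(automaton, ["HELO", "MAIL", "RCPT", "RCPT", "RCPT", "DATA", "QUIT"]))  # True
print(accepts(automaton, ["HELO", "DATA"]))                                          # False
```

Even this crude model captures something the message-level analysis cannot: the order in which messages are allowed to appear.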