A first impression of Google's Skipfish scanner for web applications

According to a Google security blog post by developer Michal Zalewski, Google's new, free Skipfish scanner is designed to be fast and easy to use while incorporating the latest in cutting-edge security logic. Felix 'FX' Lindner examines Skipfish to see how well it compares to other tools used to check web site integrity.
When checking the security of web applications, developers use automated scanners to gain an overview that can then be refined by further manual testing. Depending on the user's requirements and expertise, this may involve minimalist basic tools such as Nikto or comprehensive commercial software such as Rational AppScan. The main aim of such a scan is to reveal configuration and implementation flaws in the interaction between web servers, application servers, application logic and other components of modern web applications. Typical vulnerabilities detected this way include SQL injection, cross-site scripting, cross-site request forgery and code injection.

The curtain goes up

Skipfish, released in March, supports all the features one could wish for when doing a generic web page scan: it can handle cookies, it can process authentication details and the values of HTML form variables, and it can even use a single session token to navigate the target site as an authenticated user. One of Skipfish's specialities is running through all possible file and directory names in order to detect items like script back-ups the admin has forgotten about, compressed archives of entire web applications, or SSH/subversion configuration files that may have accidentally been left behind. As such items can only be tracked down by trial and error, the scanner combines various known file extensions with all the file names it detects by checking the normal links on the web page.
It also tries out a few hand-picked keywords (probably extracted from Google's search index) as directory and file names. Skipfish is especially noteworthy in this respect because it actually establishes and checks all possible combinations. This means that every keyword will be combined with every file extension and every actual file found on the web server, and that the result will be tested as a file name, a directory name, and as an argument for an HTTP POST request. This approach generates a very large number of combinations which could prove overwhelming. Thankfully, Skipfish provides predefined dictionaries of varying sizes, allowing users to determine the extent of the request flood generated. However, even the minimal starter dictionary recommended by Zalewski includes 2,007 entries and produces about 42,000 requests for each directory tested.
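The following shell sketch merely illustrates the principle of this permutation and is not taken from Skipfish's code; the keywords and extensions are made-up examples:

for word in admin backup test; do                # dictionary or learned keywords
  for ext in "" .bak .old .zip .tar.gz; do       # known extensions, "" tests the bare name
    echo "probe: /$word$ext"                     # each combination becomes at least one request
  done
done

With just 3 keywords and 5 extensions this already yields 15 probes; at the scale of the 2,007-entry starter dictionary, tens of thousands of requests per directory are the logical consequence.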
To gain an impression of Skipfish, we briefly investigated a few typical scenarios: we used Microsoft Internet Information Services (IIS) 7.5 under Windows 7 with ScrewTurn Wiki 3.0.2 for ASP.NET as our typical interactive web application. A Linux-based Apache 2.2.3 server with traditional CGI scripts written in Perl represented the more dated approach. An old HP LaserJet printer with ChaiServer 3.0 was used as a typical cause of false alarms and scanning problems.

Practical use

Skipfish is only available as source code and must be compiled for the target platform. This requires the libidn library for encoding and decoding Internationalised Domain Names (IDN) to be installed – for example, under Ubuntu 9.10 this is easily done by running sudo apt-get install libidn11-dev in a terminal window. After successfully compiling the source code, users select a dictionary from the dictionary subdirectory and copy it to the directory which contains the Skipfish binary as skipfish.wl. Alternatively, the path can also be added as an option. It should be noted that Skipfish will automatically extend this dictionary file, so it is always advisable to use a copy.
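On a Debian or Ubuntu system the preparation boils down to a handful of commands; the sketch below assumes the starter wordlist is called minimal.wl in the dictionaries subdirectory, a name that may differ between Skipfish versions:

# install the IDN library needed at compile time (Ubuntu/Debian)
sudo apt-get install libidn11-dev
# build the scanner in the unpacked source directory
make
# always work on a copy, because Skipfish extends the dictionary it uses
cp dictionaries/minimal.wl skipfish.wl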
./skipfish -o example-log http://www.example.com/ starts the server scan and places a scan report in the example-log directory once the scan is completed. To check only the "blog" subdirectory on the server, enter ./skipfish -I /blog -o blog-report http://www.example.com/blog.
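To make use of the authenticated crawling mentioned earlier, a session token or login credentials can be passed on the command line. The following sketch assumes the -C (append cookie) and -A (HTTP authentication) switches of the tested version; the cookie name, value and credentials are placeholders:

# replay an existing session cookie so the scanner navigates the site as a logged-in user
./skipfish -C "ASP.NET_SessionId=0123456789abcdef" -o wiki-report http://www.example.com/wiki/
# or supply HTTP basic authentication credentials instead
./skipfish -A admin:secret -o wiki-report http://www.example.com/wiki/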
[Screenshot: Once started, Skipfish provides network statistics, but these are of no specific benefit to users.]
Skipfish produces an enormous number of requests which it processes simultaneously at an impressive speed. During the first run in a local Gigabit Ethernet segment against a sufficiently scaled system, it achieved 2,796 HTTP requests per second. This immediately pushed the load of one of the scanning system's CPU cores to 100 per cent. Although the Linux TCP/IP stack can handle this load, it needs to utilise all available means to do so, as quickly becomes evident when looking at the netstat output.
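A simple way to observe this on the scanning machine is to tally the TCP connection states while a scan is running; the one-liner below uses standard Linux tools and is independent of Skipfish itself:

# count TCP connections per state (ESTABLISHED, TIME_WAIT, ...) during the scan
netstat -ant | awk 'NR>2 {print $6}' | sort | uniq -c | sort -rn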
Once a scan has been started there is no indication of how long Skipfish will actually take to complete it. In our test, we had to wait more than four hours for the scanner to complete its check of the neighbouring IIS 7.5 with the ScrewTurn wiki. In the process, the scanner transferred 68GB of data to send 40 million HTTP requests. One of the scanning system's CPU cores was working to full capacity almost all of the time, and Skipfish used 140MB of working memory when combing through the wiki's moderate 1,195 files in 43 directories. At the same time, IIS used the full capacity of two Intel Xeon 2.5GHz cores as well as a further 100MB to 500MB of working memory, producing a log file of 1.5GB in its default configuration.

Interpretation

By way of an experiment, we disabled the dictionary function for checking the Apache web server. However, Skipfish was still allowed to learn new words during the scan, which produced 304 new entries. This reduced the scanning time to 20 minutes, during which 300,000 HTTP requests were made, generating 232MB of network traffic.
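One way to reproduce this configuration, assuming the tested version's -W switch for specifying a wordlist path, is to hand Skipfish an empty but writable wordlist, so that nothing is brute-forced at the outset while learned keywords can still be recorded:

# empty wordlist: no dictionary brute-forcing, but newly learned keywords are written back to it
touch empty.wl
./skipfish -W empty.wl -o apache-report http://www.example.com/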
Skipfish generally produces a relatively large number of results and saves them in the specified directory as HTML, JavaScript with JSON, and raw data files. Users can then view the report in a JavaScript-enabled browser or evaluate the raw data themselves. Unfortunately, the number of false alarms was considerably higher than that produced by tools such as Nikto or the Burp Suite, which we used for comparison. For instance, some regular ASCII text files were interpreted as JSON responses without XSSI (Cross Site Script Inclusion) protection. Skipfish attaches particular importance to well-formed MIME type and character set responses from the server. Every deviation causes an "increased risk" rating, but only some of these ratings actually have any substance.
Skipfish did not highlight the possibility of listing directory contents for any of the targets we tested. The contents of the robots.txt file, which can be of particular importance for identifying interesting server areas, are also left uncommented and presented to the user as "interesting file" results for individual interpretation.

The curtain falls

Four hours of scanning IIS 7.5 with the ScrewTurn wiki yielded 8 high risk results, 264 medium risk results, 55 low risk results, 123 warnings and 254 informational entries. The total data volume stored on disk was 851MB, and the web browser should have one gigabyte of working memory available to allow the results page to be viewed.
[Screenshot: All three SQL injection alerts for the Typo3 system prove to be false alarms when examined manually.]
Unfortunately, the 8 most important results – all presumed integer overflows in HTTP GET parameters – turned out to be false alarms. The next group of results, "Interesting Server responses", jumbles together HTTP 404 errors (resource not found) and HTTP 500 errors (internal server error), making it necessary to investigate the 130 displayed results manually, one by one. Since IIS displays a generic error message for HTTP 500 to avoid providing potential attackers with further information, the remaining requests would have to be individually correlated with the web server log data or retested manually. However, a look at the server logs reveals that this effort would be wasted because the flaw is always the same and has no security relevance. That false alarms are not the exception also became apparent during a subsequent analysis of a Typo3 system where Skipfish warned of a critical SQL injection hole.
It is worth mentioning that Skipfish detects the presence of an intrusion prevention system (IPS) and lists this under "Internal Warnings". During our tests, it detected the HttpRequestValidationException, which ASP.NET raises for very obvious SQL injection and cross-site scripting attacks and which results in an HTTP 500 error.

Evaluation

On the whole, Skipfish's results when scanning the IIS 7.5 and ScrewTurn combination weren't too impressive. However, it is worth pointing out that the scanner's brute-force dictionary approach discovered the /wiki/ URI by itself, allowing this area to be examined without any manual intervention.
When examining the Apache server and CGI scripts, and with the dictionary function restricted to learning only, Skipfish scored slightly better. There were only 47 medium risk results, 6 low risk results, 1 warning and 65 informational entries, which together required only 83MB on disk. Interestingly, Skipfish isn't capable of handling classic Perl-generated HTML forms. As a result, it doesn't enter any learned keywords into form fields for testing. However, the scanner correctly detects that the forms have no protection against Cross Site Request Forgery (CSRF) attacks.
Both the "Interesting File" category and the "Incorrect or missing MIME type" category list a URL at which flaws in the server configuration actually display the source code of a CGI script rather than launch the script itself. This result was achieved by combining file names and extensions (in this case an empty extension).
Incidentally, the printer scan didn't produce any results because we had to abort it after 18 hours with no end in sight. In such situations, it's an inconvenience that Skipfish only writes its findings report to the output directory once a scan has been completed or aborted via Control-C. There is no way of finding out what the tool has already detected while the scan is in progress.

Assessment

Skipfish excels at discovering forgotten data and provoking unexpected server behaviour. The permutation of keywords and file extensions as well as learned elements is very similar to the fuzzing process, producing similar results in terms of data volume.
In large computer centres, such as Google's, which harbour a very heterogeneous landscape of web applications and have access to almost unlimited processing resources and network bandwidth, using Skipfish in its present form is already bound to help discover unexpected functions and content.
However, using Skipfish makes less sense if there is direct access to a web server's file system – especially when scanning via an internet connection. The amount of data transmitted by the scanner is almost equivalent to a complete file system image, which makes it preferable to analyse an application's scripts and files on site and with well-established tools in one's own time.
[Screenshot: While Skipfish's report is very clear, the problems listed require considerable interpretation.]
Skipfish in its current form has only very limited professional uses. Investigating the test results, which is always required, soon becomes a mammoth task due to the flood of reports. Furthermore, results such as "Incorrect or missing charset", "Incorrect caching directives" or "Incorrect or missing MIME type" are very abstract and far from unambiguous. Even experienced pen testers will often need to do further research to find the actual cause of a problem. Consequently, Skipfish is completely unsuitable for non-professional users or "quick check-ups" and will probably prompt unjustified panic rather than provide reassurance.
In many cases, using the dictionary function is prohibitive simply because of the amount of data produced and the resulting load on the target system. A fully mature Java-based business web application would probably not stand up to a Skipfish scan even with the minimal dictionary, because the resulting CPU and main memory requirements would paralyse the system before the scan is completed. Firewalls with complex inspection modules and IDS/IPS systems would also drown the recipients of their log messages in extreme amounts of data or simply quit functioning.

http://www.h-online.com/security/features/Testing-Google-s-Skipfish-1001315.html