Shocker: Internet files are searchable!
Last week Google released a code search engine that makes it easier to search through programming code that has been made publicly available on web and FTP sites. It’s nothing new… other code-specific search engines like koders.com have been around for a while and it’s always been possible to use Google and other search engines to search specific file types for strings, which is really all this is.
However, Google code search has made this capability more public and the predictable has happened: a flood of articles in industry publications and blogs about how this “new” search can reveal vulnerabilities in applications, followed by predictions of how “hackers” will use it to launch a flood of exploits and accusations by security “experts” of irresponsible behavior on the part of Google. (By the way, if you want to have fun finding vulnerabilities in published code, Gadi Evron over at Securiteam is maintaining a list of queries, as is the Bugle project.)
Using search engines to find vulnerabilities has been well documented for years, yet every once in a while someone notices that search engines actually (gasp!) index the contents of publicly available files on the Internet and that (shock!) some of that content probably shouldn’t have been made public. Google and other search engines aren’t cracking into private intranets and password-protected file repositories… they are simply indexing files made accessible to the general public. That is, after all, what they do.
The real story, if there is one, highlighted by this new code search is that too many developers still do stupid things… like hard-coding passwords in source code and not validating input. Yet most stories are focusing on how Google code search makes such stupidity easier to find. You’d think that by now most people would understand what the Internet is and how search engines function. How is it possible to be surprised that files made publicly accessible via web, FTP, or anonymous CVS are found and indexed by search engines? Is this yet another security blind spot?
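For anyone wondering what the alternative to those two habits looks like, here is a minimal sketch in Python. It assumes the password is supplied through an environment variable and that usernames can be checked against a simple allow-list pattern; the variable name `APP_DB_PASSWORD`, both function names, and the pattern itself are made up for illustration and would need to fit your own application.

```python
import os
import re

# Hypothetical variable name, chosen for this example only.
PASSWORD_ENV_VAR = "APP_DB_PASSWORD"

# Hypothetical allow-list: short names made of letters, digits, underscore.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,32}")


def get_db_password(environ=os.environ):
    """Read the password from the environment instead of the source file."""
    password = environ.get(PASSWORD_ENV_VAR)
    if not password:
        raise RuntimeError(f"{PASSWORD_ENV_VAR} is not set")
    return password


def validate_username(value):
    """Return the value if it matches the allow-list pattern, else raise."""
    if not isinstance(value, str) or not USERNAME_PATTERN.fullmatch(value):
        raise ValueError("invalid username")
    return value
```

With something like this, a source file that ends up in a public repository contains only the name of the variable, not the secret, and input that does not match the expected shape is rejected before it reaches a query or a shell command.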
In some cases, surprise is understandable. People generally don’t think of CVS and other code repositories as being indexable by search engines… and many, many people put up public web and FTP servers for internal use and think search engines can’t find them because no web page links to the URL (of course, eventually someone always saves a link to it in del.icio.us or a similar public site, or a webmaster sees the link in their server’s HTTP referer log and follows it, etc.).
My favorite experience with search engines was when I worked at a certain large multinational. They were rolling out an intranet search engine for the first time and every night all the records in certain databases got deleted. No one could figure out who was doing it. It turned out that FileMaker Pro at that time had a “publish to web” function that automatically created a simple web-based front end. Unfortunately, management functions like “insert”, “modify” and “delete” were provided as standard HREF links on each page. Everyone assumed that only humans would visit intranet web pages and no employee would ever maliciously click “delete”, so no steps were taken to secure those functions. Each night the search engine crawler dutifully “clicked” each link it found, including “delete”, wiping out each database one record at a time.
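The usual way to avoid that kind of accident is to never expose a destructive action as a plain link, since a crawler following links only issues GET requests. The sketch below is not how FileMaker Pro worked or works; it is a simplified, hypothetical request handler in Python (the `RECORDS` dictionary, the `handle_request` function and its parameters are all invented for this example) that refuses to delete anything unless the request is a POST from an authenticated user.

```python
# Hypothetical in-memory "database" standing in for the published records.
RECORDS = {1: "alpha", 2: "beta"}


def handle_request(method, action, record_id, authenticated=False):
    """Return an (HTTP status, body) pair for a simplified request."""
    if action == "view":
        if record_id not in RECORDS:
            return 404, "not found"
        return 200, RECORDS[record_id]
    if action == "delete":
        # A crawler following HREF links sends GET, so it stops here.
        if method != "POST":
            return 405, "delete requires POST"
        # Even a POST must come from a logged-in user.
        if not authenticated:
            return 401, "authentication required"
        if RECORDS.pop(record_id, None) is None:
            return 404, "not found"
        return 200, "deleted"
    return 400, "unknown action"
```

A crawler that "clicks" a delete link against a handler like this gets an error back and the record stays where it is; only a deliberate, authenticated form submission removes it.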
I guess knowledge of search engines is a basic topic that needs to be covered in a security awareness session for developers and managers. The points for such a session might look something like this:
- The Internet is a global collection of interconnected public networks.
- Everything that can be accessed from the Internet will be, unless you restrict that access using authentication, firewalls, or by some other means.
- Search engines follow every link they find, and eventually all links are found. See the previous point.
- It’s not just humans accessing web pages.
Update: This article is a good example of the drivel some security “experts” have been saying about Google code search.