Google Hacking – How to Find Vulnerable Data Using Nothing but Google Search Engine

Jan 13, 2022 - 5 minute read

Grzegorz Zawalnicki Quality Engineer

Very short introduction to Google web indexing

Google uses a process called crawling (or fetching) to index new or updated pages. The program responsible for crawling is called Googlebot (also known as a robot, bot, or spider). Googlebot uses an algorithmic process: computer programs determine which sites to crawl, how often, and how many pages to fetch from each site. Googlebot uses two types of crawling:

  • Deep crawl - when Googlebot fetches a page, it culls all the links appearing on the page and adds them to the queue for subsequent crawling. By harvesting links from every page it encounters, Googlebot can quickly build a list of links that can cover broad reaches of the web. Because of their massive scale, deep crawls can reach almost every page on the web. Due to the number of pages existing on the web, this can take some time, so some pages may be crawled only once a month.

  • Fresh crawl - to keep the index current, Google continuously rescans popular and frequently changing web pages at a rate roughly proportional to how often the pages change. Newspaper pages are downloaded daily, while pages with stock quotes are downloaded much more frequently. Fresh crawls naturally return fewer pages than deep crawls.
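The link-harvesting loop behind a deep crawl can be sketched in a few lines of Python. This is only an illustration of the queue idea described above, not Googlebot's actual implementation: `TOY_WEB`, `deep_crawl` and the `example` addresses are hypothetical names, and the "web" is a small in-memory dictionary rather than real HTTP fetches.

```python
from collections import deque

# Hypothetical in-memory "web": each page maps to the links found on it.
TOY_WEB = {
    "a.example/": ["a.example/news", "b.example/"],
    "a.example/news": ["a.example/"],
    "b.example/": ["b.example/files", "a.example/news"],
    "b.example/files": [],
}


def deep_crawl(seed, fetch_links):
    """Visit every page reachable from seed, harvesting links along the way."""
    queue = deque([seed])  # pages waiting to be fetched
    seen = {seed}          # pages already queued, so nothing is fetched twice
    order = []             # the order in which pages were fetched
    while queue:
        page = queue.popleft()
        order.append(page)
        # Add every link on the fetched page to the queue for later crawling.
        for link in fetch_links(page):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


crawled = deep_crawl("a.example/", lambda page: TOY_WEB.get(page, []))
print(crawled)
```

Starting from a single seed page, the queue grows with every fetched page, which is why a deep crawl eventually reaches pages that nobody ever submitted for indexing.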

What Exactly is Google Hacking?

Google Hacking is a technique that uses Google’s search engine to find vulnerable or sensitive data. To help refine search results, you can use Advanced Search Operators and Special Search Characters. Advanced operators use the following syntax:

operator:search_term

Operator    Purpose                                Mixes with other operators?  Can be used alone?
intitle     Search page title                      yes                          yes
allintitle  Search page title                      yes                          yes
inurl       Search URL                             yes                          yes
allinurl    Search URL                             no                           yes
filetype    Search specific files                  yes                          no
allintext   Search text of page only               yes                          yes
site        Search specific site                   yes                          yes
link        Search for links to pages              no                           yes
inanchor    Search link anchor text                yes                          yes
numrange    Search numbers within a desired range  yes                          yes
daterange   Search in date range                   yes                          no

 

Character  Purpose
+          Force inclusion of a common word
-          Exclude a search term
“ ”        Search for an exact phrase
.          Single-character wildcard
*          Any word
|          Boolean ‘OR’
( )        Group queries, e.g. (“master card” | mastercard)
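To make the syntax concrete, here is a minimal Python sketch that assembles a query string from the operators and special characters listed above. The helper names (`term`, `any_of`, `exclude`, `build_query`) are made up for this example, and the sketch only builds the text of a query; it does not send anything to Google.

```python
def term(operator, value):
    """Format one advanced-operator term as operator:search_term."""
    # Quote values that contain spaces so they are searched as one phrase.
    if " " in value:
        value = f'"{value}"'
    # There is no space between the operator, the colon and the search term.
    return f"{operator}:{value}"


def any_of(*alternatives):
    """Group alternatives with parentheses and the Boolean OR character."""
    quoted = [f'"{alt}"' if " " in alt else alt for alt in alternatives]
    return "(" + " | ".join(quoted) + ")"


def exclude(word):
    """Exclude a search term with a leading minus sign."""
    return f"-{word}"


def build_query(*parts):
    """Join the individual parts into a single query string."""
    return " ".join(parts)


query = build_query(
    term("intitle", "index.of"),
    any_of("master card", "mastercard"),
    exclude("sample"),
)
print(query)  # intitle:index.of ("master card" | mastercard) -sample
```

Building queries from small parts like this makes it easier to keep a reusable list of searches and to combine them with a `site:` restriction later.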

 

Examples of Google Hacking

So what exactly can you find in Google and why is it vulnerable? Let's take a look at a few examples.

Directory Listings

Directory listings provide a list of files and directories in a browser window instead of the typical text-and-graphics mix generally associated with web pages. Directory listings are often placed on web servers on purpose to allow visitors to browse and download files from a directory tree. Many times, however, directory listings are not intentional, and there’s a good chance that an attacker may find something interesting inside one.

Query:

intitle:index.of

This basic query returns a large number of false positives. The following queries return more interesting results:

Query:

intitle:index.of "parent directory"

or

Query:

intitle:index.of name size
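The reason the refined queries work better can be shown with a short Python sketch: a real directory listing has "Index of" in its title and also contains strings such as "Parent Directory", while an ordinary page that merely has those words in its title does not. The function name `looks_like_directory_listing` and the two sample pages are assumptions made for this example; a check like this could be pointed at pages of your own site.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the <title> element of a page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def looks_like_directory_listing(html):
    """Mimic intitle:index.of "parent directory" on a single page."""
    parser = TitleParser()
    parser.feed(html)
    title_matches = parser.title.strip().lower().startswith("index of")
    body_matches = "parent directory" in html.lower()
    return title_matches and body_matches


# Hypothetical sample pages: a real listing and a title-only false positive.
LISTING = (
    "<html><head><title>Index of /backup</title></head><body>"
    '<h1>Index of /backup</h1><a href="/">Parent Directory</a>'
    '<a href="db.sql">db.sql</a></body></html>'
)
ARTICLE = (
    "<html><head><title>Index of Economic Freedom</title></head>"
    "<body><p>An annual ranking.</p></body></html>"
)

print(looks_like_directory_listing(LISTING))
print(looks_like_directory_listing(ARTICLE))
```

The second page matches the title condition alone, which is exactly the kind of false positive the bare intitle:index.of query returns.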

Web Server Detection

Directory listings often reveal the web server software and its version at the bottom of the page. A security tester can use this information to determine the version of the web server, or to search Google for vulnerable targets. It also indicates whether the web server is well maintained.

Query:

intitle:index.of server.at

This query focuses on the term “index of” in the title and “server at” appearing at the bottom of the directory listing.

Query:

intitle:index.of "Apache/2.4.7 Server at"

This query finds servers with directory listings enabled that are running Apache version 2.4.7.
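The "Server at" line is easy to read automatically, which is part of what makes it useful for fingerprinting. The Python sketch below extracts the product, version, host and port from such a footer. The regular expression, the function name `parse_signature` and the sample footer text are assumptions for illustration; real signatures vary with server configuration.

```python
import re

# Matches footers such as "Apache/2.4.7 (Ubuntu) Server at host Port 80".
SIGNATURE = re.compile(
    r"(?P<product>[A-Za-z][\w-]*)/(?P<version>\d+(?:\.\d+)*)"
    r".*?"  # optional extras such as "(Ubuntu)" or module names
    r"\s+Server at\s+(?P<host>\S+)\s+Port\s+(?P<port>\d+)"
)


def parse_signature(text):
    """Return the server details found in a listing footer, or None."""
    match = SIGNATURE.search(text)
    if match is None:
        return None
    info = match.groupdict()
    info["port"] = int(info["port"])
    return info


# Illustrative footer text; the host name is a placeholder.
footer = "Apache/2.4.7 (Ubuntu) Server at files.example.com Port 80"
print(parse_signature(footer))
```

Once the version is known, it can be compared against the list of versions with known vulnerabilities, which is why hiding or patching this banner matters.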

Files containing usernames and / or passwords

Yes, it's possible to find files containing logins and passwords that still work!

Query:

xamppdirpasswd.txt filetype:txt

Returns password files for XAMPP servers.

Query:

site:github.com inurl:sftp-config.json

Finds FTP login and password credentials on github.com.

Query:

filetype:password jmxremote

Finds passwords for Java Management Extensions (JMX Remote), used by jconsole.

Query:

“# Dumping data for table” (user | username | pass | password)

Finds database dump files whose tables contain user names or passwords.

Sensitive Directories

Query:

inurl:8080 intitle:"Dashboard [Jenkins]"

Provides access to Jenkins dashboards. At first, you’re not going to see much, but if you dig deeper, for example into the page of one of the latest builds, you may find more interesting information.

Query:

".git" intitle:"Index of"

Shows access to publicly browsable .git directories.
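You can run a similar check on your own server before Google does it for you. The Python sketch below walks a web root and flags file and directory names that appear in the queries above. The names in `RISKY_NAMES` are assumptions derived from those queries, `find_risky_paths` is a hypothetical helper, and the demo runs against a temporary directory rather than a real document root.

```python
import os
import tempfile

# Names derived from the queries above; extend the set for your own stack.
RISKY_NAMES = {".git", "sftp-config.json", "jmxremote.password", "xamppdirpasswd.txt"}


def find_risky_paths(web_root):
    """Return paths under web_root whose names suggest sensitive content."""
    hits = []
    for current_dir, dir_names, file_names in os.walk(web_root):
        for name in dir_names + file_names:
            if name in RISKY_NAMES:
                full_path = os.path.join(current_dir, name)
                hits.append(os.path.relpath(full_path, web_root))
        # Do not descend into flagged directories such as .git.
        dir_names[:] = [d for d in dir_names if d not in RISKY_NAMES]
    return sorted(hits)


# Demo on a throwaway directory tree that stands in for a web root.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, ".git", "objects"))
    os.makedirs(os.path.join(root, "app"))
    open(os.path.join(root, "app", "sftp-config.json"), "w").close()
    open(os.path.join(root, "index.html"), "w").close()
    found = find_risky_paths(root)

print(found)
```

Anything this reports inside a publicly served folder is a candidate for removal or for access restrictions.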

Various Online Devices

Query:

inurl:system_device.xml

Displays the public status page of a Konica Minolta printer.

There’s nothing unusual so far, but from here you can go to the login screen and switch to an administrator account.

At this point, you will still need a password. The previous page shows the specific printer model, so maybe the default password is going to work? You can always try asking Google, and you don’t need a sophisticated query to do so.

Now, let’s put it to the test. In this case, it worked.

How to remain safe?

I’ve already talked about several examples of finding vulnerable data using Google. Now, let’s take a look at what you can do to avoid falling victim to those methods:

  • Disable directory browsing on the web server. Enable it only for folders that you want anyone on the Internet to be able to browse.
  • Don’t put critical or sensitive information on servers without a proper authentication system. Otherwise, it is directly accessible to anyone on the Internet.
  • Always install the latest security patches for the applications and operating systems on your servers.
  • Disable anonymous access from the Internet to restricted system directories on the web server.
  • If you find links to your restricted servers or sites in Google search results, have them removed.
  • Google has also taken some steps to detect suspicious searches for vulnerable data.
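One practical way to apply these points is to run the same kind of queries against your own domain. The Python sketch below prefixes a few of the queries from this article with the site operator. The list `AUDIT_DORKS`, the function `self_audit_queries` and the domain example.com are placeholders chosen for this example; the sketch only prints the queries so that you can paste them into the search box yourself.

```python
# A few queries from this article, to be restricted to a single domain.
AUDIT_DORKS = [
    'intitle:index.of "parent directory"',
    "intitle:index.of server.at",
    "inurl:sftp-config.json",
    '".git" intitle:"Index of"',
]


def self_audit_queries(domain, dorks=AUDIT_DORKS):
    """Restrict each query to one domain with the site operator."""
    return [f"site:{domain} {dork}" for dork in dorks]


audit = self_audit_queries("example.com")
for query in audit:
    print(query)
```

If any of these searches returns results for your domain, fix the underlying configuration first and then ask for the indexed pages to be removed.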

Conclusion

Google hacking can be a very useful tool in penetration testing. Tools like Metasploit and Nmap now have automated scripts that search Google for useful information related to a particular site or organisation. Google hacking is also used in social engineering attacks and phishing campaigns. Although it is an old technique, it remains effective to this day, because new misconfigured servers, online devices and vulnerable websites appear on the internet every day, and Google indexes them all.
