Obtaining private data does not always mean hacking - sometimes it is published openly. Knowing Google's search options and applying a little ingenuity will let you find a lot of interesting things - from credit card numbers to FBI documents.
WARNING All information is provided for informational purposes only. Neither the editors nor the author are responsible for any possible harm caused by the materials in this article.
Today, everything is connected to the Internet, with little concern for restricting access. As a result, a lot of private data becomes the prey of search engines. Spider robots are no longer limited to web pages: they index all content available on the Internet and constantly add non-public information to their databases. Finding out these secrets is easy - you just need to know how to ask about them.
Looking for files
In skillful hands, Google will quickly find everything that is poorly secured on the Internet, for example, personal information and files for official use. They are often hidden like a key under a rug: there are no real access restrictions, the data simply lies in the back rooms of the site, where no links lead. The standard Google web interface provides only basic advanced search settings, but even these will be sufficient.
You can limit your search to a specific type of file in Google using two operators: filetype and ext. The first specifies the format that the search engine determined from the file header, the second specifies the file extension, regardless of its internal content. In both cases, you only need to specify the extension. Initially, the ext operator was convenient when the file had no distinctive format characteristics (for example, when searching for ini and cfg configuration files, which could contain anything). Now Google's algorithms have changed, and there is no visible difference between the operators - in most cases the results are the same.
Filtering the results
By default, Google searches for words, and for any entered characters in general, across all files on indexed pages. You can limit the search scope by top-level domain, by a specific site, or by where the search string occurs in the files themselves. For the first two options, use the site operator followed by the name of the domain or site. In the third case, a whole set of operators lets you search in service fields and metadata. For example, allinurl looks for the given words in the link address itself, allinanchor - in the anchor text of links (the text inside the <a> tag), allintitle - in page titles, allintext - in the body of pages.
For each operator there is a lightweight version with a shorter name (without the all prefix). The difference is that allinurl finds links containing all the words, while inurl applies only to the first of them; the second and subsequent words of the query can appear anywhere on the page. The inurl operator also differs from another operator with a similar meaning, site: inurl can match any sequence of characters in the link to the document (for example, /cgi-bin/), which is widely used to find components with known vulnerabilities.
Let's try it in practice. We take the filter allintext and make the request produce a list of numbers and verification codes of credit cards that will expire only in two years (or when their owners get tired of feeding everyone).
allintext: card number expiration date /2017 cvv
275 thousand current credit cards, fakes and honeypots for freebie lovers
When you read in the news that a young hacker “hacked the servers” of the Pentagon or NASA, stealing classified information, then in most cases we are talking about just such a basic technique of using Google. Suppose we are interested in a list of NASA employees and their contact information. Surely such a list is available in electronic form. For convenience or due to oversight, it may also be on the organization’s website itself. It is logical that in this case there will be no links to it, since it is intended for internal use. What words can be in such a file? At a minimum, the “address” field. It's easy to check all these assumptions.
Using two operators, you can obtain “secret” NASA documents in 0.36 seconds
Write
inurl:nasa.gov filetype:xlsx "address"
and get links to files with lists of employees.
Addresses and telephone numbers of key NASA employees in an Excel file
Using bureaucracy
Discoveries like this are a pleasant trifle. A truly solid catch comes from a more detailed knowledge of Google's operators for webmasters, of the Network itself, and of the structure of what is being sought. Knowing the details, you can easily filter the results and narrow down the properties of the files you need, so that what remains is truly valuable data. It's funny that bureaucracy comes to the rescue here. It produces standard formulations that are convenient for finding secret information accidentally leaked onto the Internet.
For example, the Distribution Statement stamp, mandatory in US Department of Defense paperwork, denotes standardized restrictions on the distribution of a document. The letter A marks public releases that contain nothing secret; B restricts distribution to US government agencies, C to government agencies and their contractors, and so on through F. The letter X stands apart: it was used for export-controlled technical data released only to eligible recipients. Let those whose job it is search for such documents; we will limit ourselves to files marked with the letter C. According to DoD Instruction 5230.24, this marking is assigned, among other things, to documents describing critical technologies that fall under export control. You can find such carefully protected information on sites in the .mil top-level domain, reserved for the US Department of Defense.
"DISTRIBUTION STATEMENT C" inurl:navy.mil
Example of a stamp in a document of security level C
It is very convenient that the .mil domain contains only sites of the US Department of Defense and its contractors. Search results with a domain restriction are exceptionally clean, and the titles speak for themselves. Searching for Russian secrets in this way is practically useless: chaos reigns in the .ru and .рф domains, and the names of many weapons systems sound botanical (PP "Kiparis", self-propelled gun "Akatsia") or even like something out of a fairy tale (TOS "Buratino").
Drawing from the manual for the TH-57C Sea Ranger combat training helicopter
By carefully studying any document from a site in the .mil domain, you can spot other markers to refine your search. One example is the reference to the export restrictions “Sec 2751”, which is also convenient for finding interesting technical information. From time to time such information is removed from the official sites where it once appeared, so if you cannot follow an interesting link in the search results, use Google’s cache (the cache operator) or the Internet Archive website.
Climbing into the clouds
In addition to accidentally declassified government documents, Google's cache occasionally surfaces links to personal files from Dropbox and other data storage services that create “private” links to publicly published data. It’s even worse with alternative and homemade services. For example, the following query finds data for all Verizon customers who have an FTP server installed on their router and actively use it.
allinurl:ftp:// verizon.net
There are now more than forty thousand such smart people, and in the spring of 2015 there were many more of them. Instead of Verizon.net, you can substitute the name of any well-known provider, and the more famous it is, the larger the catch can be. Through the built-in FTP server, you can see files on an external storage device connected to the router. Usually this is a NAS for remote work, a personal cloud, or some kind of peer-to-peer file downloading. All contents of such media are indexed by Google and other search engines, so you can access files stored on external drives via a direct link.
Serials, documents and another forty thousand files from private clouds
Looking at the configs
Before the general migration to the cloud, simple FTP servers ruled as remote storage, which also had enough vulnerabilities. Many of them are still relevant today. For example, the popular WS_FTP Professional program stores configuration data, user accounts and passwords in the ws_ftp.ini file. It is easy to find and read, since all records are saved in text format, and passwords are encrypted with the Triple DES algorithm after minimal obfuscation. In most versions, simply discarding the first byte is sufficient.
One of the ws_ftp.ini files is publicly available
It is easy to decrypt such passwords using the WS_FTP Password Decryptor utility or a free web service.
Decrypting the password takes about a second
When talking about hacking an arbitrary website, they usually mean obtaining a password from logs and backups of configuration files of CMS or e-commerce applications. If you know their typical structure, you can easily indicate the keywords. Lines like those found in ws_ftp.ini are extremely common. For example, in Drupal and PrestaShop there is always a user identifier (UID) and a corresponding password (pwd), and all information is stored in files with the .inc extension. You can search for them as follows:
"pwd=" "UID=" ext:inc
Revealing passwords from the DBMS
In the configuration files of SQL servers, user names and email addresses are stored in clear text, and their MD5 hashes are written instead of passwords. Strictly speaking, it is impossible to decrypt them, but you can find a match among the known hash-password pairs.
Password selection using MD5 hash
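Why a match can be found even though a hash cannot be reversed: an unsalted MD5 hash of a given password is always the same, so it can be looked up in a table computed in advance. Below is a minimal sketch using Python's standard hashlib module; the word list and the function names are invented purely for illustration.

```python
import hashlib
from typing import Optional


def md5_hex(text: str) -> str:
    """Return the hexadecimal MD5 digest of a UTF-8 string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Hypothetical word list; real lookup tables hold many millions of entries.
SAMPLE_WORDS = ["letmein", "qwerty", "hunter2"]

# Precomputed hash -> password table. Nothing is "decrypted":
# the hashes are simply compared with ones computed beforehand.
TABLE = {md5_hex(word): word for word in SAMPLE_WORDS}


def lookup(hash_hex: str) -> Optional[str]:
    """Return the known password for a hash, or None if it is not in the table."""
    return TABLE.get(hash_hex.lower())


if __name__ == "__main__":
    print(lookup(md5_hex("qwerty")))                        # found in the table
    print(lookup(md5_hex("correct horse battery staple")))  # not in the table
```

This only works because the hash is fast and unsalted. A per-user salt combined with a deliberately slow password hashing function makes such precomputed tables useless.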
There are still DBMSs that do not even use password hashing. The configuration files of any of them can be simply viewed in the browser.
intext:DB_PASSWORD filetype:env
The database password is publicly stored in the configuration file
With the advent of Windows servers, the place of configuration files was partially taken by the registry. You can search through its branches in exactly the same way, using reg as the file type. For example, like this:
filetype:reg HKEY_CURRENT_USER "Password"=
Don't forget the obvious
Sometimes it is possible to get to restricted information using data that was opened by accident and came to Google's attention. The ideal option is to find a list of passwords in some common format. Only desperate people store account information in a text file, Word document or Excel spreadsheet, but there are always enough of them.
filetype:xls inurl:password
The National Research Institute of Health named after Lee Tenghui accidentally exposed its password list
On the one hand, there are plenty of ways to prevent such incidents. You need to set adequate access rights in htaccess, patch the CMS, avoid dubious scripts and close other holes. There is also the robots.txt file with a list of exclusions, which forbids search engines from indexing the files and directories listed in it. On the other hand, if the structure of robots.txt on some server differs from the standard one, it immediately becomes clear what they are trying to hide on it.
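Both sides of this trade-off can be seen with Python's standard urllib.robotparser module. The sketch below is only an illustration: the robots.txt content and the helper names are made up, and the file is parsed from a string rather than fetched from a server.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of a site owner checking their own configuration.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /admin/
Disallow: /backup/
"""


def is_allowed(robots_text: str, path: str, agent: str = "*") -> bool:
    """Tell whether a well-behaved crawler may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, path)


def disallowed_paths(robots_text: str) -> list:
    """List every non-empty Disallow value, i.e. what the file reveals to any reader."""
    paths = []
    for line in robots_text.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


if __name__ == "__main__":
    print(is_allowed(SAMPLE_ROBOTS, "/index.html"))
    print(is_allowed(SAMPLE_ROBOTS, "/admin/users.html"))
    print(disallowed_paths(SAMPLE_ROBOTS))
```

The second function is the reason robots.txt is not a protection mechanism: the file is public and advisory, so anything sensitive listed there still needs real access control.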
The White House welcomes robots
The list of directories and files on any site is preceded by the standard heading "Index of". Since this service heading always appears in the page title, it makes sense to restrict the search with the intitle operator. Interesting things are found in the /admin/, /personal/, /etc/ and even /secret/ directories.
Google helps you get to the root of the directory list
Follow the updates
There are so many leaky systems today that the problem is no longer finding one of them, but choosing the most interesting ones (to study and improve your own security, of course). Examples of search queries that reveal someone's secrets are called Google dorks. One of the first utilities to automatically check the security of sites against known Google queries was McAfee SiteDigger, but its latest version was released in 2009. Now there are plenty of other tools that simplify the search for vulnerabilities, for example SearchDiggity by Bishop Fox, as well as regularly updated databases with a selection of current examples.
Relevance is extremely important here: old vulnerabilities are closed very slowly, but Google and its search results are constantly changing. There is a difference even between the “last second” filter (&tbs=qdr:s at the end of the request URL) and the “real time” filter (&tbs=qdr:1).
Google also lets you specify, implicitly, the time interval of a file's last update. Through the graphical web interface, you can select one of the standard periods (hour, day, week, etc.) or set a date range, but this method is not suitable for automation.
From the look of the address bar, you can only guess that the output can be limited with the construction &tbs=qdr:. The letter y after it sets a limit of one year (&tbs=qdr:y), m shows results for the last month, w - for the last week, d - for the past day, h - for the last hour, n - for the last minute, and s - for the last second. The latest results that have just become known to Google can be found using the filter &tbs=qdr:1.
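For a script, the letters above can be kept in a small table and appended to the search URL. This is a minimal sketch using only the standard library; the function name and the table are illustrative, and the parameter values are the ones described in the text, not taken from any official documentation.

```python
from urllib.parse import urlencode

# Period codes for the tbs=qdr: parameter, as listed in the article.
QDR_CODES = {
    "year": "y",
    "month": "m",
    "week": "w",
    "day": "d",
    "hour": "h",
    "minute": "n",
    "second": "s",
}


def recent_search_url(query: str, period: str) -> str:
    """Build a search URL limited to results from the given recent period."""
    params = {"q": query, "tbs": "qdr:" + QDR_CODES[period]}
    # urlencode percent-encodes the colon and turns spaces into plus signs.
    return "https://www.google.com/search?" + urlencode(params)


if __name__ == "__main__":
    print(recent_search_url("annual report", "week"))
```

An unknown period name raises KeyError, which is preferable here to silently sending an unfiltered query.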
If you need to write a clever script, it will be useful to know that in Google a date range is set in Julian format using the daterange operator. For example, this is how you can find a list of PDF documents containing the word confidential that were indexed from January 1 to July 1, 2015.
confidential filetype:pdf daterange:2457024-2457205
The range is specified in Julian date format, without the fractional part. Converting dates manually from the Gregorian calendar is inconvenient. It’s easier to use a date converter.
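The conversion is also easy to do in code. The sketch below relies on the fact that Python's date.toordinal() counts days in the proleptic Gregorian calendar, so the Julian day number differs from it by a constant offset of 1721425; the function names are illustrative.

```python
from datetime import date

# Julian day number = proleptic Gregorian ordinal + 1721425
# (for example, 1 January 2000 has ordinal 730120 and Julian day number 2451545).
JDN_OFFSET = 1721425


def julian_day_number(d: date) -> int:
    """Return the integer Julian day number of a Gregorian calendar date."""
    return d.toordinal() + JDN_OFFSET


def daterange(start: date, end: date) -> str:
    """Format a daterange operator for the given inclusive interval."""
    if end < start:
        raise ValueError("end date must not be earlier than start date")
    return f"daterange:{julian_day_number(start)}-{julian_day_number(end)}"


if __name__ == "__main__":
    # Reproduces the range used in the example query above.
    print(daterange(date(2015, 1, 1), date(2015, 7, 1)))  # daterange:2457024-2457205
```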
Target and filter again
Besides being typed as operators in the search query, refinements can be passed directly in the link itself. For example, the refinement filetype:pdf corresponds to the construction as_filetype=pdf. This makes it convenient to specify any refinements. Let's say that results only from the Republic of Honduras are requested by adding the construction cr=countryHN to the search URL, and only from the city of Bobruisk - gcs=Bobruisk. The full list can be found in the section for developers.
Google automation tools are designed to make life easier, but they often add problems. For example, the user’s city is determined by the user’s IP through WHOIS. Based on this information, Google not only balances the load between servers, but also changes the search results. Depending on the region, for the same request, different results will appear on the first page, and some of them may be completely hidden. The two-letter code after the directive gl=country will help you feel like a cosmopolitan and look for information from any country. For example, the code of the Netherlands is NL, but the Vatican and North Korea do not have their own code in Google.
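The URL parameters mentioned in this section can be assembled programmatically. This is a sketch under assumptions: the parameter names as_filetype, cr and gl come from the text above, the value of gl is assumed to be a bare two-letter country code, and the function name is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def refined_search_url(
    query: str,
    filetype: Optional[str] = None,
    country_restrict: Optional[str] = None,
    geolocation: Optional[str] = None,
) -> str:
    """Build a search URL with optional file type and country refinements."""
    params = {"q": query}
    if filetype:
        params["as_filetype"] = filetype
    if country_restrict:
        # The article's example for Honduras is cr=countryHN.
        params["cr"] = "country" + country_restrict.upper()
    if geolocation:
        # Assumption: gl takes a lowercase two-letter country code.
        params["gl"] = geolocation.lower()
    return "https://www.google.com/search?" + urlencode(params)


if __name__ == "__main__":
    print(refined_search_url("annual report", filetype="pdf", geolocation="NL"))
```

Keeping every refinement optional means the same helper serves both a plain query and a fully narrowed one, and the parameters always appear in a predictable order.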
Search results are often cluttered even after several advanced filters have been applied. In this case, it is easy to refine the request by adding several exclusion words to it (a minus sign is placed in front of each of them). For example, banking, names and tutorial often appear together with the word Personal. Therefore, cleaner search results come not from the textbook example of a query, but from a refined one:
intitle:"Index of /Personal/" -names -tutorial -banking
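Appending exclusion words is a purely mechanical step, so it is easy to script. A minimal sketch with an illustrative helper name; quoting multi-word exclusions as phrases is a design choice of this example.

```python
from typing import Iterable


def with_exclusions(base_query: str, excluded: Iterable[str]) -> str:
    """Append each exclusion word to the query with a leading minus sign."""
    parts = [base_query]
    for word in excluded:
        word = word.strip()
        if not word:
            continue  # skip empty entries instead of emitting a bare "-"
        # A multi-word exclusion is quoted so that it is excluded as a phrase.
        term = f'"{word}"' if " " in word else word
        parts.append("-" + term)
    return " ".join(parts)


if __name__ == "__main__":
    print(with_exclusions('intitle:"Index of /Personal/"', ["names", "tutorial", "banking"]))
```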
Final example
A sophisticated hacker is distinguished by the fact that he provides himself with everything he needs on his own. For example, VPN is a convenient thing, but either expensive, or temporary and with restrictions. Signing up for a subscription for yourself is too expensive. It's good that there are group subscriptions, and with the help of Google it's easy to become part of a group. To do this, just find the Cisco VPN configuration file, which has a rather non-standard extension PCF and a recognizable path: Program Files\Cisco Systems\VPN Client\Profiles. One request and you join, for example, the friendly team of the University of Bonn.
filetype:pcf vpn OR Group
Getting into the University of Bonn is much more difficult than connecting to their VPN
INFO Google finds configuration files with passwords, but many of them are written in encrypted form or replaced with hashes. If you see strings of a fixed length, then immediately look for a decryption service.
Passwords are stored in encrypted form, but Maurice Massard has already written a program to decrypt them and provides it for free through thecampusgeeks.com.
Hundreds of different types of attacks and penetration tests are performed using Google. There are many options, affecting popular programs, major database formats, numerous PHP vulnerabilities, clouds, and so on. Knowing exactly what you're looking for makes it much easier to find the information you need (especially information that was never meant to be made public). It is not only Shodan that supplies interesting ideas, but any database of indexed network resources!