3 – Using Google for Reconnaissance
A. Scenario (Google)
Conventional information reconnaissance used to begin in a public library; it now begins with public online materials. Many large public organizations have home web sites, wiki articles, or both, and these include many different forms of information. Hospitals and government agencies are no different. The attackers started with general searches to build a list of the main web sites related to their targets, along with any news stories or wiki pages that could be found.
Using these basic techniques, it was possible to retrieve the following information:
• Maps of facility locations and directions to them.
• Maps of facility buildings and satellite images.
• Maps of building interiors showing departments and function of areas.
• Organizational charts of departments and staff positions.
• Lists of staff including contact information.
• Job listings with descriptions of technical skills needed.
• Help desk Frequently Asked Questions.
• News stories.
• Security policies.
The “Enterprise Information Security” page was especially helpful, yielding the name of a file integrity checker program and the name of a popular anti-virus vendor. Posted policies listed the ports and protocols that were allowed on both the wired network and the wireless network. The wireless networking policy explained that the facility was using 802.11b and 802.11g with WEP encryption. In some cases the links were protected by an account login process, but the name of the link alone provided the information needed. Two of the URLs in links that could not be reached without logging in showed that the facility was using Tripwire software for file integrity checking and Microsoft’s SUS (Software Update Services) to distribute patches and updates. A “Troubleshooting” section patiently explained that the password required to un-install the anti-virus software was the name of the manufacturer (note: this is a known default setting).
A “Network Configuration” page offered IP addresses for routers used as default gateways, DNS servers, and SMTP servers, plus a link to a separate page that listed printers with their locations, make and model, serial number, purchase data and amount of RAM. Another page linked to this one showed a list of maps that went floor by floor in some buildings and marked the location of every Ethernet jack. There was also a list of Ethernet jacks available in conference rooms for public use. At one facility, a list was discovered on a web page named “department servers – essential machines”. It contained IP addresses, host names, operating system and version, make, model and serial number of hardware, primary function, physical room location, CPU speed, RAM amount and hard disk capacity.
Noticing that the key words seemed to be “department servers”, the attackers ran further Google searches on that phrase and found a lot more. More FAQ pages turned up, and exploring the links there produced more policies and procedures regarding computer use, network diagrams, and a section labeled “photos of A-Wing communication closets – category three punch-down block information”. Clicking on the link to view the photos returned a 403 Access Forbidden message, but the links contained the room numbers of all the wiring closets in the building. There was a list of network admin staff, their office locations and phone numbers. A “What’s New?” link offered a change-log of upgrades to application software and network services, including the version and date of each change. A “System Admin Utilities Software” page offered even more details on the current versions of many tools being used. Spinning off another search branch using the key words “network diagram” was also very fruitful.
A major windfall discovered on one of the web sites was the PHA 6601.1 Information Security Handbook in a .doc file. This document was over fifty pages long, dealt mainly with policy, and included a lot of material that appeared to have been copied and pasted directly from NIST SP 800-53. Several long appendices were also obtained, and one of them contained security controls and configuration information. It referenced all of the 800-53 controls, and although many of the descriptions of how the controls were implemented were vague and generalized, some of them were quite explicit. The exact configuration of the password complexity policy was available. The vaguely worded controls could mean that either the policy makers or the front line defenders didn’t understand them very well, which might offer some opportunity for the attackers.
Another document retrieved from the same site seemed to define the PHA policy for Incident Response (IR), but it had not been updated since 1999. It did offer a good framework for general IR procedures and vaguely defined conditions under which the Information System Security Officer and the Facility Director were to be notified. Most of the focus on cyber threats seemed to be directed toward contamination by viruses. The PHA 6601.1 handbook also referenced this document for IR controls.
News stories about new hires provided tremendous biographical background information about key staff members. A local newsletter article showed a picture of a computer technology class engaged in a computer security training exercise. At first glance this seems innocuous, but it yielded details like the name of the professor, the name of the campus building the class was held in, and the names of several students in the class. Another news story told of the facility’s “Crisis Response Team”, including some of their security measures and the name of the chairman of the team. This kind of material was collected in great volume and poured into a database for later correlation and reference for social engineering. Biographies for key staff members were developed that included home addresses, phone numbers, past jobs and contacts. Particular intensity was applied to this search whenever a person was identified as a possible “key” staff member (somebody who might have important knowledge or access rights to key infrastructure). Some professional networking sites were very helpful with this. Searching with the keywords “JCAHO” and “HIPAA” yielded hundreds of contacts, who were selectively invited to link to a fictional account, which was then able to request invitations to colleagues of those links.
Application-produced data files (.doc, .xls, .ppt, .pdf) were downloaded and analyzed, yielding metadata on the documents’ authors and editors and sometimes their contact information. A few even contained internal host names. Images from Google and Flickr also supplied many useful photographs of sites, buildings, office space interiors and people. Captions often provided the names of the places and people in the pictures. On the Facilities Management page, a fire alarm testing schedule was posted months in advance of the testing dates. Who knows if this would be useful or not? It was added to the database.
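The author and editor fields mentioned above are easy to see for yourself, which is also how a site owner can check a document before posting it. The following is a minimal sketch, not the tooling used in the scenario. It assumes the newer zip-based Office formats (.docx, .xlsx, .pptx), where author fields live in a `docProps/core.xml` part; the older binary .doc files described here store them differently and would need another parser. The helper names `core_properties` and `make_sample` are hypothetical, and the sample archive is a stand-in rather than a valid Word document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core-properties part of zip-based Office files.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Output label -> element to look for directly under the root element.
FIELDS = (
    ("creator", "dc:creator"),
    ("last_modified_by", "cp:lastModifiedBy"),
)


def core_properties(data):
    """Return the author-related fields found in a zip-based Office file.

    `data` is the raw bytes of the file. An archive without a
    docProps/core.xml part yields an empty dict.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        if "docProps/core.xml" not in zf.namelist():
            return {}
        root = ET.fromstring(zf.read("docProps/core.xml"))
    found = {}
    for label, path in FIELDS:
        element = root.find(path, NS)
        if element is not None and element.text:
            found[label] = element.text
    return found


def make_sample():
    """Build a tiny stand-in archive (hypothetical names, not a real document)."""
    core = (
        '<cp:coreProperties xmlns:cp="' + NS["cp"] + '" xmlns:dc="' + NS["dc"] + '">'
        "<dc:creator>Example Author</dc:creator>"
        "<cp:lastModifiedBy>Example Editor</cp:lastModifiedBy>"
        "</cp:coreProperties>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("docProps/core.xml", core)
    return buf.getvalue()


if __name__ == "__main__":
    print(core_properties(make_sample()))
```

Any field this reports for a file headed to a public web page is a field a visitor can read too.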
Conventional reconnaissance was used as a follow-up after the initial online searches. In some cases, the medical facilities were part of or associated with universities, and the campus library became a helpful resource. Telephone queries were made using the contact information discovered above to confirm current staff status and even some technical details. Agents were sent onsite to perform physical observation and take pictures where needed to fill in the blanks. By following facility staff on foot, they also developed a list of local coffee shops, delis and restaurants within walking distance.
More detailed reconnaissance searched through Google Groups postings and found computer network security policies posted on web pages. Many email addresses were harvested using combinations of the site domain name plus “@gmail.com”, “@hotmail.com”, “@yahoo.com” … and so on.
More detailed Google searches were then made to find systems and vulnerabilities. In Google advanced search, the results were narrowed to the domain name, then a search was done for “Apache/ server at”. Here are some of the results:
• Apache/2.2.4 (Fedora) Server at domain.name Port 80
• Apache/2.0.52 (Red Hat) Server at domain.name Port 80
• Apache/2.0.52 (CentOS) Server at domain.name Port 80
• Apache/2.0.49 (Unix) mod_ssl/2.0.49 OpenSSL/0.9.7c PHP/4.3.2 Server at domain.name Port 80
• Apache/2.0.47 (Unix) mod_perl/1.99_10 Perl/v5.8.0 mod_ssl/2.0.47 OpenSSL/0.9.6g Server at domain.name Port 80
• Apache/2.0.46 (Red Hat) Server at domain.name Port 80
• Apache/1.3.37 Server at domain.name Port 80
• Apache/1.3.33 Server at domain.name Port 80
• Apache/1.3.29 Server at domain.name Port 80
• Apache/1.3.27 Server at domain.name Port 80
• Apache/1.3.26 Server at domain.name Port 80
• Apache/1.3.20 Server at domain.name Port 80
• Apache/1.3.9 Server at domain.name Port 80
Several server versions on this list are vulnerable to the very old “Apache Web Chunked” exploit. Note that each entry above may stand for several or even many Apache servers: only one listing is shown for each version detected, regardless of how many servers reported it.
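Checking a banner list like the one above against a known-affected range is simple to automate, and an administrator can run the same check on their own servers. The sketch below is illustrative only. It assumes the “Apache Web Chunked” exploit refers to the chunked-encoding flaw tracked as CVE-2002-0392, and it assumes the affected ranges given in that advisory (Apache 1.3 through 1.3.24 and 2.0 through 2.0.36). The function names are hypothetical. A banner match is only a hint, since vendors often backport fixes without changing the version string.

```python
import re

# ASSUMPTION: last affected release per branch for the chunked-encoding
# flaw (CVE-2002-0392): Apache 1.3 through 1.3.24, 2.0 through 2.0.36.
LAST_AFFECTED = {
    (1, 3): (1, 3, 24),
    (2, 0): (2, 0, 36),
}

BANNER_RE = re.compile(r"Apache/(\d+)\.(\d+)\.(\d+)")


def parse_version(banner):
    """Return the Apache version in a banner as an int tuple, or None."""
    match = BANNER_RE.search(banner)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def in_affected_range(version):
    """True if the version is at or below the last affected release of its branch."""
    last = LAST_AFFECTED.get(version[:2])
    return last is not None and version <= last


def flag_banners(banners):
    """Return the banners whose version falls in an assumed affected range."""
    flagged = []
    for banner in banners:
        version = parse_version(banner)
        if version is not None and in_affected_range(version):
            flagged.append(banner)
    return flagged


# The banners listed in the search results above.
BANNERS = [
    "Apache/2.2.4 (Fedora) Server at domain.name Port 80",
    "Apache/2.0.52 (Red Hat) Server at domain.name Port 80",
    "Apache/2.0.52 (CentOS) Server at domain.name Port 80",
    "Apache/2.0.49 (Unix) mod_ssl/2.0.49 OpenSSL/0.9.7c PHP/4.3.2 Server at domain.name Port 80",
    "Apache/2.0.47 (Unix) mod_perl/1.99_10 Perl/v5.8.0 mod_ssl/2.0.47 OpenSSL/0.9.6g Server at domain.name Port 80",
    "Apache/2.0.46 (Red Hat) Server at domain.name Port 80",
    "Apache/1.3.37 Server at domain.name Port 80",
    "Apache/1.3.33 Server at domain.name Port 80",
    "Apache/1.3.29 Server at domain.name Port 80",
    "Apache/1.3.27 Server at domain.name Port 80",
    "Apache/1.3.26 Server at domain.name Port 80",
    "Apache/1.3.20 Server at domain.name Port 80",
    "Apache/1.3.9 Server at domain.name Port 80",
]

if __name__ == "__main__":
    for banner in flag_banners(BANNERS):
        print(banner)
```

Under those assumed ranges, only the 1.3.20 and 1.3.9 entries are flagged. The other old releases on the list may still be exposed to different, later issues that this sketch does not check.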