Of all the NSA surveillance documents Edward Snowden leaked, some of the most important exposed the spy agency’s so-called XKEYSCORE program, a massive system for vacuuming up and sifting through emails, chats, images, online search activity, usernames and passwords, and other private digital data from core fiber optics cables around the world.
XKEYSCORE, which the NSA calls its “widest reaching” surveillance program, was established around 2008 and consists of more than 700 servers that store data sucked from the internet’s backbone and mine this data for patterns and connections.
Only a well-resourced party like the NSA could deploy such a grandiose surveillance program. But if your spy needs are more modest, there are a number of existing tools available that offer similar surveillance capabilities, albeit at a smaller scale, says Nicholas Weaver.
Weaver, a senior researcher at the International Computer Science Institute at UC Berkeley who focuses on network surveillance and security issues, developed a little hobby after the Snowden leaks in 2013: to build a bulk surveillance system in miniature that would be capable of performing all the primary tasks of an NSA spy system—but on a small, 100 Mbps-size network. Those capabilities had to include bulk data collection, search functionality, the ability to track cookies and identify anonymous users, a method for injecting malware into a surveillance target’s computer for more directed surveillance, and a friendly web interface. Luckily, Weaver realized, he already had off-the-shelf equipment that met the criteria.
“When the Snowden stuff came out, I looked at the documents and said, ‘Hey they’re doing what I do. It’s literally the same [as the security research] I’ve been doing for a decade,’” Weaver told WIRED.
Speaking to WIRED in advance of a presentation he’s giving today about his system at the Enigma security conference in San Francisco, he described the components needed to emulate the spy agency.
Surveillance Tech Is ‘Banal and Basic’
Although the US intelligence community likes to operate under the notion that its systems are NOBUS (Nobody But Us), meaning its technologies are unique to the United States, Weaver says the reality is the opposite when it comes to surveillance technology. “It’s very banal and very basic, it’s very well-understood technology, and … there’s really nothing new,” he says.
The NSA’s super-secret surveillance system, in fact, works very much the way off-the-shelf intrusion detection systems (IDS) function. With these systems, when a data packet arrives at a network, a high-volume filter separates garbage traffic from the important traffic and passes the latter to a load balancer, which distributes the data across a number of servers. In this case, those servers are network intrusion detection nodes or devices. The IDS nodes then parse the traffic to determine whether it’s benign or malicious and decide what to do based on those conclusions, such as blocking malicious traffic and issuing an alert to administrators.
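The filter, load balancer and detection node stages described above can be sketched in a few lines of Python. This is a minimal illustration of the general IDS design, not Weaver's system or any product's code: the `Packet` fields, the node count, the "non-empty payload" filter rule and the single byte-string signature are all assumptions made for the example.

```python
import zlib
from collections import namedtuple

# Hypothetical packet record; real systems parse these fields off the wire.
Packet = namedtuple("Packet", "src dst sport dport payload")

NUM_NODES = 4  # assumed number of IDS nodes behind the load balancer


def passes_filter(pkt):
    """High-volume filter: drop traffic not worth inspecting.

    The rule here (keep non-empty payloads) is a stand-in for a real policy.
    """
    return len(pkt.payload) > 0


def pick_node(pkt, num_nodes=NUM_NODES):
    """Load balancer: hash the flow so both directions reach the same node."""
    endpoints = sorted([(pkt.src, pkt.sport), (pkt.dst, pkt.dport)])
    return zlib.crc32(repr(endpoints).encode()) % num_nodes


def inspect(pkt, signatures=(b"/etc/passwd",)):
    """IDS node: flag the packet if its payload matches a known-bad signature."""
    return "alert" if any(sig in pkt.payload for sig in signatures) else "pass"


def run_pipeline(packets):
    """Run each packet through filter, load balancer, then inspection."""
    results = []
    for pkt in packets:
        if not passes_filter(pkt):
            continue
        results.append((pick_node(pkt), inspect(pkt)))
    return results
```

Hashing the sorted pair of endpoints is one common way to keep an entire connection on a single node, so that node sees both the request and the response.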
Following the same general design, Weaver developed a home-grown surveillance system that took less than a week to construct. To approximate a filter and load balancer, he used OpenFlow, a protocol for managing and directing traffic among routers and switches on a network. For his intrusion detection system, he used the Bro Network Security Monitor, an open-source framework developed by Vern Paxson, a fellow computer scientist at UC Berkeley. He had to write scripts to do things like extract the cookies in web traffic and parse out usernames from traffic, but this was minimal work.
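To give a sense of how small that scripting work is, here is a rough Python sketch of pulling cookies and a username out of one plaintext HTTP request. Weaver's scripts were written for Bro and are not reproduced here; the function names, the form field names (`username`, `user`, `login`) and the sample request are assumptions for illustration only.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs


def parse_request(raw):
    """Split a raw HTTP request into a lowercase header dict and the body."""
    head, _, body = raw.partition("\r\n\r\n")
    headers = {}
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers, body


def extract_cookies(headers):
    """Return the cookies in the Cookie header as a name -> value dict."""
    jar = SimpleCookie()
    jar.load(headers.get("cookie", ""))
    return {name: morsel.value for name, morsel in jar.items()}


def extract_username(body, fields=("username", "user", "login")):
    """Look for a username in a URL-encoded form body (field names assumed)."""
    form = parse_qs(body)
    for field in fields:
        if field in form:
            return form[field][0]
    return None


# Hypothetical request, as it might appear on an unencrypted connection.
sample = (
    "POST /login HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Cookie: sid=abc123; theme=dark\r\n"
    "\r\n"
    "username=alice&remember=1"
)
sample_headers, sample_body = parse_request(sample)
```

This only works on unencrypted traffic; a request sent over HTTPS would not expose these headers or form fields to a passive monitor.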
Those looking to do more robust backbone monitoring and data parsing like the NSA does could opt instead for
To search through the data his DIY system collected, Weaver just ran local searches. But if someone wants to do broader federated searches, they could use Hadoop, an open-source framework for storing and processing large amounts of data spread among multiple systems. Hadoop can parse similar sets of data into so-called buckets to make processing or searching data more efficient. For example, IP addresses can be parsed out and categorized in one bucket, and cookies and usernames can be categorized in other buckets. To find, for example, every IP address that visited a certain web page, a search would only need to focus on data in the IP bucket. “Hadoop will allow me to search all the data [simultaneously], but most of my searches actually only need to look at a couple of buckets,” Weaver says.
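The bucket idea can be shown with plain Python dictionaries. This sketch does not use Hadoop or its API; the `BucketStore` class, the record fields (`ip`, `cookie`, `username`, `url`) and the sample data are assumptions chosen to mirror the example in the text.

```python
from collections import defaultdict


class BucketStore:
    """Toy index that files each kind of identifier into its own bucket."""

    KINDS = ("ip", "cookie", "username")

    def __init__(self):
        # buckets[kind][value] -> set of URLs seen with that value
        self.buckets = defaultdict(lambda: defaultdict(set))

    def add(self, record):
        """File one observed request under every identifier it carries."""
        for kind in self.KINDS:
            value = record.get(kind)
            if value is not None:
                self.buckets[kind][value].add(record["url"])

    def ips_that_visited(self, url):
        """Answer the query by scanning only the IP bucket."""
        return sorted(
            ip for ip, urls in self.buckets["ip"].items() if url in urls
        )


store = BucketStore()
store.add({"ip": "192.0.2.10", "cookie": "sid=abc", "url": "example.com/a"})
store.add({"ip": "192.0.2.11", "username": "alice", "url": "example.com/a"})
store.add({"ip": "192.0.2.12", "url": "example.com/b"})
visitors = store.ips_that_visited("example.com/a")
```

The query never touches the cookie or username buckets, which is the efficiency Weaver describes: most searches only need a couple of buckets.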
Advanced, Targeted Spying
Weaver’s surveillance solution isn’t complete without a way to conduct targeted surveillance. That’s because bulk surveillance is all about trying to find needles in a haystack: those few data points among billions that merit further scrutiny. But once spies home in on those, they need to conduct more efficient and pinpointed intelligence-gathering. They do this by hacking a target’s system. The NSA and its British spy partner the GCHQ use a system called QUANTUM Insert that involves a man-on-the-side attack and code injection. The system works by hijacking a browser as it’s trying to access a web page and forcing it to visit a malicious web page instead, where malware gets secretly downloaded to the target’s computer.
The spy agencies used QUANTUM Insert to hack into the machines of terrorist suspects in the Middle East as well as the machines of employees working for the Belgian telecom Belgacom.
Weaver’s low-rent alternative for doing malware injection is to use the built-in injection capabilities in Bro. But someone could also use AirPwn—a tool often used by hackers as a prank to hijack someone’s browser and display porn or other raunchy images on it. “This is an old technique; it’s used for jokes,” Weaver says.
The Anywhere, Anyone Spy
Weaver notes that his surveillance system can actually be made more compact and portable by using off-the-shelf ARM/Wi-Fi embedded systems, which would be perfect for nation-state spies looking to target government workers. The spies could easily take the system to a Starbucks frequented by State Department employees, lawmakers or military personnel and use it to extract metadata belonging to customers who use the cafe’s wireless network. The metadata can help identify targets worthy of further surveillance, who can then be tracked online after they’ve left Starbucks, through this and other metadata. Such a system could easily be disguised as a plug-in air freshener inserted into an electrical outlet, Weaver notes. It could also be designed to erase itself automatically if someone unplugs it from the socket to examine it.
“Any foreign intelligence agency could install surveillance devices in every downtown DC Starbucks, use bulk surveillance to identify all the network visitors and, for any visitor who meets their criteria, directly inject exploits into their web browsing,” Weaver wrote in a blog post last year about his system. “DC hotels are similarly vulnerable to slightly larger installations such as my demonstration box…. We need to act like every open wireless network or hotel in the Washington area is potentially compromised. And with the low cost of such installation, it doesn’t even need to remain the realm of foreign intelligence services.”
Weaver says it took him around 50 hours to assemble his surveillance system, including writing about 600 lines of code for his Bro monitoring system. The scale of the data collection increases with a little investment. It would run someone about $850 for a 100 Mbps system like the one Weaver built, for example. A 1 Gbps installation might cost $5,000.
“This sounds like some big super thing,” Weaver says. “But I spent more time preparing this talk than I did building this demo…. This is literally hobby-level stuff. When national security programs are hobby-level, you really have to worry that anybody else can do them.”