blog < /dev/random

Jeff McJunkin's thoughts on Penetration Testing, Systems Administration, and Network Defense

masscan

Often when doing penetration tests, clients will ask me to scan their external network presence[1]. For smaller companies, I can use nmap from start to finish for all my scanning needs. However, for larger network ranges it helps to separate scanning into three distinct tasks:

1. Network sweeping: Determining which IPv4 addresses have any listening services (finding “live” hosts)

2. Port scanning: Determining listening TCP and UDP ports on target systems

3. Version scanning: Determining the version of services and protocols spoken by open TCP and UDP ports[2]

If the external IP range is roughly ten thousand hosts or fewer, nmap will work just fine for each of these needs. Often, though, larger companies can own tens or even hundreds of thousands of IPv4 addresses. How can we determine in a few hours which of these IPv4 addresses have a listening host? By default, nmap sends only a few probe requests per host — if all of those probes fail, the host is marked offline and no further probes are sent. We can skip the network sweeping with the -Pn option, but then nmap will scan every single configured port for every single IP address. Since the large majority of addresses in an external IPv4 range won’t have listening services, for large network ranges this could take weeks, months, or even years! What we need is some way to efficiently do a network sweep (find which IPv4 addresses have listening services) before handing that smaller list to nmap for further port and version scanning.

Why does nmap have a hard time with such huge network ranges? Fundamentally, nmap is a synchronous tool — that is, it tracks the connection requests and waits for replies. If a TCP connection request (a SYN) doesn’t get any reply, nmap will eventually time out and declare that service filtered. nmap certainly runs many probe requests in parallel, but filtered services (and unassigned IPv4 addresses) can really slow it down.

In contrast to synchronous tools like nmap, there are several tools that don’t track connections — also known as asynchronous scanners. Examples include scanrand, ZMap, and my personal favorite, masscan.

masscan is my favorite of the asynchronous scanning tools for several reasons. First and foremost, it uses the same syntax as nmap whenever possible, which makes it easier to pick up. Second, even amongst asynchronous scanning tools it’s really, really fast. Effectively, with proper network interfaces and drivers it’s limited only by your bandwidth. With two Intel 10 gigabit Ethernet adapters it can scan the entire IPv4 internet in six minutes, transmitting over 10 million packets per second. If nmap is light speed, ZMap and scanrand are ridiculous speed, and masscan is ludicrous speed.

First, let’s look at masscan’s basic syntax for scanning the well-known TCP ports of a large network, such as Apple’s ~16 million IPv4 addresses:

masscan 17.0.0.0/8 -p0-1023

Scanning speed

By default, masscan will only send 100 packets per second. Counting 18 bytes for the Ethernet header, 20 bytes for a TCP header, and 20 more for the IPv4 header, that’s only 5,800 bytes per second, or ~46 kilobits per second. Because masscan scans ports and hosts evenly (that is, randomly), the scanning bandwidth you use will be evenly distributed across the hosts and ports you scan. Unintentional Denial-of-Service can be a concern with high-bandwidth scans on smaller network ranges, but 1-10 megabits per second (--rate 20000, or twenty thousand packets per second, works out to roughly 9 megabits) should be pretty safe. Virtual machines can safely go up to --rate 200000, which is 93 megabits per second of outgoing scanning traffic — but check with your client if you need to use these higher speeds.
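If you want to sanity-check the bandwidth for a given --rate before launching a scan, a quick back-of-the-envelope calculation in the shell works (a rough sketch, reusing the 58-bytes-per-SYN figure from above; adjust if your per-packet accounting differs):

$ RATE=20000; echo "$(( RATE * 58 * 8 / 1000000 )) Mbit/s"
9 Mbit/s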

Doing a network sweep

How can we determine if a given IPv4 address has any listening TCP services? Well, we could scan all 65,536 ports (zero through 65,535), but for larger network ranges that’ll make for long scanning times, even with a high --rate. More commonly, I’ll select nmap’s top 100 or 1,000 ports by popularity. If any IPv4 address responds to any SYN packet (whether it’s closed with a RST or open with a SYN-ACK), we’ll save that host and scan it using more specialized tools such as nmap or even a vulnerability scanner like Nessus.

Let’s use a small trick to get nmap to tell us that list of ports. We’ll scan our own system and output XML format to STDOUT. The XML format of nmap shows the exact parameters used for a scan, but crucially it’ll also translate between --top-ports X and the actual list of ports in a concise fashion. Here I’ll choose to display the top hundred ports, but you could just as easily choose the top ten or the top thousand.

$ nmap localhost --top-ports 100 -oX - | grep services
<scaninfo type="connect" protocol="tcp" numservices="100" services="7,9,13,21-23,25-26,37,53,79-81,88,106,110-111,113,119,135,139,143-144,179,199,389,427,443-445,465,513-515,543-544,548,554,587,631,646,873,990,993,995,1025-1029,1110,1433,1720,1723,1755,1900,2000-2001,2049,2121,2717,3000,3128,3306,3389,3986,4899,5000,5009,5051,5060,5101,5190,5357,5432,5631,5666,5800,5900,6000-6001,6646,7070,8000,8008-8009,8080-8081,8443,8888,9100,9999-10000,32768,49152-49157"/>

Now we can copy that list of ports into masscan and scan our target range. We’ll keep using Apple as our victim example network. At 100,000 packets per second, this will use around 46 megabits per second of traffic (using the same 58-bytes-per-packet accounting as above).

$ sudo masscan 17.0.0.0/8 -oG apple-masscan.gnmap -p 7,9,13,21-23,25-26,37,53,79-81,88,106,110-111,113,119,135,139,143-144,179,199,389,427,443-445,465,513-515,543-544,548,554,587,631,646,873,990,993,995,1025-1029,1110,1433,1720,1723,1755,1900,2000-2001,2049,2121,2717,3000,3128,3306,3389,3986,4899,5000,5009,5051,5060,5101,5190,5357,5432,5631,5666,5800,5900,6000-6001,6646,7070,8000,8008-8009,8080-8081,8443,8888,9100,9999-10000,32768,49152-49157 --rate 100000
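Before waiting on it, it’s worth estimating how long the scan should take (a back-of-the-envelope sketch: 2^24 addresses times 100 ports, divided by 100,000 packets per second, converted to hours):

$ echo "scale=1; 2^24 * 100 / 100000 / 3600" | bc
4.6

Call it five hours once you account for the --rate not being perfectly sustained.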

Note that masscan supports the same -oG filename.gnmap option as nmap does. We’ll read through that output list (the so-called “greppable” format) to find the list of hosts that are alive. Given 16 million target IPv4 addresses and 100 TCP ports each, this scan will take around five hours to complete — which is well within what I’d consider a “reasonable” timeframe. Let’s look at the first few lines of the resulting file:

# Masscan 1.0.3 scan initiated Thu Jul 20 22:24:40 2017
# Ports scanned: TCP(1;7-7,) UDP(0;) SCTP(0;) PROTOCOLS(0;)
Host: 17.179.241.56 ()  Ports: 443/open/tcp////
Host: 17.253.84.72 ()   Ports: 179/open/tcp////
Host: 17.188.161.148 () Ports: 8081/open/tcp////
Host: 17.188.161.212 () Ports: 8081/open/tcp////

We only need the IPv4 address, so we’ll use egrep to search for lines beginning with “Host: ” and cut to take the second field. We’ll also sort and make the lines unique with uniq, just in case masscan writes the same IPv4 address twice.

$ egrep '^Host: ' apple-masscan.gnmap | cut -d" " -f2 | sort | uniq > apple-alive

Now we have a much smaller list of IPv4 addresses to work with, one address per line. As a parting example, we can use this as an input list with nmap to do a more thorough scan:

# nmap -Pn -n -A -iL apple-alive -oA apple-nmap-advanced-scan

Using our masscan-generated file, nmap will now be able to do its job much more quickly!

Please let me know in the comments if you find this useful for your workflow. I love nmap, but sometimes larger tasks call for more specialized tools.

Thanks for reading! – Jeff McJunkin

[1] Hopefully the client is risk-aware enough to consider an “assume breach” mentality and give penetration testers internal network access in addition to the external network scan, but that’s a separate story.

[2] In fact, nmap is absolutely my favorite tool for version scanning (that is, differentiating between Apache 2.2 and IIS 8.0 on a listening port 80).

April 2013 SOU Presentation

Though long delayed, below is the slideshare for my April 2013 talk at SOU, entitled “Getting Involved In Network Security”:

[Embedded SlideShare presentation]

Expect another post with my January 2014 presentation soon.

Introduction to Network Penetration Testing – Module 1, Networking Overview

A highly motivated high school student asked me to mentor him through his senior project: he’s interested in network security and wants to do a penetration test of his high school. He’s got permission and I’ve got the spare cycles, so I agreed.

What will hopefully follow is a series of blog posts capturing the compressed education I’ll give. I’m trying to constrain this to 40-ish hours, while still passing along enough background that he knows, to a reasonable extent, what he’s doing by the end, rather than just showing off a series of tools. With such a short time limitation, this will obviously be a whirlwind of topics, so advanced readers will have to forgive me glossing over some rather important details.

Module One – Overview of Practical Networking

I’m a big believer in learning networking from a practical point of view, as it helps with many aspects of troubleshooting. Troubleshooting, as I use the word, is having a goal and working through or bypassing the obstacles encountered, often using many different approaches to the problem. It’s okay to have eleven failed solutions, as long as you have twelve different ways to solve the problem. Penetration testing is just troubleshooting your way to Domain Admin, so it relates well.

Though in a classroom environment I’d also teach the OSI model, for the defined purpose of this class the TCP/IP model fits better, so we’ll work with that. The one-sentence descriptions of each layer below are my own attempt to sum up the intent of each layer.

I’d be foolish not to link to Wikipedia’s page on the TCP/IP model, because it is very well fleshed out. Specifically, the section on encapsulation, which I’ll quote below, explains the concept very well:

The Internet protocol suite uses encapsulation to provide abstraction of protocols and services. Encapsulation is usually aligned with the division of the protocol suite into layers of general functionality. In general, an application (the highest level of the model) uses a set of protocols to send its data down the layers, being further encapsulated at each level.

The layers of the protocol suite near the top are logically closer to the user application, while those near the bottom are logically closer to the physical transmission of the data. Viewing layers as providing or consuming a service is a method of abstraction to isolate upper layer protocols from the details of transmitting bits over, for example, Ethernet and collision detection, while the lower layers avoid having to know the details of each and every application and its protocol.

Layer 1 – Network Access Layer – “Physical network interface to network interface communication, within the same subnet”

Example protocols: Ethernet, 802.11{a,b,g,n}

The network access layer is scoped to just allowing hosts (or more precisely, their network interfaces) on the same network to communicate. This also includes the physical components (such as cabling and interfaces) and the protocols for sending and receiving the physical signals. As an example, an Ethernet (MAC) address is assigned to the network card by the manufacturer and is supposed to be globally unique.
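On a Linux box you can see that manufacturer-assigned address yourself (just an illustration; eth0 is a placeholder for whatever your interface is actually named):

$ ip link show eth0

The link/ether field in the output is the interface’s Ethernet (MAC) address.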

The link light on an Ethernet card, for example, solely indicates whether the involved interface believes it is connected to another device speaking the same protocol. A laptop connected via Ethernet to a switch with no other hosts attached, for example, would still have an active link light, because both the switch and the network interface speak Ethernet.

Moving frames from one interface to another at the link layer is called switching. A switch is a network device that connects hosts within the same subnet (it “switches frames”), and therefore operates at layer one.

Layer 2 – Internet Layer – “Logical host to host communication, across separate subnets (or within a subnet)”

Example protocols: IPv4, IPv6

As a building block above layer one, the Internet layer allows hosts in different subnets to communicate. Note that while network access layer addresses are hardware addresses assigned by the manufacturer, Internet layer addresses are assigned to particular hosts by an administrator (or automatically, via DHCP). As such, a machine can have different Internet layer addresses while in different subnets (such as a laptop with a different IP address at home and at a coffee shop), whereas a network access layer device will always have the same address (assuming no MAC spoofing shenanigans).

Moving packets from one subnet to another is called routing. A router is a network device that connects multiple subnets (or “routes packets”), and therefore operates at layer two.
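On your own machine you can see which router handles traffic leaving your subnet (again, a Linux example):

$ ip route show

The “default via …” line is the router (default gateway) used for any destination that isn’t on a directly attached subnet.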

Layer 3 – Transport Layer – “Service to service communication”

Example protocols: TCP, UDP

The transport layer builds upon the Internet layer (starting to get the theme here?) by allowing multiple services on a single host. Look at the IPv4 header, for example. There’s a field for the destination address, but how do you speak to a particular service on a host? Imagine a server that runs both HTTP and FTP. How do I, as a client, tell the server which service I want to talk to? Using solely IPv4, I can only send a packet to the host as a whole — there isn’t a field for which service I mean to talk to. The transport layer, at a minimum, provides this service through the concept of ports, of which there are 65,535 usable (2^16 – 1, since port zero is reserved).

Particular services are by convention found on particular ports, and vice versa (see IANA Assigned Port Numbers). If you see port 80 open, for example, you’d expect a web server (HTTP) to be running on that port. However, there is no “Internet police” regulating this, so people can and do run services on non-standard ports, for a multitude of reasons. High ports are commonly used for a client to connect from (i.e., as an ephemeral port), so it’s very common to see a client connect from port 49,273 (for example) to port 80 in order to reach a web server.
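You can watch this on your own machine. With a web page open in your browser, something like the following (a sketch using ss on Linux; the filter syntax can vary a bit between versions) lists established connections to ports 80 or 443, and the local side of each will be a high ephemeral port:

$ ss -tn state established '( dport = :80 or dport = :443 )'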

As for the two major protocols used at this layer, TCP (Transmission Control Protocol) and UDP (User Datagram Protocol), there are some fundamental differences.

User Datagram Protocol (UDP)

UDP is connectionless: datagrams are sent directly, with as little overhead as possible, and the header carries barely anything beyond source and destination ports (plus a length and a checksum). Over IPv4, even the checksum field is optional. In general, protocols that emphasize low latency (Voice over IP, real-time video) or very low per-request overhead (DNS, primarily) will tend to prefer UDP. However, since delivery isn’t guaranteed, protocols choosing UDP must be able to work despite occasional loss of packets. Voice over IP, for example, will just skip a few phonemes, whereas a missed DNS request simply results in sending the request again after a short timeout.
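DNS is the easiest way to see UDP in action: one small datagram out, one back, and no connection setup or teardown (dig uses UDP by default; 8.8.8.8 is just a well-known public resolver used here as an example):

$ dig @8.8.8.8 www.example.com +short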

Transmission Control Protocol (TCP)

TCP is more complex, and provides reliable delivery of data in the proper order. Even if some packets are dropped, as long as the overall connection *can* pass packets successfully, the applications on either end will eventually get their data. On a lossless network (which almost all local networks should be), the overhead is minimal and high-speed communications are in no way hampered by TCP. As the TCP header shows, all of these features mean the protocol header has many more fields to track. We’ll skip going over the individual TCP fields for now.
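If you’d like to actually watch the connection setup that makes this reliability possible, tcpdump can show the SYN and SYN-ACK of the three-way handshake while you make a plain HTTP request (a sketch; swap eth0 for your interface, and note that any packet with the SYN flag set will match this filter):

$ sudo tcpdump -nn -i eth0 'tcp port 80 and tcp[tcpflags] & tcp-syn != 0'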

Layer 4 – Application Layer – “Application to application communication, often on behalf of a user”

Example protocols: HTTP, DNS, FTP

As the top-most layer in the networking stack, the application layer is the one that carries traffic from a particular application. As such, there’s a ton of variation in this layer, and many, many protocols. I’ll touch on this concept in a later blog post, but application layer packets are, in blunt terms, the entire point of the communication. Though there’s a lot of scaffolding in the lower layers to get an HTTP client connected to an HTTP server, the point of all of those connections is the HTTP (application layer) communication. The HTTP communication (in this example) is what the user requested, which is an exceedingly common theme.
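To tie the layers together, a verbose curl request shows that application-layer conversation riding on top of everything below it (www.example.com is used here purely as a demonstration target):

$ curl -v http://www.example.com/

The lines curl prefixes with > and < are the raw HTTP request and response headers, the application-layer payload that all of the lower layers exist to deliver.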

Building a virtual lab for security testing

UPDATE – if you’re looking for my article on “Building A Pen Test Lab”, it’s located on the SANS Pen Test blog, not here.

tl;dr — Building a lab like the following is very useful:
[Virtual Lab Diagram]

It’s hardly debatable that most IT professionals should have a lab environment in which they can practice their trade. Many don’t have one at work, though, and don’t make one at home. Those of us in network security (whether offense or defense) aren’t an exception, either. Ed Skoudis (of SANS and InGuardians fame) posted on this recently, and a DEFCON 20 talk from Trustwave featured their testing labs heavily.

The purpose of this design is to go into more detail than most security labs, to more closely simulate a standard small business network. You can learn some basics of Metasploit, for example, by using a BackTrack or Kali VM as well as Metasploitable, but more comprehensive attacks and defenses need a more realistic network.

The Active Directory domain controller, file server, and external blog in this lab all represent unique (and common) attack opportunities. Client desktops are almost always of multiple security levels and OS levels, which explains both the Windows XP and 7 workstations. The DMZ is slightly unusual for a small business, but is reasonable in simulating a larger environment. The larger environments, by the way, are the ones that have money for vulnerability assessments and penetration tests, so they’re certainly the networks worth studying.

Not included in the lab diagram are a Security Onion VM for intrusion detection capabilities and a Splunk server (for now — Graylog2 might replace it) that collects all kinds of logs (syslogs and Windows event logs, to start).

Though the hardware I used to put together this lab certainly wasn’t free, it was less expensive than you might think. I’ll put up another post about it shortly, but for now, know that it was based on this fine gentleman’s home lab. One awesome resource that I checked into heavily, by the way, can be found at www.reddit.com/r/homelab. If you have any quick questions, you can also reach some of those folk at #r_homelab on Freenode IRC.

In further posts, I’ll go into how and why I designed the lab this way, what licensing I used, and how I went about building it from a practical point of view.