Masscan is a fast network scanner that is well suited to scanning large ranges of IP addresses and ports. We’ve adapted it to our needs with a few tweaks.
The biggest inconvenience in the original version was the inability to collect banners from HTTPS servers. And what is the modern web without HTTPS? You can’t really scan anything. That’s what motivated us to modify masscan. As usually happens, one little improvement led to another, and some bugs were discovered along the way. Now we want to share our work with the community. All the modifications we’ll be talking about are already available in our repository on GitHub.
What are network scanners for
Network scanners are among the universal tools of cybersecurity research. We use them for tasks such as perimeter analysis, vulnerability scanning, phishing and data leak detection, C&C detection, and host information collection.
How masscan works
Before we talk about the custom version, let’s understand how the original masscan works. If you are already familiar with it, you may be interested in the selection of useful scanner options. Or go straight to the section “Our modifications to masscan.”
The masscan project is small and, in our opinion, written scrupulously and logically. It was nice to see the abundance of comments — even deficiencies and kludges are clearly marked in the code:
Logically, the code can be divided into several parts as follows:
- implementation of application protocols
- implementation of the TCP stack
- packet processing and transmission threads
- implementation of output formats
- reading raw packets
Let’s look at some of them in more detail.
Implementation of application protocols
Masscan is based on a modular concept, so it can support any protocol. All you need to do is register the appropriate structure and specify its use everywhere you need it (ha-ha):
Here’s a little description of the structure.
The protocol name and the standard port are informative only. The ctrl_flags field is not used anywhere. The init function initializes the protocol, parse is the method responsible for processing the incoming data feed and generating response messages, and cleanup is the cleanup function for the connection. The transmit_hello function is used to generate a hello packet if the server itself does not transmit something first, and the data from the hello field is used if the function is not specified.
The function that tests the functionality can be specified in the
Through this mechanism, for example, it’s possible to write handlers in Lua (the --script option). However, we never got around to checking whether it really works. One thing we ran into with masscan is that most of the interesting options are not described in the documentation, and the documentation itself is scattered across different places and partially overlaps. Some flags can only be found in the source code; the --script option is one of them. We have collected some other useful and interesting options in the section “Useful options of the original masscan.”
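As a rough illustration of the structure described above, here is a minimal Python sketch. It is not masscan's code (the real structure is written in C). The field names name, port, init, parse, cleanup, transmit_hello, and hello come from the description; the ProtocolHandler class, the first_payload helper, the function signatures, and the toy HTTP handler are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProtocolHandler:
    """Python stand-in for the per-protocol structure (hypothetical layout)."""
    name: str                      # informative only
    port: int                      # informative only
    hello: bytes = b""             # static hello, used when transmit_hello is unset
    init: Optional[Callable[[], None]] = None
    parse: Optional[Callable[[dict, bytes], Optional[bytes]]] = None
    cleanup: Optional[Callable[[dict], None]] = None
    transmit_hello: Optional[Callable[[dict], bytes]] = None


def first_payload(handler: ProtocolHandler, conn: dict) -> bytes:
    """Pick what to send first when the server stays silent."""
    if handler.transmit_hello is not None:
        return handler.transmit_hello(conn)
    return handler.hello


def http_parse(conn: dict, data: bytes) -> Optional[bytes]:
    """Toy parser: remember the status line as the banner, send nothing back."""
    conn.setdefault("banner", data.split(b"\r\n", 1)[0])
    return None


# Hypothetical registration of a toy HTTP handler.
http = ProtocolHandler(name="http", port=80,
                       hello=b"GET / HTTP/1.0\r\n\r\n", parse=http_parse)
```

The point of the sketch is the fallback: a handler either builds its hello dynamically or relies on the static hello bytes.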
Implementation of the TCP stack
One of the reasons why masscan is so fast and can handle many simultaneous connections is its native implementation of the TCP stack*. It takes about 1,000 lines of code in the file
* A native TCP stack allows you to bypass OS restrictions, not to use OS resources, not to use heavier OS mechanisms, and to shorten the packet processing path
Packet processing and transmission threads
Masscan is fast and, in effect, single-threaded. More specifically, it uses two threads for each network interface, one of which processes incoming packets, but hardly anyone runs it on more than one interface at a time. The receiving thread:
- reads raw data from the network interface.
- processes this data by running it through its own TCP stack and application protocol handlers.
- forms necessary data to be transmitted.
- stacks them in the
The other thread takes the messages prepared for transmission from transmit_queue and writes them to the network interface (Fig. 1). If the messages sent from the queue do not exceed the limit, SYN packets are generated and sent for the next scanning targets.
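The division of work between the two threads can be sketched in Python as two step functions. This is a simplified model under stated assumptions, not masscan's implementation: real threads, raw sockets, and the TCP stack are left out, and the names receive_step, transmit_step, handle, and limit are hypothetical. Only transmit_queue is taken from the text.

```python
from collections import deque


def receive_step(raw_packets, handle, transmit_queue):
    """Receiving side: process each packet and queue any reply it produces."""
    for pkt in raw_packets:
        reply = handle(pkt)          # stands in for the TCP stack + protocol handlers
        if reply is not None:
            transmit_queue.append(reply)


def transmit_step(transmit_queue, next_targets, limit):
    """Transmitting side: queued replies go first, then SYNs fill the rest."""
    sent = []
    # Drain prepared messages, but never exceed the per-step limit.
    while transmit_queue and len(sent) < limit:
        sent.append(transmit_queue.popleft())
    # Spend whatever budget is left on SYN packets for new targets.
    while len(sent) < limit:
        target = next(next_targets, None)
        if target is None:
            break
        sent.append(("SYN", target))
    return sent
```

In this model, replies for connections that are already open take priority over opening new ones, which matches the order described above.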
Implementation of output formats
This part is conceptually similar to the modular implementation of protocols: it also has the OutputType structure that contains the main serialization functions. There's an abundance of all possible output formats: custom binary, the modern NDJSON, the nasty XML, and the grepable. There's even the option of saving data to Redis. Let us know in the comments if you've tried it :)
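To illustrate the idea of pluggable output formats, here is a small Python sketch in which each format is just a serialization function chosen by name. It is only an analogy to the OutputType structure: the record fields (ip, port, banner), the grepable line layout, and all function names are hypothetical and do not reproduce masscan's real output.

```python
import json


def to_ndjson(record: dict) -> str:
    """One self-contained JSON object per line."""
    return json.dumps(record, sort_keys=True)


def to_grepable(record: dict) -> str:
    """One flat text line per record (made-up layout)."""
    return "Host: {ip} Port: {port} Banner: {banner}".format(**record)


# Registry of available formats, keyed by a hypothetical format name.
OUTPUT_FORMATS = {"ndjson": to_ndjson, "grepable": to_grepable}


def write_records(records, fmt: str) -> str:
    """Serialize all records with the chosen format, one per line."""
    serialize = OUTPUT_FORMATS[fmt]
    return "".join(serialize(r) + "\n" for r in records)
```

Because every NDJSON line is a complete JSON document, a consumer can parse the output line by line without loading the whole file.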
Reading raw packets
Masscan provides the ability to work with the network adapter through the PCAP or PFRING libraries, and to read data from the PCAP dump. The rawsock.c file contains several functions that abstract the main code from specific interfaces.
To select PFRING, you have to use the --pfring parameter, and to enable reading from the dump, you have to put the file prefix on the adapter name.
Useful options of the original masscan
Let’s take a look at some interesting and useful options of the original masscan that are rarely talked about.
--nmap, --help
Description: Help
Comment: Even combined, these options give very little useful information. The documentation is also incomplete and scattered across different files: README.md, man, FAQ. There’s also a small HOWTO on how to use the scanner together with AFL (american fuzzy lop). If you want to know about all the options, you can find the full list only in the source code (main-conf.c).
Comment: Gigabytes of line-by-line NDJSON files are much nicer to handle than JSON. The status output in NDJSON format is also useful for writing utilities that monitor masscan performance.
--output-format redis
Description: Ability to save outputs directly to Redis
Comment: Well, why not? :) If you haven’t worked with this tool, read about it here
--range fe80::/67
Description: IPv6 support
Comment: Everything’s clear here, but it would be interesting to read about real use cases in the comments. I can think of scanning a local network or only a small range of some particular country obtained through BGP.
--http-*
Description: HTTP request customization
Comment: When creating an HTTP request, you can change any part of it to suit your needs: method, URI, version, headers, and/or body.
--hello-[http, ssl, smbv1]
Description: Scanning protocols on non-standard ports
Comment: If masscan hasn’t received a hello packet from the target, its default behavior is to send the request first, choosing a protocol based on the target’s port. But sometimes you might want to scan HTTP on some non-standard port.
Comment: Masscan knows how to stop gracefully and resume where it paused. On Ctrl+C (SIGINT), masscan terminates, saving its state and startup parameters, and with --resume it reads that data and continues operation.
--rotate-size
Description: Rotation of the output file
Comment: The output can contain a lot of data, and this parameter allows you to specify the maximum file size at which the output will start to be written to the next file.
--shard
Description: Horizontal scaling
Comment: Masscan pseudorandomly selects targets from the scanned range. If you want to run masscan on multiple machines within the same range, you can use this parameter to achieve the same random distribution even between machines.
--top-ports
Description: Scanning of N popular ports (array
Comment: This parameter came from nmap.
--script
Description: Lua scripts
Comment: I have doubts that it works, but the possibility itself is interesting. Is there anyone who uses it? Let me know if you have any interesting examples.
--vuln [heartbleed, ticketbleed, poodle, ntp-monlist]
Description: Search for certain known vulnerabilities
Comment: We cannot say anything about its correctness and efficiency: this vulnerability detection mechanism is a kind of kludge scattered throughout the code, it conflicts with many other options, and we have not had to apply it in real tasks.
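The idea behind --shard, as described above, can be sketched in Python: every machine walks the same pseudorandom permutation of the target range but takes only every N-th index. This is a sketch under assumptions, not masscan's algorithm. The affine permutation below is a simple stand-in for masscan's own shuffle, and make_permutation, shard_targets, the 1-based shard numbering, and the seed parameter are all hypothetical names chosen for the example.

```python
from math import gcd


def make_permutation(n: int, seed: int):
    """Return a bijection on range(n); an affine map stands in for the real shuffle."""
    a = max(1, seed % n)
    while gcd(a, n) != 1:        # a must be coprime with n for the map to be a bijection
        a += 1
    b = (seed * 7919) % n
    return lambda i: (a * i + b) % n


def shard_targets(n: int, shard: int, total_shards: int, seed: int):
    """Targets for one shard: indices shard-1, shard-1+total, ... pushed through the permutation."""
    permute = make_permutation(n, seed)
    return [permute(i) for i in range(shard - 1, n, total_shards)]


# Three machines scanning the same range of 100 targets with the same seed.
parts = [shard_targets(100, s, 3, seed=42) for s in (1, 2, 3)]
```

Since all shards share one permutation and take disjoint indices, together they cover every target exactly once, and each machine still sees its targets in a scattered order.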
Here is a reminder of an important point everyone stumbles upon: masscan probably won’t work if you just run it to collect banners. The documentation does say this, but who cares to read it, right? Since masscan uses its own network stack, the OS knows nothing about the connections it creates and is rather surprised when it receives a (SYN, ACK) packet from somewhere in the network in response to a SYN request from the scanner. Then, depending on the type and settings of the OS and firewall, the OS sends back an ICMP or RST packet, which badly disrupts the scan results. So you need to read the documentation and take this point into account.
Our modifications to masscan
We’ve added HTTPS support
The Internet is quite the fortress these days: even the most backward scammers have given up on unencrypted HTTP. Working without HTTPS support is therefore rather inconvenient, while having it makes investigations, such as searching for C&C servers and phishing, much easier. There are other tools besides masscan, but they are slower. We wanted a universal tool that would cover HTTPS and still be fast.
The first thing to do was to implement full-fledged SSL. What the original masscan has is the ability to send a predefined hello packet and then fetch and process a server certificate. Our version can establish and maintain an SSL connection and analyze the contents of nested protocols, which means it can collect HTTP banners from HTTPS servers.
Here’s how we achieved that. We added a new application-layer protocol to the source code and used the standard solution, OpenSSL, to implement SSL. Here we needed to do some fine-tuning, and the structure describing the application-layer protocol in the custom scanner looks like this:
We added handlers for protocol deinitialization and connection initiation, and expanded the set of handler parameters. As a result, it became possible to handle nested protocols. We also made switching the application protocol handler more precise. Switching is necessary when data cannot be processed with the current protocol or when such a mechanism is embedded in the protocol itself, for example, when using STARTTLS.
Then we had some problems with performance and packet loss. SSL is heavy on the CPU. We had the option to try something faster than OpenSSL, but we went in the direction of processing incoming packets in several threads within one network interface. After implementing this, the packet processing pipeline looks like this:
The th_recv_read thread is needed to read data from the network interface regardless of the data processing speed. The q_recv_pb queue helps to detect cases when the data transmission speed is too high and inbound packets cannot be processed in time. The th_recv_sched thread dispatches messages to the th_recv_hdl_* threads based on the hashes of the outbound and inbound IP addresses and ports, so that the same connection falls into the same handler. The options related to this functionality are --num-handle-threads, which sets the number of handler threads, and --tranquility, which automatically reduces the packet transmission speed when inbound packets cannot be handled fast enough.
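The dispatching rule can be illustrated with a short Python sketch: hash the connection's addresses and ports and take the result modulo the number of handler threads. This is an assumption-laden model, not the scanner's code. The choice of CRC32 as the hash, the packet tuple layout, and the names handler_index and dispatch are hypothetical; only the idea of hashing addresses and ports comes from the text.

```python
import ipaddress
import struct
import zlib


def handler_index(src_ip, src_port, dst_ip, dst_port, num_handle_threads):
    """Map a connection 4-tuple to a handler number in a stable way."""
    key = (ipaddress.ip_address(src_ip).packed + struct.pack("!H", src_port)
           + ipaddress.ip_address(dst_ip).packed + struct.pack("!H", dst_port))
    return zlib.crc32(key) % num_handle_threads


def dispatch(packets, num_handle_threads):
    """Place each (src_ip, src_port, dst_ip, dst_port, payload) packet in its handler's queue."""
    queues = [[] for _ in range(num_handle_threads)]
    for pkt in packets:
        queues[handler_index(*pkt[:4], num_handle_threads)].append(pkt)
    return queues
```

Because the index depends only on the 4-tuple, all packets of one connection reach the same handler in arrival order, so per-connection state needs no locking between handlers.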
HTTPS support is enabled with the parameter
The --output-filename-ssl-keys option can be used to save master keys.
You may also notice a small cosmetic improvement: the thread names. In our version, it is clear which threads consume resources:
We’ve improved code quality
We found many oddities and errors in masscan. For example, the conversion of time to ticks** looked as follows:
** A unit of time measurement in which there’s enough accuracy, and which does not take up too much space
TCP connections were often handled incorrectly, resulting in broken connections and unnecessary retransmissions:
We also discovered errors in memory handling, including memory leaks. We managed to fix many of them, but not all. For example, when scanning /0:80, we see a leak of several ranges of 2 bytes each.
These errors were detected thanks to our colleagues, who meticulously put our work to use, as well as static analyzers (GCC, Clang, and VS) and UB and memory sanitizers. Separately, I want to thank PVS-Studio. Those guys are unparalleled in quality and convenience.
We’ve added a build for different OSs
To lock in these results, we’ve set up builds and tests for Windows, Linux, and macOS using GitHub Actions.
The build pipeline looks like this (Fig. 4):
- format check
- Clang static analyzer check
- debug build with sanitizers, followed by a run of the built-in tests
- build and submission of data to the SonarCloud and CodeQL services
You can download compiled binaries from the build or release artifacts:
We’ve added a few more features
Here are the rest of the less significant things that were introduced in our version:
- --regex (--regex-only-banners) is data-level message filtering in TCP. A regular expression is applied to the contents of each TCP packet. If the regular expression is triggered, the connection information will be in the output.
- --dynamic-set-host is used to insert the Host header into an HTTP request. The IP address of the target being scanned is taken as the value.
- Output of internal signature triggers on masscan protocols in the output.
- An option to specify URIs in HTTP requests. We removed it later because the author of the original masscan added the same functionality. This is part of the