Masscan with HTTPS support

BI.ZONE
10 min readMar 21, 2022

--

By Konstantin Molodyakov

Masscan is a fast network scanner that is good for scanning a large range of IP addresses and ports. We’ve adapted it to our needs by giving it a little tweak.

The biggest inconvenience in the original version was the inability to collect banners from HTTPS servers. And what is a modern web without HTTPS? You can’t really scan anything. That’s what motivated us to modify masscan. As it usually happens, one little improvement led to another one, with some bugs being discovered along the way. Now we want to share our work with the community. All the modifications we’ll be talking about are already available in our repository on GitHub.

What are network scanners for

Network scanners are one of the universal tools in cybersecurity research. We use them to solve such tasks as perimeter analysis, vulnerability scanning, phishing and data leak detection, C&C detection, and host information collection.

How masscan works

Before we talk about the custom version, let’s understand how the original masscan works. If you are already familiar with it, you may be interested in the selection of useful scanner options. Or go straight to the section “Our modifications to masscan.”

The masscan project is small and, in our opinion, written scrupulously and logically. It was nice to see the abundance of comments — even deficiencies and kludges are clearly marked in the code:

Logically, the code can be divided into several parts as follows:

  • implementation of application protocols
  • implementation of the TCP stack
  • packet processing and transmission threads
  • implementation of output formats
  • reading raw packets

Let’s look at some of them in more detail.

Implementation of application protocols

Masscan is based on a modular concept. Thus, it can support any protocol, all you need is to register the appropriate structure and specify its use everywhere you need it (ha-ha):

Here’s a little description of the structure.

The protocol name and the standard port are informative only. The сtrl_flags field is not used anywhere.

The init function initiates the protocol, parse is the method responsible for processing the incoming data feed and generating response messages, and cleanup is the cleanup function for the connection.

The transmit_hello function is used to generate a hello packet if the server itself does not transmit something first, and the data from the hello field is used if the function is not specified.

The function that tests the functionality can be specified in the selftest.

Through this mechanism, for example, it’s possible to write handlers in Lua (the option --script). However, we never got around to checking if it really works. The thing we came across with masscan is that most of the interesting options are not described in the documentation, and the documentation itself is scattered in different places, partially overlapping. Part of the flags can only be found in the source code (main-conf.c). The --script option is one of them, and we have collected some other useful and interesting functions in the section "Useful options of the original masscan."

Implementation of the TCP stack

One of the reasons why masscan is so fast and can handle many simultaneous connections is its native implementation of the TCP stack*. It takes about 1,000 lines of code in the fileproto-tcp.c.

* A native TCP stack allows you to bypass OS restrictions, not to use OS resources, not to use heavier OS mechanisms, and to shorten the packet processing path

Packet processing and transmission threads

Masscan is fast and single-threaded. More specifically, it uses two threads per each network interface, one of which is a thread to process incoming packets. But no one really runs on more than one interface at a time.

One thread:

  1. reads raw data from the network interface.
  2. processes this data by running it through its own TCP stack and application protocol handlers.
  3. forms necessary data to be transmitted.
  4. stacks them in the transmit_queue.

The other thread takes the messages prepared for transmission from transmit_queue and writes them to the network interface (Fig. 1). If the messages sent from the queue do not exceed the limit, SYN packets are generated and sent for the next scanning targets.

Fig. 1. Packet processing and transmission schematic

Implementation of output formats

This part is conceptually similar to the modular implementation of protocols: it also has the OutputType structure that contains the main serialization functions. There's an abundance of all possilble output formats: custom binary, the modern NDJSON, the nasty XML, and the grepable. There's even the option of saving data to Redis. Let us know in the comments if you've tried it :)

Some formats are compatible with (or, as the author of masscan puts it, inspired by) similar utilities, such as nmap and unicornscan.

Reading raw packets

Masscan provides the ability to work with the network adapter through the PCAP or PFRING libraries, and to read data from the PCAP dump. The rawsock.c file contains several functions that abstract the main code from specific interfaces.

To select PFRING, you have to use the --pfring parameter, and to enable reading from the dump, you have to put the file prefix on the adapter name.

Useful options of the original masscan

Let’s take a look at some interesting and useful options of the original masscan that are rarely talked about.

Options

  • --nmap, --help
    Description: Help
    Comment: Even combined, these options give very little useful information. The documentation also contains incomplete information and is scattered in different files: README.md, man, FAQ. There’s also a small HOWTO on how to use the scanner together with AFL (american fuzzy lop). If you want to know about all the options, you can find the full list of them only in the source code (main-conf.c)
  • --output-format ndjson, -oD, --ndjson-status
    Description: NDJSON support
    Comment: Gigabytes of line-by-line NDJSON files are much nicer to handle than JSON. And the status output in NDJSON format is useful for writing utilities that monitor masscan performance
  • --output-format redis
    Description: Ability to save outputs directly to Redis
    Comment: Well, why not?:) If you haven’t worked with this tool, read about it here
  • --range fe80::/67
    Description: IPv6 support
    Comment: Everything’s clear here, but it would be interesting to read about real use cases in the comments. I can think of scanning a local network or only a small range of some particular country obtained through BGP
  • --http-*
    Description: HTTP request customization
    Comment: When creating an HTTP request, you can change any part of it to suit your needs: method, URI, version, headers, and/or body
  • --hello-[http, ssl, smbv1]
    Description: Scanning protocols on non-standard ports
    Comment: If masscan hasn’t received a hello packet from the target, its default setting is to send the request first, choosing a protocol based on the target’s port. But sometimes you might want to scan HTTP on some non-standard port
  • --resume
    Description: Pause
    Comment: Masscan knows how to delicately stop and resume where it paused. With Ctrl+C (SIGINT) masscan terminates, saving state and startup parameters, and with --resume it reads that data and continues operation
  • --rotate-size
    Description: Rotation of the output file
    Comment: The output can contain a lot of data, and this parameter allows you to specify the maximum file size at which the output will start to be written to the next file
  • --shard
    Description: Horizontal scaling
    Comment: Masscan pseudorandomly selects targets from the scanned range. If you want to run masscan on multiple machines within the same range, you can use this parameter to achieve the same random distribution even between machines
  • --top-ports
    Description: Scanning of N popular ports (array top_tcp_ports)
    Comment: This parameter came from nmap
  • --script
    Description: Lua scripts
    Comment: I have doubts that it works, but the possibility itself is interesting. Is there anyone who uses it? Let me know if you have any interesting examples
  • --vuln [heartbleed, ticketbleed, poodle, ntp-monlist]
    Description: Search for certain known vulnerabilities
    Comment: We cannot say anything about its correctness and efficiency, since this mechanism of vulnerability detection is a kind of kludge scattered throughout the code and conflicts with many other options, and we did not have to apply it in real tasks

Just to remind you of an important point everyone stumbles upon: masscan probably won’t work if you just run it to collect banners. The documentation does say this, but who cares to read it, right? Since masscan uses its own network stack, the OS knows nothing about the connections it creates and is rather surprised when it receives a packet (SYN, ACK) from somewhere in the network in response to a SYN request from the scanner. And then, depending on the type and settings of OS and firewall, the OS transmits an ICMP or RST packet, which is extremely adverse to the output. So you need to read the documentation and take this point into account.

Our modifications to masscan

We’ve added HTTPS support

The Internet is quite the fortress these days, even the most backward scammers have already given up on unencrypted HTTP. Therefore, it’s rather inconvenient without HTTPS support — this feature makes investigation, such as searching for C&C servers and phishing, much easier. There’re other tools besides masscan, but they are slower. We wanted to have a universal tool that would cover HTTPS and still be fast.

The first thing to do was to implement a full-fledged SSL. What the original masscan has is the ability to send a predefined hello packet then fetch and process a server certificate. Our version can establish and maintain an SSL connection and analyze the contents of nested protocols, which means it can collect HTTP banners from HTTPS servers.

Here’s how we achieved that. We added a new application-layer protocol to the source code and used the standard solution, OpenSSL, to implement SSL. Here we needed to do some fine-tuning, and the structure describing the application-layer protocol in the custom scanner looks like this:

We added handlers for protocol deinitialization, connection initiation and expanded the set of handler parameters. As a result, it became possible to handle nested protocols. We also managed to implement the change of application protocol handler more precisely. It is necessary when it’s impossible to process data with the current protocol or if such mechanism is embedded in the protocol itself, for example, when using STARTTLS.

Then we had some problems with performance and packet loss. SSL is heavy on the CPU. We had the option to try something faster than OpenSSL, but we went in the direction of processing incoming packets in several threads within one network interface. After implementing this, the packet processing pipeline looks like this:

Fig. 2. Updated packet processing and transmission schematic

The th_recv_read thread is needed to read data from the network interface regardless of the data processing speed. The q_recv_pb queue helps to detect cases when the data transmission speed is too high, and inbound packets cannot be processed in time. The th_recv_sched thread dispatches messages based on the hashes of the outbound and inbound IP addresses and ports to the th_recv_hdl_* threads so that the same connection falls into the same handler. The options related to this functionality are --num-handle-threads—the number of handler threads, and --tranquility—for automatic reduction of packet transmission speed when inbound packets cannot be handled fast enough.

HTTPS support is enabled with the parameter --dynamic-ssl while --output-filename-ssl-keys can be used to save master keys.

You can also notice a small cosmetic improvement — namely, the names of the threads. In our version, it became clear which threads consume resources:

Before
After

We’ve improved code quality

Masscan was found to have many strange things and errors. For example, the conversion of time to ticks** looked as follows:

** A unit of time measurement in which there’s enough accuracy, and which does not take up too much space

Network TCP connections were often handled incorrectly, resulting in broken connections and unnecessary repeat transmissions:

Fig. 3. Example of incorrect handling of network TCP connections

We also discovered errors in memory handling, including memory leaks. We managed to fix many of them, but not all. For example, when scanning /0:80, we see a leak of several ranges of 2 bytes each.

These errors were detected thanks to our colleagues, who meticulously used our developments, static analyzers (GCC, Clang, and VS), UB and memory sanitizers. Separately, I want to thank PVS-Studio. Those guys are unparalleled in quality and convenience.

We’ve added a build for different OSs

To consolidate the outputs, we’ve written a build and a test for Windows, Linux, and macOS using GitHub Actions.

The build pipeline looks like this (Fig. 4):

  • format check
  • static clang analyzer check
  • assembly debugging with sanitizers and running built-in tests
  • assembly and sending data to SonarCloud and CodeQL services
Fig. 4. Assembly pipeline

You can download compiled binaries from the build or release artifacts:

Fig. 5. Release artifacts

We’ve added a few more features

Here are the rest of the less significant things that were introduced in our version:

  • --regex(--regex-only-banners) is data-level message filtering in TCP. A regular expression is applied to the contents of each TCP packet. If the regular expression is triggered, the connection information will be in the output.
  • --dynamic-set-host is used to input the header hostinto a HTTP request. The IP address of the target being scanned is taken as a value.
  • Output of internal signature triggers on masscan protocols in the output.
  • An option to specify URIs in HTTP requests. We removed it later because the author of the original masscan added the same functionality. This is part of the --http-* options family.

--

--

BI.ZONE

BI.ZONE: an expert in digital risks management. We help organizations around the world to develop their businesses safely in the digital age