Masscan with HTTPS support

How masscan works

Before we talk about the custom version, let’s understand how the original masscan works. If you are already familiar with it, you may be interested in the selection of useful scanner options. Or go straight to the section “Our modifications to masscan.”

  • implementation of application protocols
  • implementation of the TCP stack
  • packet processing and transmission threads
  • implementation of output formats
  • reading raw packets

Implementation of application protocols

Masscan is based on a modular concept. Thus, it can support any protocol, all you need is to register the appropriate structure and specify its use everywhere you need it (ha-ha):

Implementation of the TCP stack

One of the reasons why masscan is so fast and can handle many simultaneous connections is its native implementation of the TCP stack*. It takes about 1,000 lines of code in the fileproto-tcp.c.

Packet processing and transmission threads

Masscan is fast and single-threaded. More specifically, it uses two threads per each network interface, one of which is a thread to process incoming packets. But no one really runs on more than one interface at a time.

  1. reads raw data from the network interface.
  2. processes this data by running it through its own TCP stack and application protocol handlers.
  3. forms necessary data to be transmitted.
  4. stacks them in the transmit_queue.
Fig. 1. Packet processing and transmission schematic

Implementation of output formats

This part is conceptually similar to the modular implementation of protocols: it also has the OutputType structure that contains the main serialization functions. There's an abundance of all possilble output formats: custom binary, the modern NDJSON, the nasty XML, and the grepable. There's even the option of saving data to Redis. Let us know in the comments if you've tried it :)

Reading raw packets

Masscan provides the ability to work with the network adapter through the PCAP or PFRING libraries, and to read data from the PCAP dump. The rawsock.c file contains several functions that abstract the main code from specific interfaces.

Useful options of the original masscan

Let’s take a look at some interesting and useful options of the original masscan that are rarely talked about.

  • --nmap, --help
    Description: Help
    Comment: Even combined, these options give very little useful information. The documentation also contains incomplete information and is scattered in different files: README.md, man, FAQ. There’s also a small HOWTO on how to use the scanner together with AFL (american fuzzy lop). If you want to know about all the options, you can find the full list of them only in the source code (main-conf.c)
  • --output-format ndjson, -oD, --ndjson-status
    Description: NDJSON support
    Comment: Gigabytes of line-by-line NDJSON files are much nicer to handle than JSON. And the status output in NDJSON format is useful for writing utilities that monitor masscan performance
  • --output-format redis
    Description: Ability to save outputs directly to Redis
    Comment: Well, why not?:) If you haven’t worked with this tool, read about it here
  • --range fe80::/67
    Description: IPv6 support
    Comment: Everything’s clear here, but it would be interesting to read about real use cases in the comments. I can think of scanning a local network or only a small range of some particular country obtained through BGP
  • --http-*
    Description: HTTP request customization
    Comment: When creating an HTTP request, you can change any part of it to suit your needs: method, URI, version, headers, and/or body
  • --hello-[http, ssl, smbv1]
    Description: Scanning protocols on non-standard ports
    Comment: If masscan hasn’t received a hello packet from the target, its default setting is to send the request first, choosing a protocol based on the target’s port. But sometimes you might want to scan HTTP on some non-standard port
  • --resume
    Description: Pause
    Comment: Masscan knows how to delicately stop and resume where it paused. With Ctrl+C (SIGINT) masscan terminates, saving state and startup parameters, and with --resume it reads that data and continues operation
  • --rotate-size
    Description: Rotation of the output file
    Comment: The output can contain a lot of data, and this parameter allows you to specify the maximum file size at which the output will start to be written to the next file
  • --shard
    Description: Horizontal scaling
    Comment: Masscan pseudorandomly selects targets from the scanned range. If you want to run masscan on multiple machines within the same range, you can use this parameter to achieve the same random distribution even between machines
  • --top-ports
    Description: Scanning of N popular ports (array top_tcp_ports)
    Comment: This parameter came from nmap
  • --script
    Description: Lua scripts
    Comment: I have doubts that it works, but the possibility itself is interesting. Is there anyone who uses it? Let me know if you have any interesting examples
  • --vuln [heartbleed, ticketbleed, poodle, ntp-monlist]
    Description: Search for certain known vulnerabilities
    Comment: We cannot say anything about its correctness and efficiency, since this mechanism of vulnerability detection is a kind of kludge scattered throughout the code and conflicts with many other options, and we did not have to apply it in real tasks

Our modifications to masscan

We’ve added HTTPS support

The Internet is quite the fortress these days, even the most backward scammers have already given up on unencrypted HTTP. Therefore, it’s rather inconvenient without HTTPS support — this feature makes investigation, such as searching for C&C servers and phishing, much easier. There’re other tools besides masscan, but they are slower. We wanted to have a universal tool that would cover HTTPS and still be fast.

Fig. 2. Updated packet processing and transmission schematic
Before
After

We’ve improved code quality

Masscan was found to have many strange things and errors. For example, the conversion of time to ticks** looked as follows:

Fig. 3. Example of incorrect handling of network TCP connections

We’ve added a build for different OSs

To consolidate the outputs, we’ve written a build and a test for Windows, Linux, and macOS using GitHub Actions.

  • format check
  • static clang analyzer check
  • assembly debugging with sanitizers and running built-in tests
  • assembly and sending data to SonarCloud and CodeQL services
Fig. 4. Assembly pipeline
Fig. 5. Release artifacts

We’ve added a few more features

Here are the rest of the less significant things that were introduced in our version:

  • --regex(--regex-only-banners) is data-level message filtering in TCP. A regular expression is applied to the contents of each TCP packet. If the regular expression is triggered, the connection information will be in the output.
  • --dynamic-set-host is used to input the header hostinto a HTTP request. The IP address of the target being scanned is taken as a value.
  • Output of internal signature triggers on masscan protocols in the output.
  • An option to specify URIs in HTTP requests. We removed it later because the author of the original masscan added the same functionality. This is part of the --http-* options family.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
BI.ZONE

BI.ZONE

BI.ZONE: an expert in digital risks management. We help organizations around the world to develop their businesses safely in the digital age