
Reading the Bandit source code: Part 1

Bandit is a new HTTP server that is not only written in Elixir, but also written to have readable source code. Let's see what we can learn!

From the Bandit README

Bandit has been built from the ground up for use with Plug applications; this focus pays dividends in both performance and also in the approachability of the code base.

Bandit exists to demystify the lower layers of infrastructure code. In a world where The New Thing is nearly always adding abstraction on top of abstraction, it’s important to have foundational work that is approachable & understandable by users above it in the stack.

Prioritize (in order): correctness, clarity, performance. Seek to remove the mystery of infrastructure code by being approachable and easy to understand

Sounds great! Let’s see what we can learn from a blind reading of the code.

This is a blog post, not a video, so you aren’t going to get a live reaction. But I will do my best to type out my thoughts as we explore the code. Let’s go!

First off, we need the code.

I store all of my code from GitHub in a $HOME/github directory, with one subdirectory per user or org.

e.g.

$HOME/github/livebook-dev/livebook
$HOME/github/sdball/ex_duck
# etc.

So let’s start there.

$ cd $HOME/github
$ mkdir mtrudel && cd mtrudel
$ git clone git@github.com:mtrudel/bandit.git
$ cd bandit
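If you want to reuse that layout, the steps above could be wrapped in a couple of small shell functions. This is only a sketch: gh_path and gh_clone are names I made up for illustration, and it assumes you clone over SSH as in the commands above.

```shell
# Hypothetical helper: print where "owner/repo" lives under the
# $HOME/github/<owner>/<repo> layout.
gh_path() {
  # $1 is "owner/repo", e.g. "mtrudel/bandit"
  printf '%s/github/%s\n' "$HOME" "$1"
}

# Hypothetical helper: clone "owner/repo" into that layout and cd into it.
gh_clone() {
  dest="$(gh_path "$1")"
  mkdir -p "$(dirname "$dest")" &&          # create the owner directory if needed
    git clone "git@github.com:$1.git" "$dest" &&
    cd "$dest"
}
```

With those defined, gh_clone mtrudel/bandit would do the same thing as the four commands above.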

Now let’s explore!

$ ls -lah
Permissions Size User   Date Modified Name
.rw-r--r--   117 sdball 28 Dec 22:02  .ackrc
drwxr-xr-x     - sdball 28 Dec 22:07  .elixir_ls
.rw-r--r--    97 sdball 28 Dec 22:02  .formatter.exs
drwxr-xr-x     - sdball 29 Dec 10:18  .git
drwxr-xr-x     - sdball 28 Dec 22:02  .github
.rw-r--r--   615 sdball 28 Dec 22:02  .gitignore
drwxr-xr-x     - sdball 28 Dec 22:07  _build
.rw-r--r--  5.2k sdball 28 Dec 22:02  CODE_OF_CONDUCT.md
drwxr-xr-x     - sdball 28 Dec 22:06  deps
drwxr-xr-x     - sdball 28 Dec 22:02  lib
.rw-r--r--  1.1k sdball 28 Dec 22:02  LICENSE
.rw-r--r--  1.5k sdball 28 Dec 22:02  mix.exs
.rw-r--r--  7.0k sdball 28 Dec 22:02  mix.lock
.rw-r--r--  8.0k sdball 28 Dec 22:02  README.md
.rw-r--r--   336 sdball 28 Dec 22:02  SECURITY.md
drwxr-xr-x     - sdball 28 Dec 22:02  test

All the usual friends for an Elixir application.

Let’s see what lib tells us.

lib
├── bandit
│  ├── application.ex
│  ├── clock.ex
│  ├── delegating_handler.ex
│  ├── exceptions.ex
│  ├── headers.ex
│  ├── http1
│  │  ├── adapter.ex
│  │  └── handler.ex
│  ├── http2
│  │  ├── adapter.ex
│  │  ├── connection.ex
│  │  ├── errors.ex
│  │  ├── flow_control.ex
│  │  ├── frame
│  │  │  ├── continuation.ex
│  │  │  ├── data.ex
│  │  │  ├── goaway.ex
│  │  │  ├── headers.ex
│  │  │  ├── ping.ex
│  │  │  ├── priority.ex
│  │  │  ├── push_promise.ex
│  │  │  ├── rst_stream.ex
│  │  │  ├── settings.ex
│  │  │  ├── unknown.ex
│  │  │  └── window_update.ex
│  │  ├── frame.ex
│  │  ├── handler.ex
│  │  ├── README.md
│  │  ├── settings.ex
│  │  ├── stream.ex
│  │  ├── stream_collection.ex
│  │  └── stream_task.ex
│  ├── initial_handler.ex
│  ├── phoenix_adapter.ex
│  ├── pipeline.ex
│  └── websocket
│     ├── connection.ex
│     ├── frame
│     │  ├── binary.ex
│     │  ├── connection_close.ex
│     │  ├── continuation.ex
│     │  ├── ping.ex
│     │  ├── pong.ex
│     │  └── text.ex
│     ├── frame.ex
│     ├── handler.ex
│     ├── handshake.ex
│     ├── permessage_deflate.ex
│     └── socket.ex
└── bandit.ex

A few things look interesting right away.

OK, we have some idea of what’s in this repo.

Now, since Bandit implements well-defined protocols, we can probably find some RFC references.

# count the files that contain "RFC"
$ rg -l RFC | wc -l
      36

Yes, 36 files mention “RFC”.

Let’s see what they are!

# count the lines that contain "RFC"
$ rg "RFC" | wc -l
     155

155 lines, cool cool.

Scrolling through the results of rg "RFC", it looks like some RFC mentions have a space, like RFC 7692, and some do not, like RFC6455§7.1.2.

Let’s get a count of the specific RFCs and see what matters to Bandit.

First we’ll extract all the RFC references:

$ rg -o "RFC[[:space:]]?\d+§?[.0-9]*" --no-filename --no-line-number | head
RFC7540§6.5.2
RFC7540§6
RFC7540§8.1.2.3
RFC7540§8.1.2.3
RFC3986
RFC7540§8.1.2.2
RFC7540§8.1.2.1
RFC7540§8.1.2.3
RFC7540§8.1.2
RFC7540§8.1.2.2

Then we’ll count them up:

$ rg -o "RFC[[:space:]]?\d+§?[.0-9]*" --no-filename --no-line-number | sort | uniq -c | sort -n
   1 RFC 7540
   1 RFC 7540.
   1 RFC 7692
   1 RFC2616§13.5.1
   1 RFC2616§4.
   1 RFC3986
   1 RFC6455
   1 RFC6455§4.2
   1 RFC6455§4.2.1
   1 RFC6455§4.2.2
   1 RFC6455§5.5.1
   1 RFC6455§7.4.1
   1 RFC7230§4.1
   1 RFC7540§11
   1 RFC7540§3.5
   1 RFC7540§5.1.
   1 RFC7540§5.1.1
   1 RFC7540§6
   1 RFC7540§6.5.2
   1 RFC7540§6.9.
   1 RFC7540§6.9.1
   1 RFC7540§8.1
   1 RFC7540§8.1.2
   1 RFC7540§8.1.2.2.
   1 RFC7540§8.1.2.6
   1 RFC7692
   1 RFC7692§7
   1 RFC9110§5.6.7.
   1 RFC9110§8.6
   1 RFC9112§3.2
   1 RFC9112§3.2.1
   1 RFC9112§3.2.3
   1 RFC9112§6.3.3
   2 RFC6455§5.2
   2 RFC7540§4.2
   2 RFC7540§5.3.1
   2 RFC7540§8.1.2.1
   2 RFC7540§8.2
   2 RFC9112§3.2.4
   2 RFC9112§6.3.5
   3 RFC6455§8.1
   3 RFC7540§8.1.2.2
   3 RFC7540§8.1.2.5
   3 RFC9112§3.2.2
   4 RFC6455§5.5.3
   4 RFC7540§6.1
   4 RFC7540§6.10
   4 RFC7540§6.3
   4 RFC7540§6.4
   4 RFC7540§6.7
   4 RFC7540§6.8
   5 RFC6455§5.5
   5 RFC6455§5.5.2
   5 RFC7540§6.2
   5 RFC7540§8.1.2.3
   6 RFC7540§6.9
   8 RFC7692§6.1
   9 RFC6455§5.4
  10 RFC6455§7.1.2
  16 RFC7540§6.5

Gosh that’s a lot of sections. Let’s skip those and count the RFCs themselves. Let’s also eliminate any space between “RFC” and the RFC number so we count consistently.

$ rg -o "RFC[[:space:]]?\d+" --no-filename --no-line-number | sed -e 's/[[:space:]]//' | sort | uniq -c | sort -n
   1 RFC3986
   1 RFC7230
   2 RFC2616
   2 RFC9110
  11 RFC7692
  11 RFC9112
  44 RFC6455
  84 RFC7540

There. Clearly RFC7540 (HTTP/2) is super important to Bandit, with RFC6455 (the WebSocket protocol) a solid second.
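To see what each stage of that pipeline is doing, here is a small sketch you can run without Bandit checked out. It swaps rg for plain grep -oE so it works anywhere, and the sample lines are made up by me rather than taken from Bandit’s source. count_rfcs is a hypothetical name.

```shell
# Count RFC mentions on stdin, treating "RFC 7540" and "RFC7540" as the same.
count_rfcs() {
  grep -oE 'RFC[[:space:]]?[0-9]+' |   # keep only the "RFC<number>" part of each match
    sed -e 's/[[:space:]]//' |          # drop the optional space after "RFC"
    sort | uniq -c | sort -n |          # tally, least common first
    awk '{print $1, $2}'                # normalize uniq's padding to "count name"
}

# Made-up sample input, not real Bandit comments
printf '%s\n' \
  '# per RFC 7540' \
  '# RFC7540§6.5 says...' \
  '# see RFC6455§5.2' | count_rfcs
# prints:
# 1 RFC6455
# 2 RFC7540
```

The sed step is what makes the two spellings of RFC 7540 collapse into one line in the tally.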

But maybe the tests are skewing the results, say if a bunch of tests mention specific RFC details. Let’s see what’s called out in lib alone.

$ rg -o "RFC[[:space:]]?\d+" --no-filename --no-line-number lib | sed -e 's/[[:space:]]//' | sort | uniq -c | sort -n
   1 RFC3986
   1 RFC7230
   2 RFC2616
   2 RFC9110
   4 RFC9112
   6 RFC7692
  17 RFC6455
  55 RFC7540

OK, a similar ratio of results. This also nicely demonstrates that the tests do, in fact, directly mention RFCs a fair number of times. That’s a good sign for the readability of the tests.

Let’s take a look at the repo’s Git history and see what we’re dealing with as far as authors, commits, and style.

[Screenshot: Bandit Git history as of 2022-12-29]

Note: git r is my alias for a script that pretty-prints git log.

Looks like Bandit is either a stable, approachable project or a lot of work is being overly squashed into commits.

Let’s see what’s been most affected by, say, the last three months of commits using a custom git command: git churn

$ git churn --since="3 months ago" | tail
6	test/bandit/http2/plug_test.exs
6	test/bandit/http2/protocol_test.exs
6	test/bandit/websocket/http1_handshake_test.exs
6	test/support/simple_websocket_client.ex
10	lib/bandit.ex
10	mix.lock
14	test/bandit/http1/request_test.exs
15	mix.exs
16	README.md
16	lib/bandit/http1/adapter.ex

That count is the number of commits that touched each file. We can see that most of the work is going on in the http1 adapter, but it also looks like the README is being kept up to date. Awesome. Tests are getting about as much attention as code, too. Great!
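Since git churn is a custom command, here is a rough sketch of how one could be built. This is not necessarily the exact script I run; tally_paths and git_churn are illustrative names, and the idea is simply to list every file touched by every commit and then count the repeats.

```shell
# Tally how many times each path appears on stdin, least-touched first.
# Output is "<count><TAB><path>", matching the listing above.
tally_paths() {
  grep -v '^$' | sort | uniq -c | sort -n |
    awk '{count = $1; sub(/^ *[0-9]+ /, ""); print count "\t" $0}'
}

# Sketch of a churn command: git log prints one path per file per commit,
# and extra arguments (e.g. --since="3 months ago") pass straight through.
git_churn() {
  git log --name-only --format='format:' "$@" | tally_paths
}
```

Saved as an executable named git-churn somewhere on your PATH, a script like this becomes available as git churn, because git runs any git-<name> executable it finds as a subcommand.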

