Kee

Written by on February 4th, 2024.

What is it?

Kee is a CLI monitoring tool for your backend systems. While it doesn't replace well known monitoring services such as CloudWatch, Grafana, Datadog... it aims to report on a broad range of layers and services of your backend systems. Check the overall health of your systems at a glance, from the cloud providers themselves, to your orchestrator and application, but also build-time systems such as your version control platform.

Kee is easy to configure, use, and is designed to run on your laptop, not on the server. It's written in Go and can simply be installed with go install gitlab.com/mwwaa/[email protected]. Configuration files are written with Terraform's' HCL, and supports variables.

Why?

Saturday, 2am

The first notification barely brought your body to a lighter sleep phase. But the second notification, 3 minutes later, really woke you up. No doubt, an alarm was triggered from one of your backend systems. You barely find your phone without dropping it, the backlight hurts your eyes. You acknowledge the alarm. Your significant other is significantly annoyed. You try to put on some minimum amount of clothes without making too much noise or light, but it's pointless since even your cat is on alert by now.

You found your way to your home office. You take another look at the alert from your phone, but you have to dig a bit too much to find the human readable description of the alarm. You wake up your laptop, and maybe you have to try your password twice. You're still not sure what happened and what you should check first. Should you start with k9s to check your Kubernetes cluster? Or log in your cloud provider console and look at dashboards? Maybe try a few requests from Postman, see how much of the user-facing things are impacted? Wait, does the alerting app's link to the cause even works?

You remember a tool that your coworker mentionned. This tool is Kee, and will show you what's up as clearly, quickly and simply as possible.

Tuesday, 10am

Today's a big day. You're about to upgrade a database using a blue/green deployment. This will take most of your day. There is a short downtime scheduled for your app, but it should mostly go unnoticed. Still, you want to keep an eye on everything during the whole procedure: making sure your cloud provider doesn't go down while running the critical steps, that your app goes down and back up as expected, and if anything unexpected happens.

Yes, Kee is great for that purpose as well.

Thursday, 3pm

Your coworker blesses you a with Slack message. The deployment pipeline is stuck. It could be your version control platform, your self-hosted CI runners, maybe the build artifact registry? Or even your orchestrator? Did someone click-opsed change that super important TXT record again?

Because Kee is designed to monitor 3rd parties as well as your own cloud resources and systems, you get a holistic view of ... everything. With the complexity and ever growing inter-dependencies of systems today, having a custom, simplified, overall view of it all appears to be... useful.

Look here to get started or read more about how Kee is built below.

Architecture

Probe providers registry

The probe provider system is very much Terraform-inspired. Not everyone runs the same things, so I wanted the ability to configure different monitoring clients. A probe provider is basically a name (string) and a function. The function, provided adequate configuration, returns a object that implements a single-function interface, Prober.

Contrary to how Terraform works, all probes are currently statically compiled in the single Kee binary. Terraform has a main binary, but then downloads different providers for different clouds and services as needed, verifies their signature, runs them and communicates with them in order to provision the infrastructure. Kee on the other hand, has an internal dictionary (map[string]provider) and when the executable is run, each probe registers itself on that map. Then the configuration is read and the probe providers are found from the map.

There are, however, benefits to the Terraform approach of downloading separate specific binaries, that would be useful if Kee grows more:

Of course this would also be a lot of work:

There is a third solution, even more static: instead of a Go map, use a switch statement. Since finding probes by name is only done once at startup, it's unlikely that a switch will bring any noticeable improvement. It also feels less flexible and future-proof than a map: you can range over all items in a map, but not on a switch. Below I write about more automation, and being able to list all providers seems like a important capability to make progress towards that goal.

Concurrency

It's not an accident if Kee's design seems to fit the goroutine/channel model perfectly: I like playing with those. Basically each configured probe runs in its own goroutine. The Probe() method is run at the configured interval, and its results are sent and routed through the app pipeline via Go channels. Keystrokes are intercepted from the UI library and their meaning sent from the view goroutine to the controller goroutine.

Shortly put, goroutines are used to run multiple Go functions concurrently. Their execution is scheduled on threads by the Go runtime, which is built in your binaries by the compiler. Goroutines are lightweight because they only have a stack; the heap is shared and not protected against concurrent use at all. So to safely share data between them, the chan is the promoted way.

A channel is both a synchronisation mechanism and a data sharing mechanism. Unless specified otherwise, both goroutines are blocked at the send/receive expression until they are both ready for the exchange to happen. A third concept, Contexts, will help you carry cancellation signals across goroutines (signals when a infinite-loop goroutine should break its loop and cleanup the resources it allocated) as well as arbitrary configuration-like data or shared threadsafe resources (context.WithValue).

While I feel channels are a good fit in Kee, you can end up with channel spaghettis, and in some places it's just easier to use well known Mutexes. I did use chans all the way as an experiment to see if it would work and still be maintainable. But there are interesting reads about the naive use of channels.

The Go runtime also handles non-blocking I/O on the side by managing an event-loop thread, and you can read more about this here and here. These articles provide more background about the Go scheduler and its management of network operations.

MVC

As hinted above, Kee is organised following the well known Model-View-Controller paradigm. Each of these functions are performed in their own goroutine, except for the model:

Currently the view is entirely built from scratch by the controller everytime there is an update, but this could be optimised by sending targeted updates to the view component, if needed.

Less is more

The Go language prides itself with being stubbornely simple, even boring. Who never got tired of writing if err != nil, even with editor autocompletion? But the underlying philosophy, less is more, is very powerful. Because you don't have many keywords, grammar rules and concepts in the language, and because the features that do exist are working together in a very orthogonal way - no surprises where two cool features can't be used together - the programmer can really focus on building the concepts and primitives that are important in his use case.

In Kee, I used both object oriented and functional style approaches, for different purposes. Data structures that are allocated at program startup, used throughout the life of the process, and not shared across goroutines are objects (pointers to structs).

But short-lived, passed around threads data structures are values. This may be slightly inefficient (they have to be copied in memory everytime they are passed to a function/sent on a chan/returned from a function...) but leaves no possibility of data races. When operating on those values (common model methods like filtering, mapping, ...), we return copies instead of mutating objects. Go's garbage collector works, and in practice these memory copies don't seem to affect performance nor memory usage.

One optimisation I use as much as possible is preallocating slices (aka vectors) to a known capacity. I did not use any of the newer generics features of the language.

It's true that with any language, no matter how complicated, you can try to limit the subset of features you use, and find yourself on Go territory. I remember Javascript: The good parts book by Douglas Crockford - its thinness compared to the Definitive guide being a recurring joke and even a meme.

But such an approach of using subsets of a language isn't always practical. You may need to use a 3rd party library that forces you to use features you didn't want to. Or you may not agree with your coworkers on which parts are the good ones. With Go you don't have many parts, so it's harder to debate which ones you won't use.

Refactoring & Tests

Go code is easy enough to refactor. Because of the language simplicity, the straightforward approach to packages and the ability to split a single-file package to multiple files, you don't need much more than a global find/replace functionnality to rework a codebase. Of course having tests will help you check that you didn't break anything.

Kee's testsuite is not covering the whole codebase, but I take pride in testing the providers as close to reality as possible: the AWS CloudWatch probe really sets up testing alarms in AWS and verifies behaviour from the real cloud provider. Same goes for Kubernetes and ping: they are tested against the real things. The HTTP test uses the handy Go test server, while the DNS probe test spins up a local DNS server.

Librairies

HCL

Kee was not built from scratch. An important library that I use is Hashicorp's HCL. It was very interesting to dig into the internals of the configuration language for Terraform, and it brings nice benefits:

tview

The tview package handles the UI. It's used by other tools I love such as k9s. It works great so far, but I'm overriding the controls keyboard shortcuts for simplicity - so it's mostly used to display information, and not so much to navigate in the app.

Clients

Probes are mostly wrappers around a client library, for example the AWS Go SDK or the Kubernetes client. Some are easier to deal with than others. Kubernetes has complex data structures. The ping library I use wants to continuously run in its own goroutine. DNS somehow seems to be always more complicated than what I thought I understood of it.

Next

Probes

There is still much work to do, and many improvements to bring. One area of course is the probe providers:

Features

I also have lots of ideas for global features:

Automation, pipeline, documentation

Give it a try, and happy monitoring!