Understanding the Firezone architecture to package it on NixOS

I’m packaging Firezone for NixOS and need some help understanding the internals to make some design decisions. I’ve read Architecture: Overview • Firezone Docs and the respective subpages, but some things still elude me - I’d be very grateful if someone could explain some specifics to me:

Thomas commented on my NixOS draft PR that you refer to the api, domain and web Elixir “packages” as the control plane and portal, which are also the only terms I read about in your docs. But just from reading your code, I cannot infer which component is which. From my limited understanding I would assume the control plane is the api package and the portal consists of both domain and web. But since I require distinct packages for each Elixir component, I cannot make a single package for the portal. If I should follow the upstream terminology, what would you recommend I call the packages?

Another thing that is currently unclear to me is what the domain executable really does and how it communicates with the api and web servers. Does it use a Postgres-based message queue? I don’t see any ports exposed for domain in the docker-compose example. Your documentation sometimes mentions a policy engine; is this what domain is supposed to do?

Hi @oddlama, thanks for the thoughtful questions.

What we typically refer to as the “control plane” is the entire elixir/ directory. This directory is structured as an Elixir umbrella with the following children:

  • api: Provides the WebSocket endpoints that all components connect to, as well as a REST API for managing configuration in your account.
  • web: Runs the admin portal web UI.
  • domain: Provides an interface to interact with the Postgres DB (through Ecto) and also provides mechanisms for running background and periodic jobs.

The api and web applications talk to the Postgres DB through the domain application. While you’re correct that domain does not have any public listening ports, the three applications are joined into a single Erlang cluster, which allows them to communicate with each other using the standard RPC mechanisms provided by the BEAM runtime.
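
For illustration, here’s a generic sketch of what that BEAM-level RPC mechanism looks like between clustered nodes. The node name and the module/function are placeholders, not Firezone’s actual names:

```elixir
# Generic sketch of RPC between clustered BEAM nodes (names are placeholders).

# Connect this node (e.g. the one running api or web) to the domain node.
true = Node.connect(:"domain@10.0.0.2")

# Invoke a function that lives on the remote node and get the result back.
result = :rpc.call(:"domain@10.0.0.2", Domain.SomeModule, :some_function, ["arg"])
```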

This community-provided Helm chart may yield more insight into the service requirements and how they interact: GitHub - Intuinewin/helm-charts

Hope that helps!


Thank you, this makes much more sense to me now.

The api and web applications talk to the Postgres DB through the domain application.

I noticed that all three applications still require the database connection details and actually do connect to the Postgres database. Is there any way I can prevent this?

In general, a lot of the configuration variables that seem domain-specific to me will result in a startup error when executing the web or api container without them (e.g. OUTBOUND_EMAIL_FROM). Is there any specific reason why this is the case?

Sorry, that’s correct. The api and web applications call the modules in domain, but they do talk directly to the Postgres DB rather than proxying data across the Erlang cluster, which wouldn’t be ideal from a performance perspective.

Mix configs for umbrella apps are umbrella-wide. Even though the config entries in elixir/config use app-specific keys, the config needs to compile no matter which application you start up. This could likely be improved so that the amount of config each app needs to start is minimized, but we haven’t gotten around to that.
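
As a rough illustration of what that looks like (the keys, file path and endpoint module below are made up, not Firezone’s actual config):

```elixir
# config/runtime.exs -- simplified, illustrative keys only.
import Config

# Read by the domain application, but evaluated whenever api or web boot too,
# which is why the variable has to be present for all three.
config :domain, outbound_email_from: System.fetch_env!("OUTBOUND_EMAIL_FROM")

# Same story for web-specific settings.
config :web, Web.Endpoint, secret_key_base: System.fetch_env!("SECRET_KEY_BASE")
```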

In general you’ll find that the primary reasons we use separate Elixir applications for api, web and domain are:

  • to split scaling requirements between the three
  • to expose only the required ports publicly for each application

Thank you for the swift response, very insightful!

If I may, I’d like to ask a few more questions:

  • Should every component run the migrations or is it sufficient to run them only on one of them (e.g. just domain)? It looks like they all try to run the same migrations.
  • Since you said mix configs are umbrella-wide, would it be an issue if I omitted the secrets (*key_base, *salt) and some other settings from the web and api processes? Or might the web and api components access these internally?
  • I’ve currently configured a test cluster that spins up fine, but then fails to send signup mails via the Swoosh SMTP adapter. Unfortunately there are no errors logged, so I’d like to add some in the code. Could you point me at the place where the emails are sent so I can debug this?

Unfortunately there are no errors logged, so I’d like to add some in the code. Could you point me at the place where the emails are sent so I can debug this?

Found it in the meantime: this function discards the reason. Apparently the Swoosh SMTP adapter has a ton of issues connecting to TLSv1.3-enabled servers (TLS + SMTP Error when using OTP/26 · Issue #785 · swoosh/swoosh · GitHub) and the workaround looks nasty… :sweat_smile: I’ll probably add mua to my local instance.
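
For anyone else hitting this: a minimal way to surface the discarded reason is to match on the deliver result and log it. The mailer module name here is a placeholder for whichever Swoosh mailer sends the mail:

```elixir
# Illustrative sketch: log the error reason instead of discarding it.
# MyApp.Mailer stands in for the actual Swoosh mailer module.
defmodule MyApp.DebugDelivery do
  require Logger

  def deliver_with_logging(email) do
    case MyApp.Mailer.deliver(email) do
      {:ok, metadata} ->
        {:ok, metadata}

      {:error, reason} ->
        Logger.error("Email delivery failed: #{inspect(reason)}")
        {:error, reason}
    end
  end
end
```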

Should every component run the migrations or is it sufficient to run them only on one of them (e.g. just domain)?

Migrations only need to be run from one app, though there’s no harm in running them multiple times. It’ll just be a no-op.
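
For reference, a generic sketch of running the pending migrations once via Ecto’s migrator (the repo module name is illustrative):

```elixir
# Illustrative: run all pending migrations once for the (assumed) Domain.Repo.
{:ok, _result, _apps} =
  Ecto.Migrator.with_repo(Domain.Repo, fn repo ->
    Ecto.Migrator.run(repo, :up, all: true)
  end)
```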

  • Since you said mix configs are umbrella-wide, would it be an issue if I omitted the secrets (*key_base, *salt) and some other settings from the web and api processes? Or might the web and api components access these internally?

Hm I’m not sure I’m following. Could you clarify what you mean by omitting them from the process and internally accessed?

Do you mean omit them from the environment? No, you shouldn’t do that for production. These secrets are used to protect cookies and prevent CSRF. The secrets should be generated fresh for each new instance of Firezone. The ones you see in config/config.exs are defaults used for development.
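
For reference, a fresh value can be generated with a one-liner like this (the length is illustrative; pick whatever length the particular setting expects):

```elixir
# Generate a random secret suitable for *_KEY_BASE / *_SALT style settings.
:crypto.strong_rand_bytes(64) |> Base.encode64()
```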

For anyone wishing to package an Elixir application, I would highly recommend reading this excellent guide:

https://hexdocs.pm/mix/Mix.Tasks.Release.html

I’m not too familiar with NixOS, but the generally accepted distribution mechanism for Elixir applications is Docker. This is because mix releases require the same runtime environment as the system that the release was created on. If you don’t want to use Docker, the only other option is to run mix release on the same OS and version you plan to run Firezone on.
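
As a rough sketch (not Firezone’s actual release definition), umbrella releases are typically declared in the root mix.exs so that each release only bundles the applications it needs:

```elixir
# Root mix.exs of an umbrella project -- illustrative only.
def project do
  [
    apps_path: "apps",
    releases: [
      web: [applications: [web: :permanent, domain: :permanent]],
      api: [applications: [api: :permanent, domain: :permanent]],
      domain: [applications: [domain: :permanent]]
    ]
  ]
end
```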

You might find our old Omnibus build system a helpful resource: firezone/omnibus at legacy · firezone/firezone · GitHub

This is what we did in Firezone <= 0.7 (a completely different architecture, but still an Elixir app).

Migrations only need to be run from one app, though there’s no harm in running them multiple times.

Awesome, I was just worried it might lead to a data race.

Hm I’m not sure I’m following. Could you clarify what you mean by omitting them from the process and internally accessed?

Okay, let’s look at an example. From what I can tell, the OUTBOUND_EMAIL_ADAPTER option is required to be set so that any of web, api and domain starts up correctly. This is where you said it had to do with the option being defined “globally” in the Elixir project. But I was thinking about whether the api ever needs to send an email itself. If not, then I could omit that information and instead provide a dummy value for OUTBOUND_EMAIL_ADAPTER.

I just want to reduce the information given to each component as much as possible. If that is a silly thing to do, please tell me. I didn’t plan to omit information that is actually necessary, so the tokens might have been a bad example.

I’m not too familiar with NixOS, but the generally accepted distribution mechanism for Elixir applications is Docker. This is because mix releases require the same runtime environment as the system that the release was created on.

Fortunately you get similar guarantees about the environment with NixOS, so this usually just works out of the box. I also had no real issues packaging the Elixir components (apart from a mismatched ex_cldr_numbers dependency which I needed to update).

Due to the way Nix builds things, the environment (libraries, binaries, …) is more or less captured implicitly. The only things that can really differ at runtime are environment variables, which we pass via a systemd service instead of a docker-compose file. The end result is actually pretty similar to what you get with Docker. But I don’t want to bore you with the details :sweat_smile:

Thank you for all your help!

Ah, thanks for clarifying. Yes, the config needs to be compilable by each application, but when the application launches, it’s set for that particular application’s environment only.

In general, I would try to keep config drift between applications minimal, unless you’re sure that the particular config is not used in the application at all.

Bear in mind that the web app sends emails directly, as does the domain app. The api app currently does not, but it may need to in the future (imagine an API call triggering an email to be sent).

I think the main takeaway is to not think of the separate applications as microservices, but rather different processes of the same application, if that makes sense. They do not communicate internally or over the network like microservices would; rather, they can each live “on their own”, with the dependencies and configuration bundled so as not to rely on the others being up for some critical function.

I think the main takeaway is to not think of the separate applications as microservices, but rather different processes of the same application, if that makes sense.

Absolutely, then I’ll include all of those variables.

One more question: I’m trying to run some automated E2E tests that ensure my deployment strategy keeps working when the package receives updates in the future. To do that I need to provision an account in my VM. Thomas already pointed me toward firezone/elixir/apps/domain/priv/repo/seeds.exs at main · firezone/firezone · GitHub, which you seem to use for your E2E tests, and I found that all the functionality I would need already exists.

Can I use those functions for my own provisioning at runtime or would you consider them internal/unstable?

Yes, the functions listed in seeds.exs should be mostly stable, but may still be tweaked over time. If you hit an issue with those, however, I think the problem should be obvious and quick to resolve.
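
If it helps, one way to evaluate the seeds from a packaged release at runtime (e.g. via the release’s eval command) is to load the file out of the domain app’s priv directory; this is a sketch and assumes priv/repo/seeds.exs ships with the release:

```elixir
# Illustrative: evaluate the bundled seeds file from a running release.
seeds_path = Path.join(to_string(:code.priv_dir(:domain)), "repo/seeds.exs")
Code.eval_file(seeds_path)
```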
