

+++
title = "Decentralized forge: distributing the means of digital production"
date = 2020-11-20
+++


This article began as a draft in February 2020, which i revised because the Free Software Foundation is looking for feedback on their high-priority projects list.

Our world is increasingly controlled by software. From medical equipment to political repression to interpersonal relationships, software is eating our world (TODO: i don't remember this article, was it good?). As the Luddites did in their time, we've been wondering whether a particular piece of software empowers, or controls, its users. This question is often phrased through a software freedom perspective, as defined by the GNU project (TODO): am i free to study the program, modify it according to my needs and distribute original or modified copies of it?

However, individual programs cannot be studied out of their broader context. In the physical world, the immense human and ecological impact of a product simply cannot be imagined by looking at the final outcome. In the digital world, a binary program tells us very little about its goals and the social conditions in which it was conceived and produced.

There is a growing monopoly (privatisation) on the means of digital production. This can be illustrated by Adobe and other publishers abandoning their licence schemes in favor of monthly subscriptions. Many services now refuse to set up or operate properly without Internet access, so that a remote party can decide at any time to revoke your access to your programs, whether you've paid for them or not. Other examples from many domains can be taken from Cory Doctorow's talk about the current war on general computation (TODO).

A forge is another name for a software development platform. The past two decades have been rich in progress on forging ecosystems, empowering many projects to integrate tests, builds, fuzzing, or deployments as part of their development pipeline. However, in this field too, there is a growing centralization in the hands of a few nefarious corporations.

In this article, i argue that decentralized forging is a key issue for the future of free software and non-profit software development that empowers users. It will cover:

- what's wrong with Github and other centralized forging platforms
- why traditional git workflows (git-email) are not for everyone
- how selfhosted forges as they are (Gitlab/Gitea) limit cooperation
- different forms of truth: centralized, consensus and gossip
- how to deal with bad actors spamming your forge
- real-world projects for federated or peer-to-peer forging
- a call for interoperability between decentralized forges
- nomadic identity and censorship-resilience
- code-signing and secure project bootstrapping

# Github considered harmful

More and more, software developers and other digital producers are turning to Github to host their projects and handle cooperation across users, groups and projects. Github is a web interface for the git decentralized version control system (DVCS). If you're unfamiliar with git or DVCS in general, i recommend reading an introduction before proceeding with this article.

From this web interface, Github enables you to browse issues (sometimes called tickets or bugs) and propose patches (changes) to other projects, or accept/deny those proposed to yours. Github also allows you to define permissions so that only certain groups of users may access the code or push changes. What lies beneath those features is actually handled by the git software itself, which is in no way affiliated with Github.

Github is a web forge, and as such was inspired by many others that came before, like Sourceforge and Launchpad. But these previous forges were simple web interfaces exposing git internals naively: they were great, as long as you were familiar with Directed Acyclic Graphs and git-specific vocabulary (refs, HEAD, branches…). Github, on the other hand, managed to design their interface from the beginning as a collaboration tool that is usable by people unfamiliar with git, if only to submit tickets.

A minor but significant difference in their approach to user experience is that a project's main page on Github displays the repository's file listing as well as the rendered README file. In contrast, other solutions displayed a lot more information, but not those two. For example, on Inkscape's Launchpad project, it takes me two clicks from the homepage to reach the README file.

While displaying a lot of technical information on the homepage seems like a good idea at first, it's really confusing to whoever has no clue what a tree or a ref is. Technical details bring only moderate value for developers familiar with the tooling (who can use the git CLI), but may considerably reduce the signal/noise ratio for newcomers. Moreover, every project is structured differently, and a rigid, uniform page cannot possibly reflect that.

Placing the README file on the project's homepage is not a secondary concern. It enables direct communication from project developers to end-users, which is exactly what README files were invented for. Whether those end-users need prior knowledge of other tools and jargon to understand the README is up to you, but Github does not place the bar higher than it needs to be. Also, rendering Markdown to HTML allows simpler projects to do without a dedicated website, and use their project repository as documentation.

But not all is so bright with Github, and from its very first days some serious critiques have emerged. First and foremost, Github is a private company trying to make money off other people's work. They are more than happy to leverage the whole free-software ecosystem and to provide forging services for it, but have always refused to publish their own source code. Such hypocrisy has long been criticized by the Free Software Foundation (TODO).

Github has also been involved in very questionable political decisions. This is especially true since they were bought by Microsoft, one of the worst digital empires conspiring against our freedom. ⁽¹⁾ If your program interacts in any way with copyrighted materials (such as Popcorn Time or youtube-dl), it may be taken down from Github at any time. The same goes if you or your project displeases a colonial empire (or pretend-republic) such as the United States or Spain: Iranian people have been entirely banned from Github, and an application promoting Catalan independence has been removed. ⁽²⁾

As we can observe, giving an evil multinational life-or-death power over many projects is doomed to catastrophe. No single entity should ever have the power to dictate who can participate in a project, or what this project may be.

# git is already decentralized, but

In response to both the enthusiasm for and the critique of Github, much of the free-software community has argued that git is already decentralized, and therefore we don't need a platform like Github. git is a decentralized version control system: that means every contributor has a complete copy of the repository, and can work on their own without a central server, unlike earlier systems such as svn (TODO).

Because of this, git was designed to synchronise from and to multiple remote repositories, called "remotes". So even if you use Github, you can at any time mirror or import your repository to another git host, or even set up your own.

Setting up your own git server doesn't have to be complicated, and offers a lot of flexibility in the workflow, thanks to the exposed git hooks. What's more complicated is providing a decent and coherent user experience for other contributors. The git server does not manage patches and tickets: it's a simple versioned file store where a defined set of people can push changes, and little more.
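As a minimal sketch (paths, user and branch names below are placeholders), such a setup is just a bare repository reachable over SSH, plus a hook script that reacts to pushes:

```sh
# On the server: create a bare repository that contributors can push to over SSH
git init --bare /srv/git/myproject.git

# A post-receive hook runs after every push; this one deploys a working copy
cat > /srv/git/myproject.git/hooks/post-receive <<'EOF'
#!/bin/sh
GIT_WORK_TREE=/srv/www/myproject git checkout -f main
EOF
chmod +x /srv/git/myproject.git/hooks/post-receive

# On your machine: use it alongside (or instead of) any other remote
git remote add myserver ssh://user@git.example.org/srv/git/myproject.git
git push myserver main
```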

Historically, git was developed for the Linux kernel community, which places email at the core of their cooperation workflow. Bugs are submitted and commented on a mailing list, and so are patches. git even has a few built-in subcommands for this workflow: git am, git send-email…

However, using git with email is not for everyone. Beyond simple CLI commands you could just memorize, the email workflow is intended to be integrated into your development environment. But most mail clients do not integrate such a workflow, and the best solution seems to be using a dedicated mail client for this purpose, such as aerc.
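For reference, the core of the email workflow fits in a handful of commands (the list address below is a placeholder); the hard part is configuring the mail side and integrating it into your usual environment:

```sh
# Contributor: turn your commits into patch files and mail them to the list
git format-patch origin/master
git send-email --to=project-devel@lists.example.org *.patch

# Maintainer: apply a patch received by email, preserving authorship
git am < 0001-fix-typo-in-readme.patch
```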

While CLI-savvy users might install a new mail client just for collaborating with other folks on a project, this is beyond the reach of most people. To be clear, i don't think there's anything wrong with specific tools catering to a specific audience. But DVCS like git are employed by people from all backgrounds, including designers, translators, editors… Many of these people will not bother to learn two tools and a bunch of keyboard shortcuts at the same time just to start contributing to a project.

Actually, they may do so if they understand the benefits of this approach, and find a tutorial in their own language that explains just how to achieve a consistent git email workflow that's well-integrated into their usual environment. Maybe if there were a well-known file within a repository describing the ticketing/patching workflow in a machine-readable manner, then we could have many mail clients implementing this standard to facilitate cooperation.
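As a purely hypothetical sketch (no such standard exists today, and the file name and keys below are invented for illustration), such a file could look like this:

```toml
# .forge-workflow.toml — hypothetical, not an existing standard
[tickets]
method = "email"                        # or "web", "activitypub", ...
address = "bugs@lists.example.org"      # where to send bug reports
template = "doc/bug-template.md"        # what information to include

[patches]
method = "email"
address = "patches@lists.example.org"
format = "git-format-patch"             # how patches should be produced
```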

But as of now, I don't know of any simple starter kit for newcomers to get started with git-email workflows without having to learn a whole bunch of other concepts, and this places the bar to contribution higher than it should be. Implicitly expecting new contributors to understand a developer-oriented workflow, or have a couple hours to learn how to send a simple patch, simply means fewer contributors.

# Smaller walled gardens are a burden for users

Over the past decade, selfhosted alternatives to Github have been developed. Gitlab, Gogs and Gitea (to name a few) have contributed a great deal to the free-software ecosystem, by empowering communities who were critical of Github to set up their own modern forging platform. For example, Debian has an ongoing love affair with Gitlab on their Salsa instance.

Turnkey selfhosted forges are a great tool to avoid Github for big organizations, with a lot of subprojects and contributors, who need to selfhost their whole infrastructure for reliability or security concerns. However, as these solutions were modeled for this specific usecase, they have one major drawback compared to a git-email workflow.

As the forge is usually shut off from the outside world, cooperation between users is only envisioned in a local context. Two users on Codeberg may cooperate on a project, but a user from tildegit or 0xacab may not. As it stands, a user from another forge would have to create a new account on your forge to participate in your project. Some may argue that this approach is a feature and not a bug, for three reasons.

First, because it makes it easier to enforce server-wide rules/guidelines and access roles. While this may be an advantage in corporate settings (or for big community projects), it does not apply to popular hacking usecases, where all users are treated equally, and project owners individually set up access roles for their project (not server-wide).

Second, because having many accounts across many servers would make it harder to lose all your accounts at once to censorship, server compromise or other discontinuation of server activities. This is a good point, but i would argue it can be tackled in more comprehensive and user-friendly ways through easier project migration and nomadic identity systems, as will be explained further in this article.

Third, because having one account per server is usually not a problem, since you only contribute to so many projects. This last argument, in my view, is very misinformed. Although a person may only contribute seriously and frequently to a certain number of projects, we all use many more software projects in our day-to-day lives. Bug reporting outside of privacy-invading telemetry is often a tedious process: create a new account on a bugtracker, read the bug reporting guidelines, figure out the bugtracker's syntax for links/screenshots, and finally submit a bug.

The bug reporting workflow as achieved by Github and mailing lists is more accessible: you use your usual account, and the interface you're used to, to submit bugs to many projects. Any project that uses mailing lists for cooperation can be contributed to from your usual mail client. Your bug reports and patches may be moderated before they appear publicly, but you don't have to create a new account and learn a new workflow just to submit a bug report.

Also, different projects may expect different formats of bug reports. For example, the debbugs bugtracker for Debian expects email formatted in a specific way in order to assign them to the corresponding projects/maintainers. In that, it is a bug-reporting protocol built on top of mail technology, but to my knowledge there is no standard document describing this protocol and no other implementation.
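For illustration, a debbugs bug report is an ordinary email to submit@bugs.debian.org whose body starts with pseudo-headers; the package, version and bug described below are of course made up:

```
To: submit@bugs.debian.org
Subject: inkscape: crashes when opening a large SVG file

Package: inkscape
Version: 1.0.1-1
Severity: normal

Free-form description of the bug, steps to reproduce, logs, etc.
```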

Github projects, on the other hand, worry more about the actual contents of the bug report. That is because semantic actions (bug report metadata) are already handled by Github at the web request level. For bug report formatting, Github lets you write an ISSUE_TEMPLATE.md file at the root of your project (or in a .github/ folder, alongside the README file). This template is presented to users submitting a bug. It lets users figure out what you're expecting from a bug report in certain circumstances (version/stacktrace/etc), but still allows them to write their own text (disregarding the template) when they feel it's more appropriate.
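Such a template is just a Markdown file whose contents pre-fill the issue form; the fields below are an example of what a project might ask for, not something Github mandates:

```markdown
<!-- ISSUE_TEMPLATE.md -->
**Version** (e.g. output of `myprogram --version`):

**Steps to reproduce**:

**Expected behaviour**:

**What happened instead** (paste logs or a stacktrace if relevant):
```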

So, while selfhosted forges have introduced a lot of good stuff, they have broken the user expectation that you can contribute to any project from your personal account. Creating a new account for every piece of software you'd like to report bugs (or submit patches) to is not a user-friendly approach.

This phenomenon was already observed in a different area, with selfhosted social networks: Elgg could never replace Facebook entirely, nor could Postmill/Lobsters replace Reddit, because participation was restricted to a local community. In some cases that's a feature: a family's private social network should not connect to the outside world, and a focused and friendly community like raddle.me or lobsters may wish to preserve itself from nazi trolls from other forums.

But in many cases, not being able to federate across instances (across communities) is a bug. I would argue such selfhosted services cater to niche usecases not because they're too different from Facebook/Reddit, but because they're technically so similar to them. In copying/reimplementing "upstream" features ⁽³⁾, they also carried over the user management model of a centralized system.

So, instead of dealing with a gigantic walled garden (Github), or a wild jungle (mailing lists), we now end up with a collection of tiny closed gardens. The barrier to entry to those gardens is low: you just have to introduce yourself at the front door and set a password. But this barrier to entry, however low it is, is too high for most non-technical users to feel comfortable submitting bug reports to all the projects they use.

I suspect that for smaller volunteer-run projects, the ratio of bug reporters to code committers is much higher on Github and development mailing lists than it is on smaller, selfhosted forges. If you think that's a bad thing, try shifting your reasoning: if only people familiar with programming are reporting bugs, and your project is not only aimed at developers, it means most of your users are either taking bugs for granted, or abandoning your project entirely.

# Centralized trust, consensus and gossip

One of the hard problems in computing is establishing trust. When we're fetching information from a project, how do we ensure we have the correct information?

In traditional centralized and federated systems, we rely on location-addressed sources of trust. We define where to find reliable information about a project (such as a git remote). To ensure authenticity of the information, we rely on additional security layers:

- Transport Layer Security (TLS) or Tor's onion services to ensure the remote server's authenticity, that is to make it harder for someone to impersonate a forge to serve you malicious updates
- Pretty Good Privacy (PGP) to ensure the document's authenticity, that is to make it harder for someone who took control of your account/forge to serve malicious updates

How we bootstrap trust (from the ground up) for those additional layers, however, is not a simple problem. Traditional TLS setup can be abused by any member of the Certificate Authorities cartel, while onion services and PGP require prior knowledge of authentic keys (key exchange). With the DANE protocol, we can bootstrap TLS keys from the DNS instead of the CA cartel. However, this is still not supported by many clients, and in any case is only as secure as DNS itself. That is, very insecure even with DNS security extensions (DNSSEC). For a location-based system to be secure, we need a secure naming system like the GNU Name System to enable further key exchange.

These difficulties are inherent properties of location-addressed storage, in which we describe where a valid source of the information we're looking for can be found, which requires additional security measures. Centralized and federated systems are by definition location-addressed systems. Peer-to-peer systems, on the other hand, don't place trust in specific entities. In decentralized systems, trust is established either via consensus or gossip.

Consensus is an approach in which all participating peers should agree on a single source of truth. They take votes following a protocol like Raft, and the majority wins. In smaller, closed-off systems where a limited number of people control the network, consensus is achieved by acknowledging a limited set of peers. These approved peers can either be manually defined (static configuration), or be bootstrapped from a centralized third party such as a local certificate authority controlled by the same operators.

But these traditional consensus algorithms do not work for public systems. If anyone can join the network and participate in establishing consensus (not just a limited set of peers), then anyone may create many peers to try and take control of the consensus. This is often known as a Sybil attack (TODO) or 51% attack.

This problem has given rise to two approaches: Proof of Work and gossip.

Proof-of-Work (PoW) is consensus achieved through raw computational power. PoW systems such as Bitcoin consider that, out of the global computing power in a given network, the peers representing the majority of the computing power must be right. While this approach is very interesting conceptually and was a mathematical achievement, it leads to terrible consequences: Bitcoin on its own uses more electricity than many countries. I repeat, a single application is responsible for a measurable share of global electricity usage.

Gossip is a conceptual shift, in which we explicitly avoid global consensus, because as we've seen, establishing consensus is hard. Instead, each peer has their own truth/view of the network, but can ask other peers for more information. Gossip is closer to how actual human interactions work: my local library may not have all books ever printed, but whatever i find in there i can share with my friends and neighbors.

Authenticity in gossip protocols is also ensured by asymmetric cryptography (like PGP), but gossip protocols usually employ cryptographic identifiers (tied to a public key) to designate users. In addition, all messages across the network are signed, so that any piece of content can be mapped to a unique identity and authenticated.

Gossip can be achieved through any channel. Usually, it involves USB keys and local area networks (LAN). But nothing prevents us from using well-known locations on the Internet to exchange gossiped information, much like a newspaper or community center would achieve in the physical world. That's essentially what the Secure Scuttlebutt (SSB) protocol is doing with its pubs, or what the PGP Web of Trust is doing with keyservers.

In my view, gossip protocols include IPFS and Bittorrent. Don't be surprised: a Distributed Hash Table (a distributed content-discovery database) is a form of globally-consistent gossip. It's rather similar to a blockchain, in that you can query any peer about specific information. However, in a Bitcoin-style blockchain, every peer needs to know about everything in order to ensure consistency. In a DHT, no peer knows about everything (reducing requirements to join the DHT), and consistency is ensured by content addressing (checksumming the information stored).

That means although a DHT is partitioned across several peers who each have their view of the network, it is built so that peers will help you find information they don't know about, and checking information correctness (detecting bad actors) is not hard. When you load a magnet: URL in your Bittorrent client, it loads a list of DHT peers and asks them about what you're looking for (the checksum in the magnet link). If these peers have no idea what piece of content you're talking about, they may point you to other peers who may help you find it. In that, i consider DHTs to be some form of global crypto-gossip protocols.
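For example, a magnet link contains no server address at all, only a checksum of the content (the infohash below is illustrative) and an optional display name:

```
magnet:?xt=urn:btih:c12fe1c06bba254a9dc9f519b335aa7c1367a88a&dn=some-dataset
```

Your client asks DHT peers who has content matching this checksum, and can verify whatever it downloads against it.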

Although it's not a widely-researched topic, it seems [IPv6 multicast](TODO: link to conf) could be used to make gossiping a network-level concern. This would avoid the overhead of keeping a local copy of all content you encounter simply to propagate it (a common criticism of SSB). In such a hypothetical setup, one could advertise all new content to a broader audience, while choosing to keep a local archive of select content they may want to access again later. If you're interested in this, be sure to check out a talk called Privacy and decentralization with Multicast.

# Dealing with bad actors and malicious activity

One may object that favoring interaction across many networks will introduce big avenues for malicious activity. However, i would argue this is far from the truth. In practice, we have decades of experience from the email community about how to protect users from spam and malicious activities.

Even protocols that initially dismissed such concerns as secondary are eventually rediscovering rate-limiting, webs of trust, and user-overridable allow/deny lists at the server level. A talk entitled Architectures of Robust Openness from the latest ActivityPub conference touches on those topics.

Another concern with authenticated, decentralized forging is repudiation and plausible deniability. For example, if you commit a very secret piece of information to your forge, how can you take it back? Or if you publish some information that displeases higher powers, how can you pretend you are not responsible for it? This is a hard problem to tackle.

In secure instant messaging systems (such as OTR/Signal/OMEMO encryption), encryption keys are constantly rotated, and previous keys are published. By design, this allows anyone to forge past messages. This way, there is no way you can be proven responsible for a past message (plausible deniability), because anyone could have forged it. Lately, people from the email ecosystem have called on server operators to publish their previous DKIM keys. This would enable plaintext emails to be plausibly denied, while retaining authenticity for PGP-signed emails.

However, what works for private communications may not be suited to public cooperation. I do not know of any way to achieve plausible deniability, repudiation and authentication in a public context. If you have readings on this subject, please send me an email and i will update the article.

In a peer-to-peer system, repudiation seems even harder to achieve: once you've signed something and it has propagated across the network, there's no turning back, and discarding or publishing past public keys does not erase what has already been distributed.

One more note: forbidding Tor or certain IP ranges is not protection from malicious activity, because the most determined malicious actors have nearly unlimited resources; such bans mostly hurt legitimate users.

# What's happening in the real world

So, i think we've talked enough about theoretical approaches to federated/decentralized systems and some of their properties. Now, it's time to take a look at projects people are actually working on.

## Federated authentication

Some forges like Gitea offer OpenID Connect federated authentication: you can use any OpenID Connect account to authenticate yourself against a selfhosted forge. Previous OpenID specifications required a specific list of servers you allowed authentication from: "login with microfacegoople". OpenID Connect is a newer standard which features a well-known endpoint discovery mechanism, so the software can detect the authentication server for a given domain and start authenticating a user against it.

So, whether you're signing up for the first time or signing in, you give the forge your OpenID Connect server. You are then redirected to this OpenID server, authenticated (if you are not yet logged in) and asked whether you want to log in on this forge. If you accept, you are redirected back to the forge, which now knows that the OpenID server vouched for your identity.
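Concretely, the discovery step is a plain HTTPS request to a well-known URL on the identity provider (the domain below is an example):

```sh
# The forge discovers your OpenID Connect provider's endpoints automatically:
curl https://id.example.org/.well-known/openid-configuration
# → JSON metadata along the lines of:
# { "issuer": "https://id.example.org",
#   "authorization_endpoint": "https://id.example.org/authorize",
#   "token_endpoint": "https://id.example.org/token", ... }
```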

This approach to decentralized forging is brilliant because it's simple and focuses on the practicality for end-users. However, it does not solve the problem of migrating projects between forges.

## Federated forging

Federated forging relies on a forging vocabulary exchanged over established federation protocols. That means a whole ecosystem of clients, servers and protocols is reused as a basis for forging systems. This is exemplified by the ForgeFed and Salut-à-Toi projects.

ForgeFed is a forging vocabulary for the ActivityPub federation protocol (the fediverse). It has a proof-of-concept implementation (TODO) and aims to be implementable for any web forge. However, despite some interesting discussions on their forums, there seems to be little implementation-focused work at the moment.

Salut-à-Toi (TODO), on the other hand, is an actual suite of clients (a library with several frontends) for the Jabber federation (the XMPP protocol). It was a client project from the beginning, and only started to implement forging features two years ago. While it's still a proof of concept, it's reliable enough for the project to be selfhosted. In this context, selfhosted means that Salut-à-Toi is the forging software used to develop the project itself.

TODO: salut à toi forging screenshot

While such features are not implemented yet, the fact that these federated forges rely on a standard vocabulary would theoretically enable migration between a lot of forges, without having to use custom APIs for every forge, as is common for Github/Sourceforge/etc migrations.

Also, as the user interactions themselves are federated, and not just authentication, folks may use their client of choice to contribute to remote projects. There would be fewer concerns about color themes or accessibility on the server side, because all of these questions would be addressed on the client side. This is a very important property for accessibility, ensuring your needs are covered by the client software, and that a remote server cannot impact you in negative ways.

If your email client is hard for you to use, or otherwise unpleasant, you may use any email-compatible client that better suits your needs. With selfhosted, centralized forges, where the client is tightly coupled to the server, every forge needs to make sure their service is accessible, and every forge you join to contribute to a project can make your user experience miserable. Imagine if you had to use a different user interface for every server you send email to?!

The same would apply to federated forging, in which your favorite client would allow you to participate in many projects. The server provides function, and your client provides usability on your own terms.

## Blockchain consensus

Apart from sketchy Silicon Valley startups, the consensus approach is, as far as i know, only being explored by Radicle. A blockchain is a strange approach for a community-oriented project. However, it appears they attempt to exploit crypto-speculation to benefit contributors to the free-software ecosystem.

TODO: radicle screenshot

I'm tempted to just say: How could this possibly go wrong?! After all, remember that Bitcoin was envisioned as a popular value-exchange system outside of the reach of bad actors (States, banks), which could be used to empower local communities in their day-to-day business. And look what we got: a global speculation ring consuming vast amounts of resources, controlled by fewer and fewer actors as time goes by, and unusable for common people (because of transaction costs/delays).

But in the end, it all boils down to political and technical decisions Radicle will make as a community. As this specific community seems entirely composed of good-faith enthusiasts, i wish them all the best, and can only encourage inspiration and cooperation from the broader decentralized forging ecosystem.

On the more exciting side of things, Radicle uses strong cryptography to sign commits in a very integrated and user-friendly way. That's a strong advantage over most systems, in which signatures are optional and delegated to third-party tooling that can be hard to set up for newcomers.

## Gossip

Gossip systems are a common approach for decentralized forging. The most recent and most polished attempt at gossiped forging is git-ssb over the Secure Scuttlebutt protocol. Other examples are git-ipfs and Gittorrent.

TODO: git-ssb screenshot.

Although they're less polished for day-to-day use, these projects are very interesting. They helped pave the way for research into decentralized forging by showing that git and other decentralized version control systems (DVCS) play well with content-addressed storage, given that the commits themselves are content-addressed (a commit name is a mathematical checksum of everything it contains).
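You can see this content addressing at work in any git repository (the hashes below are of course repository-specific):

```sh
# A commit's name is a checksum over its content and history:
git rev-parse HEAD
# e83c5163316f89bfbde7d9ab23ca2e25604af290

# The commit object references its tree and parent by checksum too,
# so one trusted hash transitively authenticates the whole history:
git cat-file -p HEAD
# tree 9bedf67800b2923982bdf60c89c57ce6fd2e9213
# parent 05f081345e92e9abdd5ff71d4d0a2e4b92bbc54f
# author ...
```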

## Not covered here

Many more projects over the years have experimented with storing forging interactions (metadata like bugs and pull requests) as well-known files within the repository itself. Some of them are specific to git: git-dit, ticgit, git-bug, git-issues. Others intend to be used with other versioning systems (DVCS-agnostic): artemis, bugs-everywhere, dits, sit.

I will not go into more details about them, because these systems only worry about the semantics of forging (vocabulary), but do not emphasize how to publicize changes. For example, these tools would be great for a single team having access to a common repository to update tickets in an offline-first setting, then merging them on the shared remote when they're back online. But they do not address cooperation with strangers, unless you give anyone permission to publish a new branch to your remote, which is probably a terrible idea. However, that's just my personal, uninformed opinion: if you have counter-arguments about how in-band storage of forging interactions could be used for real-world cooperation with strangers, i'd be glad to hear about it!

Lastly, i didn't mention Fossil SCM because i'm not familiar with it, and from reading the docs, i'm very confused about how it approaches cooperation with strangers. It appears forging interactions are stored within the repository itself, but then does that mean that Fossil merges every interaction it hears about? Or is Fossil only intended for use in a closed team? Let me know if you have interesting articles to learn more about Fossil.

# With interoperability, please

After this brief review of the existing landscape of decentralized forging, i would like to argue for interoperability. If you're not familiar with this concept, it's a key concern for the accessibility/usability of both physical and digital systems: interoperability is the property that two systems addressing the same usecases can be used interchangeably. For example, a broken lightbulb can be replaced by any lightbulb following the same socket/voltage standards, no matter how it works internally to produce light.

In fact, interoperability is the default state of things throughout nature. To make fire, you can burn any sort of wood. If your window is broken and you don't have any glass at hand, you can replace it with any material that will prevent air from flowing through. Interoperability is a very political topic, and a key concern to prevent the emergence of monopolies. If you'd like to know more about it, i strongly recommend a talk called We used to have cake, now we've barely got icing.

So while the approaches to decentralized forging we've talked about are very different in some regards, there is no technical reason why they could not play well together and interoperate consistently. As a proof of concept, the git-issue tool we've mentioned in the previous section can actually synchronise issues contained within the repository with Github and Gitlab issues. It could just as well synchronise with any selfhosted forge (federated or not), or publish the issues on the Radicle blockchain.

The difference between federated and p2p systems is big, but [hybrid p2p/federated systems have a lot of value](TODO: my own article to finish). If we develop open standards, there is no technical barrier for a peer-to-peer forge to synchronise with a federated web/XMPP forge. It may be hard to wrap one's head around, and may require a lot of work for implementation, but it's entirely possible. Likewise, a federated forge could federate both via ForgeFed, and via XMPP. And it could itself be a peer in a peer-to-peer forge, so that pull requests submitted on Radicle may automatically appear on your web forge.

Not all forges have to understand each other. But it's important that we at least try, because the current fragmentation across tiny different ecosystems is hostile to new contributions from people who are used to different workflows and interfaces.

Beyond cooperation, interoperability would also ease backups, forks and migrations. Migrating your whole project from one forge to another would only take a single, unprivileged action. When forking a project, you would have a choice whether to inherit all of its issues and pull requests or not. So if you're working on a single patch, you would discard them. But in case you want to take over an abandoned project, you would inherit all of the project's history and discussions, not just the commits.

You may have noticed i did not mention the email workflow in this section about interoperability. That's because email bugtracking and patching is far from being standardized. An issue tracker like debbugs could rather easily be interoperated with, because it has a somewhat-specified grammar for interacting with tickets. But what about less specified workflows? My personal feeling is that these different workflows should be standardized.

Many vocabulary and security concerns expressed in this article would equally apply to email forging. But to be honest, i'm not knowledgeable enough about email-based forging to provide a good insight on this topic. I'm hoping people from the sourcehut forge community and other git-email wizards can find inspiration in this call to decentralized forging, come around the table, and figure out clever ways to integrate into the broader ecosystem.

# Code signing and nomadic identity

So far, i've talked about different approaches to decentralized forging and how they could interoperate. However, one question i've left in the cupboard is how to ensure the authenticity of interactions across different networks.

Code signing in forging usually relies on PGP keys and signatures to authenticate commits and refs. In most cases, it is considered a DVCS-level concern and is left untouched by the forge, except maybe to display a symbol for a valid signature alongside a commit. While we may choose to trust the forge regarding commit signatures, we may also verify these on our end. The tooling for verifying signatures is lacking, although there is recent progress with the GNU Guix project releasing the amazing guix git authenticate command for bootstrapping a secure software supply chain.
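For instance, you can already verify commit signatures locally rather than trusting the forge's checkmark. The Guix command authenticates a whole repository history starting from an "introduction" (a known-good commit and the OpenPGP fingerprint of its signer), shown below as placeholders:

```sh
# Verify the PGP signature on the latest commit yourself:
git verify-commit HEAD
git log --show-signature -1

# Guix bootstraps trust in its repository from a known-good introduction
# (placeholders below; see the Guix manual for the real values):
guix git authenticate <introduction-commit> "<signer-openpgp-fingerprint>"
```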

However, forging interactions such as issues are typically unsigned, and cannot be verified. In systems like ActivityPub and radicle, these interactions are signed, but with varying levels of reliability. While radicle has strong security guarantees because every client owns their keys, email/ActivityPub lets the server perform signatures for the users: a compromised server could compromise a lot of users and therefore such signatures are unreliable from a security perspective. We could take this into consideration when developing forging protocols, and ensure we can embed signatures (like PGP) into interactions.
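A minimal way to get client-side signatures today, assuming interactions are serialized to files (the filename is illustrative), is plain detached PGP signatures:

```sh
# Sign an interaction (say, an issue comment serialized as JSON) on the
# client, so the server never holds the signing key:
gpg --armor --detach-sign comment.json    # produces comment.json.asc

# Anyone can then verify it against the author's public key:
gpg --verify comment.json.asc comment.json
```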

For interoperability concerns, each forge could implement different security levels, and let maintainers choose the security properties they expect for external contributions, depending on their practical security needs. A funny IRC bot may choose to emphasize low-barrier contribution across many forges over security, while a distribution may enforce stricter security guidelines, allowing contributions only from a trusted webforge and PGP-signed emails. In any case, we need more user-friendly tools for making and verifying signatures.

Another concern is how to deal with migrations. If my personal account is migrated across servers, or i'm rotating/changing keys, how do i let others know about it in a secure manner? In the federated world, this concern has been addressed by the ZOT protocol, which was initially developed for Hubzilla's nomadic identity system. ZOT lets you take your content and your friends to a new server at any given moment.

This is achieved by adding a crypto-identity layer around server-based identity (user@server). This crypto-identity, corresponding to a keypair (think PGP), is bootstrapped in a TOFU manner (Trust On First Use) when federating with a remote user on a server that supports the ZOT protocol. The server will give back the information you requested, and let you know the nomadic identity keypair for the corresponding user. Then, you can fetch the corresponding ZOT profile from the server to discover other identities signed with this keypair.

For example, let's imagine for a second that tildegit.org and framagit.org both supported the ZOT protocol and some form of federated forging. My ZOT tooling would generate a keypair that would advertise my accounts on both forges. When someone clones one of my projects, their ZOT-enabled client would save this identity mapping somewhere. This way, if one of the two servers ever closes, the client would immediately know to try and find my project on the other forge.
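A loose sketch of such an identity mapping (this is not the actual ZOT wire format, just an illustration of the idea):

```
{
  "public_key": "ed25519:3f9a… (illustrative)",
  "accounts": [
    "https://tildegit.org/alice",
    "https://framagit.org/alice"
  ],
  "signature": "… made with the corresponding private key …"
}
```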

In practice, there would be a lot more subtlety to represent actual mapping between projects (mirrors), and to map additional keypairs on p2p networks (such as radicle) to a single identity. However, a nomadic identity system doesn't have to be much more complex than that.

The more interesting implementation concern is how to store, update and retrieve information about a nomadic identity. With the current ZOT implementations (to my knowledge), identities are stored as signed JSON blobs that you retrieve opportunistically from a remote server (TOFU). However, that means that if all of your declared servers are offline (for instance, if there's only one of them), i cannot automatically discover your updated nomadic identity (with your new forge servers).

I believe a crypto-secure, decentralized naming system such as GNS or IPNS would greatly benefit the nomadic identity experience. DNS could also be used here, but as explained before, DNS is highly vulnerable to determined attackers. Introducing DNS as discovery mechanism for nomadic identities would weaken the whole system, and make it much harder to get rid of in the future (for backwards-compatibility).

With GNS/IPNS (or any other equivalent system), people would only need to advertise their public key on every forge, and the nomadic identity mapping would be fetched in a secure manner. Considering GNS is in fact a signed and encrypted peer-to-peer key-value store itself, we could use GNS itself to store nomadic identity information (using well-known keys). IPNS, on the other hand, only contains an updatable pointer to an IPFS content-addressed directory. In this case, we would use well-known files within the target directory.

So, migration and failover across forges should also be feasible, despite other challenges not presented here, such as how to ensure consistency across decentralized mirrors, and what to do in case of conflicts.

# Conclusion

Decentralized forging is in my view the top priority for free software in the coming decade. The Internet and free software have a symbiotic relationship where one cannot exist without the other. They are two facets of the same software supply chain, and any harm done on one side will have negative consequences on the other. Both are under relentless attack by tyrants (including pretend-democracies like France or the USA) and multinational corporations (like Microsoft and Google).

Developing decentralized forging tooling is the only way to save free software and the Internet as we know them, and may even lower the barrier to contribution for smaller community projects.

Of course, decentralized forging will not save us from the rise of fascism in the physical and digital world. People will have to stand up to their oppressors. Companies and State infrastructure designed to destroy nature and make people's lives miserable will have to burn, as explained in a talk about Climate Change, Computing, and All our relationships. But we are not afraid of ashes, because we carry a new world, right here in our hearts. And this world is growing, at this very minute.

Finally, you may wonder how i envision my own contribution to the decentralized forging ecosystem. I may not be competent enough to contribute useful code to the projects i listed above, but i may articulate critical feedback from a user's perspective (as i did in this post). But to be honest with you, i have other plans.

In the past years, i've been struggling with shell scripts to articulate various repositories and trigger tasks from updates. From my painful experiences automatically deploying this website (in which the theme is a submodule), i've come up with what i think is a simple, coherent, and user-friendly Continuous Integration and Delivery platform: the forgesuite, which contains two tools: forgebuild and forgehook.

On a high level, forgehook can receive update notifications from many forges (typically, webhooks) and expose a standard semantic (ForgeFed) representation of this notification, indicating whether it's a push event, a comment on a ticket, or a new/updated pull request. forgehook then manages local and federated subscriptions to those notifications, filters them according to your subscription settings, and transmits them to other parts of your infrastructure. For example, maybe your IRC bot would like to know about all events happening across your projects in order to announce them on IRC, but a CI test suite may only be interested in push and pull-request events.

forgebuild, on the other side of the chain, fetches updates from many remote repositories, and applies local settings to decide whether to run specific tasks or not. For now, only git and mercurial are supported, but any other version control system can be implemented by following a simple interface. Automated submodule updates are a key feature of forgebuild, to let you update any submodule, and automatically trigger corresponding tasks if submodule updates are enabled. forgebuild follows a simple CLI interface, and as such your CI/CD tasks can be written in your favorite language.

While the forgesuite is still in its early stages, i believe it's already capable of empowering people. Less-experienced users who are somewhat familiar with the command line should find it very convenient to automate simple tasks, while power users should be able to integrate it with their existing tooling and infrastructure without concern. I know there is room for improvement, so if the forgesuite fails you somehow, i consider it a bug! Don't hesitate to report critical feedback.

I will write more about the forgesuite in another blogpost, so stay tuned. In the meantime, happy hacking!

⁽¹⁾ Microsoft has tried to buy themselves a free-software-friendly public image in the past years. This obvious openwashing process has consisted of bringing free software to their closed platform (Windows Subsystem for Linux), and open-sourcing a few projects that they could not monetize (VSCode), while packaging them with spyware (telemetry) for free platforms. Furthermore, Microsoft has been known for decades to cooperate with intelligence services (PRISM, NSAKEY) and oppressive regimes.

⁽²⁾ Catalonia has a long history of repression by the Spanish state. Microsoft is just the latest technological aid for that aim, just like Hitler and Mussolini in their time provided weapons to support Franco's coup, and crush the social revolution in Barcelona.

⁽³⁾ "upstream" here is meant as the source for inspiration, not source for code.