+++
title = "Entering the static web"
date = 2019-03-28
+++

You've probably heard that the static web is rising again, and that many websites are dropping CMS (content management systems, such as WordPress) in favor of static-site generation (SSG) tools such as Jekyll, Nikola, Hugo or Zola. But what exactly does that mean? Let's take some time to think about it.

Some (incomplete) web history

The web started in the early 90's as a protocol (HTTP) and a language (HTML) to make documents accessible on the network, and let them link to one another. At the time, no language defined how the content would be rendered: it was entirely up to the web browser (the client). Cascading Style Sheets (CSS) were standardized only a few years later.

At the time, it was inconceivable for the browser to run code from the page (JavaScript did not exist), and serving content from a database didn't really make sense when you just wanted to serve a bunch of HTML pages. Your website was just a folder containing files (some of which were HTML), and doing it any other way would have complicated things for no benefit (overengineering) given the usage of the web we had back then.

The rise of the CMS

But in the early 2000's, some people started building tools to make website management easier. They called them CMS, for Content Management Systems: basically programs that write your HTML pages for you and let you post a new article from a nice web interface.

These CMS are usually connected to a database storing the raw content (page title, publication date, author), and assemble the page you want to read upon request. So when your browser requests a web page managed by a CMS, there is a good chance that the server will have to run a query against the database, then run some code to generate the HTML page you're trying to access.
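To make that request cycle concrete, here is a minimal sketch of what a CMS does for each visitor. It is not taken from any real CMS: the `articles` table, its columns and the `render_article` function are all assumptions made up for illustration.

```python
import sqlite3
from html import escape

# Hypothetical schema: a real CMS stores far more than this.
SCHEMA = "CREATE TABLE articles (slug TEXT PRIMARY KEY, title TEXT, author TEXT, body TEXT)"


def render_article(db: sqlite3.Connection, slug: str) -> str:
    """Query the database, then assemble the HTML. This runs on every single request."""
    row = db.execute(
        "SELECT title, author, body FROM articles WHERE slug = ?", (slug,)
    ).fetchone()
    if row is None:
        return "<h1>Not found</h1>"
    title, author, body = row
    # Escape stored text so it cannot inject markup into the page.
    return (
        f"<article><h1>{escape(title)}</h1>"
        f"<p>By {escape(author)}</p><p>{escape(body)}</p></article>"
    )
```

Two visitors asking for the same article trigger the same query and produce the same string, twice.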

This process is terribly inefficient, because most of the content doesn't change when two different people access the page. So running the same queries and building the same pages over and over again is basically wasting computing power, and therefore natural resources and energy.

As a response to this problem, many CMS or CMS plugins started implementing caching strategies to reduce the number of identical queries and instructions being run throughout the website. This does reduce the load on the server, but is the computing equivalent of putting a bandage on a deep wound: from the outside it looks better, but it's far from ideal if we're trying to fix the actual problem.

The great security disaster

Content management systems are great at… managing content. No surprise here. However, just about anything else they do is disastrous. As mentioned earlier, they are really bad for saving energy. But they are even worse when it comes to security! Security experts could build CMS (like some did with Airship CMS), and they would still probably end up with a few bugs in their implementations. However, most CMS are not built by security experts and are just full of mines ready to explode.

Every year, most CMS have to release urgent security patches to avoid websites getting hacked. However, most website owners do not have the time or the knowledge to update their websites… or just hear about it too late! There is something deeply wrong with the idea that your website, which was doing fine last week, could at any time become a danger to you and the people visiting it.

A CMS has the whole website as an attack surface, as every page is generated dynamically depending on the client request. Serving your website as simple HTML pages reduces the attack surface to the actual folder where your website resides, which is much easier to secure than thousands of lines of spaghetti code calling pseudo-PHP templating (WordPress, I'm looking at you).

Introducing static-site generators (SSG)

In the wake of the many security issues (CVEs) affecting CMS, people turned to the past to look for answers. Why did we start overengineering all this in the first place?

What's an SSG?

A static-site generator is a program that takes a bunch of content files following a hierarchy corresponding to the website's tree structure, and applies HTML templates to these content files to produce a complete output, a process we call "building the site". So if you have an about folder in your content folder, there will be an /about/ page available on your website.
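If you'd like to see how little is needed for the core idea, here is a toy sketch of "building the site". It is not how Zola or any other generator is implemented. The `index.txt` convention (first line is the title, the rest is the body), the `PAGE` template and the `build_site` function are assumptions I made up for the example.

```python
from html import escape
from pathlib import Path
from string import Template

# Hypothetical one-size-fits-all template; real generators ship much richer ones.
PAGE = Template(
    "<html><head><title>$title</title></head><body><p>$body</p></body></html>"
)


def build_site(content_dir: Path, output_dir: Path) -> list[Path]:
    """Mirror the content tree into output_dir, one index.html per index.txt."""
    written = []
    for source in sorted(content_dir.rglob("index.txt")):
        # content/about/index.txt becomes public/about/index.html, i.e. the /about/ page.
        target = output_dir / source.parent.relative_to(content_dir) / "index.html"
        target.parent.mkdir(parents=True, exist_ok=True)
        # Assumed file layout: the first line is the title, everything after is the body.
        title, _, body = source.read_text(encoding="utf-8").partition("\n")
        html = PAGE.substitute(title=escape(title), body=escape(body.strip()))
        target.write_text(html, encoding="utf-8")
        written.append(target)
    return written
```

Calling `build_site(Path("content"), Path("public"))` would leave you with a `public` folder that any web server can serve as-is.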

The templates, as defined in your theme or in your site root, are just HTML pages following some syntax to incorporate variable elements, such as child articles or the website title. They let you split your website easily and safely into smaller pieces that are easier to maintain. This way, if you'd like to change the way articles are displayed, you don't have to edit each and every article. You just need to edit a single page.html that will be applied to all articles when rebuilding the site.
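As a rough illustration of the "edit one template, change every article" idea, here is a sketch using Python's built-in `string.Template`. Real generators use their own template languages with a different syntax, so take the `$name` placeholders, `PAGE_TEMPLATE` and `render_page` as stand-ins invented for this example.

```python
from string import Template

# Hypothetical stand-in for a theme's page.html.
PAGE_TEMPLATE = Template(
    "<article>\n"
    "  <h1>$title</h1>\n"
    "  $content\n"
    "  <footer>$site_title</footer>\n"
    "</article>"
)


def render_page(article: dict, site_title: str) -> str:
    """Apply the single shared template to one article."""
    return PAGE_TEMPLATE.substitute(
        title=article["title"],
        content=article["content"],
        site_title=site_title,
    )
```

Every article goes through the same `PAGE_TEMPLATE`, so moving the footer above the title means changing one string, not every article.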

So when is it appropriate to rebuild the site? Most of the time, we want to rebuild the website only when content files have changed. We'll see later about tools to automate this. However, sometimes we may need to rebuild the site more often. For example, we can rebuild a site every night to integrate today's agenda from an external tool (such as Nextcloud) into the header or sidebar. However often we rebuild the website to update it, in most cases it will waste fewer resources than having a whole CMS build every page on every request.
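One possible way to decide whether a rebuild is due is to fingerprint the content folder and compare it with the fingerprint recorded at the last build. This is only a sketch of that idea. The stamp file and the three function names are my own inventions, and the stamp file is assumed to live outside the content folder.

```python
import hashlib
from pathlib import Path


def content_fingerprint(content_dir: Path) -> str:
    """Hash every file's relative path and bytes, so any edit changes the result."""
    digest = hashlib.sha256()
    for path in sorted(p for p in content_dir.rglob("*") if p.is_file()):
        digest.update(path.relative_to(content_dir).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def needs_rebuild(content_dir: Path, stamp_file: Path) -> bool:
    """True when the content differs from what the last recorded build saw."""
    if not stamp_file.exists():
        return True  # never built before
    recorded = stamp_file.read_text(encoding="utf-8").strip()
    return recorded != content_fingerprint(content_dir)


def record_build(content_dir: Path, stamp_file: Path) -> None:
    """Remember the fingerprint of the content that was just built."""
    stamp_file.write_text(content_fingerprint(content_dir), encoding="utf-8")
```

A nightly job would skip this check entirely and rebuild anyway, since the external data it pulls in is not part of the content folder.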

The bike analogy

Building a WordPress-like CMS to make a website is the digital equivalent of building a nuclear-powered car to travel 2km every day. Sure, it took you lots of time and energy to build it and you're really proud. Sure, at the moment it works just fine. But when things go wrong (and they always do), you really wish you had walked the 2km every day instead of building that shit-atomic car of yours.

Back to the web. In between manually editing HTML pages (walking) and using a CMS (building a nuclear-powered car), there are options to explore. A static-site generator (SSG) is a tool that fits somewhere on this spectrum to make your life easier without overengineering, while limiting security risks. If you'd like to follow me on this analogy, a static-site generator is some kind of bike. Sure, it's not going to ride through a swamp or a desert, and it's probably inappropriate to ride more than 100km/day, but it's simple and solid tech that brings you autonomy and isn't going to ruin your entire life the day it breaks down.

The future emerging from the past

Back in the 90's, many tech people already built static-site generators, although they weren't called that exactly. They were simple personal scripts taking files from a folder and applying additional markup here and there to get the pages ready for publishing… which is amazing! However, those tools often lacked proper documentation and an inclusive community to take them forward, and they were slowly replaced by the ways of the CMS.

Nowadays, a few free-software community-run projects are taking the web in this direction. Although they are not 100% compatible with one another, they share enough concepts and approaches to ensure it's really easy to switch from one generator to another. They let you work close enough to the HTML/CSS layer that you don't need expertise in any of these tools in particular: skills you learn as you go will be useful to you with any templating system (even Timber and Gantry for WordPress).

Some new tools

In the past few years, modern static-site generators have emerged, each with its own specificities. Some favor speed, others favor customizability. Some try to integrate external tools, others reinvent the wheel. But surely you'll find a static-site generator to match your needs. If you're not sure which direction to go, don't hesitate to start with a tool for which you have plenty of documentation and support available. If some comrades or friends of yours are working with Jekyll, you can always start down this path, learn on the go, and change your tools as your needs and desires evolve.

A new ecosystem

Along with more modern static-site generators, a whole ecosystem of tools has emerged to help us deal with our static sites. The biggest advance we made is probably to tie our static websites to version control systems (such as git), and to build the site directly from the repository through Continuous Integration procedures (i.e. a script run on every commit). Now, you can have different branches of your website deploying automatically to different webroots, so you may have for instance the testing branch on your repo deploying to testing.example.com (protected by some basic auth) while the master branch deploys to www.example.com.
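The branch-to-webroot idea fits in a few lines. The sketch below shows the kind of script a CI job could run after building. It is not the configuration of any particular CI service: the `deploy` function and the mapping passed to it are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Mapping, Optional


def deploy(branch: str, built_site: Path, webroots: Mapping[str, Path]) -> Optional[Path]:
    """Copy the built site to the webroot configured for this branch, if any."""
    webroot = webroots.get(branch)
    if webroot is None:
        return None  # branches without a webroot get built but not published
    # dirs_exist_ok lets us deploy over a previous version (Python 3.8+).
    shutil.copytree(built_site, webroot, dirs_exist_ok=True)
    return webroot
```

With a mapping such as `{"master": Path("/var/www/www.example.com"), "testing": Path("/var/www/testing.example.com")}` (assumed paths), a commit on testing only touches the testing webroot. Note that this naive copy never deletes files removed from the site, which a real deployment script would want to handle.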

There's also some attention dedicated to static content management systems, which are basically simplified user interfaces to write and post files to your content folder. The most advanced examples of this are Publii and Netlify CMS (not to be confused with netlify.com the webhost), the latter implemented as a client-side JavaScript application. It'll ask you to authenticate on your forge (GitHub, GitLab, or Bitbucket so far) and then let you upload files there from a sleek web interface. So that's nice, although running a whole web browser just to edit text is definitely not the best we can do.

Consequences

Latency gets better

As your website is served directly by the web server, your visitors get better latency. This means the information takes less time to travel from the web browser to the server and back again.

This is true both because we reduced the overall amount of computing applied to your data, and because web servers such as nginx apply incredible optimization tricks when serving static files, tricks they can't use when calling CGI scripts or acting as a reverse proxy.

Have you ever wondered why some webapps/CMS ask you to configure your webserver to redirect all requests to the backend Python/Go app? Well I have, and I still haven't found an answer.

Backup & Migration

Have you ever had to deal with a missing database backup? Or a painful forced migration of services when you just can't find an honest hosting solution for your tech stack? With a static site, such worries are problems of the past! The fundamental principle of static sites is: your website is its own folder.

Whether we're talking about the source folder (usually your git repo) or the built website (the public folder), your website is a single folder to back up and move around, making it extremely convenient to work with. You would have a hard time finding a webhost that does not support static files!
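Since the whole site is one folder, a backup can be as small as the sketch below, which relies only on Python's standard `shutil.make_archive`. The `backup_site` name and the choice of a gzipped tarball are assumptions made for the example.

```python
import shutil
from pathlib import Path


def backup_site(site_dir: Path, archive_base: Path) -> Path:
    """Pack the whole site folder into <archive_base>.tar.gz and return its path."""
    # Keep archive_base outside site_dir so the archive doesn't try to include itself.
    archive = shutil.make_archive(str(archive_base), "gztar", root_dir=str(site_dir))
    return Path(archive)
```

Restoring or migrating is the reverse: unpack the archive on the new host and point the web server at it. No database dump, no version-matching of a runtime.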

Interoperability with different networks

Being just a folder means it's also shareable through networks other than the standard web. Your website could be distributed as a torrent, for instance. That would be easier on your bandwidth costs, but people wouldn't be able to browse your website with a proper client as they currently do. But take IPFS, for instance, a p2p file-sharing system which supports a web of ipfs:// URIs: people can then browse your site using compatible browsers or browser plugins for Firefox/Chrome.

IPFS is not the limit. There are many different networks the web can reach through static sites that could never have been integrated in a CMS. Freenet, for example, only supports static sites, contrary to Tor and I2P, which have onion and garlic services (respectively). Well, your favorite static website can be hosted on Freenet, too!

With regard to the social web (the web as an open-standards social networking platform), the static web doesn't have all the tools ready just yet. We just need to write them! For example, we can write an ActivityPub or a webmention endpoint that connects to the repository to store incoming interactions, following templates that would be standardized to match each SSG's data model. We're not there yet, but soon we'll exchange comments between my static site and yours, and they'll all be self-hosted on our own forge ;)

Conclusion

The web isn't just about us webdevs. But building a more static web can make our lives a little better, by using tools that don't get in our way. Also, not having to constantly worry about security updates is a plus. The static web really offers hope for a better and more humane web, and opens the door to many positive innovations.

PS: Are dynamic web services irrelevant?

Server-side programming hasn't become irrelevant in any way. It's just not needed and not desirable in most cases. We just need to learn when building dynamic content from a database is appropriate. Doing this for a showcase website or a simple blog is for sure overengineering it. But the web as a platform to exchange and link dynamic data is still a thing! I've mentioned ActivityPub and IndieWeb before. They're really great examples of web-based federation protocols, and just one of the many uses of server-side computing.

The whole web doesn't have to be static. But most times, it just should be.

Note: This article was originally written in August 2018 but never got published. I updated it to reflect changes in the world around us (such as the Gutenberg SSG being renamed to Zola), but if you find outdated information, please contact me so I can correct it.