**Mu: making programs easier to understand in the large**

Imagine a world where you can:

1. Think of a tiny improvement to a program you use, clone its sources,
orient yourself on its organization and make your tiny improvement, all in a
single afternoon.
2. Record your program as it runs, and easily convert arbitrary logs of runs
into reproducible automatic tests.
3. Answer arbitrary what-if questions about a codebase by trying out changes
and seeing what tests fail, confident that *every* scenario previous authors
have considered has been encoded as a test.
4. Run simple versions of a program first, then successively more complex
ones, to stage your learning.

I think all these abilities might be strongly correlated; not only are they
achievable with a few common concepts, but you can't easily attack one of them
without also chasing after the others. The core mechanism enabling them all is
recording manual tests right after the first time you perform them:

* keyboard input
* printing to screen
* website layout
* disk filling up
* performance metrics
* race conditions
* fault tolerance
* ...
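
To make "recording a manual test" concrete, here is a minimal sketch in Python rather than mu. It assumes the program does all its I/O through a hook layer that logs every value crossing the program boundary. The `Recorder` class, its `read_line`/`print_line` methods and the `greet` program are hypothetical names for illustration only.

```python
import io
import json


class Recorder:
    """Hypothetical hook layer: wraps an input source and an output sink and
    logs every value that crosses the program boundary."""

    def __init__(self, input_stream, output_stream):
        self.input_stream = input_stream
        self.output_stream = output_stream
        self.trace = []  # (direction, value) pairs, in the order they happened

    def read_line(self):
        line = self.input_stream.readline().rstrip("\n")
        self.trace.append(("in", line))
        return line

    def print_line(self, text):
        self.output_stream.write(text + "\n")
        self.trace.append(("out", text))


def greet(io_layer):
    # The program under test only talks to the outside world via io_layer.
    name = io_layer.read_line()
    io_layer.print_line("hello, " + name)


# A "manual test": run the program once with some input...
rec = Recorder(io.StringIO("ada\n"), io.StringIO())
greet(rec)
# ...and save the trace so it can later be replayed as an automatic test.
saved = json.dumps(rec.trace)
```

The trace is plain data, so saving it costs nothing extra. The same idea would extend to the other items in the list, given a hook at each corresponding layer.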

I hope to attain this world by creating a comprehensive library of fakes and
hooks for the entire software stack, at all layers of abstraction (programming
language, OS, standard libraries, application libraries).
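
As an illustration of what one such fake might look like, here is a Python sketch for the "disk filling up" scenario. It is not mu's API. `FakeDisk`, `DiskFull` and `save_log` are hypothetical names. The point is that a test can shrink the capacity to force the rare disk-full path on demand.

```python
class DiskFull(Exception):
    """Raised by the fake when a write would exceed its capacity."""


class FakeDisk:
    """Hypothetical fake for the storage layer: an in-memory 'disk' with a
    configurable capacity in bytes."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.files = {}

    def used(self):
        return sum(len(data) for data in self.files.values())

    def write(self, name, data):
        # Overwriting a file frees its old contents first.
        old = len(self.files.get(name, b""))
        if self.used() - old + len(data) > self.capacity_bytes:
            raise DiskFull(name)
        self.files[name] = data


def save_log(disk, entries):
    """Program code under test: returns how many entries it managed to save."""
    saved = 0
    for i, entry in enumerate(entries):
        try:
            disk.write("log%d" % i, entry)
        except DiskFull:
            break
        saved += 1
    return saved
```

With `FakeDisk(10)`, saving three 4-byte entries stops after two. With a large capacity, all three succeed.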

To reduce my workload and get to a proof-of-concept quickly, this is a very
*alien* software stack. I've stolen ideas from lots of previous systems, but
it's not like anything you're used to. The 'OS' will lack virtual memory, user
accounts, any unprivileged mode, address space isolation, and many other
features.

To avoid building a compiler, I'm going to do all my programming in (virtual
machine) assembly. To keep assembly from getting too painful, I'm going to
pervasively use one trick: load-time directives to let me order code however I
want, and to write boilerplate once and insert it in multiple places. If
you're familiar with literate programming or aspect-oriented programming,
these directives may seem vaguely familiar. If you're not, think of them as a
richer interface for function inlining.

Trading off notational convenience for tests may seem regressive, but I
suspect high-level languages aren't particularly helpful in understanding
large codebases. No matter how good a notation is, it can only let you see a
tiny fraction of a large program at a time. Logs, on the other hand, can let
you zoom out and take in an entire *run* at a glance, making them a superior
unit of comprehension. If I'm right, it makes sense to prioritize the right
*tactile* interface for working with and getting feedback on large programs
before we invest in the *visual* tools for making them concise.

**Taking mu for a spin**

First install [Racket](http://racket-lang.org) (just for the initial
prototype). Then:

```shell
$ cd mu
$ git clone http://github.com/arclanguage/anarki
```

As a sneak peek, here's how you compute factorial in mu:

```lisp
function factorial [
  ; allocate some space for local variables
  default-scope/scope-address <- new scope/literal, 30/literal
  ; receive inputs in a queue
  n/integer <- next-input
  {
    ; if n=0 return 1
    zero?/boolean <- equal n/integer, 0/literal
    break-unless zero?/boolean
    reply 1/literal
  }
  ; return n*factorial(n-1)
  tmp1/integer <- subtract n/integer, 1/literal
  tmp2/integer <- factorial tmp1/integer
  result/integer <- multiply tmp2/integer, n/integer
  reply result/integer
]
```

Programs are lists of instructions, each on a line, sometimes grouped with
brackets. Instructions take the form:

```
oargs <- OP args
```

Input and output args have to be simple; no sub-expressions are permitted. But
you can have any number of them. In particular, instructions can return
multiple output arguments. For example, you can perform integer division as
follows:

```
quotient/integer, remainder/integer <- divide-with-remainder 11/literal, 3/literal
```
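
Because args are always simple, splitting an instruction into its parts needs no real expression parser. The Python sketch below illustrates the surface syntax only and is not mu's actual loader. `parse_instruction` is a hypothetical helper, and it assumes args are comma-separated and that an instruction without `<-` has no outputs.

```python
def parse_instruction(line):
    """Split 'oargs <- OP args' into (outputs, op, inputs).

    Args must be simple (no sub-expressions), so plain string splitting
    is enough. An instruction with no '<-' has no outputs.
    """
    if "<-" in line:
        lhs, rhs = line.split("<-", 1)
        outputs = [a.strip() for a in lhs.split(",") if a.strip()]
    else:
        outputs, rhs = [], line
    op, _, rest = rhs.strip().partition(" ")
    inputs = [a.strip() for a in rest.split(",") if a.strip()]
    return outputs, op, inputs


outs, op, ins = parse_instruction(
    "quotient/integer, remainder/integer <- divide-with-remainder 11/literal, 3/literal"
)
```

Here `outs` holds both output args, `op` is `divide-with-remainder` and `ins` is `["11/literal", "3/literal"]`. For non-negative operands, the two outputs presumably correspond to Python's `divmod(11, 3)`, which returns `(3, 2)`.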

Each arg can have any number of bits of metadata like the types above,
separated by slashes. Anybody can write tools to statically analyze or verify
programs using new metadata. Or they can just be documentation; any metadata
the system doesn't recognize gets silently ignored.
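
The "recognize what you know, ignore the rest" rule can be sketched as a tiny analysis tool in Python. Here `parse_arg` and `arg_types` are hypothetical helpers. `KNOWN_TYPES` is an assumed, deliberately incomplete set, and the `nonnegative` tag is made up to stand in for documentation-only metadata.

```python
# Assumption: a toy tool that only knows a few of the types used above.
KNOWN_TYPES = {"integer", "boolean", "literal"}


def parse_arg(arg):
    """Split 'name/meta1/meta2...' into the name and its list of metadata."""
    name, *metadata = arg.split("/")
    return name, metadata


def arg_types(arg):
    """Report the metadata this tool recognizes as types and silently
    ignore everything else, such as documentation-only tags."""
    _, metadata = parse_arg(arg)
    return [m for m in metadata if m in KNOWN_TYPES]


print(parse_arg("zero?/boolean"))
print(arg_types("n/integer/nonnegative"))
```

A different tool could attach meaning to `nonnegative` (say, a range checker) without this one needing to change.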

Try this program out now:

```shell
$ ./anarki/arc mu.arc factorial.mu
result: 120 # factorial of 5
... # ignore the memory dump for now
```

(The code in `factorial.mu` looks different from the idealized syntax above.
We'll get to an actual parser in time.)

---

An alternative way to define factorial is by including *labels*, and later
inserting code at them.

```lisp
function factorial [
  default-scope/scope-address <- new scope/literal, 30/literal
  n/integer <- next-input
  {
    base-case
  }
  recursive-case
]

after base-case [
  ; if n=0 return 1
  zero?/boolean <- equal n/integer, 0/literal
  break-unless zero?/boolean
  reply 1/literal
]

after recursive-case [
  ; return n*factorial(n-1)
  tmp1/integer <- subtract n/integer, 1/literal
  tmp2/integer <- factorial tmp1/integer
  result/integer <- multiply tmp2/integer, n/integer
  reply result/integer
]
```

(You'll find this version in `tangle.mu`.)
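
Conceptually, the `after` directive is a load-time splice: find each label line and insert the registered fragment right behind it. Here is a Python sketch of that idea under stated assumptions, not mu's implementation. `tangle` is a hypothetical helper, labels are bare words on their own line and are kept in place as no-ops, and one fragment per label is assumed.

```python
def tangle(body, fragments):
    """Splice each 'after <label>' fragment right behind its label line.

    body: list of instruction lines; fragments: dict label -> list of lines.
    Label lines stay in the output (treated as no-ops).
    """
    out = []
    for line in body:
        out.append(line)
        out.extend(fragments.get(line.strip(), []))
    return out


factorial_body = [
    "default-scope/scope-address <- new scope/literal, 30/literal",
    "n/integer <- next-input",
    "{",
    "base-case",
    "}",
    "recursive-case",
]

fragments = {
    "base-case": [
        "zero?/boolean <- equal n/integer, 0/literal",
        "break-unless zero?/boolean",
        "reply 1/literal",
    ],
    "recursive-case": [
        "tmp1/integer <- subtract n/integer, 1/literal",
        "tmp2/integer <- factorial tmp1/integer",
        "result/integer <- multiply tmp2/integer, n/integer",
        "reply result/integer",
    ],
}

tangled = tangle(factorial_body, fragments)
```

Because the splice is keyed on the label, the same fragment lands at every occurrence of that label. That is the "write boilerplate once, insert it in multiple places" use case.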

---

Another example, this time with concurrency.

```shell
$ ./anarki/arc mu.arc fork.mu
```

Notice that it repeatedly prints either '34' or '35' at random. Hit ctrl-c to
stop.
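
One way to picture that behavior is a toy model: two routines, each producing the same value forever, interleaved by a scheduler. The Python sketch below is only an analogy and not mu's actual scheduler. `printer` and `run` are hypothetical names, and it assumes the scheduler picks a routine at random on each step. Seeding the random generator makes one particular interleaving repeatable, which is the property you'd want when turning such a run into a test.

```python
import random


def printer(value):
    """A routine that emits the same value forever, one value per step."""
    while True:
        yield value


def run(routines, steps, seed):
    """Toy scheduler: at each step, pick a routine at random and advance it
    by one step, collecting whatever it emits."""
    rng = random.Random(seed)
    output = []
    for _ in range(steps):
        routine = rng.choice(routines)
        output.append(next(routine))
    return output


out = run([printer(34), printer(35)], steps=10, seed=1)
print(out)
```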

Yet another example forks two 'routines' that communicate over a channel:

```shell
$ ./anarki/arc mu.arc channel.mu
produce: 0
produce: 1
produce: 2
produce: 3
consume: 0
consume: 1
consume: 2
produce: 4
consume: 3
consume: 4

# The exact order above might shift over time, but you'll never see a number
# consumed before it's produced.
```

Channels are the unit of synchronization in mu. Blocking on a channel is the
only way tasks can sleep waiting for results. The plan is to do all I/O over
channels that wait for data to return.
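
For readers more familiar with threads, the same producer/consumer shape can be sketched in Python with a bounded `queue.Queue` standing in for a mu channel. This is an analogy, not mu's channel implementation. The capacity of 3 is arbitrary, and the `None` end-of-stream sentinel is a convention of this sketch. `put` blocks while the queue is full and `get` blocks until a value arrives, so nothing can be consumed before it is produced.

```python
import queue
import threading


def producer(chan, n):
    for i in range(n):
        chan.put(i)  # blocks while the channel is full
    chan.put(None)   # sentinel: no more values


def consumer(chan, consumed):
    while True:
        item = chan.get()  # sleeps until a value is available
        if item is None:
            return
        consumed.append(item)


chan = queue.Queue(maxsize=3)  # capacity chosen arbitrarily for the sketch
consumed = []
threads = [
    threading.Thread(target=producer, args=(chan, 5)),
    threading.Thread(target=consumer, args=(chan, consumed)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The interleaving of the two threads varies from run to run, but with one producer the consumer always sees `[0, 1, 2, 3, 4]` in order.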

Routines are expected to communicate purely by message passing, though nothing
stops them from sharing memory since all routines share a common address
space. However, idiomatic mu will make it hard to accidentally read or clobber
random memory locations. Bounds checking is baked deeply into the semantics,
and pointer arithmetic will be mostly forbidden (except inside the memory
allocator and a few other places).

---

Try running the tests:

```shell
$ ./anarki/arc mu.arc.t
$ # all tests passed!
```

Now start reading `mu.arc.t` to see how it works. A colorized copy of it is at
`mu.arc.t.html` and http://akkartik.github.io/mu.

You might also want to peek in the `.traces` directory, which automatically
includes logs for each test showing you just how it ran on my machine. If mu
eventually gets complex enough that you have trouble running examples, these
logs might help you figure out if my system is somehow different from yours or
if I've just been insufficiently diligent and my documentation is out of date.

The immediate goal of mu is to build up towards an environment for parsing and
visualizing these traces hierarchically, and for easily turning traces into
reproducible tests by flagging inputs entering the log and outputs leaving it.
To turn a trace into a test, the inputs will have to be faked in and the
outputs asserted on.
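
Here is a minimal Python sketch of that last step, under the assumption that each trace entry is already flagged as `"in"` or `"out"`. Flagged inputs are faked back into the program, and flagged outputs become assertions. `replay`, `greet` and `rude` are hypothetical names, and passing I/O as two callables is a simplification of this sketch rather than how mu will do it.

```python
def replay(program, trace):
    """Turn a recorded trace into a test: entries flagged 'in' are faked
    back into the program, entries flagged 'out' are asserted on."""
    inputs = iter([value for direction, value in trace if direction == "in"])
    expected = [value for direction, value in trace if direction == "out"]
    actual = []
    program(read_line=lambda: next(inputs), print_line=actual.append)
    assert actual == expected, (actual, expected)


def greet(read_line, print_line):
    name = read_line()
    print_line("hello, " + name)


def rude(read_line, print_line):
    # A 'regression': same inputs, different output.
    print_line("go away, " + read_line())


trace = [("in", "ada"), ("out", "hello, ada")]
replay(greet, trace)  # passes silently
```

Replaying the same trace against `rude` raises `AssertionError`, which is exactly what a regression test should do.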

**Credits**

Mu builds on many ideas that have come before, especially:

- [Peter Naur](http://alistair.cockburn.us/ASD+book+extract%3A+%22Naur,+Ehn,+Musashi%22)
  for articulating the paramount problem of programming: communicating a
  codebase to others;
- [Christopher Alexander](http://www.amazon.com/Notes-Synthesis-Form-Harvard-Paperbacks/dp/0674627512)
  and [Richard Gabriel](http://dreamsongs.net/Files/PatternsOfSoftware.pdf) for
  the intellectual tools for reasoning about the higher order design of a
  codebase;
- Unix and C for showing us how to co-evolve language and OS, and for teaching
  the (much maligned, misunderstood and underestimated) value of concise
  *implementation* in addition to a clean interface;
- Donald Knuth's [literate programming](http://www.literateprogramming.com/knuthweb.pdf)
  for liberating "code for humans to read" from the tyranny of compiler order;
- [David Parnas](http://www.cs.umd.edu/class/spring2003/cmsc838p/Design/criteria.pdf)
  and others for highlighting the value of separating concerns and stepwise
  refinement;
- [Lisp](http://www.paulgraham.com/rootsoflisp.html) for showing the power of
  dynamic languages, late binding and providing the right primitives a la
  carte, especially lisp macros;
- The folklore of debugging by print and the trace facility in many lisp
  systems;
- Automated tests for showing the value of developing programs inside an
  elaborate harness;
- [Python doctest](http://docs.python.org/2/library/doctest.html) for
  exemplifying interactive documentation that doubles as tests;
- [ReStructuredText](https://en.wikipedia.org/wiki/ReStructuredText)
  and [its antecedents](https://en.wikipedia.org/wiki/Setext) for showing that
  markup can be clean;
- BDD for challenging us all to write tests at a higher level;
- JavaScript and CSS for demonstrating the power of a DOM for complex
  structured documents.