uxn tutorial day 1 in a different file

sejo 2021-07-23 13:33:07 -05:00
parent d65289c61f
commit dcedd849b1
2 changed files with 621 additions and 605 deletions


@ -1,17 +1,30 @@
# uxn tutorial
(work in progress)
a beginner's guide for programming the {uxn} computer.
a slow-paced companion to the official documentation.
a beginner's guide for programming the {uxn} computer, and a slow-paced companion to the official documentation.
=> https://wiki.xxiivv.com/site/uxntal.html uxntal
=> https://wiki.xxiivv.com/site/uxnemu.html uxnemu
# outline
the tutorial is divided into 8 days (or sections), as it can be followed along with a workshop.
the tutorial is divided into 8 days (or sections), as it could be followed along with a workshop.
(as of today, this is a work in progress)
# day 1
in this first section of the tutorial we talk about the basics of the uxn computer, its programming paradigm, its architecture, and why you would want to learn to program it.
we also jump right into our first simple programs to demonstrate fundamental concepts that we will develop further in the following days.
=> ./uxn_tutorial_day_1.gmi {uxn tutorial day 1}
# day 2
coming soon!
# draft outline
this outline is here and now as a reference of the overall structure of the tutorial.
## day 1: the basics
@ -100,603 +113,6 @@ new mode: keep mode
* share what you created :)
# day 1
hello! in this section we talk about the basics of the uxn computer, its programming paradigm, its architecture, and why you would want to learn to program it.
we also jump right into our first simple programs to demonstrate fundamental concepts that we will develop further in the next sections.
## why uxn?
or first of all... what is uxn?
> Uxn is a portable 8-bit virtual computer capable of running simple tools and games programmable in its own little assembly language. It is also playground to learn basic computation skills.
=> https://wiki.xxiivv.com/site/uxn.html XXIIVV - uxn
i invite you to read "why create a smol virtual computer" from the 100R site, as well:
=> https://100r.co/site/uxn.html 100R - uxn
so, uxn is a virtual (for the moment?) computer that is simple enough to be emulated by many old and new computing platforms. these platforms include running uxn with pen and paper :)
personally, i see in it the following features:
* built for the longterm
* built for audiovisual interactive applications
* simple architecture and instruction set (only 32 instructions!)
* offline-first: it works locally and you only need a couple of documentation files to get going
* practice and experimentation ground for computing within limits
* already ported to several computing platforms, both years-old and modern
all these concepts sound great to me, and hopefully to you too! however, i see in it a couple of aspects that may make it seem not very approachable:
* it is programmed in an assembly language, uxntal
* it uses the postfix notation / is inspired by forth machines
the idea of this tutorial is to explore these two aspects and reveal how they play along to give uxn its power with relatively little complexity.
## postfix notation (and the stack)
uxn is inspired by forth-machines in that it uses the recombination of simple components to achieve appropriate solutions, and in that it is a stack-based machine.
this implies that it is primarily based on interactions with a "push down stack", where operations are indicated using what is called postfix notation.
> Reverse Polish notation (RPN), also known as Polish postfix notation or simply postfix notation, is a mathematical notation in which operators follow their operands [...]
=> https://en.wikipedia.org/wiki/Reverse_Polish_notation Reverse Polish notation - Wikipedia
### postfix addition
in postfix notation, the addition of two numbers would be written in the following form:
``` 1 48 +
1 48 +
```
where, reading from left to right:
* number 1 is pushed down onto the stack
* number 48 is pushed down onto the stack
* + takes two elements from the top of the stack, adds them, and pushes the result down onto the stack
the book Starting Forth has some great illustrations of this process of addition:
=> https://www.forth.com/starting-forth/1-forth-stacks-dictionary/#The_Stack_Forth8217s_Workspace_for_Arithmetic The Stack: Forths Workspace for Arithmetic
### from infix to postfix
more complex expressions in infix notation, which require either parentheses or rules of operator precedence (and a more complex system for decoding them), can be simplified with postfix notation.
for example, the following infix expression:
``` (3 + 5)/2 + 48
(3 + 5)/2 + 48
```
can be written in postfix notation as:
``` 3 5 + 2 / 48 +
3 5 + 2 / 48 +
```
we can also write it in many other ways, for example:
``` 48 3 5 + 2 / +
48 3 5 + 2 / +
```
make sure these expressions work and are equivalent! you just have to follow these rules, reading from left to right:
* if it's a number, push it down onto the stack
* if it's an operator, take two elements from the top of the stack, apply the operation, and push the result back onto the stack.
note: in the case of the division, the operands follow the same left-to-right order. 3/2 would be written as:
``` 3 2 /
3 2 /
```
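the two rules above can be sketched as a tiny evaluator. this is only an illustration written in python, not uxn code: the name eval_postfix is made up for this example, the numbers are read as decimal like in the expressions above, and the division is assumed to be an integer division.
```python
# a minimal postfix evaluator following the two rules above
# (illustrative python, not uxntal; eval_postfix is a made-up name)

def eval_postfix(expression):
    operations = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a // b,  # assumption: integer division
    }
    stack = []
    for token in expression.split():
        if token in operations:
            b = stack.pop()  # the top element is the second operand
            a = stack.pop()  # the next one is the first operand
            stack.append(operations[token](a, b))
        else:
            stack.append(int(token))  # a number: push it down onto the stack
    return stack

print(eval_postfix("1 48 +"))          # [49]
print(eval_postfix("3 5 + 2 / 48 +"))  # [52]
print(eval_postfix("48 3 5 + 2 / +"))  # [52]
print(eval_postfix("3 2 /"))           # [1]
```
note how both ways of writing the longer expression leave the same single result on the stack, and how 3 2 / keeps the left-to-right order of 3/2.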
you'll start seeing how the use of the stack can be very powerful as it can save operands and/or intermediate results without us having to explicitly assign a place in memory for them (i.e. like using "variables" in other programming languages)
we'll come back to postfix notation and the stack very soon!
## uxn computer architecture
one of the perks of programming a computer at a low level of abstraction, as we will be doing with uxn, is that we have to know and be aware of its internal workings.
### 8-bits and hexadecimal
binary words of 8-bits, also known as bytes, are the basic elements of data encoding and manipulation in uxn.
uxn can also handle binary words of 16-bits (2 bytes), also known as shorts, by concatenating two consecutive bytes. if we want to get more technical about it, we can say it is big-endian: the "high" byte has a lower address (i.e. it is written before) than the "low" byte.
numbers in uxn are expressed using the hexadecimal system (base 16), where each digit (nibble) goes from 0 to 9 and then from 'a' to 'f' (in lower case).
a byte needs two hexadecimal digits (nibbles) to be expressed, and a short needs four.
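if you want to play with these numbers outside of uxn, here is a small sketch in python (just as a calculator, the variable names are made up) showing a byte written with two hexadecimal digits, and a short built from two consecutive bytes with the high byte first:
```python
# bytes and shorts written as python integers (illustration only)

byte = 0x68                      # one byte: two hexadecimal digits
print(f"{byte:02x} in hexadecimal is {byte} in decimal")

# a short is made of two consecutive bytes: the high byte goes first
high_byte, low_byte = 0x01, 0x00
short = (high_byte << 8) | low_byte
print(f"{short:04x} in hexadecimal is {short} in decimal")

# the largest values that fit in a byte and in a short
print(0xff, 0xffff)              # 255 65535
```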
### the uxn cpu
it is said that the uxn cpu is a beet, capable of performing 32 different instructions with three different mode flags.
each instruction along with its mode flags can be encoded in a single word of 8-bits.
all of these instructions operate with elements in the stack, either to get from it their operands and/or to push down onto it their results.
we'll be covering these instructions slowly over this tutorial.
### uxn memory
memory in the uxn computer consists of four separate spaces:
* main memory, with 65536 bytes
* i/o memory, with 256 bytes divided into 16 sections (or devices) of 16 bytes each: 8 bytes for inputs and 8 bytes for outputs.
* working stack, with 256 bytes
* return stack, with 256 bytes
each byte in the main memory has an address of 16-bits (2 bytes) in size, while each byte in the i/o memory has an address of 8-bits (1 byte) in size. both of them can be accessed randomly.
the first 256 bytes of the main memory constitute a section called the zero page. this section can be addressed by 8-bits (1 byte), and it is meant for data storage during runtime of the machine.
there are different instructions for interacting with each of these memory spaces.
the main memory stores the program to be executed, starting at the 257th byte (address 0100 in hexadecimal). it can also store data.
the stacks cannot be accessed randomly; the uxn machine takes care of them.
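to get a feeling for these sizes and addresses, the four spaces could be sketched in python as follows. this is not how any actual emulator is written; the names are made up and the sketch only mirrors the numbers given above.
```python
# the four memory spaces as plain byte arrays (sizes taken from the list above)
main_memory = bytearray(0x10000)    # 65536 bytes, addresses 0000 to ffff
io_memory = bytearray(0x100)        # 256 bytes, addresses 00 to ff
working_stack = bytearray(0x100)    # 256 bytes
return_stack = bytearray(0x100)     # 256 bytes

ZERO_PAGE = range(0x0000, 0x0100)   # the first 256 addresses of the main memory
PROGRAM_START = 0x0100              # where the program begins

print(len(main_memory), len(io_memory))   # 65536 256
print(len(ZERO_PAGE))                     # 256
print(PROGRAM_START + 1)                  # 257: address 0100 is the 257th byte

# the i/o memory holds 16 devices of 16 bytes each
device_starts = [device * 0x10 for device in range(16)]
print([f"{start:02x}" for start in device_starts[:3]])   # ['00', '10', '20']
```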
### instruction cycle
the uxn cpu reads one byte at a time from the main memory.
the program counter is a word of 16-bits that indicates the address of the byte to read next. its initial value is the address 0100 in hexadecimal.
once the cpu reads a byte, it decodes it as an instruction and performs it.
the instruction will normally imply a change in the stack(s), and sometimes it may imply a change of the normal flow of the program counter: instead of pointing to the next byte in memory, it can be made to point elsewhere, "jumping" from a place in memory to another.
## installation and toolchain
ready? let's get the uxn assembler (uxnasm) and emulator (uxnemu) from their git repository:
=> https://git.sr.ht/~rabbits/uxn ~rabbits/uxn - sourcehut git
these instructions are for linux-based systems.
if you need a hand, find us in #uxn on irc.esper.net :)
### install SDL2
in order to build uxnemu, we need to install the SDL2 library.
in a terminal in debian/ubuntu, do:
``` sudo apt install libsdl2-dev
$ sudo apt install libsdl2-dev
```
or in guix:
``` guix install sdl2
$ guix install sdl2
```
### get and build uxn
let's get and build uxnemu and uxnasm:
```
$ git clone https://git.sr.ht/~rabbits/uxn
$ cd uxn
$ ./build.sh
```
if everything went alright, you'll see many messages in the terminal and a little new window with the title uxn, and a demo application: uxnemu is now running a "rom" corresponding to that application.
### uxnemu controls
* F1 cycles between different zoom levels
* F2 shows the on-screen debugger
* F3 takes a screenshot of the window
### using the toolchain
you'll see that after building uxn, you have three new executable files in the bin/ directory:
* uxnemu: the emulator
* uxnasm: the assembler
* uxncli: a non-interactive console-based emulator
you can adjust your $PATH to have them available anywhere.
the idea is that in order to run a program written in uxntal (the uxn assembly language), first you have to assemble it into a "rom", and then you can run this rom with the emulator.
for example, in order to run {darena} that is in projects/examples/demos/ :
```
# assemble darena.tal into darena.rom
$ ./bin/uxnasm projects/examples/demos/darena.tal bin/darena.rom
# run darena.rom
$ ./bin/uxnemu bin/darena.rom
```
take a look at the available demos! (or not, and let's start programming ours!)
## uxntal and a very basic hello world
uxntal is the assembly language for the uxn machine.
we were talking before about the uxn cpu and the 32 instructions it knows how to perform, each of them encoded as a single 8-bit word (byte).
that uxntal is an assembly language implies that there's a one-to-one mapping of a written instruction in the language to a corresponding 8-bit word that the cpu can interpret.
for example, the instruction ADD in uxntal is encoded as a single byte with the value 18 in hexadecimal, and corresponds to the following set of actions: take the top two elements from the stack, add them, and push down the result.
in forth-like systems we can see the following kind of notation to express the operands that an instruction takes from the stack, and the result(s) that it pushes down onto the stack:
```
ADD ( a b -- a+b )
```
this means that ADD takes first the top element 'b', then it takes the new top element 'a', and pushes back the result of adding a+b.
now that we are at it, there's a complementary instruction, SUB (opcode 19), that takes the top two elements from the stack, subtracts them, and pushes down the result:
```
SUB ( a b -- a-b )
```
note that the order of the operands is similar to the division we discussed above when talking about postfix notation.
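as a way of reading that notation, here is a sketch in python of both stack effects. the functions only imitate what the notation says (they are not the uxn implementation), and they ignore what happens when a result doesn't fit in a byte.
```python
# stack effects of ADD ( a b -- a+b ) and SUB ( a b -- a-b )
# the python list plays the role of the stack: its last item is the top

def ADD(stack):
    b = stack.pop()      # first, the top element
    a = stack.pop()      # then, the new top element
    stack.append(a + b)

def SUB(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append(a - b)  # a was pushed before b

stack = [0x05, 0x03]     # 05 was pushed first, 03 is at the top
SUB(stack)
print([f"{n:02x}" for n in stack])   # ['02']

stack.append(0x10)
ADD(stack)
print([f"{n:02x}" for n in stack])   # ['12']
```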
### a first program
let's write the following program in our favorite text editor, and save it as hello.tal:
```
( hello.tal )
|0100 LIT 68 LIT 18 DEO
```
let's assemble it and run it:
```
$ ./bin/uxnasm hello.tal bin/hello.rom && ./bin/uxnemu bin/hello.rom
```
we will see an output that looks like the following:
```
Assembled bin/hello.rom(5 bytes), 0 labels, 0 macros.
Uxn loaded[bin/hello.rom].
Device added #00: system, at 0x0000
Device added #01: console, at 0x0010
Device added #02: screen, at 0x0020
Device added #03: audio0, at 0x0030
Device added #04: audio1, at 0x0040
Device added #05: audio2, at 0x0050
Device added #06: audio3, at 0x0060
Device added #07: ---, at 0x0070
Device added #08: controller, at 0x0080
Device added #09: mouse, at 0x0090
Device added #0a: file, at 0x00a0
Device added #0b: datetime, at 0x00b0
Device added #0c: ---, at 0x00c0
Device added #0d: ---, at 0x00d0
Device added #0e: ---, at 0x00e0
Device added #0f: ---, at 0x00f0
h
```
the last 'h' we see is the output of our program. change the 68 to, for example, 65, and now you'll see an 'e'.
so what is going on?
### one instruction at a time
we just ran the following program in uxntal:
```
( hello.tal )
|0100 LIT 68 LIT 18 DEO
```
the first line is a comment: comments are enclosed between parentheses, with spaces separating them from the text inside. similar to other programming languages, comments are ignored by the assembler.
the second line has several things going on:
* |0100 : you may remember this number from before - this is the initial value of the program counter; the address of the first byte that the cpu reads. we use this notation to indicate that whatever is written afterwards, will be written in memory starting at this address.
* LIT : this appears twice, and is an uxn instruction with the following actions: it pushes the next byte in memory down onto the stack, and makes the program counter skip that byte.
* 68 : a hexadecimal number, that corresponds to the ascii code of the character 'h'
* 18 : a hexadecimal number, that corresponds to an i/o address: device 1 (console), address 8.
* DEO : another uxn instruction, that we could define as the following: output the given byte into the given device address, both taken from the stack ( byte address -- )
reading the program from left to right, we can see the following behavior:
* the LIT instruction pushes number 68 down onto the stack
* the LIT instruction pushes number 18 down onto the stack
* the DEO instruction takes the top element from the stack (18) and uses it as a device address
* the DEO instruction takes the top element from the stack (68) and uses it as a byte to output
* the DEO instruction outputs the byte to the device address, leaving the stack empty
and what is the i/o device with address 18?
looking at the devices table from the uxnemu reference, we can see that the device with address 1 in the high byte is the console (standard input and output), and that the column with address 8 corresponds to "write".
=> https://wiki.xxiivv.com/site/uxnemu.html uxnemu
so, device address 18 corresponds to "console write", or standard output.
our program is sending the hexadecimal value 68 (character 'h') to the standard output!
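the way the address 18 splits into a device and a column can be checked with a couple of lines of python (used here only as a calculator; the variable names are made up):
```python
# splitting the i/o address 18 into its two hexadecimal digits
address = 0x18
device = address >> 4      # first digit: the device
column = address & 0x0f    # second digit: the column inside the device
print(f"device {device:x}, column {column:x}")   # device 1, column 8

# and the ascii codes of the characters we tried
print(f"{ord('h'):02x}")   # 68
print(chr(0x65))           # e
```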
you can see the hexadecimal values of the ascii characters in the following table:
=> https://wiki.xxiivv.com/site/ascii.html ascii table
### assembled rom
we can see that the assembler reported that our program is 5 bytes in size:
```
Assembled bin/hello.rom(5 bytes), 0 labels, 0 macros.
```
for the curious (like you!), we could use a tool like hexdump to see its contents:
```
$ hexdump -C bin/hello.rom
00000000 01 68 01 18 17 |.h...|
00000005
```
01 is the "opcode" corresponding to LIT, and 17 is the opcode corresponding to DEO. and there they are, our 68 and 18!
so, effectively, our assembled program matches one-to-one the instructions we just wrote!
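to see how those five bytes could be read one at a time, here is a toy machine written in python. it is a sketch under several assumptions: it only knows the two opcodes from the hexdump above (01 and 17), it only handles the console write address 18, and it simply stops when it reaches the end of the rom, which is a simplification and not how the actual uxn machine decides to stop. the function name run is made up.
```python
# a toy instruction cycle, limited to the LIT and DEO opcodes seen above
LIT, DEO = 0x01, 0x17
CONSOLE_WRITE = 0x18

def run(rom):
    memory = bytearray(0x10000)
    memory[0x0100:0x0100 + len(rom)] = rom   # the rom goes at address 0100
    stack = []
    output = []
    pc = 0x0100                              # initial program counter
    while pc < 0x0100 + len(rom):            # simplification: stop after the rom
        opcode = memory[pc]                  # read one byte...
        pc += 1
        if opcode == LIT:                    # ...and decode it
            stack.append(memory[pc])         # push the next byte in memory
            pc += 1                          # and skip it
        elif opcode == DEO:
            address = stack.pop()            # top element: device address
            byte = stack.pop()               # next element: byte to output
            if address == CONSOLE_WRITE:
                output.append(chr(byte))
        else:
            raise ValueError(f"opcode {opcode:02x} is not part of this sketch")
    return "".join(output), stack

text, stack = run(bytes.fromhex("01 68 01 18 17"))
print(text)    # h
print(stack)   # []
```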
actually, we could have written our program with these hexadecimal numbers (the machine code), and it would have worked the same:
```
( hello.tal )
|0100 01 68 01 18 17 ( LIT 68 LIT 18 DEO )
```
maybe not the most practical way of programming, but indeed a fun one :)
you can find the opcodes of all 32 instructions in the uxntal reference:
=> https://wiki.xxiivv.com/site/uxntal.html XXIIVV - uxntal
### hello program
we could expand our program to print more characters:
```
( hello.tal )
|0100 LIT 68 LIT 18 DEO ( h )
LIT 65 LIT 18 DEO ( e )
LIT 6c LIT 18 DEO ( l )
LIT 6c LIT 18 DEO ( l )
LIT 6f LIT 18 DEO ( o )
LIT 0a LIT 18 DEO ( newline )
```
if we assemble and run it, we'll now have a 'hello' in our terminal, using 30 bytes of program :)
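where do the 30 bytes come from? each character takes the same five bytes we saw in the hexdump. as a quick check, this python sketch builds those bytes by hand (it is not the assembler, and it assumes the opcodes 01 and 17 shown before):
```python
# five bytes per character: LIT, character, LIT, 18, DEO
LIT, DEO, CONSOLE_WRITE = 0x01, 0x17, 0x18

rom = bytearray()
for character in "hello\n":
    rom += bytes([LIT, ord(character), LIT, CONSOLE_WRITE, DEO])

print(len(rom))             # 30
print(rom[:5].hex(" "))     # 01 68 01 18 17
```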
ok, so... do you like it?
does it look unnecessarily complex?
we'll look now at some features of uxntal that make writing and reading code more "comfy".
## runes, labels, macros
runes are special characters that indicate to uxnasm some pre-processing to do when assembling our programs.
### absolute pad rune
we saw already the first of them: | defines an "absolute pad": the address where the next written elements will be located in memory.
if the address is 1-byte long, it is assumed to be an address of the i/o memory space or of the zero page.
if the address is 2-bytes long, it is assumed to be an address for the main memory.
### literal hex rune
let's talk about another one: #.
this character defines a "literal hex": it is basically a shorthand for the LIT instruction.
using this rune, we could re-write our first program as:
```
( hello.tal )
|0100 #68 #18 DEO
```
note that you can only use this rune to write the contents of either one or two bytes (two or four nibbles).
the following would have the same behavior as the program above, but using one byte less (in the next section/day we'll see why):
```
( hello.tal )
|0100 #6818 DEO
```
important: remember that this rune (and the others with the word "literal" in their names) is a shorthand for the LIT instruction. this can lead to confusion in some cases :)
### raw character rune
this is the raw character rune: '
it allows us to have uxnasm decode the numerical value of an ascii character.
our "hello program" would look like the following, using the new runes we just learned:
```
( hello.tal )
|0100 LIT 'h #18 DEO
LIT 'e #18 DEO
LIT 'l #18 DEO
LIT 'l #18 DEO
LIT 'o #18 DEO
#0a #18 DEO ( newline )
```
the "raw" in the name of this rune indicates that it's not literal, i.e. that it doesn't add a LIT instruction.
### runes for labels
even though right now we know that #18 corresponds to pushing the console write device address down onto the stack, for readability and future-proofing of our code it is a good practice to assign a set of labels that would correspond to that device and sub-address.
the rune @ allows us to define labels, and the rune & allows us to define sub-labels.
for example, for the console device, the way you would see this written in uxn programs is the following:
```
|10 @Console [ &vector $2 &read $1 &pad $5 &write $1 &error $1 ]
```
we can see an absolute pad to address 10, that assigns the following to that address. because the address consists of one byte only, uxnasm assumes it is for the i/o memory space or the zero page.
then we see a label @Console: this label will correspond to address 10.
the square brackets are ignored, but included for readability.
next we have several sub-labels, indicated by the & rune, and relative pads, indicated by the $ rune. how do we read/interpret them?
* sublabel &vector has the same address as its parent label @Console: 10
* $2 skips two bytes (we could read this as &vector being an address to a 2-bytes long word)
* sublabel &read has the address 12
* $1 skips one byte (&read would be an address for a 1-byte long word)
* sublabel &pad has the address 13
* $5 skips the remaining bytes of the first group of 8 bytes in the device: these bytes correspond to the "inputs"
* sublabel &write has the address 18 (the one we knew already!)
* $1 skips one byte (&write would be an address for a 1-byte long word)
* sublabel &error has the address 19
none of this would be translated to machine code, but aids us in writing uxntal code.
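the bookkeeping described in the list above can be sketched in python. this is only an imitation of that reading of sub-labels and relative pads, not the code of uxnasm: the function name label_addresses is made up, and the pad sizes are read as hexadecimal, which makes no difference for the single digits used here.
```python
# walk through a device definition, keeping track of the current address
def label_addresses(start, definition):
    address = start
    addresses = {}
    for token in definition.split():
        if token.startswith("&"):        # sub-label: remember the current address
            addresses[token[1:]] = address
        elif token.startswith("$"):      # relative pad: skip that many bytes
            address += int(token[1:], 16)
    return addresses

console = label_addresses(0x10, "&vector $2 &read $1 &pad $5 &write $1 &error $1")
for name, address in console.items():
    print(f"Console/{name}: {address:02x}")
# Console/vector: 10
# Console/read: 12
# Console/pad: 13
# Console/write: 18
# Console/error: 19
```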
the rune for referring to a literal address in the zero page or i/o address space is . (dot), and a / (slash) allows us to refer to one of its sublabels.
remember: as a "literal address" rune it will add a LIT instruction before the corresponding address :)
we could re-write our "hello program" as follows:
```
( hello.tal )
( devices )
|10 @Console [ &vector $2 &read $1 &pad $5 &write $1 &error $1 ]
( main program )
|0100 LIT 'h .Console/write DEO
LIT 'e .Console/write DEO
LIT 'l .Console/write DEO
LIT 'l .Console/write DEO
LIT 'o .Console/write DEO
#0a .Console/write DEO ( newline )
```
now this starts to look more like the examples you might find online and/or in the uxn repo :)
### macros
following the forth heritage (?), in uxntal we can define our own "words" in macros that allow us to group and reuse instructions.
during assembly, these macros are (recursively) replaced by the contents in their definitions.
for example, we can see that the following piece of code is repeated many times in our program:
```
.Console/write DEO ( equivalent to #18 DEO, or LIT 18 DEO )
```
we could define a macro called EMIT that will take from the stack a byte corresponding to a character, and print it to standard output. for this, we need the % rune, and curly brackets for the definition.
don't forget the spaces!
```
( print a character to standard output )
%EMIT { .Console/write DEO } ( character -- )
```
in order to call a macro, we just write its name:
```
( print character h )
LIT 'h EMIT
```
we can call macros inside macros, for example:
```
( print a newline )
%NL { #0a EMIT } ( -- )
```
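the recursive replacement of macros can be imitated with a few lines of python. this is a sketch of the idea only (not how uxnasm is implemented, and the function name expand is made up): every word that is a macro name gets replaced by its definition, which may contain other macros.
```python
# the two macros defined above, as plain text
macros = {
    "EMIT": ".Console/write DEO",
    "NL": "#0a EMIT",
}

def expand(source):
    result = []
    for token in source.split():
        if token in macros:
            result.extend(expand(macros[token]))  # macros inside macros
        else:
            result.append(token)
    return result

print(" ".join(expand("LIT 'h EMIT NL")))
# LIT 'h .Console/write DEO #0a .Console/write DEO
```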
## a more idiomatic hello world
using all these macros and runes, our program could end up looking like the following:
```
( hello.tal )
( devices )
|10 @Console [ &vector $2 &read $1 &pad $5 &write $1 &error $1 ]
( macros )
( print a character to standard output )
%EMIT { .Console/write DEO } ( character -- )
( print a newline )
%NL { #0a EMIT } ( -- )
( main program )
|0100 LIT 'h EMIT
LIT 'e EMIT
LIT 'l EMIT
LIT 'l EMIT
LIT 'o EMIT
NL
```
it ends up being assembled in the same 30 bytes as the examples above, but hopefully more readable and maintainable.
we could "improve" this program by having a loop printing the characters, but we'll study that later on :)
## exercises
### EMIT reordering
in our previous program, the EMIT macro is called just after pushing a character down onto the stack.
how would you rewrite the program so that you push all the characters first, and then "EMIT" all of them with a sequence like this one?
```
EMIT EMIT EMIT EMIT EMIT
```
### print a digit
if you look at the ascii table, you'll see that the hexadecimal ascii code 30 corresponds to the digit 0, 31 to the digit 1, and so on until 39 that corresponds to digit 9.
define a PRINT-DIGIT macro that takes a number (from 0 to 9) from the stack, and prints its corresponding digit to standard output.
```
%PRINT-DIGIT { } ( number -- )
```
## instructions of day 1
these are the instructions we covered today:
* ADD: take the top two elements from the stack, add them, and push down the result ( a b -- a+b )
* SUB: take the top two elements from the stack, subtract them, and push down the result ( a b -- a-b )
* LIT: push the next byte in memory down onto the stack
* DEO: output the given byte into the given device address, both taken from the stack ( byte address -- )
# day 2
stay tuned!
# support
if you found this tutorial to be helpful, consider sharing it and {support} it :)
if you found this tutorial to be helpful, consider sharing it and giving it your {support} :)

600
src/uxn_tutorial_day_1.gmo Normal file

@ -0,0 +1,600 @@
# uxn tutorial: day 1, the basics
hello! in this first section of the {uxn tutorial} we talk about the basics of the uxn computer, its programming paradigm, its architecture, and why you would want to learn to program it.
we also jump right into our first simple programs to demonstrate fundamental concepts that we will develop further in the following days.
# why uxn?
or first of all... what is uxn?
> Uxn is a portable 8-bit virtual computer capable of running simple tools and games programmable in its own little assembly language. It is also playground to learn basic computation skills.
=> https://wiki.xxiivv.com/site/uxn.html XXIIVV - uxn
i invite you to read "why create a smol virtual computer" from the 100R site, as well:
=> https://100r.co/site/uxn.html 100R - uxn
so, uxn is a virtual (for the moment?) computer that is simple enough to be emulated by many old and new computing platforms. these platforms include running uxn with pen and paper :)
personally, i see in it the following features:
* built for the longterm
* built for audiovisual interactive applications
* simple architecture and instruction set (only 32 instructions!)
* offline-first: it works locally and you only need a couple of documentation files to get going
* practice and experimentation ground for computing within limits
* already ported to several computing platforms, both years-old and modern
all these concepts sound great to me, and hopefully to you too! however, i see in it a couple of aspects that may make it seem not very approachable:
* it is programmed in an assembly language, uxntal
* it uses the {postfix} notation (aka reverse polish notation) / it is inspired by forth machines
the idea of this tutorial is to explore these two aspects and reveal how they play along to give uxn its power with relatively little complexity.
# postfix notation (and the stack)
uxn is inspired by forth-machines in that it uses the recombination of simple components to achieve appropriate solutions, and in that it is a stack-based machine.
this implies that it is primarily based on interactions with a "push down stack", where operations are indicated using what is called postfix notation.
> Reverse Polish notation (RPN), also known as Polish postfix notation or simply postfix notation, is a mathematical notation in which operators follow their operands [...]
=> https://en.wikipedia.org/wiki/Reverse_Polish_notation Reverse Polish notation - Wikipedia
## postfix addition
in postfix notation, the addition of two numbers would be written in the following form:
``` 1 48 +
1 48 +
```
where, reading from left to right:
* number 1 is pushed down onto the stack
* number 48 is pushed down onto the stack
* + takes two elements from the top of the stack, adds them, and pushes the result down onto the stack
the book Starting Forth has some great illustrations of this process of addition:
=> https://www.forth.com/starting-forth/1-forth-stacks-dictionary/#The_Stack_Forth8217s_Workspace_for_Arithmetic The Stack: Forths Workspace for Arithmetic
## from infix to postfix
more complex expressions in infix notation, which require either parentheses or rules of operator precedence (and a more complex system for decoding them), can be simplified with postfix notation.
for example, the following infix expression:
``` (3 + 5)/2 + 48
(3 + 5)/2 + 48
```
can be written in postfix notation as:
``` 3 5 + 2 / 48 +
3 5 + 2 / 48 +
```
we can also write it in many other ways, for example:
``` 48 3 5 + 2 / +
48 3 5 + 2 / +
```
make sure these expressions work and are equivalent! you just have to follow these rules, reading from left to right:
* if it's a number, push it down onto the stack
* if it's an operator, take two elements from the top of the stack, apply the operation, and push the result back onto the stack.
note: in the case of the division, the operands follow the same left-to-right order. 3/2 would be written as:
``` 3 2 /
3 2 /
```
you'll start seeing how the use of the stack can be very powerful as it can save operands and/or intermediate results without us having to explicitly assign a place in memory for them (i.e. like using "variables" in other programming languages)
we'll come back to postfix notation and the stack very soon!
# uxn computer architecture
one of the perks of programming a computer at a low level of abstraction, as we will be doing with uxn, is that we have to know and be aware of its internal workings.
## 8-bits and hexadecimal
binary words of 8-bits, also known as bytes, are the basic elements of data encoding and manipulation in uxn.
uxn can also handle binary words of 16-bits (2 bytes), also known as shorts, by concatenating two consecutive bytes. we'll talk more about this in the second day of the tutorial.
numbers in uxn are expressed using the hexadecimal system (base 16), where each digit (nibble) goes from 0 to 9 and then from 'a' to 'f' (in lower case).
a byte needs two hexadecimal digits (nibbles) to be expressed, and a short needs four.
## the uxn cpu
it is said that the uxn cpu is a beet, capable of performing 32 different instructions with three different mode flags.
each instruction along with its mode flags can be encoded in a single word of 8-bits.
all of these instructions operate with elements in the stack, either to get from it their operands and/or to push down onto it their results.
we'll be covering these instructions slowly over this tutorial.
## uxn memory
memory in the uxn computer consists of four separate spaces:
* main memory, with 65536 bytes
* i/o memory, with 256 bytes divided into 16 sections (or devices) of 16 bytes each: 8 bytes for inputs and 8 bytes for outputs.
* working stack, with 256 bytes
* return stack, with 256 bytes
each byte in the main memory has an address of 16-bits (2 bytes) in size, while each byte in the i/o memory has an address of 8-bits (1 byte) in size. both of them can be accessed randomly.
the first 256 bytes of the main memory constitute a section called the zero page. this section can be addressed by 8-bits (1 byte), and it is meant for data storage during runtime of the machine.
there are different instructions for interacting with each of these memory spaces.
the main memory stores the program to be executed, starting at the 257th byte (address 0100 in hexadecimal). it can also store data.
the stacks cannot be accessed randomly; the uxn machine takes care of them.
## instruction cycle
the uxn cpu reads one byte at a time from the main memory.
the program counter is a word of 16-bits that indicates the address of the byte to read next. its initial value is the address 0100 in hexadecimal.
once the cpu reads a byte, it decodes it as an instruction and performs it.
the instruction will normally imply a change in the stack(s), and sometimes it may imply a change of the normal flow of the program counter: instead of pointing to the next byte in memory, it can be made to point elsewhere, "jumping" from a place in memory to another.
# installation and toolchain
ready? let's get the uxn assembler (uxnasm) and emulator (uxnemu) from their git repository:
=> https://git.sr.ht/~rabbits/uxn ~rabbits/uxn - sourcehut git
these instructions are for linux-based systems.
if you need a hand, find us in #uxn on irc.esper.net :)
## install SDL2
in order to build uxnemu, we need to install the SDL2 library.
in a terminal in debian/ubuntu, do:
``` sudo apt install libsdl2-dev
$ sudo apt install libsdl2-dev
```
or in guix:
``` guix install sdl2
$ guix install sdl2
```
## get and build uxn
let's get and build uxnemu and uxnasm:
```
$ git clone https://git.sr.ht/~rabbits/uxn
$ cd uxn
$ ./build.sh
```
if everything went alright, you'll see many messages in the terminal and a little new window with the title uxn, and a demo application: uxnemu is now running a "rom" corresponding to that application.
## uxnemu controls
* F1 cycles between different zoom levels
* F2 shows the on-screen debugger
* F3 takes a screenshot of the window
## using the toolchain
you'll see that after building uxn, you have three new executable files in the bin/ directory:
* uxnemu: the emulator
* uxnasm: the assembler
* uxncli: a non-interactive console-based emulator
you can adjust your $PATH to have them available anywhere.
the idea is that in order to run a program written in uxntal (the uxn assembly language), first you have to assemble it into a "rom", and then you can run this rom with the emulator.
for example, in order to run {darena} that is in projects/examples/demos/ :
```
assemble darena.tal into darena.rom
$ ./bin/uxnasm projects/examples/demos/darena.tal bin/darena.rom
run darena.rom
$ ./bin/uxnemu bin/darena.rom
```
take a look at the available demos! (or not, and let's start programming ours!)
# uxntal and a very basic hello world
uxntal is the assembly language for the uxn machine.
we were talking before about the uxn cpu and the 32 instructions it knows how to perform, each of them encoded as a single 8-bit word (byte).
that uxntal is an assembly language implies that there's a one-to-one mapping of a written instruction in the language to a corresponding 8-bit word that the cpu can interpret.
for example, the instruction ADD in uxntal is encoded as a single byte with the value 18 in hexadecimal, and corresponds to the following set of actions: take the top two elements from the stack, add them, and push down the result.
in forth-like systems we can see the following kind of notation to express the operands that an instruction takes from the stack, and the result(s) that it pushes down onto the stack:
```
ADD ( a b -- a+b )
```
this means that ADD first takes the top element 'b', then takes the new top element 'a', and pushes back the result of adding a+b.
now that we are at it, there's a complementary instruction, SUB (opcode 19), that takes the top two elements from the stack, subtracts them, and pushes down the result:
```
SUB ( a b -- a-b )
```
note that the order of the operands is similar to the division we discussed above when talking about postfix notation.
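to make the order of the operands concrete, here is a small sketch in python (not uxntal) that models the stack as a list, with the top of the stack at the end. the function names are made up for this illustration, and it assumes that results wrap around to stay within 8 bits:

```python
# a python list stands in for the uxn stack; the top of the stack is the end of the list

def add(stack):
    b = stack.pop()  # top element
    a = stack.pop()  # new top element
    stack.append((a + b) & 0xff)  # keep the result within 8 bits

def sub(stack):
    b = stack.pop()  # top element
    a = stack.pop()  # new top element
    stack.append((a - b) & 0xff)  # a-b, not b-a

stack = [0x07, 0x02]  # a = 07, b = 02
sub(stack)
print(stack)  # [5]
```

the element that was pushed first ends up on the left side of the operation, in both ADD and SUB.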
## a first program
let's write the following program in our favorite text editor, and save it as hello.tal:
```
( hello.tal )
|0100 LIT 68 LIT 18 DEO
```
let's assemble it and run it:
```
$ ./bin/uxnasm hello.tal bin/hello.rom && ./bin/uxnemu bin/hello.rom
```
we will see an output that looks like the following:
```
Assembled bin/hello.rom(5 bytes), 0 labels, 0 macros.
Uxn loaded[bin/hello.rom].
Device added #00: system, at 0x0000
Device added #01: console, at 0x0010
Device added #02: screen, at 0x0020
Device added #03: audio0, at 0x0030
Device added #04: audio1, at 0x0040
Device added #05: audio2, at 0x0050
Device added #06: audio3, at 0x0060
Device added #07: ---, at 0x0070
Device added #08: controller, at 0x0080
Device added #09: mouse, at 0x0090
Device added #0a: file, at 0x00a0
Device added #0b: datetime, at 0x00b0
Device added #0c: ---, at 0x00c0
Device added #0d: ---, at 0x00d0
Device added #0e: ---, at 0x00e0
Device added #0f: ---, at 0x00f0
h
```
the last 'h' we see is the output of our program. change the 68 to, for example, 65, and now you'll see an 'e'.
so what is going on?
## one instruction at a time
we just ran the following program in uxntal:
```
( hello.tal )
|0100 LIT 68 LIT 18 DEO
```
the first line is a comment: comments are enclosed in parentheses, and there have to be spaces between the parentheses and the text inside them. as in other programming languages, comments are ignored by the assembler.
the second line has several things going on:
* |0100 : you may remember this number from before - this is the initial value of the program counter; the address of the first byte that the cpu reads. we use this notation to indicate that whatever is written afterwards, will be written in memory starting at this address.
* LIT : this appears twice, and is an uxn instruction with the following actions: it pushes the next byte in memory down onto the stack, and makes the program counter skip that byte.
* 68 : a hexadecimal number, that corresponds to the ascii code of the character 'h'
* 18 : a hexadecimal number, that corresponds to an i/o address: device 1 (console), address 8.
* DEO : another uxn instruction, that we could define as the following: output the given byte into the given device address, both taken from the stack ( byte address -- )
reading the program from left to right, we can see the following behavior:
* the LIT instruction pushes number 68 down onto the stack
* the LIT instruction pushes number 18 down onto the stack
* the DEO instruction takes the top element from the stack (18) and uses it as a device address
* the DEO instruction takes the new top element from the stack (68) and uses it as a byte to output
* the DEO instruction outputs the byte to the device address, leaving the stack empty
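the steps above can be sketched as a tiny interpreter in python. this is only a model for following along, not how uxnemu is implemented: the function name run and the way the output is collected are made up for this illustration, and only LIT and DEO are modeled.

```python
def run(program):
    """run a list of instruction names and byte values; return (stack, output)."""
    stack = []   # working stack, top at the end of the list
    output = []  # (device address, byte) pairs sent by DEO
    pc = 0       # position of the next element to read
    while pc < len(program):
        instruction = program[pc]
        pc += 1
        if instruction == "LIT":
            stack.append(program[pc])  # push the next byte...
            pc += 1                    # ...and skip over it
        elif instruction == "DEO":
            address = stack.pop()      # top element: device address
            byte = stack.pop()         # new top element: byte to output
            output.append((address, byte))
        else:
            raise ValueError(f"instruction not modeled: {instruction!r}")
        # show the stack after each instruction
        print(instruction, [f"{b:02x}" for b in stack])
    return stack, output

stack, output = run(["LIT", 0x68, "LIT", 0x18, "DEO"])
for address, byte in output:
    print(f"device address {address:02x} received {byte:02x} ({chr(byte)!r})")
```

running it prints the stack after each instruction, so you can watch it grow with each LIT and become empty after DEO.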
and what is the i/o device with address 18?
looking at the devices table from the uxnemu reference, we can see that the device with 1 in the high nibble of its address is the console (standard input and output), and that the column with address 8 corresponds to "write".
=> https://wiki.xxiivv.com/site/uxnemu.html uxnemu
so, device address 18 corresponds to "console write", or standard output.
our program is sending the hexadecimal value 68 (character 'h') to the standard output!
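a way of reading a device address like 18 is to split it in its two nibbles. the following python sketch does that; the function name and the word "port" for the low nibble are made up for this illustration:

```python
def split_device_address(address):
    """split a one-byte i/o address into (device, port) nibbles."""
    device = address >> 4   # high nibble: which device
    port = address & 0x0f   # low nibble: which byte inside that device
    return device, port

device, port = split_device_address(0x18)
print(f"device {device:x}, port {port:x}")  # device 1, port 8
```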
you can see the hexadecimal values of the ascii characters in the following table:
=> https://wiki.xxiivv.com/site/ascii.html ascii table
## assembled rom
we can see that the assembler reported that our program is 5 bytes in size:
```
Assembled bin/hello.rom(5 bytes), 0 labels, 0 macros.
```
for the curious (like you!), we could use a tool like hexdump to see its contents:
```
$ hexdump -C bin/hello.rom
00000000 01 68 01 18 17 |.h...|
00000005
```
01 is the "opcode" corresponding to LIT, and 17 is the opcode corresponding to DEO. and there they are, our 68 and 18!
so, effectively, our assembled program matches one-to-one the instructions we just wrote!
actually, we could have written our program with these hexadecimal numbers (the machine code), and it would have worked the same:
```
( hello.tal )
|0100 01 68 01 18 17 ( LIT 68 LIT 18 DEO )
```
maybe not the most practical way of programming, but indeed a fun one :)
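going in the opposite direction is also possible. here is a rough python sketch of a "disassembler" that turns those bytes back into text. it only knows the four opcodes mentioned in this tutorial, and the function name is made up for this illustration:

```python
# opcodes taken from this tutorial; any other byte is shown as raw data
OPCODES = {0x01: "LIT", 0x17: "DEO", 0x18: "ADD", 0x19: "SUB"}

def disassemble(rom):
    """turn rom bytes back into a line of uxntal-like text."""
    words = []
    i = 0
    while i < len(rom):
        name = OPCODES.get(rom[i])
        if name is None:
            words.append(f"{rom[i]:02x}")
        else:
            words.append(name)
            if name == "LIT" and i + 1 < len(rom):
                i += 1
                # the byte after LIT is data, not an instruction
                words.append(f"{rom[i]:02x}")
        i += 1
    return " ".join(words)

rom = bytes.fromhex("01 68 01 18 17")
print(disassemble(rom))  # LIT 68 LIT 18 DEO
```

note how the byte 18 is shown as a number and not as ADD: whether a byte is an instruction or data depends on where it is in the program.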
you can find the opcodes of all 32 instructions in the uxntal reference:
=> https://wiki.xxiivv.com/site/uxntal.html XXIIVV - uxntal
## hello program
we could expand our program to print more characters:
```
( hello.tal )
|0100 LIT 68 LIT 18 DEO ( h )
LIT 65 LIT 18 DEO ( e )
LIT 6c LIT 18 DEO ( l )
LIT 6c LIT 18 DEO ( l )
LIT 6f LIT 18 DEO ( o )
LIT 0a LIT 18 DEO ( newline )
```
if we assemble and run it, we'll now have a 'hello' in our terminal, using 30 bytes of program :)
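to check that count, here is a python sketch that builds the same machine code by hand, using the opcodes we saw in the hexdump. the names in it are made up for this illustration; it is not a replacement for uxnasm:

```python
LIT, DEO = 0x01, 0x17   # opcodes as seen in the hexdump of hello.rom
CONSOLE_WRITE = 0x18    # device address for "console write"

def emit_bytes(text):
    """machine code that prints each character of text, one LIT/LIT/DEO group per character."""
    rom = bytearray()
    for character in text:
        rom += bytes([LIT, ord(character), LIT, CONSOLE_WRITE, DEO])
    return bytes(rom)

rom = emit_bytes("hello\n")
print(len(rom))          # 30
print(rom[:5].hex(" "))  # 01 68 01 18 17
```

each character costs 5 bytes, so the six characters (including the newline) add up to 30.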
ok, so... do you like it?
does it look unnecessarily complex?
we'll look now at some features of uxntal that make writing and reading code more "comfy".
# runes, labels, macros
runes are special characters that indicate to uxnasm some pre-processing to do when assembling our programs.
## absolute pad rune
we saw already the first of them: | defines an "absolute pad": the address where the next written elements will be located in memory.
if the address is 1-byte long, it is assumed to be an address of the i/o memory space or of the zero page.
if the address is 2-bytes long, it is assumed to be an address for the main memory.
## literal hex rune
let's talk about another one: #.
this character defines a "literal hex": it is basically a shorthand for the LIT instruction.
using this rune, we could re-write our first program as:
```
( hello.tal )
|0100 #68 #18 DEO
```
note that you can only use this rune to write the contents of either one or two bytes (two or four nibbles).
the following would have the same behavior as the program above, but using one byte less (in the next section/day we'll see why):
```
( hello.tal )
|0100 #6818 DEO
```
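as a preview, we can count the bytes with a python sketch. it assumes, as the byte count suggests, that a two-byte literal needs only one LIT instruction followed by its two bytes of data. the function name is made up for this illustration:

```python
def literal_hex_size(token):
    """bytes taken in the rom by a literal hex token such as '#68' or '#6818'."""
    digits = token[1:]
    if not token.startswith("#") or len(digits) not in (2, 4):
        raise ValueError("a literal hex holds one or two bytes (two or four nibbles)")
    int(digits, 16)               # raises ValueError if these are not hex digits
    return 1 + len(digits) // 2   # one LIT instruction, plus the data bytes

DEO_SIZE = 1
print(literal_hex_size("#68") + literal_hex_size("#18") + DEO_SIZE)  # 5
print(literal_hex_size("#6818") + DEO_SIZE)                          # 4
```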
important: remember that this rune (and the others with the word "literal" in their names) is a shorthand for the LIT instruction. this can lead to confusion in some cases :)
## raw character rune
this is the raw character rune: '
it tells uxnasm to replace an ascii character with its numerical value.
our "hello program" would look like the following, using the new runes we just learned:
```
( hello.tal )
|0100 LIT 'h #18 DEO
LIT 'e #18 DEO
LIT 'l #18 DEO
LIT 'l #18 DEO
LIT 'o #18 DEO
#0a #18 DEO ( newline )
```
the "raw" in the name of this rune indicates that it's not literal, i.e. that it doesn't add a LIT instruction.
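the difference between "raw" and "literal" can be sketched in python by looking at the bytes that each kind of token would produce. this is a simplified model with made-up names, covering only one-byte values:

```python
LIT = 0x01  # opcode of LIT, as seen in the hexdump of hello.rom

def assemble_token(token):
    """bytes produced by a few kinds of token (only one-byte values are modeled)."""
    if token == "LIT":
        return bytes([LIT])
    if token.startswith("'") and len(token) == 2:
        return bytes([ord(token[1])])            # raw character: just the value
    if token.startswith("#") and len(token) == 3:
        return bytes([LIT, int(token[1:], 16)])  # literal hex: LIT plus the value
    raise ValueError(f"token not modeled: {token}")

raw = assemble_token("LIT") + assemble_token("'h")
literal = assemble_token("#68")
print(raw.hex(" "), "|", literal.hex(" "))  # 01 68 | 01 68
```

in this model, writing LIT 'h produces the same two bytes as writing #68.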
## runes for labels
even though right now we know that #18 corresponds to pushing the console write device address down onto the stack, for readability and future-proofing of our code it is a good practice to assign a set of labels that would correspond to that device and sub-address.
the rune @ allows us to define labels, and the rune & allows us to define sub-labels.
for example, for the console device, the way you would see this written in uxn programs is the following:
```
|10 @Console [ &vector $2 &read $1 &pad $5 &write $1 &error $1 ]
```
we can see an absolute pad to address 10: whatever follows is assigned to that address. because the address consists of one byte only, uxnasm assumes it is for the i/o memory space or the zero page.
then we see a label @Console: this label will correspond to address 10.
the square brackets are ignored, but included for readability.
next we have several sub-labels, indicated by the & rune, and relative pads, indicated by the $ rune. how do we read/interpret them?
* sublabel &vector has the same address as its parent label @Console: 10
* $2 skips two bytes (we could read this as &vector being an address to a 2-bytes long word)
* sublabel &read has the address 12
* $1 skips one byte (&read would be an address for a 1-byte long word)
* sublabel &pad has the address 13
* $5 skips the remaining bytes of the first group of 8 bytes in the device: these bytes correspond to the "inputs"
* sublabel &write has the address 18 (the one we knew already!)
* $1 skips one byte (&write would be an address for a 1-byte long word)
* sublabel &error has the address 19
none of this would be translated to machine code, but aids us in writing uxntal code.
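the way these addresses are calculated can be sketched in python: each sublabel marks the current address, and each relative pad moves the address forward. the function name and the "parent/child" keys are made up for this illustration:

```python
def label_addresses(start, parent, fields):
    """addresses of sublabels, given (name, size in bytes) pairs in order."""
    addresses = {parent: start}
    address = start
    for name, size in fields:
        addresses[f"{parent}/{name}"] = address  # the sublabel marks the current address
        address += size                          # the relative pad ($) skips ahead
    return addresses

console = label_addresses(0x10, "Console", [
    ("vector", 2), ("read", 1), ("pad", 5), ("write", 1), ("error", 1),
])
for name, address in console.items():
    print(f"{name}: {address:02x}")
```

the addresses it prints should match the ones in the list above, with Console/write at 18.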
the rune for referring to a literal address in the zero page or i/o address space is . (dot), and a / (slash) allows us to refer to one of its sublabels.
remember: as a "literal address" rune it will add a LIT instruction before the corresponding address :)
we could re-write our "hello program" as follows:
```
( hello.tal )
( devices )
|10 @Console [ &vector $2 &read $1 &pad $5 &write $1 &error $1 ]
( main program )
|0100 LIT 'h .Console/write DEO
LIT 'e .Console/write DEO
LIT 'l .Console/write DEO
LIT 'l .Console/write DEO
LIT 'o .Console/write DEO
#0a .Console/write DEO ( newline )
```
now this starts to look more like the examples you might find online and/or in the uxn repo :)
## macros
following the forth heritage (?), in uxntal we can define our own "words" in macros that allow us to group and reuse instructions.
during assembly, these macros are (recursively) replaced by the contents in their definitions.
for example, we can see that the following piece of code is repeated many times in our program:
```
.Console/write DEO ( equivalent to #18 DEO, or LIT 18 DEO )
```
we could define a macro called EMIT that will take from the stack a byte corresponding to a character, and print it to standard output. for this, we need the % rune, and curly brackets for the definition.
don't forget the spaces!
```
( print a character to standard output )
%EMIT { .Console/write DEO } ( character -- )
```
in order to call a macro, we just write its name:
```
( print character h )
LIT 'h EMIT
```
we can call macros inside macros, for example:
```
( print a newline )
%NL { #0a EMIT } ( -- )
```
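the recursive replacement can be sketched in python as follows. this is a model of the idea and not the code of uxnasm: the names are made up for this illustration, and a macro that calls itself would make this sketch recurse until python gives up.

```python
MACROS = {
    "EMIT": [".Console/write", "DEO"],
    "NL": ["#0a", "EMIT"],
}

def expand(tokens, macros=MACROS):
    """replace macro names by their definitions until none are left."""
    result = []
    for token in tokens:
        if token in macros:
            result.extend(expand(macros[token], macros))  # macros inside macros
        else:
            result.append(token)
    return result

print(" ".join(expand(["LIT", "'h", "EMIT", "NL"])))
# LIT 'h .Console/write DEO #0a .Console/write DEO
```

after the expansion there are no macro names left, only runes and instructions.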
# a more idiomatic hello world
using all these macros and runes, our program could end up looking like the following:
```
( hello.tal )
( devices )
|10 @Console [ &vector $2 &read $1 &pad $5 &write $1 &error $1 ]
( macros )
( print a character to standard output )
%EMIT { .Console/write DEO } ( character -- )
( print a newline )
%NL { #0a EMIT } ( -- )
( main program )
|0100 LIT 'h EMIT
LIT 'e EMIT
LIT 'l EMIT
LIT 'l EMIT
LIT 'o EMIT
NL
```
it ends up being assembled in the same 30 bytes as the examples above, but hopefully more readable and maintainable.
we could "improve" this program by having a loop printing the characters, but we'll study that later on :)
# exercises
## EMIT reordering
in our previous program, the EMIT macro is called just after pushing a character down onto the stack.
how would you rewrite the program so that you push all the characters first, and then "EMIT" all of them with a sequence like this one?
```
EMIT EMIT EMIT EMIT EMIT
```
## print a digit
if you look at the ascii table, you'll see that the hexadecimal ascii code 30 corresponds to the digit 0, 31 to the digit 1, and so on up to 39, which corresponds to the digit 9.
define a PRINT-DIGIT macro that takes a number (from 0 to 9) from the stack, and prints its corresponding digit to standard output.
```
%PRINT-DIGIT { } ( number -- )
```
# instructions of day 1
these are the instructions we covered today:
* ADD: take the top two elements from the stack, add them, and push down the result ( a b -- a+b )
* SUB: take the top two elements from the stack, subtract them, and push down the result ( a b -- a-b )
* LIT: push the next byte in memory down onto the stack
* DEO: output the given byte into the given device address, both taken from the stack ( byte address -- )
# day 2
stay tuned for the next sections of the {uxn tutorial}!
# support
if you found this tutorial to be helpful, consider sharing it and giving it your {support} :)