Commit Graph

11 Commits

Author SHA1 Message Date
Kartik K. Agaram d6866ec35d . 2021-10-10 15:52:03 -07:00
Kartik K. Agaram 3254fe5ca5 . 2021-10-10 15:50:52 -07:00
Kartik K. Agaram 365b1f855c . 2021-10-10 15:48:47 -07:00
Kartik K. Agaram dca845877b tag combining character code-points
Unfortunately the Unicode database doesn't actually provide obvious
metadata for combining characters. Here is the process I followed.
I noticed that GNU Unifont provides the following files for download:

  - unifont-13.0.06.hex: All Plane 0 glyphs
  - unifont_sample-13.0.06.hex: The above .hex file with combining circles added

Downloading and diffing the two yields all code-points with combining
circles. I assume they are exactly the combining characters I care
about.
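
The diff step above can be sketched in Python. This is only an
illustration, not the commands actually used for this commit. It assumes
the Unifont .hex layout of one glyph per line in the form
"<code-point hex>:<bitmap hex>"; the function names are hypothetical.

```python
def parse_hex(text):
    """Map code point -> bitmap hex string for 'XXXX:bitmap' lines."""
    glyphs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        code, bitmap = line.split(":", 1)
        glyphs[int(code, 16)] = bitmap.upper()
    return glyphs


def changed_code_points(plain_text, sample_text):
    """Return sorted code points whose bitmap differs between the two files.

    Assumption: a glyph that differs in the sample file differs only
    because a combining circle was added to it.
    """
    plain = parse_hex(plain_text)
    sample = parse_hex(sample_text)
    return sorted(cp for cp, bitmap in sample.items()
                  if plain.get(cp) != bitmap)
```

To use it, read unifont-13.0.06.hex and unifont_sample-13.0.06.hex into
strings and pass them in that order.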

One mechanical difficulty is cross-correlating the above files, which
include the code-point in each line, with font.subx, which does not. I
got things to work by modifying the above files in place until they have
the same format as font.subx, using the following Vim commands on each
file:

  :%s|.\{64\}|10/size^M00/is-combine^M&|
  :%s|^.\{32\}$|08/size^M00/is-combine^M&00000000000000000000000000000000|
  :%s|..|& |g
  :%s|10 /s iz e|10/size|
  :%s|08 /s iz e|08/size|
  :%s|00 /i s- co mb in e|00/is-combine|
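
For readers who don't want to trace the substitutions, here is a rough
Python equivalent of what they do to one glyph. It is a sketch under
assumptions: the "<code-point>:" prefix is taken to be already stripped
(the patterns above match only the bare bitmap), and to_subx_record is a
hypothetical name. Unlike the Vim version, it leaves no trailing space
on the data line.

```python
def to_subx_record(bitmap_hex):
    """Turn one bare bitmap (32 or 64 hex digits) into three lines.

    64 digits get a 10/size header; 32 digits get an 08/size header and
    are padded with 32 zero digits, mirroring the substitutions above.
    """
    if len(bitmap_hex) == 64:
        size = "10"
    elif len(bitmap_hex) == 32:
        size = "08"
        bitmap_hex = bitmap_hex + "0" * 32
    else:
        raise ValueError("expected 32 or 64 hex digits")
    # Split into two-digit bytes separated by spaces.
    pairs = [bitmap_hex[i:i + 2] for i in range(0, 64, 2)]
    return [size + "/size", "00/is-combine", " ".join(pairs)]
```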

Now I can update the metadata with a Vim macro which jumps to the next
hunk and increments /is-combine on the previous line.
2021-08-31 23:03:34 -07:00
Kartik K. Agaram b8afd4becf start hacky experiment to support combining chars
https://en.wikipedia.org/wiki/Combining_character

The plan: just draw the combining character in the same space as the
previous character. This will almost certainly not work for some Unicode
blocks (tibetan?)
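
A minimal sketch of that plan in Python, for illustration only. It is
not Mu code. It assumes every base character occupies exactly one cell,
and it uses Python's unicodedata.combining() as a stand-in for the
is-combine metadata; layout is a hypothetical name.

```python
import unicodedata


def layout(text):
    """Return (column, character) placements for a single line of text.

    A combining character reuses the column of the previous character
    instead of advancing. A combining character with nothing before it
    is given its own cell.
    """
    placements = []
    next_col = 0
    for ch in text:
        if unicodedata.combining(ch) and next_col > 0:
            placements.append((next_col - 1, ch))
        else:
            placements.append((next_col, ch))
            next_col += 1
    return placements
```

For example, in "e" + U+0301 + "x" the accent lands in column 0 on top
of the "e", and "x" goes in column 1.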

This commit only changes the data/memory/disk model to make some space.
As always in Mu, we avoid bit-mask tricks even if that wastes memory.
2021-08-31 23:03:34 -07:00
Kartik K. Agaram 0633e401f9 . 2021-08-29 20:43:57 -07:00
Kartik K. Agaram b1dcfb03d0 load Font in a non-contiguous area of memory 2021-08-29 20:34:53 -07:00
Kartik K. Agaram efae02cf11 . 2021-08-29 11:21:09 -07:00
Kartik K. Agaram 1b18ec6ee9 import a few more unicode blocks from Unifont
shell/ is currently broken; we've overflowed available contiguous space
for code.

Block names based on https://www.compart.com/en/unicode/block:
  0x0000 - 0x007f Basic Latin 128
  0x0080 - 0x00ff Latin-1 Supplement 128
  0x0100 - 0x017f Latin Extended-A 128
  0x0180 - 0x024f Latin Extended-B 208
  0x0250 - 0x02af IPA Extensions 96
  0x02b0 - 0x02ff Spacing Modifier Letters 80
  0x0300 - 0x036f Combining Diacritical Marks 112
  0x0370 - 0x03ff Greek and Coptic 135
  0x0400 - 0x04ff Cyrillic 256
  0x0500 - 0x052f Cyrillic Supplement 48
  0x0530 - 0x058f Armenian 91
  0x0590 - 0x05ff Hebrew 88
  0x0600 - 0x06ff Arabic 255
  0x0700 - 0x074f Syriac 77
  0x0750 - 0x077f Arabic Supplement 48
  0x0780 - 0x07bf Thaana 50
  0x07c0 - 0x07ff NKo 62
  0x0800 - 0x083f Samaritan 61
  0x0840 - 0x085f Mandaic 29
  0x0860 - 0x086f Syriac Supplement 11
  0x08a0 - 0x08ff Arabic Extended-A 84
  0x0900 - 0x097f Devanagari 128
  0x0980 - 0x09ff Bengali 96
  0x0a00 - 0x0a7f Gurmukhi 80
  0x0a80 - 0x0aff Gujarati 91
  0x0b00 - 0x0b7f Oriya 91
  0x0b80 - 0x0bff Tamil 72
  0x0c00 - 0x0c7f Telugu 98
  0x0c80 - 0x0cff Kannada 89
  0x0d00 - 0x0d7f Malayalam 118
  0x0d80 - 0x0dff Sinhala 91
  0x0e00 - 0x0e7f Thai 87
  0x0e80 - 0x0eff Lao 82
  0x0f00 - 0x0fff Tibetan 211
  0x1000 - 0x109f Myanmar 160
  0x10a0 - 0x10ff Georgian 88

But don't trust the block sizes above. Thanks to gdb[1] for this helper:

define z
  print 2 * (0x$arg1 - 0x$arg0 + 1)
end

e.g.:
  (gdb) z 10a0 10ff
  192
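
The same arithmetic in Python, as a sketch with hypothetical function
names. z mirrors the gdb helper, including its factor of 2, which is
kept as-is because the message doesn't say what it stands for. span
shows why the listed sizes can't be trusted as range widths: 0x0370 -
0x03ff is listed as 135 but spans 144 code points.

```python
def span(first_hex, last_hex):
    """Number of code points in an inclusive range given as hex strings."""
    return int(last_hex, 16) - int(first_hex, 16) + 1


def z(first_hex, last_hex):
    """Same result as the gdb helper: twice the inclusive range width."""
    return 2 * span(first_hex, last_hex)
```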

[1] https://sourceware.org/gdb/current/onlinedocs/gdb/Define.html
2021-08-29 11:20:47 -07:00
Kartik K. Agaram 79e2569f1a font data structure now supports 16-bit glyphs
We can't yet render the latter 8 bits.
2021-08-28 21:11:45 -07:00
Kartik K. Agaram 2c87cd2f34 reorganize font before adding non-ASCII 2021-08-27 08:41:15 -07:00