<85> in document creates rendering issue #199
Reference: sloum/bombadillo#199
Documents containing the special character `<85>` are not rendered correctly.
This happens with Bombadillo 2.3.1 as well as release 2.3.3.
Steps to reproduce:
gemini://gemini.conman.org:1965/boston/2020/11/03.3
gemini://gemini.conman.org:1965/boston/2020/11/03.2
gemini://gemini.conman.org:1965/boston/2020/11/03.1
gemini://gemini.conman.org:1965/boston/2020/10/12.1
Viewing the documents in vim shows a special character, `<85>`. I think this is U+0085, or NEL, the next line symbol.
In page.go, we can get rid of characters we don't want. Adding `'\u0085'` to this list makes rendering work. This probably isn't good enough, as the character is meant to be a line break, but I'm not sure exactly how it is meant to be represented yet.
Interesting. I had not been familiar with that character. I wonder if there is a list somewhere of characters of that sort (that are not printed characters themselves, but modify the output).
This article has an extremely detailed description of Unicode line breaks. There's also the Wikipedia article. I've only partially read these; they are very long.
I don't think there is anything built in to identify these types of characters - `unicode.IsSpace` might be the closest.
This question shows a similar problem and how these are identified in Java.
Need to read more.
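To make the `unicode.IsSpace` idea concrete, here is a minimal sketch. The helper name `spaceRunes` is made up for illustration and is not part of Bombadillo. It shows that `unicode.IsSpace` does report NEL (U+0085), LS (U+2028) and PS (U+2029), but also ordinary blanks, so on its own it cannot tell a line break from a space.

```go
package main

import (
	"fmt"
	"unicode"
)

// spaceRunes is a hypothetical helper: it collects every rune in s
// for which unicode.IsSpace reports true.
func spaceRunes(s string) []rune {
	var found []rune
	for _, r := range s {
		if unicode.IsSpace(r) {
			found = append(found, r)
		}
	}
	return found
}

func main() {
	// NEL, LS and PS are matched, but so is the plain space before "e".
	for _, r := range spaceRunes("a\u0085b\u2028c\u2029d e") {
		fmt.Printf("%U\n", r)
	}
}
```

So `unicode.IsSpace` could serve as a first filter, but an explicit list of line-terminating runes would still be needed to decide which ones break the line.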
This section might be the most relevant:
https://en.wikipedia.org/wiki/Newline#Unicode
The rune literals in Go treat Unicode representations like `'\u000A'` as equal to `'\n'`, so we already handle most of these in `WrapContent`, but mostly by ignoring them. We do not handle the following items:
- NEL (U+0085)
- LS (U+2028)
- PS (U+2029)
As we are ignoring most of the others, should we also ignore these? It seems like it might be just as complex to implement them.
Awesome! That is great news that Go treats them as equal to `\n`. I think we should print a newline for any of the above three characters. There should be a rune for them, right? If so it should be either an `&&` within an `else if` or another `case` within a `switch` (can't remember what is happening there to know which one).
Is there any downside you can think of to treating them like `\n`? If we do this as part of the line wrapping it means if someone downloads the file they will still get the original characters as intended (which is good).
I've done a WIP PR on this to help with the explanation.
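As a rough illustration of the "another `case` within a `switch`" suggestion, the sketch below maps NEL, LS and PS to `\n` for display only. The function name `normalizeLineBreaks` is hypothetical and this is not the code in `WrapContent` or in the WIP PR. It only shows the shape of the idea, leaving the downloaded document untouched because the mapping happens on a copy.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeLineBreaks is a hypothetical helper, not Bombadillo's
// WrapContent. It returns a copy of s in which NEL, LS and PS are
// each replaced by a single '\n'; all other runes pass through.
func normalizeLineBreaks(s string) string {
	var b strings.Builder
	for _, r := range s {
		switch r {
		case '\u0085', '\u2028', '\u2029': // NEL, LS, PS
			b.WriteRune('\n')
		default:
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Printf("%q\n", normalizeLineBreaks("one\u0085two\u2028three\u2029four"))
}
```

Because the replacement is applied only to the text being wrapped for the screen, the original bytes would still be what gets saved to disk.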
Just to try to be clearer regarding your first point, the Unicode line terminators from Wikipedia are just `\n`, `\f`, `\v`, `\r` and `\r\n`, plus the three I highlighted - `NEL`, `LS` and `PS`, but using a Unicode reference. I was just confused, but literally the Unicode reference for line feed is `\u000A` and `'\u000A' == '\n'` is `true` in Go. That isn't related to NEL, LS or PS; Go doesn't treat them as equal to `\n`. Sorry!
Your last point about downsides, do you mean like a loss of fidelity? If so, not in a way we don't already do. A good equivalent for this is supporting `\v` instead of ignoring it. We wouldn't actually use `\v` when printing to the screen, but approximate how it would look using spaces and `\n`.
The main downside is that it's a complicated topic for a rare occurrence. As noted on Wikipedia: "Recognizing and using the newline codes greater than 0x7F (NEL, LS and PS) is not often done". gnome-terminal and st have only implemented `NEL` as a line terminator; `LS` and `PS` aren't rendered. But at least we are learning something.
Just a note for future reference that NEL, LS and PS are implemented, each as a single line ending.
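For anyone revisiting the rune-literal confusion earlier in the thread, a small check along these lines shows the distinction. The helper name `isLineFeed` is made up for this example. `'\u000A'` is simply another spelling of `'\n'`, whereas NEL, LS and PS are different runes and have to be matched explicitly.

```go
package main

import "fmt"

// isLineFeed is a hypothetical helper that reports whether r is the
// line feed rune, written here as '\n'.
func isLineFeed(r rune) bool {
	return r == '\n'
}

func main() {
	// '\u000A' and '\n' are two spellings of the same rune.
	fmt.Println(isLineFeed('\u000A'))
	// NEL, LS and PS are distinct runes, so none of them compare equal to '\n'.
	fmt.Println(isLineFeed('\u0085'), isLineFeed('\u2028'), isLineFeed('\u2029'))
}
```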