Adds web support in lynx mode #60

Merged
sloum merged 7 commits from lynx-web-mode into develop 2019-10-27 14:15:57 +00:00
Owner

This adds a new config option (lynxmode) that takes a boolean value to turn the feature on or off. If lynxmode is set to "true", openhttp is also set to "true", and lynx is installed, the web page will be rendered by lynx via its dump command, its links will be parsed, and the result will be displayed in Bombadillo. This works surprisingly well. Forms do not work at all and a lot of interface-heavy websites are crap this way... but it works amazingly well with other protocols linking to articles.
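
For anyone curious what the lynx route looks like mechanically, here is a rough Go sketch of the general approach (not the code from this PR): `lynx -dump` prints the rendered page with numbered markers like [1] and appends a References section listing each number's URL, which can be parsed back out. All names here are made up.

```go
package main

import (
	"fmt"
	"os/exec"
	"regexp"
	"strings"
)

// refLine matches lines like "   1. https://example.com/article" from
// the References section that lynx appends to its dump output.
var refLine = regexp.MustCompile(`^\s*(\d+)\.\s+(\S+)`)

// lynxDump renders a page as plain text via `lynx -dump` and pulls the
// numbered links back out of the trailing References section.
func lynxDump(url string) (body string, links []string, err error) {
	out, err := exec.Command("lynx", "-dump", url).Output()
	if err != nil {
		return "", nil, err
	}
	text := string(out)
	if i := strings.LastIndex(text, "\nReferences\n"); i >= 0 {
		body = text[:i]
		for _, line := range strings.Split(text[i:], "\n") {
			if m := refLine.FindStringSubmatch(line); m != nil {
				links = append(links, m[2])
			}
		}
	} else {
		body = text
	}
	return body, links, nil
}

func main() {
	body, links, err := lynxDump("https://example.com/")
	if err != nil {
		fmt.Println("lynx failed:", err)
		return
	}
	fmt.Println(body)
	for i, l := range links {
		fmt.Printf("[%d] %s\n", i+1, l)
	}
}
```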

Try it out by going here: gemini://typed-hole.org:1965/lobsters/lobsters.gemini

The links to articles on that gemini page are all web links (the links to comments remain in gemini). It is surprisingly cool to be able to browse lobsters from Bombadillo.

This was a pretty simple addition and I believe I covered the error cases well. If lynx is not installed, the user will be notified that lynx could not be found on their $PATH. If openhttp is set to false, no web links will be followed. If openhttp is true, terminalonly is true, and lynxmode is false, the link will not be opened and the user will be told to enable lynx mode if they want to view web links in a terminal setting. This solves our issue of opening a browser in a terminal-only environment (if they have lynx installed, anyway).
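
A small, self-contained sketch of that decision order (illustrative only; the function name and the exact messages are not from the actual code):

```go
package main

import (
	"fmt"
	"os/exec"
)

// decideWebAction sketches the error-case ordering described above. It
// returns what would be done with a web link, or an error carrying the
// message the user would see. The wording is illustrative only.
func decideWebAction(openhttp, terminalonly, lynxmode bool) (string, error) {
	if !openhttp {
		return "", fmt.Errorf("openhttp is set to false: web links will not be followed")
	}
	if lynxmode {
		if _, err := exec.LookPath("lynx"); err != nil {
			return "", fmt.Errorf("lynx could not be found on your $PATH")
		}
		return "render via lynx dump and display in Bombadillo", nil
	}
	if terminalonly {
		return "", fmt.Errorf("terminalonly is set: enable lynxmode to view web links in the terminal")
	}
	return "open in a graphical browser", nil
}

func main() {
	if action, err := decideWebAction(true, true, false); err != nil {
		fmt.Println("cannot open link:", err)
	} else {
		fmt.Println(action)
	}
}
```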

This does shift the non-web client title I have been using. I still think it fits, as the web is clearly a second-class citizen here, but having support for it makes Bombadillo a MUCH more complete client with a far wider appeal. The fact that it does not render its own web content still fits the spirit of things, I think.

Let me know what you think...

asdf was assigned by sloum 2019-10-23 05:10:31 +00:00
sloum added the enhancement label 2019-10-23 05:10:31 +00:00
sloum added the documentation and http labels 2019-10-23 05:31:50 +00:00
Collaborator

I've had a quick look at this now. Very cool.

Showing the link id inline is an interesting difference. Is it at all possible to underline the URL text? Like:

  • Here is a [1]<u>link</u> with an underline on the text

I notice that when trying to view the link with id 6 from gopher://bombadillo.colorfield.space:70/1/user-guide.map, lynx returns exit status 1. It's hard to tell why this is without digging in further, which I can do tomorrow.

Author
Owner

I've had a few pages do that. In the case of that one, running `:check 6` lets me know that the link is malformed. Must be a problem with the tabs in that gophermap. I use vim and sometimes forget to turn on real tabs when editing gophermaps. Though it did render the gophermap properly. I'll dig in more. It does NOT seem to be an issue with the http module though.

Author
Owner

As to the underline: not really, with the current screen rendering :( The escape codes would be treated as extra characters by the wrapping. So even if we did scan/parse the text and add them, the wrapping would get messed up, and it could mess up the bookmarks bar too if the line was cut off after starting an underline but not finishing it before the bookmark bar started.
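
A tiny illustration of the problem (not Bombadillo code): the underline escape sequences add bytes that a rune-count wrapper sees even though they take no columns on screen.

```go
package main

import (
	"fmt"
	"regexp"
)

// ansi matches SGR escape sequences such as "\x1b[4m" (underline on)
// and "\x1b[0m" (reset attributes).
var ansi = regexp.MustCompile(`\x1b\[[0-9;]*m`)

func main() {
	plain := "link"
	underlined := "\x1b[4mlink\x1b[0m"

	// A wrapper that counts runes sees 4 vs 12, even though both take
	// up 4 columns on screen, so wrapping by rune count breaks early
	// and a line could be cut off mid-escape.
	fmt.Println(len([]rune(plain)), len([]rune(underlined)))

	// Getting the visible width means stripping the escapes first.
	fmt.Println(len([]rune(ansi.ReplaceAllString(underlined, ""))))
}
```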

Author
Owner

Ok. Did a little research. The link in question had some tab issues. I fixed that header on the whole gopherhole and then added a recent devlog that had not been updated.

👍

Collaborator

Looks like images can be read in by bombadillo, which tries to display them... For example, link 6 on bombadillo.colorfield.space is a png; when opened, bombadillo displays the raw file contents.

I'm curious about how this could be handled, but don't have any solid suggestions.

  • Could lynx be told not to send links for images or other binary files? I couldn't see a way to do this in the man page.
  • Ignore links based on file extension? Probably easy to do, but might be hard to determine which extensions are OK and which aren't.
  • Use the mime type to identify viewable documents (`lynx -head -dump URL`; see the sketch after this list)
  • Another way to identify that stdin is providing non-text data?
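
A rough sketch of the `lynx -head -dump` idea from the third bullet (the parsing is a guess, not code from this PR):

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// headContentType runs `lynx -dump -head URL` and scans the dumped
// response headers for Content-Type. The parsing here is a guess at
// what would be needed; header case and spacing vary between servers.
func headContentType(url string) (string, error) {
	out, err := exec.Command("lynx", "-dump", "-head", url).Output()
	if err != nil {
		return "", err
	}
	for _, line := range strings.Split(string(out), "\n") {
		if k, v, ok := strings.Cut(line, ":"); ok &&
			strings.EqualFold(strings.TrimSpace(k), "Content-Type") {
			return strings.TrimSpace(v), nil
		}
	}
	return "", fmt.Errorf("no Content-Type header found")
}

func main() {
	ct, err := headContentType("https://example.com/")
	if err != nil {
		fmt.Println(err)
		return
	}
	if strings.HasPrefix(ct, "text/") {
		fmt.Println("viewable as text:", ct)
	} else {
		fmt.Println("non-text, would download:", ct)
	}
}
```
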
Author
Owner

The easiest path forward for that may be to just let them get read in, but fix file writing for http (I don't think it currently works) so that users can download anything that doesn't display the way they would expect.

I do think that http really is, in general, a second-class citizen in the client. I am REALLY psyched that it is now possible to do basic reading of content. I had not really planned on even having links supported for http... but since lynx automatically adds the numbers I went with it. I believe there is a flag to kill the numbers and we could just not do links... but that feels wasteful since it already works.

Since users can always look at where a link goes using `check`, I think fixing the writing of files should be enough. If you have a different route that you would like pursued, I am open to alternatives.

Author
Owner

Rereading your comment above: the head dump route could be an alternate possibility, and for non-text content we could just automatically download the file? I do not love the idea of having to run two subprocess requests for every http request though :-/

Collaborator

I see what you mean, and agree with a minimal-effort approach.

If there is something that can be done about this, ideally only text would be displayed, and any non-text would be handled the same way it is for gopher or gemini (downloaded, or the user is prompted).

I have been looking to see what other options we might have. This is what I can see so far:

  • Check the head for content type before doing anything
      • This can be done using `lynx -dump -head URL`, or using `http.DetectContentType()`
  • Use lynx to dump the URL, then check the content type
      • `http.DetectContentType()` could be used on whatever is read in (a sketch follows below)
  • Commands like `file` or `xdg-mime` can be used to identify a file's content type
      • There are some Go libraries that implement `file`'s functionality
      • There is also `mime` in the standard library, but I don't think it does what we want

Hopefully this will give you some ideas for how this could be addressed - let me know what you think.
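
For reference, a minimal sketch of the `http.DetectContentType()` option mentioned in the list above (illustrative only):

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// looksLikeText sniffs the data with the standard library instead of
// trusting a header or a file extension. DetectContentType looks at no
// more than the first 512 bytes and always returns a valid MIME type.
func looksLikeText(data []byte) bool {
	return strings.HasPrefix(http.DetectContentType(data), "text/")
}

func main() {
	fmt.Println(looksLikeText([]byte("<html><body>hi</body></html>")))              // true: text/html
	fmt.Println(looksLikeText([]byte{0x89, 'P', 'N', 'G', '\r', '\n', 0x1a, '\n'})) // false: image/png
}
```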

Author
Owner

I just tried out running the `-dump -head` flags. They work well and could be really easily parsed... but it was slow to get the response. So going that way will require two request/response cycles, which might slow things down a lot. The other option is, of course, sniffing the mime type from the content that was acquired already, but that also presents issues. I'm inclined to go with the first way. I have to crash out now, but I will try to code something up tomorrow night.

Author
Owner

Ok. I went ahead and made some semblance of this work. Here are the basic details (a rough sketch follows the list):

  1. Before any http(s) request with lynxmode on, we first check for the word "text" in the content-type header (kind of a shortcut, but it should be fine).
  2. If "text" is present in the content-type header, we use lynx to dump the page; then we parse the links and put it on the screen.
  3. If "text" is not present, we download the file.
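
A rough Go sketch of those three steps (not the PR's actual code; the native HEAD request and the fixed download filename are simplifications made here for brevity):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"os/exec"
	"strings"
)

func handleWeb(link string) error {
	// Step 1: look at the content-type header before doing anything else.
	resp, err := http.Head(link)
	if err != nil {
		return err
	}
	resp.Body.Close()
	ct := resp.Header.Get("Content-Type")

	// Step 2: if it looks like text, render the page with lynx.
	if strings.Contains(ct, "text") {
		out, err := exec.Command("lynx", "-dump", link).Output()
		if err != nil {
			return err
		}
		fmt.Print(string(out)) // the real code parses the links and renders in the pager
		return nil
	}

	// Step 3: anything else gets downloaded.
	resp, err = http.Get(link)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	f, err := os.Create("download.bin") // the real code derives a filename from the URL
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = io.Copy(f, resp.Body)
	return err
}

func main() {
	if err := handleWeb("https://example.com/"); err != nil {
		fmt.Println(err)
	}
}
```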

I had a bunch of logic in for opting out of the download and all kinds of other stuff, but browsers these days tend to just download files and you can delete them if you want. I am pretty fine with this, I think. It is consistent with how the gopher protocol handler deals with non-text files and relates to the note below about gemini and mailcap.

I also reworked the GIANT `Visit()` method. It now serves as more of a controller/router and sends the parsed URL, if it is for a valid scheme, to a protocol handler of the form `func handle[Scheme](u Url)`. This feels better from the perspective of reading and working in the code, though it is largely the same functionally. I think it will be easier to manage long term.
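
The router shape described above might look roughly like this (the handler names follow the `handle[Scheme]` pattern, but this is an illustration, not the actual code):

```go
package main

import (
	"fmt"
	"net/url"
)

// Illustrative stand-ins: the handler names follow the handle[Scheme]
// pattern described above, but none of this is the actual code.
func handleGopher(u *url.URL) error { fmt.Println("gopher:", u); return nil }
func handleGemini(u *url.URL) error { fmt.Println("gemini:", u); return nil }
func handleWeb(u *url.URL) error    { fmt.Println("web:", u); return nil }

// Visit acts as a controller/router: parse the address once, then
// dispatch to the handler for its scheme.
func Visit(address string) error {
	u, err := url.Parse(address)
	if err != nil {
		return err
	}
	switch u.Scheme {
	case "gopher":
		return handleGopher(u)
	case "gemini":
		return handleGemini(u)
	case "http", "https":
		return handleWeb(u)
	default:
		return fmt.Errorf("unsupported scheme: %s", u.Scheme)
	}
}

func main() {
	if err := Visit("gemini://typed-hole.org/"); err != nil {
		fmt.Println(err)
	}
}
```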

Give it a whirl and let me know if you spot any issues.

.
.
.

On a secondary note:

I have thought about it a bit, and I think I would like to remove the mailcap dependency and just treat non-text as something to be downloaded. It feels weird for gemini to do something different than every other protocol, and I don't want to add more bulk to http or gopher by hooking it up there. It also only really works well in a graphical environment, since we aren't sub-processing out every possible command. I think cutting it will simplify the code a bit, reduce dependencies, and hardly reduce the usability of the software at all. What do you think of this idea? If you are for it, I will open a separate PR covering that (maybe an issue to tie it to?)

Anyway, if you want to think about the mailcap thing, no worries. But let me know about the http stuff when you can. I think we are so close to a release here. Maybe this is better suited for e-mail, but I'll bring it up here:

  • Are there things you want to see in the 2.0.0 release that are not done or in progress currently?
  • What is our "ready to ship" point?
  • It is arbitrary, but I would like to have 2.0.0 out no later than the end of the year, and preferably much sooner (November/December).
asdf requested changes 2019-10-27 01:37:00 +00:00
@@ -234,3 +243,3 @@
 .B
 terminalonly
-Sets whether or not to try to open non-text files served via gemini in gui programs or not. If set to \fItrue\fP, bombdaillo will only attempt to use terminal programs to open files. If set to anything else, \fBbombadillo\fP may choose from the appropriate programs installed on the system, if one is present.
+Sets whether or not to try to open non-text files served via gemini in GUI programs or not. If set to \fItrue\fP \fBbombadillo\fP will only attempt to use terminal programs to open files. If set to \fIfalse\fP \fBbombadillo\fP may choose from the appropriate programs installed on the system, including graphical ones.
Collaborator

This section will need review once the discussion around Mailcap is complete.

Collaborator

Firstly - I requested changes but didn't realise it gives you a big red X. That's not the way I meant to have it communicated!

I've made some small changes to the man page - spelling corrections, and I reworded the http section. My changes are a bit opinionated though, so let me know if there are any issues.

As marked, the documentation for the `terminalonly` setting will need further review because it impacts http/https. This will probably wait until mailcap/downloads is addressed, and can be a different PR even.

http looks good and seems to work well. The use of the `net/http` library in `Fetch()` is interesting, and makes me think there are a lot of different ways we could approach web... but this is really more than enough for 2.0.0. I really like the way this works as it is.

Changing `Visit()` to the various `handle*()` functions is very satisfying :)

On mailcap/downloads:

  • Removing mailcap and just doing automatic downloads of non-text data for all protocols would be ok for now. Or, no automatic downloads, but prompt the user with something like 'non-text data, use :w 1 to download'.
  • After 2.0.0, we could look at a unified method to deal with non-text data that could include mailcap or xdg-open or whatever.

As to what goes into v2.0.0 - I agree, sooner rather than later. Maybe we should be strict, and give ourselves a fixed time (maybe a week or two) to raise any issues for 2.0.0 and mark them as such. Then another fixed time for the release once we quantify the work, pushing back anything non-urgent until after that. I've added a milestone to test this out.

Author
Owner

No worries regarding the request changes X. :)

The changes to the man page all look good to me. I really should start running things I write through some form of spell check, lol. Some good catches/changes. Thanks!

I'm glad it is working well for you. I have really enjoyed its presence and it lets me do much more in Bombadillo. On a whim I decided to try using golang's native access to web content for the fetch rather than routing through lynx. Since no parsing is required, it felt more reliable and direct. I agree, this is something that can be reviewed more going forward but should suffice for the 2.0.0 release.
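
Not the actual `Fetch()`, but a minimal sketch of what a native `net/http` fetch looks like, for comparison with shelling out to lynx for the whole request (the timeout value is arbitrary):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// fetch retrieves a page body natively with net/http instead of
// shelling out for the fetch itself. Illustrative only.
func fetch(url string) ([]byte, error) {
	client := &http.Client{Timeout: 15 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	body, err := fetch("https://example.com/")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("fetched %d bytes\n", len(body))
}
```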

I'm glad you approve of the `Visit` restructure. It feels SO much more readable now to me.

I think removing mailcap is probably the right call. It was a fun idea, but in practice it is unreliable and buggy. At least for the time being, I think the simpler approach would be better. I'll try to put together a separate PR removing that feature soon.

Very cool re: milestones. I had seen that in the UI but have never used it. Seems interesting. I think the plan you have suggested makes sense. Let's take one (?) week to identify the issues/bugs/features we would like (or would like fixed) to be sure they make it into the release. Once we have the list we can get an estimate of how long all of it might take. I think we are pretty close to where we want to be, so I do not imagine it will be a whole TON of stuff, or I hope not at least.

A few of the current issues will be closed by 2.0.0 being released (adding the `-v` flag, for example). Should I close them in anticipation? Or wait until the release is officially on the main branch?

Collaborator

Good question about when to close issues. For most issues (those raised by contributors), the issue could be closed with the PR. The milestone counts the number of open issues, so it can be used to track progress.

Thinking about issues raised by end users as those requiring the most attention, we might wait for confirmation before closing the issue, or close it with the expectation that we're waiting on further feedback and that it can be reopened. Typically an issue would not need to remain open until develop is merged into master, as we could probably update it to let the person know it is now part of version X.

asdf approved these changes 2019-10-27 11:12:25 +00:00
asdf left a comment
Collaborator

Man page to be updated in another PR addressing mailcap. Lynx mode tested and reviewed, working OK.

Author
Owner

Sounds good. I believe I have commented on, for example, the version identification one. I will close it and add a note.

I will also merge this in and get a new one set up for the mailcap library thing.

sloum closed this pull request 2019-10-27 14:15:56 +00:00