Commit Graph

872 Commits

Author SHA1 Message Date
Hiltjo Posthuma bf6d74e097 sfeed_mbox.1: add examples to this man page as well 2021-08-04 10:30:26 +02:00
Hiltjo Posthuma 702fd51930 man page improvements 2021-08-03 20:43:54 +02:00
Hiltjo Posthuma 7199cd10a9 code-style: use a newline before return in main() 2021-08-03 20:43:54 +02:00
Hiltjo Posthuma 93044ecd31 man page improvements
- Some rewording and typo fixes.
- Specify in more detail how sfeed_web detects links from HTML code.
2021-07-25 11:46:44 +02:00
Hiltjo Posthuma 8a07c4a3d0 sfeed_{web,xmlenc}.1: use my site as an example 2021-07-24 22:46:05 +02:00
Hiltjo Posthuma ae4efcfe05 sfeed_update.1: just use ~/ instead of $HOME consistently in examples 2021-07-22 16:58:50 +02:00
Hiltjo Posthuma 7c78c67806 code-style: change gmtime to the reentrant/thread-safe gmtime_r
No functional or performance difference (intended) because these programs are
not threaded.
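
A minimal sketch of the gmtime_r() calling pattern, not the code from this
commit; format_utc is a hypothetical helper name used only for illustration:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: format a UNIX timestamp as "YYYY-MM-DD HH:MM:SS"
 * in UTC. Returns 0 on success, -1 on failure. */
int
format_utc(time_t t, char *buf, size_t bufsiz)
{
	struct tm tm;

	/* gmtime_r() stores the result in the caller's struct tm instead of
	 * the shared static buffer that gmtime() returns a pointer to. */
	if (!gmtime_r(&t, &tm))
		return -1;
	if (!strftime(buf, bufsiz, "%Y-%m-%d %H:%M:%S", &tm))
		return -1;

	return 0;
}
```

The caller owns the struct tm, so a later time conversion elsewhere cannot
overwrite the result.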
2021-07-19 12:50:44 +02:00
Hiltjo Posthuma 7086670e43 sfeed.c: parsetime: support short digit years for RSS pubDate fields (RFC822)
RSS (pubDate) uses RFC822 dates. This standard is obsoleted by RFC2822.

The RSS 2.0 spec says for the pubDate field:

"[...] All date-times in RSS conform to the Date and Time Specification of RFC
822, with the exception that the year may be expressed with two characters or
four characters (four preferred)."

RFC822 section 5.1 describes the syntax with 2 digit years:
https://datatracker.ietf.org/doc/html/rfc822#section-5.1

It was obsoleted/fixed in RFC2822 section 4.3:
https://datatracker.ietf.org/doc/html/rfc2822#section-4.3
"  Where a two or three digit year occurs in a date, the year is to be
   interpreted as follows: If a two digit year is encountered whose
   value is between 00 and 49, the year is interpreted by adding 2000,
   ending up with a value between 2000 and 2049.  If a two digit year is
   encountered with a value between 50 and 99, or any three digit year
   is encountered, the year is interpreted by adding 1900."
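
The quoted RFC2822 rule can be sketched as follows. This assumes the caller
already knows how many digits the year had; rfc2822_year is a hypothetical
name and not necessarily how parsetime in sfeed.c does it:

```c
/* Hypothetical helper: interpret a parsed year with the given number of
 * digits following RFC2822 section 4.3. Returns the full year. */
long
rfc2822_year(long year, int ndigits)
{
	if (ndigits == 2)
		return year + (year <= 49 ? 2000 : 1900); /* 00-49, 50-99 */
	if (ndigits == 3)
		return year + 1900;

	return year; /* 4 or more digits: used as is */
}
```

For example "21" becomes 2021 and "99" becomes 1999.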

In the real world I've seen all sites using RSS use the 4-digit format.

For historic context of changes and what feeds it might affect:

- RFC822 was published on 13 August 1982, obsoleted by RFC2822.
- RFC2822 was published in April 2001, obsoleted by RFC5322.
- RFC5322 was published in October 2008.
- RDF was started around 1996. It was published around 2004.
- March 15, 1999: RSS 0.90 (Netscape), published by Netscape and authored by
  Ramanathan Guha.
- July 10, 1999: RSS 0.91 (Netscape), published by Netscape and authored by Dan
  Libby.
- June 9, 2000: RSS 0.91 (UserLand), published by UserLand Software and
  authored by Dave Winer.
- Dec. 25, 2000: RSS 0.92, UserLand.
- Aug. 19, 2002: RSS 2.0, UserLand.
- July 15, 2003: RSS 2.0 (version 2.0.1), published by the Berkman Center for
  Internet & Society at Harvard Law School and authored by Dave Winer.
- July 15, 2003: RSS 2.0 (version 2.0.1-rv-1), published by the RSS Advisory
  Board.
- July 17, 2003: RSS 2.0 (version 2.0.1-rv-2), RSS Advisory Board.
- April 6, 2004: RSS 2.0 (version 2.0.1-rv-3), RSS Advisory Board.
- May 31, 2004: RSS 2.0 (version 2.0.1-rv-4), RSS Advisory Board.
- June 19, 2004: RSS 2.0 (version 2.0.1-rv-5), RSS Advisory Board.
- January 25, 2005: RSS 2.0 (version 2.0.1-rv-6), RSS Advisory Board.
- Aug. 12, 2006: RSS 2.0 (version 2.0.8), RSS Advisory Board.
- June 5, 2007: RSS 2.0 (version 2.0.9), RSS Advisory Board.
- Oct. 15, 2007: RSS 2.0 (version 2.0.10), RSS Advisory Board.
- March 30, 2009 (current): RSS 2.0 (version 2.0.11), RSS Advisory Board.

RSS history source: https://www.rssboard.org/rss-history
2021-07-11 13:29:12 +02:00
Hiltjo Posthuma 82db1194f3 bump version to 0.9.25 2021-07-10 18:39:56 +02:00
Hiltjo Posthuma 57daf99ec7 sfeed_web.1: fix typo: url -> URL 2021-07-07 18:14:22 +02:00
Hiltjo Posthuma ed8079dc3e sfeed_mbox: add option to print content
- Add SFEED_MBOX_CONTENT environment option. When set to "1" it outputs the
  content as well.  This is disabled by default for security reasons, because many
  clients handle HTML in an insecure way.

- Print link and enclosure on one line and align them.
2021-07-06 18:27:28 +02:00
Hiltjo Posthuma 15983fa731 sfeedrc.5: add an example of how to override the options in the man page as well 2021-07-06 18:23:20 +02:00
Hiltjo Posthuma 3034162414 sfeed.{1,5}: number fields in the man page
This makes it slightly easier to look up fields and map the fields by field
number in scripts (awk, cut) etc.
2021-07-06 18:21:34 +02:00
Hiltjo Posthuma c34c9185c0 README.xml: remove newline before EOF 2021-07-06 18:20:03 +02:00
Hiltjo Posthuma c128eff86f README: add a simplified version of printing the first enclosure
This works on sfeed(5) feed output since they are already sorted.
2021-07-06 18:19:03 +02:00
Hiltjo Posthuma 26921d36c8 sfeed: change comment which reflects printing relative URLs behaviour
This URL printing behaviour was changed recently in commit
f305b032bc
2021-07-06 18:17:14 +02:00
Hiltjo Posthuma 70b1fdee92 sfeed: printtrimmed function does not change or modify the buffer
Make it const char *.
2021-07-06 18:15:36 +02:00
Hiltjo Posthuma ee6016a10e README: fix typo in a comment 2021-06-05 20:29:36 +02:00
Hiltjo Posthuma 194794a534 Makefile: switch to use CPPFLAGS -D_DEFAULT_SOURCE
This fixes a warning on Linux glibc:

/usr/include/features.h:187:3: warning: #warning "_BSD_SOURCE and _SVID_SOURCE are deprecated, use _DEFAULT_SOURCE" [-Wcpp]
  187 | # warning "_BSD_SOURCE and _SVID_SOURCE are deprecated, use _DEFAULT_SOURCE"
      |   ^~~~~~~

Tested on Void GNU/Linux glibc with gcc. Tested on various other platforms for
regressions too namely: OpenBSD, NetBSD, FreeBSD, Void GNU/Linux musl.
2021-06-05 14:45:16 +02:00
Hiltjo Posthuma bd20dca4e1 README: fix escape sequence which is non-POSIX
The "\s" escape sequence is non-POSIX and GNU awk gives a warning:

	gawk: cmd. line:69: warning: escape sequence `\s' treated as plain `s'

BSD awk does not give this warning and supports it.
Use the POSIX [[:space:]] character class instead.

References:
- https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
  The table in the section "Regular Expressions".
- https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap05.html#tag_05
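
The same POSIX character class is also available from C through regcomp(3).
A small sketch for illustration only (the README change itself is in awk);
has_space is a hypothetical name:

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if s contains a whitespace character
 * according to the POSIX [[:space:]] class, 0 if not, and -1 if the
 * pattern fails to compile. */
int
has_space(const char *s)
{
	regex_t re;
	int r;

	/* "[[:space:]]" is defined by POSIX; "\s" is an extension. */
	if (regcomp(&re, "[[:space:]]", REG_EXTENDED | REG_NOSUB))
		return -1;
	r = regexec(&re, s, 0, NULL, 0) == 0;
	regfree(&re);

	return r;
}
```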
2021-06-05 14:31:40 +02:00
Hiltjo Posthuma d254a6ac17 bump version to 0.9.24 2021-06-03 17:02:38 +02:00
Hiltjo Posthuma 3efa3b0e26 util.c: err() do not print colon formatted
Most commonly used compilers (gcc, clang) optimize this away though.
2021-06-01 18:58:13 +02:00
Hiltjo Posthuma bb25cc51a8 sfeed_gopher: unveil: show path when it failed 2021-06-01 18:33:29 +02:00
Hiltjo Posthuma 55ac2338fc portability and standards: add BSD-like err() and errx() functions
These are BSD functions.
- HaikuOS now compiles without having to use libbsd.
- Tested on SerenityOS (for fun), which doesn't have these functions (yet).
  With a small change to support wcwidth() sfeed works on SerenityOS.
2021-06-01 18:21:08 +02:00
Hiltjo Posthuma cbb82666e0 sfeed_frames.1/sfeed_html.1: reference the style.css example file 2021-05-30 12:06:58 +02:00
Hiltjo Posthuma b723fbcd98 sfeed_opml_export: sync loadconfig() function fixes from sfeed_update
- Do not show stderr of readlink.
- Show the reference to the example sfeedrc (like sfeed_update).
- Make the error message a bit shorter.
- Fix showing the path if it does not exist, for example:

	$ sfeed_opml_export "a"
	readlink: a: No such file or directory
	Configuration file "" does not exist or is not readable.

Now shows:

	$ sfeed_opml_export "a"
	Configuration file "a" cannot be read.
	See sfeedrc.example for an example.
2021-05-29 15:14:33 +02:00
Hiltjo Posthuma 02b590f6a2 sfeed_frames/sfeed_html: show the total counts and improve the title format
This title format now matches the one in sfeed_curses. It shows the count at
the far left, which makes it more readable imho. It also works better when the
titlebar is small.
2021-05-27 12:35:00 +02:00
Hiltjo Posthuma 1a90add12e sfeed_update: fix message when the configuration file does not exist
When sfeed_update was called without a parameter it used the default path. If
this path did not exist it would incorrectly print:

	Configuration file "" does not exist or is not readable.
	See sfeedrc.example for an example.

Make the error message a bit shorter too.

This was a partial regression of commit df74ba274c
2021-05-27 12:34:24 +02:00
Hiltjo Posthuma f2c8685cc0 bump version to 0.9.23 2021-04-29 18:06:45 +02:00
Hiltjo Posthuma f34d060ef5 Makefile: fix typo in comment 2021-04-28 19:01:04 +02:00
Hiltjo Posthuma bf1b35d4a9 fixup: a regression with RSS guid, by default ispermalink="true" 2021-04-28 18:26:57 +02:00
Hiltjo Posthuma 4c2b939bbb use the last href attribute value if there are multiple set
Input to reproduce:

	<entry>
	<link href="https://codemadness.org/a" href="https://codemadness.org/b"/>
	</entry>

Old value:

	"https://codemadness.org/ahttps://codemadness.org/b"

New value:

	"https://codemadness.org/b"

same with RSS <enclosure url="" />
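
One way to get "last value wins" behaviour is to reset the buffer whenever an
attribute starts. The sketch below assumes a callback-style XML parser;
struct attrbuf, attr_start and attr_append are hypothetical names and this is
not necessarily how the fix is implemented in sfeed.c:

```c
#include <string.h>

/* Hypothetical fixed-size buffer for one attribute value. */
struct attrbuf {
	char data[256];
	size_t len;
};

/* Called when an attribute starts: discard the value of an earlier
 * attribute with the same name, so the last one wins. */
void
attr_start(struct attrbuf *b)
{
	b->len = 0;
	b->data[0] = '\0';
}

/* Called for each chunk of attribute data: append, truncating when the
 * buffer is full. */
void
attr_append(struct attrbuf *b, const char *s, size_t n)
{
	size_t room = sizeof(b->data) - 1 - b->len;

	if (n > room)
		n = room;
	memcpy(b->data + b->len, s, n);
	b->len += n;
	b->data[b->len] = '\0';
}
```

Without the reset in attr_start() the two href values from the input above
would be concatenated, which is the old behaviour.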
2021-04-28 18:26:57 +02:00
Hiltjo Posthuma 789cc616a9 add support for old/legacy Atom 0.3 feeds
This standard was a draft used around 2005-2006.

Instead of the fields "published" and "updated" it used "issued" (mandatory
field) and "modified" (optional). Add support for them, while still giving
preference to the Atom 1.0 fields and to creation dates.

I don't know any real-life examples that still use this though.

Some references:
- http://rakaz.nl/2005/07/moving-from-atom-03-to-10.html
- https://www.dokuwiki.org/syndication (rss_type "atom" parameter value).
- https://support.google.com/merchants/answer/160598?hl=en
2021-04-28 18:26:57 +02:00
Hiltjo Posthuma bb3aa63579 sfeed.{1,5}: improve documentation, the content-type field can be empty...
... if there is no content.
2021-04-28 18:26:57 +02:00
Hiltjo Posthuma a211bea6a5 enable unlocked I/O by default
getchar_unlocked is part of POSIX and should be supported by most platforms. On
all tested platforms it has a performance benefit, sometimes smallish (<12%),
sometimes large (~40%).
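
A minimal sketch of an unlocked read loop. It uses getc_unlocked(fp) on an
arbitrary stream so it is easy to try out; getchar_unlocked() is the same
call on stdin. count_bytes is a hypothetical name:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Hypothetical helper: count the bytes remaining on a stream using
 * unlocked reads. */
long
count_bytes(FILE *fp)
{
	long n = 0;

	/* Take the stream lock once for the whole loop instead of once per
	 * character. */
	flockfile(fp);
	while (getc_unlocked(fp) != EOF)
		n++;
	funlockfile(fp);

	return n;
}
```

In a single-threaded program the flockfile()/funlockfile() pair is not
strictly needed, it is kept here so the sketch is also safe with threads.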
2021-04-28 18:26:57 +02:00
Hiltjo Posthuma 675cfe6a73 README: update newsboat export script
Since newsboat version 2.22 (2020-12-21) it stores the content mime-type of a
field, so allow exporting it.

The older entries are empty and will be exported as "html" (even though they
might have been plain-text).

... also add the (empty) category field.
2021-04-28 18:26:57 +02:00
Hiltjo Posthuma 8ad3f119b2 improve "ispermalink", "rel" and "type" attribute handling/buffering 2021-04-28 18:26:49 +02:00
Hiltjo Posthuma 9f61f0682f improve content-type "type" attribute handling/buffering 2021-04-28 18:26:46 +02:00
Hiltjo Posthuma 3ea5e988ed sfeed.c: detect the proper mime-type for XHTML
Reference:
https://www.w3.org/2003/01/xhtml-mimetype/
2021-04-27 18:43:46 +02:00
Hiltjo Posthuma 609a0d3675 fix a comment code-style
This fix is very important *ahem*.
2021-04-24 19:50:22 +02:00
Hiltjo Posthuma 4e96b1f3f9 bump version to 0.9.22 2021-03-13 13:22:10 +01:00
Hiltjo Posthuma 99a8e4deeb sfeed_web.1, sfeed_xmlenc.1: remove unneeded mdoc escape sequence 2021-03-12 13:11:17 +01:00
Hiltjo Posthuma 317d08eee3 sfeed_update: return instead of exit in main() on success
This is useful so the script can be included, call main and then have
additional post-main functionality.
2021-03-03 18:12:34 +01:00
Hiltjo Posthuma ceefac3e91 README: workaround empty fields with *BSD xargs -0
Workaround it by setting the empty "middle" fields to some value. The last
field can be empty.

Some feeds were incorrectly using the wrong base URL if the `baseurl` field was
empty but the encoding field was set. So it incorrectly used the encoding field
instead.

Only now noticed some feeds were failing because the baseURL is validated since
commit f305b032bc and returning a non-zero exit
status.

This doesn't happen with GNU xargs, busybox or toybox xargs.
Affected (at least): OpenBSD, NetBSD, FreeBSD and DragonFlyBSD xargs which share
similar code.

Simple way to reproduce the difference:

	printf 'a\0\0c\0' | xargs -0 echo

Prints "a c" on *BSD.
Prints "a  c" on GNU xargs (and some other implementations).
2021-03-02 13:13:19 +01:00
Hiltjo Posthuma f0e0326248 sfeed_update: fix baseurl substitution
Follow-up from a rushed commit:

commit 58555779d1
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date:   Sun Feb 28 13:33:21 2021 +0100

    sfeed_update: simplify, use feedurl directly

    This also makes it possible to use non-authoritative URLs as a baseurl, like
    "magnet:" URLs.
2021-03-01 22:27:11 +01:00
Hiltjo Posthuma 16b7cc14e0 util.c: uri_makeabs: check initial base URI field, not dest `a` (style)
No functional difference because the base URI host is copied beforehand.
2021-03-01 18:50:43 +01:00
Hiltjo Posthuma fef85e3c39 sfeed.1: reference sfeed_update and sfeedrc
The shellscript is optional, but reference it in the documentation.
2021-03-01 18:41:27 +01:00
Hiltjo Posthuma 58555779d1 sfeed_update: simplify, use feedurl directly
This also makes it possible to use non-authoritative URLs as a baseurl, like
"magnet:" URLs.
2021-03-01 18:41:27 +01:00
Hiltjo Posthuma f305b032bc util: improve/refactor URI parsing and formatting
Removed/rewritten the functions:
absuri, parseuri, and encodeuri() for percent-encoding.

The functions are now split separately with the following purpose:

- uri_format: format struct uri into a string.
- uri_hasscheme: quick check if a string is absolute or not.
- uri_makeabs: make a URI absolute using a base uri and the original URI.
- uri_parse: parse a string into a struct uri.
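
As an illustration of the kind of quick check uri_hasscheme describes, here is
a sketch based on the scheme grammar in RFC3986 section 3.1. has_scheme is a
hypothetical name and the real function in util.c may differ:

```c
#include <ctype.h>

/* Hypothetical helper: return 1 if s starts with a URI scheme followed by
 * ':' (ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )), otherwise 0.
 * Assumes the default "C" locale for the ctype functions. */
int
has_scheme(const char *s)
{
	const unsigned char *p = (const unsigned char *)s;

	if (!isalpha(*p))
		return 0;
	for (p++; isalnum(*p) || *p == '+' || *p == '-' || *p == '.'; p++)
		;

	return *p == ':';
}
```

A string that passes this check can be treated as absolute and printed as is,
anything else needs the base URI.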

The following URLs are better parsed:

- URLs with extra "/"'s in the path prepended are kept as is, no "/" is added
  either for empty paths.
- URLs like "http://codemadness.org" are not changed to
  "http://codemadness.org/" anymore (paths are kept as is, unless they are
  non-empty and do not start with "/").
- Paths are not percent-encoded anymore.
- URLs with userinfo field (username, password) are parsed.
  like: ftp://user:password@[2001:db8::7]:2121/rfc/rfc1808.txt
- Non-authoritative URLs like mailto:some@email.org, magnet URIs, ISBN URIs/urn,
  like: urn:isbn:0-395-36341-1 are allowed and parsed correctly.
- Both local (file:///) and non-local (file://) are supported.
- Specifying a base URL with a port will now only use it when the relative URL
  has no host and port set and follows RFC3986 5.2.2 more closely.
- Parsing numeric port: parse as signed long and check <= 0, empty port is
  allowed.
- Parsing URIs containing query, fragment, but no path separator (/) will now
  parse the component properly.
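
The port rule in the list above (signed long, reject <= 0, empty allowed) can
be sketched like this. parse_port is a hypothetical name, and rejecting
trailing characters and overflow is an assumption of this sketch, not
something stated in the commit:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a port string. An empty string is allowed and
 * leaves *port at 0 ("not set"). Returns 0 on success, -1 if invalid. */
int
parse_port(const char *s, long *port)
{
	char *end;
	long l;

	*port = 0;
	if (*s == '\0')
		return 0; /* empty port is allowed */

	errno = 0;
	l = strtol(s, &end, 10);
	/* assumption: overflow and trailing garbage are errors too */
	if (errno || *end != '\0' || l <= 0)
		return -1;
	*port = l;

	return 0;
}
```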

For sfeed:

- Parse the baseURI only once (no need to do it every time for making absolute
  URIs).
- If a link/enclosure is absolute already or if there is no base URL specified
  then just print the link directly. There have also been other small performance
  improvements related to handling URIs.

References:
- https://tools.ietf.org/html/rfc3986
  - Section "5.2.2. Transform References" has also been helpful.
2021-03-01 18:41:27 +01:00
Hiltjo Posthuma 30476d2230 README: combine bandwidth saving options into one section
Combine E-Tags, If-Modified-Since in one section. Also mention the curl
--compressed option, typically used for GZIP decompression.

Note that E-Tags were broken in curl <7.73 due to a bug with "weak" e-tags.
https://github.com/curl/curl/issues/5610

From a question/feedback by e-mail from Hadrien Lacour, thanks.
2021-03-01 18:41:27 +01:00