sfeed
-----

RSS and Atom parser (and some format programs).

It converts RSS or Atom feeds from XML to a TAB-separated file. There are
formatting programs included to convert this TAB-separated format to various
other formats. There are also some programs and scripts included to import and
export OPML and to fetch, filter, merge and order feed items.


Build and install
-----------------

	$ make
	# make install

To build sfeed without sfeed_curses set SFEED_CURSES to an empty string:

	$ make SFEED_CURSES=""
	# make SFEED_CURSES="" install

To change the theme for sfeed_curses you can set SFEED_THEME. See the themes/
directory for the theme names.

	$ make SFEED_THEME="templeos"
	# make SFEED_THEME="templeos" install


Usage
-----

Initial setup:

	mkdir -p "$HOME/.sfeed/feeds"
	cp sfeedrc.example "$HOME/.sfeed/sfeedrc"

Edit the sfeedrc(5) configuration file and change any RSS/Atom feeds. This file
is included and evaluated as a shellscript for sfeed_update, so its functions
and behaviour can be overridden:

	$EDITOR "$HOME/.sfeed/sfeedrc"
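
For reference, a minimal sfeedrc could look like the sketch below; the feed
names and URLs are placeholders and the feed() parameters follow the
feed(name, feedurl, basesiteurl, encoding) order described further below (see
sfeedrc.example and sfeedrc(5) for the full details):

	# sfeedpath can be set to change where the feed files are stored,
	# the default is "$HOME/.sfeed/feeds".
	#sfeedpath="$HOME/.sfeed/feeds"

	# list of feeds to fetch:
	feeds() {
		# feed <name> <feedurl> [basesiteurl] [encoding]
		feed "codemadness" "https://codemadness.org/atom_content.xml"
		feed "xkcd" "https://xkcd.com/atom.xml"
	}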

Alternatively, you can import existing OPML subscriptions using
sfeed_opml_import(1):

	sfeed_opml_import < file.opml > "$HOME/.sfeed/sfeedrc"

An example to export from another RSS/Atom reader, newsboat, and import it for
sfeed_update:

	newsboat -e | sfeed_opml_import > "$HOME/.sfeed/sfeedrc"

An example to export from another RSS/Atom reader, rss2email (3.x+), and import
it for sfeed_update:

	r2e opmlexport | sfeed_opml_import > "$HOME/.sfeed/sfeedrc"

Update feeds; this script merges the new items. See sfeed_update(1) for more
information on what it can do:

	sfeed_update

Format feeds:

Plain-text list:

	sfeed_plain $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.txt"

HTML view (no frames), copy style.css for a default style:

	cp style.css "$HOME/.sfeed/style.css"
	sfeed_html $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.html"

HTML view with the menu as frames, copy style.css for a default style:

	mkdir -p "$HOME/.sfeed/frames"
	cp style.css "$HOME/.sfeed/frames/style.css"
	cd "$HOME/.sfeed/frames" && sfeed_frames $HOME/.sfeed/feeds/*

To automatically update your feeds periodically and format them in a way you
like, you can make a wrapper script and add it as a cronjob.
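
For example, a small wrapper script could update the feeds and regenerate the
formatted output in one go; the paths and chosen formats below are just one
possible setup, not something sfeed requires:

	#!/bin/sh
	# update feeds, then (re)generate plain-text and HTML output.
	sfeed_update
	sfeed_plain "$HOME/.sfeed/feeds/"* > "$HOME/.sfeed/feeds.txt"
	sfeed_html "$HOME/.sfeed/feeds/"* > "$HOME/.sfeed/feeds.html"

It could then be run hourly with a crontab(5) entry such as this one (the
script path is hypothetical):

	0 * * * *	/path/to/sfeed_cron.sh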

Most protocols are supported because curl(1) is used by default, and proxy
settings from the environment (such as the $http_proxy environment variable)
are used as well.
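
For example, to fetch all feeds through a local HTTP proxy for a single run
(the proxy address is only an example):

	http_proxy="http://127.0.0.1:8118" https_proxy="http://127.0.0.1:8118" sfeed_update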

The sfeed(1) program itself is just a parser that parses XML data from stdin
and is therefore network protocol-agnostic. It can be used with HTTP, HTTPS,
Gopher, SSH, etc.
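
For example, feed data fetched with any client can be piped straight into
sfeed(1), which writes the TAB-separated items to stdout (the output filename
is arbitrary):

	curl -s "https://codemadness.org/atom.xml" | sfeed > items.tsv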

See the section "Usage and examples" below and the man-pages for more
information on how to use sfeed(1) and the additional tools.


Dependencies
------------

- C compiler (C99).
- libc (recommended: C99 and POSIX >= 200809).


Optional dependencies
---------------------

- POSIX make(1) for the Makefile.
- POSIX sh(1),
  used by sfeed_update(1) and sfeed_opml_export(1).
- POSIX utilities such as awk(1) and sort(1),
  used by sfeed_content(1), sfeed_markread(1) and sfeed_update(1).
- curl(1) binary: https://curl.haxx.se/,
  used by sfeed_update(1), but can be replaced with any tool like wget(1),
  OpenBSD ftp(1) or hurl(1): https://git.codemadness.org/hurl/
- iconv(1) command-line utilities,
  used by sfeed_update(1). If the text in your RSS/Atom feeds is already UTF-8
  encoded then you don't need this. For a minimal iconv implementation:
  https://git.etalabs.net/cgit/noxcuse/tree/src/iconv.c
- mandoc for documentation: https://mdocml.bsd.lv/
- curses (typically ncurses), otherwise see minicurses.h,
  used by sfeed_curses(1).
- a terminal (emulator) supporting UTF-8 and the used capabilities,
  used by sfeed_curses(1).


Optional run-time dependencies for sfeed_curses
-----------------------------------------------

- xclip for yanking the URL or enclosure. See $SFEED_YANKER to change it.
- xdg-open, used as a plumber by default. See $SFEED_PLUMBER to change it.
- awk, used by the sfeed_content and sfeed_markread scripts.
  See the ENVIRONMENT VARIABLES section in the man page to change it.
- lynx, used by the sfeed_content script to convert HTML content.
  See the ENVIRONMENT VARIABLES section in the man page to change it.


Formats supported
-----------------

sfeed supports a subset of XML 1.0 and a subset of:

- Atom 1.0 (RFC 4287): https://datatracker.ietf.org/doc/html/rfc4287
- Atom 0.3 (draft, historic).
- RSS 0.91+.
- RDF (when used with RSS).
- MediaRSS extensions (media:).
- Dublin Core extensions (dc:).

Other formats like JSONfeed, twtxt or certain RSS/Atom extensions can be
supported by converting them to RSS/Atom or to the sfeed(5) format directly.


OS tested
---------

- Linux,
  compilers: clang, gcc, chibicc, cproc, lacc, pcc, scc, tcc,
  libc: glibc, musl.
- OpenBSD (clang, gcc).
- NetBSD (with NetBSD curses).
- FreeBSD
- DragonFlyBSD
- GNU/Hurd
- Illumos (OpenIndiana).
- Windows (cygwin gcc + mintty, mingw).
- HaikuOS
- SerenityOS
- FreeDOS (djgpp).
- FUZIX (sdcc -mz80, with the sfeed parser program).


Architectures tested
--------------------

amd64, ARM, aarch64, HPPA, i386, MIPS32-BE, RISCV64, SPARC64, Z80.


Files
-----

sfeed              - Read XML RSS or Atom feed data from stdin. Write feed data
                     in TAB-separated format to stdout.
sfeed_atom         - Format feed data (TSV) to an Atom feed.
sfeed_content      - View item content, for use with sfeed_curses.
sfeed_curses       - Format feed data (TSV) to a curses interface.
sfeed_frames       - Format feed data (TSV) to HTML file(s) with frames.
sfeed_gopher       - Format feed data (TSV) to Gopher files.
sfeed_html         - Format feed data (TSV) to HTML.
sfeed_opml_export  - Generate an OPML XML file from a sfeedrc config file.
sfeed_opml_import  - Generate a sfeedrc config file from an OPML XML file.
sfeed_markread     - Mark items as read/unread, for use with sfeed_curses.
sfeed_mbox         - Format feed data (TSV) to mbox.
sfeed_plain        - Format feed data (TSV) to a plain-text list.
sfeed_twtxt        - Format feed data (TSV) to a twtxt feed.
sfeed_update       - Update feeds and merge items.
sfeed_web          - Find URLs to RSS/Atom feeds on a webpage.
sfeed_xmlenc       - Detect character-set encoding from an XML stream.
sfeedrc.example    - Example config file. Can be copied to $HOME/.sfeed/sfeedrc.
style.css          - Example stylesheet to use with sfeed_html(1) and
                     sfeed_frames(1).


Files read at runtime by sfeed_update(1)
----------------------------------------

sfeedrc - Config file. This file is evaluated as a shellscript in
          sfeed_update(1).

At least the following functions can be overridden per feed:

- fetch: to use wget(1), OpenBSD ftp(1) or another download program.
- filter: to filter on fields.
- merge: to change the merge logic.
- order: to change the sort order.

See also the sfeedrc(5) man page documentation for more details.

The feeds() function is called to process the feeds. The default feed()
function is executed concurrently as a background job in your sfeedrc(5) config
file to make updating faster. The variable maxjobs can be changed to limit or
increase the number of concurrent jobs (8 by default).
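
For example, setting this in the sfeedrc file limits updating to two concurrent
feeds (the value 2 is just an example):

	# process at most 2 feeds concurrently.
	maxjobs=2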


Files written at runtime by sfeed_update(1)
-------------------------------------------

feedname - TAB-separated format containing all items per feed. The
           sfeed_update(1) script merges new items with this file.
           The format is documented in sfeed(5).


File format
-----------

	man 5 sfeed
	man 5 sfeedrc
	man 1 sfeed
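
As a quick illustration of the sfeed(5) format: each line is one item and the
first three TAB-separated fields are the UNIX timestamp, the title and the
link, so a rough preview of a parsed feed could look like this (the input
filename is just an example):

	sfeed < atom.xml | cut -f 1-3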


Usage and examples
------------------

Find RSS/Atom feed URLs from a webpage:

	url="https://codemadness.org"; curl -L -s "$url" | sfeed_web "$url"

output example:

	https://codemadness.org/atom.xml	application/atom+xml
	https://codemadness.org/atom_content.xml	application/atom+xml

- - -

Make sure your sfeedrc config file exists, see sfeedrc.example. To update your
feeds (the configfile argument is optional):

	sfeed_update "configfile"

Format the feed files:

	# Plain-text list.
	sfeed_plain $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.txt
	# HTML view (no frames), copy style.css for a default style.
	sfeed_html $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.html
	# HTML view with the menu as frames, copy style.css for a default style.
	mkdir -p somedir && cd somedir && sfeed_frames $HOME/.sfeed/feeds/*

View formatted output in your browser:

	$BROWSER "$HOME/.sfeed/feeds.html"

View formatted output in your editor:

	$EDITOR "$HOME/.sfeed/feeds.txt"

- - -

View formatted output in a curses interface. The interface has a look inspired
by the mutt mail client. It has a sidebar panel for the feeds, a panel with a
listing of the items and a small statusbar for the selected item/URL. Some
functions like searching and scrolling are integrated in the interface itself.

Just like the other format programs included in sfeed you can run it like this:

	sfeed_curses ~/.sfeed/feeds/*

... or by reading from stdin:

	sfeed_curses < ~/.sfeed/feeds/xkcd

By default sfeed_curses marks the items of the last day as new/bold. To manage
read/unread items in a different way, a plain-text file with a list of the read
URLs can be used. To enable this behaviour, set the environment variable
$SFEED_URL_FILE to the path of this URL file:

	export SFEED_URL_FILE="$HOME/.sfeed/urls"
	[ -f "$SFEED_URL_FILE" ] || touch "$SFEED_URL_FILE"
	sfeed_curses ~/.sfeed/feeds/*

It then uses the shellscript "sfeed_markread" to process the read and unread
items.

- - -

Example script to view feed items in a vertical list/menu in dmenu(1). It opens
the selected URL in the browser set in $BROWSER:

	#!/bin/sh
	url=$(sfeed_plain "$HOME/.sfeed/feeds/"* | dmenu -l 35 -i | \
		sed -n 's@^.* \([a-zA-Z]*://\)\(.*\)$@\1\2@p')
	test -n "${url}" && $BROWSER "${url}"

dmenu can be found at: https://git.suckless.org/dmenu/

- - -

Generate a sfeedrc config file from your exported list of feeds in OPML
format:

	sfeed_opml_import < opmlfile.xml > $HOME/.sfeed/sfeedrc

- - -

Export an OPML file of your feeds from a sfeedrc config file (the configfile
argument is optional):

	sfeed_opml_export configfile > myfeeds.opml

- - -

The filter function can be overridden in your sfeedrc file. This allows
filtering items per feed. It can be used to shorten URLs, filter away
advertisements, strip tracking parameters and more.

	# filter fields.
	# filter(name)
	filter() {
		case "$1" in
		"tweakers")
			awk -F '\t' 'BEGIN { OFS = "\t"; }
			# skip ads.
			$2 ~ /^ADV:/ {
				next;
			}
			# shorten link.
			{
				if (match($3, /^https:\/\/tweakers\.net\/[a-z]+\/[0-9]+\//)) {
					$3 = substr($3, RSTART, RLENGTH);
				}
				print $0;
			}';;
		"yt BSDNow")
			# filter only BSD Now from channel.
			awk -F '\t' '$2 ~ / \| BSD Now/';;
		*)
			cat;;
		esac | \
			# replace youtube links with embed links.
			sed 's@www.youtube.com/watch?v=@www.youtube.com/embed/@g' | \

			awk -F '\t' 'BEGIN { OFS = "\t"; }
			function filterlink(s) {
				# protocol must start with http, https or gopher.
				if (match(s, /^(http|https|gopher):\/\//) == 0) {
					return "";
				}

				# shorten feedburner links.
				if (match(s, /^(http|https):\/\/[^\/]+\/~r\/.*\/~3\/[^\/]+\//)) {
					s = substr($3, RSTART, RLENGTH);
				}

				# strip tracking parameters
				# urchin, facebook, piwik, webtrekk and generic.
				gsub(/\?(ad|campaign|fbclid|pk|tm|utm|wt)_([^&]+)/, "?", s);
				gsub(/&(ad|campaign|fbclid|pk|tm|utm|wt)_([^&]+)/, "", s);

				gsub(/\?&/, "?", s);
				gsub(/[\?&]+$/, "", s);

				return s
			}
			{
				$3 = filterlink($3); # link
				$8 = filterlink($8); # enclosure

				# try to remove tracking pixels: <img/> tags with 1px width or height.
				gsub("<img[^>]*(width|height)[[:space:]]*=[[:space:]]*[\"'"'"' ]?1[\"'"'"' ]?[^0-9>]+[^>]*>", "", $4);

				print $0;
			}'
	}

- - -

Aggregate feeds. This filters new entries (maximum one day old) and sorts them
by newest first. Prefix the feed name in the title. Convert the TSV output data
to an Atom XML feed (again):

	#!/bin/sh
	cd ~/.sfeed/feeds/ || exit 1

	awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
	BEGIN { OFS = "\t"; }
	int($1) >= old {
		$2 = "[" FILENAME "] " $2;
		print $0;
	}' * | \
	sort -k1,1rn | \
	sfeed_atom

- - -

To have a "tail(1) -f"-like FIFO stream filtering for new unique feed items and
showing them as plain-text per line similar to sfeed_plain(1):

Create a FIFO:

	fifo="/tmp/sfeed_fifo"
	mkfifo "$fifo"

On the reading side:

	# This keeps track of unique lines so might consume much memory.
	# It tries to reopen the $fifo after 1 second if it fails.
	while :; do cat "$fifo" || sleep 1; done | awk '!x[$0]++'

On the writing side:

	feedsdir="$HOME/.sfeed/feeds/"
	cd "$feedsdir" || exit 1
	test -p "$fifo" || exit 1

	# 1 day is old news, don't write older items.
	awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
	BEGIN { OFS = "\t"; }
	int($1) >= old {
		$2 = "[" FILENAME "] " $2;
		print $0;
	}' * | sort -k1,1n | sfeed_plain | cut -b 3- > "$fifo"

cut -b is used to trim the "N " prefix of sfeed_plain(1).

- - -

For a podcast feed, the following code can be used to filter the latest
enclosure URL (probably some audio file):

	awk -F '\t' 'BEGIN { latest = 0; }
	length($8) {
		ts = int($1);
		if (ts > latest) {
			url = $8;
			latest = ts;
		}
	}
	END { if (length(url)) { print url; } }'

... or on a file already sorted from newest to oldest:

	awk -F '\t' '$8 { print $8; exit }'

- - -

Over time your feeds file might become quite big. You can prune a feed and keep
only the items of (roughly) the last week, while saving the old file as a
backup, by doing for example:

	awk -F '\t' -v "old=$(($(date +'%s') - 604800))" 'int($1) > old' < feed > feed.new
	mv feed feed.bak
	mv feed.new feed

This could also be run weekly in a crontab to archive the feeds. Like throwing
away old newspapers. It keeps the feeds list tidy and the formatted output
small.
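
For example, if the commands above are saved in a script, a weekly crontab(5)
entry could look like this (the script path is hypothetical):

	# run every Sunday at 04:00.
	0 4 * * 0	/path/to/archive_feeds.sh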

- - -

Convert the mbox to separate maildirs per feed and filter duplicate messages
using the fdm program.
fdm is available at: https://github.com/nicm/fdm

fdm config file (~/.sfeed/fdm.conf):

	set unmatched-mail keep

	account "sfeed" mbox "%[home]/.sfeed/mbox"
	$cachepath = "%[home]/.sfeed/fdm.cache"
	cache "${cachepath}"
	$maildir = "%[home]/feeds/"

	# Check if message is in the cache by Message-ID.
	match case "^Message-ID: (.*)" in headers
		action {
			tag "msgid" value "%1"
		}
		continue

	# If it is in the cache, stop.
	match matched and in-cache "${cachepath}" key "%[msgid]"
		action {
			keep
		}

	# Not in the cache, process it and add to cache.
	match case "^X-Feedname: (.*)" in headers
		action {
			# Store to local maildir.
			maildir "${maildir}%1"

			add-to-cache "${cachepath}" key "%[msgid]"
			keep
		}

Now run:

	$ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
	$ fdm -f ~/.sfeed/fdm.conf fetch

Now you can view feeds in mutt(1) for example.

- - -

Read from the mbox, filter duplicate messages using the fdm program and deliver
them to an SMTP server. This works similarly to the rss2email program.
fdm is available at: https://github.com/nicm/fdm

fdm config file (~/.sfeed/fdm.conf):

	set unmatched-mail keep

	account "sfeed" mbox "%[home]/.sfeed/mbox"
	$cachepath = "%[home]/.sfeed/fdm.cache"
	cache "${cachepath}"

	# Check if message is in the cache by Message-ID.
	match case "^Message-ID: (.*)" in headers
		action {
			tag "msgid" value "%1"
		}
		continue

	# If it is in the cache, stop.
	match matched and in-cache "${cachepath}" key "%[msgid]"
		action {
			keep
		}

	# Not in the cache, process it and add to cache.
	match case "^X-Feedname: (.*)" in headers
		action {
			# Connect to a SMTP server and attempt to deliver the
			# mail to it.
			# Of course change the server and e-mail below.
			smtp server "codemadness.org" to "hiltjo@codemadness.org"

			add-to-cache "${cachepath}" key "%[msgid]"
			keep
		}

Now run:

	$ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
	$ fdm -f ~/.sfeed/fdm.conf fetch

Now you can view feeds in mutt(1) for example.

- - -

Convert mbox to separate maildirs per feed and filter duplicate messages using
procmail(1).

procmail_maildirs.sh file:

	maildir="$HOME/feeds"
	feedsdir="$HOME/.sfeed/feeds"
	procmailconfig="$HOME/.sfeed/procmailrc"

	# message-id cache to prevent duplicates.
	mkdir -p "${maildir}/.cache"

	if ! test -r "${procmailconfig}"; then
		printf "Procmail configuration file \"%s\" does not exist or is not readable.\n" "${procmailconfig}" >&2
		echo "See procmailrc.example for an example." >&2
		exit 1
	fi

	find "${feedsdir}" -type f -exec printf '%s\n' {} \; | while read -r d; do
		name=$(basename "${d}")
		mkdir -p "${maildir}/${name}/cur"
		mkdir -p "${maildir}/${name}/new"
		mkdir -p "${maildir}/${name}/tmp"
		printf 'Mailbox %s\n' "${name}"
		sfeed_mbox "${d}" | formail -s procmail "${procmailconfig}"
	done

Procmailrc(5) file:

	# Example for use with sfeed_mbox(1).
	# The header X-Feedname is used to split into separate maildirs. It is
	# assumed this name is sane.

	MAILDIR="$HOME/feeds/"

	:0
	* ^X-Feedname: \/.*
	{
		FEED="$MATCH"

		:0 Wh: "msgid_$FEED.lock"
		| formail -D 1024000 ".cache/msgid_$FEED.cache"

		:0
		"$FEED"/
	}

Now run:

	$ procmail_maildirs.sh

Now you can view feeds in mutt(1) for example.

- - -

The fetch function can be overridden in your sfeedrc file. This allows
replacing the default curl(1) for sfeed_update with any other client to fetch
the RSS/Atom data, or changing the default curl options:

	# fetch a feed via HTTP/HTTPS etc.
	# fetch(name, url, feedfile)
	fetch() {
		hurl -m 1048576 -t 15 "$2" 2>/dev/null
	}

- - -

Caching, incremental data updates and bandwidth-saving

For servers that support it, some incremental updates and bandwidth-saving can
be done by using the "ETag" HTTP header.

Create a directory for storing the ETags per feed:

	mkdir -p ~/.sfeed/etags/

The curl ETag options (--etag-save and --etag-compare) can be used to store and
send the previous ETag header value. curl version 7.73+ is recommended for it
to work properly.

The curl -z option can be used to send the modification date of a local file as
an HTTP "If-Modified-Since" request header. The server can then respond whether
the data is modified or not, or respond with only the incremental data.

The curl --compressed option can be used to indicate the client supports
decompression. Because RSS/Atom feeds are textual XML content this generally
compresses very well.

These options can be set by overriding the fetch() function in the sfeedrc
file:

	# fetch(name, url, feedfile)
	fetch() {
		etag="$HOME/.sfeed/etags/$(basename "$3")"
		curl \
			-L --max-redirs 0 -H "User-Agent:" -f -s -m 15 \
			--compressed \
			--etag-save "${etag}" --etag-compare "${etag}" \
			-z "${etag}" \
			"$2" 2>/dev/null
	}

These options can come at a cost of some privacy, because they expose
additional metadata from the previous request.

- - -

CDNs blocking requests due to a missing HTTP User-Agent request header

sfeed_update will not send the "User-Agent" header by default for privacy
reasons. Some CDNs like Cloudflare don't like this and will block such HTTP
requests.

A custom User-Agent can be set by using the curl -H option, like so:

	curl -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0'

The above example string pretends to be a Windows 10 (x86-64) machine running
Firefox 78.
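
For example, the header could be set by overriding the fetch() function in the
sfeedrc file; this sketch just adds the header to an otherwise default-style
curl invocation:

	# fetch(name, url, feedfile)
	fetch() {
		curl -L --max-redirs 0 -f -s -m 15 \
			-H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0' \
			"$2" 2>/dev/null
	}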

- - -

Page redirects

For security and efficiency reasons, redirects are not allowed by default and
are treated as an error.

This prevents, for example, hijacking an unencrypted http:// to https://
redirect, and avoids the extra time of an unnecessary page redirect on each
request. It is encouraged to use the final redirected URL in the sfeedrc config
file.

If you want to ignore this advice you can override the fetch() function in the
sfeedrc file and change the curl options "-L --max-redirs 0".
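
A sketch of such an override, allowing a few redirects (the limit of 3 is an
arbitrary choice):

	# fetch(name, url, feedfile)
	fetch() {
		curl -L --max-redirs 3 -H "User-Agent:" -f -s -m 15 \
			"$2" 2>/dev/null
	}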

- - -

Shellscript to update feeds in parallel more efficiently using xargs -P.

It creates a queue of the feeds with their settings, then uses xargs to process
them in parallel using the common, but non-POSIX, -P option. This is more
efficient than the more portable solution in sfeed_update, which can stall a
batch of $maxjobs in the queue if one item is slow.

sfeed_update_xargs shellscript:

	#!/bin/sh
	# update feeds, merge with old feeds using xargs in parallel mode (non-POSIX).

	# include script and reuse its functions, but do not start main().
	SFEED_UPDATE_INCLUDE="1" . sfeed_update
	# load config file, sets $config.
	loadconfig "$1"

	# process a single feed.
	# args are: config, tmpdir, name, feedurl, basesiteurl, encoding
	if [ "${SFEED_UPDATE_CHILD}" = "1" ]; then
		sfeedtmpdir="$2"
		_feed "$3" "$4" "$5" "$6"
		exit $?
	fi

	# ...else parent mode:

	# feed(name, feedurl, basesiteurl, encoding)
	feed() {
		# workaround: *BSD xargs doesn't handle empty fields in the middle.
		name="${1:-$$}"
		feedurl="${2:-http://}"
		basesiteurl="${3:-${feedurl}}"
		encoding="$4"

		printf '%s\0%s\0%s\0%s\0%s\0%s\0' "${config}" "${sfeedtmpdir}" \
			"${name}" "${feedurl}" "${basesiteurl}" "${encoding}"
	}

	# fetch feeds and store in temporary directory.
	sfeedtmpdir="$(mktemp -d '/tmp/sfeed_XXXXXX')"
	mkdir -p "${sfeedtmpdir}/feeds"
	touch "${sfeedtmpdir}/ok"
	# make sure path exists.
	mkdir -p "${sfeedpath}"
	# print feeds for parallel processing with xargs.
	feeds | SFEED_UPDATE_CHILD="1" xargs -r -0 -P "${maxjobs}" -L 6 "$(readlink -f "$0")"
	status=$?
	# check error exit status indicator for parallel jobs.
	test -f "${sfeedtmpdir}/ok" || status=1
	# cleanup temporary files etc.
	cleanup
	exit ${status}

- - -

Shellscript to handle URLs and enclosures in parallel using xargs -P.

This can be used to download and process URLs, for example to download podcasts,
webcomics and videos, or to download and convert webpages. It uses a plain-text
cache file for remembering processed URLs. The match patterns are defined in
the shellscript fetch() function and in the awk script and can be modified to
handle items differently depending on their context.

The arguments for the script are files in the sfeed(5) format. If no file
arguments are specified then the data is read from stdin.

	#!/bin/sh
	# sfeed_download: downloader for URLs and enclosures in sfeed(5) files.
	# Dependencies: awk, curl, flock, xargs (-P), youtube-dl.

	cachefile="${SFEED_CACHEFILE:-$HOME/.sfeed/downloaded_urls}"
	jobs="${SFEED_JOBS:-4}"
	lockfile="${HOME}/.sfeed/sfeed_download.lock"

	# log(feedname, s, status)
	log() {
		if [ "$1" != "-" ]; then
			s="[$1] $2"
		else
			s="$2"
		fi
		printf '[%s]: %s: %s\n' "$(date +'%H:%M:%S')" "${s}" "$3"
	}

	# fetch(url, feedname)
	fetch() {
		case "$1" in
		*youtube.com*)
			youtube-dl "$1";;
		*.flac|*.ogg|*.m3u|*.m3u8|*.m4a|*.mkv|*.mp3|*.mp4|*.wav|*.webm)
			# allow 2 redirects, hide User-Agent, connect timeout is 15 seconds.
			curl -O -L --max-redirs 2 -H "User-Agent:" -f -s --connect-timeout 15 "$1";;
		esac
	}

	# downloader(url, title, feedname)
	downloader() {
		url="$1"
		title="$2"
		feedname="${3##*/}"

		msg="${title}: ${url}"

		# download directory.
		if [ "${feedname}" != "-" ]; then
			mkdir -p "${feedname}"
			if ! cd "${feedname}"; then
				log "${feedname}" "${msg}: ${feedname}" "DIR FAIL" >&2
				return 1
			fi
		fi

		log "${feedname}" "${msg}" "START"
		if fetch "${url}" "${feedname}"; then
			log "${feedname}" "${msg}" "OK"

			# append it safely in parallel to the cachefile on a
			# successful download.
			(flock 9 || exit 1
			printf '%s\n' "${url}" >> "${cachefile}"
			) 9>"${lockfile}"
		else
			log "${feedname}" "${msg}" "FAIL" >&2
			return 1
		fi
		return 0
	}

	if [ "${SFEED_DOWNLOAD_CHILD}" = "1" ]; then
		# Downloader helper for parallel downloading.
		# Receives arguments: $1 = URL, $2 = title, $3 = feed filename or "-".
		# It should write the URI to the cachefile if it is successful.
		downloader "$1" "$2" "$3"
		exit $?
	fi

	# ...else parent mode:

	tmp=$(mktemp)
	trap "rm -f ${tmp}" EXIT

	[ -f "${cachefile}" ] || touch "${cachefile}"
	cat "${cachefile}" > "${tmp}"
	echo >> "${tmp}" # force it to have one line for awk.

	LC_ALL=C awk -F '\t' '
	# fast prefilter what to download or not.
	function filter(url, field, feedname) {
		u = tolower(url);
		return (match(u, "youtube\\.com") ||
			match(u, "\\.(flac|ogg|m3u|m3u8|m4a|mkv|mp3|mp4|wav|webm)$"));
	}
	function download(url, field, title, filename) {
		if (!length(url) || urls[url] || !filter(url, field, filename))
			return;
		# NUL-separated for xargs -0.
		printf("%s%c%s%c%s%c", url, 0, title, 0, filename, 0);
		urls[url] = 1; # print once
	}
	{
		FILENR += (FNR == 1);
	}
	# lookup table from cachefile which contains downloaded URLs.
	FILENR == 1 {
		urls[$0] = 1;
	}
	# feed file(s).
	FILENR != 1 {
		download($3, 3, $2, FILENAME); # link
		download($8, 8, $2, FILENAME); # enclosure
	}
	' "${tmp}" "${@:--}" | \
	SFEED_DOWNLOAD_CHILD="1" xargs -r -0 -L 3 -P "${jobs}" "$(readlink -f "$0")"

- - -

Shellscript to export existing newsboat cached items from sqlite3 to the sfeed
TSV format.

	#!/bin/sh
	# Export newsbeuter/newsboat cached items from sqlite3 to the sfeed TSV format.
	# The data is split per file per feed with the name of the newsboat title/url.
	# It writes the URLs of the read items line by line to a "urls" file.
	#
	# Dependencies: sqlite3, awk.
	#
	# Usage: create some directory to store the feeds, then run this script.

	# newsboat cache.db file.
	cachefile="$HOME/.newsboat/cache.db"
	test -n "$1" && cachefile="$1"

	# dump data.
	# .mode ascii: Columns/rows delimited by 0x1F and 0x1E
	# get the first fields in the order of the sfeed(5) format.
	sqlite3 "$cachefile" <<!EOF |
	.headers off
	.mode ascii
	.output
	SELECT
		i.pubDate, i.title, i.url, i.content, i.content_mime_type,
		i.guid, i.author, i.enclosure_url,
		f.rssurl AS rssurl, f.title AS feedtitle, i.unread
		-- i.id, i.enclosure_type, i.enqueued, i.flags, i.deleted, i.base
	FROM rss_feed f
	INNER JOIN rss_item i ON i.feedurl = f.rssurl
	ORDER BY
		i.feedurl ASC, i.pubDate DESC;
	.quit
	!EOF
	# convert to sfeed(5) TSV format.
	LC_ALL=C awk '
	BEGIN {
		FS = "\x1f";
		RS = "\x1e";
	}
	# normal non-content fields.
	function field(s) {
		gsub("^[[:space:]]*", "", s);
		gsub("[[:space:]]*$", "", s);
		gsub("[[:space:]]", " ", s);
		gsub("[[:cntrl:]]", "", s);
		return s;
	}
	# content field.
	function content(s) {
		gsub("^[[:space:]]*", "", s);
		gsub("[[:space:]]*$", "", s);
		# escape chars in content field.
		gsub("\\\\", "\\\\", s);
		gsub("\n", "\\n", s);
		gsub("\t", "\\t", s);
		return s;
	}
	function feedname(feedurl, feedtitle) {
		if (feedtitle == "") {
			gsub("/", "_", feedurl);
			return feedurl;
		}
		gsub("/", "_", feedtitle);
		return feedtitle;
	}
	{
		fname = feedname($9, $10);
		if (!feed[fname]++) {
			print "Writing file: \"" fname "\" (title: " $10 ", url: " $9 ")" > "/dev/stderr";
		}

		contenttype = field($5);
		if (contenttype == "")
			contenttype = "html";
		else if (index(contenttype, "/html") || index(contenttype, "/xhtml"))
			contenttype = "html";
		else
			contenttype = "plain";

		print $1 "\t" field($2) "\t" field($3) "\t" content($4) "\t" \
			contenttype "\t" field($6) "\t" field($7) "\t" field($8) "\t" \
			> fname;

		# write URLs of the read items to a file line by line.
		if ($11 == "0") {
			print $3 > "urls";
		}
	}'

- - -


Progress indicator
------------------

The below sfeed_update wrapper script counts the number of feeds in a sfeedrc
config. It then calls sfeed_update and pipes the output lines to a function
that counts the current progress. It writes the total progress to stderr.
Alternative: pv -l -s totallines

	#!/bin/sh
	# Progress indicator script.

	# Pass lines as input to stdin and write progress status to stderr.
	# progress(totallines)
	progress() {
		total="$(($1 + 0))" # must be a number, no divide by zero.
		test "${total}" -le 0 -o "$1" != "${total}" && return
		LC_ALL=C awk -v "total=${total}" '
		{
			counter++;
			percent = (counter * 100) / total;
			printf("\033[K") > "/dev/stderr"; # clear EOL
			print $0;
			printf("[%s/%s] %.0f%%\r", counter, total, percent) > "/dev/stderr";
			fflush(); # flush all buffers per line.
		}
		END {
			printf("\033[K") > "/dev/stderr";
		}'
	}

	# Counts the feeds from the sfeedrc config.
	countfeeds() {
		count=0
		. "$1"
		feed() {
			count=$((count + 1))
		}
		feeds
		echo "${count}"
	}

	config="${1:-$HOME/.sfeed/sfeedrc}"
	total=$(countfeeds "${config}")
	sfeed_update "${config}" 2>&1 | progress "${total}"

- - -

Counting unread and total items
-------------------------------

It can be useful to show the counts of unread items, for example in a
windowmanager or statusbar.

The below example script counts the items of the last day in the same way the
formatting tools do:

	#!/bin/sh
	# Count the new items of the last day.
	LC_ALL=C awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
	{
		total++;
	}
	int($1) >= old {
		totalnew++;
	}
	END {
		print "New: " totalnew;
		print "Total: " total;
	}' ~/.sfeed/feeds/*

The below example script counts the unread items using the sfeed_curses URL
file:

	#!/bin/sh
	# Count the unread and total items from feeds using the URL file.
	LC_ALL=C awk -F '\t' '
	# URL file: amount of fields is 1.
	NF == 1 {
		u[$0] = 1; # lookup table of URLs.
		next;
	}
	# feed file: check by URL or id.
	{
		total++;
		if (length($3)) {
			if (u[$3])
				read++;
		} else if (length($6)) {
			if (u[$6])
				read++;
		}
	}
	END {
		print "Unread: " (total - read);
		print "Total: " total;
	}' ~/.sfeed/urls ~/.sfeed/feeds/*

- - -


Running custom commands inside the sfeed_curses program
-------------------------------------------------------

Running commands inside the sfeed_curses program can be useful, for example to
sync items or to mark all items across all feeds as read. It can be convenient
to have a keybind for this inside the program to perform a scripted action and
then reload the feeds by sending the signal SIGHUP.

In the input handling code you can then add a case:

	case 'M':
		forkexec((char *[]) { "markallread.sh", NULL }, 0);
		break;

or

	case 'S':
		forkexec((char *[]) { "syncnews.sh", NULL }, 1);
		break;

The specified script should be in $PATH or be an absolute path.

Example of a `markallread.sh` shellscript to mark all URLs as read:

	#!/bin/sh
	# mark all items/URLs as read.
	tmp=$(mktemp)
	(cat ~/.sfeed/urls; cut -f 3 ~/.sfeed/feeds/*) | \
		awk '!x[$0]++' > "$tmp" &&
	mv "$tmp" ~/.sfeed/urls &&
	pkill -SIGHUP sfeed_curses # reload feeds.

Example of a `syncnews.sh` shellscript to update the feeds and reload them:

	#!/bin/sh
	sfeed_update
	pkill -SIGHUP sfeed_curses


Running programs in a new session
---------------------------------

By default processes are spawned in the same session and process group as
sfeed_curses. When sfeed_curses is closed this can also close the spawned
process in some cases.

When the setsid command-line program is available the following wrapper command
can be used to run the program in a new session, for a plumb program:

	setsid -f xdg-open "$@"

Alternatively the code can be changed to call setsid() before execvp().


Open an URL directly in the same terminal
-----------------------------------------

To open an URL directly in the same terminal using the text-mode lynx browser:

	SFEED_PLUMBER=lynx SFEED_PLUMBER_INTERACTIVE=1 sfeed_curses ~/.sfeed/feeds/*


Yank to tmux buffer
-------------------

This changes the yank command to set the tmux buffer, instead of X11 xclip:

	SFEED_YANKER="tmux set-buffer \`cat\`"


Known terminal issues
---------------------

Below are some bugs or missing features in terminals that were found while
testing sfeed_curses. Some of them might already be fixed upstream:

- cygwin + mintty: the xterm mouse-encoding of the mouse position is broken for
  scrolling.
- HaikuOS terminal: the xterm mouse-encoding of the mouse button number of the
  middle-button, right-button is incorrect / reversed.
- putty: the full reset attribute (ESC c, typically `rs1`) does not reset the
  window title.
- Mouse button encoding for extended buttons (like side-buttons) in some
  terminals is unsupported or maps to the same button: for example side-buttons
  7 and 8 map to the scroll buttons 4 and 5 in urxvt.


License
-------

ISC, see LICENSE file.


Author
------

Hiltjo Posthuma <hiltjo@codemadness.org>