---
title: trying to detect new content on tilde.club
date: 2022-09-13
---
I didn't post an update yesterday, but I tried parsing the gemini folders of
the users of this tilde server and wrote a small bash script that finds all
*.gmi files that have been updated recently.
I discarded home folders with the default gmi files and believe I found an interesting result.
Unfortunately it also discovers unpublished pages, such as drafts.
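
As a rough illustration of that first step, here is a minimal Python sketch (not the bash script mentioned above). The function name `recent_gmi_files`, the seven-day window and the `<home>/<user>/public_gemini` layout are assumptions made for the example. It lists every *.gmi file on disk, so it has the same weakness: drafts show up too.

```python
import time
from pathlib import Path


def recent_gmi_files(home_root, max_age_days=7, now=None):
    """Return (mtime, path) pairs for recently modified *.gmi files, newest first.

    Assumes each user keeps a capsule in <home_root>/<user>/public_gemini.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    found = []
    # "**" also matches zero directories, so files directly in public_gemini count.
    for path in Path(home_root).glob("*/public_gemini/**/*.gmi"):
        if not path.is_file():
            continue
        mtime = path.stat().st_mtime
        if mtime >= cutoff:
            found.append((mtime, path))
    return sorted(found, reverse=True)
```

Calling `recent_gmi_files("/home")` would walk every user's capsule. Filtering out the untouched default files is left out of this sketch.
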
So if I want to do something interesting, I have to discover links starting from
/home/<user>/public_gemini/index.gmi and follow them the way a search bot does.
I've switched to Python to write the algorithm, but it is more complex than it
looks. Identifying links in gemini syntax is easy, but I have to check whether
each link stays inside the user's base path.
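
The two parts of that problem can be sketched as follows. This is only an illustration under stated assumptions, not the actual script: `extract_links` and `resolve_local` are hypothetical names, and the sketch treats every link with a scheme, a host or a leading "/" as external, because mapping those back to a folder depends on how the server is configured.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def extract_links(gemtext):
    """Return the targets of all '=>' link lines, skipping preformatted blocks."""
    links = []
    preformatted = False
    for line in gemtext.splitlines():
        if line.startswith("```"):
            preformatted = not preformatted  # toggle line, never a link
            continue
        if preformatted or not line.startswith("=>"):
            continue
        parts = line[2:].split(maxsplit=1)  # target, then optional label
        if parts:
            links.append(parts[0])
    return links


def resolve_local(link, current_file, base_dir):
    """Return the file path a relative link points to, or None if it leaves base_dir."""
    parts = urlsplit(link)
    if parts.scheme or parts.netloc:
        return None  # assumption: absolute URLs are treated as external
    path = unquote(parts.path)
    if not path or path.startswith("/"):
        return None  # assumption: server-absolute paths are not mapped to disk
    target = posixpath.normpath(
        posixpath.join(posixpath.dirname(current_file), path)
    )
    base = posixpath.normpath(base_dir)
    if target == base or target.startswith(base + "/"):
        return target
    return None
```

Normalizing the joined path before comparing it with the base is what catches links such as `../../other/x.gmi`, which would otherwise escape the user's folder. A crawler would start from index.gmi, call both functions on each page, and keep a set of visited files to avoid loops.
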
Then, having all the links and their update times, I could generate a page with
the latest updates. Let's see if it is not too difficult or time consuming.
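
For the last step, a page of latest updates could be rendered roughly like this. It is a sketch only: `render_updates`, the page title, the 20-entry limit and the UTC date format are all assumptions, and the entries are expected to be (mtime, url) pairs produced by some earlier crawl.

```python
from datetime import datetime, timezone


def render_updates(entries, title="Latest updates", limit=20):
    """Build a gemtext page from (mtime, url) pairs, newest first."""
    lines = [f"# {title}", ""]
    for mtime, url in sorted(entries, reverse=True)[:limit]:
        day = datetime.fromtimestamp(mtime, tz=timezone.utc).strftime("%Y-%m-%d")
        # Gemtext link line: "=> <url> <label>"
        lines.append(f"=> {url} {day} {url}")
    return "\n".join(lines) + "\n"
```

The returned string can be written to a .gmi file as is.
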