Post about the ITSB NTSB custom feed

~lucidiot 2023-08-10 09:19:13 +02:00
parent 7d8082e782
commit 938f3a80b5
1 changed file with 19 additions and 0 deletions


@ -1260,5 +1260,24 @@
<p>As of posting this, the feeds unfortunately do not use <a href="https://www.rssboard.org/media-rss" target="_blank">Media RSS</a> or embed the image into the description as an <code>&lt;img&gt;</code> tag, so you will have to open each item in your browser to view the image with most feedreaders.</p>
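<p>For reference, here is roughly what either option looks like, as a minimal sketch built with Python's standard <code>xml.etree.ElementTree</code>. The <code>image_item</code> function and its parameters are hypothetical and not taken from any of these feeds; only the Media RSS namespace URI comes from the specification.</p>

```python
import xml.etree.ElementTree as ET

# Media RSS namespace URI, as given in the Media RSS specification.
MEDIA_NS = "http://search.yahoo.com/mrss/"
ET.register_namespace("media", MEDIA_NS)


def image_item(title: str, link: str, image_url: str, alt: str) -> ET.Element:
    """Hypothetical helper: build an RSS item exposing its image in two ways."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    # Option 1: a Media RSS element, which some feedreaders show as a thumbnail or attachment.
    ET.SubElement(item, f"{{{MEDIA_NS}}}content", url=image_url, medium="image")
    # Option 2: an img tag inside the HTML description, which most feedreaders display.
    img = ET.Element("img", src=image_url, alt=alt)
    # The description holds HTML as text; serializing the item escapes it, as RSS expects.
    ET.SubElement(item, "description").text = ET.tostring(img, encoding="unicode", method="html")
    return item
```

<p>Feedreaders that ignore the <code>media</code> namespace still get the image through the description, so emitting both is a cheap way to cover everyone.</p>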
]]></description>
</item>
<item>
<title>NTSB</title>
<pubDate>Thu, 10 Aug 2023 09:15:18 +0200</pubDate>
<guid isPermaLink="false">ntsb</guid>
<category domain="https://envs.net/~lucidiot/rsrsss/">Feed</category>
<link>https://tilde.town/~lucidiot/itsb/feeds/ntsb.xml</link>
<description><![CDATA[
<p>I have already mentioned <a href="https://tilde.town/~lucidiot/itsb/" target="_blank">ITSB</a>, my project that generates hundreds of feeds for transport accident investigation reports, multiple times in this feed. But this particular feed deserves a post of its own.</p>
<p>The NTSB is the one investigation agency I absolutely must have in ITSB. It might well be the largest transportation safety investigation agency in the world, and anyone who has ever watched a <em>Mayday</em> documentary or looked into plane crashes has heard of it. They publish more reports than any other agency I have found through ITSB.</p>
<p>Fortunately, they provided an official RSS feed for their released investigation reports. I'm using the past tense, though, because they unfortunately decided to shut it down. The feeds remained available for a little while, but they were completely empty. I have yet to see anyone sunset a feed properly, by publishing a post that warns subscribers a few days before killing the feed for good, so this issue went unnoticed for a while.</p>
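<p>Since a silently sunset feed keeps returning a perfectly valid document, the only way to catch it is to notice that it has gone empty. Here is a minimal sketch of such a check in Python, assuming the feed has already been downloaded as a string; the function names are hypothetical and not taken from ITSB.</p>

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def count_entries(feed_xml: str) -> int:
    """Count the items of an RSS 2.0 feed or the entries of an Atom feed."""
    root = ET.fromstring(feed_xml)
    # RSS 2.0 items live under rss/channel/item.
    rss_items = root.findall("./channel/item")
    # Atom entries are direct children of the namespaced feed element.
    atom_entries = root.findall(f"{{{ATOM_NS}}}entry")
    return len(rss_items) + len(atom_entries)


def looks_abandoned(feed_xml: str) -> bool:
    """Hypothetical check: flag a feed that still parses but has no items left."""
    return count_entries(feed_xml) == 0
```

<p>A brand new feed with no items yet would also be flagged, so a hit is a reason to take a closer look rather than proof that the feed is dead.</p>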
<p>To generate a feed when no official one is available, I usually just run <code>curl</code> on a webpage that lists investigation reports, use <a href="https://github.com/ericchiang/pup" target="_blank">pup</a> to select some HTML elements and convert them to a JSON structure, mess around with that JSON using <a href="https://stedolan.github.io/jq/" target="_blank">jq</a>, and finally turn the result into XML using <a href="https://pypi.org/project/xmltodict/" target="_blank">xmltodict</a>. But after looking around on the NTSB's website, I went for a much weirder method.</p>
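<p>The same scraping approach can be sketched with Python's standard library alone. This is not ITSB's code: <code>html.parser</code> stands in for pup, a plain function for the jq step and <code>xml.etree.ElementTree</code> for xmltodict, the download that curl handles is left out, and the <code>LinkCollector</code> class, the function names and the idea of selecting report links by CSS class are all hypothetical.</p>

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href and text of every a element with a given CSS class (pup's role)."""

    def __init__(self, css_class: str) -> None:
        super().__init__()
        self.css_class = css_class  # hypothetical: the class that marks report links
        self.links: list[dict] = []
        self._current: dict | None = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "a" and self.css_class in classes:
            self._current = {"href": attributes.get("href") or "", "text": ""}

    def handle_data(self, data):
        # Accumulate the link text, including text inside nested tags.
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["text"] = " ".join(self._current["text"].split())
            self.links.append(self._current)
            self._current = None


def to_items(links: list[dict]) -> list[dict]:
    """Reshape the scraped records into RSS item fields (jq's role)."""
    return [{"title": link["text"], "link": link["href"], "guid": link["href"]} for link in links]


def to_rss(title: str, site: str, items: list[dict]) -> str:
    """Serialize the items as an RSS 2.0 document (xmltodict's role)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = site
    ET.SubElement(channel, "description").text = title
    for fields in items:
        item = ET.SubElement(channel, "item")
        for name, value in fields.items():
            ET.SubElement(item, name).text = value
    return ET.tostring(rss, encoding="unicode")


def html_to_rss(html: str, css_class: str, title: str, site: str) -> str:
    """Run the whole pipeline on an already-downloaded page (curl's part is left out)."""
    collector = LinkCollector(css_class)
    collector.feed(html)
    collector.close()
    return to_rss(title, site, to_items(collector.links))
```

<p>For example, <code>html_to_rss(page, "report", "Reports", "https://example.org/")</code> would turn every <code>a</code> element with the class <code>report</code> into one RSS item whose title is the link text.</p>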
<p>The NTSB provides a service called <a href="https://data.ntsb.gov/carol-main-public" target="_blank">CAROL</a>, a tool to search through all the investigation reports and safety recommendations the NTSB has ever published. Getting lots of structured data sounded far more interesting than parsing the scant details available in unnecessarily complex HTML pages, so I wanted to use CAROL as the source for my custom feed.</p>
<p>After a lot of experimenting, I ended up writing <a href="https://tildegit.org/lucidiot/itsb/src/branch/main/bin/ntsb-carol" target="_blank">a separate script</a> that exports one year of completed investigation reports as a large JSON file. I could have exported ten or more years of reports, but that produced an RSS feed so large that it would make most feedreaders blow up, so I settled on one year.</p>
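<p>The post only says that the export covers one year. As an illustration of that size trade-off, here is a sketch that applies a one-year cut-off to records that have already been downloaded, which may well differ from how the actual script limits its export. The <code>completed</code> field name and its ISO 8601 date format are assumptions, not CAROL's real schema.</p>

```python
from datetime import date, timedelta


def within_last_year(records: list[dict], today: date, field: str = "completed") -> list[dict]:
    """Keep only records whose completion date falls within the past 365 days.

    The field name is a hypothetical placeholder; the real CAROL export may differ.
    """
    cutoff = today - timedelta(days=365)
    kept = []
    for record in records:
        value = record.get(field)
        if not value:
            continue  # skip records without a completion date
        # Only the YYYY-MM-DD part matters, so timestamps are truncated to a date.
        if date.fromisoformat(value[:10]) >= cutoff:
            kept.append(record)
    return kept
```

<p>Widening the window is a single parameter change, which makes it easy to see how quickly the resulting feed grows.</p>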
<p>I then use <a href="https://tildegit.org/lucidiot/itsb/src/branch/main/jq/ntsb.jq" target="_blank">a 671-line jq script</a> to process this JSON file into an RSS feed, putting as much information as I can into the <code>&lt;description&gt;</code> so that you sometimes do not need to read the PDF report at all.</p>
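<p>To give an idea of what packing details into the <code>&lt;description&gt;</code> involves, here is a much smaller Python sketch of the same idea, not a translation of the jq script. The record fields (<code>id</code>, <code>title</code>, <code>reportUrl</code>, <code>eventDate</code>, <code>location</code>, <code>vehicle</code>, <code>probableCause</code>) are hypothetical placeholders rather than CAROL's actual field names.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical record fields and their labels; CAROL's export and ITSB's jq script use their own.
DETAILS = [
    ("eventDate", "Event date"),
    ("location", "Location"),
    ("vehicle", "Vehicle"),
    ("probableCause", "Probable cause"),
]


def description_html(record: dict) -> str:
    """Render every available detail of a record as an HTML definition list."""
    details = ET.Element("dl")
    for key, label in DETAILS:
        value = record.get(key)
        if not value:
            continue  # leave out missing fields instead of printing empty rows
        ET.SubElement(details, "dt").text = label
        ET.SubElement(details, "dd").text = str(value)
    return ET.tostring(details, encoding="unicode", method="html")


def record_to_item(record: dict) -> ET.Element:
    """Turn one (hypothetical) report record into an RSS item."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = record["title"]
    ET.SubElement(item, "link").text = record["reportUrl"]
    ET.SubElement(item, "guid", isPermaLink="false").text = record["id"]
    # The description holds HTML as text; ElementTree escapes it when the item is serialized.
    ET.SubElement(item, "description").text = description_html(record)
    return item
```

<p>Building the HTML with ElementTree instead of string concatenation means a value such as an ampersand in a probable cause gets escaped correctly, and serializing the item escapes the whole description once more, as RSS expects.</p>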
<p>This mess results in a feed that is far, far better than any other feed in ITSB, especially the official ones. If every webmaster wants to replace RSS with newsletters, which is what I gathered from my few attempts at reaching out to these agencies, maybe the real solution is to push for more open data instead. Let the people who know and use RSS make proper RSS feeds without having to scrape your website…</p>
]]></description>
</item>
</channel>
</rss>