ATSB feed requires rewrite #119

Closed
opened 2022-11-19 07:25:55 +00:00 by lucidiot · 1 comment
Owner

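The [safety investigation reports](https://www.atsb.gov.au/publications/safety-investigation-reports/?s=1&sort=OccurrenceDate&sortAscending=descending&investigationStatus=&occurrenceClass=&typeOfOperation=&initialTab=&investigationStatus=Completed,Discontinued) page that the ATSB custom feed was using has been removed in favor of three separate report pages: [aviation](https://www.atsb.gov.au/aviation-investigation-reports), [rail](https://www.atsb.gov.au/rail-investigation-reports) and [marine](https://www.atsb.gov.au/marine-investigation-reports). However, both the marine and rail investigation pages allow removing the "mode of transportation" filter with `?field_mode_of_transport_target_id=All`. We should therefore be able to rewrite the feed and also add a new feed for each mode of transport.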
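A minimal sketch of how the rewritten feeds could build their listing URLs. `atsb_report_urls` is a hypothetical helper, not part of the existing feed code. The sketch makes two assumptions. First, that `163` and `168` are the completed/discontinued status IDs used in the `curl` command in the comment below. Second, that each report page shows only its own mode of transport by default. It expands the `{163,168}` glob into one explicit URL per status, which mirrors what curl's URL globbing does. It also refuses `all` on the aviation page, because only the marine and rail pages are known to accept the override.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints one listing URL per investigation status.
# Assumptions: 163 and 168 are the status IDs from the curl command in this
# issue, and each page shows only its own mode of transport by default.
atsb_report_urls() {
  local page="$1"          # aviation, rail or marine
  local modes="${2:-own}"  # "all" removes the mode of transport filter
  local base="https://www.atsb.gov.au/${page}-investigation-reports"
  local status

  case "$page" in
    aviation | rail | marine) ;;
    *) echo "unknown report page: $page" >&2; return 1 ;;
  esac

  # Only the marine and rail pages are known to accept the "All" override.
  if [ "$modes" = all ] && [ "$page" = aviation ]; then
    echo "field_mode_of_transport_target_id=All is untested on $page" >&2
    return 1
  fi

  # Expand the {163,168} glob explicitly: one URL per status ID.
  for status in 163 168; do
    if [ "$modes" = all ]; then
      printf '%s?field_investigation_status_target_id=%s&field_mode_of_transport_target_id=All\n' "$base" "$status"
    else
      printf '%s?field_investigation_status_target_id=%s\n' "$base" "$status"
    fi
  done
}
```

For example, `atsb_report_urls rail all` would give the URLs for a combined feed, and `atsb_report_urls aviation` those for an aviation-only feed.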

Note that the ATSB website now sits behind AkamaiGHost. From the very little documentation I can find online, it appears to just be a CDN. From testing with `curl`, the site seems to only work over HTTP/2, and it returns HTTP 403 if any of the following conditions are met:

  • `Accept-Encoding` header is not set
  • `Accept-Language` header does not match the case-sensitive regex `^[a-z][a-z]([-,]|$)`
  • `Cache-Control` header is not set
  • `User-Agent` header is not set, or does not contain a slash
  • `User-Agent` header is set to `Chrome/` or `Firefox/` followed by a version number that is missing a minor version, such as `Chrome/100` (any number of components after the minor version is accepted, such as `100.0.0.0.0.0`)
  • `User-Agent` header is set to `Chrome/` followed by a version number below 89.0
  • `User-Agent` header is set to `Firefox/` followed by a version number below 71.0
  • `User-Agent` header contains `wget` or `curl`
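These 403 conditions are empirical, so it may help to write them down as code that the feed's request headers can be checked against. `atsb_headers_verdict` is a hypothetical bash function, not an existing tool. It encodes only the conditions listed above and treats an empty argument as a missing header. It does not model the HTTP/2 requirement. It also only inspects the first `Chrome/` or `Firefox/` token in the User-Agent, which is a simplifying assumption.

```bash
#!/usr/bin/env bash
# Hypothetical checker for the HTTP 403 conditions observed on the ATSB site.
# The rules come from testing, not from Akamai or ATSB documentation.
# Arguments: Accept-Encoding, Accept-Language, Cache-Control, User-Agent.
# An empty argument stands for a missing header. HTTP/2 is not checked.
atsb_headers_verdict() {
  local accept_encoding="$1" accept_language="$2" cache_control="$3" ua="$4"
  local version_re='(Chrome|Firefox)/([[:digit:]]+)(\.[[:digit:]]+)?'
  local browser major minor

  # Accept-Encoding and Cache-Control only need to be present.
  if [ -z "$accept_encoding" ] || [ -z "$cache_control" ]; then
    echo blocked
    return 0
  fi

  # Case-sensitive check; LC_ALL=C keeps [a-z] limited to ASCII lowercase.
  if ! printf '%s' "$accept_language" | LC_ALL=C grep -Eq '^[a-z][a-z]([-,]|$)'; then
    echo blocked
    return 0
  fi

  # The User-Agent needs a slash and must not mention wget or curl.
  if [[ "$ua" != */* || "$ua" == *wget* || "$ua" == *curl* ]]; then
    echo blocked
    return 0
  fi

  # First Chrome/ or Firefox/ token: needs a minor part and a minimum major.
  if [[ "$ua" =~ $version_re ]]; then
    browser="${BASH_REMATCH[1]}"
    major=$((10#${BASH_REMATCH[2]}))
    minor="${BASH_REMATCH[3]}"
    if [ -z "$minor" ] ||
      { [ "$browser" = Chrome ] && ((major < 89)); } ||
      { [ "$browser" = Firefox ] && ((major < 71)); }; then
      echo blocked
      return 0
    fi
  fi

  echo allowed
}
```

For example, the dummy headers used in the `curl` command in the comment below (`lol`, `zz,`, `lol`, `a/1`) come out as `allowed`. A User-Agent such as `Mozilla/5.0 Chrome/100 Safari/537.36` comes out as `blocked` because it has no minor version.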
lucidiot added the bug and feed labels 2022-11-19 07:25:55 +00:00
lucidiot self-assigned this 2022-11-19 07:25:55 +00:00
Author
Owner

A `curl` + `pup` command that could be used to fetch both the discontinued and the completed investigations, just like the current feed does:

```
curl 'https://www.atsb.gov.au/marine-investigation-reports?field_investigation_status_target_id={163,168}&field_mode_of_transport_target_id=All' \
  -H 'Accept-Encoding: lol' \
  -H 'User-Agent: a/1' \
  -H 'Accept-Language: zz,' \
  -H 'Cache-Control: lol' | pup 'table.views-table tbody tr'
```