A public domain ebook catalog scraper

bookscrape

A scraper for Standard Ebooks, Project Gutenberg, and Global Grey Ebooks. It produces JSON output listing books in a format compatible with libman. The goal is a searchable but modular ebook manager (like a package manager, but for ebooks and their sources). That said, the JSON documents produced are flexible enough to be ingested and used by any number of other systems that wish to use these book catalogs.
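For illustration, a record in the output might look something like the following. The field names here are hypothetical (the actual schema is defined in types.go and by libman); this only sketches the general shape of a per-book JSON entry:

```json
[
  {
    "title": "Frankenstein",
    "author": "Mary Shelley",
    "source": "pg",
    "url": "https://www.gutenberg.org/ebooks/84"
  }
]
```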

Building

go build

or

go install

Running

bookscrape -se # fetch standard ebooks
bookscrape -gg # fetch global grey
bookscrape -pg # fetch project gutenberg
# There is also a convenient `-all` flag to do all of the above in one command

Each command produces its own JSON file (even when -all is used). The file sizes vary: Gutenberg's is the largest, since their catalog is many times larger than the other two combined. However, Gutenberg is also the fastest to build, since their website does not need to be crawled and scraped: they provide a CSV file, which this program ingests and converts into the much larger JSON file.