A public domain ebook catalog scraper

bookscrape

A scraper for Standard Ebooks, Project Gutenberg, and Global Grey Ebooks. It produces JSON output listing books in a format compatible with libman. The goal is a searchable but modular ebook manager (like a package manager, but for ebooks and their sources). That said, the JSON documents produced are flexible enough to be ingested and used by any number of other systems that wish to use these book catalogs.
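For illustration, a record in the output might look something like the following. The field names here are hypothetical (the actual schema is defined in types.go and by libman); this only sketches the general shape of a per-book JSON entry:

```json
[
  {
    "title": "Frankenstein",
    "author": "Mary Shelley",
    "source": "pg",
    "url": "https://www.gutenberg.org/ebooks/84"
  }
]
```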

Building

go build

or

go install

Running

bookscrape -se # fetch standard ebooks
bookscrape -gg # fetch global grey
bookscrape -pg # fetch project gutenberg
# There is also a convenient `-all` flag to do all of the above in one command

Each command produces its own JSON file (even when -all is used). The file sizes vary: Gutenberg's is the largest, since their catalog is many times larger than the other two combined. However, Gutenberg is also the fastest to build, since their website does not need to be crawled and scraped: they provide a CSV file, which this program ingests and converts into the much larger JSON file.