
# bookscrape

A scraper for Standard Ebooks, Project Gutenberg, and Global Grey Ebooks. It produces JSON output listing books in a format compatible with [libman](https://tildegit.org/sloum/libman). The goal is a searchable but modular ebook manager (like a package manager, but for ebooks and their sources). That said, the JSON documents are flexible enough to be ingested by any other system that wants to use these book catalogs.

## Building

```sh
go build
```

or

```sh
go install
```

## Running

```sh
bookscrape -se # fetch standard ebooks
bookscrape -gg # fetch global grey
bookscrape -pg # fetch project gutenberg
# There is also a convenient `-all` flag to do all of the above in one command
```

Each source produces its own JSON file (even when `-all` is used). The sizes vary. Gutenberg's file is the largest, since its catalog is many times larger than the other two combined. However, Gutenberg is also the fastest to build, since its website does not need to be crawled and scraped: it provides a CSV file, which this program ingests and converts into the much larger JSON file.

Initial commit 2024-03-27 02:51:53 +00:00
