Represent the Gutenberg catalog more faithfully...

Added a copy of the catalog to this distribution, after the project's
confirmation that it's in the public domain.  Added the Gutenberg identifier
to the book table.  Increased the lengths of book.title and author.name
to load a larger proportion of the catalog.  The tests now use bogus
records, which they clean up after themselves.
barnold 2022-09-25 15:05:57 +01:00
parent 2e614f7db4
commit 8ccc2a80c4
12 changed files with 81317 additions and 55 deletions

INSTALL

@@ -4,17 +4,22 @@
psql
it connects and lets you run SQL. Then run
it connects and lets you run SQL. To create the tables, run
make -f Makefile.orig create-tables
make -f Makefile.orig test
1. To install the perl modules, run
To load (most of) the Gutenberg catalog, run
make -f Makefile.orig load-gutenberg
1. To install the perl modules, run (for example)
perl Makefile.PL INSTALL_BASE=~/.local
make test
make install
2. To use these modules from your program,
2. To use these modules from your program, assuming the INSTALL_BASE
suggested above,
declare -x PERL5LIB="$HOME/.local/lib/perl5"

Makefile.orig

@@ -1,7 +1,7 @@
.PHONY: drop-tables create-tables delete-rows test
.PHONY: drop-tables create-tables delete-rows load-gutenberg
drop-tables:
psql --file="drop-tables.sql"
psql --command="DROP TABLE IF EXISTS book, author;"
create-tables: drop-tables
psql --file="create-tables.sql"
@@ -9,5 +9,5 @@ create-tables: drop-tables
delete-rows:
psql --command="TRUNCATE TABLE book, author;"
test: delete-rows
prove -v -Ilib
load-gutenberg: delete-rows
bin/load-gutenberg.pl pg_catalog.csv

README

@@ -1,16 +1,7 @@
This is to help me get acquainted with DBIx::Class, based on the
tutorial at
ABOUT
<https://metacpan.org/release/JROBINSON/DBIx-Class-Tutorial-0.0001/view/lib/DBIx/Class/Tutorial/Part1.pod#Getting-data>
plus a little testing using Test2::Suite.
Typical usage in development after making changes is
prove -l
which clears out the database tables early in the tests, leaving test
rows in place when it's finished.
This is a small project using DBIx::Class and Test2::Suite to provide
some of the gutenberg.org catalog in a relational database (PostgreSQL).
A script creates the tables with SQL; I didn't attempt to use DBIx for
that as yet. The SQL and the make-schema script are for postgres
@@ -18,11 +9,17 @@ since that was expedient for me. It also assumes the simplest case
where you have a default database that postgres will connect you to if
you don't name one.
The tests run by prove leave a handful of rows in the tables. For a
larger number, suitable for demonstrating paging or searching, there
is a script in bin/. The load-gutenberg script loads from the catalog
file available at gutenberg.org, providing thousands of authors and
book titles. See the script comments for details.
The script bin/load-gutenberg.pl loads from the catalog file available
from gutenberg.org, providing thousands of authors and book titles.
Run it without arguments for help.
COPYING
A copy of the Gutenberg catalog is included as pg_catalog.csv. The
Gutenberg project releases its catalogs into the public domain.
Otherwise, you may redistribute under the same terms as perl itself or
under the GPL V3 or a later version, at your option.
barnold <barnold@tilde.club>
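For a sense of what a full load provides, the joined tables can be queried
directly. A minimal sketch in Perl/DBI, assuming the default database
described above and the columns defined in create-tables.sql; it simply
prints a few catalog rows:

    use strict;
    use warnings;
    use DBI;

    # Connect to the default postgres database, as the README assumes.
    my $dbh = DBI->connect('dbi:Pg:', undef, undef, { RaiseError => 1 });

    # A few loaded rows: Gutenberg id, title, author.
    my $rows = $dbh->selectall_arrayref(q{
        SELECT b.gutenberg_id, b.title, a.name
        FROM book AS b JOIN author AS a ON a.id = b.author_id
        ORDER BY b.gutenberg_id
        LIMIT 5
    });
    printf("%6d  %-40s  %s\n", @$_) for @$rows;
    $dbh->disconnect;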

bin/load-gutenberg.pl

@@ -43,7 +43,7 @@ my $dbh_csv = DBI->connect(
$fname => {
col_names => [
qw(
id type issued title lang authors subjects locc bookshelves
pgid type issued title lang authors subjects locc bookshelves
)
],
}
@@ -75,6 +75,7 @@ $dbh_pg->do('TRUNCATE TABLE book, author');
# to creating a function to load a book-author pair.
$dbh_pg->do(<<EOF1);
CREATE OR REPLACE FUNCTION addbook (
IN i_gutenberg_id book.gutenberg_id\%TYPE,
IN i_title book.title\%TYPE,
IN i_auth author.name\%TYPE
) RETURNS book.id\%TYPE
@@ -84,9 +85,9 @@ DECLARE
BEGIN
INSERT INTO author AS a (name) VALUES (i_auth)
ON CONFLICT (name) DO NOTHING;
INSERT INTO book AS b (title, author_id)
SELECT i_title, id FROM author WHERE name = i_auth
ON CONFLICT (title) DO NOTHING
INSERT INTO book AS b (gutenberg_id, title, author_id)
SELECT i_gutenberg_id, i_title, id FROM author WHERE name = i_auth
ON CONFLICT DO NOTHING
RETURNING b.id INTO bid;
RETURN bid;
END;\$\$
@@ -94,29 +95,37 @@ LANGUAGE plpgsql
EOF1
# Statements for reading a row from the catalog and loading it into postgres.
my $sth_pg = $dbh_pg->prepare('SELECT addbook(?, ?)');
my $sth_csv = $dbh_csv->prepare('SELECT title, authors FROM pg_catalog');
my $sth_pg = $dbh_pg->prepare('SELECT addbook(?, ?, ?)');
my $sth_csv = $dbh_csv->prepare('SELECT pgid, title, authors FROM pg_catalog');
$sth_csv->execute();
# The load itself.
my ($rowcount, $loadcount) = (0, 0);
STDOUT->autoflush(1);
print("Dots should now appear as loading progresses: ");
print("Dots should now appear, per thousand books: ");
while (my $row = $sth_csv->fetchrow_arrayref) {
$rowcount++;
next if (1 == $rowcount); # It ignores skip_first_row so DIY.
my ($title, $auth) = @$row;
next if (length($title) > 30 || length($auth) > 30);
next if (length($title) < 1 || length($auth) < 1);
$sth_pg->execute($title, $auth);
$loadcount++;
print(".") if (0 == $loadcount % 1000);
my ($pgid, $title, $auth) = @$row;
# Some titles contain a carriage-return.
$title =~ s/\r//g;
# Some books have no author.
$auth = "[no author]" if (0 == length($auth));
# Discard what we deem excessively long.
if (length($title) > 128 || length($auth) > 64) {
next;
}
$sth_pg->execute($pgid, $title, $auth);
my ($bid) = $sth_pg->fetchrow_array;
if (defined $bid) {
print(".") if (0 == ++$loadcount % 1000);
}
}
printf("\n%d books loaded.\n", $loadcount);
printf("\n%d books loaded from %d records.\n", $loadcount, $rowcount);
# Cleanup.
$sth_pg->finish;
$dbh_pg->do('DROP FUNCTION addbook (VARCHAR, VARCHAR)');
$dbh_pg->do('DROP FUNCTION addbook (INTEGER, VARCHAR, VARCHAR)');
$dbh_pg->commit;
$dbh_pg->disconnect;
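A note on the defined($bid) test in the loop above: addbook inserts with
ON CONFLICT DO NOTHING, so a duplicate gutenberg_id suppresses the insert
and the RETURNING clause produces no row, leaving the function result NULL.
A small sketch of that behaviour, reusing the script's $dbh_pg handle before
the final DROP FUNCTION; the sample values are illustrative only:

    my $sth_demo = $dbh_pg->prepare('SELECT addbook(?, ?, ?)');
    for my $attempt (1 .. 2) {
        $sth_demo->execute(11, "Alice's Adventures in Wonderland", "Carroll, Lewis");
        my ($bid) = $sth_demo->fetchrow_array;
        # First pass inserts and returns the new book.id; the second hits
        # the gutenberg_id conflict and returns NULL instead.
        printf("attempt %d: %s\n", $attempt,
            defined $bid ? "book id $bid" : "skipped as a duplicate");
    }
    $sth_demo->finish;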

create-tables.sql

@@ -1,12 +1,23 @@
CREATE TABLE author
( id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
when_created TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
name VARCHAR(30) NOT NULL CONSTRAINT author_name_unique UNIQUE
name VARCHAR(64) NOT NULL CONSTRAINT author_name_unique UNIQUE
);
CREATE TABLE book
( id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
author_id INTEGER NOT NULL CONSTRAINT book_author_fk REFERENCES author (id),
gutenberg_id INTEGER NOT NULL CONSTRAINT book_gutid_unique UNIQUE,
when_created TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
title VARCHAR(30) NOT NULL CONSTRAINT book_title_unique UNIQUE
title VARCHAR(128) NOT NULL
);
COMMENT ON TABLE book IS 'A book present in the catalog of Project Gutenberg,
<https://www.gutenberg.org/>. Book titles are intentionally not unique,
not even within a given author (see for example "Paradise Lost" by John
Milton). In the catalog some titles have no author but for this table
the foreign key to author is required, so there is a bogus "[no author]"
row in the author table.';
CREATE INDEX book_authid ON book (author_id);
CREATE INDEX book_title ON book (title);
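Since title is deliberately not unique, gutenberg_id is the natural handle on
a catalog entry (book_gutid_unique is the only unique key besides the primary
key). A lookup by that key, sketched with DBI against the default database;
the id 11 is only an example value:

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:', undef, undef, { RaiseError => 1 });
    # Fetch one catalog entry and its author by Gutenberg id.
    my ($title, $name) = $dbh->selectrow_array(q{
        SELECT b.title, a.name
        FROM book AS b JOIN author AS a ON a.id = b.author_id
        WHERE b.gutenberg_id = ?
    }, undef, 11);
    print defined $title ? "$title / $name\n" : "no such catalog entry\n";
    $dbh->disconnect;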

drop-tables.sql

@@ -1,2 +0,0 @@
DROP TABLE IF EXISTS book;
DROP TABLE IF EXISTS author;

lib/Book/Schema/Result/Author.pm

@@ -35,6 +35,129 @@ __PACKAGE__->table("author");
=head1 ACCESSORS
=head2 id
data_type: 'integer'
is_nullable: 0
=head2 when_created
data_type: 'timestamp with time zone'
default_value: current_timestamp
is_nullable: 0
=head2 name
data_type: 'varchar'
is_nullable: 0
size: 64
=cut
__PACKAGE__->add_columns(
"id",
{ data_type => "integer", is_nullable => 0 },
"when_created",
{
data_type => "timestamp with time zone",
default_value => \"current_timestamp",
is_nullable => 0,
},
"name",
{ data_type => "varchar", is_nullable => 0, size => 64 },
);
=head1 PRIMARY KEY
=over 4
=item * L</id>
=back
=cut
__PACKAGE__->set_primary_key("id");
=head1 UNIQUE CONSTRAINTS
=head2 C<author_name_unique>
=over 4
=item * L</name>
=back
=cut
__PACKAGE__->add_unique_constraint("author_name_unique", ["name"]);
=head1 RELATIONS
=head2 books
Type: has_many
Related object: L<Book::Schema::Result::Book>
=cut
__PACKAGE__->has_many(
"books",
"Book::Schema::Result::Book",
{ "foreign.author_id" => "self.id" },
{ cascade_copy => 0, cascade_delete => 0 },
);
# Created by DBIx::Class::Schema::Loader v0.07049 @ 2022-09-24 10:45:01
# DO NOT MODIFY THIS OR ANYTHING ABOVE! md5sum:shJXsxbKrxkxGu7bwRS+rQ
# These lines were loaded from '/home/nick/.local/lib/perl5/Book/Schema/Result/Author.pm' found in @INC.
# They are now part of the custom portion of this file
# for you to hand-edit. If you do not either delete
# this section or remove that file from @INC, this section
# will be repeated redundantly when you re-create this
# file again via Loader! See skip_load_external to disable
# this feature.
use utf8;
package Book::Schema::Result::Author;
# Created by DBIx::Class::Schema::Loader
# DO NOT MODIFY THE FIRST PART OF THIS FILE
=head1 NAME
Book::Schema::Result::Author
=cut
use strict;
use warnings;
use base 'DBIx::Class::Core';
=head1 COMPONENTS LOADED
=over 4
=item * L<DBIx::Class::InflateColumn::DateTime>
=back
=cut
__PACKAGE__->load_components("InflateColumn::DateTime");
=head1 TABLE: C<author>
=cut
__PACKAGE__->table("author");
=head1 ACCESSORS
=head2 id
data_type: 'integer'
@@ -115,3 +238,132 @@ __PACKAGE__->has_many(
# DO NOT MODIFY THIS OR ANYTHING ABOVE! md5sum:Qk0Z7HCPs/wL7PMWKSIk0w
1;
# End of lines loaded from '/home/nick/.local/lib/perl5/Book/Schema/Result/Author.pm'
# These lines were loaded from '/home/nick/.local/lib/perl5/Book/Schema/Result/Author.pm' found in @INC.
# They are now part of the custom portion of this file
# for you to hand-edit. If you do not either delete
# this section or remove that file from @INC, this section
# will be repeated redundantly when you re-create this
# file again via Loader! See skip_load_external to disable
# this feature.
use utf8;
package Book::Schema::Result::Author;
# Created by DBIx::Class::Schema::Loader
# DO NOT MODIFY THE FIRST PART OF THIS FILE
=head1 NAME
Book::Schema::Result::Author
=cut
use strict;
use warnings;
use base 'DBIx::Class::Core';
=head1 COMPONENTS LOADED
=over 4
=item * L<DBIx::Class::InflateColumn::DateTime>
=back
=cut
__PACKAGE__->load_components("InflateColumn::DateTime");
=head1 TABLE: C<author>
=cut
__PACKAGE__->table("author");
=head1 ACCESSORS
=head2 id
data_type: 'integer'
is_nullable: 0
=head2 when_created
data_type: 'timestamp with time zone'
default_value: current_timestamp
is_nullable: 0
=head2 name
data_type: 'varchar'
is_nullable: 0
size: 30
=cut
__PACKAGE__->add_columns(
"id",
{ data_type => "integer", is_nullable => 0 },
"when_created",
{
data_type => "timestamp with time zone",
default_value => \"current_timestamp",
is_nullable => 0,
},
"name",
{ data_type => "varchar", is_nullable => 0, size => 30 },
);
=head1 PRIMARY KEY
=over 4
=item * L</id>
=back
=cut
__PACKAGE__->set_primary_key("id");
=head1 UNIQUE CONSTRAINTS
=head2 C<author_name_unique>
=over 4
=item * L</name>
=back
=cut
__PACKAGE__->add_unique_constraint("author_name_unique", ["name"]);
=head1 RELATIONS
=head2 books
Type: has_many
Related object: L<Book::Schema::Result::Book>
=cut
__PACKAGE__->has_many(
"books",
"Book::Schema::Result::Book",
{ "foreign.author_id" => "self.id" },
{ cascade_copy => 0, cascade_delete => 0 },
);
# Created by DBIx::Class::Schema::Loader v0.07049 @ 2022-09-18 10:43:18
# DO NOT MODIFY THIS OR ANYTHING ABOVE! md5sum:Qk0Z7HCPs/wL7PMWKSIk0w
1;
# End of lines loaded from '/home/nick/.local/lib/perl5/Book/Schema/Result/Author.pm'
1;
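The has_many relation above gives each Author row direct access to its
books. A minimal usage sketch; it assumes a Book::Schema schema class
alongside these Result classes (only the Result classes appear in this
commit) and the default database:

    use strict;
    use warnings;
    use Book::Schema;

    my $schema = Book::Schema->connect('dbi:Pg:');
    # "[no author]" is the placeholder row described in the book table comment.
    my $anon = $schema->resultset('Author')->find(
        { name => '[no author]' },
        { key  => 'author_name_unique' },
    );
    printf("%d books with no recorded author\n", $anon->books_rs->count)
        if $anon;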

lib/Book/Schema/Result/Book.pm

@@ -35,6 +35,144 @@ __PACKAGE__->table("book");
=head1 ACCESSORS
=head2 id
data_type: 'integer'
is_nullable: 0
=head2 author_id
data_type: 'integer'
is_foreign_key: 1
is_nullable: 0
=head2 gutenberg_id
data_type: 'integer'
is_nullable: 0
=head2 when_created
data_type: 'timestamp with time zone'
default_value: current_timestamp
is_nullable: 0
=head2 title
data_type: 'varchar'
is_nullable: 0
size: 128
=cut
__PACKAGE__->add_columns(
"id",
{ data_type => "integer", is_nullable => 0 },
"author_id",
{ data_type => "integer", is_foreign_key => 1, is_nullable => 0 },
"gutenberg_id",
{ data_type => "integer", is_nullable => 0 },
"when_created",
{
data_type => "timestamp with time zone",
default_value => \"current_timestamp",
is_nullable => 0,
},
"title",
{ data_type => "varchar", is_nullable => 0, size => 128 },
);
=head1 PRIMARY KEY
=over 4
=item * L</id>
=back
=cut
__PACKAGE__->set_primary_key("id");
=head1 UNIQUE CONSTRAINTS
=head2 C<book_gutid_unique>
=over 4
=item * L</gutenberg_id>
=back
=cut
__PACKAGE__->add_unique_constraint("book_gutid_unique", ["gutenberg_id"]);
=head1 RELATIONS
=head2 author
Type: belongs_to
Related object: L<Book::Schema::Result::Author>
=cut
__PACKAGE__->belongs_to(
"author",
"Book::Schema::Result::Author",
{ id => "author_id" },
{ is_deferrable => 0, on_delete => "NO ACTION", on_update => "NO ACTION" },
);
# Created by DBIx::Class::Schema::Loader v0.07049 @ 2022-09-24 10:45:01
# DO NOT MODIFY THIS OR ANYTHING ABOVE! md5sum:jn2J3I4s9utgM2u0qwDp3Q
# These lines were loaded from '/home/nick/.local/lib/perl5/Book/Schema/Result/Book.pm' found in @INC.
# They are now part of the custom portion of this file
# for you to hand-edit. If you do not either delete
# this section or remove that file from @INC, this section
# will be repeated redundantly when you re-create this
# file again via Loader! See skip_load_external to disable
# this feature.
use utf8;
package Book::Schema::Result::Book;
# Created by DBIx::Class::Schema::Loader
# DO NOT MODIFY THE FIRST PART OF THIS FILE
=head1 NAME
Book::Schema::Result::Book
=cut
use strict;
use warnings;
use base 'DBIx::Class::Core';
=head1 COMPONENTS LOADED
=over 4
=item * L<DBIx::Class::InflateColumn::DateTime>
=back
=cut
__PACKAGE__->load_components("InflateColumn::DateTime");
=head1 TABLE: C<book>
=cut
__PACKAGE__->table("book");
=head1 ACCESSORS
=head2 id
data_type: 'integer'
@@ -123,3 +261,140 @@ __PACKAGE__->belongs_to(
# DO NOT MODIFY THIS OR ANYTHING ABOVE! md5sum:FYMjLIaXS+mX/Rlzefb8UQ
1;
# End of lines loaded from '/home/nick/.local/lib/perl5/Book/Schema/Result/Book.pm'
# These lines were loaded from '/home/nick/.local/lib/perl5/Book/Schema/Result/Book.pm' found in @INC.
# They are now part of the custom portion of this file
# for you to hand-edit. If you do not either delete
# this section or remove that file from @INC, this section
# will be repeated redundantly when you re-create this
# file again via Loader! See skip_load_external to disable
# this feature.
use utf8;
package Book::Schema::Result::Book;
# Created by DBIx::Class::Schema::Loader
# DO NOT MODIFY THE FIRST PART OF THIS FILE
=head1 NAME
Book::Schema::Result::Book
=cut
use strict;
use warnings;
use base 'DBIx::Class::Core';
=head1 COMPONENTS LOADED
=over 4
=item * L<DBIx::Class::InflateColumn::DateTime>
=back
=cut
__PACKAGE__->load_components("InflateColumn::DateTime");
=head1 TABLE: C<book>
=cut
__PACKAGE__->table("book");
=head1 ACCESSORS
=head2 id
data_type: 'integer'
is_nullable: 0
=head2 author_id
data_type: 'integer'
is_foreign_key: 1
is_nullable: 0
=head2 when_created
data_type: 'timestamp with time zone'
default_value: current_timestamp
is_nullable: 0
=head2 title
data_type: 'varchar'
is_nullable: 0
size: 30
=cut
__PACKAGE__->add_columns(
"id",
{ data_type => "integer", is_nullable => 0 },
"author_id",
{ data_type => "integer", is_foreign_key => 1, is_nullable => 0 },
"when_created",
{
data_type => "timestamp with time zone",
default_value => \"current_timestamp",
is_nullable => 0,
},
"title",
{ data_type => "varchar", is_nullable => 0, size => 30 },
);
=head1 PRIMARY KEY
=over 4
=item * L</id>
=back
=cut
__PACKAGE__->set_primary_key("id");
=head1 UNIQUE CONSTRAINTS
=head2 C<book_title_unique>
=over 4
=item * L</title>
=back
=cut
__PACKAGE__->add_unique_constraint("book_title_unique", ["title"]);
=head1 RELATIONS
=head2 author
Type: belongs_to
Related object: L<Book::Schema::Result::Author>
=cut
__PACKAGE__->belongs_to(
"author",
"Book::Schema::Result::Author",
{ id => "author_id" },
{ is_deferrable => 0, on_delete => "NO ACTION", on_update => "NO ACTION" },
);
# Created by DBIx::Class::Schema::Loader v0.07049 @ 2022-09-18 10:43:18
# DO NOT MODIFY THIS OR ANYTHING ABOVE! md5sum:FYMjLIaXS+mX/Rlzefb8UQ
1;
# End of lines loaded from '/home/nick/.local/lib/perl5/Book/Schema/Result/Book.pm'
1;
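The belongs_to here is the other half of the has_many in Author.pm, so a book
and its author can be fetched in one query with prefetch. A sketch under the
same assumptions (a Book::Schema schema class, default database); it skips
the negative gutenberg_ids reserved by the tests, shown in the test changes
below:

    use strict;
    use warnings;
    use Book::Schema;

    my $schema = Book::Schema->connect('dbi:Pg:');
    my $books = $schema->resultset('Book')->search(
        { gutenberg_id => { '>' => 0 } },   # catalog rows only
        { prefetch => 'author', order_by => 'me.gutenberg_id', rows => 5 },
    );
    while (my $book = $books->next) {
        printf("#%-6d %s, by %s\n",
            $book->gutenberg_id, $book->title, $book->author->name);
    }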

pg_catalog.csv (new file, 80694 lines; diff suppressed because it is too large)

(test file)

@@ -22,10 +22,6 @@ isa_ok(
"Acquired DBIx::Class::ResultSet for Author."
);
# Reset before further tests.
$schema->resultset('Book')->delete_all;
$rset_author->delete_all;
# Create an author to do things with.
my $tolstoy0 = $rset_author->create({ name => "Tolstoy" });
isa_ok(

(test file)

@@ -56,13 +56,19 @@ isa_ok(
diag("Done with context tests.");
is($austen_books->count, 0, "count() returns zero, no books yet.");
my $persuasion = $austen_books->create({ title => "Persuasion" });
# Use negative gutenberg_ids to avoid collisions.
my $gid = -1;
my $persuasion = $austen_books->create({
gutenberg_id => $gid--, title => "Persuasion",
});
isa_ok(
$persuasion, ["Book::Schema::Result::Book"],
"Created a Book::Schema::Result::Book for Austen."
);
$austen_books->create({ title => "Emma" });
$austen_books->create({
gutenberg_id => $gid--, title => "Emma",
});
is($austen_books->count, 2, "Austen wrote two books now.");
my @books = ($austen->books);
@@ -76,9 +82,15 @@ isa_ok($books[0], ["Book::Schema::Result::Book"],
# Time for Dickens to write something.
my $dickens = $rset_author->find({ name => "Charles Dickens" });
my $dickens_books = $dickens->books_rs; # preferring _rs for its consistency.
$dickens_books->create({ title => "Oliver Twist" });
$dickens_books->create({ title => "A Christmas Carol" });
$dickens_books->create({ title => "Great Expectations" });
$dickens_books->create({
gutenberg_id => $gid--, title => "Oliver Twist"
});
$dickens_books->create({
gutenberg_id => $gid--, title => "A Christmas Carol",
});
$dickens_books->create({
gutenberg_id => $gid--, title => "Great Expectations",
});
is($dickens_books->count, 3, "Dickens has 3 books.");
# Retrieve a book and its author together using prefetch.
@@ -89,7 +101,7 @@ isa_ok(
);
# Now we need a second resultset, returned by a search on the first.
my $rset_oliver_twist = $rset_book->search(
{ "me.title" => "Oliver Twist" },
{ gutenberg_id => { '<' => 0 }, "me.title" => "Oliver Twist" },
{ prefetch => [ "author" ] },
);
isa_ok(
@@ -112,7 +124,9 @@ is(
# Try getting books whose title contains the letter "e". Also use the
# _rs variant of search.
my $rset_e = $rset_book->search_rs(
{ "me.title" => { ilike => '%e%' } }, # case-insensitive
{ "me.title" => { ilike => '%e%' }, # case-insensitive
gutenberg_id => { '<' => 0 },
},
{ prefetch => [ "author" ],
order_by => [ "me.title" ],
},

(test file)

@@ -39,4 +39,15 @@ like(
"Error on insert 'violates unique constraint'."
);
# Delete the test books.
my $rset_book = $schema->resultset('Book');
my $rset_tbooks = $rset_book->search_rs({ gutenberg_id => { '<' => 0 } });
$rset_tbooks->delete;
# Delete the test authors.
my $rset_tauthors = $schema->resultset('Author')->search(
{ name => { 'in' => [ "Charles Dickens", "Jane Austen" ] } },
);
$rset_tauthors->delete;
done_testing;