blah blah blah generator for cosmic.voyage verse from prose
terris Station 3d3a35a711 added mark8e.py for epic poetry lookalike 2019-11-20 14:21:02 -05:00


* * * * * * * * * * * * * * * * * * * * * * * *
*    ___                _   __  _____    ___  * 
*   / _ )_______ ____ _(_) /  |/  / /__ ( _ ) *
*  / _  / __/ _ `/ _ `/ / / /|_/ /  '_// _  | *
* /____/_/  \_,_/\_, /_/ /_/  /_/_/\_(_)___/  * 
*               /___/                         * 
* * * * * * * * * * * * * * * * * * * * * * * *

This is a work-in-progress project that uses Markov chains to generate verses that look like poetry, for the ship stjörnuvagn Bragi on https://cosmic.voyage/

It is presented here without text sources or Markov chain models, because you can get the sources from Project Gutenberg like I did. Besides, I don't want to distribute Gutenberg texts without the license verbiage (I had to remove it before generating the models). Support Project Gutenberg! Great old texts are not just for mining, they are also for reading. https://www.gutenberg.org/

Requirements

I did all this in a virtualenv and installed the packages with pip3. The ones named in this readme are markovify, nltk, and language-check (LanguageTool).

Included

  • mark8.py - the main generator proof of concept
  • markchainer.py - generates models from text files already processed by:
  • stdtxt.sh - sed pipeline to clean up the text (numbers, blank lines, underscores, brackets)
  • samples/mark8test.txt - rough-looking samples produced by rough-looking code during debugging.
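
For readers who would rather not use sed, the kind of cleanup described for stdtxt.sh could be sketched in Python roughly like this. This is not the actual pipeline (the sed script itself is not reproduced in this readme). The function name `standardize` and the exact rules are assumptions based only on the description above: numbers, blank lines, underscores, and brackets get removed. Whether the real script drops the bracketed text or only the bracket characters is not stated; this sketch drops the whole bracketed note.

```python
import re


def standardize(text):
    """Approximate the described cleanup (assumed rules, not the real sed script)."""
    # Assumption: bracketed notes such as [1] or [Illustration] are dropped entirely.
    text = re.sub(r"\[[^\]]*\]", "", text)
    # Underscores (often used for emphasis in Gutenberg texts) are removed.
    text = text.replace("_", "")
    # Numbers such as verse or page numbers are removed.
    text = re.sub(r"\d+", "", text)
    cleaned = []
    for line in text.splitlines():
        # Collapse the runs of spaces left behind by the removals above.
        line = re.sub(r"[ \t]+", " ", line).strip()
        if line:  # skip blank lines
            cleaned.append(line)
    return "\n".join(cleaned)
```

Calling `standardize("12 The _swift_ ship [1] sailed.\n\n\nIt was 3 days out.\n")` returns the two lines `The swift ship sailed.` and `It was days out.`, which shows that removing numbers can also change the meaning of a sentence.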

Procedure

  1. download some large text files from Project Gutenberg or from https://www.archive.org
     1a. alternatively, build your own large corpus through other means (web scraping, downloading corpus archives, etc.)
  2. trim each text file as needed, so it contains the kinds of things you want to generate text from
  3. use iconv or other means to make sure the texts all use the same encoding (utf-8 and ascii were tested)
  4. run stdtxt.sh on the main input files. this should produce something like inputfile.txt.std
  5. supernice python3 markchainer.py (this looks in './corpus/prose/' for *.std files and generates a model for each; the models end up in './corpus/prose/chains', named something like inputfile.txt.std.mkdch)
  6. supernice python3 mark8.py >> output.txt
  • supernice is just a bash alias:
alias supernice='nice -n 19 ionice -c 3'

...to help reduce load on the server from running this toy. the markovify package (esp. when using nltk stuff) can consume a lot of resources (especially when combined with language-check/LanguageTool!), so the python scripts were slowed down even more using time.sleep().
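
To make the model-building and throttling idea concrete, here is a minimal sketch that uses only the standard library. It is not the code in markchainer.py or mark8.py, and it deliberately swaps markovify for a tiny hand-rolled word-level Markov chain so that it runs without extra packages. The directory layout and the .mkdch suffix come from the procedure above. Everything else is an assumption made for illustration: the function names, storing the chain as JSON, the state size of two words, and the pause lengths.

```python
import json
import random
import time
from pathlib import Path


def build_chain(text, state_size=2):
    """Map each run of `state_size` words to the list of words seen after it."""
    words = text.split()
    chain = {}
    for i in range(len(words) - state_size):
        state = " ".join(words[i:i + state_size])  # string keys keep it JSON-friendly
        chain.setdefault(state, []).append(words[i + state_size])
    return chain


def build_models(corpus_dir="./corpus/prose", pause=0.5):
    """Write one chain file per *.std file, resting between files."""
    corpus = Path(corpus_dir)
    out_dir = corpus / "chains"
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for std_file in sorted(corpus.glob("*.std")):
        chain = build_chain(std_file.read_text(encoding="utf-8"))
        target = out_dir / (std_file.name + ".mkdch")
        # Assumption: JSON is used here; the real .mkdch format is not documented.
        target.write_text(json.dumps(chain), encoding="utf-8")
        written.append(target)
        time.sleep(pause)  # give the shared server a break between files
    return written


def generate_line(chain, max_words=12):
    """Random-walk the chain from a random starting state."""
    state = random.choice(sorted(chain))
    words = state.split()
    size = len(words)
    while len(words) < max_words:
        followers = chain.get(" ".join(words[-size:]))
        if not followers:  # dead end: this state was never followed by anything
            break
        words.append(random.choice(followers))
    return " ".join(words)


def generate_verse(chain, lines=4, pause=0.1):
    """Produce several lines, sleeping between them to keep the load low."""
    verse = []
    for _ in range(lines):
        verse.append(generate_line(chain))
        time.sleep(pause)
    return "\n".join(verse)
```

A run would call `build_models()` once, load one of the written files with `json.loads`, and pass the result to `generate_verse`. The sleeps trade speed for politeness in the same spirit as the supernice alias: nice and ionice lower the priority of the process, while time.sleep() makes it idle outright between units of work.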