Grus

Grus is a simple word unjumbler written in Go.

Project Home: https://andinus.nand.sh/grus/
Source Code: https://tildegit.org/andinus/grus
GitHub (Mirror): Grus - GitHub

Tested on:

  • OpenBSD 6.6 (with pledge & unveil)

Documentation

Demo Video: Grus v0.2.0
System Information: OpenBSD 6.6 (with pledge & unveil)

Grus stops the search as soon as it unjumbles the word, so no anagrams are returned & not all dictionaries may be searched. This behaviour can be changed with the environment variables documented below.

Note: If Grus can't unjumble the word with the first dictionary, it searches the next one; the search stops once the word is unjumbled.

Environment variable        Explanation
GRUS_SEARCH_ALL             Search in all dictionaries
GRUS_ANAGRAMS               Print all anagrams
GRUS_PRINT_PATH (v0.2.1+)   Print dictionary path before words

Set these environment variables to 1 / true to change the behaviour.
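
For illustration, checking these flags could be as simple as the sketch below (hypothetical code; the actual parsing in grus.go may differ):

package main

import (
	"fmt"
	"os"
)

// enabled reports whether an environment variable is set to "1" or
// "true". Hypothetical helper for illustration, not code from grus.
func enabled(name string) bool {
	v := os.Getenv(name)
	return v == "1" || v == "true"
}

func main() {
	fmt.Println(enabled("GRUS_SEARCH_ALL"))
}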

Default Dictionaries

These files will be checked by default (in order).

  • /usr/local/share/dict/words
  • /usr/local/share/dict/web2
  • /usr/share/dict/words
  • /usr/share/dict/web2
  • /usr/share/dict/special/4bsd
  • /usr/share/dict/special/math
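
Under those rules the whole search boils down to a loop like the one below. This is a minimal sketch, not the actual grus.go: it ignores GRUS_ANAGRAMS, GRUS_PRINT_PATH & real error handling.

package main

import (
	"bufio"
	"fmt"
	"os"
	"sort"
)

// sortLetters returns the word's letters in lexical order.
func sortLetters(w string) string {
	r := []rune(w)
	sort.Slice(r, func(i, j int) bool { return r[i] < r[j] })
	return string(r)
}

// findWord scans each dictionary in order and prints the first word
// whose sorted letters match the jumbled input. It stops after the
// first dictionary that yields a match unless searchAll is set.
func findWord(jumbled string, dicts []string, searchAll bool) {
	target := sortLetters(jumbled)
	found := false
	for _, path := range dicts {
		f, err := os.Open(path)
		if err != nil {
			continue // skip dictionaries that don't exist
		}
		sc := bufio.NewScanner(f)
		for sc.Scan() {
			if w := sc.Text(); sortLetters(w) == target {
				fmt.Println(w)
				found = true
				break // default behaviour: first match wins
			}
		}
		f.Close()
		if found && !searchAll {
			break
		}
	}
}

func main() {
	findWord("tca", []string{"/usr/share/dict/words"}, false)
}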

Examples

# unjumble word
grus word

# print all anagrams
GRUS_ANAGRAMS=true grus word

# search for word in all dictionaries
GRUS_SEARCH_ALL=true grus word

# search for word in custom dictionaries too
grus word /path/to/dict1 /path/to/dict2

# search for word in all dictionaries, including custom ones
GRUS_SEARCH_ALL=1 grus word /path/to/dict1 /path/to/dict2

# search for word in all dictionaries & print all anagrams
GRUS_SEARCH_ALL=1 GRUS_ANAGRAMS=1 grus word

# print path to dictionary
GRUS_PRINT_PATH=1 grus word

Installation

Pre-built binaries

Pre-built binaries are available for OpenBSD, FreeBSD, NetBSD, DragonFly BSD, Linux & macOS.

The script only prints the steps to install Grus; you have to run those commands manually. Piping directly to sh is not a good idea, so don't run this unless you understand what you're doing.

v0.2.0

curl -s https://tildegit.org/andinus/grus/raw/tag/v0.2.0/scripts/install.sh | sh

Post install

You need a dictionary for Grus to work. If you don't have one, you can download Webster's Second International Dictionary, all 234,936 words of it. The 1934 copyright has lapsed.

curl -L -o /usr/local/share/dict/web2 \
     https://archive.org/download/grus-v0.2.0/web2

There is also another big dictionary with around half a million English words. I'm not allowed to distribute it, but you can get it directly from GitHub.

curl -o /usr/local/share/dict/words \
     https://raw.githubusercontent.com/dwyl/english-words/master/words.txt

History

The initial version of Grus was just a simple shell script that used the slowest method of unjumbling words: it checked every permutation of the word against all words of the same length in the file. Permutations grow factorially, so this gets hopeless fast; a 10-letter word already has 10! = 3,628,800 of them.

Later I rewrote the above logic in Python because I wanted to use a better method. The next version used logic similar to the current one. It still had to iterate through all the words in the file, but it eliminated lots of cases very quickly, so it was faster. It first used the length check, then it used this little thing to match the words.

import collections

match = lambda s1, s2: collections.Counter(s1) == collections.Counter(s2)

I don't understand how it works, but it's fast, faster than converting the string to a list & sorting the list. (Counter builds a multiset of character counts, & two strings are anagrams exactly when those counts are equal.) Actually I did the convert-and-sort version initially & you'll still find it in the grus-add script.

# Sort the word's characters into lexical order.
lexical = ''.join(sorted(word))
# Print the word if its letters are already in lexical order.
if word == lexical:
    print(word)

This is equivalent to lexical.SlowSort in the current version.

package lexical

import (
	"sort"
	"strings"
)

// SlowSort returns the string in lexical order. This function is
// slower than Sort.
func SlowSort(word string) (sorted string) {
	// Convert word to a slice, sort the slice.
	t := strings.Split(word, "")
	sort.Strings(t)

	sorted = strings.Join(t, "")
	return
}
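
Back to the collections.Counter trick for a moment: the same multiset comparison translates naturally to Go with a rune-count map. This is a hypothetical illustration, not code from grus.

package main

import "fmt"

// sameCounts reports whether two words are anagrams by comparing rune
// counts, the same multiset idea behind collections.Counter.
func sameCounts(a, b string) bool {
	if len(a) != len(b) {
		return false // quick reject: anagrams have equal byte length
	}
	counts := make(map[rune]int)
	for _, r := range a {
		counts[r]++
	}
	for _, r := range b {
		counts[r]--
		if counts[r] < 0 {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(sameCounts("test", "etts")) // true
}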

The next version was also in Python & it was stupid; for some reason using a database didn't cross my mind then. It sorted the word & created a file named after the lexical order of that word (if the word is "test" then the filename would be "estt"), then appended the word to that file.

To look a word up it took user input & sorted it, then it just had to print that file (if the word is "test" then it printed the file "estt"). This was a lot faster than iterating through all the words, but the files had to be prepared before any lookup could happen.
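
Sketched in Go for consistency (hypothetical code; the original script was Python), the whole scheme was just these two operations:

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// sortLetters returns the word's letters in lexical order ("test" -> "estt").
func sortLetters(w string) string {
	r := []rune(w)
	sort.Slice(r, func(i, j int) bool { return r[i] < r[j] })
	return string(r)
}

// addWord appends a word to the file named after its sorted letters.
func addWord(dir, word string) error {
	f, err := os.OpenFile(filepath.Join(dir, sortLetters(word)),
		os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = fmt.Fprintln(f, word)
	return err
}

// lookup prints the file for the jumbled word, i.e. all its anagrams.
func lookup(dir, jumbled string) error {
	b, err := os.ReadFile(filepath.Join(dir, sortLetters(jumbled)))
	if err != nil {
		return err
	}
	fmt.Print(string(b))
	return nil
}

func main() {
	dir := os.TempDir()
	addWord(dir, "test")
	lookup(dir, "estt") // prints "test"
}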

This was very stupid because the dictionary I was using had around half a million words, which meant around half a million files. Actually fewer than that, because anagrams got appended to a single file, but it was still a lot of small files, and handling that many small files is a bad idea.

I don't have previous versions of this program. I decided to rewrite it in Go; this version does things differently & is faster than all the previous ones. Currently we first sort the word in lexical order by converting the string to []rune & sorting it, which is faster than lexical.SlowSort, which converts the string to []string & sorts that.

package lexical

import "sort"

// Sort takes a string as input and returns the lexical order.
func Sort(word string) (sorted string) {
	// Convert the string to []rune.
	r := []rune(word)

	sort.Slice(r, func(i, j int) bool {
		return r[i] < r[j]
	})

	sorted = string(r)
	return
}
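
The lexical package has tests for Sort and SlowSort; to check the speed claim yourself, a benchmark along these lines would do (a hypothetical sketch, not the actual test file):

package lexical

import "testing"

// Run with: go test -bench=.
func BenchmarkSort(b *testing.B) {
	for i := 0; i < b.N; i++ {
		Sort("unjumble")
	}
}

func BenchmarkSlowSort(b *testing.B) {
	for i := 0; i < b.N; i++ {
		SlowSort("unjumble")
	}
}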

Instead of creating lots of small files, entries are stored in a sqlite3 database.

This was true till v0.1.0. v0.2.0 was rewritten & dropped the database and any form of pre-parsing the dictionary; instead it looks through each line of the dictionary & unjumbles the word, essentially the loop sketched under Default Dictionaries. This may be slower than the previous version, but it is simpler.