# awk

an awk-ward (or not really) language.

some notes in progress.

# projects

g2e is an opinionated gempub to epub converter written in awk:

=> https://tildegit.org/sejo/g2e g2e

my solutions for {advent of code 2021} have been written in awk.

# resources

=> https://www.gnu.org/software/gawk/manual/html_node/index.html gawk user's guide
=> https://www.tutorialspoint.com/awk/index.htm awk tutorial: tutorialspoint

# some built-in variables

* FS: input field separator (by default, a single space, which splits on runs of whitespace)
* RS: input record separator (by default, newline)
* NF: number of fields in current record
* NR: number of current record
* FNR: number of current record relative to current file
* OFS: output field separator (by default, space)
* ORS: output record separator (by default, newline)
* RSTART: index of the first position in the string matched by match( ), or 0 if there was no match
* RLENGTH: length of the string matched by match( ), or -1 if there was no match
* ARGV: array that stores the command-line arguments. index starts at 0
* ARGC: number of elements in ARGV (the command name in ARGV[0] is counted too)
* ARGIND: index in ARGV of the file that is currently being processed (gawk extension, not available in all awks)

$0 represents the entire input record, and $n the nth field in the current record (starting to count from 1)

=> https://www.tutorialspoint.com/awk/awk_built_in_variables.htm awk tutorial: built-in variables
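
a small sketch of several of these variables working together. the sample input and the ":" and " -> " separators are made up for illustration:

```shell
out=$(printf 'alice:30\nbob:25\n' | awk '
BEGIN { FS = ":"; OFS = " -> " }   # split input on ":", join output with " -> "
{ print NR, NF, $1, $2 }           # record number, field count, then both fields
')
printf '%s\n' "$out"
```

the commas in print are what insert OFS between the values; writing the values next to each other without commas would concatenate them instead.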

# some built-in functions

=> https://www.tutorialspoint.com/awk/awk_built_in_functions.htm awk tutorial: built-in functions
=> https://www.tutorialspoint.com/awk/awk_miscellaneous_functions.htm miscellaneous functions
=> https://www.tutorialspoint.com/awk/awk_string_functions.htm string functions

## strings

the index of the first character is 1!

* index( string, sub) : index of sub as a substring of string
* length( string )
* match( string, regex ) : index of the leftmost, longest match of regex in string, or 0 if there is none. it also sets RSTART and RLENGTH
* split( string, arr, regex ) : split string into array using regex as separator
* printf format, expr-list : formatted output. this one is a statement rather than a function; sprintf( format, expr-list ) returns the formatted string instead
* strtonum( string ) : convert string to a number. useful to convert from hexadecimal (0x prefix) or octal (0 prefix). gawk extension
* gsub( regex, sub, string ) : global substitution of regex with sub in string. if string is omitted, $0 is used
* sub( regex, sub, string ) : substitute regex with sub in string, once. if string is omitted, $0 is used
* substr( string, start, len ) : returns the substring from start index, with length len. if len is omitted, it goes until the end of the string
* tolower( str )
* toupper( str )
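
a quick sketch that exercises a few of the functions above on a made-up string, mostly to show the 1-based indexing. the variable names (s, parts, n) are only illustrative:

```shell
out=$(awk 'BEGIN {
    s = "hello world"
    print index(s, "world")              # 7: positions count from 1
    if (match(s, /o+/))                  # leftmost match sets RSTART and RLENGTH
        print RSTART, RLENGTH            # 5 1
    n = split("a,b,c", parts, ",")       # returns the number of pieces
    print n, parts[1], parts[3]          # 3 a c
    print substr(s, 1, 5)                # hello
    print toupper(substr(s, 7))          # WORLD
    gsub(/o/, "0", s)                    # replaces every "o" in s, in place
    print s                              # hell0 w0rld
}')
printf '%s\n' "$out"
```

note that gsub and sub modify their third argument in place and return the number of substitutions, not the new string.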

## misc

* getline: read the next line. and other possibilities:
=> https://www.gnu.org/software/gawk/manual/html_node/Getline.html Explicit Input with getline

> The getline command is used in several different ways and should not be used by beginners.
(?)

* next: stops processing the current record and starts over with the next one
* system: executes the specified command and returns its exit status
* delete: deletes an element from an array, or a whole array
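
a minimal sketch of next, delete and system together. the input lines, the seen array and the "exit 2" command are all made up for the example:

```shell
out=$(printf '# comment\nfoo\nbar\n' | awk '
/^#/ { next }                      # skip comment lines: no later rule sees them
{ seen[$0] = NR }                  # remember the record number of each line
END {
    delete seen["foo"]             # remove a single element
    for (k in seen) print k, seen[k]
    status = system("exit 2")      # runs the command through the shell
    print "exit status:", status
}')
printf '%s\n' "$out"
```

only "bar" survives: the comment line never reaches the second rule, and "foo" is deleted before the loop.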

# gawk

=> https://www.gnu.org/software/gawk/manual/html_node/index.html gawk user's guide
=> https://www.gnu.org/software/gawk/manual/html_node/Bitwise-Functions.html bit-manipulation functions

note: to run in compatibility mode, without gnu extensions, use the -c or --traditional flag:

```
$ gawk -c
```

# other notes

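* counters/accumulators don't need to be initialized at 0: an unset variable acts as 0 in a numeric context and as the empty string in a string context
* there is no boolean type: comparisons evaluate to 1 (true) or 0 (false), and any non-zero number or non-empty string counts as true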
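
a small sketch of both notes, summing made-up numbers without initializing anything (sum and count are illustrative names):

```shell
out=$(printf '3\n4\n5\n' | awk '
{ sum += $1; count++ }             # both start out unset, which acts as 0 here
END { print sum, count, (sum > 10), (sum < 10) }   # comparisons print as 1 or 0
')
printf '%s\n' "$out"
```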

## record separation

records separated by empty lines can be extracted with:

```
RS = ""
```

without modifying FS, fields will be separated by any whitespace, including newlines.

gawk allows a regexp in RS; traditional awk only accepts a single character.
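
a minimal sketch of this paragraph mode, with two made-up records separated by an empty line:

```shell
out=$(printf 'a b\nc\n\nd\ne f\n' | awk '
BEGIN { RS = "" }                  # paragraph mode: empty lines separate records
{ print NR ": " NF " fields, last is " $NF }
')
printf '%s\n' "$out"
```

each record spans several lines, and the newlines inside it split fields just like spaces do.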

## field separation

one can get one character per field by setting FS to the empty string (this works in gawk; it is a common extension, but not specified by posix):

```
FS = ""
```
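
a short sketch of the idea, assuming an awk that supports the empty FS (gawk does; other implementations may behave differently). the input "abc" is made up:

```shell
out=$(printf 'abc\n' | awk '
BEGIN { FS = "" }                  # empty FS: every character becomes a field
{ for (i = 1; i <= NF; i++) print i, $i }
')
printf '%s\n' "$out"
```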

## loop through the elements of an array

this approach might yield the results in a different order depending on the awk implementation.

```
arr["a"] = 1
arr["b"] = 2

for(key in arr)
  print key ": " arr[key]
```