# awk
an awk-ward (or not really) language.
some notes in progress.
# projects
g2e is an opinionated gempub to epub converter written in awk:
=> https://tildegit.org/sejo/g2e g2e
my solutions for {advent of code 2021} have been written in awk.
# resources
=> https://www.gnu.org/software/gawk/manual/html_node/index.html gawk user's guide
=> https://www.tutorialspoint.com/awk/index.htm awk tutorial: tutorialspoint
# some built-in variables
* FS: input field separator (by default, a single space, which splits on runs of spaces, tabs and newlines)
* RS: input record separator (by default, newline)
* NF: number of fields in current record
* NR: number of current record
* FNR: number of current record relative to current file
* OFS: output field separator (by default, space)
* ORS: output record separator (by default, newline)
* RSTART: index of the first position in the string matched by match( )
* RLENGTH: length of the string matched by match( )
* ARGV: array that stores the command-line arguments. index starts at 0, where ARGV[0] is the name of the awk command
* ARGC: number of elements in ARGV (the provided arguments plus ARGV[0])
* ARGIND: index in ARGV that is currently being processed (gawk extension, not available in all awks)
$0 represents the entire input record, and $n the nth field in the current record (starting to count from 1)
=> https://www.tutorialspoint.com/awk/awk_built_in_variables.htm awk tutorial: built-in variables
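a small sketch of some of the variables above in action. the sample input, the ":" separator and the "one two" arguments are made up for illustration:

```shell
# hypothetical sample input: two colon-separated records
out=$(printf 'ana:3:red\nbo:5\n' | awk '
BEGIN { FS = ":"; OFS = " | " }   # split input on ":", join output with " | "
{
	# NR: record number, NF: number of fields, $1: first field, $NF: last field
	print NR, NF, $1, $NF
}
END { print "total records: " NR }
')
printf '%s\n' "$out"

# ARGV and ARGC: a BEGIN-only program does not open its arguments as files
args=$(awk 'BEGIN { for (i = 1; i < ARGC; i++) print i ": " ARGV[i] }' one two)
printf '%s\n' "$args"
```

the first command prints one line per record with the fields joined by OFS, followed by the record count; the second one lists the arguments, skipping ARGV[0].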
# some built-in functions
=> https://www.tutorialspoint.com/awk/awk_built_in_functions.htm awk tutorial: built-in functions
=> https://www.tutorialspoint.com/awk/awk_miscellaneous_functions.htm miscellaneous functions
=> https://www.tutorialspoint.com/awk/awk_string_functions.htm string functions
## strings
the index of the first character is 1!
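* index( string, sub ): position of the first occurrence of sub in string, or 0 if it is not found
* length( string ): number of characters in string. if string is omitted, $0 is used
* match( string, regex ): index of the leftmost, longest match of regex in string, or 0 if there is none. it also sets RSTART and RLENGTH
* split( string, arr, regex ): split string into array arr using regex as separator, and return the number of elements. if regex is omitted, FS is used
* printf( format, expr-list ): print formatted output. sprintf( ) takes the same arguments and returns the string instead of printing it
* strtonum( string ): (gawk only) useful to convert from hexadecimal (0x prefix) or octal (0 prefix)
* gsub( regex, sub, string ): global substitution of regex with sub in string. returns the number of substitutions. if string is omitted, $0 is used
* sub( regex, sub, string ): substitute regex with sub in string, once. if string is omitted, $0 is used
* substr( string, start, len ): returns the substring from start index, with length len. if len is omitted, it goes until the end of the string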
* tolower( str )
* toupper( str )
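a quick sketch exercising several of the functions above on a made-up string. it avoids strtonum( ) so that it should also run in awks other than gawk:

```shell
out=$(awk 'BEGIN {
	s = "hello awk world"
	print index(s, "awk")           # position of the substring, counting from 1
	print length(s)
	if (match(s, /wo[a-z]+/))       # leftmost match; sets RSTART and RLENGTH
		print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
	n = split(s, parts, " ")        # returns the number of pieces
	print n, parts[1], parts[n]
	t = s
	sub(/o/, "0", t)                # first occurrence only, modifies t in place
	print t
	gsub(/o/, "0", s)               # every occurrence, modifies s in place
	print s
	print toupper(substr(s, 1, 5))
}')
printf '%s\n' "$out"
```

note that sub( ) and gsub( ) change their third argument in place, which is why a copy of the string is made before the first substitution.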
## misc
* getline: read the next record. it has several other forms:
=> https://www.gnu.org/software/gawk/manual/html_node/Getline.html Explicit Input with getline
> The getline command is used in several different ways and should not be used by beginners.
(?)
* next: stop processing the current record and start over with the next one
* system( command ): execute the specified command and return its exit status
* delete: delete an element from an array, or a whole array
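a minimal sketch of next, delete and system( ) together. the input lines, the sum array and the "exit 3" command are only illustrative:

```shell
# hypothetical input: a comment line followed by key/value pairs
out=$(printf '# comment\nx 1\ny 2\nx 3\n' | awk '
/^#/ { next }                   # skip comment lines: the rules below never see them
{ sum[$1] += $2 }               # accumulate the second field per key
END {
	delete sum["y"]             # remove a single element
	if (!("y" in sum)) print "y deleted"
	print "x: " sum["x"]
	status = system("exit 3")   # run a shell command and keep its exit status
	print "status: " status
}')
printf '%s\n' "$out"
```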
# gawk
=> https://www.gnu.org/software/gawk/manual/html_node/index.html gawk user's guide
=> https://www.gnu.org/software/gawk/manual/html_node/Bitwise-Functions.html bit-manipulation functions
note: to run gawk in compatibility mode, without gnu extensions, use the -c or --traditional flag:
```
$ gawk -c
```
# other notes
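* counters/accumulators don't need to be initialized at 0: an uninitialized variable behaves as 0 in a numeric context and as the empty string in a string context
* comparisons evaluate to 0 or 1. in a condition, 0 and the empty string are false, and anything else is true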
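a small sketch of both notes. the three-line input and the variable names are made up:

```shell
out=$(printf 'a\nb\nc\n' | awk '
{ count++ }                     # count starts out unset, which behaves as 0
END {
	print count
	t = (3 > 2); f = (3 < 2)    # comparisons yield 1 or 0
	print t, f
	if ("") print "empty string is true"
	else print "empty string is false"
	if ("x") print "non-empty string is true"
}')
printf '%s\n' "$out"
```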
## record separation
records separated by empty lines can be extracted with:
```
BEGIN { RS = "" }
```
without modifying FS, fields will be separated by any whitespace, including newlines.
gawk allows a regexp in RS; traditional awk only uses a single character.
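a sketch of this "paragraph mode", assuming a made-up input with two blocks separated by an empty line:

```shell
# hypothetical input: two records separated by an empty line
out=$(printf 'ana\n12\n\nbo\n34\n56\n' | awk '
BEGIN { RS = "" }               # records are separated by empty lines
{ print NR ": " NF " fields, first=" $1 ", last=" $NF }
')
printf '%s\n' "$out"
```

each block becomes one record, and each of its lines ends up as a separate field because newlines count as field separators here.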
## field separation
one can get one character per field by setting FS to the empty string. posix leaves this behavior unspecified, but gawk supports it:
```
BEGIN { FS = "" }
```
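a sketch of the empty FS in use, next to a substr( ) loop that should give the same result without relying on that behavior. the input word is made up:

```shell
# with an empty FS, each character is a field (not guaranteed in every awk)
out=$(printf 'awk\n' | awk '
BEGIN { FS = "" }
{ for (i = 1; i <= NF; i++) print i ": " $i }
')
printf '%s\n' "$out"

# alternative that only uses length( ) and substr( )
out_portable=$(printf 'awk\n' | awk '
{ for (i = 1; i <= length($0); i++) print i ": " substr($0, i, 1) }
')
printf '%s\n' "$out_portable"
```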
## loop through the elements of an array
the order in which for( key in arr ) visits the elements is not specified, so it might differ between awk implementations.
```
BEGIN {
	arr["a"] = 1
	arr["b"] = 2
	# the keys are visited in no particular order
	for (key in arr)
		print key ": " arr[key]
}
```
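when a predictable order is needed, one possible approach (an assumption of these notes, not the only way) is to pipe the printed lines through the external sort command:

```shell
out=$(awk 'BEGIN {
	arr["b"] = 2
	arr["a"] = 1
	arr["c"] = 3
	for (key in arr)
		print key ": " arr[key] | "sort"   # sort gets the lines in any order
	close("sort")                          # wait for sort to finish its output
}')
printf '%s\n' "$out"
```

gawk also offers its own ways of controlling the traversal order, but piping to sort should work in any awk.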