# awk

an awk-ward (or not really) language.

some notes in progress.

# projects

g2e is an opinionated gempub to epub converter written in awk:

=> https://tildegit.org/sejo/g2e g2e

my solutions for {advent of code 2021} have been written in awk.

# resources

=> https://www.gnu.org/software/gawk/manual/html_node/index.html gawk user's guide
=> https://www.tutorialspoint.com/awk/index.htm awk tutorial: tutorialspoint

# some built-in variables

* FS: input field separator (by default a single space, which means fields are separated by runs of spaces, tabs and newlines)
* RS: input record separator (by default, newline)
* NF: number of fields in the current record
* NR: number of the current record, counted across all input files
* FNR: number of the current record relative to the current file
* OFS: output field separator (by default, space)
* ORS: output record separator (by default, newline)
* RSTART: index of the first character of the substring matched by match( ), or 0 if there was no match
* RLENGTH: length of the substring matched by match( ), or -1 if there was no match
* ARGV: array that stores the command-line arguments. index starts at 0
* ARGC: number of elements in ARGV, i.e. the command-line arguments plus ARGV[0]
* ARGIND: index in ARGV of the file currently being processed (a gawk extension, not available in all awks)

$0 represents the entire input record, and $n the nth field in the current record (counting starts from 1).

=> https://www.tutorialspoint.com/awk/awk_built_in_variables.htm awk tutorial: built-in variables

# some built-in functions

=> https://www.tutorialspoint.com/awk/awk_built_in_functions.htm awk tutorial: built-in functions
=> https://www.tutorialspoint.com/awk/awk_miscellaneous_functions.htm miscellaneous functions
=> https://www.tutorialspoint.com/awk/awk_string_functions.htm string functions

## strings

the index of the first character is 1!
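a quick check of 1-based indexing, as a minimal sketch run from a POSIX shell with any awk (the string "hello" is just an illustrative value):

```shell
# string positions in awk start at 1, not 0
awk 'BEGIN {
    s = "hello"
    print substr(s, 1, 1)   # first character: h
    print index(s, "l")     # position of the first "l": 3
    print index(s, "z")     # not found: 0
}'
```

this prints h, 3 and 0 on separate lines; note that 0 is free to mean "not found" precisely because no valid position is 0.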
* index( string, sub ): index of the first occurrence of sub as a substring of string, or 0 if it is not found
* length( string )
* match( string, regex ): index of the leftmost longest match of regex in string, or 0 if there is none. it also sets RSTART and RLENGTH
* split( string, arr, regex ): split string into the array arr using regex as the separator, and return the number of elements
* printf( format, expr-list )
* strtonum( string ): useful to convert from hexadecimal (0x prefix) or octal (0 prefix). a gawk extension
* gsub( regex, sub, string ): global substitution of regex with sub in string. if string is omitted, $0 is used
* sub( regex, sub, string ): substitute regex with sub in string, once. if string is omitted, $0 is used
* substr( string, start, len ): returns the substring of length len starting at index start. if len is omitted, it goes until the end of the string
* tolower( str )
* toupper( str )

## misc

* getline: read the next input record. it can also be used in other ways:

=> https://www.gnu.org/software/gawk/manual/html_node/Getline.html Explicit Input with getline

> The getline command is used in several different ways and should not be used by beginners. (?)

* next: stop processing the current record and start over with the next one
* system: execute the specified command and return its exit status
* delete: delete an element from an array, or a whole array

# gawk

=> https://www.gnu.org/software/gawk/manual/html_node/index.html gawk user's guide
=> https://www.gnu.org/software/gawk/manual/html_node/Bitwise-Functions.html bit-manipulation functions

note: to run gawk in compatibility mode, without GNU extensions, use the -c or --traditional flag:

```
$ gawk -c
```

# other notes

* uninitialized variables hold the empty string, which acts as 0 in a numeric context, so counters/accumulators don't need to be initialized at 0
* comparison and logical operators return 1 for true and 0 for false; in a condition, 0 and the empty string count as false

## record separation

records separated by empty lines can be extracted with:

```
BEGIN { RS = "" }
```

without modifying FS, fields will be separated by any whitespace, including newlines (in this mode a newline always separates fields, whatever FS is).

gawk allows a regexp in RS; traditional awk only accepts a single character.

## field separation

one can get one character per field by setting FS to the empty string (supported by gawk and some other awks, but POSIX leaves the behavior unspecified):

```
BEGIN { FS = "" }
```

## loop through the elements of an array

the order in which for (key in arr) visits the keys is unspecified: it can differ between awk implementations, so don't rely on it.

```
BEGIN {
    arr["a"] = 1
    arr["b"] = 2
    for (key in arr)
        print key ": " arr[key]
}
```
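when the order matters, one portable workaround is to also record the keys in a numerically indexed array as they are added, and loop over that instead. a minimal sketch, run from a POSIX shell; the names order and n are illustrative, not built-ins:

```shell
# for (key in arr) has no guaranteed order, so keep a separate
# numerically indexed array "order" that remembers insertion order
awk 'BEGIN {
    arr["b"] = 2; order[++n] = "b"   # n starts uninitialized, so ++n gives 1
    arr["a"] = 1; order[++n] = "a"
    for (i = 1; i <= n; i++)         # numeric loop: insertion order
        print order[i] ": " arr[order[i]]
}'
```

this prints "b: 2" and then "a: 1", regardless of how the awk hashes its keys. in gawk only, setting PROCINFO["sorted_in"] (for example to "@ind_str_asc") controls the traversal order of for (key in arr) directly.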