commit 002

nikhotmsk 2024-07-30 16:56:14 +03:00
commit 44888c95c0
8 changed files with 967 additions and 0 deletions

9
md5-c/Makefile Normal file

@ -0,0 +1,9 @@
CC = gcc
CFLAGS = -Wall -Wextra -O3
md5: md5.c main.c
	@$(CC) $(CFLAGS) -o md5 md5.c main.c
clean:
	@$(RM) md5

291
md5-c/README.md Normal file

@ -0,0 +1,291 @@
# MD5
Takes an input string or file and outputs its MD5 hash.
This repo is gaining a little more traffic than I expected, so I'll put this here as a little disclaimer. I wrote this code as a side project in college in an attempt to better understand the algorithm. I consider this repository to be a reference implementation with a good step by step walkthrough of the algorithm, not necessarily code to be built upon. I did verify the correctness of the output by comparing to other existing standalone programs. However, I did not research edge cases, set up automated testing, or attempt to run the program on any machine other than the laptop I had at the time, so here's the warning:
This code may be generally correct, but you should consider it untested to be on the safe side. There may be edge cases, vulnerabilities, or optimizations I did not consider when I wrote this. I can only confirm that this code probably worked correctly on a single computer in 2017.
Knowing that, do feel free to use this code in any way you wish, no credit needed. And if you find a problem, raise an issue.
### Implementing into Code
If you want to include the md5 algorithm in your own code, you'll only need `md5.c` and `md5.h`.
```c
#include "md5.h"
...
void foo(){
uint8_t result[16];
md5String("Hello, World!", result); // *result = 65a8e27d8879283831b664bd8b7f0ad4
FILE *bar = fopen("bar.txt", "r");
md5File(bar, result); // Reads a file from a file pointer
md5File(stdin, result); // Can easily read from stdin
// Manual use
...
MD5Context ctx;
md5Init(&ctx);
...
md5Update(&ctx, input1, input1_size);
...
md5Update(&ctx, input2, input2_size);
...
md5Update(&ctx, input3, input3_size);
...
md5Finalize(&ctx);
ctx.digest; // Result of hashing (as uint8_t* with 16 bytes)
}
```
### Command Line
You can directly use the binary built with this Makefile to process text or files in the command line.
Any arguments will be interpreted as strings. Each argument will be interpreted as a separate string to hash, and will be given its own output (in the order of input).
```shell
$ make
$ ./md5 "Hello, World!"
65a8e27d8879283831b664bd8b7f0ad4
$ ./md5 "Multiple" Strings
a0bf169f2539e893e00d7b1296bc4d8e
89be9433646f5939040a78971a5d103a
$ ./md5 ""
d41d8cd98f00b204e9800998ecf8427e
$ ./md5 "Can use \" escapes"
7bf94222f6dbcd25d6fa21d5985f5634
```
If no arguments are given, input is taken from standard input.
```shell
$ make
$ echo -n "Hello, World!" | ./md5
65a8e27d8879283831b664bd8b7f0ad4
$ echo "Hello, World!" | ./md5
bea8252ff4e80f41719ea13cdf007273
$ echo "File Input" > testFile | ./md5
d41d8cd98f00b204e9800998ecf8427e
$ cat testFile | ./md5
7dacda86e382b27c25a92f8f2f6a5cd8
```
As seen above, it is important to note that many programs will output a newline character after their output. This newline *will* affect the output of the MD5 algorithm. `echo` has the `-n` flag that prevents the output of said character.
If entering input by hand, end collection of data by entering an EOF character (`Ctrl+D` in some cases).
# The Algorithm
While researching this algorithm, the only relatively complete description I found came from RSA Data Security itself in [this memo][1]. And while the description is adequate, any confusion is very difficult to clear up, especially given the nature of the algorithm's output. So here I will try to describe the algorithm used in these implementations with examples.
The algorithm considers all words to be little-endian. I will also specify where this may be confusing.
The algorithm takes in an input of arbitrary length in bits. This can be a string, a file, a number, a struct, etc... It also doesn't need to be byte-aligned, though it almost always is. We'll call this input the message. The output is the digest.
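Since little-endian words come up in every step, here is a minimal sketch of what "little-endian" means when four message bytes are turned into one 32-bit word. The helper name `load_le32` is made up for this illustration and is not part of `md5.c`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble four consecutive bytes into one 32-bit word.
 * The first byte in memory becomes the lowest-order byte of the word.
 * (load_le32 is a hypothetical helper name used only for this sketch.) */
static uint32_t load_le32(const uint8_t *p){
    assert(p != NULL);
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```

For the bytes `48 65 6c 6c` ("Hell"), this gives the word `0x6c6c6548`, independent of the byte order of the machine running the code.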
#### Step 1: Padding
The provided message is padded by appending bits to the end until its length is congruent to `448 mod 512` bits. In other words, the message is padded so that its length is 64 bits less than the next multiple of 512. If the original message's length already meets this requirement before padding, it is still padded with 512 bits.
The padding is simply a single "1" bit at the end of the message followed by enough "0" bits to satisfy the length condition above.
##### Example
Let's pass the string "Hello, world!" to the algorithm. Those characters converted to hexadecimal numbers look like this:
```
48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 21
```
(Note: Strings are often null-terminated. This null character is not taken into account, as you will see.)
Now we have to pad our message bits:
```
0x 48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 21 80 00 00
0x 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x 00 00 00 00 00 00 00 00
```
Note the `0x80` right after the end of our message. We're writing a stream of bits, not bytes. Setting the bit after our message to "1" and the next 7 bits to "0" means writing the byte `1000 0000` or `0x80`.
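For byte-aligned messages, the padding rule boils down to a small calculation. The sketch below assumes the message length is known in bytes; `md5_padding_len` is a name invented for this example.

```c
#include <stdint.h>

/* Number of padding bytes (one 0x80 byte followed by zero bytes) needed
 * for a byte-aligned message of `len` bytes, so that the padded length
 * is congruent to 56 modulo 64 bytes (448 modulo 512 bits).
 * The result is always between 1 and 64.
 * (md5_padding_len is a hypothetical name used only for this sketch.) */
static unsigned int md5_padding_len(uint64_t len){
    unsigned int offset = (unsigned int)(len % 64);
    return offset < 56 ? 56 - offset : (56 + 64) - offset;
}
```

For the 13-byte "Hello, world!" message this returns 43: the `0x80` byte plus 42 zero bytes, which brings the total to 56 bytes. A message that is already 56 bytes long still gets a full 64 bytes of padding.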
#### Step 2: Appending the Length
Next, the length of the original message in *bits*, modulo 2^64, is appended to the message as a 64-bit little-endian number. This rounds out the total length to a multiple of 512 bits. It's common to hold this number in two 32-bit words, so keep careful track of which bytes go where; the highest-order byte should be the last byte of the message.
##### Example 1
The length of our message is 104 bits. The 64-bit representation of the number 104 in hexadecimal is `0x00000000 00000068`. So we'll append that number to the end.
```
0x 48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 21 80 00 00
0x 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x 00 00 00 00 00 00 00 00 68 00 00 00 00 00 00 00
```
(We're writing in little-endian, so the lowest order byte is written first.)
If you're holding the length in two separate 32-bit words, make sure to append the lower order bytes first.
##### Example 2
Because our "Hello, world!" example is so small and doesn't give a length with more than two digits, let's say we have a different, bigger message of `0x12345678 90ABCDEF` bits and this chunk we're looking at is just the tail end that we have to pad out. The appended length would look like this:
```
0x 48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 21 80 00 00
0x 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x 00 00 00 00 00 00 00 00 EF CD AB 90 78 56 34 12
```
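Both examples can be reproduced with a short routine that writes the 64-bit bit count one byte at a time, lowest-order byte first. This is only a sketch; `store_le64` is a name chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write a 64-bit length (in bits) into 8 bytes, lowest-order byte first.
 * (store_le64 is a hypothetical helper name used only for this sketch.) */
static void store_le64(uint64_t bit_len, uint8_t out[8]){
    assert(out != NULL);
    for(unsigned int i = 0; i < 8; ++i){
        out[i] = (uint8_t)(bit_len >> (8 * i));
    }
}
```

Passing 104 yields `68 00 00 00 00 00 00 00`, and passing `0x1234567890ABCDEF` yields `EF CD AB 90 78 56 34 12`, matching the last eight bytes in the two examples.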
#### Step 3: Initializing the Buffer
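The variables that will eventually hold our digest must be initialized to the following 32-bit values. (The RSA memo lists them byte by byte, lowest-order byte first, as `01 23 45 67`, `89 ab cd ef`, `fe dc ba 98`, and `76 54 32 10`.)
```
A = 0x67452301
B = 0xefcdab89
C = 0x98badcfe
D = 0x10325476
```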
#### Step 4: Processing
There are four functions defined in the RSA memo that are used to collapse three 32-bit words into one 32-bit word:
```
F(X, Y, Z) = (X & Y) | (~X & Z)
G(X, Y, Z) = (X & Z) | (Y & ~Z)
H(X, Y, Z) = X ^ Y ^ Z
I(X, Y, Z) = Y ^ (X | ~Z)
```
These are bitwise operations.
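One way to get a feel for these functions is to write them as plain C functions and try a few inputs. The sketch below is only an illustration; the `md5_` prefixed names are made up for it. Functions are used instead of macros so that each argument is evaluated exactly once.

```c
#include <stdint.h>

/* The four MD5 auxiliary functions as plain functions.
 * (The md5_ prefixed names are hypothetical, chosen for this sketch.) */

/* For each bit: if X is 1 take Y, otherwise take Z. */
static uint32_t md5_F(uint32_t x, uint32_t y, uint32_t z){ return (x & y) | (~x & z); }

/* For each bit: if Z is 1 take X, otherwise take Y. */
static uint32_t md5_G(uint32_t x, uint32_t y, uint32_t z){ return (x & z) | (y & ~z); }

/* Bitwise parity of the three inputs. */
static uint32_t md5_H(uint32_t x, uint32_t y, uint32_t z){ return x ^ y ^ z; }

/* Y XORed with (X OR NOT Z). */
static uint32_t md5_I(uint32_t x, uint32_t y, uint32_t z){ return y ^ (x | ~z); }
```

For example, `md5_F(0xFFFF0000, 0x12345678, 0x9abcdef0)` takes the upper half from Y and the lower half from Z, giving `0x1234def0`.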
We also need to rotate the bits of a word to the left: shift the bits left and wrap whatever overflows around to the right end, like spinning a bottle and watching the label come back around. The function is defined as follows:
```
rotate_left(x, n) = (x << n) | (x >> (32 - n))
```
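In C, the formula above is fine for every shift amount MD5 uses (they all lie between 4 and 23), but shifting a 32-bit value by 32 is undefined, so a rotate by 0 would misbehave. The sketch below is a slightly more defensive variant; the name `rotl32` and the masking are choices made for this example, not something the memo requires.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits.
 * Masking keeps both shift counts in the range 0..31, so n == 0
 * (or any multiple of 32) returns x unchanged instead of shifting by 32.
 * (rotl32 is a hypothetical name used only for this sketch.) */
static uint32_t rotl32(uint32_t x, unsigned int n){
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}
```

As a quick check, rotating `0x80000001` left by 1 moves the top bit around to the bottom and gives `0x00000003`.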
The constants in K and S can be found at the bottom of this section.
The message is split into blocks of 512 bits. Each block is split into 16 32-bit words. For each block, do the following:
```c
// All variables are unsigned 32-bit integers;
// input[] holds the 16 words of the current block.
AA = A;
BB = B;
CC = C;
DD = D;
for(i = 0; i < 64; ++i){
    if(i < 16){
        E = F(BB, CC, DD);
        j = i;
    }
    else if(i < 32){
        E = G(BB, CC, DD);
        j = ((i * 5) + 1) % 16;
    }
    else if(i < 48){
        E = H(BB, CC, DD);
        j = ((i * 3) + 5) % 16;
    }
    else{
        E = I(BB, CC, DD);
        j = (i * 7) % 16;
    }
    temp = DD;
    DD = CC;
    CC = BB;
    BB = BB + rotate_left(AA + E + K[i] + input[j], S[i]);
    AA = temp;
}
A += AA;
B += BB;
C += CC;
D += DD;
```
The RSA memo explicitly lists each step instead of using control structures. The result is the same.
An example for this step is not particularly useful, as the data produced by the loop is not very meaningful for observation.
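One part of the loop that can be inspected on its own is the choice of the message word `j` for each step `i`. The sketch below isolates that calculation; `md5_message_index` is a name invented for this example.

```c
#include <assert.h>

/* Index of the 32-bit message word used in step i (0..63) of a block.
 * Each group of 16 steps visits all 16 words exactly once, in a
 * different order per group.
 * (md5_message_index is a hypothetical name used only for this sketch.) */
static unsigned int md5_message_index(unsigned int i){
    assert(i < 64);
    switch(i / 16){
    case 0:  return i;
    case 1:  return (5 * i + 1) % 16;
    case 2:  return (3 * i + 5) % 16;
    default: return (7 * i) % 16;
    }
}
```

The four groups start at words 0, 1, 5, and 0 respectively, which lines up with the first operation of each round as listed in the RSA memo.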
#### Step 5: Output
The digest is a 128-bit number written in little endian, and is contained in A, B, C, and D after the algorithm is finished. Just arrange the bytes so that the lowest-order byte of the digest is the lowest-order byte of A, and the highest-order byte of the digest is the highest-order byte of D.
##### Example
Here is the output of a few strings to check against:
"Hello, world!"
```
6cd3556deb0da54bca060b4c39479839
```
"" (empty string)
```
d41d8cd98f00b204e9800998ecf8427e
```
"The quick brown fox jumps over the lazy dog."
```
e4d909c290d0fb1ca068ffaddf22cbd0
```
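The byte arrangement described in this step can be sketched as a routine that walks A, B, C, and D in order and emits each word lowest-order byte first. The name `md5_state_to_hex` and the hex-string output format are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Turn the final state {A, B, C, D} into a 32-character hex string.
 * Words are emitted in order, each one lowest-order byte first.
 * hex must have room for 33 characters (32 digits plus the terminator).
 * (md5_state_to_hex is a hypothetical name used only for this sketch.) */
static void md5_state_to_hex(const uint32_t state[4], char hex[33]){
    assert(state != NULL && hex != NULL);
    for(unsigned int i = 0; i < 4; ++i){
        for(unsigned int b = 0; b < 4; ++b){
            uint8_t byte = (uint8_t)(state[i] >> (8 * b));
            /* Writes two hex digits and a terminator at the right offset. */
            snprintf(hex + 2 * (4 * i + b), 3, "%02x", (unsigned int)byte);
        }
    }
}
```

For instance, a final state of `A = 0xd98c1dd4`, `B = 0x04b2008f`, `C = 0x980980e9`, `D = 0x7e42f8ec` comes out as `d41d8cd98f00b204e9800998ecf8427e`, the empty-string digest listed above.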
#### Constants and Functions
```c
A = 0x67452301
B = 0xefcdab89
C = 0x98badcfe
D = 0x10325476
K[] = {0xd76aa478, 0xe8c7b756, 0x242070db, 0xc1bdceee,
0xf57c0faf, 0x4787c62a, 0xa8304613, 0xfd469501,
0x698098d8, 0x8b44f7af, 0xffff5bb1, 0x895cd7be,
0x6b901122, 0xfd987193, 0xa679438e, 0x49b40821,
0xf61e2562, 0xc040b340, 0x265e5a51, 0xe9b6c7aa,
0xd62f105d, 0x02441453, 0xd8a1e681, 0xe7d3fbc8,
0x21e1cde6, 0xc33707d6, 0xf4d50d87, 0x455a14ed,
0xa9e3e905, 0xfcefa3f8, 0x676f02d9, 0x8d2a4c8a,
0xfffa3942, 0x8771f681, 0x6d9d6122, 0xfde5380c,
0xa4beea44, 0x4bdecfa9, 0xf6bb4b60, 0xbebfbc70,
0x289b7ec6, 0xeaa127fa, 0xd4ef3085, 0x04881d05,
0xd9d4d039, 0xe6db99e5, 0x1fa27cf8, 0xc4ac5665,
0xf4292244, 0x432aff97, 0xab9423a7, 0xfc93a039,
0x655b59c3, 0x8f0ccc92, 0xffeff47d, 0x85845dd1,
0x6fa87e4f, 0xfe2ce6e0, 0xa3014314, 0x4e0811a1,
0xf7537e82, 0xbd3af235, 0x2ad7d2bb, 0xeb86d391}
S[] = {7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22,
5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20,
4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23,
6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21}
F(X, Y, Z) = (X & Y) | (~X & Z)
G(X, Y, Z) = (X & Z) | (Y & ~Z)
H(X, Y, Z) = X ^ Y ^ Z
I(X, Y, Z) = Y ^ (X | ~Z)
rotate_left(x, n) = (x << n) | (x >> (32 - n))
```
[1]: https://tools.ietf.org/html/rfc1321

22
md5-c/UNLICENSE Normal file

@ -0,0 +1,22 @@
This is free and unencumbered software released into the public domain.
Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.
In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

26
md5-c/main.c Normal file

@ -0,0 +1,26 @@
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include "md5.h"
void print_hash(uint8_t *p){
for(unsigned int i = 0; i < 16; ++i){
printf("%02x", p[i]);
}
printf("\n");
}
int main(int argc, char *argv[]){
uint8_t result[16];
if(argc > 1){
for(int i = 1; i < argc; ++i){
md5String(argv[i], result);
print_hash(result);
}
}
else{
md5File(stdin, result);
print_hash(result);
}
}

223
md5-c/md5.c Normal file

@ -0,0 +1,223 @@
/*
* Derived from the RSA Data Security, Inc. MD5 Message-Digest Algorithm
* and modified slightly to be functionally identical but condensed into control structures.
*/
#include "md5.h"
/*
* Constants defined by the MD5 algorithm
*/
#define A 0x67452301
#define B 0xefcdab89
#define C 0x98badcfe
#define D 0x10325476
static uint32_t S[] = {7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22,
5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20,
4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23,
6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21};
static uint32_t K[] = {0xd76aa478, 0xe8c7b756, 0x242070db, 0xc1bdceee,
0xf57c0faf, 0x4787c62a, 0xa8304613, 0xfd469501,
0x698098d8, 0x8b44f7af, 0xffff5bb1, 0x895cd7be,
0x6b901122, 0xfd987193, 0xa679438e, 0x49b40821,
0xf61e2562, 0xc040b340, 0x265e5a51, 0xe9b6c7aa,
0xd62f105d, 0x02441453, 0xd8a1e681, 0xe7d3fbc8,
0x21e1cde6, 0xc33707d6, 0xf4d50d87, 0x455a14ed,
0xa9e3e905, 0xfcefa3f8, 0x676f02d9, 0x8d2a4c8a,
0xfffa3942, 0x8771f681, 0x6d9d6122, 0xfde5380c,
0xa4beea44, 0x4bdecfa9, 0xf6bb4b60, 0xbebfbc70,
0x289b7ec6, 0xeaa127fa, 0xd4ef3085, 0x04881d05,
0xd9d4d039, 0xe6db99e5, 0x1fa27cf8, 0xc4ac5665,
0xf4292244, 0x432aff97, 0xab9423a7, 0xfc93a039,
0x655b59c3, 0x8f0ccc92, 0xffeff47d, 0x85845dd1,
0x6fa87e4f, 0xfe2ce6e0, 0xa3014314, 0x4e0811a1,
0xf7537e82, 0xbd3af235, 0x2ad7d2bb, 0xeb86d391};
/*
* Padding used to make the size (in bits) of the input congruent to 448 mod 512
*/
static uint8_t PADDING[] = {0x80, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
/*
* Bit-manipulation functions defined by the MD5 algorithm
*/
#define F(X, Y, Z) ((X & Y) | (~X & Z))
#define G(X, Y, Z) ((X & Z) | (Y & ~Z))
#define H(X, Y, Z) (X ^ Y ^ Z)
#define I(X, Y, Z) (Y ^ (X | ~Z))
/*
* Rotates a 32-bit word left by n bits
*/
uint32_t rotateLeft(uint32_t x, uint32_t n){
return (x << n) | (x >> (32 - n));
}
/*
* Initialize a context
*/
void md5Init(MD5Context *ctx){
ctx->size = (uint64_t)0;
ctx->buffer[0] = (uint32_t)A;
ctx->buffer[1] = (uint32_t)B;
ctx->buffer[2] = (uint32_t)C;
ctx->buffer[3] = (uint32_t)D;
}
/*
* Add some amount of input to the context
*
* If the input fills out a block of 512 bits, apply the algorithm (md5Step)
* and save the result in the buffer. Also updates the overall size.
*/
void md5Update(MD5Context *ctx, uint8_t *input_buffer, size_t input_len){
uint32_t input[16];
unsigned int offset = ctx->size % 64;
ctx->size += (uint64_t)input_len;
// Copy each byte in input_buffer into the next space in our context input
for(unsigned int i = 0; i < input_len; ++i){
ctx->input[offset++] = (uint8_t)*(input_buffer + i);
// If we've filled our context input, copy it into our local array input
// then reset the offset to 0 and fill in a new buffer.
// Every time we fill out a chunk, we run it through the algorithm
// to enable some back and forth between cpu and i/o
if(offset % 64 == 0){
for(unsigned int j = 0; j < 16; ++j){
// Convert to little-endian
// The local variable `input` is our 512-bit chunk separated into 32-bit words
// that we can use in calculations
input[j] = (uint32_t)(ctx->input[(j * 4) + 3]) << 24 |
(uint32_t)(ctx->input[(j * 4) + 2]) << 16 |
(uint32_t)(ctx->input[(j * 4) + 1]) << 8 |
(uint32_t)(ctx->input[(j * 4)]);
}
md5Step(ctx->buffer, input);
offset = 0;
}
}
}
/*
* Pad the current input to 56 bytes (448 bits) modulo 64 bytes, append the size in bits
* to the very end, and save the result of the final iteration into digest.
*/
void md5Finalize(MD5Context *ctx){
uint32_t input[16];
unsigned int offset = ctx->size % 64;
unsigned int padding_length = offset < 56 ? 56 - offset : (56 + 64) - offset;
// Fill in the padding and undo the changes to size that resulted from the update
md5Update(ctx, PADDING, padding_length);
ctx->size -= (uint64_t)padding_length;
// Do a final update (internal to this function)
// Last two 32-bit words are the two halves of the size (converted from bytes to bits)
for(unsigned int j = 0; j < 14; ++j){
input[j] = (uint32_t)(ctx->input[(j * 4) + 3]) << 24 |
(uint32_t)(ctx->input[(j * 4) + 2]) << 16 |
(uint32_t)(ctx->input[(j * 4) + 1]) << 8 |
(uint32_t)(ctx->input[(j * 4)]);
}
input[14] = (uint32_t)(ctx->size * 8);
input[15] = (uint32_t)((ctx->size * 8) >> 32);
md5Step(ctx->buffer, input);
// Move the result into digest (convert from little-endian)
for(unsigned int i = 0; i < 4; ++i){
ctx->digest[(i * 4) + 0] = (uint8_t)((ctx->buffer[i] & 0x000000FF));
ctx->digest[(i * 4) + 1] = (uint8_t)((ctx->buffer[i] & 0x0000FF00) >> 8);
ctx->digest[(i * 4) + 2] = (uint8_t)((ctx->buffer[i] & 0x00FF0000) >> 16);
ctx->digest[(i * 4) + 3] = (uint8_t)((ctx->buffer[i] & 0xFF000000) >> 24);
}
}
/*
* Step on 512 bits of input with the main MD5 algorithm.
*/
void md5Step(uint32_t *buffer, uint32_t *input){
uint32_t AA = buffer[0];
uint32_t BB = buffer[1];
uint32_t CC = buffer[2];
uint32_t DD = buffer[3];
uint32_t E;
unsigned int j;
for(unsigned int i = 0; i < 64; ++i){
switch(i / 16){
case 0:
E = F(BB, CC, DD);
j = i;
break;
case 1:
E = G(BB, CC, DD);
j = ((i * 5) + 1) % 16;
break;
case 2:
E = H(BB, CC, DD);
j = ((i * 3) + 5) % 16;
break;
default:
E = I(BB, CC, DD);
j = (i * 7) % 16;
break;
}
uint32_t temp = DD;
DD = CC;
CC = BB;
BB = BB + rotateLeft(AA + E + K[i] + input[j], S[i]);
AA = temp;
}
buffer[0] += AA;
buffer[1] += BB;
buffer[2] += CC;
buffer[3] += DD;
}
/*
* Functions that run the algorithm on the provided input and put the digest into result.
* result should be able to store 16 bytes.
*/
void md5String(char *input, uint8_t *result){
MD5Context ctx;
md5Init(&ctx);
md5Update(&ctx, (uint8_t *)input, strlen(input));
md5Finalize(&ctx);
memcpy(result, ctx.digest, 16);
}
void md5File(FILE *file, uint8_t *result){
char *input_buffer = malloc(1024);
size_t input_size = 0;
MD5Context ctx;
md5Init(&ctx);
while((input_size = fread(input_buffer, 1, 1024, file)) > 0){
md5Update(&ctx, (uint8_t *)input_buffer, input_size);
}
md5Finalize(&ctx);
free(input_buffer);
memcpy(result, ctx.digest, 16);
}

24
md5-c/md5.h Normal file

@ -0,0 +1,24 @@
#ifndef MD5_H
#define MD5_H
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <stdlib.h>
typedef struct{
uint64_t size; // Size of input in bytes
uint32_t buffer[4]; // Current accumulation of hash
uint8_t input[64]; // Input to be used in the next step
uint8_t digest[16]; // Result of algorithm
}MD5Context;
void md5Init(MD5Context *ctx);
void md5Update(MD5Context *ctx, uint8_t *input, size_t input_len);
void md5Finalize(MD5Context *ctx);
void md5Step(uint32_t *buffer, uint32_t *input);
void md5String(char *input, uint8_t *result);
void md5File(FILE *file, uint8_t *result);
#endif

BIN
treekeeper Executable file

Binary file not shown.

372
treekeeper.c Normal file

@ -0,0 +1,372 @@
#include <stdio.h>
#include <err.h>
#include <limits.h>
#include <stdlib.h>
#include <wordexp.h>
#include <string.h>
#include "md5-c/md5.c"
#define ANSI_BRIGHT_YELLOW_BOLD "\e[93;1m"
#define ANSI_BRIGHT_GREEN_BOLD "\e[92;1m"
#define ANSI_NORMAL "\e[0m"
#define DRIVE_MARK_NAME_SIZE 16
#define DRIVE_MARK_PATH_SIZE 500
#define TREEKEEPER_MAX_ENTRIES 120000000
#define DRIVE_MARK_MAX 250
struct drive_mark {char name [DRIVE_MARK_NAME_SIZE]; char path [DRIVE_MARK_PATH_SIZE]; };
struct entry {long pos; int xlink;};
void parse_entry_for_display (int i, struct entry *array, long array_i, FILE *f, FILE *fx); /* definition at the end */
char *p_buffer = 0;
size_t p_buffer_size = 0;
char *p_drive_name;
long p_seen_at [200]; /* these variables are used for display */
int p_seen_at_i;
int p_uniq;
int main (int argc, char **argv) {
fprintf(stderr, ANSI_NORMAL "\n*** Tree Keeper Tool ***\n"
"by @nikhotmsk and @theg4sh\n\n"
"A tool that remembers what you store on your external drives :)\n\n");
warnx("entry size is %zu, the cap on entry quantity is set to %d", sizeof (struct entry), TREEKEEPER_MAX_ENTRIES);
warnx("checking if the tree command is present on the system");
int ret = system("which tree");
if (ret) errx(1, ANSI_BRIGHT_YELLOW_BOLD "you need to have tree command in your system, it is needed for scanning" ANSI_NORMAL);
warnx("checking if the sort command is present on the system");
ret = system("which sort");
if (ret) errx(1, ANSI_BRIGHT_YELLOW_BOLD "please install sort binary, needed for sorting large lists" ANSI_NORMAL);
char *database_file_name = "~/treekeeper_database_file";
wordexp_t exp_result;
wordexp(database_file_name, &exp_result, 0);
char *database_file_name_resolved = exp_result.we_wordv[0]; /* just get the home directory */
int create_database_no_prompt = 1;
FILE *database_file = fopen(database_file_name_resolved, "r");
if (!database_file) {
warnx(ANSI_BRIGHT_YELLOW_BOLD "I do not see a database file, going to create one at "
ANSI_BRIGHT_GREEN_BOLD "%s" ANSI_NORMAL, database_file_name_resolved);
if (!create_database_no_prompt) {
fprintf(stderr, ANSI_BRIGHT_GREEN_BOLD "Press 'y' and then 'enter' to start scanning and then enter ncurses mode... " ANSI_NORMAL);
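char l [30];
/* fgets instead of getline: getline may realloc its buffer, which must not be a stack array */
if (!fgets(l, sizeof l, stdin)) errx(1, "Exiting"); /* EOF or read error */
if ('y' == l[0] || 'Y' == l[0]) {
} else errx(1, "Exiting"); /* user abort */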
}
database_file = fopen(database_file_name_resolved, "a");
if (!database_file) err(1, "unable to create database file");
fclose(database_file);
database_file = 0;
}
warnx("scanning tree, please wait (circa 3 minutes)...");
ret = system("tree -sfipa /home /mnt /media > /tmp/treekeeper_tree_output");
if (ret) warnx(ANSI_BRIGHT_YELLOW_BOLD "tree failed (because of user abort or other reason)" ANSI_NORMAL);
warnx("processing...");
FILE *tree_output = fopen("/tmp/treekeeper_tree_output", "r");
if (!tree_output) err(1, "unable to open /tmp/treekeeper_tree_output");
struct drive_mark drive_mark_array [DRIVE_MARK_MAX];
int drive_mark_array_i = 0;
char *buffer = 0; /* ask getline to allocate a new buffer */
size_t buffer_size = 0;
int drive_mark_counter = 0;
while (-1 != getline(&buffer, &buffer_size, tree_output)) { /* read file line by line */
if ('-' != buffer[1]) continue; /* ignore directories, named pipes, fifo, sockets, links */
char *pos = strstr(buffer, "treekeeper_drive_");
if (!pos) continue;
char *running_i = pos+17;
while ('\0' != *running_i) {
if ('\n' == *running_i) *running_i = 0; /* strip the trailing newline */
running_i++;
}
drive_mark_counter++;
warnx("found a treekeeper drive mark: %s", pos);
strncpy(drive_mark_array[drive_mark_array_i].name, pos+17, DRIVE_MARK_NAME_SIZE);
drive_mark_array[drive_mark_array_i].name[DRIVE_MARK_NAME_SIZE - 1] = '\0';
drive_mark_array[drive_mark_array_i].path[0] = '\0';
pos[0] = '\0'; /* mark path end here */
if ('.' == pos[-1]) pos[-1] = '\0'; /* hidden file also works */
char *pos2 = buffer;
while (*pos2) {
if ('/' == *pos2) {
strncpy(drive_mark_array[drive_mark_array_i].path, pos2, DRIVE_MARK_PATH_SIZE);
drive_mark_array[drive_mark_array_i].path[DRIVE_MARK_PATH_SIZE - 1] = '\0';
break;
}
pos2++;
}
warnx("the mark name is " ANSI_BRIGHT_GREEN_BOLD "%s" ANSI_NORMAL, drive_mark_array[drive_mark_array_i].name);
warnx("the mark path is %s", drive_mark_array[drive_mark_array_i].path);
int i;
int more_than_once = 0;
for (i = 0; i < drive_mark_array_i; i++) {
if (0 == strcmp(drive_mark_array[drive_mark_array_i].name, drive_mark_array[i].name)) {
warnx(ANSI_BRIGHT_YELLOW_BOLD "treekeeper drive mark found more than once: %s" ANSI_NORMAL,
drive_mark_array[drive_mark_array_i].name);
} /* store it anyway */
}
drive_mark_array_i++;
if (drive_mark_array_i >= DRIVE_MARK_MAX) break;
}
if (!drive_mark_counter) warnx(ANSI_BRIGHT_YELLOW_BOLD "this program relies on special named files for identifying removable media. Consider creating a file named \"treekeeper_drive_mydrive\" or \".treekeeper_drive_mydrive\" at the removable media root directory" ANSI_NORMAL);
char cmd_text [2400];
snprintf(cmd_text, sizeof (cmd_text), "mv %s %s_previous", database_file_name_resolved, database_file_name_resolved);
warnx("trying to mv a file: %s", cmd_text);
system(cmd_text); /* move database file */
database_file = fopen(database_file_name_resolved, "w");
if (!database_file) err(1, "unable to open file for writing %s", database_file_name_resolved);
rewind(tree_output);
while (-1 != getline(&buffer, &buffer_size, tree_output)) { /* read file line by line */
if ('-' != buffer[1]) continue; /* ignore directories, named pipes, fifo, sockets, links */
/* try to decide drive based on path */
char *delimiter_pos = strstr(buffer, "/");
if (!delimiter_pos) continue;
char *drive_name = "n-a";
int i;
for (i = 0; i < drive_mark_array_i; i++) {
char *needle_pos = strstr(buffer, drive_mark_array[i].path);
if (needle_pos == delimiter_pos) drive_name = drive_mark_array[i].name; /* found a match */
}
fprintf(database_file, "%s\t%s", drive_name, buffer); /* write into database */
}
fclose(tree_output);
char prev_database_filename [1900];
snprintf(prev_database_filename, sizeof prev_database_filename, "%s_previous", database_file_name_resolved);
warnx("opening a file %s for offline entries", prev_database_filename);
FILE *database_prev = fopen(prev_database_filename, "r");
if (!database_prev) err(1, "can't open file for reading");
while (-1 != getline(&buffer, &buffer_size, database_prev)) {
int i;
int this_drive_is_updating = 0;
for (i = 0; i < drive_mark_array_i; i++) {
char *needle_pos = strstr(buffer, drive_mark_array[i].name);
if (needle_pos == buffer) { /* trying to find drive name in the beginning of the line */
this_drive_is_updating = 1;
break;
}
}
if (!this_drive_is_updating) fprintf(database_file, "%s", buffer); /* write entry as it is */
/* only not connected drives go through */
}
fclose(database_prev);
fclose(database_file);
snprintf(cmd_text, sizeof cmd_text, "rm %s /tmp/treekeeper_tree_output", prev_database_filename);
warnx("%s", cmd_text);
system(cmd_text);
/* read new database into memory */
/* and also generate a hash table */
struct entry *array_of_entries = malloc(TREEKEEPER_MAX_ENTRIES * sizeof(struct entry));
if (!array_of_entries) err(1, "malloc failed");
long array_of_entries_i = 0;
warnx("generating hash table");
database_file = fopen(database_file_name_resolved, "r");
if (!database_file) err(1, "unable to open database file");
char database_xlink_filename [1900];
snprintf(database_xlink_filename, sizeof database_xlink_filename, "%s_xlink", database_file_name_resolved);
warnx("xlink database file: %s", database_xlink_filename);
FILE *database_xlink = fopen(database_xlink_filename, "w");
if (!database_xlink) err(1, "can't open file for write");
long database_pos;
while (1) {
database_pos = ftell(database_file);
if (-1 == getline(&buffer, &buffer_size, database_file)) break; /* read line */
char *drive_name = strtok(buffer, "\t");
if (!drive_name) continue;
if (!strtok(0, " ")) continue;
char *size_string = strtok(0, "]");
if (!size_string) continue;
char *file_name = 0;
while (1) {
char *t = strtok(0, "/");
if (!t) break;
file_name = t;
}
if (!file_name) continue;
char name_and_size [1800];
snprintf(name_and_size, sizeof name_and_size, "%s\t%s", size_string, file_name); /* this is debug info, there is no need to put it into xlink file */
uint8_t result [16]; /* md5String expects a uint8_t buffer of 16 bytes */
md5String(name_and_size, result);
/* for(unsigned int e = 0; e < 16; ++e) {
fprintf(stderr, "%02x", (unsigned char) (result[e]));
}
fprintf(stderr, "\n"); */
array_of_entries[array_of_entries_i].pos = database_pos;
array_of_entries[array_of_entries_i].xlink = -1;
for(unsigned int e = 0; e < 16; ++e) {
fprintf(database_xlink, "%02x", (unsigned char) (result[e]));
}
fprintf(database_xlink, " %li %s", array_of_entries_i, name_and_size); /* write hash to file */
array_of_entries_i++;
if (array_of_entries_i >= TREEKEEPER_MAX_ENTRIES) {
warnx(ANSI_BRIGHT_YELLOW_BOLD "entry cap reached, too many entries" ANSI_NORMAL);
break;
}
}
/* system sort */
fclose(database_xlink);
snprintf(cmd_text, sizeof cmd_text, "mv %s %s_unsorted", database_xlink_filename, database_xlink_filename);
warnx("%s", cmd_text);
system(cmd_text);
snprintf(cmd_text, sizeof cmd_text, "sort %s_unsorted > %s", database_xlink_filename, database_xlink_filename);
warnx("%s", cmd_text);
system(cmd_text);
database_xlink = fopen(database_xlink_filename, "r");
if (!database_xlink) err(1, "can't open sorted xlink file for reading");
int first_hash_written = 0;
long num_previous = 0;
long xlink_pos_prev = 0;
char hash_previous [33];
memset(hash_previous, 0, 32);
while (1) {
long xlink_pos = ftell(database_xlink);
if (-1 == getline(&buffer, &buffer_size, database_xlink)) break;
char *md5 = strtok(buffer, " ");
char *num_str = strtok(0, " ");
if (!num_str) continue;
long num = strtol(num_str, 0, 10);
hash_previous[32] = '\0';
//warnx("comparing %s == %s", md5, hash_previous);
ret = memcmp(md5, hash_previous, 32);
if (0 == ret) {
//warnx("match found %s", md5);
if (!first_hash_written) {
/* try to write to entry for first occurrence */
if (num_previous >= array_of_entries_i) continue;
if (num_previous < 0) continue; /* boundary checks */
array_of_entries[num_previous].xlink = xlink_pos_prev;
first_hash_written = 1;
}
if (num >= array_of_entries_i) continue;
if (num < 0) continue; /* boundary checks */
array_of_entries[num].xlink = xlink_pos_prev;
} else first_hash_written = 0;
memcpy(hash_previous, md5, 32);
num_previous = num;
if (!first_hash_written) xlink_pos_prev = xlink_pos;
/* find repeating hashes */
/* write offset to the entry, so later code can read those crosslinks */
}
/* remove unsorted xlink */
snprintf(cmd_text, sizeof cmd_text, "rm %s_unsorted", database_xlink_filename);
warnx("%s", cmd_text);
system(cmd_text);
/* memory dump here */
long i;
for (i = 0; i < array_of_entries_i; i++) {
parse_entry_for_display(i, array_of_entries, array_of_entries_i, database_file, database_xlink);
fprintf(stdout, "%li %s%s", i, (p_uniq) ? ANSI_BRIGHT_GREEN_BOLD " unique " ANSI_NORMAL : " ", p_buffer);
int ii;
if (p_seen_at_i > 0) fprintf(stdout, "==== Also seen as:" ANSI_BRIGHT_YELLOW_BOLD);
for (ii = 0; ii < p_seen_at_i; ii++) {
if (p_seen_at[ii] == i) continue;
fseek(database_file, array_of_entries[p_seen_at[ii]].pos, SEEK_SET);
char buf_mark [DRIVE_MARK_NAME_SIZE + 1];
fread(buf_mark, DRIVE_MARK_NAME_SIZE, 1, database_file);
buf_mark[DRIVE_MARK_NAME_SIZE] = '\0';
char *mark = strtok(buf_mark, "\t");
fprintf(stdout, " (%s %li)", mark, p_seen_at[ii]);
}
if (p_seen_at_i > 0) fprintf(stdout, "\n" ANSI_NORMAL);
}
return 0;
}
/*
* char *p_buffer = 0;
* size_t p_buffer_size = 0;
* char *p_drive_name;
* long p_seen_at [200]; use this vars for display
* int p_seen_at_i;
* int p_uniq;
*/
void parse_entry_for_display (int i, struct entry *array, long array_i, FILE *f, FILE *fx) {
p_uniq = 0;
p_seen_at_i = 0;
fseek(f, array[i].pos, SEEK_SET);
ssize_t ret = getline(&p_buffer, &p_buffer_size, f);
if (-1 == array[i].xlink) {
p_uniq = 1;
return;
}
memset(p_seen_at, 0, sizeof p_seen_at);
fseek(fx, array[i].xlink, SEEK_SET);
char *b = 0; /* let getline allocate the buffer */
size_t b_size = 0;
char md5_prev [33];
int md5_prev_set = 0;
while (1) {
ret = getline(&b, &b_size, fx); /* read the whole line from xlink file */
if (-1 != ret) {
char *md5 = strtok(b, " ");
char *num_string = strtok(0, " ");
if (!md5 || !num_string) break; /* malformed xlink line */
if (md5_prev_set) {
ret = memcmp(md5, md5_prev, 32);
if (ret) break;
}
memcpy(md5_prev, md5, 32);
md5_prev_set = 1;
long xlink_num = strtol(num_string, 0, 10);
if (xlink_num < 0) break;
//fprintf(stdout, "found xlink num %li\n", xlink_num);
p_seen_at[p_seen_at_i] = xlink_num;
p_seen_at_i++;
if (p_seen_at_i >= (int) (sizeof p_seen_at / sizeof p_seen_at[0])) break; /* p_seen_at holds 200 entries, fewer than DRIVE_MARK_MAX */
} else break;
}
free(b);
}
/* TODO check if backup folder is writeable at the very beginning */