vulpine | c58681421e | dont re-crawl the same page you dingus | 2020-06-23 16:04:44 +00:00
lickthecheese | 13a3e1d128 | fixed cleaning script | 2020-03-20 10:23:33 -04:00
lickthecheese | 6d7639e693 | lol i wasint using uniq correctly | 2020-03-19 21:07:11 -04:00
lickthecheese | 5425d8a56c | um whoops that deleted all my data... | 2020-03-19 20:58:13 -04:00
lickthecheese | 3ebc3d970f | more strictly dont keep non valid data | 2020-03-19 20:56:23 -04:00
lickthecheese | fb6b06ad7f | remove duplicates | 2020-03-19 16:35:05 -04:00
lickthecheese | 2d759ed8f7 | allow the user to not specify a new url while crawling | 2020-03-19 16:16:11 -04:00
lickthecheese | 285a583492 | clean command for when you cancel crawler early | 2020-03-19 16:07:34 -04:00
lickthecheese | d94901f6f2 | fix slep | 2020-03-19 10:35:34 -04:00
lickthecheese | 1f83896371 | check urls | 2020-03-19 10:25:44 -04:00
lickthecheese | 70f0c4d573 | timeout | 2020-03-19 09:19:29 -04:00
lickthecheese | d8715b66a2 | wait a little so you dont get rate limited | 2020-03-19 08:57:43 -04:00
lickthecheese | 06cffd61a2 | recursive crawling | 2020-03-19 08:50:46 -04:00
lickthecheese | a1ffddda7d | more agressive filtering of what is an actual site | 2020-03-19 08:46:36 -04:00
lickthecheese | 063f715700 | dont download videos lol | 2020-03-19 08:35:22 -04:00
lickthecheese | ff13c039e7 | crawl websites | 2020-03-19 08:19:11 -04:00
lickthecheese | 743b72ad22 | gitignore | 2020-03-19 07:23:19 -04:00