17 Commits

Author SHA1 Message Date
vulpine c58681421e dont re-crawl the same page you dingus 2020-06-23 16:04:44 +00:00
lickthecheese 13a3e1d128 fixed cleaning script 2020-03-20 10:23:33 -04:00
lickthecheese 6d7639e693 lol i wasint using uniq correctly 2020-03-19 21:07:11 -04:00
lickthecheese 5425d8a56c um whoops that deleted all my data... 2020-03-19 20:58:13 -04:00
lickthecheese 3ebc3d970f more strictly dont keep non valid data 2020-03-19 20:56:23 -04:00
lickthecheese fb6b06ad7f remove duplicates 2020-03-19 16:35:05 -04:00
lickthecheese 2d759ed8f7 allow the user to not specify a new url while crawling 2020-03-19 16:16:11 -04:00
lickthecheese 285a583492 clean command for when you cancel crawler early 2020-03-19 16:07:34 -04:00
lickthecheese d94901f6f2 fix slep 2020-03-19 10:35:34 -04:00
lickthecheese 1f83896371 check urls 2020-03-19 10:25:44 -04:00
lickthecheese 70f0c4d573 timeout 2020-03-19 09:19:29 -04:00
lickthecheese d8715b66a2 wait a little so you dont get rate limited 2020-03-19 08:57:43 -04:00
lickthecheese 06cffd61a2 recursive crawling 2020-03-19 08:50:46 -04:00
lickthecheese a1ffddda7d more agressive filtering of what is an actual site 2020-03-19 08:46:36 -04:00
lickthecheese 063f715700 dont download videos lol 2020-03-19 08:35:22 -04:00
lickthecheese ff13c039e7 crawl websites 2020-03-19 08:19:11 -04:00
lickthecheese 743b72ad22 gitignore 2020-03-19 07:23:19 -04:00