
# Scrape the web because you're bad at documentation :)

Since 2010, [Babycastles](http://babycastles.com) has been throwing great events and doing a bad job of documenting them. It's hard to describe what we do without photos and videos, so I'm documenting the tools and process I use to scrape social media sites for photos of our events.

I'm writing these on a Mac. If things are different on a PC or Linux, sorry. Hope this helps other people too.

Also, we're a 9-year-old 501(c)(3) non-profit run by volunteers. If you like what we do, including this, please consider making a [tax-deductible donation](https://babycastles.com/Information) so we can keep doing it!

## Downloading from Instagram
This will save the videos and photos, naming each file with the date it was posted to Instagram and the username of whoever posted it. The files look like this: `2012-10-10_02-09-30_UTC_babycastles.jpg`
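
Once the files are on disk, the date and username can be read back out of the filename, which helps when sorting photos by event. Here is a minimal Python sketch of that, assuming every file follows the `{date_utc}_UTC_{profile}` pattern shown above. The names `parse_scraped_filename` and `ScrapedFile` are made up for this example and are not part of Instaloader.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple, Optional

# Matches names like "2012-10-10_02-09-30_UTC_babycastles.jpg".
# Assumption: everything between "_UTC_" and the last dot is the username.
FILENAME_RE = re.compile(
    r"^(?P<stamp>\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})"
    r"_UTC_(?P<profile>.+?)\.(?P<ext>[A-Za-z0-9]+)$"
)


class ScrapedFile(NamedTuple):
    posted_at: datetime  # when the post went up, in UTC
    profile: str         # username taken from the filename
    extension: str       # lowercased file extension, without the dot


def parse_scraped_filename(name: str) -> Optional[ScrapedFile]:
    """Return the date and username encoded in a filename, or None if it doesn't match."""
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    try:
        posted_at = datetime.strptime(match.group("stamp"), "%Y-%m-%d_%H-%M-%S")
    except ValueError:
        # Digits in the right places but not a real date, e.g. month 13.
        return None
    return ScrapedFile(
        posted_at.replace(tzinfo=timezone.utc),
        match.group("profile"),
        match.group("ext").lower(),
    )
```

Calling `parse_scraped_filename("2012-10-10_02-09-30_UTC_babycastles.jpg")` gives back the post time, `babycastles`, and `jpg`. Anything that doesn't fit the pattern returns `None`, so it's safe to run over a folder that has other files in it.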

1. Using Terminal, install [pip](https://pip.pypa.io/en/stable/installing/) if it's not on your machine.
2. Install [Instaloader](https://instaloader.github.io/) using pip: `pip install instaloader`
3. Download all posts that tag Babycastles (both with an '@' and a '#') and everything posted to our account using this command:
`instaloader --login=babycastles babycastles '#babycastles' --filename-pattern={date_utc}_UTC_{profile} --stories --tagged --comments`
4. Download all posts with the hashtag #babycastles using:
`instaloader --login=babycastles "#babycastles" --filename-pattern={date_utc}_UTC_{profile}`
5. Download anything posted to our location ID (271251592) using:
`instaloader --login=babycastles "%271251592" --filename-pattern={date_utc}_UTC_{profile}`
*This doesn't seem to work yet, but it was [just released](https://github.com/instaloader/instaloader/pull/212) 5 hours before I tried it, so maybe it still has some bugs to work out. I'm on 4.1.1, so try `pip install --upgrade instaloader` later.*
6. After you've downloaded everything once, you can append `--fast-update` to a command and it will only download posts added since the last time you ran that query.
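
If you end up running these commands a lot, a small Python wrapper can build them for you. This is only a sketch, assuming `instaloader` is already installed and on your PATH. It uses only the flags from the steps above, and the function names `build_instaloader_args` and `run_instaloader` are made up for this example.

```python
import subprocess
from typing import List, Sequence

# Same filename pattern as the commands above.
FILENAME_PATTERN = "{date_utc}_UTC_{profile}"


def build_instaloader_args(
    targets: Sequence[str],
    login: str = "babycastles",
    extra_flags: Sequence[str] = (),
    fast_update: bool = False,
) -> List[str]:
    """Build the argument list for one instaloader run.

    targets are things like "babycastles", "#babycastles" or "%271251592".
    """
    args = [
        "instaloader",
        f"--login={login}",
        *targets,
        f"--filename-pattern={FILENAME_PATTERN}",
        *extra_flags,
    ]
    if fast_update:
        # Only worth adding after the first full download.
        args.append("--fast-update")
    return args


def run_instaloader(args: Sequence[str]) -> int:
    """Run instaloader and return its exit code.

    Passing a list (no shell) means '#' and '%' don't need quoting.
    """
    return subprocess.run(list(args), check=False).returncode


# Example (not run automatically): refresh the hashtag query from step 4.
# run_instaloader(build_instaloader_args(["#babycastles"], fast_update=True))
```

For the account query in step 3, pass `targets=["babycastles", "#babycastles"]` and `extra_flags=["--stories", "--tagged", "--comments"]`.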

## Export events on Google Calendar to Google Sheets
This will work with both calendars you have admin access to and ones you just follow. Instructions are [here](https://www.cloudbakers.com/blog/how-to-export-a-shared-calendar-to-a-google-spreadsheet).

## Downloading from YouTube
1. Install [youtube-dl](https://rg3.github.io/youtube-dl/): `sudo pip install --upgrade youtube_dl`
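
The steps after installing aren't written up yet. As a rough sketch of what a download could look like, here is one way to call youtube-dl from Python so the saved files get a date and uploader in the name, similar to the Instagram files. This is an assumption about how you might want to do it, not a tested part of our process: the function names, the `youtube` output folder, and the filename layout are all made up for this example.

```python
import os
from typing import Dict, List


def build_ydl_options(output_dir: str) -> Dict[str, object]:
    """Options for youtube-dl: name files by upload date, uploader and title."""
    return {
        # upload_date comes out as YYYYMMDD.
        "outtmpl": os.path.join(
            output_dir, "%(upload_date)s_%(uploader)s_%(title)s.%(ext)s"
        ),
        # Keep going if one video in a list fails.
        "ignoreerrors": True,
    }


def download_videos(urls: List[str], output_dir: str = "youtube") -> None:
    """Download each URL into output_dir (hypothetical folder name)."""
    # Imported here so build_ydl_options works even without youtube_dl installed.
    import youtube_dl

    with youtube_dl.YoutubeDL(build_ydl_options(output_dir)) as ydl:
        ydl.download(urls)


# Example (not run automatically); swap in a real video or playlist URL:
# download_videos(["https://www.youtube.com/watch?v=VIDEO_ID"])
```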