lolcat 2023-11-07 08:04:56 -05:00
parent 64b090ee05
commit 785452873f
59 changed files with 2592 additions and 1277 deletions

132
README.md

@ -3,12 +3,21 @@
# 4get
4get is a metasearch engine that doesn't suck (they live in our walls!)
## About 4get
# About 4get
https://4get.ca/about
## Try it out
# Try it out
https://4get.ca
# Totally unbiased comparison between alternatives
| | 4get | searx(ng) | librex | araa |
|----------------------------|-------------------------|-----------|-------------|----------|
| RAM usage | 200-400mb~ | 2GB~ | 200-400mb~ | 2GB~ |
| Does it suck | no (debunked by snopes) | yes | yes | a little |
| Does it work | ye | no | no | ye |
| Did the dev commit suicide | not until my 30s | idk | yes | no |
## Supported websites
1. Web
- DuckDuckGo
@ -36,7 +45,6 @@ https://4get.ca
4. News
- DuckDuckGo
- Brave
- Google
- Mojeek
5. Music
@ -55,15 +63,15 @@ https://4get.ca
More scrapers are coming soon. I currently want to add Google web/video/news search, HackerNews (durr orange site!!) and Qwant. Shopping and files tabs are also on my todo list.
# Setup
# Installation
This section is still to-do. You will need to figure shit out for some of the apache2 and nginx stuff. Everything else should be OK.
## Apache
## Install on Apache
Log in as root.
```sh
apt install apache2 certbot php-dom php-imagick imagemagick php-curl curl php-apcu git libapache2-mod-php python3-certbot-apache
apt install apache2 certbot php-imagick imagemagick php-curl curl php-apcu git libapache2-mod-php python3-certbot-apache
service apache2 start
a2enmod rewrite
```
@ -90,7 +98,7 @@ chmod 777 -R icons/
Restart the service for good measure... `service apache2 restart`
## NGINX
## Install on NGINX
Log in as root.
@ -138,10 +146,54 @@ ln -s /etc/nginx/sites-available/4get.conf /etc/nginx/sites-available/4get.conf
Now test the nginx config with `nginx -t`. If it says that everything is good, restart nginx using `systemctl restart nginx`.
## Setup encryption
## Install using Docker (lol u lazy fuck)
```
docker run -d -p 80:80 -e FOURGET_SERVER_NAME="4get.ca" -e FOURGET_SERVER_ADMIN_EMAIL="you@example.com" luuul/4get:latest
```
...Or with SSL:
```
docker run -d -p 443:443 -e FOURGET_SERVER_NAME="4get.ca" -e FOURGET_SERVER_ADMIN_EMAIL="you@example.com" -v /etc/letsencrypt/live/domain.tld:/etc/4get/certs luuul/4get:latest
```
Replace the environment variables `FOURGET_SERVER_NAME` and `FOURGET_SERVER_ADMIN_EMAIL` with relevant values.
If the certificate files are not mounted to `/etc/4get/certs`, the service listens on port 80.
The certificate directory expects files named `cert.pem`, `chain.pem` and `privkey.pem`.
## Install using Docker Compose
Copy `docker-compose.yaml`.
To serve custom banners, create a directory of images (named `banners`, for example) and mount it to `/var/www/html/4get/banner`.
```
version: "3.7"
services:
fourget:
image: luuul/4get:latest
restart: always
environment:
- FOURGET_SERVER_NAME=4get.ca
- FOURGET_SERVER_ADMIN_EMAIL="you@example.com"
ports:
- "80:80"
- "443:443"
volumes:
- /etc/letsencrypt/live/domain.tld:/etc/4get/certs
- ./banners:/var/www/html/4get/banner
```
Replace relevant values and start with `docker-compose up -d`
# Encryption setup
I'm schizoid (as you should be) so I'm gonna set up 4096bit key encryption. To complete this step, you need a domain or subdomain in your possession. Make sure that the DNS shit for your domain has propagated properly before continuing, because certbot is a piece of shit that will error out the ass once you reach 5 attempts in under an hour.
### Apache
## Encryption setup on Apache
```sh
certbot --apache --rsa-key-size 4096 -d www.yourdomain.com -d yourdomain.com
@ -169,7 +221,7 @@ Restart again
service apache2 restart
```
### NGINX
## Encryption setup on NGINX
Generate a certificate for the domain using:
@ -180,15 +232,13 @@ certbot --nginx --key-type ecdsa -d www.yourdomain.com -d yourdomain.com
After doing that certbot should deploy the certificate automatically into your 4get nginx config file. It should be ready to use at that point.
## Captcha
# Jesse it is time to configure the server the fucking bots are back
Right now the setup for this shit is absolutely awful.
Woohoo the awful piece of shit setup and fiddling with 3 gazillion files is GONE. All you need to do to configure your shit is to go into `data/config.php` and edit the self-documenting configuration file. You can also specify proxies in `data/proxies/whatever.txt` and captcha images in `data/captcha/category/1.png`... I further explain how to deal with that garbage in the config file I mentioned.
Edit line 190 in `lib/captcha_gen.php` and specify your image sets. You can't disable the captcha right now lol. Just use a previous commit if you want to do that. Call me a shitcoder all you want I've had no energy lately. Images must be stored in `data/captcha`. Create a folder for each category. All files in there should be named from `1.png` to `321839.png`, for example.
# (Optional) Tor setup
## Tor Setup
1. Install tor.
1. Install `tor`.
2. Open `/etc/tor/torrc`
3. Go to the line that contains `HiddenServiceDir` and `HiddenServicePort`
4. Uncomment those 2 lines and set them like this:
@ -205,7 +255,7 @@ After you get your onion address you will need to configure your Apache or Nginx
I don't know how to configure this shit on Apache so here is the NGINX one.
### NGINX
## Tor setup on NGINX
Open your current 4get NGINX config (that is under `/etc/nginx/sites-available/`) and append this to the end of the file:
@ -240,49 +290,5 @@ server {
Obviously replace `<youronionaddress>` with the onion address found in `/var/lib/tor/4get/hostname`, then check that the nginx config is valid with `nginx -t`. If it is, restart the nginx service and try opening the onion address in the Tor Browser. You can see a real-world example [here](https://git.zzls.xyz/Fijxu/etc-configs/src/branch/selfhost/nginx/sites-available/4get.zzls.xyz.conf)
## Docker Install
```
docker run -d -p 80:80 -e FOURGET_SERVER_NAME="4get.ca" -e FOURGET_SERVER_ADMIN_EMAIL="you@example.com" luuul/4get:latest
```
With SSL
```
docker run -d -p 443:443 -e FOURGET_SERVER_NAME="4get.ca" -e FOURGET_SERVER_ADMIN_EMAIL="you@example.com" -v /etc/letsencrypt/live/domain.tld:/etc/4get/certs luuul/4get:latest
```
replace environment variables FOURGET_SERVER_NAME and FOURGET_SERVER_ADMIN_EMAIL with relevant values
if the certificate files are not mounted to /etc/4get/certs the service listens to port 80
the certificate directory expects files named `cert.pem`, `chain.pem`, `privkey.pem`
## Docker compose
copy `docker-compose.yaml`
create a directory with images named `banners` for example and mount to `/var/www/html/4get/banner`
to serve custom banners
```
version: "3.7"
services:
fourget:
image: luuul/4get:latest
restart: always
environment:
- FOURGET_SERVER_NAME=4get.ca
- FOURGET_SERVER_ADMIN_EMAIL="you@example.com"
ports:
- "80:80"
- "443:443"
volumes:
- /etc/letsencrypt/live/domain.tld:/etc/4get/certs
- ./banners:/var/www/html/4get/banner
```
Replace relevant values and start with `docker-compose up -d`
# Contact
shit breaks all the time but I repair it all the time too. Email me here: will<at>lolcat(dot)ca

129
about.php

@ -1,128 +1,23 @@
<?php
include "data/config.php";
include "lib/frontend.php";
$frontend = new frontend();
echo
'<!DOCTYPE html>' .
'<html lang="en">' .
'<head>' .
'<meta http-equiv="Content-Type" content="text/html;charset=utf-8">' .
'<title>About</title>' .
'<link rel="stylesheet" href="/static/style.css">' .
'<meta name="viewport" content="width=device-width,initial-scale=1">' .
'<meta name="robots" content="index,follow">' .
'<link rel="icon" type="image/x-icon" href="/favicon.ico">' .
'<meta name="description" content="4get.ca: About">' .
'<link rel="search" type="application/opensearchdescription+xml" title="4get" href="/opensearch.xml">' .
'</head>' .
'<body class="' . $frontend->getthemeclass(false) . 'about">';
include "data/instances.php";
$compiledinstancelist = "";
foreach ($instancelist as $instance)
{
$compiledinstancelist .= "<tr> <td>".$instance["name"]."</td>";
$compiledinstancelist .= "<td> <a href=\"".$instance["address"]["uri"]."\">".$instance["address"]["displayname"]."</a>";
foreach ($instance["altaddresses"] as $alt)
{
$compiledinstancelist .= "<a href=\"".$alt["uri"]."\">(".$alt["displayname"].")</a></td>";
}
$compiledinstancelist .= "</tr>";
}
$frontend->load(
"header_nofilters.html",
[
"title" => "About",
"class" => " class=\"about\""
]
);
$left =
'<a href="/" class="link">&lt; Go back</a>
<h1>Set as default search engine</h1>
<a href="#firefox"><h2 id="firefox">On Firefox and other Gecko based browsers</h2></a>
To set this as your default search engine on Firefox, right click the URL bar and select <div class="code-inline">Add "4get"</div>. Then, visit <a href="about:preferences#search" target="_BLANK" class="link">about:preferences#search</a> and select <div class="code-inline">4get</div> in the dropdown menu.
<a href="#chrome"><h2 id="chrome">On Chromium and Blink based browsers</h2></a>
Click the 3 superpositioned dots at the top right of the screen and click on <div class="code-inline">Settings</div>, then search for <div class="code-inline">default search engine</div>, or visit <a href="chrome://settings/searchEngines">chrome://settings/searchEngines</a>.<br><br>
Once you\'re there, click the pencil on the last entry under "Search engines" (it\'s probably DuckDuckGo). Once you do that, a popup will appear. Populate it with the following information:
<table>
<tr>
<td><b>Field</b></td>
<td><b>Value</b></td>
</tr>
<tr>
<td>Search engine</td>
<td>4get</td>
</tr>
<tr>
<td>Shortcut</td>
<td>4get</td>
</tr>
<tr>
<td>URL with %s in place of query</td>
<td>https://4get.ca/web?s=%s</td>
</tr>
</table>
Once that\'s done, click <div class="code-inline">Save</div>. Then, on the right-hand side of the newly created entry, open the dropdown menu and select <div class="code-inline">Make default</div>.
<h1>Frequently asked questions</h1>
<a href="#what-is-this"><h2 id="what-is-this">What is this?</h2></a>
This is a metasearch engine that gets results from other engines, and strips away all of the tracking parameters and Microsoft/globohomo bullshit they add. Most of the other alternatives to Google jack themselves off about being ""privacy respecting"" or whatever the fuck but it always turns out to be a total lie, and I just got fed up with their shit honestly. Alternatives like Searx or YaCy all fucking suck so I made my own thing.
<a href="#goal"><h2 id="goal">My goal</h2></a>
Provide users with a privacy oriented, extremely lightweight, ad free, free as in freedom (and free beer!) way to search for documents around the internet, with minimal, optional javascript code. My long term goal would be to build my own index (that doesn\'t suck) and provide users with an unbiased search engine, with no political inclinations.
<a href="#logs"><h2 id="logs">Do you keep logs?</h2></a>
I store data temporarily to get the next page of results. This might include search queries, tokens and other parameters. These parameters are encrypted using <div class="code-inline">aes-256-gcm</div> on the serber, for which I give you a key (also known internally as <div class="code-inline">npt</div> token). When you make a request to get the next page, you supply the token, the data is decrypted and the request is fulfilled. This encrypted data is deleted after 15 minutes, or after it\'s used, whichever comes first.<br><br>
I <b>don\'t</b> log IP addresses, user agents, or anything else. The <div class="code-inline">npt</div> tokens are the only things that are stored (in RAM, mind you), temporarily, encrypted.
<a href="#information-sharing"><h2 id="information-sharing">Do you share information with third parties?</h2></a>
Your search queries and supplied filters are shared with the scraper you chose (so I can get the search results, duh). I don\'t share anything else (that means I don\'t share your IP address, location, or anything of this kind). There is no way that site can know you\'re the one searching for something, <u>unless you send out a search query that de-anonymises you.</u> For example, a search query like "hello my full legal name is jonathan gallindo and i want pictures of cloacas" would definitely blow your cover. 4get doesn\'t contain ads or any third party javascript applets or trackers. I don\'t profile you, and quite frankly, I don\'t give a shit about what you search on there.<br><br>
TL;DR assume those websites can see what you search for, but can\'t see who you are (unless you\'re really dumb).
<a href="#hosting"><h2 id="hosting">Where is this website hosted?</h2></a>
This website is hosted on a Contabo shitbox in the United States.
<a href="#keyboard-shortcuts"><h2 id="keyboard-shortcuts">Keyboard shortcuts?</h2></a>
Use <div class="code-inline">/</div> to focus the search box.<br><br>
When the image viewer is open, you can use the following keybinds:<br>
<div class="code-inline">Up</div>, <div class="code-inline">Down</div>, <div class="code-inline">Left</div>, <div class="code-inline">Right</div> to rotate the image.<br>
<div class="code-inline">CTRL+Up</div>, <div class="code-inline">CTRL+Down</div>, <div class="code-inline">CTRL+Left</div>, <div class="code-inline">CTRL+Right</div> to mirror the image.<br>
<div class="code-inline">Escape</div> to exit the image viewer.
<a href="#instances"><h2 id="instances">Instances</h2></a>
4get is open source, anyone can create their own 4get instance! If you wish to add your website to this list, please <a href="https://lolcat.ca">contact me</a>.
<table>
<tr>
<td>Name</td>
<td>Address</td>
</tr>
'.$compiledinstancelist.'
</table>
<a href="#schizo"><h2 id="schizo">How can I trust you?</h2></a>
You just sort of have to take my word for it right now. If you\'d rather trust yourself instead of me (I believe in you!!), all of the code on this website is available through my <a href="https://git.lolcat.ca/lolcat" class="link">git page</a> for you to host on your own machines. Just a reminder: if you\'re the sole user of your instance, it doesn\'t take immense brain power for Microshit to figure out you basically just switched IP addresses. Invite your friends to use your instance!
<a href="#donate"><h2 id="donate">Support the project</h2></a>
Donate to me through ko-fi: <a href="https://ko-fi.com/lolcat" target="BLANK" rel="noreferrer">ko-fi.com/lolcat</a><br>
Please donate I sent myself a donation for testing if it works and it looks fucking dumb. Reasons to donate are listed on there. Thank you!
<a href="#contact"><h2 id="contact">I want to report abuse or have erotic roleplay through email</h2></a>
I don\'t know about that second part but if you want to talk to me, just drop me an email...<br><br>
<b>Message to all DMCA enforcers:</b> I don\'t host any of the content. Everything you see here is <u>proxied</u> through my shitbox with no moderation. Please reach out to the people hosting the infringing content instead.<br><br>
<a href="https://lolcat.ca" rel="dofollow" class="link">Click here to contact me!</a><br><br>
<a href="https://validator.w3.org/nu/?doc=https%3A%2F%2F4get.ca" title="W3 Valid!">
<img src="/static/icon/w3html.png" alt="Valid W3C HTML 4.01" width="88" height="31">
</a>';
// trim out whitespace
$left = explode("\n", $left);
explode(
"\n",
file_get_contents("template/about.html")
);
$out = "";

27
ami4get.php Normal file

@ -0,0 +1,27 @@
<?php
header("Content-Type: application/json");
header("Access-Control-Allow-Origin: *");
include "data/config.php";
$bot_requests = apcu_fetch("captcha");
$real_requests = apcu_fetch("real_requests");
echo json_encode(
[
"status" => "ok",
"service" => "4get",
"server" => [
"name" => config::SERVER_NAME,
"description" => config::SERVER_LONG_DESCRIPTION,
"bot_protection" => config::BOT_PROTECTION,
"real_requests" => $real_requests === false ? 0 : $real_requests,
"bot_requests" => $bot_requests === false ? 0 : $bot_requests,
"api_enabled" => config::API_ENABLED,
"alt_addresses" => config::ALT_ADDRESSES,
"version" => config::VERSION
],
"instances" => config::INSTANCES
]
);
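
Note: the `/ami4get` endpoint above returns a flat JSON document, so another instance or a status page could consume it with a few lines of PHP. The following is a minimal sketch, not part of this commit: `summarize_instance` is a hypothetical helper name, and the sample payload is hand-written with made-up values that only mirror the keys emitted by `ami4get.php`.

```php
<?php
// Hypothetical client-side helper (not part of this commit).
// It reads the keys that ami4get.php emits and returns a short summary.
function summarize_instance(string $json): array {
    // JSON_THROW_ON_ERROR (PHP 7.3+) raises JsonException on malformed input
    $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    if (!is_array($data) || ($data["service"] ?? null) !== "4get") {
        throw new UnexpectedValueException("Not a 4get instance");
    }

    $server = $data["server"];

    return [
        "name"    => $server["name"],
        "version" => $server["version"],
        "captcha" => $server["bot_protection"] === 1, // 1 = image captcha
        "api"     => $server["api_enabled"] === true,
        "peers"   => $data["instances"]
    ];
}

// Hand-written sample payload for illustration; the values are made up.
$sample = json_encode([
    "status"  => "ok",
    "service" => "4get",
    "server"  => [
        "name"           => "4get",
        "description"    => null,
        "bot_protection" => 0,
        "real_requests"  => 12,
        "bot_requests"   => 0,
        "api_enabled"    => true,
        "alt_addresses"  => [],
        "version"        => 5
    ],
    "instances" => ["https://4get.ca"]
]);

print_r(summarize_instance($sample));
```

In a real client the JSON string would come from an HTTP request to `<instance>/ami4get`; the sketch skips the network call so it can run offline.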


@ -119,6 +119,11 @@
/_____/_/ /_/\__,_/ .___/\____/_/_/ /_/\__/____/
/_/
+ /ami4get
Tells you basic information about the 4get instance. CORS requests
are allowed on this endpoint.
+ /api/v1/web
+ &extendedsearch
When using the ddg(DuckDuckGo) scraper, you may make use of the


@ -1,5 +1,6 @@
<?php
include "../../data/config.php";
new autocomplete();
class autocomplete{
@ -17,7 +18,7 @@ class autocomplete{
"yep" => "https://api.yep.com/ac/?query={searchTerms}",
"marginalia" => "https://search.marginalia.nu/suggest/?partial={searchTerms}",
"yt" => "https://suggestqueries-clients6.youtube.com/complete/search?client=youtube&q={searchTerms}",
"sc" => "https://api-v2.soundcloud.com/search/queries?q={searchTerms}&client_id=ArYppSEotE3YiXCO4Nsgid2LLqJutiww&limit=10&offset=0&linked_partitioning=1&app_version=1693487844&app_locale=en"
"sc" => "https://api-v2.soundcloud.com/search/queries?q={searchTerms}&client_id=" . config::SC_CLIENT_TOKEN . "&limit=10&offset=0&linked_partitioning=1&app_version=1693487844&app_locale=en"
];
/*
@ -107,7 +108,8 @@ class autocomplete{
[
$_GET["s"],
$json
]
],
JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES
);
break;
@ -132,7 +134,8 @@ class autocomplete{
[
$_GET["s"],
$json
]
],
JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES
);
break;
@ -150,7 +153,8 @@ class autocomplete{
[
$_GET["s"],
$json
]
],
JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES
);
break;
@ -162,7 +166,8 @@ class autocomplete{
[
$_GET["s"],
$json[1] // ensure it contains valid key 0
]
],
JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES
);
break;
}
@ -170,45 +175,54 @@ class autocomplete{
private function get($url, $query){
$curlproc = curl_init();
$url = str_replace("{searchTerms}", urlencode($query), $url);
curl_setopt($curlproc, CURLOPT_URL, $url);
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
["User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/116.0",
"Accept: application/json, text/javascript, */*; q=0.01",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip",
"DNT: 1",
"Connection: keep-alive",
"Sec-Fetch-Dest: empty",
"Sec-Fetch-Mode: cors",
"Sec-Fetch-Site: same-site"]
);
curl_setopt($curlproc, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curlproc, CURLOPT_TIMEOUT, 30);
$data = curl_exec($curlproc);
if(curl_errno($curlproc)){
try{
$curlproc = curl_init();
throw new Exception(curl_error($curlproc));
}
$url = str_replace("{searchTerms}", urlencode($query), $url);
curl_setopt($curlproc, CURLOPT_URL, $url);
curl_setopt($curlproc, CURLOPT_ENCODING, ""); // default encoding
curl_setopt($curlproc, CURLOPT_HTTPHEADER,
["User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/116.0",
"Accept: application/json, text/javascript, */*; q=0.01",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip",
"DNT: 1",
"Connection: keep-alive",
"Sec-Fetch-Dest: empty",
"Sec-Fetch-Mode: cors",
"Sec-Fetch-Site: same-site"]
);
curl_setopt($curlproc, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($curlproc, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($curlproc, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curlproc, CURLOPT_TIMEOUT, 30);
$data = curl_exec($curlproc);
if(curl_errno($curlproc)){
throw new Exception(curl_error($curlproc));
}
curl_close($curlproc);
return $data;
curl_close($curlproc);
return $data;
}catch(Exception $error){
$this->do404("Curl error: " . $error->getMessage());
}
}
private function do404($error){
echo json_encode(["error" => $error]);
echo json_encode(
["error" => $error],
JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES
);
die();
}
@ -218,7 +232,8 @@ class autocomplete{
[
$_GET["s"],
[]
]
],
JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES
);
die();
}
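
Note: the hunks above add `JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES` to every `json_encode` call in the autocomplete endpoint. A small standalone sketch of what those two flags change (the sample suggestion strings are made up for illustration):

```php
<?php
// Sample autocomplete-style payload; the strings are illustrative only.
$payload = ["caf\u{e9} au lait", ["https://4get.ca/web?s=caf\u{e9}"]];

// Default behaviour: non-ASCII becomes \uXXXX and "/" becomes "\/"
$escaped = json_encode($payload);

// With the flags used in this commit: UTF-8 and slashes are left as-is
$readable = json_encode(
    $payload,
    JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES
);

echo $escaped, "\n";
echo $readable, "\n";

// Both strings decode to the same PHP value; only the wire format differs.
var_dump(json_decode($escaped, true) === json_decode($readable, true));
```

The unescaped form is shorter and easier to read in a browser's network tab, and any compliant JSON parser treats the two encodings as equivalent.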


@ -1,8 +1,14 @@
<?php
chdir("../../");
header("Content-Type: application/json");
chdir("../../");
include "data/config.php";
if(config::API_ENABLED === false){
echo json_encode(["status" => "The server administrator disabled the API!"]);
return;
}
include "lib/frontend.php";
$frontend = new frontend();


@ -1,8 +1,14 @@
<?php
chdir("../../");
header("Content-Type: application/json");
chdir("../../");
include "data/config.php";
if(config::API_ENABLED === false){
echo json_encode(["status" => "The server administrator disabled the API!"]);
return;
}
include "lib/frontend.php";
$frontend = new frontend();


@ -1,8 +1,14 @@
<?php
chdir("../../");
header("Content-Type: application/json");
chdir("../../");
include "data/config.php";
if(config::API_ENABLED === false){
echo json_encode(["status" => "The server administrator disabled the API!"]);
return;
}
include "lib/frontend.php";
$frontend = new frontend();


@ -1,8 +1,14 @@
<?php
chdir("../../");
header("Content-Type: application/json");
chdir("../../");
include "data/config.php";
if(config::API_ENABLED === false){
echo json_encode(["status" => "The server administrator disabled the API!"]);
return;
}
include "lib/frontend.php";
$frontend = new frontend();


@ -1,8 +1,14 @@
<?php
chdir("../../");
header("Content-Type: application/json");
chdir("../../");
include "data/config.php";
if(config::API_ENABLED === false){
echo json_encode(["status" => "The server administrator disabled the API!"]);
return;
}
include "lib/frontend.php";
$frontend = new frontend();
@ -21,7 +27,13 @@ new captcha($null, $null, $null, "web", false);
$get = $frontend->parsegetfilters($_GET, $filters);
if(!isset($_GET["extendedsearch"])){
if(
isset($_GET["extendedsearch"]) &&
$_GET["extendedsearch"] == "yes"
){
$get["extendedsearch"] = "yes";
}else{
$get["extendedsearch"] = "no";
}


@ -7,6 +7,7 @@ if(!isset($_GET["s"])){
die();
}
include "data/config.php";
include "lib/curlproxy.php";
$proxy = new proxy();


@ -1,5 +1,6 @@
<?php
include "data/config.php";
new sc_audio();
class sc_audio{

103
data/config.php Normal file

@ -0,0 +1,103 @@
<?php
class config{
// Welcome to the 4get configuration file
// When updating your instance, please make sure this file isn't missing
// any parameters.
// 4get version. Please keep this updated
const VERSION = 5;
// Will be shown pretty much everywhere.
const SERVER_NAME = "4get";
// Will be shown in <meta> tag on home page
const SERVER_SHORT_DESCRIPTION = "They live in our walls!";
// Will be shown in server list ping (null for no description)
const SERVER_LONG_DESCRIPTION = null;
// Add your own themes in "static/themes". Set to "Dark" for default theme.
// Eg. To use "static/themes/Cream.css", specify "Cream".
const DEFAULT_THEME = "Dark";
// Enable the API?
const API_ENABLED = true;
// Bot protection
// 4get.ca has been hit with 250k bot reqs every single day for months
// you probably want to enable this if your instance is public...
// 0 = disabled
// 1 = ask for image captcha (requires image dataset & imagick 6.9.11-60)
// @TODO: 2 = invite only (users need a pass)
const BOT_PROTECTION = 0;
// if BOT_PROTECTION is set to 1, specify the available datasets here
// images should be named from 1.png to X.png, and be 100x100 in size
// Eg. data/captcha/birds/1.png up to 2263.png
const CAPTCHA_DATASET = [
// example:
// ["birds", 2263],
// ["fumo_plushies", 1006],
// ["minecraft", 848]
];
// List of domains that point to your servers. Include your tor/i2p
// addresses here! Must be a valid URL. Won't affect links placed on
// the homepage.
const ALT_ADDRESSES = [
//"https://4get.alt-tld",
//"http://4getwebfrq5zr4sxugk6htxvawqehxtdgjrbcn2oslllcol2vepa23yd.onion"
];
// Known 4get instances. MUST use the https protocol if your instance uses
// it. Is used to generate a distributed list of instances.
// To appear in the list of an instance, contact its host. Once everyone has
// added each other, your serber should appear everywhere.
const INSTANCES = [
"https://4get.ca",
"https://4get.zzls.xyz",
"https://4get.silly.computer",
"https://4g.opnxng.com",
"https://4get.konakona.moe"
];
// Default user agent to use for scraper requests. Sometimes ignored to get specific webpages
// Changing this might break things.
const USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/119.0";
// Proxy pool assignments for each scraper
// false = Use server's raw IP
// string = will load a proxy list from data/proxies
// Eg. "onion" will load data/proxies/onion.txt
const PROXY_DDG = false; // duckduckgo
const PROXY_BRAVE = false;
const PROXY_FB = false; // facebook
const PROXY_GOOGLE = false;
const PROXY_MARGINALIA = false;
const PROXY_MOJEEK = false;
const PROXY_SC = false; // soundcloud
const PROXY_WIBY = false;
const PROXY_YT = false; // youtube
const PROXY_YEP = false;
const PROXY_PINTEREST = false;
const PROXY_FTM = false; // findthatmeme
const PROXY_IMGUR = false;
const PROXY_YANDEX_W = false; // yandex web
const PROXY_YANDEX_I = false; // yandex images
const PROXY_YANDEX_V = false; // yandex videos
//
// Scraper-specific parameters
//
// SOUNDCLOUD
// Get these parameters by making a search on soundcloud with network
// tab open, then filter URLs using "search?q=". (No need to login)
const SC_USER_ID = "143860-454480-469473-289775";
const SC_CLIENT_TOKEN = "qwfvRfz8PCoa2NldZALK7hhZFIH24Wyx";
// MARGINALIA
// Get an API key by contacting the Marginalia.nu maintainer. The "public" key
// works but is almost always rate-limited.
const MARGINALIA_API_KEY = "public";
}


@ -1,62 +0,0 @@
<?php
// this file exists to separate instance data from the actual about page
// HTML, and to make it easier to add/modify instances cleanly.
$instancelist = [
[
"name" => "lolcat's instance (master)",
"address" => [
"uri" => "https://4get.ca/",
"displayname" => "4get.ca"
],
"altaddresses" => [
[
// all these address blocks will be linked in parentheses
// e.g. 4get.ca (tor) (i2p) etc.
"uri" => "http://4getwebfrq5zr4sxugk6htxvawqehxtdgjrbcn2oslllcol2vepa23yd.onion",
"displayname" => "tor"
]
]
],
[
"name" => "zzls's Chilean instance",
"address" => [
"uri" => "https://4get.zzls.xyz/",
"displayname" => "4get.zzls.xyz"
],
"altaddresses" => [
[
"uri" => "http://4get.zzlsghu6mvvwyy75mvga6gaf4znbp3erk5xwfzedb4gg6qqh2j6rlvid.onion",
"displayname" => "tor"
]
]
],
[
"name" => "zzls's United States instance",
"address" => [
"uri" => "https://4getus.zzls.xyz/",
"displayname" => "4getus.zzls.xyz"
],
"altaddresses" => [
[
"uri" => "http://4getus.zzlsghu6mvvwyy75mvga6gaf4znbp3erk5xwfzedb4gg6qqh2j6rlvid.onion",
"displayname" => "tor"
]
]
],
[
"name" => "4get on a silly computer",
"address" => [
"uri" => "https://4get.silly.computer",
"displayname" => "4get.silly.computer"
],
"altaddresses" => [
[
"uri" => "https://4get.cynic.moe/",
"displayname" => "fallback domain"
]
]
]
]
?>

3
data/proxies/.gitignore vendored Normal file

@ -0,0 +1,3 @@
*
!.gitignore
!onion.txt

13
data/proxies/onion.txt Normal file

@ -0,0 +1,13 @@
# Specify proxies by following this format:
# <type>:<address>:<port>:<username>:<password>
#
# Examples:
# https:1.3.3.7:6969:abcd:efg
# socks4:1.2.3.4:8080::
# raw_ip::::
#
# Available types:
# raw_ip, http, https, socks4, socks5, socks4a, socks5_hostname
# Local tor proxy
socks5:localhost:9050::
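
Note: the proxy file format above (`<type>:<address>:<port>:<username>:<password>`, with `#` comments) can be parsed in a few lines. The sketch below is loosely modelled on `backend::get_ip()` and `backend::assign_proxy()` from this commit, but `parse_proxy_list` and `pick_proxy` are hypothetical names and the code is an illustration, not the project's implementation.

```php
<?php
// Hypothetical helper: turn the text of a proxy list into structured entries.
function parse_proxy_list(string $text): array {
    $proxies = [];

    foreach (explode("\n", $text) as $line) {
        $line = trim($line);

        // ignore empty or commented lines
        if ($line === "" || $line[0] === "#") {
            continue;
        }

        // A limit of 5 keeps any extra ":" inside the password field.
        // array_pad avoids warnings on lines with fewer than 5 fields.
        [$type, $address, $port, $username, $password] =
            array_pad(explode(":", $line, 5), 5, "");

        $proxies[] = compact("type", "address", "port", "username", "password");
    }

    return $proxies;
}

// Hypothetical helper: round-robin selection from an increasing counter,
// similar in spirit to the apcu_inc() based index in backend::get_ip().
function pick_proxy(array $proxies, int $counter): array {
    return $proxies[$counter % count($proxies)];
}

$list = parse_proxy_list(
    "# Local tor proxy\n" .
    "socks5:localhost:9050::\n" .
    "https:1.3.3.7:6969:abcd:efg\n"
);

print_r(pick_proxy($list, 0)); // first entry
print_r(pick_proxy($list, 3)); // 3 % 2 = 1, second entry
```

One limitation worth keeping in mind: because fields are split on `:`, a bare IPv6 address in the `<address>` field would not survive this format.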


@ -6,6 +6,7 @@ if(!isset($_GET["s"])){
die();
}
include "data/config.php";
new favicon($_GET["s"]);
class favicon{


@ -3,6 +3,8 @@
/*
Initialize random shit
*/
include "data/config.php";
include "lib/frontend.php";
$frontend = new frontend();
@ -26,20 +28,7 @@ try{
}catch(Exception $error){
echo
$frontend->drawerror(
"Shit",
'This scraper returned an error:' .
'<div class="code">' . htmlspecialchars($error->getMessage()) . '</div>' .
'Things you can try:' .
'<ul>' .
'<li>Use a different scraper</li>' .
'<li>Remove keywords that could cause errors</li>' .
'<li>Use another 4get instance</li>' .
'</ul><br>' .
'If the error persists, please <a href="/about">contact the administrator</a>.'
);
die();
$frontend->drawscrapererror($error->getMessage(), $get, "images");
}
if(count($results["image"]) === 0){


@ -1,5 +1,6 @@
<?php
include "data/config.php";
include "lib/frontend.php";
$frontend = new frontend();
@ -8,7 +9,7 @@ $images = glob("banner/*");
echo $frontend->load(
"home.html",
[
"body_class" => $frontend->getthemeclass(false),
"server_short_description" => htmlspecialchars(config::SERVER_SHORT_DESCRIPTION),
"banner" => $images[rand(0, count($images) - 1)]
]
);

55
instances.php Normal file

@ -0,0 +1,55 @@
<?php
include "lib/frontend.php";
$frontend = new frontend();
include "data/config.php";
$params = "";
$first = true;
foreach($_GET as $key => $value){
if(
!is_string($value) ||
$key == "target"
){
continue;
}
if($first === true){
$first = false;
$params = "?";
}else{
$params .= "&";
}
$params .= urlencode($key) . "=" . urlencode($value);
}
if(
!isset($_GET["target"]) ||
!is_string($_GET["target"])
){
$target = "";
}else{
$target = "/" . urlencode($_GET["target"]);
}
$instances = "";
foreach(config::INSTANCES as $instance){
$instances .= '<tr><td class="expand"><a href="' . htmlspecialchars($instance) . $target . $params . '" target="_BLANK" rel="noreferrer">' . htmlspecialchars($instance) . '</a></td></tr>';
}
echo
$frontend->load(
"instances.html",
[
"instances_html" => $instances
]
);
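
Note: `instances.php` above rebuilds the current query string so that a search can be replayed on another instance. The same logic can be written as a pure function, which makes it easier to test. This is a sketch under assumptions: `build_instance_link` is a hypothetical name, and unlike the commit it escapes the whole URL (so `&` becomes `&amp;` in the HTML attribute) rather than only the instance part.

```php
<?php
// Hypothetical helper: build the link that sends the current search
// to another instance. $get plays the role of $_GET.
function build_instance_link(string $instance, array $get): string {
    $params = [];

    foreach ($get as $key => $value) {
        // skip arrays (e.g. ?a[]=1) and the routing parameter itself
        if (!is_string($value) || $key === "target") {
            continue;
        }

        $params[] = urlencode((string)$key) . "=" . urlencode($value);
    }

    $target =
        isset($get["target"]) && is_string($get["target"]) ?
        "/" . urlencode($get["target"]) :
        "";

    $query = count($params) === 0 ? "" : "?" . implode("&", $params);

    // escape once, at the point where the URL is placed into HTML
    return htmlspecialchars($instance . $target . $query);
}

echo build_instance_link(
    "https://4get.ca",
    ["s" => "a b", "target" => "web", "scraper" => "ddg"]
), "\n";
```

Keeping the escaping in one place is a design choice of the sketch; the commit's approach of escaping only `$instance` also works in practice because every other component has already passed through `urlencode`.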

197
lib/backend.php Normal file

@ -0,0 +1,197 @@
<?php
class backend{
public function __construct($scraper){
$this->scraper = $scraper;
$this->requestid = apcu_inc("real_requests");
}
/*
Proxy stuff
*/
public function get_ip(){
$pool = constant("config::PROXY_" . strtoupper($this->scraper));
if($pool === false){
// we don't want a proxy, fuck off!
return 'raw_ip::::';
}
// indent
$proxy_index_raw = apcu_inc("p." . $this->scraper);
$proxylist = file_get_contents("data/proxies/" . $pool . ".txt");
$proxylist = explode("\n", $proxylist);
// ignore empty or commented lines
$proxylist = array_filter($proxylist, function($entry){
$entry = ltrim($entry);
return strlen($entry) > 0 && substr($entry, 0, 1) != "#";
});
$proxylist = array_values($proxylist);
return $proxylist[$proxy_index_raw % count($proxylist)];
}
// this function is also called directly on nextpage
public function assign_proxy(&$curlproc, $ip){
// parse proxy line
[
$type,
$address,
$port,
$username,
$password
] = explode(":", $ip, 5);
switch($type){
case "raw_ip":
return;
break;
case "http":
case "https":
curl_setopt($curlproc, CURLOPT_PROXYTYPE, CURLPROXY_HTTP);
curl_setopt($curlproc, CURLOPT_PROXY, $type . "://" . $address . ":" . $port);
break;
case "socks4":
curl_setopt($curlproc, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS4);
curl_setopt($curlproc, CURLOPT_PROXY, $address . ":" . $port);
break;
case "socks5":
curl_setopt($curlproc, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5);
curl_setopt($curlproc, CURLOPT_PROXY, $address . ":" . $port);
break;
case "socks4a":
curl_setopt($curlproc, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS4A);
curl_setopt($curlproc, CURLOPT_PROXY, $address . ":" . $port);
break;
case "socks5_hostname":
curl_setopt($curlproc, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5_HOSTNAME);
curl_setopt($curlproc, CURLOPT_PROXY, $address . ":" . $port);
break;
}
if($username != ""){
curl_setopt($curlproc, CURLOPT_PROXYUSERPWD, $username . ":" . $password);
}
}
/*
Next page stuff
*/
public function store($payload, $page, $proxy){
$page = $page[0];
$password = random_bytes(256); // 2048 bit
$salt = random_bytes(16);
$key = hash_pbkdf2("sha512", $password, $salt, 20000, 32, true);
$iv =
random_bytes(
openssl_cipher_iv_length("aes-256-gcm")
);
$tag = "";
$out = openssl_encrypt($payload, "aes-256-gcm", $key, OPENSSL_RAW_DATA, $iv, $tag, "", 16);
$key = apcu_inc("key", 1);
apcu_store(
$page . "." .
$this->scraper .
$this->requestid,
gzdeflate($proxy . "," . $salt.$iv.$out.$tag),
900 // cache information for 15 minutes blaze it
);
return
$this->scraper . $this->requestid . "." .
rtrim(strtr(base64_encode($password), '+/', '-_'), '=');
}
public function get($npt, $page){
$page = $page[0];
$explode = explode(".", $npt, 2);
if(count($explode) !== 2){
throw new Exception("Malformed nextPageToken!");
}
$apcu = $page . "." . $explode[0];
$key = $explode[1];
$payload = apcu_fetch($apcu);
if($payload === false){
throw new Exception("The nextPageToken is invalid or has expired!");
}
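// restore the "=" padding that store() strips before decoding
$key = strtr($key, '-_', '+/');
$key =
base64_decode(
str_pad(
$key,
strlen($key) + (4 - strlen($key) % 4) % 4,
'=',
STR_PAD_RIGHT
)
);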
$payload = gzinflate($payload);
// get proxy
[
$proxy,
$payload
] = explode(",", $payload, 2);
$key =
hash_pbkdf2(
"sha512",
$key,
substr($payload, 0, 16), // salt
20000,
32,
true
);
$ivlen = openssl_cipher_iv_length("aes-256-gcm");
$payload =
openssl_decrypt(
substr(
$payload,
16 + $ivlen,
-16
),
"aes-256-gcm",
$key,
OPENSSL_RAW_DATA,
substr($payload, 16, $ivlen),
substr($payload, -16)
);
if($payload === false){
throw new Exception("The nextPageToken is invalid or has expired!");
}
// remove the key after using
apcu_delete($apcu);
return [$payload, $proxy];
}
}
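
The `store()`/`get()` pair above can be read as a seal/unseal round trip: the random password leaves the server inside the nextPageToken, and only the salt, IV, ciphertext and tag stay in the cache. The sketch below reproduces that flow without APCu so it runs standalone. The function names `b64url_encode`, `b64url_decode`, `seal` and `unseal` are hypothetical and do not exist in 4get; the parameters (PBKDF2-SHA512, 20000 iterations, AES-256-GCM, 16-byte salt and tag) are copied from the code above.

```php
<?php
// Illustrative sketch only: these helpers are hypothetical, not part of 4get.

// URL-safe base64 without padding, as used for the token's password part
function b64url_encode(string $bin): string {
    return rtrim(strtr(base64_encode($bin), '+/', '-_'), '=');
}

function b64url_decode(string $text){
    $b64 = strtr($text, '-_', '+/');
    // restore the padding stripped by b64url_encode()
    $b64 .= str_repeat('=', (4 - strlen($b64) % 4) % 4);
    return base64_decode($b64, true);
}

// returns salt . iv . ciphertext . tag, the same layout store() caches
function seal(string $payload, string $password): string {
    $salt = random_bytes(16);
    $key = hash_pbkdf2("sha512", $password, $salt, 20000, 32, true);
    $iv = random_bytes(openssl_cipher_iv_length("aes-256-gcm"));
    $tag = "";
    $out = openssl_encrypt($payload, "aes-256-gcm", $key, OPENSSL_RAW_DATA, $iv, $tag, "", 16);
    return $salt . $iv . $out . $tag;
}

// returns the plaintext, or false if the password or blob is wrong
function unseal(string $blob, string $password){
    $ivlen = openssl_cipher_iv_length("aes-256-gcm");
    $key = hash_pbkdf2("sha512", $password, substr($blob, 0, 16), 20000, 32, true);
    return openssl_decrypt(
        substr($blob, 16 + $ivlen, -16), // ciphertext
        "aes-256-gcm",
        $key,
        OPENSSL_RAW_DATA,
        substr($blob, 16, $ivlen),       // iv
        substr($blob, -16)               // tag
    );
}

$password = random_bytes(256);
$token = b64url_encode($password);   // what the client would receive
$blob = seal("page=2", $password);   // what the server would cache
$plain = unseal($blob, b64url_decode($token));
var_dump($plain === "page=2");
```

Because GCM authenticates the ciphertext, a token with the wrong password makes `openssl_decrypt()` return `false`, which is the case `get()` turns into the "invalid or has expired" exception.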


@ -4,6 +4,19 @@ class captcha{
public function __construct($frontend, $get, $filters, $page, $output){
// check if we want captcha
if(config::BOT_PROTECTION !== 1){
if($output === true){
$frontend->loadheader(
$get,
$filters,
$page
);
}
return;
}
/*
Validate cookie, if it exists
*/
@ -46,6 +59,7 @@ class captcha{
if($output === false){
http_response_code(429); // too many reqs
echo json_encode([
"status" => "The \"pass\" token in your cookies is missing or has expired!!"
]);
@ -184,15 +198,6 @@ class captcha{
}
}
- /*
- Generate random grid data to pass to captcha.php
- */
- $dataset = [
- ["birds", 2263],
- ["fumo_plushies", 1006],
- ["minecraft", 848]
- ];
// get the positions for the answers
// will return between 3 and 6 answer positions
$range = range(0, 15);
@ -216,17 +221,18 @@ class captcha{
}
// choose a dataset
- $choosen = &$dataset[random_int(0, count($dataset) - 1)];
+ $c = count(config::CAPTCHA_DATASET);
+ $choosen = config::CAPTCHA_DATASET[random_int(0, $c - 1)];
$choices = [];
- for($i=0; $i<count($dataset); $i++){
+ for($i=0; $i<$c; $i++){
- if($dataset[$i][0] == $choosen[0]){
+ if(config::CAPTCHA_DATASET[$i][0] == $choosen[0]){
continue;
}
- $choices[] = $dataset[$i];
+ $choices[] = config::CAPTCHA_DATASET[$i];
}
// generate grid data
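
As a rough illustration of what this hunk does, the sketch below picks one dataset as the captcha answer and keeps the others as decoys. It assumes a plain constant in place of `config::CAPTCHA_DATASET`, filled with the names and counts from the hard-coded list removed earlier in this diff; `pick_dataset` is a hypothetical helper, not a 4get function.

```php
<?php
// Hypothetical stand-in for config::CAPTCHA_DATASET
const CAPTCHA_DATASET = [
    ["birds", 2263],
    ["fumo_plushies", 1006],
    ["minecraft", 848]
];

// Pick one dataset as the answer and return every other one as a decoy.
function pick_dataset(array $datasets): array {
    $c = count($datasets);
    $chosen = $datasets[random_int(0, $c - 1)];

    $choices = [];
    for($i = 0; $i < $c; $i++){
        if($datasets[$i][0] == $chosen[0]){
            continue; // skip the answer itself
        }
        $choices[] = $datasets[$i];
    }

    return [$chosen, $choices];
}

[$chosen, $choices] = pick_dataset(CAPTCHA_DATASET);
echo $chosen[0] . " vs " . count($choices) . " decoy datasets\n";
```

Reading the list from configuration means an instance operator could swap datasets without touching the captcha class.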


@ -152,7 +152,7 @@ class proxy{
$curl,
CURLOPT_HTTPHEADER,
[
- "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/116.0",
+ "User-Agent: " . config::USER_AGENT,
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip, deflate",
@ -180,7 +180,7 @@ class proxy{
$curl,
CURLOPT_HTTPHEADER,
[
- "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/116.0",
+ "User-Agent: " . config::USER_AGENT,
"Accept: image/avif,image/webp,*/*",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip, deflate",
@ -379,7 +379,7 @@ class proxy{
$curl,
CURLOPT_HTTPHEADER,
[
- "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/116.0",
+ "User-Agent: " . config::USER_AGENT,
"Accept: image/avif,image/webp,*/*",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip, deflate, br",
@ -395,7 +395,7 @@ class proxy{
$curl,
CURLOPT_HTTPHEADER,
[
- "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/116.0",
+ "User-Agent: " . config::USER_AGENT,
"Accept: audio/webm,audio/ogg,audio/wav,audio/*;q=0.9,application/ogg;q=0.7,video/*;q=0.6,*/*;q=0.5",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip, deflate, br",

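The four hunks above replace a repeated User-Agent literal with a single constant. A minimal sketch of the same idea, assuming a stand-in class `example_config` (the real `config` class is not shown in this part of the diff) and a hypothetical `build_headers` helper:

```php
<?php
// Hypothetical stand-in for the config class; the value is the literal
// that the hunks above remove.
class example_config {
    const USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/116.0";
}

// Build a header list for CURLOPT_HTTPHEADER with a caller-supplied Accept value.
function build_headers(string $accept): array {
    return [
        "User-Agent: " . example_config::USER_AGENT,
        "Accept: " . $accept,
        "Accept-Language: en-US,en;q=0.5",
        "Accept-Encoding: gzip, deflate"
    ];
}

$headers = build_headers("image/avif,image/webp,*/*");
print_r($headers);
```

With the string in one place, updating the advertised browser version touches a single line instead of every request builder.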

@ -4,6 +4,41 @@ class frontend{
public function load($template, $replacements = []){
$replacements["server_name"] = htmlspecialchars(config::SERVER_NAME);
$replacements["version"] = config::VERSION;
if(isset($_COOKIE["theme"])){
$theme = str_replace(["/", "."], "", $_COOKIE["theme"]);
if(
$theme != "Dark" &&
!is_file("static/themes/" . $theme . ".css")
){
$theme = config::DEFAULT_THEME;
}
}else{
$theme = config::DEFAULT_THEME;
}
if($theme != "Dark"){
$replacements["style"] = '<link rel="stylesheet" href="/static/themes/' . $theme . '.css?v' . config::VERSION . '">';
}else{
$replacements["style"] = "";
}
if(isset($_COOKIE["scraper_ac"])){
$replacements["ac"] = '?ac=' . htmlspecialchars($_COOKIE["scraper_ac"]);
}else{
$replacements["ac"] = '';
}
$handle = fopen("template/{$template}", "r");
$data = fread($handle, filesize("template/{$template}"));
fclose($handle);
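
A minimal sketch of the theme-cookie check in this hunk, assuming an in-memory list of theme names in place of the `is_file()` lookup so it runs standalone. `sanitize_theme` and its parameters are hypothetical. One detail worth showing: `str_replace()` only strips both characters when they are separate array elements, since `"/" . "."` is the single string `"/."`.

```php
<?php
// Hypothetical helper: strip path characters from the cookie value and
// fall back to the default when the theme is unknown.
function sanitize_theme(string $raw, array $available, string $default): string {
    $theme = str_replace(["/", "."], "", $raw);

    if($theme != "Dark" && !in_array($theme, $available, true)){
        return $default;
    }

    return $theme;
}

$themes = ["Cream"]; // assumed list of installed themes

echo sanitize_theme("Cream", $themes, "Dark") . "\n";
echo sanitize_theme("../../etc/passwd", $themes, "Dark") . "\n";

// A concatenated needle only matches the literal sequence "/.",
// so a traversal prefix passes through untouched:
var_dump(str_replace(["/" . "."], "", "../themes") === "../themes");
```

An allowlist check like this keeps the cookie value from ever reaching the filesystem path unless it names a known theme.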
@ -29,30 +64,6 @@ class frontend{
return trim($html);
}
- public function getthemeclass($raw = true){
- if(
- isset($_COOKIE["theme"]) &&
- $_COOKIE["theme"] == "cream"
- ){
- $body_class = "theme-white ";
- }else{
- $body_class = "";
- }
- if(
- $raw &&
- $body_class != ""
- ){
- return ' class="' . rtrim($body_class) . '"';
- }
- return $body_class;
- }
public function loadheader(array $get, array $filters, string $page){
echo
@ -62,8 +73,7 @@ class frontend{
"index" => "no",
"search" => htmlspecialchars($get["s"]),
"tabs" => $this->generatehtmltabs($page, $get["s"]),
- "filters" => $this->generatehtmlfilters($filters, $get),
- "body_class" => $this->getthemeclass()
+ "filters" => $this->generatehtmlfilters($filters, $get)
]);
if(
@ -74,18 +84,17 @@ class frontend{
){
// bot detected !!
echo
- $this->drawerror(
- "Tshh, blocked!",
- 'You were blocked from viewing this page. If you wish to scrape data from 4get, please consider running <a href="https://git.lolcat.ca/lolcat/4get" rel="noreferrer nofollow">your own 4get instance</a> or using <a href="/api.txt">the API</a>.',
- );
+ $this->drawerror(
+ "Tshh, blocked!",
+ 'You were blocked from viewing this page. If you wish to scrape data from 4get, please consider running <a href="https://git.lolcat.ca/lolcat/4get" rel="noreferrer nofollow">your own 4get instance</a> or using <a href="/api.txt">the API</a>.',
+ );
die();
}