Looking for websites that are no longer indexed in Google (or other big tech search engines)
ScreenReader
Member
Hello LET!
As you can see, Google has recently been pushing harder on AI summaries in its search results, and site owners / bloggers complain that they get less and less traffic. You've probably also heard of sites suddenly getting "de-indexed" from Google as if they were gone entirely.
Aside from those two issues, I personally just can't stomach AI summaries in general. I'm looking for information on sites, so just give me the URL and I'll draw my own conclusions. If I need an ayy eee summary, I'd open your chat webapp or something, not my search engine.
After briefly searching for alternatives, I saw people mention Kagi, but I don't really like their business model. Well, it's more that I don't like having to trust a corporation with my search index.
So the next alternative I found is YaCy. The software looks mature for its purpose and has a good amount of data indexed.
I deployed it with Docker Compose, which was easy enough, and started crawling the sites I usually use.
CONTAINER ID   NAME                 CPU %    MEM USAGE / LIMIT     MEM %   NET I/O          BLOCK I/O        PIDS
82452305aff9   yacy_search_server   63.65%   1.793GiB / 30.96GiB   5.79%   123MB / 10.5MB   1.47GB / 413MB   168
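As a sanity check on the stats line above, MEM % is just usage divided by limit: 1.793 GiB / 30.96 GiB ≈ 5.79%. The small sketch below recomputes it from the displayed string. It only handles the GiB/MiB/KiB suffixes that appear in this kind of output, and Docker itself works from unrounded byte counts, so expect agreement only to the displayed precision.

```python
# Recompute the MEM % column of a `docker stats` line from its
# "MEM USAGE / LIMIT" column. Only binary suffixes are handled.
UNITS = {"KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}


def to_bytes(size: str) -> float:
    """Convert a size string like '1.793GiB' to bytes."""
    for suffix, factor in UNITS.items():
        if size.endswith(suffix):
            return float(size[: -len(suffix)]) * factor
    raise ValueError(f"unsupported unit in {size!r}")


def mem_percent(usage_limit: str) -> float:
    """Return usage / limit * 100 for a 'USAGE / LIMIT' string."""
    usage, limit = (part.strip() for part in usage_limit.split("/"))
    return to_bytes(usage) / to_bytes(limit) * 100


print(f"{mem_percent('1.793GiB / 30.96GiB'):.2f}%")  # 5.79%
```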

This Junior status is intentional; I haven't had time to set up port forwarding with Pangolin yet. The overall layout is also nice.
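As far as I understand YaCy, a peer shows up as Junior when other peers cannot reach it from the internet. Once the port forward is in place, a plain TCP connect is a quick way to check reachability. This sketch assumes YaCy's default port 8090, and it only tells you something useful when run from a machine outside your own network; the hostname in the usage line is a placeholder.

```python
import socket


def is_port_reachable(host: str, port: int = 8090, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    8090 is assumed to be the YaCy HTTP port; adjust if you changed it.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


# Usage (run from outside your LAN; hostname is a placeholder):
# print(is_port_reachable("yacy.example.org"))
```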


I'm content with the search results so far, and it works for my day-to-day use.
So here I am asking: do you know of sites / old sites that are definitely worth indexing because they might be useful or interesting to someone, especially if they got kicked out of big tech search engines? Let me know in the comments or by DM so I can crawl them into YaCy.
Please note that this is not an archival effort. The websites themselves have to stay online; only the search index lives in a different place.
---
edit: fixing codeblock

Comments
Cool project! I might actually give it a try too.
The only (for now minor) concern I have is how they would deal with peers intentionally poisoning the search index. Probably not an impossible problem to solve (randomly checking entries for validity is usually pretty simple, after all), but at least a tricky one, as it would likely need some kind of network-wide consensus regarding rogue nodes.
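The "randomly check entries, then agree on rogue nodes" idea could look roughly like spot-check sampling plus a majority vote. This is a minimal illustration of the concept, not YaCy's actual mechanism: the entry format, the `fetch_text` callback, the 0.2 threshold and the majority rule are all hypothetical.

```python
import random
from typing import Callable

# Hypothetical index entry: (url, set of terms the peer claims the page contains).
Entry = tuple[str, set[str]]


def spot_check(entries: list[Entry], fetch_text: Callable[[str], str],
               sample_size: int, rng: random.Random) -> float:
    """Return the fraction of randomly sampled entries whose claimed terms
    do not all appear in the live page text."""
    sample = rng.sample(entries, min(sample_size, len(entries)))
    if not sample:
        return 0.0
    bad = 0
    for url, terms in sample:
        words = set(fetch_text(url).lower().split())
        if not {t.lower() for t in terms} <= words:
            bad += 1
    return bad / len(sample)


def is_rogue(reported_rates: list[float], threshold: float = 0.2) -> bool:
    """Flag a peer if a strict majority of verifying peers measured a
    failure rate above the (hypothetical) threshold."""
    votes = sum(rate > threshold for rate in reported_rates)
    return votes > len(reported_rates) / 2


# Offline demo: a dict stands in for fetching live pages.
pages = {"https://a.example": "cheap vps deals", "https://b.example": "buy pills now"}
entries = [("https://a.example", {"vps"}), ("https://b.example", {"vps"})]
rate = spot_check(entries, pages.__getitem__, sample_size=2, rng=random.Random(0))
print(rate)                        # 0.5: one of the two entries is poisoned
print(is_rogue([rate, 0.1, 0.6]))  # True: two of three verifiers exceed 0.2
```

A real network would also have to stop rogue verifiers from voting dishonestly, which is where the consensus part gets hard.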
I use SearXNG. It combines results from multiple search engines into one result page, so you get a nice blend and not "biased" answers from one company.
But I rarely use search engines the way I did 2+ years ago. Most of the time it's quicker to just ask AI when it comes to statistical facts or curiosity things. Before, if you wanted to know "How many colors can a fish see?", you had to read multiple articles. Now you get a somewhat accurate answer in 2 minutes; maybe not scientific, but good enough.
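To illustrate how a metasearch engine can blend several engines into one list, here is reciprocal rank fusion (RRF), a well-known merging technique. SearXNG has its own scoring code, so treat this only as a sketch of the general idea: URLs that several engines rank highly end up on top. The engine names and URLs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: dict[str, list[str]], k: int = 60) -> list[str]:
    """Merge per-engine ranked URL lists: each engine adds 1 / (k + rank)
    to a URL's score, with rank starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for urls in rankings.values():
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest combined score first; ties broken by URL for a stable order.
    return sorted(scores, key=lambda u: (-scores[u], u))


results = {
    "engine_a": ["x.org", "y.org", "z.org"],
    "engine_b": ["y.org", "w.org"],
}
print(reciprocal_rank_fusion(results))  # ['y.org', 'x.org', 'w.org', 'z.org']
```

`y.org` wins because both engines return it, even though neither engine alone is "biased" toward it as its top pick.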
Valid concern. Searching their forum, it seems there's no mention of this yet.
Nice. It seems SearXNG is self-hostable; I'll check that out.
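Once a self-hosted SearXNG instance is running, it can also be queried from scripts. A minimal sketch, assuming an instance at `http://localhost:8080` and that the `json` output format is enabled under `search: formats:` in its `settings.yml` (as far as I know it is not enabled by default, and public instances often block it):

```python
import json
import urllib.parse
import urllib.request


def build_search_url(base: str, query: str) -> str:
    """Build a SearXNG search URL that asks for JSON output."""
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    return f"{base.rstrip('/')}/search?{params}"


def parse_results(payload: str) -> list[tuple[str, str]]:
    """Extract (title, url) pairs from a SearXNG JSON response body."""
    data = json.loads(payload)
    return [(r.get("title", ""), r["url"]) for r in data.get("results", [])]


def search(base: str, query: str, timeout: float = 10.0) -> list[tuple[str, str]]:
    """Run a query against the instance and return (title, url) pairs."""
    with urllib.request.urlopen(build_search_url(base, query), timeout=timeout) as resp:
        return parse_results(resp.read().decode("utf-8"))


# Usage (assumes the local instance described above):
# for title, url in search("http://localhost:8080", "how many colors can a fish see"):
#     print(title, url)
```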
de-googling is the new de-microsoft-ing. All big companies at some point turn evil.
Only this is the exact opposite. Site owners want to be in Google's search results, but Google deletes them or stops indexing them later. De-googling by Google. But I haven't really encountered many sites like that, except piracy sites.
It's been going on for a while. I kind of wonder where it'll lead though. Big tech already has a pretty strong grip almost everywhere and it's only tightening. There won't be any polite stepping back. That much is for sure.
IBM also thought it owned the market, look where they are now...
Revenue: US$67.54 billion (2025), an increase
Operating income: US$10.99 billion (2025), an increase
I wish I could be where they are now someday.
Well yeah, but then the modern-day giants seem to have taken a look too, learning that making it possible for third parties to build competing products is not the way to world domination.
Already about the same revenue for 30 years:
https://companiesmarketcap.com/ibm/revenue/
YaCy ranks search results in random order by design, or rather by lack of design. I browsed its source code 10 years ago, and that was the case then and had been for years, since it was semi-abandoned by Orbiter after he got his last grant.
To your question:
objectively, the most important website that is out of Google and should become indexed is obviously vpspricetracker.com.
Gigablast is a decent search engine with a crawler that works well and produces good SERPs. I have a local copy that I ported to x86. It uses all the bandwidth you give it, and the disk usage grows rapidly into the terabytes after a few weeks of crawling, so it's expensive to run. The guy who wrote it was running it on a dozen racks of servers in a dedicated warehouse.
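For a sense of scale, a back-of-the-envelope calculation of how much raw data a crawler pulls when it saturates a link. The 100 Mbit/s link and 3-week duration are hypothetical inputs, and bytes fetched are not the same as bytes stored (indexes compress and discard content), so this only shows why "terabytes after a few weeks" is plausible, not what Gigablast actually writes to disk.

```python
def bytes_fetched(link_mbit_per_s: float, days: float, utilization: float = 1.0) -> float:
    """Raw bytes downloaded by a crawler using the given share of a link."""
    bytes_per_second = link_mbit_per_s * 1_000_000 / 8 * utilization
    return bytes_per_second * 86_400 * days


# Hypothetical: a fully used 100 Mbit/s link for 21 days.
print(f"{bytes_fetched(100, 21) / 1e12:.1f} TB")  # 22.7 TB
```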
Mind publishing it?
Apparently I had to set up the Solr index ranking to change this default random ordering.
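YaCy's actual Solr ranking settings aren't shown here. The sketch below only illustrates the general idea behind field-boosted relevance (similar in spirit to per-field boosts in Solr's `qf` parameter, e.g. weighting title matches above body matches), which gives a deterministic order instead of a random one. The field names and weights are hypothetical.

```python
def field_boost_score(doc: dict[str, str], query: str, boosts: dict[str, float]) -> float:
    """Sum over fields of boost * (number of query-term occurrences in that field)."""
    terms = query.lower().split()
    score = 0.0
    for field, boost in boosts.items():
        words = doc.get(field, "").lower().split()
        score += boost * sum(words.count(term) for term in terms)
    return score


def rank(docs: list[dict[str, str]], query: str, boosts: dict[str, float]) -> list[dict[str, str]]:
    """Sort documents by descending score; Python's sort is stable, so ties
    keep their input order instead of being shuffled."""
    return sorted(docs, key=lambda d: -field_boost_score(d, query, boosts))


docs = [
    {"url": "https://blog.example", "title": "hosting blog", "text": "vps vps review"},
    {"url": "https://deals.example", "title": "vps deals", "text": "cheap hosting"},
]
boosts = {"title": 5.0, "text": 1.0}  # hypothetical field weights
for d in rank(docs, "vps", boosts):
    print(d["url"], field_boost_score(d, "vps", boosts))
# https://deals.example 5.0
# https://blog.example 2.0
```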
added to the crawler queue
Trying out https://gigablast.org/, it seems impressive. One of the forks seems to have a Docker build, so I'll make sure to try it out sometime.