Looking for website that are no longer indexed in google (or other big tech search engines)

ScreenReader Member
edited June 10 in General

Hello LET!

as you can see, google has recently been pushing AI summaries harder in its search results, and site owners / bloggers are complaining that they get less and less traffic. you've probably also heard about sites suddenly getting "de-indexed" from google, as if they were gone entirely.

aside from those two issues, i personally just can't stomach AI summaries in general. I'm looking for information on sites, so just give me the url and i'll draw my own conclusions. if i need an ayy eee summary i'd open your chat webapp or something, not my search engine.

after briefly searching for alternatives, i've seen people mention Kagi, but i don't really like their business model. well, it's more that i don't like having to trust a corporation with my search index.
so the next alternative i found is YaCy. this software looks mature for its purpose and has a good amount of data indexed.

i deployed it using docker compose, which was easy enough, and started crawling the sites that i usually use.

CONTAINER ID  NAME                CPU %   MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
82452305aff9  yacy_search_server  63.65%  1.793GiB / 30.96GiB   5.79%     123MB / 10.5MB    1.47GB / 413MB    168

my peer's Junior status is intentional; i haven't had the time to set up port forwarding with pangolin yet. the overall layout is nice too.


I'm content with the search results so far and it works for my day-to-day use.

so here i am, asking whether you know of any sites / old sites that are definitely worth indexing because they might be useful or interesting to someone, especially if they got kicked out of the big tech search engines. let me know in the comments or by dm so i can crawl them into yacy.

please note that this is not an archival effort. the website itself has to stay online; only the search index lives somewhere else.

---
edit: fixing codeblock

Comments

  • rpqu Member
    • πŸ₯πŸšœπŸŒΎ
    • β˜€οΈπŸŒͺ️
    • β™ΎοΈπŸŽšοΈ
    Thanked by 1Saragoldfarb
  • edited June 10

    Cool project! I might actually give it a try too.

    The only (for now minor) concern I have is how they would deal with peers intentionally poisoning the search index. It's probably not an impossible problem to solve (randomly checking entries for validity is, usually, pretty simple after all) but at least a tricky one, as it would likely need some kind of network-wide consensus regarding rogue nodes.
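
The "randomly checking entries for validity" idea from the comment above could look roughly like the sketch below. This is an illustration only, not how YaCy handles it: `IndexEntry`, `spot_check`, and the injected `fetch` function are hypothetical names, and "valid" is assumed to mean nothing more than "the claimed term appears in the fetched page text". It covers only the local check; the network-wide consensus part is left out.

```python
from __future__ import annotations

import random
from typing import Callable, Iterable, NamedTuple


class IndexEntry(NamedTuple):
    """Hypothetical shape of an index entry received from a peer."""
    peer: str  # peer that supplied the entry
    url: str   # page the entry points to
    term: str  # word the peer claims appears on that page


def spot_check(
    entries: Iterable[IndexEntry],
    fetch: Callable[[str], str],
    sample_size: int,
    rng: random.Random,
) -> dict[str, float]:
    """Return, per peer, the fraction of sampled entries that failed the check."""
    pool = list(entries)
    sample = rng.sample(pool, min(sample_size, len(pool)))
    checked: dict[str, int] = {}
    failed: dict[str, int] = {}
    for entry in sample:
        try:
            text = fetch(entry.url)
        except OSError:
            # Page unreachable: this neither confirms nor refutes the entry.
            continue
        checked[entry.peer] = checked.get(entry.peer, 0) + 1
        if entry.term.lower() not in text.lower():
            failed[entry.peer] = failed.get(entry.peer, 0) + 1
    return {peer: failed.get(peer, 0) / count for peer, count in checked.items()}
```

A peer whose failure rate stays high across repeated samples would be a candidate for distrust. Agreeing on that across the whole network is the tricky part the comment points at.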

    Thanked by 1tentor
  • I use SearXNG, it supports multiple search engines combined into 1 result. This is how
    you know you get a nice blend and not "biased" answers from one company.
    But I rarely use search engines in the way I did 2+ years ago. Most of the time it's quicker
    to just ask AI, when it comes to statistical facts or curiosity things. Before if you wanted
    to know "How many colors can a fish see" you had to read multiple articles. Now you have
    a somewhat accurate answer in 2 minutes, maybe not scientific, but good enough.

  • @totally_not_banned said:
    Cool project! I might actually give it a try too.

    The only (for now minor) concern I have is how they would deal with peers intentionally poisoning the search index. It's probably not an impossible problem to solve (randomly checking entries for validity is, usually, pretty simple after all) but at least a tricky one, as it would likely need some kind of network-wide consensus regarding rogue nodes.

    valid concern. from searching their forum, it seems there's no mention of this yet

    @luckypenguin said:
    I use SearXNG, it supports multiple search engines combined into 1 result. This is how
    you know you get a nice blend and not "biased" answers from one company.
    But I rarely use search engines in the way I did 2+ years ago. Most of the time it's quicker
    to just ask AI, when it comes to statistical facts or curiosity things. Before if you wanted
    to know "How many colors can a fish see" you had to read multiple articles. Now you have
    a somewhat accurate answer in 2 minutes, maybe not scientific, but good enough.

    nice. seems SearXNG is self-hostable, i'll check that out

  • dbadude Member

    de-googling is the new de-microsoft-ing. All big companies at some point turn evil.

  • luckypenguin Member
    edited June 10

    @dbadude said: de-googling is the new de-microsoft-ing. All big companies at some point turn evil.

    Only this is the direct opposite. Site owners want to be in google's search results, but google deletes them or stops indexing them later. De-googling by google. But I haven't really encountered many sites like that, except piracy sites.

    Thanked by 1tentor
  • @dbadude said:
    de-googling is the new de-microsoft-ing.

    It's been going on for a while. I kind of wonder where it'll lead though. Big tech already has a pretty strong grip almost everywhere and it's only tightening. There won't be any polite stepping back. That much is for sure.

    Thanked by 1buggedout
  • dbadude Member

    @totally_not_banned said:

    @dbadude said:
    de-googling is the new de-microsoft-ing.

    It's been going on for a while. I kind of wonder where it'll lead though. Big tech already has a pretty strong grip almost everywhere and it's only tightening. There won't be any polite stepping back. That much is for sure.

    IBM thought also to own the market, look where they are now...

  • @dbadude said: IBM thought also to own the market, look where they are now...

    Revenue Increase US$67.54 billion (2025)
    Operating income Increase US$10.99 billion (2025)

    I wish to be someday where they are now.

  • @dbadude said:

    @totally_not_banned said:

    @dbadude said:
    de-googling is the new de-microsoft-ing.

    It's been going on for a while. I kind of wonder where it'll lead though. Big tech already has a pretty strong grip almost everywhere and it's only tightening. There won't be any polite stepping back. That much is for sure.

    IBM thought also to own the market, look where they are now...

    Well yeah, but the modern-day giants seem to have taken a look too, learning that making it possible for third parties to build competing products is not the correct way to world domination.

  • dbadude Member

    @luckypenguin said:

    @dbadude said: IBM thought also to own the market, look where they are now...

    Revenue Increase US$67.54 billion (2025)
    Operating income Increase US$10.99 billion (2025)

    I wish to be someday where they are now.

    it's been about the same revenue for 30 years already
    https://companiesmarketcap.com/ibm/revenue/

  • davide Member

    YaCy ranks search results in random order by design, or rather by lack of design. I browsed its source code 10 years ago and that was and had been the case for years since it was semi-abandoned by Orbiter after he got his last grant.

    To your question:

    objectively, the most important website that is out of Google and should become indexed is obviously vpspricetracker.com.

    Thanked by 1xdb
  • davide Member

    Gigablast is a decent search engine with crawler that works well and produces good SERP; I have a local copy that I ported to x86; it uses all of the bandwidth you give it and the disk space grows rapidly into the terabytes after a few weeks of crawling, so it's expensive to run. The guy who wrote it was running it on a dozen racks of servers in a dedicated warehouse.

  • tentor Member, Host Rep

    @davide said:
    I have a local copy that I ported to x86

    Mind publishing it?

  • @davide said:
    YaCy ranks search results in random order by design, or rather by lack of design. I browsed its source code 10 years ago and that was and had been the case for years since it was semi-abandoned by Orbiter after he got his last grant.

    apparently i have to set up the solr index ranking to change this random ordering, which is the default

    To your question:

    objectively, the most important website that is out of Google and should become indexed is obviously vpspricetracker.com.

    added to the crawler queue

    @davide said:
    Gigablast is a decent search engine with crawler that works well and produces good SERP; I have a local copy that I ported to x86; it uses all of the bandwidth you give it and the disk space grows rapidly into the terabytes after a few weeks of crawling, so it's expensive to run. The guy who wrote it was running it on a dozen racks of servers in a dedicated warehouse.

    i tried out https://gigablast.org/ and it seems impressive. it looks like one of the forks has a docker build, so i'll make sure to try it out sometime
