A look at search engines with their own indexes

bazmatazable@reddthat.com · 4 months ago

Thank you for the post. I do like reading what experts have to say about our digital privacy, but I don’t like that so many of these articles and discussions focus on individual choices a user can make to gain more privacy. Please, can we stop pretending that there is any alternative to WhatsApp? We use their platform because of the network effect, not for any other reason. It’s like advising someone to speak Fuzhou instead of Mandarin when in China: it’s not wrong to do so, it’s just poor advice, or at the very least it assumes that your priority is speaking Fuzhou rather than actually communicating with other people. The author says as much themselves: “Collective problems need collective solutions.” This is great! But shortly after, we read: “Instead of using WhatsApp, use Signal.” groan + face-palm. I want to stay positive, though, and reiterate that I am happy this is being debated at all.

bazmatazable@reddthat.com · 4 months ago

I’m trying to do a 3-2-1, but instead I’m doing a 4-3-0. The original is on an SSD with scheduled backups to two separate HDs, so I have 3 copies on two different media (if SSD + HD counts as distinct enough). On top of that I added BDR (recordable Blu-ray) as an infrequent, manual 4th copy for my most irreplaceable data (and I’m very strict about what counts as irreplaceable, so the total is just over 100GB at this point). Eventually I need to get a copy of the disks off site, but for now they are in the basement.

I have no illusions about how long the BDRs will last (estimates seem to range anywhere between 100 days and 100 years). My aim is just to have another copy that is distinct from magnetic or flash storage. My plan is to keep burning updated copies so that any data on an old disc eventually gets burned to a newer one. Maybe in ten years I’ll abandon this approach, but for now it makes me feel better.
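To make the counting concrete, here is a minimal Python sketch that scores a backup set as copies-media-offsite. It assumes each copy is described by a hypothetical `Copy` record with a medium label and an offsite flag, and that SSD and HD count as distinct media; `summarize` and `meets_3_2_1` are illustrative names, not part of any backup tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of the data: what medium it is on and whether it is off site."""
    name: str
    medium: str    # hypothetical labels, e.g. "ssd", "hdd", "bd-r"
    offsite: bool


def summarize(copies: list[Copy]) -> str:
    """Describe a backup set as copies-media-offsite, e.g. '3-2-1'."""
    n_copies = len(copies)
    n_media = len({c.medium for c in copies})
    n_offsite = sum(c.offsite for c in copies)  # True counts as 1
    return f"{n_copies}-{n_media}-{n_offsite}"


def meets_3_2_1(copies: list[Copy]) -> bool:
    """True if there are at least 3 copies on at least 2 media, 1+ off site."""
    return (
        len(copies) >= 3
        and len({c.medium for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )


# The setup described above: everything is still in the basement.
plan = [
    Copy("original", "ssd", offsite=False),
    Copy("backup-a", "hdd", offsite=False),
    Copy("backup-b", "hdd", offsite=False),
    Copy("archive", "bd-r", offsite=False),
]
print(summarize(plan))    # 4-3-0
print(meets_3_2_1(plan))  # False
```

Moving any one of those copies off site is enough to flip the check to `True`.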

bazmatazable@reddthat.com · 6 months ago

Been keeping my eye on these guys hoping they can turn the tide: Taler

bazmatazable@reddthat.com · 6 months ago

bazmatazable@reddthat.com · 6 months ago

bazmatazable@reddthat.com · 6 months ago

Unfortunately this is mostly true…

bazmatazable@reddthat.com · 6 months ago

I had a similar idea: Could search engines be broken up and distributed instead of being just a couple of monoliths?

Judging by the HN thread, the short answer is: NO.

Still, it’s fun to imagine what it might look like if only…

I think the OP is looking for an answer to the problem of Google having a monopoly so strong that it can’t realistically be challenged. The cost of replicating their search service is so astronomical that it’s basically impossible to replace them. Would the OP be satisfied if we could make cheaper components that all fit together into a competing, decentralized search service? Breaking down the technical problem is just the first step; for me, the basic stages are:

Crawling -> Indexing -> Storing/host index -> Ranking

All of them are expensive because the internet is massive! If each stage were isolated but still interoperable, we would get some interesting possibilities: for example, you could have many smaller, specialized companies that focus only on better ranking algorithms.
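One way to picture “isolated but still interoperable” is a set of small contracts between the stages, so that any company could swap in its own crawler, indexer, store or ranker. The Python sketch below is purely hypothetical: the `Crawler`, `Indexer`, `IndexStore` and `Ranker` protocols and the toy keyword-count implementations are made up for illustration and are nowhere near what a real search engine does.

```python
from collections import Counter
from typing import Protocol


# Hypothetical contracts between the four stages. Any company could supply
# any one stage as long as it honours the same inputs and outputs.
class Crawler(Protocol):
    def fetch(self) -> dict[str, str]: ...  # url -> page text


class Indexer(Protocol):
    def index(self, text: str) -> Counter[str]: ...  # page text -> term counts


class IndexStore(Protocol):
    def put(self, url: str, entry: Counter[str]) -> None: ...
    def items(self) -> list[tuple[str, Counter[str]]]: ...


class Ranker(Protocol):
    def rank(self, query: str, entries: list[tuple[str, Counter[str]]]) -> list[str]: ...


class StaticCrawler:
    """Pretends to crawl by returning a fixed set of pages."""
    def __init__(self, pages: dict[str, str]) -> None:
        self.pages = pages

    def fetch(self) -> dict[str, str]:
        return dict(self.pages)


class KeywordIndexer:
    """Indexes a page as lower-cased word counts."""
    def index(self, text: str) -> Counter[str]:
        return Counter(text.lower().split())


class MemoryStore:
    """Keeps the index in a dict; a shared, distributed store could replace it."""
    def __init__(self) -> None:
        self._data: dict[str, Counter[str]] = {}

    def put(self, url: str, entry: Counter[str]) -> None:
        self._data[url] = entry

    def items(self) -> list[tuple[str, Counter[str]]]:
        return list(self._data.items())


class TermCountRanker:
    """Scores a page by how often the query terms appear in it."""
    def rank(self, query: str, entries: list[tuple[str, Counter[str]]]) -> list[str]:
        terms = query.lower().split()
        scored = [(sum(entry[t] for t in terms), url) for url, entry in entries]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first
        return [url for score, url in scored if score > 0]


def search(query: str, crawler: Crawler, indexer: Indexer,
           store: IndexStore, ranker: Ranker) -> list[str]:
    """Crawl -> Index -> Store -> Rank, each stage behind its own interface."""
    for url, text in crawler.fetch().items():
        store.put(url, indexer.index(text))
    return ranker.rank(query, store.items())


pages = {
    "a.example": "privacy search engine index",
    "b.example": "recipes and cooking",
    "c.example": "search search index",
}
print(search("search index", StaticCrawler(pages), KeywordIndexer(),
             MemoryStore(), TermCountRanker()))
# ['c.example', 'a.example']
```

The point is only that `search` never cares who wrote each stage: a specialist ranking company would ship a different `Ranker` and reuse everyone else’s crawl and index.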

What if crawling were done by the owners of each website, who then submitted the results to an index database of their choice? This flips the model around, so things like robots.txt might become less relevant. Bad actors and spammers, however, would no longer need any SEO tricks to flood a database or misrepresent their actual content; they could just submit whatever they like! These concerns feed into the next step:
What if there were standard indexing functions, similar to how there are many standard hash functions? How a site is indexed plays an important role in how ranking will (or won’t) work later. You could have a handful of popular general-purpose index algorithms that most sites would run and submit (e.g. keywords, images, podcasts), combined with many more domain-specific indexing algorithms (e.g. product listings, travel data, mapping, research). Also, if the functions were open standards, a browser could run the index function on the current page and compare the result to the submitted index listing. It could then warn users that the page they are viewing is probably spam, or misconfigured in some way so that its index doesn’t match what was submitted.
What if the stored indexes were hosted in a distributed way, similar to DNS? Sharing the database would lower individual costs. Companies with bigger budgets could replicate the database to give their users a faster service, while companies with fewer resources could use the publicly available indexes and still be competitive.
Enabling more competition between different ranking methods will hopefully reduce the effectiveness of SEO gaming (or maybe make it worse, as the same content is repackaged for each and every index/rank combination). Ranking could even happen locally (this would probably not be efficient at all, but the fact that it might be possible at all is quite a novel thought).
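The “standard indexing function” idea can be sketched the same way as a hash function: a deterministic, openly specified function that the site runs before submitting and that a browser can re-run on the page it is showing. Everything here is hypothetical: `keyword_index_v1`, the `keywords-v1` name and the `Listing` record are invented for illustration, and the function itself (top ten ASCII words of four or more letters, ties broken alphabetically) is just a simple stand-in for a real standard.

```python
import hashlib
import re
from collections import Counter
from dataclasses import dataclass

INDEX_FUNCTION = "keywords-v1"  # hypothetical name of an agreed-upon standard


def keyword_index_v1(text: str, top_n: int = 10) -> list[str]:
    """Hypothetical standard index function: the top_n most frequent words of
    4+ letters, ties broken alphabetically so every implementation agrees."""
    words = re.findall(r"[a-z]{4,}", text.lower())
    ranked = sorted(Counter(words).items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]


def digest(keywords: list[str]) -> str:
    """SHA-256 fingerprint of an index entry, so listings compare cheaply."""
    return hashlib.sha256("\n".join(keywords).encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Listing:
    """What a site owner would submit to an index database of their choice."""
    url: str
    function: str
    keywords: tuple[str, ...]
    fingerprint: str


def make_listing(url: str, page_text: str) -> Listing:
    """Run on the site owner's side before submitting."""
    kws = keyword_index_v1(page_text)
    return Listing(url, INDEX_FUNCTION, tuple(kws), digest(kws))


def page_matches_listing(page_text: str, listing: Listing) -> bool:
    """Run in the browser: recompute the index for the page being viewed and
    compare it with what the site submitted."""
    if listing.function != INDEX_FUNCTION:
        return False  # unknown index function, so the listing can't be verified
    return digest(keyword_index_v1(page_text)) == listing.fingerprint


page = "Search engines crawl pages. Search engines build an index of pages."
listing = make_listing("https://example.org/post", page)
print(listing.keywords)  # ('engines', 'pages', 'search', 'build', 'crawl', 'index')
print(page_matches_listing(page, listing))                    # True
print(page_matches_listing("Buy cheap pills now!", listing))  # False
```

Breaking ties alphabetically matters here: if two implementations ordered equally frequent words differently, an honest page would fail verification. Comparing fingerprints instead of keyword lists is only a convenience.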

Sigh enough daydreaming already…

bazmatazable@reddthat.com · 9 months ago

I self-host my own email, and you are absolutely correct: it is much easier to receive than to send. I use a third party to send all my outgoing mail on my behalf.

bazmatazable@reddthat.com · 10 months ago

This is my experience too. The sites hosting the articles I want to read only provide the first paragraph and then a link back to the webpage. News is just headlines. I love that RSS doesn’t allow much formatting, so you end up with an experience focused on the content itself (and no ads). It feels like a long time since I really enjoyed my RSS feeds.

bazmatazable@reddthat.com · 10 months ago

Whether it is greed, competitiveness, narcissism, some other personality trait, or a combination of them, the point was that we as a society should not consider becoming a billionaire to be model behavior. By all means be the best sports player, musician or surgeon, and make as much money as you are legally allowed. Most tech billionaires are just not impressive enough to justify their current net worth.

bazmatazable@reddthat.com · 10 months ago

Great reply, thank you. OP points out that the situation appears hopeless, and I often leave feeling that capitalism has truly captured all the regulators and is now free to grind all value out of society. Assuming we get a decent part of the population on the same page, what is the next step? Is there no room for reform? I have a feeling that progress can only even be attempted once public discussion consistently prioritizes human well-being above all else.

bazmatazable@reddthat.com · 1 year ago

+1 for Kagi