Can YaCy be the new Google?

When Google was a mere babe among search giants like Yahoo, AltaVista, Lycos and WebCrawler, geeks across the world promoted its use thanks to its cleaner interface and the fact that it focussed only on the task at hand. In many ways, Google owes a lot of its success to IT professionals who quickly introduced it to the less IT-savvy as a less cluttered interface to finding the things they were looking for.

Times have changed. Google’s rise to world dominance in the search market, its advertising-oriented business model and its history of collecting user data are slowly pushing it from geeky favour. Of course, now that Google has spent fourteen years building up a vast collection of search results and related data, geeks are hard-pressed to find an alternative. Enter YaCy.

YaCy is a fully decentralized P2P search engine. While it is possible to search the YaCy network without running the software*, to properly experience YaCy at work you need to install YaCy locally. That might seem a little off-putting at first, but it’s no different to running other P2P software like Skype. Running locally, YaCy can be used in a variety of ways.

First, it can be installed within an intranet to provide a local search engine for all of your internal pages. In general though you will probably run YaCy as a node within its global network. Running as a node, YaCy is able to share results with other peers. That means that instead of storing billions of links and search metadata within a huge server farm, the YaCy software simply uses all of the computers on the network to store its search results. The more nodes within the network, the better the results for a query.
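To make the idea of spreading an index across peers more concrete, here is a minimal sketch. It is not YaCy’s actual protocol: the `Peer` class, the `peer_for`, `publish` and `search` functions and the example URLs are all hypothetical, and the choice of a SHA-1 hash to assign each word to a peer is an assumption made purely for illustration.

```python
import hashlib
from collections import defaultdict


class Peer:
    """A hypothetical node holding one slice of the shared index."""

    def __init__(self, name):
        self.name = name
        self.index = defaultdict(set)  # word -> set of URLs containing it


def peer_for(word, peers):
    """Pick the peer responsible for a word using a stable hash (assumed scheme)."""
    digest = hashlib.sha1(word.encode("utf-8")).hexdigest()
    return peers[int(digest, 16) % len(peers)]


def publish(word, url, peers):
    """Store a word -> URL entry on whichever peer owns that word."""
    peer_for(word, peers).index[word].add(url)


def search(word, peers):
    """Ask only the peer that owns the word; no peer holds the whole index."""
    return sorted(peer_for(word, peers).index.get(word, set()))


peers = [Peer(f"peer-{i}") for i in range(4)]
publish("yacy", "http://example.org/a", peers)
publish("yacy", "http://example.org/b", peers)
publish("search", "http://example.org/c", peers)

print(search("yacy", peers))
```

In this toy version every word lives on exactly one peer, so a query only has to contact that peer. A real network would also have to cope with peers joining, leaving and going offline, which the sketch ignores.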

Since YaCy has only just been released, the number of nodes on its network is still fairly small. Currently within my copy of the software, I am seeing around 1 200 unique nodes, and the network has apparently indexed around 800-million pages. Obviously, while YaCy remains in the territory of IT-oriented users, all those pages that are getting indexed are more heavily weighted toward IT-related topics. As YaCy gains more mainstream adoption, the results will tend to cover wider areas of interest.

So, you’re probably wondering about this indexing. What is it going to do to your network, your computer, your disk space and all of that? Okay, in principle disk space should not be a major concern. The whole idea behind the project is that nodes share links, which means that to have access to a huge number of search results, you don’t need to store them all locally. In fact, YaCy’s administration panel allows you to define many of these sorts of parameters to start with. So you can limit the maximum and minimum amount of RAM dedicated to its running processes, and you can define the maximum number and size of pages that you decide to crawl yourself. Limiting disk space is another story though.

Currently YaCy doesn’t provide options to limit disk space usage, although there is a discussion about implementing this on one of the YaCy forums. On the other hand, YaCy’s FAQ suggests that for around 10-million web pages indexed you would need around 20GB of disk space. However, the fact that you can’t limit this does mean that many users will be unhappy about installing the search engine on their average workstation.
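As a rough back-of-the-envelope check on the FAQ figure quoted above, the sketch below works out the implied disk cost per indexed page and scales it to other index sizes. It assumes that 1GB means 10^9 bytes and that disk usage grows linearly with the number of pages, neither of which the FAQ figure itself guarantees; the function name is purely illustrative.

```python
# Figures quoted from YaCy's FAQ in the text above.
FAQ_PAGES = 10_000_000
FAQ_DISK_GB = 20

# Assumption: 1 GB = 10**9 bytes (decimal gigabytes).
BYTES_PER_GB = 10**9


def estimated_disk_gb(pages):
    """Estimate disk usage in GB, assuming it scales linearly with page count."""
    return pages * FAQ_DISK_GB / FAQ_PAGES


# Implied average cost of one indexed page.
bytes_per_page = FAQ_DISK_GB * BYTES_PER_GB / FAQ_PAGES
print(f"{bytes_per_page:.0f} bytes per indexed page")

for pages in (1_000_000, 10_000_000, 50_000_000):
    print(f"{pages:>11,} pages -> about {estimated_disk_gb(pages):.0f} GB")
```

On those assumptions the FAQ figure works out to roughly 2 000 bytes per indexed page, so a workstation that indexes a million pages would be looking at something in the region of 2GB.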

There are a number of approaches that techies are suggesting in order to limit resource usage from outside of YaCy itself. The most obvious is to run YaCy within a virtual machine. I like the approach, and have decided to do something similar for my own installation, but this is putting YaCy adoption right outside of the mainstream. That said, the Admin pages for your local YaCy install are well beyond the comprehension of your average web-surfer, so it’s unlikely that YaCy is going to gain any foothold with these sorts of users in the near future.

YaCy is also not the fastest search engine: results don’t pop up almost instantaneously. That will obviously improve as the network grows, but it’s quite a drawback right now. The fact that you are giving up precious bandwidth in order to share information with other peers will not appeal to most people either. While outgoing traffic is actually pretty small, and the majority of your incoming traffic will depend on how much you decide to crawl the web, both activities use bandwidth that you could otherwise keep for yourself. That makes YaCy slightly less ‘free’ than Google, since bandwidth is something you pay for.

So, by now you are probably thinking YaCy is something to steer well clear of. Despite all of the negatives, I actually think that YaCy has a lot of potential, and could ultimately be the way of search in the future. It offers much better privacy and security than any of the search engines that are currently in mainstream use. It has fantastic scalability. It offers the ability to integrate localized intranet search with more global searches. And it is open-source.

There are some obvious things that need to be sorted out. For one, I think that there needs to be some kind of reward for peers that store larger indexes and stay online for longer periods. Another big point would be to provide a highly simplified interface to the Administration side of the software. But most importantly it needs better controls for resource usage. Let’s hope that the devs make some improvements fast!

*Since YaCy’s press release, its demo search portal has been hard hit, and last I checked it was down. That’s understandable, since YaCy is not really designed to be used like this.


