retze.blogg.se - Tor search engine yacy

TOR SEARCH ENGINE YACY HOW TO
TOR SEARCH ENGINE YACY INSTALL

There are a ton of articles on the internet describing how to go about building a self-hosted full-text search engine using Elasticsearch.

Most of the tutorials I read describe a fairly simple process: install some software, then write a little bit of code to insert and extract data:
Create a spider/crawler or otherwise insert your content into Elasticsearch.
Create a simple web interface to submit searches to Elasticsearch.

At the end of it you get a working search engine. The problem is, that search engine is crap. It's not that it can't be saved (it definitely can), so much as that most tutorials seem not to give any thought to improving the quality of search results: it returns some results, and that's good enough.

Over the years, I've built up a lot of internal notes, JIRA tickets, etc., so for years I ran a self-hosted internal search engine based upon Sphider. Its code quality is somewhat questionable, and it's not been updated in years, but it sat there and it worked. The time came to replace it, and experiments with off-the-shelf things like YaCy didn't go as well as hoped, so I hit the point where I considered self-implementing. Enter Elasticsearch, and enter the aforementioned Internet tutorials.

The intention of this post isn't to detail the process I followed, but really to document some of the issues I hit that don't seem (to me) to be too well served by the main body of existing tutorials on the net.

I first toyed with the idea of replacing Sphider a couple of years back, and had actually already created a spider in Python which could pull pages, index them and extract outgoing links (as well as things like markup). So, the plan was to reuse this code to hopefully throw together a proof of concept quite quickly; this is a choice I came to regret a little bit.

The implementation was built on an NFS-booted Raspberry Pi. The reason for this is that my earlier experimentation with YaCy had been run on the same Pi, and Solr performed alright there, so I figured being able to compare the two might be helpful.
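The crawler step (pull a page, then extract its outgoing links so they can be queued) can be sketched with nothing but Python's standard library. This is a minimal illustration, not the author's spider. The names `LinkExtractor` and `extract_links` are hypothetical. It assumes only `<a href>` links matter and that only http(s) URLs should be followed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collect absolute http(s) URLs from <a href="..."> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL, then strip any
                # #fragment so one page isn't queued twice under different anchors.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                # Skip mailto:, javascript: etc., and keep each URL only once.
                if absolute.startswith(("http://", "https://")) and absolute not in self.links:
                    self.links.append(absolute)


def extract_links(base_url: str, html: str) -> list[str]:
    """Return the de-duplicated outgoing links found in an HTML page."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Fetching the page (for example with `urllib.request.urlopen`), politeness delays and robots.txt handling are deliberately left out to keep the sketch short.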
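For the "insert your content into Elasticsearch" step, crawled pages are usually sent in batches via the `_bulk` endpoint. That endpoint takes newline-delimited JSON: an action line followed by the document source, with a trailing newline. This is a sketch under stated assumptions. The index name `pages`, the `url`/`title`/`body` fields, and using the URL as the document `_id` are all hypothetical choices, not taken from the post.

```python
import json


def build_bulk_body(index: str, pages: list[dict]) -> str:
    """Build an NDJSON payload for Elasticsearch's _bulk API.

    Each page becomes two lines: an action line naming the target index and
    document ID, then the document source itself.
    """
    lines = []
    for page in pages:
        # Assumption: using the URL as _id makes a re-crawl overwrite the
        # existing document instead of creating a duplicate.
        lines.append(json.dumps({"index": {"_index": index, "_id": page["url"]}}))
        lines.append(json.dumps({"url": page["url"], "title": page["title"], "body": page["body"]}))
    # The bulk API requires the payload to end with a newline.
    return "\n".join(lines) + "\n"
```

The resulting string would be POSTed to `/_bulk` with a `Content-Type: application/x-ndjson` header.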
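The "simple web interface" step typically boils down to turning the search-box text into a query and POSTing it to `<index>/_search`. This minimal sketch assumes the same hypothetical `pages` index with `title` and `body` fields, and an Elasticsearch node reachable over plain HTTP. The `title^2` boost is only an illustration of the kind of relevance tweak the basic tutorials skip. It is not how the author tuned results.

```python
import json
import urllib.request


def build_search_body(user_query: str, size: int = 10) -> dict:
    """Turn a search-box string into an Elasticsearch query DSL body."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": user_query,
                # Hypothetical fields; "^2" doubles the score contribution of title matches.
                "fields": ["title^2", "body"],
            }
        },
        # Only fetch what a results page needs, not the full stored body.
        "_source": ["url", "title"],
    }


def search(es_url: str, index: str, user_query: str) -> list[dict]:
    """POST the query to <es_url>/<index>/_search and return the raw hits."""
    request = urllib.request.Request(
        f"{es_url}/{index}/_search",
        data=json.dumps(build_search_body(user_query)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    return result["hits"]["hits"]
```

For example, `search("http://localhost:9200", "pages", "raspberry pi")` would return a list of hits, each with its `_score` and `_source`, assuming a local node with that index exists.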