Building your own Google


I recently (two days ago) started experimenting with building my own search engine. The inspiration came from my experiments with Google’s Co-op Custom Search Engine contraption.

The software I’m using to build my very own Google is Nutch, an open source web-search engine.

Nutch builds on Lucene Java, adding web-specific features such as a crawler, a link-graph database, and parsers for HTML and other document formats.
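To make the crawler and link-graph idea a bit more concrete, here is a toy sketch in Java (the language Nutch and Lucene are written in). It is not Nutch's API: the in-memory `web` map, the `ToyCrawler` class and its `crawl` method are hypothetical stand-ins for real HTTP fetching and Nutch's on-disk databases. It starts from seed URLs, visits pages level by level up to a maximum depth, and records each page's outlinks in a simple link graph.

```java
import java.util.*;

/**
 * Toy, in-memory illustration of a crawl loop (hypothetical, not Nutch code).
 * The "web" is a Map from URL to its outlinks; a real crawler fetches pages
 * over HTTP, parses them, and persists this state on disk.
 */
public class ToyCrawler {

    /** Crawls breadth-first from the seeds for at most maxDepth rounds; returns url -> outlinks. */
    public static Map<String, List<String>> crawl(Map<String, List<String>> web,
                                                  List<String> seeds, int maxDepth) {
        Map<String, List<String>> linkGraph = new LinkedHashMap<>(); // our tiny "link-graph database"
        Set<String> seen = new HashSet<>(seeds);                     // URLs already scheduled
        List<String> frontier = new ArrayList<>(seeds);              // URLs to fetch this round

        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String url : frontier) {
                // "Fetch and parse": look up the page's outlinks; unknown pages have none.
                List<String> outlinks = web.getOrDefault(url, List.of());
                linkGraph.put(url, outlinks);
                for (String link : outlinks) {
                    if (seen.add(link)) { // schedule each URL only once, which also breaks cycles
                        next.add(link);
                    }
                }
            }
            frontier = next;
        }
        return linkGraph;
    }

    public static void main(String[] args) {
        // Hypothetical three-page site with a link back to the home page.
        Map<String, List<String>> web = Map.of(
            "http://example.com/", List.of("http://example.com/a", "http://example.com/b"),
            "http://example.com/a", List.of("http://example.com/c"),
            "http://example.com/c", List.of("http://example.com/"));
        Map<String, List<String>> graph = crawl(web, List.of("http://example.com/"), 2);
        graph.forEach((url, links) -> System.out.println(url + " -> " + links));
    }
}
```

Limiting the depth plays roughly the same role as the depth setting you give Nutch when crawling: it keeps a first crawl small and quick to inspect.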

The tutorial I used to build my search engine (and it’s a great tutorial) can be found here. It guides you through the whole process: setting up the environment for Nutch (basically Java and Apache Tomcat), installing Nutch, crawling and indexing your first site, searching it, and even using regex URL filters to tell the crawler which URLs on a page to follow and which to ignore.
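The regex URL filters work from a simple rule list: each line starts with `+` (include) or `-` (exclude) followed by a regular expression. As I understand the format, the first rule that matches a URL decides whether it is crawled, and a URL that matches no rule is ignored. The Java sketch below mimics that behaviour; the class name, the `example.com` domain and the sample rules are my own illustrative assumptions, not Nutch's actual code or default configuration.

```java
import java.util.*;
import java.util.regex.Pattern;

/**
 * Sketch of first-match regex URL filtering (illustrative, not Nutch's implementation).
 * Each rule is '+' (include) or '-' (exclude) followed by a regex; the first rule
 * whose regex is found in the URL decides, and unmatched URLs are rejected.
 */
public class RegexUrlFilterSketch {

    private static final class Rule {
        final boolean include;
        final Pattern pattern;

        Rule(boolean include, Pattern pattern) {
            this.include = include;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    public RegexUrlFilterSketch(List<String> ruleLines) {
        for (String line : ruleLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            char sign = trimmed.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(trimmed.substring(1))));
        }
    }

    /** Returns true if the URL should be crawled. */
    public boolean accept(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) { // regex may match anywhere in the URL
                return rule.include;                // first matching rule wins
            }
        }
        return false; // no rule matched: ignore the URL
    }

    public static void main(String[] args) {
        RegexUrlFilterSketch filter = new RegexUrlFilterSketch(List.of(
            "# skip images",
            "-\\.(gif|jpg|png)$",
            "# stay on the (hypothetical) example.com site",
            "+^http://([a-z0-9]*\\.)*example\\.com/",
            "# reject everything else",
            "-."));
        for (String url : List.of("http://www.example.com/about.html",
                                  "http://www.example.com/logo.png",
                                  "http://other.org/")) {
            System.out.println(url + " -> " + (filter.accept(url) ? "crawl" : "ignore"));
        }
    }
}
```

Rule order matters here: the catch-all `-.` has to come last, otherwise it would reject every URL before the include rule got a chance to match.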

I have yet to crawl my site (or any site, for that matter) successfully, but I will keep you posted on further developments.

In the meantime, for more information about Nutch, see the Nutch wiki.



Head of SEO and Inbound Marketing at the UK's largest media planning and buying agency. Omar has over 10 years' experience in digital marketing, the last 6 in large media agency environments, developing and implementing cutting-edge digital campaigns for some of the world's best-known brands. For the latest digital marketing and industry news and updates, follow Omar's Twitter stream (@OmarKattan) or add him to one of your Google+ Circles. The content of this article represents the personal views of the author and does not constitute professional advice.



© 2012 Omar Kattan. All rights reserved.