Building your own Google

Two days ago I started experimenting with building my own search engine. The inspiration came from playing around with Google’s Co-op Custom Search Engine contraption.

The software I’m using to build my very own Google is Nutch, an open source web-search package.

Nutch builds on Lucene Java, adding web-specific components such as a crawler, a link-graph database, and parsers for HTML and other document formats.

The tutorial I used to build my search engine (and it’s a great tutorial) can be found here. It guides you through the whole process: setting up the environment for Nutch (basically Java and Tomcat on an Apache server), installing Nutch, crawling and indexing your first site, searching your site, and even using Regex URL Filters to tell the search engine which URLs on a page to crawl and which to ignore.
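To give a feel for what a regex URL filter does, here is a minimal Python sketch of the idea. It is not Nutch's own code. It assumes the commonly described rule format where each line starts with `+` (crawl) or `-` (ignore) followed by a regular expression, the first matching rule wins, and a URL that matches nothing is ignored. The rules themselves, the `example.com` domain, and the function names `parse_rules` and `should_crawl` are made up for illustration.

```python
import re

# Hypothetical rules in the style of a URL filter file:
# '+' means crawl, '-' means ignore; the first matching rule wins.
RULES = r"""
# skip common image, style, script and archive files
-\.(gif|jpg|png|css|js|zip|pdf)$
# stay inside one (made-up) site
+^https?://([a-z0-9-]+\.)*example\.com/
# ignore everything else
-.
"""


def parse_rules(text):
    """Turn the rule text into a list of (allow, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with + or -: " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def should_crawl(url, rules):
    """Return True if the first rule that matches the URL is a '+' rule."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False  # assumption: URLs matching no rule are ignored


if __name__ == "__main__":
    rules = parse_rules(RULES)
    for url in ("http://www.example.com/about.html",
                "http://www.example.com/logo.png",
                "http://other.org/"):
        print(url, should_crawl(url, rules))
```

Because the first matching rule decides, order matters: the file-extension exclusion sits above the site-wide inclusion so that images on the allowed site are still skipped.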

I’ve yet to be successful at crawling my site (or any site for that matter) but will keep you posted on further developments.

In the meantime, for more information about Nutch, please see the Nutch wiki.

New Age AdMan & proud father of 2 incredible kids. Omar heads up SEO and Inbound Marketing at the UK's largest media agency.

Omar has over 20 years' experience in advertising, the last 10 of them in digital marketing within large media agency environments, developing and implementing cutting-edge digital campaigns for some of the world's best-known brands. For the latest digital marketing and industry news and updates, follow Omar's Twitter stream (@OmarKattan) or add him to one of your Google+ Circles.
