Nutch Search Engine Finally Working!

If you recall, a few weeks back I posted about building your own Google. Well, I finally did it: my very own Google-style search engine is up and running.

It was not an easy job at all, and it was very frustrating at times, but it was very rewarding indeed!

There are still some issues that need to be ironed out. For instance, the cache links (among a few others) give an error message when you click on them. All in all, though, these are minor issues compared to the hurdles I jumped over to get this engine going.

I managed to spider the cnn.com website (only a few pages, as an experiment) and feed the results of the crawl into my search engine. Try searching for "weather" on CNN using my search engine and check out the results.

I will be experimenting further with Nutch, including deeper and multiple crawls, as well as fixing the odd bug or two that currently exist.

I will also start reading up on the Nutch technology to understand it better and to get a better feel for its potential.

Hopefully in the next few months I will begin creating new websites that will cater to vertical search and see where that takes me.

I’ll keep you all posted. In the meantime, if you have any ideas or questions, please feel free to post a comment or two.



Omar Kattan is Chief Strategy Officer at Sandstorm Digital, the MENA region's first specialist content marketing agency headquartered in Dubai. His experience includes 10 years in traditional marketing and advertising in the Middle East and a further 10 years at two of the largest media agencies in the UK. Follow Omar on Twitter for updates on the latest in digital, branding, advertising and marketing.

  • http://omarkattan.com omar

    an update on this…

    I had to take the engine down as it was using up too many resources and crashing my server!
