Friday, February 29, 2008

ObjectDock Review

I found a new program the other day, and so far it's earned my seal of approval. It's called ObjectDock, by Stardock. You know that bar of icons at the bottom of the screen on every Mac? Well, this is the PC version. I found it for free at Stardock.com, where they offer all sorts of free (and paid) packages. I've tried some of their other programs, but this one actually seems helpful. One of the features I like is the ability to add docklets: buttons that do a little more than just act as a shortcut. I went to wincustomize.com and found all sorts to choose from. One docklet displays the icons from my system tray, so I can completely hide my taskbar and gain that space back. Another one I like, which I found at aqua-soft.org, puts folder contents into a stack for easy viewing. There are many more to choose from, like a battery monitor.

I really like this program because it gives me back my desktop. I don't have to share it with the taskbar or deal with the taskbar popping up every time something happens. You know what? That's enough reading. Go ahead and download it and try it for a week. Trust me, I think you'll like it, and if you have any questions, feel free to send them my way.

Thursday, February 28, 2008

Web Crawler Research

I've recently been doing a lot of research on making a web crawler, and it's fairly interesting. Basically, a web crawler is a program that makes a map of the web. You start out with a URL, say JamesFurlo.com, and the application scans that page for all of its links. It creates a list of those links for later processing. Then the application reads the current page. It can look at everything (pictures, layout, meta tags, etc.), or it can look at just a couple of things, like only the text. It then takes that information and stores it, probably on a server.
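Just to make that concrete, here's a rough sketch in Python (one of the languages I'm considering) of what that single-page step might look like. It uses only the standard library, and the scan_page name is just something I made up for illustration, not anything official.

```python
# A rough sketch of the "scan one page" step: fetch a page, collect its
# links, and gather the text you'd want to index.
from html.parser import HTMLParser
from urllib.request import urlopen
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collects href values from <a> tags and the page's visible text."""
    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)

def scan_page(url):
    """Fetch one page; return (absolute links found, raw text to index)."""
    html = urlopen(url).read().decode("utf-8", errors="replace")
    parser = LinkCollector()
    parser.feed(html)
    links = [urljoin(url, link) for link in parser.links]
    text = " ".join(part.strip() for part in parser.text_parts if part.strip())
    return links, text
```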

Once it's done everything you want it to do on that page, it goes to the first link on the list it just created, say JamesFurlo.com/eFlash, and starts all over: first creating a list of new links (pages), then taking an inventory (also called indexing) of the items on the page.
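And here's roughly what that loop might look like, building on the scan_page sketch above. The in-memory dict is just standing in for whatever server-side storage a real crawler would use, and max_pages is a made-up safety limit so the thing actually stops.

```python
# A rough sketch of the crawl loop, assuming the scan_page() helper above.
from collections import deque

def crawl(start_url, max_pages=100):
    to_visit = deque([start_url])   # links waiting to be processed
    visited = set()                 # pages we've already scanned
    index = {}                      # url -> text inventory

    while to_visit and len(visited) < max_pages:
        url = to_visit.popleft()
        if url in visited:
            continue
        visited.add(url)
        try:
            links, text = scan_page(url)
        except Exception:
            continue                # skip pages that fail to load
        index[url] = text
        for link in links:
            if link not in visited:
                to_visit.append(link)
    return index
```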

It's a fairly simple process, but it can get hairy fast. I mean, just think of all the links on Yahoo! or Digg. The list of pages to visit can get long very quickly. As a matter of fact, most web crawlers are estimated to cover at most 16% of the web at any one time. The problem is that the application simply can't run fast enough to view everything out there, because pages are being added and changed far too quickly.
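One simple way I might keep that list from exploding, at least while I'm experimenting, is to only follow links that stay on the site I started from. Something like this (again, just a sketch that would plug into the loop above):

```python
# Only queue links that stay on the starting site.
from urllib.parse import urlparse

def same_site(link, start_url):
    """True if the link points at the same host as the starting URL."""
    return urlparse(link).netloc == urlparse(start_url).netloc

# Inside the crawl loop, the filter would replace the plain append:
#     if link not in visited and same_site(link, start_url):
#         to_visit.append(link)
```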

Isn't that crazy? I've found that building a web crawler is pretty straightforward, but making one that works efficiently is more of a challenge. The big challenge I'm facing right now is deciding what language to use. You see, web crawlers are so flexible they can be written in PHP, Perl, Python, Java, or even C++. What's the best choice? Good question.

I guess the trick is to pick a language I'm kind of comfortable with and start there. I should probably just try to make one with the understanding that it won't be optimized, but at least I'll have something to improve on.

We'll see if it works out. My personal goal is to have something working by the end of next week. Wish me luck.