In the beginning, information was stored in the collective memories of people and passed down through the ages. Then came the “written” word, albeit in symbol form on cave walls, on papyrus, or etched into stone tablets. In the mid-15th century came the Gutenberg press and the printed book. To find some piece of information we first searched an encyclopedia, and if we needed something more in-depth, we turned to a specialized book on the topic. Then in the early 90s came the World Wide Web, or the net as it is commonly called, and with it a new way to distribute knowledge. In reality there was always a wealth of knowledge out there in the aether, but the means of distributing it had been missing. Recipes from a particular region were often collected in a book of “local fare”, but its influence rarely went beyond the region. Now we find ourselves with a wealth of information, available in the form of entries on Wikipedia, information repositories (e.g. Flickr), personal websites, and blogs.
But is there too much information? Searching a book for content usually involves looking through the index for keywords. How well we are able to find the information depends directly on how well the book has been indexed. A poorly indexed book forces the reader to spend more time searching by brute force, i.e. leafing through every page in an attempt to find the relevant information. The same could be said of searching for information on the net. The problem is that there is no “index” for the net, and no reliable way of searching it.
Google is undoubtedly the most successful search engine, and it is based on the notion of information linkage. Google’s algorithm uses a patented system called PageRank to help rank web pages that match a given search string. The PageRank algorithm computes a recursive score for web pages, based on the weighted sum of the PageRanks of the pages linking to them. Websites that have a lot of links to them have a high accessibility index, and therefore a greater chance of “floating” to the top of the Google heap. It doesn’t mean they are the best matches, but rather that they are the most linked. As such, pertinent information may be impossible to find, other than by chance occurrence.
Visual information is even more challenging, because a good search relies on an image having appropriate tags associated with it. Content Based Image Retrieval (CBIR) is a concept which heralds a new method of searching, but in actuality it relies on extracting content from an image, which is not going to happen any time soon. In reality, searching the internet for information is more akin to rummaging than practical searching. Rummaging can be described as “searching unsystematically and untidily”. See a pattern? A real search engine would be capable of indexing web pages by extracting key words from the page, and maybe incorporating meta-data such as the time stamp when the page was last modified.
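To make the idea of a recursive, link-based score concrete, here is a minimal sketch of the PageRank calculation in Python. It is only an illustration of the published idea, not Google's actual implementation. The four-page `toy_web`, the function name `pagerank`, the damping factor of 0.85, and the fixed number of iterations are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by repeated redistribution of scores.

    `links` maps each page to the list of pages it links to.
    The damping value and iteration count are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with the score spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on in equal shares to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to "c", nothing links to "d".
toy_web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(page, round(score, 3))
```

In this toy web the page with the most incoming links ends up on top and the page nobody links to ends up at the bottom, regardless of what either page actually says. That is the point made above: the score measures how linked a page is, not how well it matches what you are looking for.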
In truth, it may sometimes be easier to find information by going to a library and looking through books than by searching for it on the internet.