At Your Fingertips
Internet search engines are a wonderful means of sifting through all the information on the World Wide Web to find just the page you need. The various search engines employ spiders to crawl the web and compile lists of keywords that pinpoint the pages containing sought-after data. This data must then be stored so that it can be drawn upon in the course of a search. Since new pages are always being added to the web, the process is continual.
In general, search engines make the amassed data available to users in two ways: by storing it together with other information, and by indexing it, with each search engine using its own specific methods. One thing an index makes possible is ranking, which causes the pages deemed most useful to appear at the head of your search results, or hits list.
For this purpose, a search engine may tabulate how many times a given word appears on a specific page. The system may then attribute a weight to the word, depending on how prominent its location on the page is and on whether it appears in the page's title, subheadings, links, or meta tags. Each engine has its own method for assigning words their weights within an index. This is why different search engines, fed the same keywords or key phrases, compile different lists with the results in different orders.
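The weighting idea described above can be sketched in a few lines of Python. This is only an illustration: the location names, the weight values in LOCATION_WEIGHTS, and the weigh_words function are all assumptions made for the example, since real search engines do not publish their weighting schemes.

```python
from collections import Counter

# Hypothetical weights per page location (assumed values for illustration).
LOCATION_WEIGHTS = {
    "title": 5.0,
    "subheading": 3.0,
    "link": 2.0,
    "meta": 2.0,
    "body": 1.0,
}


def weigh_words(page):
    """Return {word: weight} for a page given as {location: text}.

    Every occurrence of a word adds the weight of the location it appears in,
    so a word in the title counts for more than the same word in the body.
    """
    scores = Counter()
    for location, text in page.items():
        weight = LOCATION_WEIGHTS.get(location, 1.0)
        for word in text.lower().split():
            scores[word] += weight
    return dict(scores)


# A made-up page, split into the locations where its words appear.
page = {
    "title": "Hash Tables",
    "subheading": "Why hash",
    "body": "a hash table stores keys",
}
weights = weigh_words(page)
print(weights)
```

Here the word "hash" appears once each in the title, a subheading, and the body, so it ends up with a far higher weight than "table", which appears only in the body. A different engine with a different weight table would score the same page differently.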
Neat And Compact
However a given search engine combines data with other pieces of information, it will first encode this data to save on storage space. Google, for instance, in its earliest days used 2 bytes (16 bits, at 8 bits per byte) to store the information used in weighting words, such as capitalization, font size, and location. Each element employed in the weighting process would take up some 2-3 bits within this 2-byte grouping. This type of encoding keeps the data neat and compact. At this point, the information is all set to be indexed.
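To make the idea of squeezing several attributes into 2 bytes concrete, here is a minimal sketch of bit packing in Python. The field widths (1 bit for capitalization, 3 bits for font size, and the remaining 12 bits for position) and the function names pack_hit and unpack_hit are assumptions chosen for illustration, not a description of any engine's actual layout.

```python
# Assumed field widths for illustration: 1 + 3 + 12 = 16 bits = 2 bytes.
CAP_BITS, FONT_BITS, POS_BITS = 1, 3, 12


def pack_hit(capitalized, font_size, position):
    """Pack three attributes of a word occurrence into one 16-bit integer."""
    if not 0 <= font_size < (1 << FONT_BITS):
        raise ValueError("font_size does not fit in its field")
    if not 0 <= position < (1 << POS_BITS):
        raise ValueError("position does not fit in its field")
    return (
        (int(capitalized) << (FONT_BITS + POS_BITS))
        | (font_size << POS_BITS)
        | position
    )


def unpack_hit(packed):
    """Recover (capitalized, font_size, position) from a packed integer."""
    position = packed & ((1 << POS_BITS) - 1)
    font_size = (packed >> POS_BITS) & ((1 << FONT_BITS) - 1)
    capitalized = bool(packed >> (FONT_BITS + POS_BITS))
    return capitalized, font_size, position


packed = pack_hit(True, 5, 300)
raw = packed.to_bytes(2, "big")  # exactly 2 bytes on disk
print(len(raw), unpack_hit(packed))
```

Storing the same three values as separate fields would take several times the space, which adds up quickly when there is one such record for every word occurrence on every page.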
Indexing allows information to be found fast. While there are a number of ways to build an index, one of the most effective is to create a hash table. Hashing is the process of applying a formula to each word in order to assign it a numerical value.
Certain letters of the alphabet are found at the head of a word more often than others. If you look at the entries in a dictionary, you can see that the section of words beginning with the letter M is much larger than the section of words beginning with the letter Z. Hashing is designed to even out this difference by spreading the entries over a fixed number of sections. It is this process, distinct from alphabetizing words, that is the key to the hash table's effectiveness.
Without hashing, a word beginning with a popular letter such as M would take longer to find than a word beginning with an unpopular letter such as Z, because there are so many more entries to search through in the M section. Hashing speeds up the process by evening out the difference between popular and unpopular first letters. That means faster results.
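The evening-out effect can be seen with a tiny experiment, sketched below in Python. The word list, the number of sections (NUM_BUCKETS), and the simple_hash formula are all assumptions picked for the example; production systems use more carefully designed hash functions.

```python
from collections import Counter

NUM_BUCKETS = 5  # assumed number of sections to spread the words over


def simple_hash(word):
    """A simple polynomial string hash, used here only for illustration."""
    h = 0
    for ch in word:
        h = (h * 31 + ord(ch)) % 2**32
    return h


# A deliberately lopsided word list: six words start with M, one with Z.
words = ["map", "mat", "men", "mix", "mop", "mud", "zoo"]

# Section sizes when words are grouped alphabetically by first letter.
by_letter = Counter(word[0] for word in words)

# Section sizes when words are grouped by hash value instead.
by_hash = Counter(simple_hash(word) % NUM_BUCKETS for word in words)

print("largest first-letter section:", max(by_letter.values()))
print("largest hash section:", max(by_hash.values()))
```

With this word list, the largest alphabetical section holds six words, while no hash section holds more than two, so the worst-case search within a section is much shorter.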
Hashing also maintains a separation between the index and the actual word or phrase entry: the hash table holds only the hash number and the address of the actual information. This is a very efficient way to index and store data, enabling the user to obtain results with great speed even during an elaborate search.
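The separation between the index and the stored data might look roughly like the following Python sketch. Everything named here (records, buckets, hash_number, add, lookup, and the page file names) is hypothetical, and zlib.crc32 simply stands in for whatever hash formula a real engine would use.

```python
import zlib

NUM_BUCKETS = 8  # assumed table size for illustration

records = []                                # the actual information
buckets = [[] for _ in range(NUM_BUCKETS)]  # index of (hash number, address)


def hash_number(word):
    """Deterministic numeric value for a word (CRC-32 as a stand-in hash)."""
    return zlib.crc32(word.encode("utf-8"))


def add(word, pages):
    """Store the record once and index it by hash number and address."""
    address = len(records)  # position of the record in the data store
    records.append((word, pages))
    h = hash_number(word)
    buckets[h % NUM_BUCKETS].append((h, address))


def lookup(word):
    """Find a word's pages by hashing it and following the stored address."""
    h = hash_number(word)
    for stored_hash, address in buckets[h % NUM_BUCKETS]:
        if stored_hash == h:
            stored_word, pages = records[address]
            if stored_word == word:  # guard against hash collisions
                return pages
    return None


add("spider", ["page1.html", "page7.html"])
add("index", ["page2.html"])
print(lookup("spider"))
print(lookup("missing"))
```

Because the index entries are just pairs of numbers, the table stays small and quick to scan, and the bulkier records are touched only when a matching hash number is found.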