Search Engine Spiders

What is a search engine spider?

A search engine spider, also called a crawler or bot, is a program designed to browse the Internet in a systematic, automated manner and retrieve information about websites. Search engine spiders extract information from the pages they visit and store it in a way that allows search engines to process and index the data and quickly retrieve the relevant parts in response to search queries.

How do spiders work?

Search engine spiders use the hyperlinks contained on a web page to move from one website to the next, or to crawl from one web page to another. If a spider is given a list of URLs to visit, it visits each URL on the list, identifies the hyperlinks on those pages and adds the hyperlinked pages to the list of URLs to visit, thus expanding the so-called "crawl frontier". The hyperlinked pages may belong to the same website or to external sites.

With more and more web pages being added to the World Wide Web, and existing web pages being changed and updated frequently, one of the main challenges for search engine spiders is to crawl as many new and updated web pages as possible, as efficiently as possible. To meet this challenge, search engine spiders use a set of rules that help them determine which pages to crawl, how often to crawl them, and how to distribute the work among multiple spiders crawling the web at the same time.

Search engines generally run multiple copies of their spiders at the same time. They do not provide much information about exactly which rules their spiders follow, so that search engine spammers cannot use this information to manipulate crawls (and ultimately search engine rankings). In general, search engine spiders are guided by some measure of website importance, as expressed in the quality and popularity of a site. They can also "learn" how often pages are updated and when it is a good time to spider a page again for new content.
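As a rough illustration of the crawl frontier described above, the following Python sketch visits a list of seed URLs, extracts the hyperlinks from each page and queues any link it has not seen before. It is a minimal sketch, not how any particular search engine works: the `fetch` callable, the `PAGES` dictionary and the `a.example` / `b.example` URLs are assumptions made up for this example, and real spiders add prioritisation and revisit rules that are not published.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first crawl; returns URLs in the order they were visited.

    `fetch` is a hypothetical callable that maps a URL to its HTML,
    or returns None if the page cannot be retrieved.
    """
    frontier = deque(seed_urls)  # the crawl frontier: URLs waiting to be visited
    seen = set(seed_urls)        # every URL ever queued, to avoid repeats
    visited = []

    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)

        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links against the current page and drop #fragments.
            link, _fragment = urldefrag(urljoin(url, href))
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# A tiny in-memory "web" standing in for real HTTP requests (illustrative only).
PAGES = {
    "http://a.example/": '<a href="/about">About</a> <a href="http://b.example/">B</a>',
    "http://a.example/about": '<a href="/">Home</a>',
    "http://b.example/": '<a href="http://a.example/about#team">Team</a>',
}

order = crawl(["http://a.example/"], PAGES.get)
print(order)
```

Starting from a single seed, the sketch reaches both the internal page and the external site by following links alone. A first-in, first-out queue is used here only because it is the simplest choice; a spider guided by site importance would replace it with some form of priority ordering.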
Revisiting a page only when it is likely to have changed is good for both the search engine and the webmaster, as it uses less bandwidth.

What do spiders read?
What do spiders ignore?
Can spiders do any damage?
How to make your site spider friendly

Try to provide search engine spiders with an easy way to navigate through your site (e.g. through a sitemap or through HTML links), and give them plenty of HTML copy to index. And, of course, make sure search engine spiders find your site in the first place: that means you need incoming links! Directory listings are a good source of incoming links, and you should also request links to your site from relevant, related websites (e.g. supplier, industry association or customer websites).
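
One way to give spiders a sitemap is a machine-readable XML file that lists the pages of a site. The sketch below assumes the XML Sitemap format published at sitemaps.org, which the text above does not name explicitly; the `build_sitemap` helper and the `www.example.com` URLs are hypothetical and serve only as an illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # <loc> holds the page address; special characters such as & are escaped.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


sitemap_xml = build_sitemap([
    "http://www.example.com/",
    "http://www.example.com/products",
])
print(sitemap_xml)
```

The resulting text would typically be saved as a file such as `sitemap.xml` on the site. A plain HTML page that links to every important page serves the same purpose for spiders that simply follow links.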