Old Fox KM Journal

Tuesday, July 19, 2005

Keep Search Engines from Indexing


How to Keep Search Engines from Indexing a Page: "How to Keep a Search Engine From Indexing a Page

Search engines gather information on websites by using programs called robots. Robots are also known as web crawlers, spiders, worms, web wanderers, and scooters. When a web robot crawls your web site, it goes to each page in your site and follows every link on every page to index the information and then put that information into databases.

Now this is generally good news, but sometimes you may have a page on your site that you do not want indexed in a search engine. For example, you may have a special page that you do not want a visitor to see until after they have taken some action such as clicking on an introductory or explanatory link first. Well, luckily for us, there are ways to keep a page from being indexed by search engines.

Each search engine has its own rules for what it looks at to index a site. Some look at meta tags, some ignore the meta tags and look at the beginning text of a page, some read page titles, and so on, but almost all search engines today first check the site for a file called robots.txt. The robots.txt file is a simple ASCII (plain text) document where we put instructions to the search engines.

The robots.txt file basically tells search engines two things: 1) which search engine(s) are excluded or forbidden to index and 2) which specific page(s), directories, or even the entire site are to be excluded from indexing. The robots.txt file will usually contain information that looks like this:

User-agent: *
Disallow: /tmp/
Disallow: /private/

In this example, all search engines (*) are excluded from indexing two directories (tmp and private). All other directories and pages on the site are considered fair game to be included in the indexing."
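
To see how a well-behaved robot might read those rules, here is a minimal sketch using Python's standard urllib.robotparser module. The robot name "ExampleBot", the host www.example.com, and the helper is_allowed are made up for illustration and are not part of the quoted article.

```python
from urllib.robotparser import RobotFileParser

# The example rules from the article above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tmp/
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url.

    Hypothetical helper: it parses the rules from a string instead of
    downloading them from a live site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" and www.example.com are placeholders, not real values.
    for path in ("/index.html", "/tmp/scratch.html", "/private/notes.html"):
        url = "http://www.example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

Under these rules the sketch should report /index.html as allowed and the two pages under /tmp/ and /private/ as disallowed. Keep in mind that this only models a robot that chooses to obey robots.txt; the file is a request, not an access control.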
