Request.Browser.Crawler
April 8th, 2008
In my previous post about exception logging, I showed how to log several parameters related to an exception in the database. Request.Browser.Crawler is one of them: it's a boolean property that tells you whether the request came from a search engine crawler. It deserves its own entry because it needs some extra setup in web.config before it works correctly.
You'll have to add the following code to the <system.web> section of your web.config file:
<!-- This section is used by the Request.Browser.Crawler property to detect search engine crawlers -->
<browserCaps>
  <filter>
    <!-- SEARCH ENGINES GROUP -->
    <!-- check Google (Yahoo uses this as well) -->
    <case match="^Googlebot(\-Image)?/(?'version'(?'major'\d+)(?'minor'\.\d+)).*">
      browser=Google
      version=${version}
      majorversion=${major}
      minorversion=${minor}
      crawler=true
    </case>
    <!-- check Alta Vista (Scooter) -->
    <case match="^Scooter(/|-)(?'version'(?'major'\d+)(?'minor'\.\d+)).*">
      browser=AltaVista
      version=${version}
      majorversion=${major}
      minorversion=${minor}
      crawler=true
    </case>
    <!-- check Alta Vista (Mercator) -->
    <case match="Mercator">
      browser=AltaVista
      crawler=true
    </case>
    <!-- check Slurp (Yahoo uses this as well) -->
    <case match="Slurp">
      browser=Slurp
      crawler=true
    </case>
    <!-- check MSN -->
    <case match="MSNBOT">
      browser=MSN
      crawler=true
    </case>
    <!-- check Northern Light -->
    <case match="^Gulliver/(?'version'(?'major'\d+)(?'minor'\.\d+)).*">
      browser=NorthernLight
      version=${version}
      majorversion=${major}
      minorversion=${minor}
      crawler=true
    </case>
    <!-- check Excite -->
    <case match="ArchitextSpider">
      browser=Excite
      crawler=true
    </case>
    <!-- Lycos -->
    <case match="Lycos_Spider">
      browser=Lycos
      crawler=true
    </case>
    <!-- Ask Jeeves -->
    <case match="Ask Jeeves">
      browser=AskJeeves
      crawler=true
    </case>
    <!-- check Fast -->
    <case match="^FAST-WebCrawler/(?'version'(?'major'\d+)(?'minor'\.\d+)).*">
      browser=Fast
      version=${version}
      majorversion=${major}
      minorversion=${minor}
      crawler=true
    </case>
    <!-- IBM Research Web Crawler -->
    <case match="http\:\/\/www\.almaden.ibm.com\/cs\/crawler">
      browser=IBMResearchWebCrawler
      crawler=true
    </case>
  </filter>
</browserCaps>
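Now what does it all mean? Well, ASP.NET uses the information in the <browserCaps> section of your config file to decide whether the client browser is a crawler. If you look at it closely, it's basically a regular expression filter: each <case> element's match pattern is tested against the request's User-Agent string, and the first case that matches sets the listed properties, including crawler=true, which is the value Request.Browser.Crawler returns. I presume you could add more <case> entries in the same format to detect other kinds of crawlers.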
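To make the first-match behavior concrete, here is a minimal Python sketch that mimics a <filter> block outside of ASP.NET. It is only an illustration, not how ASP.NET implements browserCaps. The function name detect_browser, the CASES list (a two-case subset of the config above), the crawler=false default, and the sample user-agent strings are all hypothetical. The .NET named-group syntax (?'name'...) is rewritten as Python's (?P<name>...), and matching uses re.search, which, like the patterns above, is unanchored unless the pattern starts with ^.

```python
import re

# Hypothetical subset of the <case> elements above, kept in the same order.
# .NET's (?'name'...) named groups are rewritten as Python's (?P<name>...).
CASES = [
    (r"^Googlebot(\-Image)?/(?P<version>(?P<major>\d+)(?P<minor>\.\d+)).*",
     {"browser": "Google", "version": "${version}", "majorversion": "${major}",
      "minorversion": "${minor}", "crawler": "true"}),
    (r"Slurp", {"browser": "Slurp", "crawler": "true"}),
]


def detect_browser(user_agent: str) -> dict:
    """Apply the first case whose pattern matches the user agent (hypothetical helper)."""
    caps = {"crawler": "false"}  # assumed default when no case matches
    for pattern, assignments in CASES:
        match = re.search(pattern, user_agent)
        if match is None:
            continue
        for key, value in assignments.items():
            # Replace ${name} with the text captured by the named group "name".
            caps[key] = re.sub(r"\$\{(\w+)\}",
                               lambda m: match.group(m.group(1)) or "", value)
        break  # like <filter>, stop at the first matching case
    return caps


print(detect_browser("Googlebot/2.1 (+http://www.google.com/bot.html)"))
```

Because evaluation stops at the first match, the order of the <case> entries matters: a user agent that happens to contain both "Googlebot/2.1" and "Slurp" is reported as Google. In the real application you would simply read Request.Browser.Crawler.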
Update: For a more complete and up-to-date version of browserCaps, along with other useful browser testing and detection resources, see this site:
http://slingfive.com/pages/code/browserCaps/