IntelliGiantsBot is IntelliGiants’ web crawling bot (sometimes referred to as a “spider”, “crawler” or “robot”). Crawling is the process by which IntelliGiantsBot discovers new and updated pages within websites, which are then added to our index.
IntelliGiants operate a huge network of high-specification servers and offer differing levels of subscription for access to the collated data.
The IntelliGiantsBot crawl process begins with a list of URLs generated from source data supplied by our clients, either through direct APIs supplied by third parties or as manually uploaded data. We do not use the sitemaps on your domain and only target URLs that link to specific locations. New, altered and dead links are noted and updated in our index. When crawling, we collect the entire source code of the URL in question and store it in our index. The next time we crawl the same URL, we compare the new source with the stored copy and log the changes in our index for access by our clients.
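The compare-and-log step can be illustrated with a minimal Python sketch. This is not IntelliGiantsBot's actual implementation: the function names, the use of SHA-256 fingerprints and the unified-diff output are all assumptions chosen for illustration, and the two snapshots are made-up sample data.

```python
import difflib
import hashlib


def fingerprint(source: str) -> str:
    """Return a SHA-256 hex digest of a page's source code."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def describe_change(previous: str, current: str) -> list[str]:
    """Return unified-diff lines between two snapshots, or [] if unchanged."""
    # Cheap check first: identical fingerprints mean nothing to log.
    if fingerprint(previous) == fingerprint(current):
        return []
    return list(
        difflib.unified_diff(
            previous.splitlines(),
            current.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",
        )
    )


# Hypothetical snapshots of the same URL from two crawls.
old_snapshot = '<a href="https://example.com/a">A</a>\n<p>Hello</p>'
new_snapshot = '<a href="https://example.com/b">A</a>\n<p>Hello</p>'

changes = describe_change(old_snapshot, new_snapshot)
for line in changes:
    print(line)
```

Comparing fingerprints before diffing keeps the common case (an unchanged page) cheap, and the diff lines show which link was altered between crawls.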
For most sites, IntelliGiantsBot shouldn’t access any one domain more than once every few seconds on average. However, due to network delays and other anomalies, the rate could oscillate between lower and higher values over short periods. In general, IntelliGiantsBot should download only one copy of each page at a time. If you see that IntelliGiantsBot is downloading a page multiple times, it’s probably because the crawler was stopped and restarted.
IntelliGiantsBot has been designed to be distributed across several machines to improve performance and to scale as the popularity of our services increases. To cut down on bandwidth usage, we run many crawlers on many machines. Therefore, your logs may show visits from several machines at IntelliGiants.com/crawler/, all with the user-agent IntelliGiantsBot. Our goal is to crawl as many pages as possible from your site that contain the specific links we are searching for.
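If you want to see how many machines are visiting under the IntelliGiantsBot user-agent, a short Python sketch along these lines may help. It assumes your server writes the common "combined" access log format; the regular expression, the function name and the sample log lines (which use documentation-only IP addresses) are illustrative assumptions, not output from a real server.

```python
import re
from collections import Counter

# Combined log format:
# ip ident user [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)


def count_bot_visits(lines, token="IntelliGiantsBot"):
    """Count requests per client IP whose user-agent contains the token."""
    visits = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and token in match.group("agent"):
            visits[match.group("ip")] += 1
    return visits


# Hypothetical log excerpt for illustration only.
sample_log = [
    '192.0.2.10 - - [01/Jan/2024:10:00:00 +0000] "GET /a HTTP/1.1" 200 512 "-" "IntelliGiantsBot"',
    '192.0.2.11 - - [01/Jan/2024:10:00:03 +0000] "GET /b HTTP/1.1" 200 734 "-" "IntelliGiantsBot"',
    '198.51.100.7 - - [01/Jan/2024:10:00:04 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
    '192.0.2.10 - - [01/Jan/2024:10:00:06 +0000] "GET /c HTTP/1.1" 404 0 "-" "IntelliGiantsBot"',
]

visits = count_bot_visits(sample_log)
for ip, count in visits.items():
    print(ip, count)
```

Lines that do not match the assumed log format are skipped silently, so adjust the pattern if your server logs in a different layout.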
It’s almost impossible to keep a web server secret by not publishing links to it. As soon as someone follows a link from your “secret” server to another web server, your “secret” URL may appear in the HTTP Referer header and can be stored and published by the other web server in its referrer log. Similarly, the web has many outdated and broken links. Whenever someone publishes an incorrect link to your site or fails to update links to reflect changes on your server, IntelliGiantsBot will try to download that incorrect link from your site.
If you want to prevent IntelliGiantsBot from crawling content on your site, you have a number of options, including using robots.txt to block access to files and directories on your server.
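As a rough illustration, the Python sketch below embeds a sample robots.txt and checks it with the standard library's urllib.robotparser. The /private/ directory and the two page URLs are hypothetical; only the user-agent name IntelliGiantsBot comes from this page. The check reflects how Python's parser reads the rules, which is a reasonable approximation of, but not a guarantee about, how any particular crawler interprets them.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt: block IntelliGiantsBot from one directory
# (hypothetical path) and leave all other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: IntelliGiantsBot
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs on the example host used elsewhere on this page.
can_fetch_private = parser.can_fetch(
    "IntelliGiantsBot", "https://www.mysite.com/private/report.html"
)
can_fetch_public = parser.can_fetch(
    "IntelliGiantsBot", "https://www.mysite.com/index.html"
)

print("private allowed:", can_fetch_private)
print("public allowed:", can_fetch_public)
```

Running a check like this before publishing the file is a quick way to catch a mistyped path or user-agent name.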
Once you’ve created your robots.txt file, there may be a small delay before IntelliGiantsBot discovers your changes. If IntelliGiantsBot is still crawling content you’ve blocked in robots.txt, check that the robots.txt is in the correct location. It must be in the top directory of the server (e.g., www.mysite.com/robots.txt); placing the file in a subdirectory won’t have any effect.
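To double-check the location, the following Python sketch derives where a robots.txt file is expected to live for any page URL. The function name and the sample URL are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the top-level robots.txt URL for the host serving page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host; discard the path, query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://www.mysite.com/blog/2024/post.html?ref=home"))
```

Whatever page you start from, the result always points at the top directory of the host, which is the only place the file takes effect.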
If you just want to prevent the “file not found” error messages in your web server log, you can create an empty file named robots.txt.
If you want us to stop crawling your site, please email firstname.lastname@example.org listing your domain and politely requesting that we stop. Rude or derogatory communications will be ignored, as our staff will not be subjected to such material.