This ThumbCrawl web crawl will look up the five URLs and two alternate URLs you supply. The focus is on supporting a straightforward, simple web crawl request.
I will use common open-source tools and follow standard crawl rules, largely with out-of-the-box settings. No customization will be made to the search configuration, so the findings these tools return will be unbiased and uninfluenced (honest results with no gaming).
In addition, your supplied URLs will be manually checked before the crawl to spot foul play or other harmful activity (no stalking or underage activity). If your URLs are suspicious, the crawl will not be run.
Points:
- If a robots.txt file is present on the site and disallows crawling, that restriction will be observed.
- Large sites such as CNN or other major brand sites will be rejected. Only sites of small to medium complexity are accepted; no national brands or chains (Nutch is not a price crawler).
- The goal is to assist small to regional entities, or researchers, with an unbiased, independent crawl. There is no ad revenue interest here (i.e., the butcher, the baker and the candlestick maker).
- I aim to follow local and regional laws and protection rules. Requests involving stalking, underage activity or criminal intent will not be supported.
- Your supplied URLs will be reviewed against the points above, and whether to move forward with your request is at my discretion.
- Your data will not be shared with or sold to other parties.
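To illustrate the robots.txt rule in the points above, here is a minimal sketch using Python's standard `urllib.robotparser` module to decide whether a page may be fetched. It is not how Nutch itself implements the check. The robots.txt contents, the `ThumbCrawl` user-agent string, the example.com URLs and the `is_allowed` helper are all hypothetical, chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example small-business site:
# every crawler is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ThumbCrawl" is a hypothetical user-agent name used for this example.
print(is_allowed(ROBOTS_TXT, "ThumbCrawl", "https://example.com/menu.html"))       # True
print(is_allowed(ROBOTS_TXT, "ThumbCrawl", "https://example.com/private/a.html"))  # False
```

A URL that the site's robots.txt disallows would simply be skipped rather than fetched.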
Note: If the request is rejected, I will aim to refund most of the money, keeping a $10 fee for the effort.
FYI: If the tools crash or produce odd findings during the crawl, an attempt will be made to filter out the offending URLs or domains. If the crawl crashes three times, the results collected so far will be presented and the request will be considered complete.