shopmiva.com, robots.txt?
ILoveHostasaurus
07-09-06, 07:18 PM
Can someone from Miva let me know if shopmiva.com honors robots.txt? We've got a customer whose b2b-only store is now indexed in shopmiva, even though they didn't have any links to the store anywhere on their site, so shopmiva.com evidently crawled it because it already knew the store was there. This customer does not want to be listed, so if we put up a robots.txt that disallows /Merchant2, will shopmiva honor that?
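For reference, here is a minimal sketch of what such a robots.txt could look like and how a compliant crawler would read it, using Python's standard `urllib.robotparser`. The host `www.example.com`, the file name `merchant.mvc` in the test URL, and the catch-all `User-agent: *` rule are assumptions for illustration. It only shows what the file asks for; whether shopmiva's spider honors it is the open question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served from the site root. It asks every crawler
# to stay out of the /Merchant2/ directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /Merchant2/
"""


def is_allowed(url, user_agent="*"):
    """Return True if a crawler that honors robots.txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Store URL is blocked, the rest of the site is not.
    print(is_allowed("http://www.example.com/Merchant2/merchant.mvc"))
    print(is_allowed("http://www.example.com/index.html"))
```

The file has to sit at the root of the domain (`/robots.txt`), not inside `/Merchant2`, because that is the only place crawlers look for it.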
dotCOM_host
07-09-06, 07:48 PM
Good question to know. In the short term, I think James mentioned that they just crawled MIVA Merchant stores based on the licensing server registrations, which means they most likely do not scan for the robots.txt file and simply crawl sites starting from the home page down if they are listed in their licensing server.
MIVA - would like to see your spider obey the robots.txt file - I just found a client with a b2b site as well that is listed yet they do not sell to general public...
Vic - WolfPaw Computers
07-09-06, 07:53 PM
I'd like to see a disclosure somewhere that MIVA is doing this, and a means for customers to opt out.
I certainly do not think customers with dev stores will want them indexed and listed. I know we don't want our test stores listed.
It's also an invasion of merchants' privacy to do this without giving them some kind of warning or disclosure that MIVA is spidering their sites for inclusion without their express permission or advance knowledge.
I don't recall anything in the EULA that permits this.
Don't get me wrong here, I think it's a great idea and a great added service - however, I think this needs to be thought through a little better first.
James Harrell
07-09-06, 08:05 PM
Hi All,
The spider and search engine are commercial grade tools, not home grown. I'll check with our Direct group who owns and manages the engine for clarification, but for now I'm almost sure it obeys robots.txt.
The way it works is we take the "Merchant registration" domain name, and start at the domain's home page (not the Merchant store front page) and follow links from there. If there's a link to the Merchant store from within the site the store will be spidered - just like in any other search engine.
Regards,
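A minimal sketch of the crawl described above, assuming a plain breadth-first link walk from the home page. The actual engine is a commercial tool whose internals aren't given in this thread, and the page names and link graph below are made up. The point is only that a store with no inbound link from the rest of the site is never reached by a crawl of this kind.

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to. Nothing on the
# public site links to the store under /Merchant2/.
SITE_LINKS = {
    "/": ["/about.html", "/contact.html"],
    "/about.html": ["/"],
    "/contact.html": [],
    "/Merchant2/merchant.mvc": [],
}


def crawl(start, links):
    """Breadth-first walk from start, returning pages in the order reached."""
    seen = [start]
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen


if __name__ == "__main__":
    # The store page never shows up because no crawled page links to it.
    print(crawl("/", SITE_LINKS))
```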
The site also says "Our search tool crawls through sites generated by our license manager, and if a store has not checked-in within 90 days, it will be removed from the list of sites."
What about older versions of MIVA Merchant that do not check-in to the license manager?
wmgilligan
07-09-06, 10:19 PM
Found this spider a week ago bogging down my server... it seemed to be searching all Miva links.
Also - I am very concerned about the toolbar. Everyone should read the disclosure very closely.
Especially the second paragraph.
http://www.shopmiva.com/help/terms.html
Bill
James Harrell
07-10-06, 12:05 AM
Hi Bill,
No, we don't use picosearch, and we won't bog down your stores. As I said, the search spider is professional grade, not something free like picosearch or mnogo, etc. The spider has a delay built into it and it won't fetch more than one page a minute. Compare that to Googlebot, which hits my test store at 2- to 3-second intervals.
The spider identifies itself as "Miva (AlgoFeedback@miva.com)". Here's a log line from one of our test sites:
66.150.55.230 - - [30/May/2006:14:37:34 -0400] "GET /robots.txt HTTP/1.1" 404 280 "-" "Miva (AlgoFeedback@miva.com)"
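To check your own access logs for this spider, something along these lines may help. It is a rough sketch that assumes the Apache "combined" log format the line above appears to use; the regular expression and the `spider_hits` helper are illustrative, not anything supplied by MIVA.

```python
import re

# Apache "combined" log format: host, identity, user, [time], "request",
# status, bytes, "referer", "user agent".
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# The log line quoted in the post above.
SAMPLE = (
    '66.150.55.230 - - [30/May/2006:14:37:34 -0400] '
    '"GET /robots.txt HTTP/1.1" 404 280 "-" "Miva (AlgoFeedback@miva.com)"'
)


def spider_hits(lines, agent_prefix="Miva ("):
    """Yield (time, path, status) for requests whose user agent starts with agent_prefix."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and match.group("agent").startswith(agent_prefix):
            yield match.group("time"), match.group("path"), int(match.group("status"))


if __name__ == "__main__":
    for hit in spider_hits([SAMPLE]):
        print(hit)
```

Note that the sample request is for `/robots.txt` and was answered with a 404, so on that test site the spider asked for the file and there was none to serve.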
As for the T&C on the info page, I'll make sure that gets updated. We've removed a lot of the other features from the toolbar and limited it to just Pop-up blocking, RSS reading from our forums, and search term landing to ShopMIVA when you type in a search in the toolbar.
Regards,
Vic - WolfPaw Computers
07-10-06, 12:12 AM
This still does not cover the disclosure to Merchant store operators that they are being spidered specifically because of their use of MIVA products, nor provide any means of opting out.
How does it handle additional store licenses? I don't believe there is a domain name for the additional store license. Is there? I have additional stores in a mall that would love to be listed.
Scott
James Harrell
07-11-06, 05:02 AM
Hi David,
The MIVA crawler does obey standard robots.txt files. If you experience differently, please email me and I'll forward that along to our algo group.
Scott - additional stores are handled just like any other link on the site - if you have a link to the stores they'll be crawled. As always with any search engine, a good sitemap file would help.
Regards,
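If "sitemap file" here means an XML sitemap in the sitemaps.org format (an assumption; a plain HTML page of links to each store would serve the same purpose for a link-following spider), a minimal sketch of generating one might look like this. The store URLs and store codes below are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org XML format.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemaps.org-style XML document listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical storefront URLs for the main store and one additional store.
    stores = [
        "http://www.example.com/Merchant2/merchant.mvc?Store_Code=store1",
        "http://www.example.com/Merchant2/merchant.mvc?Store_Code=store2",
    ]
    print(build_sitemap(stores))
```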
Thanks. But here's some more detail. The main store's domain, the one the license is tied to, doesn't actually run. The store's name is different from the domain. So, does the domain name actually matter? I understand it will read the links through the mall, but I would also want it to spider the domains of the additional storefronts. Am I asking too much? Am I asking the right question?
TIA
Scott