# Following the standard robots.txt syntax, this file lists the pages and
# directories that crawlers are not allowed to fetch.
# Reference for the robots.txt syntax:
#
# Format is:
#       User-agent: <name of spider>
#       Disallow: <nothing> | <path>
#
#
# It works like this: a robot wants to visit a Web site URL, say
# http://www.example.com/welcome.html. Before it does so, it first checks for
# http://www.example.com/robots.txt, and finds:
# ---------------
# User-agent: *
# Disallow: /
# --------------
#
# The "User-agent: *" means this section applies to all robots. The
# "Disallow: /" tells the robot that it should not visit any pages on the site.
# -----------------------------------------------------------------------------

User-agent: oao-spider
Disallow: /