- The file can be written in any plain-text editor, such as Notepad.
- Place the file at the top level of the site, alongside the index page.
- The file is public: anyone can view it by appending robots.txt to the site's main address, for example http://***.com/robots.txt
- Purpose: it sets rules that tell search-engine crawlers (Google, Yahoo, Bing, Ask, ...) which parts of the site's directories they may crawl and which they must stay out of. This is practical and necessary because not every part of a site is meant to be public, especially sensitive areas such as admin.
By default most search-engine robots honor these settings, but compliance is voluntary and some crawlers ignore them; Baidu's spider from China is often cited as an example.
- Some tools for creating and checking this file:
- http://www.mcanerin.com/en/search-engine/robots-txt.asp
- http://tools.seobook.com/robots-txt/generator/
- http://www.frobee.com/robots-txt-check (check only)
- ...
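Because the file always sits at the site root, its address can be derived from any page URL. A minimal Python sketch of that rule follows; the helper name robots_url and the example.com address are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only scheme and host; the path is always /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://example.com/blog/post.html?page=2"))
# http://example.com/robots.txt
```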
Most files use the following basic syntax:
User-agent: *
Disallow: /images/
Disallow: /cgi-bin/
Disallow: /any file or folder you do not want crawled/
Where:
+ User-agent: the name of the crawler that the rules below it apply to; use * for all crawlers
+ Disallow: a path that crawlers are not permitted to access
+ Allow: a path that is always permitted
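One way to see how a crawler reads these directives is Python's standard urllib.robotparser module. The sketch below parses the basic rules from an in-memory string rather than a live site; the agent name AnyBot and the tested paths are made up for the example.

```python
from urllib.robotparser import RobotFileParser

BASIC_RULES = """\
User-agent: *
Disallow: /images/
Disallow: /cgi-bin/
"""


def build_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


basic = build_parser(BASIC_RULES)
print(basic.can_fetch("AnyBot", "/images/logo.png"))  # False
print(basic.can_fetch("AnyBot", "/index.html"))       # True
```

To check a live site instead, RobotFileParser also offers set_url() and read(), which download the file before the same can_fetch() call.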
To exclude all robots from the entire server
User-agent: *
Disallow: /
To allow all robots complete access
User-agent: *
Disallow:
To exclude all robots from part of the server
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
To exclude a single robot
User-agent: BadBot
Disallow: /
To allow a single robot
User-agent: Google
Disallow:
User-agent: *
Disallow: /
To exclude all files except one
Put every file that should be excluded into a separate directory, here "stuff"; nothing inside it may be crawled
User-agent: *
Disallow: /~joe/stuff/
Alternatively, you can explicitly disallow each page that should not be crawled
User-agent: *
Disallow: /~joe/junk.html
Disallow: /~joe/foo.html
Disallow: /~joe/bar.html
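The per-robot cases can be checked the same way. This is a small sketch using Python's standard urllib.robotparser; the helper allowed() and the agent name OtherBot are assumptions for illustration, while BadBot and Google come from the examples.

```python
from urllib.robotparser import RobotFileParser

EXCLUDE_ONE = """\
User-agent: BadBot
Disallow: /
"""

ALLOW_ONE = """\
User-agent: Google
Disallow:

User-agent: *
Disallow: /
"""


def allowed(rules, agent, path):
    """Return True if `agent` may fetch `path` under the given rules text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


print(allowed(EXCLUDE_ONE, "BadBot", "/page.html"))    # False
print(allowed(EXCLUDE_ONE, "OtherBot", "/page.html"))  # True
print(allowed(ALLOW_ONE, "Google", "/page.html"))      # True
print(allowed(ALLOW_ONE, "OtherBot", "/page.html"))    # False
```

An empty Disallow: value means nothing is disallowed for that robot, which is why the Google group grants full access.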
For WordPress, in general:
User-agent: *
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
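Each Disallow value in the WordPress rules is a path prefix, so /wp- covers /wp-admin/, /wp-content/ and /wp-login.php in one line. The sketch below illustrates plain prefix matching only; it is not a full robots.txt parser, and the function name and sample paths are hypothetical.

```python
WORDPRESS_PREFIXES = ["/wp-", "/feed/", "/trackback/"]


def is_blocked(path, prefixes=WORDPRESS_PREFIXES):
    """Return True if `path` starts with any disallowed prefix."""
    return any(path.startswith(prefix) for prefix in prefixes)


for path in ["/wp-admin/", "/wp-content/uploads/logo.png",
             "/feed/", "/2013/05/hello-world/"]:
    print(path, "blocked" if is_blocked(path) else "allowed")
```

Note that the /wp- prefix also blocks /wp-content/, where WordPress keeps themes and uploaded media, so it is worth confirming that this is intended before copying the rule.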
For Blogspot, in general:
User-agent: Mediapartners-Google*
Disallow:
User-agent:*
or:
User-agent: Mediapartners-Google
Disallow:
User-agent:*
Disallow: /*?updated-max=*
Allow: /
Sitemap: http://.../feeds/posts/default?orderby=updated
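The rule /*?updated-max=* relies on wildcards, an extension that major search engines support but that simple prefix-based parsers may not. As a rough sketch, assuming * matches any run of characters and a trailing $ anchors the end of the URL, a pattern can be translated to a regular expression like this (the function names and sample paths are illustrative):

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern with * and trailing $ to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and join them with ".*" where a * stood.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def matches(pattern, path):
    """Return True if `path` is covered by `pattern`."""
    return pattern_to_regex(pattern).match(path) is not None


print(matches("/*?updated-max=*", "/search?updated-max=2013-01-01"))  # True
print(matches("/*?updated-max=*", "/2013/01/post.html"))              # False
```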
Below are example robots.txt files from several websites:
Problogger.net
User-agent: *
Disallow:
http://vietseo.net/robots.txt
#Last modified $Date: 19/05/2008 $ by $Author: HoaiNam $ Created: Jan 2008
#
# Internet Archiver Wayback Machine Alexa
#User-agent: ia_archiver
#Disallow: /
# Digg mirror
#User-agent: duggmirror
#Disallow: /
User-agent: *
# Disallow: */trackback*
Disallow: /wp-*
Disallow: /archive/*
Disallow: /tai-nguyen-web/*
Disallow: /quang-ba-website/*
Disallow: /kiemtien/*
# Disallow: */feed*
# Disallow: /20*
Disallow: */?mobi
Disallow: /page/
Disallow: */?dl_id*
Disallow: */?dl_cat*
Allow: /
# BEGIN XML-SITEMAP-PLUGIN
Sitemap: http://www.vietseo.net/sitemaps.xml.gz
# END XML-SITEMAP-PLUGIN
http://vn.yahoo.com/robots.txt
User-agent: *
Disallow: /p/
Disallow: /r/
Disallow: /*?
http://vietnamnet.vn/robots.txt
User-agent: *
Allow: /
Sitemap: http://vietnamnet.vn/sitemap.xml
http://sinhvienit.vn/robots.txt
User-agent: coccoc
Crawl-delay: 5
Disallow: /@forum/url/
Disallow: /@forum/frm.php
Disallow: /forum/url/
Disallow: /forum/frm.php
User-agent: 008
Crawl-delay: 5
Disallow: /
User-agent: bingbot
Crawl-delay: 1
Disallow: /@forum/url/
Disallow: /@forum/frm.php
Disallow: /forum/url/
Disallow: /forum/frm.php
User-agent: *
Disallow: /@forum/url/
Disallow: /@forum/frm.php
Disallow: /forum/url/
Disallow: /forum/frm.php
Sitemap: http://sinhvienit.net/forum/xmlsitemap.php
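This file also uses Crawl-delay and Sitemap, which sit alongside the Allow and Disallow rules. Python's standard urllib.robotparser exposes both through crawl_delay() and site_maps() (the latter needs Python 3.8 or newer). The sketch below parses a shortened copy of the rules above; the agent name OtherBot is an assumption.

```python
from urllib.robotparser import RobotFileParser

SAMPLE = """\
User-agent: coccoc
Crawl-delay: 5
Disallow: /forum/url/

User-agent: bingbot
Crawl-delay: 1
Disallow: /forum/url/

User-agent: *
Disallow: /forum/url/

Sitemap: http://sinhvienit.net/forum/xmlsitemap.php
"""

delays = RobotFileParser()
delays.parse(SAMPLE.splitlines())

print(delays.crawl_delay("coccoc"))    # 5
print(delays.crawl_delay("bingbot"))   # 1
print(delays.crawl_delay("OtherBot"))  # None, no delay set for other robots
print(delays.site_maps())
```

Crawl-delay is a non-standard directive, so whether a given crawler honors it should be checked in that crawler's own documentation.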
http://dantri.com.vn/robots.txt
User-agent: *
Disallow: /Ajax/
Sitemap: http://dantri.com.vn/sitemap1/sitemap/sitemap-index.xml
http://youtube.com/robots.txt
# robots.txt file for YouTube
# Created in the distant future (the year 2000) after
# the robotic uprising of the mid 90's which wiped out all humans.
User-agent: Mediapartners-Google*
Disallow:
User-agent: *
Disallow: /all_comments
Disallow: /bulletin
Disallow: /comment
Disallow: /forgot
Disallow: /get_video
Disallow: /get_video_info
Disallow: /login
Disallow: /results
Disallow: /signup
Disallow: /t/terms
Disallow: /t/privacy
Disallow: /verify_age
Disallow: /videos
Disallow: /watch_ajax
Disallow: /watch_popup
Disallow: /watch_queue_ajax
http://www.google.com/robots.txt contains a great many rules:
User-agent: *
Disallow: /search
Disallow: /sdch
Disallow: /groups
Disallow: /images
Disallow: /catalogs
Allow: /catalogs/about
Allow: /catalogs/p?
Disallow: /catalogues
Disallow: /news
Allow: /news/directory
Disallow: /nwshp
Disallow: /setnewsprefs?
Disallow: /index.html?
Disallow: /?
Allow: /?hl=
Disallow: /?hl=*&
Disallow: /addurl/image?
Disallow: /pagead/
Disallow: /relpage/
Disallow: /relcontent
Disallow: /imgres
Disallow: /imglanding
Disallow: /sbd
Disallow: /keyword/
Disallow: /u/
Disallow: /univ/
Disallow: /cobrand
Disallow: /custom
Disallow: /advanced_group_search
Disallow: /googlesite
Disallow: /preferences
Disallow: /setprefs
Disallow: /swr
Disallow: /url
Disallow: /default
Disallow: /m?
Disallow: /m/
Disallow: /wml?
Disallow: /wml/?
Disallow: /wml/search?
Disallow: /xhtml?
Disallow: /xhtml/?
Disallow: /xhtml/search?
Disallow: /xml?
Disallow: /imode?
Disallow: /imode/?
Disallow: /imode/search?
Disallow: /jsky?
Disallow: /jsky/?
Disallow: /jsky/search?
Disallow: /pda?
Disallow: /pda/?
Disallow: /pda/search?
Disallow: /sprint_xhtml
Disallow: /sprint_wml
Disallow: /pqa
Disallow: /palm
Disallow: /gwt/
Disallow: /purchases
Disallow: /hws
Disallow: /bsd?
Disallow: /linux?
Disallow: /mac?
Disallow: /microsoft?
Disallow: /unclesam?
Disallow: /answers/search?q=
Disallow: /local?
Disallow: /local_url
Disallow: /shihui?
Disallow: /shihui/
Disallow: /froogle?
Disallow: /products?
Disallow: /products/
Disallow: /froogle_
Disallow: /product_
Disallow: /products_
Disallow: /products;
Disallow: /print
Disallow: /books/
Disallow: /bkshp?*q=*
Disallow: /books?*q=*
Disallow: /books?*output=*
Disallow: /books?*pg=*
Disallow: /books?*jtp=*
Disallow: /books?*jscmd=*
Disallow: /books?*buy=*
Disallow: /books?*zoom=*
Allow: /books?*q=related:*
Allow: /books?*q=editions:*
Allow: /books?*q=subject:*
Allow: /books/about
Allow: /booksrightsholders
Allow: /books?*zoom=1*
Allow: /books?*zoom=5*
Disallow: /ebooks/
Disallow: /ebooks?*q=*
Disallow: /ebooks?*output=*
Disallow: /ebooks?*pg=*
Disallow: /ebooks?*jscmd=*
Disallow: /ebooks?*buy=*
Disallow: /ebooks?*zoom=*
Allow: /ebooks?*q=related:*
Allow: /ebooks?*q=editions:*
Allow: /ebooks?*q=subject:*
Allow: /ebooks?*zoom=1*
Allow: /ebooks?*zoom=5*
Disallow: /patents?
Disallow: /patents/related/
Allow: /patents?id=
Allow: /patents?vid=
Disallow: /scholar
Disallow: /citations?
Allow: /citations?user=
Allow: /citations?view_op=new_profile
Allow: /citations?view_op=top_venues
Disallow: /complete
Disallow: /s?
Disallow: /sponsoredlinks
Disallow: /videosearch?
Disallow: /videopreview?
Disallow: /videoprograminfo?
Allow: /maps/api/js?
Disallow: /maps?
Disallow: /mapstt?
Disallow: /mapslt?
Disallow: /maps/stk/
Disallow: /maps/br?
Disallow: /mapabcpoi?
Disallow: /maphp?
Disallow: /mapprint?
Disallow: /maps/api/js/
Disallow: /maps/api/staticmap?
Disallow: /mld?
Disallow: /staticmap?
Disallow: /places/
Allow: /places/$
Disallow: /maps/preview
Disallow: /maps/place
Disallow: /help/maps/streetview/partners/welcome/
Disallow: /help/maps/indoormaps/partners/
Disallow: /lochp?
Disallow: /center
Disallow: /ie?
Disallow: /sms/demo?
Disallow: /katrina?
Disallow: /blogsearch?
Disallow: /blogsearch/
Disallow: /blogsearch_feeds
Disallow: /advanced_blog_search
Disallow: /reader/
Allow: /reader/play
Disallow: /uds/
Disallow: /chart?
Disallow: /transit?
Disallow: /mbd?
Disallow: /extern_js/
Disallow: /calendar/feeds/
Disallow: /calendar/ical/
Disallow: /cl2/feeds/
Disallow: /cl2/ical/
Disallow: /coop/directory
Disallow: /coop/manage
Disallow: /trends?
Disallow: /trends/music?
Disallow: /trends/hottrends?
Disallow: /trends/viz?
Disallow: /notebook/search?
Disallow: /musica
Disallow: /musicad
Disallow: /musicas
Disallow: /musicl
Disallow: /musics
Disallow: /musicsearch
Disallow: /musicsp
Disallow: /musiclp
Disallow: /browsersync
Disallow: /call
Disallow: /archivesearch?
Disallow: /archivesearch/url
Disallow: /archivesearch/advanced_search
Disallow: /base/reportbadoffer
Disallow: /urchin_test/
Disallow: /movies?
Disallow: /codesearch?
Disallow: /codesearch/feeds/search?
Disallow: /wapsearch?
Disallow: /safebrowsing
Allow: /safebrowsing/diagnostic
Allow: /safebrowsing/report_badware/
Allow: /safebrowsing/report_error/
Allow: /safebrowsing/report_phish/
Disallow: /reviews/search?
Disallow: /orkut/albums
Allow: /jsapi
Disallow: /views?
Disallow: /c/
Disallow: /cbk
Allow: /cbk?output=tile&cb_client=maps_sv
Disallow: /recharge/dashboard/car
Disallow: /recharge/dashboard/static/
Disallow: /translate_a/
Disallow: /translate_c
Disallow: /translate_f
Disallow: /translate_static/
Disallow: /translate_suggestion
Disallow: /profiles/me
Allow: /profiles
Disallow: /s2/profiles/me
Allow: /s2/profiles
Allow: /s2/photos
Allow: /s2/static
Disallow: /s2
Allow: /s2/search/social
Disallow: /transconsole/portal/
Disallow: /gcc/
Disallow: /aclk
Disallow: /cse?
Disallow: /cse/home
Disallow: /cse/panel
Disallow: /cse/manage
Disallow: /tbproxy/
Disallow: /imesync/
Disallow: /shenghuo/search?
Disallow: /support/forum/search?
Disallow: /reviews/polls/
Disallow: /hosted/images/
Disallow: /ppob/?
Disallow: /ppob?
Disallow: /ig/add?
Disallow: /adwordsresellers
Disallow: /accounts/o8
Allow: /accounts/o8/id
Disallow: /topicsearch?q=
Disallow: /xfx7/
Disallow: /squared/api
Disallow: /squared/search
Disallow: /squared/table
Disallow: /toolkit/
Allow: /toolkit/*.html
Disallow: /globalmarketfinder/
Allow: /globalmarketfinder/*.html
Disallow: /qnasearch?
Disallow: /app/updates
Disallow: /sidewiki/entry/
Disallow: /quality_form?
Disallow: /labs/popgadget/search
Disallow: /buzz/post
Disallow: /compressiontest/
Disallow: /analytics/reporting/
Disallow: /analytics/admin/
Disallow: /analytics/web/
Disallow: /analytics/feeds/
Disallow: /analytics/settings/
Disallow: /alerts/
Disallow: /ads/preferences/
Allow: /ads/preferences/html/
Allow: /ads/preferences/plugin
Disallow: /ads/search
Disallow: /settings/ads/onweb/
Disallow: /phone/compare/?
Allow: /alerts/manage
Allow: /alerts/remove
Disallow: /travel/clk
Disallow: /hotelfinder/rpc
Disallow: /flights/rpc
Disallow: /commercesearch/services/
Disallow: /evaluation/
Disallow: /chrome/browser/mobile/tour
Disallow: /compare/*/apply*
Disallow: /forms/perks/
Disallow: /baraza/*/search
Disallow: /baraza/*/report
Disallow: /shopping/suppliers/search
Disallow: /ct/
Disallow: /edu/cs4hs/
Sitemap: http://www.gstatic.com/culturalinstitute/sitemaps/www_google_com_culturalinstitute/sitemap-index.xml
Sitemap: http://www.google.com/hostednews/sitemap_index.xml
Sitemap: http://www.google.com/sitemaps_webmasters.xml
Sitemap: http://www.google.com/ventures/sitemap_ventures.xml
Sitemap: http://www.gstatic.com/dictionary/static/sitemaps/sitemap_index.xml
Sitemap: http://www.gstatic.com/earth/gallery/sitemaps/sitemap.xml
Sitemap: http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml
Sitemap: http://www.gstatic.com/trends/websites/sitemaps/sitemapindex.xml
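Google's file mixes Disallow and Allow for overlapping paths, for example Disallow: /catalogs next to Allow: /catalogs/about. A rough sketch of how such pairs resolve follows, assuming the longest-match precedence described in RFC 9309 (the most specific rule wins, and Allow wins a tie). It handles plain prefixes only, not wildcards, and the function name is hypothetical; individual crawlers may behave differently.

```python
GOOGLE_SAMPLE = [
    ("Disallow", "/catalogs"),
    ("Allow", "/catalogs/about"),
    ("Disallow", "/news"),
    ("Allow", "/news/directory"),
]


def is_allowed(path, rules):
    """Apply the longest matching prefix; Allow wins when lengths are equal."""
    best_length = -1
    allowed = True  # a path that no rule matches is allowed
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        if len(prefix) > best_length or (len(prefix) == best_length and is_allow):
            best_length = len(prefix)
            allowed = is_allow
    return allowed


print(is_allowed("/catalogs/about", GOOGLE_SAMPLE))  # True
print(is_allowed("/catalogs/list", GOOGLE_SAMPLE))   # False
print(is_allowed("/intl/en/about", GOOGLE_SAMPLE))   # True
```

Parsers that apply the first matching rule in file order instead would block /catalogs/about here, which is one reason the same file can produce different results in different tools.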