
How to create and read robots.txt for Blog

A robots.txt file is created to tell search engines which content on a website their crawlers may crawl and index; if you do not add one, the default behaviour is to allow crawlers to collect everything. Search engines and social networks such as Google, Bing, Yahoo, Twitter, and Facebook use programs called bots (crawlers) to visit a page's main link and analyze every URL found on that page. Before following those links, a bot checks the robots.txt file, the robots meta tag, and the rel attribute on each link to see which links are blocked and which may be visited. Search engines therefore always consult robots.txt when analyzing a site: its content tells the crawler which links it may crawl and index, and which links are blocked.

Each crawler has a specific user-agent name that you use in the robots.txt file, for example: Google - Googlebot, Bing - Bingbot, Yahoo - Slurp, Twitter - Twitterbot, Facebook - Facebot. In addition, you can target all crawlers at once with the wildcard (*).

Under each bot you then add rules of two kinds: Allow and Disallow.
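
For example, a minimal file that applies the same two kinds of rules to every bot at once looks like this (the /private path is just a placeholder):

User-agent: *
Disallow: /private
Allow: /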

How to create a robots.txt file 

The structure of a robots.txt file is as follows:

User-agent: [name of the search engine bot]
Disallow: [links to block]
Allow: [links to allow]
Sitemap: [domain]/sitemap.xml

An illustrative example:

Suppose I want to allow the Google, Twitter, and Facebook bots, plus Google's AdSense partner bot (Mediapartners-Google), to collect data as follows:

User-agent: Googlebot
User-agent: Twitterbot
User-agent: Facebot
Disallow: /p
Disallow: /search
Allow: /
User-agent: Mediapartners-Google
Allow: /
Sitemap: https://www.cuongbv.com/sitemap.xml

Reading this file, the Google, Twitter, and Facebook bots understand that all static page links (/p) and search pages (/search) are blocked and everything else is allowed, while the Mediapartners-Google (AdSense) bot may collect all links.
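
If you want to sanity-check this reading programmatically, here is a minimal sketch in Python using the standard urllib.robotparser module, fed the example rules above (the paths being queried are illustrative):

import urllib.robotparser

# The example rules from above, parsed directly instead of fetched over HTTP.
rules = """\
User-agent: Googlebot
User-agent: Twitterbot
User-agent: Facebot
Disallow: /p
Disallow: /search
Allow: /
User-agent: Mediapartners-Google
Allow: /
""".splitlines()

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("Googlebot", "/search"))               # False: search pages are blocked
print(rp.can_fetch("Googlebot", "/2018/11/a-post.html"))  # True: covered by Allow: /
print(rp.can_fetch("Mediapartners-Google", "/search"))    # True: the AdSense bot may crawl everything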

Allow specific links under a blocked path, or block a link under an allowed path

Suppose that, alongside the two rules Disallow: /p and Disallow: /search, you add Allow rules to open up specific links that sit under those blocked paths, and also block one specific link that would otherwise be covered by Allow: /. For example:

User-agent: Googlebot
User-agent: Twitterbot
User-agent: Facebot
Disallow: /p
Disallow: /search
Disallow: /2018/11/cach-chia-se-bai-viet-len-facebook-an-toan-va-tuong-tac-cao.html
Allow: /
Allow: /p/about-us.html
Allow: /search/label/blogspot-seo
User-agent: Mediapartners-Google
Allow: /
Sitemap: https://www.cuongbv.com/sitemap.xml
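
This works because of rule precedence: in Google's documented behaviour, the most specific rule, meaning the one with the longest matching path, wins, and a tie goes to Allow. A rough Python sketch of that matching logic (a simplification: wildcards are not handled here, and some other parsers instead apply rules in file order):

def is_allowed(rules, path):
    """Google-style precedence: longest matching prefix wins; ties favour Allow.
    rules is a list of (directive, path_prefix) pairs for one user-agent group."""
    matches = [(len(prefix), directive == "Allow")
               for directive, prefix in rules
               if path.startswith(prefix)]
    if not matches:
        return True  # no rule matches: crawling is allowed by default
    matches.sort()   # longest match last; on equal length, Allow (True) sorts last
    return matches[-1][1]

rules = [
    ("Disallow", "/p"),
    ("Disallow", "/search"),
    ("Allow", "/"),
    ("Allow", "/p/about-us.html"),
    ("Allow", "/search/label/blogspot-seo"),
]

print(is_allowed(rules, "/p/about-us.html"))            # True: the longer Allow rule wins
print(is_allowed(rules, "/p/contact.html"))             # False: only Disallow: /p matches it
print(is_allowed(rules, "/search/label/blogspot-seo"))  # True: the specific Allow wins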

Advanced filtering with the wildcard (*)

Disallow: *?showComment=*
Disallow: *?spref=fb
Disallow: *?spref=tw
Disallow: *?spref=gp
Disallow: *?spref=pi
Disallow: *?utm_source=*

With these wildcard (*) rules, you do not need to list every link one by one: any URL matching the pattern, whatever value appears in place of the asterisk, will be blocked.
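
To see which URLs such a pattern actually catches, here is a small Python sketch that translates a robots.txt wildcard pattern into a regular expression (a simplification: the real syntax also supports a $ end-of-URL anchor, not handled here, and patterns are matched against the path plus query string):

import re

def pattern_matches(pattern, path):
    # '*' matches any run of characters; everything else is literal.
    # The pattern is anchored at the start of the path, like a robots.txt rule.
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.match(regex, path) is not None

print(pattern_matches("*?showComment=*", "/2018/11/a-post.html?showComment=99"))  # True: blocked
print(pattern_matches("*?spref=fb", "/2018/11/a-post.html?spref=fb"))             # True: blocked
print(pattern_matches("*?spref=fb", "/2018/11/a-post.html"))                      # False: no query, allowed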