How to create and read robots.txt for Blog

    A robots.txt file is created to give content-blocking rules to the search engines that crawl and index your website's URLs; if you do not add one, a default robots.txt that allows crawling is assumed. Search engines and social networks such as Google, Bing, Yahoo, Twitter, Facebook, etc. use programs called bots (crawlers) that start from a page's main link, analyze every URL found on that page, and then consult the robots.txt file, meta tags, and the rel attribute on links to see which links are blocked and which they are allowed to visit. When a search engine analyzes your site, the robots.txt file is what tells its crawler which links may be crawled and indexed and which links are blocked.
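    To illustrate how a well-behaved crawler consults robots.txt before fetching a URL, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt URL below is the blog used in this article's examples, and the page URL is only illustrative; substitute your own.

    from urllib import robotparser

    # Point the parser at the site's robots.txt file and download it.
    rp = robotparser.RobotFileParser()
    rp.set_url("https://www.cuongbv.com/robots.txt")
    rp.read()

    # Before crawling a URL, a polite bot asks whether its user-agent may fetch it.
    url = "https://www.cuongbv.com/p/about-us.html"
    if rp.can_fetch("Googlebot", url):
        print("Googlebot may crawl", url)
    else:
        print("Googlebot is blocked from", url)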

    Each crawler has a specific name that you address in the robots.txt file, for example: Google - Googlebot, Bing - Bingbot, Yahoo - Slurp, Twitter - Twitterbot, Facebook - Facebot, and so on. You can also address all crawlers at once with the asterisk (*) wildcard.

    After each user-agent (or group of user-agents) you add rules of two kinds: Allow and Disallow.

    How to create a robots.txt file 

    The structure of a robots.txt file reads as follows:

    User-agent: the name of the search engine bot
    Disallow: links that are blocked
    Allow: links that are allowed
    Sitemap: <domain>/sitemap.xml

    An illustrated example:

    Suppose I want to allow the Google, Twitter, Facebook, and Google AdSense partner (Mediapartners-Google) bots to collect data as follows:

    User-agent: Googlebot
    User-agent: Twitterbot
    User-agent: Facebot
    Disallow: /p
    Disallow: /search
    Allow: /
    User-agent: Mediapartners-Google
    Allow: /
    Sitemap: https://www.cuongbv.com/sitemap.xml

    Reading this file, the Googlebot, Twitterbot, and Facebot crawlers understand that all static page links (/p) and search pages (/search) are blocked while every other link is allowed, and the Mediapartners-Google (AdSense) crawler may collect all links.
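    You can check how these rules are interpreted without waiting for a real crawler. The following sketch feeds the example above to Python's standard urllib.robotparser and asks which URLs each bot may fetch; the post URL /2018/11/a-normal-post.html is made up for illustration.

    from urllib import robotparser

    # The rules from the example above, one directive per line.
    rules = [
        "User-agent: Googlebot",
        "User-agent: Twitterbot",
        "User-agent: Facebot",
        "Disallow: /p",
        "Disallow: /search",
        "Allow: /",
        "User-agent: Mediapartners-Google",
        "Allow: /",
    ]

    rp = robotparser.RobotFileParser()
    rp.parse(rules)

    base = "https://www.cuongbv.com"
    for bot in ("Googlebot", "Mediapartners-Google"):
        for path in ("/p/about-us.html", "/search/label/seo", "/2018/11/a-normal-post.html"):
            # /p and /search are blocked for Googlebot; Mediapartners-Google may fetch everything.
            print(bot, path, rp.can_fetch(bot, base + path))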

    Adding exceptions: allowing links inside a blocked path, or blocking a link inside an allowed path

    Suppose that, on top of the two rules Disallow: /p and Disallow: /search, we add Allow rules to open up specific links contained in those blocked paths, and also block one specific link that would otherwise be covered by Allow: /. For example (a sketch of how a crawler resolves these overlapping rules follows the listing):

    User-agent: Googlebot
    User-agent: Twitterbot
    User-agent: Facebot
    Disallow: /p
    Disallow: /search
    Disallow: /2018/11/cach-chia-se-bai-viet-len-facebook-an-toan-va-tuong-tac-cao.html
    Allow: /
    Allow: /p/about-us.html
    Allow: /search/label/blogspot-seo
    User-agent: Mediapartners-Google
    Allow: /
    Sitemap: https://www.cuongbv.com/sitemap.xml
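    When rules overlap like this, Google-style crawlers apply the most specific rule: the longest matching rule wins, and Allow wins a tie. Python's built-in robotparser does not implement that precedence, so the following is a minimal pure-Python sketch of the idea. The rule list mirrors the example above; the paths /p/contact.html and /search/label/news are made up for comparison, and is_allowed is only an illustrative helper.

    # Longest (most specific) matching rule wins; Allow wins a tie.
    RULES = [
        ("disallow", "/p"),
        ("disallow", "/search"),
        ("disallow", "/2018/11/cach-chia-se-bai-viet-len-facebook-an-toan-va-tuong-tac-cao.html"),
        ("allow", "/"),
        ("allow", "/p/about-us.html"),
        ("allow", "/search/label/blogspot-seo"),
    ]

    def is_allowed(path):
        # Keep only the rules whose prefix matches this path.
        matches = [(kind, rule) for kind, rule in RULES if path.startswith(rule)]
        if not matches:
            return True  # no rule applies, so the path is allowed by default
        # Pick the longest matching rule; on a tie, prefer the allow rule.
        kind, _ = max(matches, key=lambda m: (len(m[1]), m[0] == "allow"))
        return kind == "allow"

    for path in ("/p/about-us.html", "/p/contact.html",
                 "/search/label/blogspot-seo", "/search/label/news"):
        print(path, "->", "allowed" if is_allowed(path) else "blocked")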

    Adding advanced filter rules with the (*) wildcard

    Disallow: *?showComment=*
    Disallow: *?spref=fb
    Disallow: *?spref=tw
    Disallow: *?spref=gp
    Disallow: *?spref=pi
    Disallow: *?utm_source=*

    With these (*) wildcard rules you do not need to list every link: any URL matching the pattern, whatever value appears in place of the asterisk (*), will be blocked.
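    Crawlers that support the wildcard treat the asterisk as "any sequence of characters". The following simplified sketch (not a real crawler implementation) turns a few of the patterns above into regular expressions to show which URLs they would block; the sample URLs are invented for illustration.

    import re

    PATTERNS = ["*?showComment=*", "*?spref=fb", "*?utm_source=*"]

    def wildcard_to_regex(rule):
        # Escape the rule, then turn each escaped "*" back into ".*".
        return re.compile(re.escape(rule).replace(r"\*", ".*"))

    blockers = [wildcard_to_regex(rule) for rule in PATTERNS]

    def is_blocked(url_path):
        return any(p.match(url_path) for p in blockers)

    print(is_blocked("/2018/11/some-post.html?showComment=1543210123456"))  # True (blocked)
    print(is_blocked("/2018/11/some-post.html?spref=fb"))                   # True (blocked)
    print(is_blocked("/2018/11/some-post.html"))                            # False (allowed)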
