Explained: Security.txt, Humans.txt, Ads.txt & Robots.txt
There are four text ("txt") files that you may or may not have in your website's root directory. They are often afterthoughts when developing a website. Only robots.txt is considered critical. All four files are advisory, which means bots and people may or may not follow their directives. There are security implications that we'll address below. This article is an overview of the most common informational txt files that you'll find in a website's root directory.
The four most common txt files: security.txt, humans.txt, ads.txt and robots.txt
The four most common txt files are (1) security.txt, (2) humans.txt, (3) ads.txt, and, most important, (4) robots.txt. Each serves a different purpose related to the organization and transparency of a website. They are plain text files placed in a website's root directory (security.txt is normally served from /.well-known/security.txt), and they provide information for specific audiences or use cases.
Security.txt File Explained
security.txt file: The security.txt file is a standard, published as RFC 9116, that helps organizations define the process for security researchers to report vulnerabilities. It offers a way for security researchers to quickly identify the appropriate contact information and security policies. Use cases for security.txt include:
a. Specifying contact information: Providing an email address, phone number, or contact form URL for researchers to report security vulnerabilities.
b. Defining disclosure policy: Describing the organization's policy on responsible disclosure of security vulnerabilities, including time frames and guidelines for researchers.
c. Acknowledgments: Offering a public recognition or reward program for researchers who report vulnerabilities, such as a "hall of fame" or bug bounty program.
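The use cases above map onto fields in the file itself. The following is a minimal sketch, not a definitive generator: it assumes the field names from RFC 9116 (Contact and Expires are required; Policy and Acknowledgments are optional), and the function name, the example.com URLs, and the 180-day validity window are all hypothetical choices for illustration.

```python
from datetime import datetime, timedelta, timezone


def build_security_txt(contact, policy_url, acknowledgments_url,
                       valid_days=180, now=None):
    """Return the text of a security.txt file (hypothetical helper).

    `now` should be a timezone-aware datetime; it defaults to the current
    UTC time. RFC 9116 recommends an Expires value less than a year away.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    expires = (now + timedelta(days=valid_days)).replace(microsecond=0)
    lines = [
        f"Contact: {contact}",                          # where to report
        f"Expires: {expires.strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"Policy: {policy_url}",                        # disclosure policy
        f"Acknowledgments: {acknowledgments_url}",      # hall of fame page
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All values below are placeholders, not real contact details.
    print(build_security_txt(
        "mailto:security@example.com",
        "https://example.com/security-policy",
        "https://example.com/hall-of-fame",
    ))
```

The output would be saved as /.well-known/security.txt and regenerated before the Expires date passes, since researchers are told to treat an expired file as stale.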
Downside of Security.txt File
The downside of a security.txt file is that it invites security researchers to contact you seeking payment for vulnerabilities. Some of those researchers can get pushy about payouts. Always state clearly in your responsible disclosure policy which vulnerabilities you will pay for and which you will not.
Humans.txt File Explained
humans.txt file: The humans.txt file is intended to provide credit and recognition to the people behind a website. It allows web developers, designers, and other contributors to identify themselves and their roles in the creation and maintenance of the site. Use cases for humans.txt include:
a. Attribution: Listing the names, roles, and contact information of individuals or teams involved in building and maintaining the website.
b. Technologies used: Providing information about the technologies, tools, or frameworks used in the development of the site.
c. Licensing: Specifying any licenses or copyright information related to the content or code used on the site.
At Fruition we typically do not publish a humans.txt file.
Ads.txt File Explained
ads.txt file: The ads.txt file (Authorized Digital Sellers) is an IAB Tech Lab initiative designed to improve transparency in the programmatic advertising ecosystem. It allows website publishers to declare the companies authorized to sell their digital inventory, thereby helping to prevent fraudulent activity. Use cases for ads.txt include:
a. Authorized sellers: Listing the advertising networks, exchanges, or platforms that have permission to sell the publisher's ad inventory.
b. Transparency: Providing a record of authorized sellers, which can be checked by advertisers and their agencies to confirm they are buying from legitimate sources.
c. Fraud prevention: Helping to reduce the risk of domain spoofing or unauthorized reselling of ad inventory by providing a clear list of authorized sellers.
You only need an ads.txt file if you plan on selling ads on your website.
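Each authorized seller is declared on one comma-separated line: the ad system's domain, the publisher's account ID on that system, the relationship (DIRECT or RESELLER), and an optional certification authority ID. The sketch below shows how an advertiser-side tool might read those lines. It is a simplified illustration, not a full implementation of the IAB Tech Lab specification, and the function name, the sample domains, and the account IDs are hypothetical.

```python
from typing import List, NamedTuple, Optional


class AdsTxtRecord(NamedTuple):
    ad_system_domain: str
    seller_account_id: str
    relationship: str               # "DIRECT" or "RESELLER"
    cert_authority_id: Optional[str]


def parse_ads_txt(text: str) -> List[AdsTxtRecord]:
    """Parse seller records from ads.txt content (simplified sketch)."""
    records = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue
        if "=" in line.split(",", 1)[0]:
            continue                               # variable line, e.g. contact=
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 3:
            continue                               # malformed record
        relationship = fields[2].upper()
        if relationship not in ("DIRECT", "RESELLER"):
            continue
        cert = fields[3] if len(fields) > 3 and fields[3] else None
        records.append(
            AdsTxtRecord(fields[0].lower(), fields[1], relationship, cert)
        )
    return records


# Placeholder content; the domains and IDs are made up for illustration.
SAMPLE_ADS_TXT = """\
# ads.txt for example.com
adexchange.example, pub-0000000000000000, DIRECT, abc123def456
reseller.example, 12345, RESELLER
contact=adops@example.com
"""

if __name__ == "__main__":
    for record in parse_ads_txt(SAMPLE_ADS_TXT):
        print(record)
```

A buyer can compare the seller ID in a bid request against these records and decline inventory from sellers that are not listed.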
Robots.txt File Explained
robots.txt file: The robots.txt file is the only one of the four that could be considered critical, because it can cause significant harm to your website's traffic if it is not set up properly. Googlebot, which crawls from IP ranges that Google publishes, checks the robots.txt file for directives on which content it may crawl. The robots.txt file is a simple text file that website owners use to provide instructions to web crawlers, also known as robots or bots, about how to crawl the site's content. Web crawlers, such as search engine bots, typically follow the rules defined in the robots.txt file so that they access content in a way that is aligned with the website owner's intentions.
What's wrong with this file?
The problem above is that it does not explicitly allow all. Instead it only explicitly allows one folder. If you want your site indexed change it to
Robots Exclusion Protocol
The robots exclusion protocol is important to know for excluding certain content from bots. Note that only well-behaved bots follow your robots.txt directives. If you do not want content found, you must use other methods, such as password protection, to keep malicious bots away from it.
Testing your robots.txt
To test your robots.txt file you can use the robots.txt report in Google Search Console, which replaced the older robots.txt Tester.
The robots.txt file is placed in the root directory of a website and is accessible through a URL like "https://fruition.net/robots.txt". The file uses a specific format to communicate the crawling rules, which consist of "User-agent" and "Disallow" or "Allow" directives.
- User-agent: This directive specifies the web crawler for which the following rules apply. It can be a specific bot (e.g., "Googlebot") or a wildcard "*" to target all bots.
- Disallow: This directive tells the web crawler not to crawl or access specific parts of the website. The directive is followed by the URL path that should be excluded from crawling.
- Allow (optional): This directive is used to grant permission for a web crawler to access certain parts of a website, even if a broader "Disallow" rule is in place. It helps to create exceptions for specific content.
Here's a simple example of a robots.txt file:
User-agent: *
Disallow: /private/
Disallow: /temp/
In this example, the robots.txt file is instructing all web crawlers (indicated by the wildcard "*") not to crawl or access the "private" and "temp" directories of the website.
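One way to check rules like these before publishing them is Python's standard-library urllib.robotparser module. The sketch below is illustrative only: the is_allowed helper, the example.com URLs, and the second rule set with an Allow exception are hypothetical. Note that urllib.robotparser applies the first rule that matches, so the Allow line is placed before the broader Disallow line; search engines may resolve conflicts differently.

```python
from urllib.robotparser import RobotFileParser

# The rules from the example in this article.
BASIC_RULES = """\
User-agent: *
Disallow: /private/
Disallow: /temp/
"""

# Hypothetical variant with an Allow exception inside a disallowed folder.
RULES_WITH_EXCEPTION = """\
User-agent: *
Allow: /private/press/
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    site = "https://example.com"  # placeholder domain
    for path in ("/blog/", "/private/report.html", "/temp/cache.html"):
        print(path, is_allowed(BASIC_RULES, "Googlebot", site + path))
    for path in ("/private/press/kit.pdf", "/private/report.html"):
        print(path, is_allowed(RULES_WITH_EXCEPTION, "Googlebot", site + path))
```

A check like this only tells you what a compliant crawler would do; it says nothing about bots that ignore the file.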
Caution about relying on robots.txt directives
Keep in mind that the robots.txt file states what you want bots to do; it does not force them to comply. Major search engines, like Google and Bing, adhere to the rules specified in the robots.txt file to ensure a better web experience for both website owners and users. OpenAI states that its crawlers, including GPTBot, also follow robots.txt directives.
Fruition's Website Help Desk Support
If you're interested in getting support for your .txt files or anything else for your website, please contact us and we can discuss support options, including our website help desk. We're happy to help you set up your various text files. You can reach us via our contact page.