Our agency handles organic search consulting for several SaaS vendors. A client we recently started working with had followed a fairly common practice: placing their application on a subdomain and moving their brochure site to the core domain. This setup lets the production team and the marketing team each make updates as needed without depending on the other.
As a first step in analyzing their organic search health, we registered both the brochure and application domains in Google Search Console (formerly Webmaster Tools). That's when we spotted an immediate issue: all of the application pages were blocked from the search engines. We navigated to the robots.txt entry in Search Console and instantly found the cause.
While preparing for the migration, their development team didn't want the application subdomain to be indexed by search, so they disallowed access to search engines. The robots.txt file is a file found in the root of your site – yourdomain.com/robots.txt – that lets search engines know whether or not they should crawl the site. Each host, including a subdomain, serves its own robots.txt file. You can write rules to allow or disallow crawling of the entire site or of specific paths. You can also add a line to specify your sitemap file.
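As a rough illustration of path-specific rules and a sitemap line, here is a minimal sketch using Python's standard urllib.robotparser module. The /admin/ path, the /pricing page, and the sitemap URL are hypothetical examples, not taken from the client's site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one directory, allow the rest,
# and point crawlers at a sitemap.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /admin/
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS.splitlines())

# A regular page is crawlable; anything under /admin/ is not.
print(parser.can_fetch("*", "https://yourdomain.com/pricing"))
print(parser.can_fetch("*", "https://yourdomain.com/admin/users"))

# Sitemap URLs declared in the file (site_maps() needs Python 3.8+).
print(parser.site_maps())
```

Python's parser applies the first rule that matches, which is why the narrower Disallow line is listed before the broad Allow line in this sketch. Individual search engines may resolve overlapping rules differently.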
The robots.txt file had the following entry, which prevented the site from being crawled and, in turn, from appearing in search results:
User-agent: *
Disallow: /
It should have been written as follows:
User-agent: *
Allow: /
The latter tells any search engine crawling the site that it may access any directory or file within it.
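To see the difference between the two entries before publishing a change, one option is to run both through Python's standard urllib.robotparser. This is only a sketch: the is_crawlable helper, the Googlebot user agent string, and the app.yourdomain.com URL are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two rule sets discussed above.
BLOCKING_RULES = "User-agent: *\nDisallow: /\n"
OPEN_RULES = "User-agent: *\nAllow: /\n"


def is_crawlable(robots_text: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical application URL used only for this check.
APP_URL = "https://app.yourdomain.com/login"

print(is_crawlable(BLOCKING_RULES, APP_URL))  # False: the whole site is disallowed
print(is_crawlable(OPEN_RULES, APP_URL))      # True: everything is allowed
```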
Great… so now the robots.txt file is perfect, but how does Google know, and when will it check the site again? Well, you can absolutely request that Google re-check your robots.txt, but the process isn't too intuitive.
In Google Search Console, navigate to Crawl > robots.txt Tester. You will see the contents of the most recently crawled robots.txt file within the Tester. If you'd like to resubmit your robots.txt file, click Submit and a popup will come up with a few options.
The final option is Ask Google to update. Click the blue Submit button next to that option and then navigate back to the Crawl > robots.txt Tester menu option to reload the page. You should now see the updated robots.txt file along with a date stamp that shows that it was crawled again.
If you don't see an updated version, you can click Submit and select View uploaded version to navigate to your actual robots.txt file. Many systems will cache this file. In fact, IIS can generate this file dynamically based on rules entered through its user interface. In that case, you'll most likely have to update the rules and refresh the cache to publish a new robots.txt file.
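If you suspect a cached copy is still being served, you can compare the live file against what you expect. The sketch below uses only Python's standard library; the function names are hypothetical, and the no-cache request headers are just a hint that a server-side cache may ignore.

```python
import urllib.request


def fetch_robots(url: str, timeout: float = 10.0) -> str:
    """Download a robots.txt file, asking intermediaries not to serve a cached copy."""
    request = urllib.request.Request(
        url,
        headers={"Cache-Control": "no-cache", "Pragma": "no-cache"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def normalize(text: str) -> list[str]:
    """Strip whitespace, blank lines, and comment-only lines for comparison."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def matches_expected(served: str, expected: str) -> bool:
    """Return True if the served rules equal the expected rules, ignoring layout."""
    return normalize(served) == normalize(expected)
```

For example, matches_expected(fetch_robots("https://yourdomain.com/robots.txt"), "User-agent: *\nAllow: /\n") would return False while the old Disallow rule is still being served.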