FastBots: Build A Custom WordPress XML Sitemap For Training Your AI Bot
Martech Zone has thousands of articles, with many of them outdated. I’ve worked on the site for several years to remove or update hundreds of articles, but I still have many more. At the same time, I’d like to train a natural language bot with my content, but the last thing I want to do is train it on outdated articles.
FastBots is a ChatGPT-powered bot builder that you can initially train using your sitemap (or other options). I needed a filtered sitemap that included all articles modified since a specific date. Additionally, I wanted to include my pages and acronyms (a custom post type). I didn’t want to include archive pages for categories and tags, or my home page, since it’s also an archive.
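The inclusion rule described above can be sketched in plain PHP, independent of WordPress. The function name `mtz_should_include`, the array keys, and the sample IDs are hypothetical and only illustrate the logic. The plugin later in this article relies on WordPress queries instead, and WordPress's own date boundary handling may differ from this simplified comparison.

```php
<?php
// Hypothetical helper: decide whether an item belongs in the bot sitemap.
// The $item keys ('id', 'type', 'modified') are assumptions made for this sketch.
function mtz_should_include(array $item, string $since, int $homePageId): bool {
    // Never include the home page, since it acts as an archive.
    if ($item['id'] === $homePageId) {
        return false;
    }
    switch ($item['type']) {
        case 'post':
            // Posts qualify only when modified after the cutoff date.
            return strtotime($item['modified']) > strtotime($since);
        case 'page':
        case 'acronym':
            // Pages and acronyms are included regardless of date.
            return true;
        default:
            // Category archives, tag archives, and anything else are skipped.
            return false;
    }
}

// Made-up sample data; ID 1 plays the role of the home page.
$items = [
    ['id' => 1, 'type' => 'page',    'modified' => '2023-02-14'],
    ['id' => 2, 'type' => 'post',    'modified' => '2019-06-01'],
    ['id' => 3, 'type' => 'post',    'modified' => '2023-02-14'],
    ['id' => 4, 'type' => 'acronym', 'modified' => '2012-01-01'],
    ['id' => 5, 'type' => 'page',    'modified' => '2016-08-30'],
];

$included = array_filter($items, function (array $item): bool {
    return mtz_should_include($item, '2020-01-01', 1);
});

echo implode(', ', array_column($included, 'id')), "\n"; // prints "3, 4, 5"
```

The home page and the post last modified in 2019 drop out, while the older page and acronym stay because the date cutoff applies to posts only.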
Using the code at the end of this article, I built a custom WordPress plugin that generates an XML sitemap and refreshes it each time I publish a post. FastBots doesn’t automatically retrain as I publish each article, but this is a great starting point for using the platform.
FastBots imports all the links in the sitemap to train the AI bot on.
All pages are now imported, and you can train your bot on the applicable data. You also have the opportunity to remove specific pages. FastBots also allowed me to customize my AI bot’s branding and even include a link to a relevant article in its responses. There’s also a lead request built into the platform.
The platform worked flawlessly… you can give my bot a test drive here:
Launch Martech Zone’s Bot, Marty
Build Your FastBots AI Bot
Custom XML Sitemap
Rather than add this functionality to my theme, I built a custom WordPress plugin to build the sitemap. Just add a directory in your plugins folder (wp-content/plugins), then a PHP file with the following code:
<?php
/*
Plugin Name: Bot Sitemap
Description: Dynamically generates an XML sitemap including posts modified since a specific date and updates it when a new article is added.
Version: 1.0
Author: Your Name
*/
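// Define the date since when to include modified posts (format: Y-m-d).
// Declared global explicitly because WordPress includes the plugin file
// inside a function during activation, where it would otherwise be local.
global $mtz_modified_since_date;
$mtz_modified_since_date = '2020-01-01';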

// Rebuild the sitemap whenever a post is published.
// Note: publish_post fires for the 'post' post type only.
add_action('publish_post', 'mtz_update_sitemap_on_publish');

// Callback that rebuilds the sitemap
function mtz_update_sitemap_on_publish($post_id) {
    // Check if the post is not an auto-draft
    if (get_post_status($post_id) !== 'auto-draft') {
        build_bot_sitemap();
    }
}

// Main function to build the sitemap
function build_bot_sitemap() {
    global $mtz_modified_since_date;

    // Fetch all posts modified after the cutoff date
    $args = array(
        'post_type'      => 'post',
        'date_query'     => array(
            'column' => 'post_modified',
            'after'  => $mtz_modified_since_date,
        ),
        'posts_per_page' => -1, // Retrieve all matching posts
    );
    $postsForSitemap = get_posts($args);

    // Fetch all 'acronym' custom post type posts
    $acronymPosts = get_posts(array(
        'post_type'      => 'acronym',
        'posts_per_page' => -1,
    ));

    // Fetch all pages and look up the home page ID so it can be excluded
    $pagesForSitemap = get_pages();
    $home_page_id = get_option('page_on_front');

    $sitemap = '<?xml version="1.0" encoding="UTF-8"?>';
    $sitemap .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';

    // Posts modified since the cutoff date.
    // esc_url() encodes ampersands so the permalink is safe inside XML.
    foreach ($postsForSitemap as $post) {
        setup_postdata($post);
        if ($post->ID != $home_page_id) {
            $sitemap .= '<url>'.
                '<loc>'. esc_url(get_permalink($post)) .'</loc>'.
                '<lastmod>'. get_the_modified_date('c', $post) .'</lastmod>'.
                '<changefreq>weekly</changefreq>'.
                '</url>';
        }
    }

    // Acronyms (custom post type)
    foreach ($acronymPosts as $post) {
        setup_postdata($post);
        if ($post->ID != $home_page_id) {
            $sitemap .= '<url>'.
                '<loc>'. esc_url(get_permalink($post)) .'</loc>'.
                '<lastmod>'. get_the_modified_date('c', $post) .'</lastmod>'.
                '<changefreq>weekly</changefreq>'.
                '</url>';
        }
    }

    // Pages, skipping the static home page
    foreach ($pagesForSitemap as $page) {
        setup_postdata($page);
        if ($page->ID != $home_page_id) {
            $sitemap .= '<url>'.
                '<loc>'. esc_url(get_permalink($page)) .'</loc>'.
                '<lastmod>'. get_the_modified_date('c', $page) .'</lastmod>'.
                '<changefreq>monthly</changefreq>'.
                '</url>';
        }
    }
    wp_reset_postdata();

    $sitemap .= '</urlset>';
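
    // get_home_path() is defined in an admin-only include, so load it when
    // the rebuild runs outside wp-admin (for example, publishing via the REST API)
    if (!function_exists('get_home_path')) {
        require_once ABSPATH . 'wp-admin/includes/file.php';
    }

    // Write the sitemap to the site root
    file_put_contents(get_home_path() . 'bot-sitemap.xml', $sitemap);
}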
// Activate the initial sitemap build on plugin activation
register_activation_hook(__FILE__, 'build_bot_sitemap');
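
Before pointing FastBots at the file, it can help to confirm that the generated sitemap is well-formed and lists the URLs you expect. The following is a minimal sketch in plain PHP using the DOM extension. The helper name `mtz_read_sitemap` and the example.com URLs are hypothetical and are not part of the plugin above.

```php
<?php
// Hypothetical helper: extract the <loc> and <lastmod> of every <url> entry.
function mtz_read_sitemap(string $xml): array {
    $doc = new DOMDocument();
    // Reject empty input up front; loadXML() returns false on malformed XML,
    // so warnings are suppressed and the failure is reported as an exception.
    if ($xml === '' || !@$doc->loadXML($xml)) {
        throw new InvalidArgumentException('Sitemap is not well-formed XML.');
    }

    $entries = [];
    foreach ($doc->getElementsByTagName('url') as $url) {
        $loc     = $url->getElementsByTagName('loc')->item(0);
        $lastmod = $url->getElementsByTagName('lastmod')->item(0);
        $entries[] = [
            'loc'     => $loc ? trim($loc->textContent) : '',
            'lastmod' => $lastmod ? trim($lastmod->textContent) : '',
        ];
    }
    return $entries;
}

// Sample input in the same shape the plugin writes; the URLs are placeholders.
$sample = '<?xml version="1.0" encoding="UTF-8"?>'
    . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    . '<url><loc>https://example.com/a/?x=1&#038;y=2</loc>'
    . '<lastmod>2023-05-01T10:00:00+00:00</lastmod>'
    . '<changefreq>weekly</changefreq></url>'
    . '<url><loc>https://example.com/about/</loc>'
    . '<lastmod>2021-11-20T08:30:00+00:00</lastmod>'
    . '<changefreq>monthly</changefreq></url>'
    . '</urlset>';

$entries = mtz_read_sitemap($sample);
foreach ($entries as $entry) {
    echo $entry['lastmod'], '  ', $entry['loc'], "\n";
}
```

To check a real file, read the bot-sitemap.xml that the plugin wrote with `file_get_contents()` and pass the result to the helper instead of the sample string.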