Pieter says: “My additional insight is to go much deeper into log file analysis.
I know quite a lot of SEOs are already doing log file analysis, but I don't think they're doing much more than scratching the surface – especially when I look around at agencies in my area, in Europe.
I think log file analysis is somewhat of a hidden gem: a gold mine that isn't tapped enough. I know most SEOs see log file analysis as a very technical tool, which, of course, it is. It is invaluable, especially for bigger websites: e-commerce sites and big international or multinational sites.
It's invaluable for seeing what's happening with your crawl budget: whether bots are hitting walls somewhere, or whether there are a lot of 404s that you cannot find in the traditional SEO tools. You must do that, especially for bigger websites.
But I think there's much more to be found in these log files. Just dig a bit deeper and learn the names of the bots and the user agents they send, and you can find out a lot of information, both for SEO purposes and, now, for generative engine optimization: finding out more about what's happening in ChatGPT, Perplexity, etc.
Your log files will also show you whether ChatGPT visited your website, and by seeing which of ChatGPT's bots it was (they have at least three that we know of), you will see what the purpose of its visit was.
You have GPTBot, which is the training bot that gathers information to add to the large language model and make it smarter. But there's also, for instance, the ChatGPT-User bot, and whenever you see a hit from that bot, you know that your site was being used as a source, because that bot handles real-time visits. Whenever a prompt is given for which your website is used as a source, and ChatGPT doesn't have enough information in its large language model to answer it, the ChatGPT-User bot will visit your website and gather more information and context in order to answer the question.
That's really invaluable information that you cannot gather anywhere else at the moment, because ChatGPT doesn't offer an analytics tool that you can use as an SEO or marketing expert. Seeing what these bots are doing is very interesting.
Then, again, there's the technical part: gather information on whether the bots run into trouble, whether they hit 404 pages, or whether they're hallucinating and just making up URLs (that also happens). That's very interesting too, because you can learn from it as well. And for your content strategy: which URLs on your website are being visited by which bot?
You can learn so much from it, and you can use it in your content strategy, your interlinking strategy, etc. There's really a lot you can learn from the log files.”
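If you want to see that for yourself, a short script is enough to get started. Below is a minimal sketch, assuming a standard combined-format access log at an example path; GPTBot, ChatGPT-User, and OAI-SearchBot are the user agent tokens OpenAI documents for its three crawlers, and the other entries are common crawler tokens you may want to watch.

```python
import re
from collections import Counter, defaultdict

# User-agent substrings that identify known crawlers; extend as needed.
BOTS = ["GPTBot", "ChatGPT-User", "OAI-SearchBot",
        "PerplexityBot", "ClaudeBot", "Googlebot"]

# Pulls the requested URL, status code, and user agent out of a
# combined-log-format line.
LINE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)

hits = Counter()                  # total hits per bot
not_found = defaultdict(Counter)  # bot -> 404'd URLs (possible hallucinations)

with open("access.log", encoding="utf-8", errors="replace") as f:  # example path
    for line in f:
        m = LINE.search(line)
        if not m:
            continue
        for bot in BOTS:
            if bot in m.group("ua"):
                hits[bot] += 1
                if m.group("status") == "404":
                    not_found[bot][m.group("url")] += 1
                break

print("Hits per bot:", dict(hits))
for bot, urls in not_found.items():
    print(f"\nTop 404s for {bot}:")
    for url, n in urls.most_common(10):
        print(f"  {n:>5}  {url}")
```

A high 404 count for ChatGPT-User against a URL that has never existed on your site is exactly the kind of hallucinated URL Pieter describes.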
Okay, so log file analysis is hyper-relevant in the age of AI. It's been relevant for years, giving additional context on what users do on your site and how you can use that to augment your SEO strategy.
But, how do you find your log files, how do you have conversations with IT departments, how do you ensure that the right information is being retained for the right length of time?
“From my experience, that's always a bit of a struggle, though a short one. As soon as it's set up, it's okay, but some IT departments have difficulty finding the right log files.
First of all, it's important that you have the right log files, because your server also produces other logs, such as error logs, that are not relevant for this SEO work. What you need are the access log files, which live on your website's server. That's why they're so interesting: they're gathered at server level.
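For context, each line in an access log records a single request: the client IP, a timestamp, the URL requested, the status code, and the user agent string that identifies who is asking. An illustrative line for a GPTBot visit in the common combined log format (the IP and URL here are made up) looks like this:

```
203.0.113.7 - - [12/May/2025:10:15:32 +0000] "GET /blog/seo-tips HTTP/1.1" 200 51234 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.0; +https://openai.com/gptbot"
```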
The person you need to talk to is whoever is in charge of your server. Most of the time, that's just your IT department, who will help guide the way. If it's a different person, then just talk to them. It always involves some back-and-forth emailing, because how you gather these log files differs depending on the kind of server you're using, so there isn't one easy email that I can always send to my clients' IT departments. But we always get there.
It's also a question of what you want to do with the log files. For me, if you're able to set up an API connection between your log files and a third-party tool, that's the easiest way. At our agency, we use JetOctopus, which is a very cool tool that sets up the API for you and then links it to Google Search Console data, Google Analytics data, etc.
That's the ideal situation, where you have an API, because then, whenever there's an issue appearing in your log files, you can immediately get an alert. That's cool.
What you can also ask for is a one-time export of, let's say, at least three weeks of data. That will be a very big file, but there are other tools you can use to analyse the export. For instance, Screaming Frog also offers, besides its well-known SEO Spider, the Screaming Frog Log File Analyser. You can import log file data into it, which is also very interesting. Then it's just a one-time export.
That can be a good first step for getting to know the log files and, if you're on the agency side, for convincing your clients. And depending on the type of website you're working on (a WordPress site, for instance), some plugins will just do the work for you.
We sometimes set up Dark Visitors, which is very easy to use. It's not only a WordPress plugin; you can use it in other content management systems as well. For WordPress, though, it has a plug-and-play setup. Just set it up, and you will see the bot visits coming in immediately.
You can't learn as much from it as you would from feeding everything into a tool like JetOctopus, but we have still learned quite a lot from it. We have it set up for our own website, for instance, and we can see which AI bots are coming in, what Googlebot is doing, and which bots are spamming our website unnecessarily. It's a very easy way to get to know log files.
To summarise this long answer, the setup really depends on what you intend to do with it. The ideal situation, for me, is to set up a connection from wherever your log files are stored. It gets very technical: you need some sort of ‘bucket’, as it's called, from which your third-party tool can collect them. That's the ideal situation, but you can also ask for an export or just set up a WordPress plugin, for instance.”
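The ‘bucket’ Pieter mentions is typically cloud object storage that your server ships logs into and your analysis tool reads from. As a minimal sketch of the shipping half, assuming AWS S3 and example names throughout (the log path, bucket name, and schedule are all placeholders to adapt):

```python
import datetime
import boto3  # pip install boto3; AWS credentials must already be configured

s3 = boto3.client("s3")

LOG_PATH = "/var/log/nginx/access.log"  # example: adjust to your server
BUCKET = "example-seo-logs"             # example bucket name

# Date-stamped key so each day's upload lands in its own object;
# run this daily (e.g. from cron) after the log has been rotated.
key = f"access-logs/{datetime.date.today():%Y/%m/%d}/access.log"
s3.upload_file(LOG_PATH, BUCKET, key)
print(f"Uploaded {LOG_PATH} to s3://{BUCKET}/{key}")
```

From there, a log analysis tool (or your own scripts) can read each day's file out of the bucket without ever touching the production server.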
You talked about AI bots visiting sites, and being able to view that in your log files.
What, specifically, are you looking for within your log files to analyse that, and how can you piece that together with how your website is performing on the various AI search engines?
“What we're mostly looking at, first thing, is which bots are visiting.
As I mentioned, if we take the use case of ChatGPT, it's very interesting to see, for GPTBot (the ‘LLM bot’), which pages it is using and which it isn't. This already clarifies quite a lot in terms of which information ChatGPT finds interesting and which information is simply not being found. A very interesting use case there is helping this bot find more information.
Obviously, if you're a publisher, the situation is different. Then you probably won't be so happy that GPTBot is visiting your website, because it's basically stealing (or borrowing) your information in order to get smarter. In that case, a good use case is simply using the robots.txt file to block this specific bot.
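For reference, that block is two lines in robots.txt. GPTBot is the user agent token OpenAI documents for its training crawler, and disallowing it does not affect the ChatGPT-User bot or other crawlers:

```
# Block OpenAI's training crawler only; other bots are unaffected
User-agent: GPTBot
Disallow: /
```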
Another thing we do, for instance, is look at the pages that the ChatGPT-User bot is visiting, because this shows that those pages are actually being used in ChatGPT. We treat that as a signal of relevant content and think about why the ChatGPT-User bot visits that particular page. Why is this page mentioned more than others? Is it, for instance, a long read? Is it a very short FAQ page?
Again, learning which type of content is being shown in ChatGPT and used as a source can help you out a great deal, because it gives you a lot of input for your content strategy.”
What about tying this together with your actual performance in the AI search engines?
“For me, learning about your performance in AI engines is like a puzzle where you have to gather the pieces yourself, because we don't have all the information in one tool – unlike Google Ads, for instance, where you see everything: your impressions, your clicks, and what people have done on your website. You don't have that for ChatGPT.
What we do have is a few pieces of the puzzle. We have the log files, which show your impressions – in part, not all of them, obviously: only when an LLM doesn't have enough information about your website for a specific prompt will you see in the logs that you're being used as a source. That's one piece of the puzzle.
You have brand trackers for AI tools, where you automatically push prompts into ChatGPT, etc., and can see whether you are mentioned, how you are mentioned, and perhaps which of your competitors do better or worse than you. That's another piece of the puzzle. Then you have your Google Analytics data, where you actually see the visitors coming in. Those are the three pieces of the puzzle that we link together.
You can do that in a Looker Studio report, where you can see, for instance, that certain pages get good impressions because they are mentioned a bit more often, or because we rank a bit higher there than our competitors on average, and so they eventually bring in more clicks.
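Outside Looker Studio, the same join can be sketched in a few lines of Python with pandas. Everything here is illustrative: the CSV file names and columns are assumptions about what exports from a log tool, a brand tracker, and Google Analytics might look like once reduced to one row per URL.

```python
import pandas as pd

# Hypothetical per-URL exports (file and column names are assumptions):
#   log_hits.csv      -> url, chatgpt_user_hits    (log file tool)
#   brand_tracker.csv -> url, mentions             (AI brand tracker)
#   analytics.csv     -> url, ai_referral_sessions (Google Analytics)
logs = pd.read_csv("log_hits.csv")
mentions = pd.read_csv("brand_tracker.csv")
ga = pd.read_csv("analytics.csv")

# Outer-join the three puzzle pieces on URL, so pages missing from one
# source still show up in the combined view.
report = (
    logs.merge(mentions, on="url", how="outer")
        .merge(ga, on="url", how="outer")
        .fillna(0)
        .sort_values("chatgpt_user_hits", ascending=False)
)
print(report.head(20))
```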
Learning more about the impressions is so valuable, because clicks are decreasing and fewer people are going to visit your website. Simply learning how you are being shown, and how your website is being used, in a black-box tool like ChatGPT or Perplexity already gives you a lot of information that you can use to steer your content strategy, and also to help these tools fix hallucinations.
We've had clients where these tools used the wrong version of the homepage, putting the wrong page behind the URL. The fix was simply to add a redirect from the hallucinated homepage URL to the right one, which got us more visibility and more clicks to the correct homepage.
As I mentioned, it's a very cool way of learning a lot, and the deeper you go into the details, the more you learn. Sometimes that means picking out a single visit from a bot and following it around to see what it does. It's very interesting and you can really learn a lot.
You can, for the first time, sort of communicate with these bots. For instance, you might see a lot of visits to a page on your website that doesn't exist. We had that happen: we have an ‘SEO’ page on our agency website and, instead of visiting the ‘SEO’ directory, the ChatGPT bot kept visiting a page for ‘Search Engine Optimization’, which was a 404 page.
It visited that page over 1,000 times in one week, so we put a redirect from it to the right ‘SEO’ page. By sort of communicating with the bot like that, we were found more often and got more visits. It also taught us that the full term ChatGPT was using was ‘Search Engine Optimization’, which was perhaps a keyword, or part of a prompt, that we needed to mention more, instead of only using the abbreviation ‘SEO’.
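A redirect like the one Pieter describes is a one-line server rule. As a minimal sketch, assuming an Apache server with mod_alias enabled (the paths are illustrative, not OMcollective's actual URLs):

```
# .htaccess: permanently redirect the hallucinated URL to the real page
Redirect 301 /search-engine-optimization /seo
```

On other servers the equivalent is just as short; the important part is using a 301 (permanent) redirect so crawlers learn the canonical location.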
There are a lot of cool use cases to discover.”
Pieter, what's the key takeaway from the tip you shared today?
“The key takeaway for me is, if you're not yet using log files, if you're not yet analysing them, make sure you start doing it.
Just start; it's not that complicated. It looks like a lot, like something only technical SEOs can do, but it's really not. Maybe you'll need some help setting it up the first time, but there are a lot of cool tools out there that make it easy, and plenty of YouTube tutorials to be found.
As soon as it's set up, you will be able to start digging – and it's kind of addictive, actually, when you start to get to know what's happening on your website, start learning from it, start playing with it, and start testing. That's really my key takeaway.
Sometimes whatever you test will do nothing, but sometimes you really get results out of it. That's what SEO is about, and I don't think you can do SEO without seeing data. This is really a way of gathering data that, so far, remains untapped for a lot of websites.”
Pieter Serraris is the SEO lead at OMcollective, and you can find him over at OMcollective.com.