In April 2019, Benjamin Burkholder (who is awesome, by the way) published a Medium article showing off a script he wrote that uses SERP features to infer a user’s search intent. The script pulls its data from SerpAPI.com and labels search queries in the following way:
- Informational — The person is looking for more information on a topic. This is indicated by whether an answer box or PAA (people also ask) boxes are present.
- Navigational — The person is searching for a specific website. This is indicated by whether a knowledge graph is present or if site links are present.
- Transactional — The person is aiming to purchase something. This is indicated by whether shopping ads are present.
- Commercial Investigation — The person is aiming to make a purchase soon but is still researching. This is indicated by whether paid ads are present, an answer box is present, PAAs are present, or if there are ads present at the bottom of the SERP.
This is one of the coolest ways to estimate search intent, because it uses Google’s understanding of search intent (as expressed by the SERP features shown for that search).
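To make the rules above concrete, here is a minimal Python sketch of that kind of labeler. It is not Burkholder’s actual script: the `SerpFeatures` flags and the `label_intents` function are hypothetical names, and because the rules as described overlap (an answer box counts toward two categories), this sketch simply returns every label that matches.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SerpFeatures:
    """Hypothetical flags for which features appeared on one SERP."""
    answer_box: bool = False
    people_also_ask: bool = False
    knowledge_graph: bool = False
    site_links: bool = False
    shopping_ads: bool = False
    paid_ads: bool = False
    bottom_ads: bool = False


def label_intents(f: SerpFeatures) -> List[str]:
    """Return every intent whose indicator features are present."""
    labels = []
    if f.answer_box or f.people_also_ask:
        labels.append("Informational")
    if f.knowledge_graph or f.site_links:
        labels.append("Navigational")
    if f.shopping_ads:
        labels.append("Transactional")
    if f.paid_ads or f.answer_box or f.people_also_ask or f.bottom_ads:
        labels.append("Commercial Investigation")
    return labels


# A SERP with shopping ads and paid ads matches two categories.
print(label_intents(SerpFeatures(shopping_ads=True, paid_ads=True)))
# → ['Transactional', 'Commercial Investigation']
```

How overlapping matches get resolved into a single label is a design choice this sketch leaves open.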
The one problem with Burkholder’s approach is its reliance on SerpAPI. To find intent for a large set of search queries, you have to pass each query through the API, which runs the search and returns the SERP features for Burkholder’s script to classify. On a large set of queries, this is time-consuming and can get prohibitively expensive.
SerpAPI charges ~$0.01 per keyword, so analyzing 5,000 keywords will cost you $50. Running Burkholder’s labeler script over those 5,000 keywords also takes 3 to 5 hours.
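As a rough back-of-the-envelope sketch, the figures above can be turned into a small estimator. The function names are mine, and the assumption that both cost and run time scale linearly with the number of keywords is just that, an assumption.

```python
from typing import Tuple

COST_PER_KEYWORD_USD = 0.01   # approximate price quoted above
REFERENCE_KEYWORDS = 5000     # batch size the 3-to-5-hour figure refers to
REFERENCE_HOURS = (3.0, 5.0)  # observed run time for that batch


def labeling_cost_usd(keywords: int) -> float:
    """Estimated API cost, assuming a flat per-keyword price."""
    return round(keywords * COST_PER_KEYWORD_USD, 2)


def labeling_hours(keywords: int) -> Tuple[float, float]:
    """Estimated (low, high) run time, assuming linear scaling."""
    scale = keywords / REFERENCE_KEYWORDS
    return (REFERENCE_HOURS[0] * scale, REFERENCE_HOURS[1] * scale)


print(labeling_cost_usd(5000))   # → 50.0
print(labeling_cost_usd(10000))  # → 100.0
print(labeling_hours(10000))     # → (6.0, 10.0)
```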
So I got to thinking: What if I adapted Burkholder’s approach so that, rather than use it to classify intent directly, I could use it to train a machine learning model that I would then use to classify intent? In other words, I’d incur one-time costs to produce my Burkholder-labeled training set, and, assuming it was accurate enough, I could then use that training set for all further classification, cost free.
With an accurate training set, anyone could label huge numbers of keywords super quickly, without spending a dime.
Finding a model
Hamlet Batista has written a few stellar posts about how to leverage natural language models like BERT for labeling intent.
In his posts, he uses an existing intent labeling model that returns categories from Kaggle’s Question Answering Dataset. While these labels can be useful, they are not really “intent categories” in the sense we usually mean in a search intent taxonomy. Instead, they are labels such as Description, Entity, Human, Numeric, and Location.
He achieved excellent results by training a BERT encoder, getting near 90% accuracy in predicting labels for new/unlabeled search keywords.
The big question for me was, could I leverage the same tech (Uber’s Ludwig BERT encoder) to create an accurate model using the search intent labels I’d get from Burkholder’s code?
It turns out the answer is yes!
How to do it
Here’s how the process works:
1. Gather your list of keywords. If you’re planning on training your own model, I recommend doing so within a specific category/niche. Training on clothing-related keywords and then using that model to label finance-related keywords will likely be significantly less accurate than using it to label other, unlabeled clothing-related keywords. That said, I did try using a model trained on one category/niche to label another, and the results still seemed quite good to me.
2. Run Burkholder’s script over your list of keywords from Step 1. This will require signing up for SerpAPI.com and buying credits. I recommend getting labels for at least 10,000 search queries with this script to use for training. The more training data, the more accurate your model will likely be.
3. Use the labeled data from the previous step as your training data for the BERT model. Batista’s code to do this is very straightforward, and this article will guide you through the process. I was able to get about 72% accuracy using about 10,000 labels of training data.
4. Use your model from Step 3 to label unlabeled search data, and then take a look at your results!
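Steps 3 and 4 might look roughly like the Python sketch below. This is not Batista’s code. The column names `keyword` and `intent`, the helper names, and the one-label-per-keyword layout are my assumptions, and the exact Ludwig config keys differ between releases (older versions take a plain `"encoder": "bert"` string), so check the docs for the version you install.

```python
import csv
import random
from typing import List, Tuple

Row = Tuple[str, str]  # (keyword, intent label); assumes one label per keyword


def split_rows(rows: List[Row], test_fraction: float = 0.2,
               seed: int = 42) -> Tuple[List[Row], List[Row]]:
    """Shuffle labeled rows and hold out a fraction for checking accuracy."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - int(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]


def write_csv(path: str, rows: List[Row]) -> None:
    """Write rows to a CSV file with the column names the config expects."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["keyword", "intent"])
        writer.writerows(rows)


def build_config() -> dict:
    """Ludwig-style config: text in, category out (keys vary by version)."""
    return {
        "input_features": [
            {"name": "keyword", "type": "text", "encoder": {"type": "bert"}}
        ],
        "output_features": [{"name": "intent", "type": "category"}],
    }


def train_and_label(train_csv: str, unlabeled_csv: str):
    """Train on labeled keywords, then predict intent for unlabeled ones."""
    from ludwig.api import LudwigModel  # third-party; imported only when used

    model = LudwigModel(build_config())
    model.train(dataset=train_csv)
    predictions, _ = model.predict(dataset=unlabeled_csv)
    return predictions


# Tiny demonstration of the split on placeholder rows.
demo_rows = [(f"keyword {i}", "Informational") for i in range(10)]
train_rows, test_rows = split_rows(demo_rows)
print(len(train_rows), len(test_rows))  # → 8 2
```

Holding out part of the labeled data is what lets you measure accuracy before trusting the model on unlabeled keywords.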
The results
I ran through this process using a huge list (13,000 keywords) of clothing/fashion-related search terms from SEMrush as my training data. My resulting model gets just about 80% accuracy.
It seems likely that training the model with more data will continue to improve its accuracy up to a point. If any of you attempt it and improve on 80% accuracy, I would love to hear about it. I think with 20,000+ labeled searches, we could see up to maybe 85-90% accuracy.
This means that when you ask this model to predict the intent of unlabeled search queries, about 8 times out of 10 it will give you the same label that Burkholder’s SerpAPI-driven, rules-based classifier would have returned. It can also do this for free, in large volumes, and incredibly fast.
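Accuracy in this sense is just an agreement rate between two sets of labels. A minimal sketch, with made-up labels and a hypothetical `agreement_rate` helper:

```python
from typing import Sequence


def agreement_rate(reference: Sequence[str], predicted: Sequence[str]) -> float:
    """Share of queries where the model's label matches the reference label."""
    if len(reference) != len(predicted):
        raise ValueError("label lists must be the same length")
    if not reference:
        raise ValueError("need at least one label to compare")
    matches = sum(r == p for r, p in zip(reference, predicted))
    return matches / len(reference)


# Made-up example: the model agrees with the rules-based labels on 8 of 10.
rules_based = ["Informational"] * 5 + ["Transactional"] * 5
model_based = ["Informational"] * 4 + ["Navigational"] + \
              ["Transactional"] * 4 + ["Informational"]
print(agreement_rate(rules_based, model_based))  # → 0.8
```

Keep in mind this measures agreement with the rules-based labels, not with what the searcher actually wanted.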
So something that would have taken a few thousand dollars and days of scraping can now be done for free in just minutes.
To test the model, I labeled keywords from a related domain (makeup) rather than clothing keywords, and overall I think it did a pretty good job. Labeling 5,000 search queries took under two minutes with the BERT model. Here’s what my results looked like:
The implications
For SEO tools to be useful, they need to be scalable. Keyword research, content strategy, PPC strategy, and SEO strategy usually rely on being able to do analysis across entire niches/themes/topics/websites.
In many industries, the long tail of keywords can extend into the millions. So a faster, more affordable alternative to Burkholder’s solution can make a lot of difference.
I foresee AI and machine learning tools being used more and more in our industry, giving SEOs, paid search specialists, and content marketers superpowers that weren’t possible before these new AI breakthroughs.