Dan Peer-Reviews Some Research: Top Keywords by Volume


Dan peer reviews a study of top search terms by volume, then reruns the analysis on the top 100,000 keywords from AHREFs to see what marketers need to be thinking about.


Dan here! In my opinion, SEO is a research-based field. Garbage In, Garbage Out (GIGO) is one of my favorite ideas; I'll link to it rather than explain it, but I still want you to read it! Poor data leads to bad research, bad research leads to bad methods, and bad methods lead to bad results, so I believe it's critical to do intellectually honest, valid research.

I wish our industry were more open to peer review. For anyone interested in peer reviewing other studies, this one took me about 60 minutes in total.

Today, I'm going to peer review this study, shared on Twitter by Ryan Jones of Sapient Nitro, and present some counter-evidence along with what I think is better research.


Here's the research I'm going to look at. I'll be blunt: this is flawed research. Here's why:

  1. It didn't do any data processing at all (e.g. removing stop words and other common words). That means the results are dominated by the most common parts of speech rather than by insights about which keywords people actually choose. The authors later said that surfacing stop words was the goal, but I can't see why that would ever be the case, and I don't think it's a valid justification without further work on their part. Stop words are useless for classifying themes (including intent, which is itself a theme classification). For what it's worth, at LSG we use the NLTK library to pre-process our data, and one of its simplest uses is removing stop words and other frequent terms. None of the insights are useful until the data has been properly processed and cleaned. GIGO, as the saying goes.
  2. Data collection. BrightEdge's data set isn't particularly good, and the company isn't very transparent about how it collects it. If you're going to study a keyword set that's meant to be representative (150k keywords is a drop in the bucket compared to Google's entire search corpus), make sure it's as accurate as possible. If BrightEdge's keyword corpus is less representative than, say, AHREFs', the insights can't be trusted. GIGO once again.
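The stop-word step from the first point can be sketched in a few lines of Python. This is a minimal illustration, not LSG's actual pipeline: `STOP_WORDS` is a hypothetical, abbreviated list so the example runs without downloads, and `tokenize`, `remove_stop_words`, and `term_counts` are hypothetical helper names. In practice you would swap in a full list such as NLTK's English stop words, as the comment notes.

```python
import re
from collections import Counter

# Hypothetical, abbreviated stop-word list for illustration only. In practice
# you would load a full list, for example NLTK's:
#   import nltk; nltk.download("stopwords")
#   from nltk.corpus import stopwords
#   STOP_WORDS = set(stopwords.words("english"))
STOP_WORDS = {"a", "an", "and", "for", "how", "in", "is", "of", "on", "the", "to", "what"}


def tokenize(keyword: str) -> list[str]:
    """Lowercase a keyword and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", keyword.lower())


def remove_stop_words(keywords: list[str]) -> list[list[str]]:
    """Return each keyword's tokens with stop words dropped."""
    return [[t for t in tokenize(k) if t not in STOP_WORDS] for k in keywords]


def term_counts(keywords: list[str]) -> Counter:
    """Count how often each non-stop-word term appears across all keywords."""
    counts = Counter()
    for tokens in remove_stop_words(keywords):
        counts.update(tokens)
    return counts


keywords = ["pizza near me", "how to tie a tie", "weather for tomorrow"]
print(remove_stop_words(keywords))
# [['pizza', 'near', 'me'], ['tie', 'tie'], ['weather', 'tomorrow']]
```

Without this step, words like "for" and "to" would top any frequency count simply because they appear in so many queries.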

Thankfully, we at LSG process large volumes of data and know how to strip out stop words and other common filler. I was also able to get what I believe is a better keyword set for the study. And, as you'll see when I walk you through it and show you the results, the output is simply a lot more useful.

The Investigation

After reading this tweet from AHREFs' CMO and being intrigued, I got the top 100k keywords by volume from AHREFs, courtesy of the excellent Patrick Stox:

Quick SEO Suggestion:

An empty search in the @ahrefs Keywords Explorer gives you access to our whole 4 billion US keyword database (industry’s largest, by the way)!

Then use the S. volume, KD, and Word count filters to locate queries that are “high-vol, low-comp.”

It's ideal for stumbling onto fresh chances!

August 4, 2021 — Tim Soulo (@timsoulo)
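Keywords Explorer applies these filters in its UI. If you are working from an exported keyword list instead, a minimal Python sketch of the same "high-vol, low-comp" idea might look like this. Everything here is hypothetical: the `KeywordRow` fields, the threshold defaults, and the sample rows (made-up numbers, not real AHREFs data).

```python
from dataclasses import dataclass


@dataclass
class KeywordRow:
    # Hypothetical fields mirroring an exported keyword report.
    keyword: str
    volume: int  # monthly search volume
    kd: int      # keyword difficulty score (0-100)


def high_vol_low_comp(rows, min_volume=10_000, max_kd=10, max_words=4):
    """Keep rows with high volume, low difficulty, and a short word count.

    The default thresholds are illustrative assumptions, not recommendations.
    """
    return [
        r for r in rows
        if r.volume >= min_volume
        and r.kd <= max_kd
        and len(r.keyword.split()) <= max_words
    ]


# Made-up sample rows for illustration only.
rows = [
    KeywordRow("pizza near me", 1_500_000, 35),
    KeywordRow("stonks meaning", 12_000, 4),
    KeywordRow("how to make cold brew coffee at home fast", 20_000, 3),
]
print([r.keyword for r in high_vol_low_comp(rows)])
# ['stonks meaning']
```

The word-count limit mirrors the tweet's "Word count" filter; loosening `max_kd` trades competitiveness for volume.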

The Methodology:

I took the top 100,000 keywords by search volume and extracted their n-grams as follows:

Ngram script being run in Jarvis Slack Bot
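The script in the screenshot isn't reproduced here, so the following is only a minimal Python sketch of what n-gram counting over a keyword list generally looks like; `ngrams` and `count_ngrams` are hypothetical helpers, not the Jarvis bot's code. It counts how often each n-gram occurs across keywords and does not weight by search volume.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(keywords, n):
    """Count n-grams across a list of keywords (one count per occurrence)."""
    counts = Counter()
    for kw in keywords:
        counts.update(ngrams(kw.lower().split(), n))
    return counts


keywords = ["pizza near me", "gas station near me", "pizza hut"]
print(count_ngrams(keywords, 2).most_common(1))
# [('near me', 2)]
```

Running it with `n=1`, `2`, and `3` gives unigram, bigram, and trigram tables; stop words would normally be filtered out before counting unigrams.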

Then I analyzed the output, which looks like this:

ngram output

Then I ran the results through wordart.com's word cloud generator. It's my favorite word cloud generator because it processes data quickly. You can remove common words, use stemming to merge close variants, and experiment with the visual design. I'd give it ten out of ten.

And for anyone who wants to argue about 100,000 vs. 150,000 keywords, this table should show that the difference isn't significant; both samples are a drop in the bucket, and it hardly matters whose drop is bigger:

some math on sample sizes
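The table itself isn't reproduced here. As a rough, hypothetical illustration of the same point, assuming the 4 billion US keyword figure from the tweet above as the denominator, a quick calculation shows how small a share of that database either sample covers. `CORPUS_SIZE` and `coverage_pct` are names introduced for this sketch only.

```python
# Assumption: the 4 billion US keyword database size quoted in the tweet is
# used as the denominator. Google's full query corpus is larger still.
CORPUS_SIZE = 4_000_000_000


def coverage_pct(sample_size, corpus_size=CORPUS_SIZE):
    """Percentage of the corpus that a keyword sample represents."""
    return 100 * sample_size / corpus_size


for n in (100_000, 150_000):
    print(f"{n:>7,} keywords -> {coverage_pct(n):.5f}% of the corpus")
# 100,000 keywords -> 0.00250% of the corpus
# 150,000 keywords -> 0.00375% of the corpus
```

The gap between the two samples works out to about 0.00125 percentage points of the database.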

The End Result

Once common terms like "for" are removed from the data, there is meaningful information to be gained. Take a look!

word cloud of 100k top keywords

Spoiler alert: when you analyze data properly, you can uncover some valuable information! The first term that jumps out is "near."

I've long maintained that all search is local search, and AJ Kohn has been saying the same thing for years. That's because it's true. Localization of search results is the most important development SEOs are overlooking, largely because local search has traditionally been seen as a niche practice for small businesses. I suppose their loss is our gain.

Another interesting term is "versus." Comparison queries are very common, and if they make sense for your content, you should target them. The people who are winning in search are already doing so!
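To put a number on modifiers like "near" and "versus" in your own keyword set, you could measure what share of keywords contains each one. This is a minimal sketch under stated assumptions: the `MODIFIERS` groups are hypothetical, "vs" is included as an assumed abbreviation of "versus", and the share counts keywords rather than weighting them by search volume.

```python
import re

# Hypothetical modifier groups; "vs" is assumed to be a common
# abbreviation of "versus" and is counted with it.
MODIFIERS = {
    "local": {"near"},
    "comparison": {"versus", "vs"},
}


def modifier_share(keywords):
    """Fraction of keywords containing at least one token from each modifier group."""
    token_sets = [set(re.findall(r"[a-z0-9]+", k.lower())) for k in keywords]
    shares = {}
    for group, terms in MODIFIERS.items():
        hits = sum(1 for tokens in token_sets if tokens & terms)
        shares[group] = hits / len(keywords) if keywords else 0.0
    return shares


sample = ["pizza near me", "iphone vs android", "coke versus pepsi", "weather"]
print(modifier_share(sample))
# {'local': 0.25, 'comparison': 0.5}
```

Weighting each hit by the keyword's search volume instead of counting it once would show how much demand, rather than how many queries, carries each modifier.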

There are a few more takeaways that I'd describe as basic but useful confirmation: people like free stuff and stonks, and navigational searches are very common.

Anyway, for anyone who wants to look at the n-gram data from the study themselves, here it is. Feel free to publish further research with it, as long as you link back to this post. I'm not going to publish the top 100k AHREFs data, since you already know where to get it.

