Quote:
Originally Posted by Marshal
Based on my experience with so-called "big data", I believe that the "Google dance" is just part of the regular sorting algorithm for huge amounts of information (really huge). So it takes weeks until the results settle down. Google usually re-sorts all the results only every few months or so (with every new update). It follows that the minor changes after the initial "dance" are just new sites being added to the list and put in place. But I wouldn't exclude the possibility that dancing is part of CTR testing to see which result attracts the most clicks.
(If you are interested in knowing more, don't be discouraged by the amount of information below.)
Let me back that up with some more information. The task is to sort ALL websites ever indexed for each and every keyword out there (the cut-off is at result #100). To get an idea of how big that operation is, consider the following input:
1. Number of keywords:
To get a better idea of the amount of data involved, let's imagine Google works only with the English language. Based on the Oxford Dictionary, there are 171,146 words currently in use in English. Then you need to figure out the number of potential combinations of all those words: a keyword can be 2, 3, 4, or even more words long. Then add (at least semantically) meaningful questions to that list. The number of possible combinations runs into the millions.
2. Number of websites:
There are 1.88 billion websites today (and growing), based on Statista data. Each of those websites can have thousands or even millions of pages.
Imagine you have to sort all web pages (not only websites, but every single page of theirs!) for each and every potential keyword out there. Those familiar with combinatorics already see that the resulting number is insanely big. Even with all the huge computational power Google has, it is impossible to compute all those results in real time. It usually takes weeks to get the first result pages populated with meaningful results. It can take months to get results starting from page 2.
Multiply that by all the different languages. Then add major updates every few months (at least twice a year) to the equation, and you will see that it is impossible NOT to have "dancing" in place, since it is simply impossible to produce near-real-time results.
Google is probably not crunching all that data all over again; they probably use some caching instead, which saves them time. But every now and then they have to regenerate the complete database. Something like that probably happened last year, when their database was "frozen" from May until late September because of a big "bug". There's a high possibility that they either implemented some big (unplanned) change in their algorithms or had a major bug, so they most likely had to regenerate the complete database of results.
So, to put it simply: I would say that "dancing" is just a regular part of getting new results generated all over again.
The only problem with your analysis is that there is a lot of conjecture.
In my other life, I work in the SEO & digital marketing space and specialize in technical SEO. I am also an affiliate marketer. So let me give you a little perspective.
Your first point on keywords is bare, and there's a reason for it.
The Google system doesn't need to "figure out" anything. It already has the largest crowd-sourced "figure outers" in the world - namely its searchers. They are the ones who search for these word combinations.
Now, G is a meticulous tracker. It tracks everything. And thanks to its other "free" products, viz. Google Analytics and Chrome, it can keep on tracking.
So among the thousands of metrics it tracks, let's analyze the base metrics first (a toy sketch of how they might be combined follows the list).
1. Volume - The number of times a particular keyword was searched.
2. CTR (click-through rate) - Which links are clicked more often.
3. Speed - Which site served the content the fastest, using the least amount of data.
4. Conversion - Which keywords are paying keywords and result in action on the end website (forms filled, add-to-carts, purchases).
5. Time Spent on Site, Pages Visited.
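To make those base metrics concrete, here is a minimal sketch of how they might be stored and combined per keyword/URL pair. The field names, weights, and scoring formula are my own invention for illustration, not Google's actual schema:

```python
from dataclasses import dataclass

@dataclass
class KeywordPageMetrics:
    """Hypothetical per-(keyword, URL) signals; field names are illustrative only."""
    keyword: str
    url: str
    search_volume: int         # how often the keyword is searched
    ctr: float                 # share of impressions that become clicks (0..1)
    load_time_ms: float        # how fast the page served the content
    conversion_rate: float     # forms filled, add-to-carts, purchases (0..1)
    avg_time_on_site_s: float  # engagement after the click
    pages_per_visit: float

    def quality_score(self) -> float:
        """Toy weighted combination -- the real ranking function is proprietary."""
        speed_bonus = 1.0 / (1.0 + self.load_time_ms / 1000.0)
        engagement = min(self.avg_time_on_site_s / 180.0, 1.0) * min(self.pages_per_visit / 5.0, 1.0)
        return 0.4 * self.ctr + 0.3 * self.conversion_rate + 0.2 * engagement + 0.1 * speed_bonus


example = KeywordPageMetrics(
    keyword="mesothelioma lawyers", url="https://example.com/lawyers",
    search_volume=12000, ctr=0.18, load_time_ms=850.0,
    conversion_rate=0.03, avg_time_on_site_s=95.0, pages_per_visit=2.4,
)
print(round(example.quality_score(), 3))
```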
And finally, another signal it tracks for each keyword is "intent".
The most common intents are:
- Informational
These are people looking for information on a subject.
Example - "Symptoms of ED", "When to travel to Amsterdam?", "Best Porn Sites".
- Navigational
These are people looking for a specific website or page, such as login or sign-up pages.
Example - "Sign up for ABN Credit Card", "US Visa Application Form", etc.
- Transactional
For affiliate marketers, e-commerce entities and SaaS companies - this is where the magic happens. These are people looking to buy stuff.
Example - "Cheapest Tube Script"
Beyond the above three main intents, there are more -
Specific Page Ones -
"German Wife Ravaged on Christmas Eve"
This is a tube site keyword, for a specific video someone likes and wishes to access again.
This could come under Navigational, but for the sake of clarity, let's keep it as an outlier (in the toy classifier sketch below, it simply falls through to "other").
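Nobody outside Google knows how it actually classifies intent, but a crude, rule-based sketch shows the general idea. The trigger-word lists below are my own assumptions, purely for illustration:

```python
# Crude keyword-trigger intent classifier -- purely illustrative,
# not how Google actually does it.
TRANSACTIONAL = {"buy", "cheapest", "price", "discount", "coupon", "order"}
NAVIGATIONAL = {"login", "log in", "sign up", "signup", "application form", "official site"}
INFORMATIONAL = {"what", "when", "how", "why", "symptoms", "best", "guide"}

def classify_intent(query: str) -> str:
    q = query.lower()
    if any(t in q for t in TRANSACTIONAL):
        return "transactional"
    if any(t in q for t in NAVIGATIONAL):
        return "navigational"
    if any(t in q for t in INFORMATIONAL):
        return "informational"
    return "other"  # e.g. the specific-page outlier queries mentioned above

for q in ["Symptoms of ED", "Sign up for ABN Credit Card", "Cheapest Tube Script",
          "German Wife Ravaged on Christmas Eve"]:
    print(q, "->", classify_intent(q))
```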
Google also harnesses NLP (Natural Language Processing) -
This is what tells it the difference between the use of the word "suspect" in
"The police found that the suspect had two penises" and "I suspect I may have two penises".
--------
Beyond this, Google has a product called "Adwords". The above metrics help Google decide the base price for a keyword (which is then inflated through a bidding war).
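For context on that bidding war: Google's search ad auction has been publicly described as a generalized second-price auction, where what you pay is driven by the bid below yours (plus quality scores, which I'm ignoring here). A simplified sketch with made-up advertisers and bids:

```python
# Simplified generalized second-price (GSP) auction for one keyword.
# Real Google Ads pricing also factors in quality scores and other signals;
# this only illustrates the "inflated through a bidding war" part.
def run_gsp_auction(bids: dict[str, float], slots: int = 3) -> list[tuple[str, float]]:
    """Return (advertiser, price_paid) for each slot, highest bidder first.

    Each winner pays just above the next-highest bid rather than their own bid.
    """
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    results = []
    for i, (advertiser, _bid) in enumerate(ranked[:slots]):
        if i + 1 < len(ranked):
            price = ranked[i + 1][1] + 0.01  # a cent above the next bid
        else:
            price = 0.01  # reserve price when nobody bids below you
        results.append((advertiser, round(price, 2)))
    return results

bids = {"lawfirm-a.example": 45.00, "lawfirm-b.example": 38.50, "lawfirm-c.example": 12.75}
print(run_gsp_auction(bids))
```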
Now, Google's tracking metrics go far beyond this. But for the sake of brevity, let's stop here.
Based on this, Google's paramount goal is to stay the "most relevant search engine".
A - It doesn't want to be gamed.
B - It doesn't want to serve low-quality pages. Because if people start finding crap on Google, they'll move to other search engines.
C - Because of its ulterior motive of taking over the world.
Since you mentioned "Big Data", I'd like to point out that in data analytics, things happen through "priority buckets" - a grouping mechanism of sorts.
What does that mean?
Google assimilates a set of keywords based on the above metrics and more, and then puts them into certain buckets.
This allows it to run controlled experiments on these particular buckets.
It also helps it see the upheaval going on, place a cost on ranking, estimate traffic, etc., for these buckets.
One keyword can be in several buckets (a minimal sketch of this bucketing follows the example below).
So say -
Bucket A - High Traffic Keywords
Mesothelioma Lawyers
Viagra Cialis
Online Pharmacy No Prescription
Bucket B - Shady Websites
Viagra Cialis
Online Pharmacy No Prescription
Buy Fullz
(This is just an illustration. The real buckets contain thousands, if not millions, of keywords.)
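A minimal sketch of that kind of bucketing, using the example buckets above. The thresholds and risk labels are invented for illustration; the point is simply that one keyword can land in several buckets at once:

```python
from collections import defaultdict

# Hypothetical (keyword -> metrics) input; in reality there are millions of rows.
keywords = {
    "mesothelioma lawyers":            {"volume": 90_000,  "risk": "low"},
    "viagra cialis":                   {"volume": 250_000, "risk": "high"},
    "online pharmacy no prescription": {"volume": 120_000, "risk": "high"},
    "buy fullz":                       {"volume": 4_000,   "risk": "high"},
}

buckets = defaultdict(set)
for kw, m in keywords.items():
    if m["volume"] >= 50_000:
        buckets["high_traffic"].add(kw)    # Bucket A in the example above
    if m["risk"] == "high":
        buckets["shady_websites"].add(kw)  # Bucket B in the example above

for name, members in buckets.items():
    print(name, sorted(members))
# "viagra cialis" and "online pharmacy no prescription" land in both buckets,
# so experiments on either bucket will touch those keywords.
```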
Now, based on these, Google uses its proprietary algorithm, and sometimes human intervention, to rank websites. It sees what's going on, then readjusts the rankings, and keeps doing so until the rankings stick with the winners.
It is the most complicated A/B Split testing ever.
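To illustrate the "keeps readjusting until the rankings stick" idea, here's a toy re-ranking loop for one keyword. The simulated CTRs, batch sizes, and stop rule are all assumptions on my part; the real process is vastly more elaborate:

```python
import random

random.seed(7)

# Hypothetical "true" attractiveness of each result for one keyword in one bucket.
true_ctr = {"site-a.example": 0.22, "site-b.example": 0.31, "site-c.example": 0.18}

ranking = list(true_ctr)  # start with an arbitrary order
observed_clicks = {url: 0 for url in true_ctr}
impressions = {url: 0 for url in true_ctr}

for round_no in range(1, 21):
    # Serve the current ranking to a batch of searchers and record clicks.
    for url in ranking:
        for _ in range(500):
            impressions[url] += 1
            if random.random() < true_ctr[url]:
                observed_clicks[url] += 1

    # Re-rank by observed CTR -- the crude equivalent of letting the winners stick.
    new_ranking = sorted(ranking, key=lambda u: observed_clicks[u] / impressions[u], reverse=True)
    if new_ranking == ranking:
        print(f"Rankings stuck after round {round_no}: {ranking}")
        break
    ranking = new_ranking
else:
    print(f"Still reshuffling after 20 rounds: {ranking}")
```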
Also, buckets are not "niche specific", or at the very least, they are only "partly niche specific".
So for example, if you rank for "Lawyers in Baltimore" - you may see a change in rankings during an update.
But if you rank for "Property Dispute Baltimore Lawyer" - you may continue to rank.
-----
When you see a "Google volatility tracker", its data is partially biased.
Example -
https://cognitiveseo.com/signals/ or
https://www.semrush.com/sensor/
Because these trackers cost a lot per month, the keywords being tracked here are mostly money keywords tracked by agencies.
-----
And finally, to touch on your mention of "real time": there is a certain real-time element to Google results, meaning that in some niches Google indexes and ranks content within minutes of it being posted (not talking about Google News).
Google has a LOT OF PROCESSING POWER. So much so that they are selling it through Google Cloud.
But think of Google as several small search engines broken down into buckets, not one big one. Every bucket behaves differently, based on its internal risk markers.
The bucket you hit depends on your keyword search.