{"id":39716,"date":"2016-03-02T15:00:06","date_gmt":"2016-03-02T23:00:06","guid":{"rendered":"http:\/\/www.bruceclay.com\/blog\/?p=39716"},"modified":"2019-08-08T11:06:43","modified_gmt":"2019-08-08T18:06:43","slug":"rankbrain-google-machine-learning-system-smx","status":"publish","type":"post","link":"https:\/\/www.bruceclay.com\/blog\/rankbrain-google-machine-learning-system-smx\/","title":{"rendered":"RankBrain: What Do We Know About Google’s Machine-Learning System? #SMX"},"content":{"rendered":"
SEO is very tactical, and we always try to look behind the curtain of Google’s algorithms. So it’s no surprise that we all want to know more about RankBrain, the machine-learning system Google confirmed out of the blue in October 2015.

In this session, we’ll learn about RankBrain via the studies done by our presenters.

<strong>Moderator:</strong> Danny Sullivan, Founding Editor, Search Engine Land (@dannysullivan)

<strong>Speakers:</strong> Marcus Tober, Searchmetrics, and Eric Enge, Stone Temple Consulting

Danny Sullivan says that a few months ago, Google said, “Oh hey, by the way, we have this new thing called RankBrain and it’s the third most important signal that factors into ranking.” SEOs asked, “Can you tell us about that?” Google said no. We still want to understand it.

<h2>Marcus Tober: Machine Learning Ranks Relevance</h2>

Tober’s company Searchmetrics studied Google results to understand RankBrain. He had two full-time people work for three weeks to collect the data he is presenting here today.

Before we talk about details and key findings, it’s important to look at machine learning (ML) and artificial intelligence (AI) first, so the team dug through patents and papers.

<strong>ML is not AI.</strong> Machine learning is an algorithm that improves over time. Deep learning aims to bridge the gap between ML and AI, and it solves more complex problems. Human-like intelligence (AI) is the end game. Tober walked through examples of ML in use, the limits of machine learning shown in the recent news that Google’s machine-learning system beat a human at the game of Go, and deep learning with AlphaGo.

One project sticks out in their research of Google patents, and that’s the work of Geoffrey Hinton.

Here’s Hinton’s research in a simplified nutshell: thought vectors. Imagine an empty space. Every word in the world has a position in it and a proximity to other words. Google can map search queries into that space, so you get the proximity and distance between different queries.

When a query such as “What’s the weather going to be like in California?” is searched, RankBrain can interpret it as “weather forecast California.” Using training data, similar query sentences (with similar results) are positioned close together. Searches for “credit card” and “debit card” will be close to each other in that space. Results scoring means that results rank better based on proximity: good results, with higher relevance, have higher proximity to each other.

<h3>The Searchmetrics Study</h3>

Here was the Searchmetrics hypothesis: traditional ranking factors alone can no longer explain typical organic rankings. This isn’t for all queries; Google has said RankBrain is not used on every query, but when it is used, it is the third most important ranking signal.

<strong>Brief study background:</strong> In order to do this analysis, they had to set aside all previous understanding of ranking factors, including backlinks and internal links, keyword in the title, word count and interactive elements used. They cleared the slate of their presupposed ranking factors in order to find the new ranking signals.

<strong>Backlinks:</strong> For ecommerce keywords, there is a positive correlation between rankings and backlinks. However, for health and loan queries, rankings and backlinks have a negative correlation.
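To make that kind of comparison concrete, here is a minimal sketch of a category-by-category check, assuming invented backlink counts and SciPy’s Spearman correlation; it is not Searchmetrics’ actual pipeline or data.

```python
# Minimal sketch (not Searchmetrics' methodology): do backlink counts correlate
# with ranking position differently per keyword category?
# All backlink numbers below are invented for illustration.
from scipy.stats import spearmanr

# Hypothetical top-10 results for one keyword per category:
# index = position - 1, value = backlinks pointing at the URL ranking there.
serp_backlinks = {
    "ecommerce": [9400, 6800, 7100, 5200, 3900, 4100, 2500, 900, 1800, 400],
    "loans":     [150, 90, 400, 260, 900, 700, 1800, 1200, 2600, 3400],
}

for category, backlinks in serp_backlinks.items():
    positions = list(range(1, len(backlinks) + 1))
    # Negate the position so a larger value means a better ranking; a positive
    # rho then reads as "better rankings go together with more backlinks."
    rho, p_value = spearmanr([-p for p in positions], backlinks)
    print(f"{category:10s} Spearman rho = {rho:+.2f} (p = {p_value:.3f})")
```

With numbers like these, the ecommerce keyword shows a strongly positive rho while the loan keyword’s is strongly negative, which is the pattern the study describes.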
<strong>Internal links:</strong> For loan queries, the average number of internal links is less than 100. For ecommerce, the average number of internal links is much higher.

<strong>Keyword in title:</strong> This is a traditional SEO optimization tactic. In this analysis, they accounted for stemming and variations. In the loan category, only 10 percent of pages have the keyword in the title tag.

<strong>Word count and interactive elements:</strong> For ecommerce and health, pages that are more successful have more content and more interactive elements.

Why do traditional ranking factors fail to explain these examples? If you look behind the curtain, you see that Google has changed a lot, with Hummingbird adding in machine learning.

<strong>Results of Searchmetrics’ study</strong>

They emulated RankBrain and gave search results a relevance score based on how relevant a result is to a query. They found around 25 relevance ranking factors to assess relevance. Relevance factors are different for each keyword, so Tober says they can’t provide a table of all of them.

For example, 9 out of 10 top-ranking ecommerce pages have an “add to cart” function above the fold. The result at position 9 does not; however, it has the highest relevance score of the top 30, and that’s why it ranks.

Another example: if you search for “best bluetooth headphones,” you can look at how many internal links each result has and compare that against their relevance scores.

And another example: for the keyword “natural detox,” a word count comparison of two results shows the higher-ranking result has fewer words, fewer internal links and fewer interactive elements.

Content with a high relevance score matches user intention, is logically structured and comprehensive, offers a good user experience, and deals with topics holistically. Holistic means that other topics related to the main topic are covered on the same page.

<strong>Key findings</strong>

<strong>Outlook for SEO</strong>

When Google already knows the best results, RankBrain is not used. Tober believes RankBrain is filtering long-tail queries. Relevance is crucial for good rankings, and RankBrain can detect how relevant our content is.

<strong>The future of search</strong>

Tober says we’re in for an abundance of redundancy. He explains that machine learning and Searchmetrics share this philosophy, which he says applies to content creation, too.

<h2>Eric Enge: What Is RankBrain and How Does it Work</h2>

Eric had brain surgery in 2003. Does that make him a RankBrain expert? No, but his company Stone Temple Consulting did a study of Google results before and after RankBrain to see what’s changed.

Notable quote from the Bloomberg article announcement: “RankBrain interprets language, interprets your queries, in a way that has some of the gut feeling and guessability of people.”

Google is trying to understand the true meaning of queries and do a better job of providing relevant results.

Some basic language analysis concepts figure in here. Take, for example, stop words. Google has traditionally stripped stop words out of a query or indexed page to simplify the language analysis. But there are places where this practice doesn’t work. Take the query “the office” as an example. Someone wants the TV show, but Google may give results for your office.
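Here is a tiny sketch of that traditional stop-word stripping, just to show why it breaks down on queries like these. The stop-word list is illustrative, not Google’s.

```python
# A minimal illustration of naive stop-word removal and the queries it mangles.
# The stop-word list is made up for this example, not Google's.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "to", "you", "your",
              "can", "is", "it", "and", "without"}

def strip_stop_words(query: str) -> str:
    """Drop stop words and return whatever terms are left."""
    kept = [w for w in query.lower().split() if w not in STOP_WORDS]
    return " ".join(kept)

for query in ["the office",
              "Can you get 100% score on Super Mario without walkthrough?"]:
    print(f"{query!r:62} -> {strip_stop_words(query)!r}")

# "the office" collapses to "office" (the TV-show intent is gone), and the
# Super Mario query loses "without", inverting what the searcher asked for.
```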
A more sophisticated language analysis might look for those two words capitalized in the middle of a sentence as a cue that this is a specific kind of “the office.”

Another example is “coach,” which sometimes refers to a brand. Google may have had to manually patch the results if it found poor engagement on SERPs for a query like this. As humans, we can pick up cues that the Coach brand is being referenced, which Google ultimately wants to be able to recognize algorithmically.

RankBrain is trying to understand different kinds of relationships in language. On the topic of RankBrain, Gary Illyes told Enge it had to do with “being able to represent strings of text in very high-dimensional space and ‘see’ how they relate to one another.”

RankBrain gets at context: what other words are used on the same page, in the same paragraph, the same sentence, and so on. It can notice patterns in language usage.

Along with the example of the stop word “the” affecting a query, Enge also showed example queries that were hard for Google before RankBrain, including the one offered in the Bloomberg article.

Another troublesome example query that Gary Illyes shared in an interview with Enge: “Can you get 100% score on Super Mario without walkthrough?” The word “without” was traditionally taken out of the query, so the user wasn’t getting the answer they were looking for. This is an example of how Google was able to improve results with RankBrain.

<h3>Stone Temple Consulting’s Study</h3>

From their database of 500K queries, they looked for examples of queries that Google didn’t understand in June/July 2015 and compared them against January 2016 results. In one example, Google didn’t answer a query about Ragnarok in the earlier results but did answer the question in January 2016.

To find queries that Google misunderstood, they removed queries where it wasn’t clear what the searcher meant or that didn’t make sense in other ways, then ran an aggregate analysis. How did the results improve? The 89 “better results” the study found were improved in three basic ways.

RankBrain did a good job of parsing the query and instructing Google’s back-end retrieval systems to get the right results. Enge believes that Google can pass a better query not just to web results but also to featured snippets and other algorithms.

Enge shows a January 2016 SERP with a Wikipedia result for Abz Love for the query “who is abs.” He explains that Google now figures out to put weighted importance on the “who is” portion of the query, even though Abz has changed the way he spells his name.

Among the types of improvements they found, phrase interpretation and word interpretation both got better. About two-thirds of the time, the query was clear to a human but Google didn’t understand it; the other third of the time, the query wasn’t clear to begin with.

<strong>Impact on SEO</strong>

Relevance and context are growing in importance for ranking. We’re seeing a shift in that direction, although Enge isn’t sure all of it is due to RankBrain. He doesn’t see a lot of direct impact on SEO from RankBrain right now; the biggest change happening in SEO is with content, comprehensiveness and relevance.
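Both presenters describe relevance in terms of proximity in a shared space (Tober’s thought vectors, Illyes’s “very high-dimensional space”). As a rough intuition pump, and nothing like RankBrain’s learned representation, here is a toy bag-of-words cosine similarity applied to Tober’s weather example; the candidate texts are invented.

```python
# Toy approximation of "relevance as proximity": represent short texts as
# bag-of-words vectors and measure cosine similarity. Real systems use learned
# embeddings; the texts below are illustrative only.
from collections import Counter
from math import sqrt

def cosine_similarity(a: str, b: str) -> float:
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

query = "what's the weather going to be like in california"
candidates = [
    "weather forecast california",        # the reformulation Tober mentions
    "california office space for rent",   # shares a word but misses the intent
]
for text in candidates:
    print(f"{text:35s} similarity = {cosine_similarity(query, text):.2f}")
```

The reformulated query scores higher than the page that merely shares the word “california,” which is the intuition behind scoring results by proximity.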
<strong>Summary of key SEO implications</strong>

Danny Sullivan says his takeaway is this: what are you really going to do differently? Create great content for humans, which Google will reward, as they’ve always said they do.