How Google Works: A Google Ranking Engineer’s Story #SMX

Google Software Engineer Paul Haahr has been at Google for more than 14 years. For two of them, he shared an office with Matt Cutts. He’s taking the SMX West 2016 stage to share how Google works from a Google engineer’s perspective – or, at least, share as much as he can in 30 minutes. Afterward, Webmaster Trends Analyst Gary Illyes will join him onstage and the two will field questions from the SMX audience, with Search Engine Land Editor Danny Sullivan moderating.

From left: Google Webmaster Trends Analyst Gary Illyes, Google Software Engineer Paul Haahr and Search Engine Land Editor Danny Sullivan on the SMX West 2016 stage in San Jose.

How Google Works

Haahr opens by telling us what Google engineers do. Their job includes:

Writing code for searches
Optimizing metrics
Looking for new signals
Combining old signals in new ways
Moving results with good ratings up
Moving results with bad ratings down
Fixing rating guidelines
Developing new metrics when necessary

Two parts of a search engine:

Ahead of time (before the query)
Query processing

Before the Query

Crawl the web
Analyze the crawled pages
- Extract links
- Render contents
- Annotate semantics
Build an index
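
To make the "extract links" step concrete, here is a minimal sketch in Python using only the standard library. It is not Google's crawler code; the class name LinkExtractor and the function extract_links are hypothetical names chosen for this illustration, and rendering and semantic annotation are left out entirely.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return every link found in the page, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links
```

A crawler would feed the returned URLs back into its queue of pages to fetch.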

The Index

Like the index of a book
For each word, a list of pages it appears on
Broken up into groups of millions of pages
Plus per-document metadata
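
The book-index analogy can be sketched as an inverted index. The code below is a toy illustration under stated assumptions, not Google's implementation: the function names, the whitespace tokenizer and the modulo-based sharding rule are all invented for the example, and per-document metadata is omitted.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to the sorted list of page ids it appears on."""
    postings = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            postings[word].add(page_id)
    return {word: sorted(ids) for word, ids in postings.items()}


def shard_pages(pages, num_shards):
    """Split pages into groups; each group would get its own index.

    Assigning by page_id % num_shards is an assumption made for this sketch.
    """
    shards = [{} for _ in range(num_shards)]
    for page_id, text in pages.items():
        shards[page_id % num_shards][page_id] = text
    return shards
```

Looking up a word in the result of build_index returns the pages that contain it, which is the "for each word, a list of pages" idea from the talk.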

Query Processing

Query understanding and expansion
Does the query name any known entities?
Retrieval and scoring
- Send the query to all the shards
  Each shard
  - Finds the matching pages
  - Computes a score for query+page
  - Sends back the top N pages by score
- Combine all the top pages
- Sort by score
Post-retrieval adjustments
- Host clustering
- Is there duplication?
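
The retrieval and scoring steps can be sketched as a scatter-gather search. This is a simplified illustration, not the production system: the term-count score, the function names and the dict-based shards are assumptions, and post-retrieval adjustments such as host clustering are not modeled.

```python
import heapq


def score(query_words, text):
    """Toy score: how many times the query words occur in the page."""
    words = text.lower().split()
    return sum(words.count(w) for w in query_words)


def search_shard(shard, query_words, n):
    """Find matching pages in one shard; return its top n (score, page_id) pairs."""
    scored = []
    for page_id, text in shard.items():
        s = score(query_words, text)
        if s > 0:  # only pages that match at least one query word
            scored.append((s, page_id))
    return heapq.nlargest(n, scored)


def search(shards, query, n=10):
    """Send the query to every shard, combine the top pages, sort by score."""
    query_words = query.lower().split()
    combined = []
    for shard in shards:
        combined.extend(search_shard(shard, query_words, n))
    combined.sort(reverse=True)
    return [page_id for _, page_id in combined[:n]]
```

Because each shard returns only its own top N, the combining step sorts at most N results per shard, however many pages each shard holds.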

Scoring Signals

A signal is:

A piece of information used in scoring
Query independent – feature of a page
Query dependent
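
As a rough illustration of the two kinds of signal, the sketch below blends one query-independent value with one query-dependent value. Everything here is hypothetical: the "quality" field, the word-overlap measure and the 0.5 weight are placeholders, and the talk did not describe how Google actually combines signals.

```python
def query_independent_signal(page):
    """A feature of the page alone, e.g. a precomputed value in [0, 1]."""
    return page["quality"]


def query_dependent_signal(query, page):
    """A feature of the query+page pair: fraction of query words on the page."""
    query_words = query.lower().split()
    page_words = set(page["text"].lower().split())
    if not query_words:
        return 0.0
    return sum(w in page_words for w in query_words) / len(query_words)


def combined_score(query, page, weight=0.5):
    """Weighted blend of both signals; the weight is an arbitrary assumption."""
    return (weight * query_independent_signal(page)
            + (1 - weight) * query_dependent_signal(query, page))
```

The query-independent value can be computed ahead of time and stored with the page, while the query-dependent value has to be computed when the query arrives.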

Metrics

“If you cannot measure it, you cannot improve it” – Lord Kelvin

Relevance
- Does a page usefully answer the user’s query?
- Ranking’s top-line metric
Quality
- How good are the results we show?
Time to result (faster is better)

Google measures itself with live experiments:

A/B experiments on real traffic
Look for changes in click patterns
A lot of traffic is in one experiment or another
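
One common way to look for a change in click patterns between two arms of an A/B experiment is to compare click-through rates with a two-proportion z-test. The talk did not say which statistics Google uses, so treat the sketch below as a generic textbook approach; the function names are made up for the example.

```python
import math


def click_through_rate(clicks, impressions):
    """Fraction of impressions that received a click."""
    return clicks / impressions


def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """z statistic for the click-through rate of arm B minus that of arm A."""
    p_a = click_through_rate(clicks_a, n_a)
    p_b = click_through_rate(clicks_b, n_b)
    # Pooled rate under the assumption that both arms behave the same.
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / std_err
```

With a conventional two-sided 5% threshold, an absolute z value above roughly 1.96 would suggest the difference between arms is unlikely to be chance alone.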

At one time, Google tested 41 different blues to see which was best.

Google also does human rater experiments:

Show real people experimental search results
Ask how the results are
Aggregate ratings across raters
Publish guidelines explaining criteria for raters
Tools support doing this in an automated way, similar to Mechanical Turk

Google judges pages on two main factors:

Needs Met (where mobile is front and center)
Page Quality

Needs Met grades:

Fully Meets
Very Highly Meets
Highly Meets
Moderately Meets
Slightly Meets
Fails to Meet
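
Aggregating ratings across raters could look like the sketch below. The numeric values assigned to each Needs Met grade and the choice of a simple mean are assumptions made for illustration; neither was given in the talk.

```python
from statistics import mean

# Grades as listed above, best to worst. The numbers are an assumption
# made for this sketch, not an official scale.
NEEDS_MET_SCALE = {
    "Fully Meets": 5,
    "Very Highly Meets": 4,
    "Highly Meets": 3,
    "Moderately Meets": 2,
    "Slightly Meets": 1,
    "Fails to Meet": 0,
}


def aggregate_needs_met(ratings):
    """Average several raters' grades for one result into a single number."""
    return mean(NEEDS_MET_SCALE[grade] for grade in ratings)
```

For example, aggregate_needs_met(["Fully Meets", "Fully Meets"]) gives 5, the top of this assumed scale.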

Page quality concepts:

Expertise
Authoritativeness
Trustworthiness

Google engineer development process:

Idea
Repeat until ready
- Write code
- Generate data
- Run experiments
- Analyze
Launch report by quantitative analyst
Launch review
Launch

What goes wrong?

There are two kinds of problems:

Systematically bad ratings
Metrics don’t capture the things we care about

Here’s an example of a bad rating. Someone searches for [Texas farm fertilizer] and the search result provides a map to the manufacturer’s headquarters. It’s very unlikely that that’s what they want. Google determines this through live experiments. If a rater sees the map and rates it as “Highly Meets,” then the failure is at the point of rating.

Or, what if the metrics are missing? In 2009-2011, there were lots of complaints about low-quality content. But relevance metrics kept going up, in part because of content farms, whose pages could rate as relevant while being low quality. Conclusion: Google wasn’t measuring what it needed to measure. Thus, the quality metric was developed apart from relevance.

Here’s Paul Haahr’s slide deck, which is worth a look:
Update 7/19: Presentation has now been marked private by the author.

How Google Works: A Ranking Engineer's Perspective By Paul Haahr from Search Marketing Expo – SMX

Gary Illyes and Paul Haahr Answer Questions from the SMX Audience

SMX: How does RankBrain fit into all of this?

Haahr: RankBrain gets to see a subset of the signals. I can’t go into too much detail about how RankBrain works. We understand how it works but not as much what it’s doing. It uses a lot of the stuff that we’ve published about deep learning.

SMX: How would RankBrain know the authority of a page?

Haahr: It’s all a function of the training that it gets. It sees queries and other signals. I can’t say that much more that would be useful.

SMX: When you are logged into a Google app, do you differentiate by the information you gather? If you’re in Google Now vs. Chrome can that impact what you’re seeing?

Haahr: It’s really a question of if you’re logged in or not. We provide a consistent experience. Your browsing history follows you to either.

SMX: Does Google deliver different results for the same queries at different times in the day?

Illyes: I’m not sure. In Maps, for example, if we display something Maps-related we will show the hours. To my knowledge, it doesn’t change what shows up.

SMX: What’s going on with Panda and Penguin?

Illyes: I gave up on giving a date or timeline on Penguin. We are working on it, thinking about how to launch it, but I honestly don’t know a date and I don’t want to say a date because I was already wrong three or four times, and it’s bad for business.

SMX: Post-Google Authorship, how are you tracking author authority?

Haahr: There I’m not going to go into any detail. What I will say is the raters are expected to review that manually for a page that they are seeing. What we measure is: are we able to do a good job of serving results that the raters think are good authorities.

SMX: Does that mean authority is used as a direct or indirect factor?

Haahr: I wouldn’t say yes or no. It’s much more complicated than that and I can’t give a direct answer.

SMX: When explicit authorship ended, Google did say to keep having bylines. Should you bother with rel=author at all?

Illyes: There is at least one team that is still looking into using the rel=author tag just for the sake of future developments. If I were an SEO I would still leave the tag. It doesn’t hurt to have it. On new pages, however, it’s probably not worth it to have. Though we might use it for something in the future.

SMX: What are you reading right now?

Haahr: I read a lot of journalism and very few books. However, I just finished “City on Fire” – it’s about New York in the ’70s. There are 900 pages and I was disappointed when it ended. I’ve just started “It Can’t Happen Here.”

Kristi Kellogg is a journalist, news hound, professional copywriter, and social (media) butterfly. Currently, she is a senior SEO content writer for Conde Nast. Her articles appear in newspapers, magazines, across the Internet and in books such as "Content Marketing Strategies for Professionals" and "The Media Relations Guidebook." Formerly, she was the social media editor at Bruce Clay Inc.

See Kristi's author page for links to connect on social media.

Posted by Kristi Kellogg on March 3rd, 2016 at 3:30 pm

Filed under: SEO — Tags: Google, Liveblog, SEO, SMX West 2016

2 Replies to “How Google Works: A Google Ranking Engineer’s Story #SMX”

May 10, 2016 at 3:48 am

Augustus

I think the basic of SEO like link building, op-page / off-page optimization techniques are still the root of building a successful blog and get better ranking.

March 5, 2016 at 2:05 am

bharat

First time I read about the working of Google. This is a wonderful knowledge. Thanks for sharing this info with us.
