Trash or treasure? Unearthing Google's accidental secrets

A stash of leaked Google API documentation seems to shed light on just how many ways the search giant can analyse published information, including all the ways it claimed it couldn't.

by Rob Corbidge
Published: 15:00, 30 May 2024
In trying to assess the huge leak of internal Google search documentation made public in the past few days, my colleagues and I (and, it seems, the SEO community at large) have felt a bit like characters in Raiders of the Lost Ark: we have all the pieces of the puzzle but lack the Staff of Ra needed to unravel it.

By now most of you will be at least somewhat familiar with what was revealed, even as the specifics are still being mined and discussed. In short, some 2,500 pages of Google Search API documentation are now in the public domain, detailing 14,014 API features, or attributes, which Google can apparently use to analyse and categorise content and searches.

You can read about the leak here and also the ongoing process of verification here and elsewhere.

We don't know where, or how many, of these features and attributes are actually used, or how they are weighted. What has caused such a stir in the SEO community and elsewhere is the number of attributes that Google insisted definitely do NOT factor into search rankings, yet which definitely DO appear in the API documentation.

Is that the waft of porky pies we detect?

Mountain View's only public response so far has been thus: "We would caution against making inaccurate assumptions about Search based on out-of-context, outdated, or incomplete information. We’ve shared extensive information about how Search works and the types of factors that our systems weigh, while also working to protect the integrity of our results from manipulation." 

Should we be surprised by the apparent revelations, and by how they contradict Google's public statements? Not really. Over the years, a lot of very clever people in SEO have identified discernible patterns in Google search behaviour that contradict claims about what is supposedly irrelevant to search rankings.

Seeing those suspected but disavowed elements plainly included in the API documentation only reinforces the observers' case that Google rarely tells the whole story.

I mean, we shouldn't expect it to when it comes to its proprietary tech. But as others point out, you don't have to go out of your way to trash the reputations of people making claims you know fine well are true: there's a nice way to say no comment.

There's been a denial of reality going on for some time. Even as Google public reps like John Mueller increasingly spoke in almost cryptic terms about how search worked, and Danny Sullivan conducted explanatory exercises in comparative ambiguity, it was obvious we weren't getting the full picture. We were just being told to ignore the evidence.

Let's all remember that "Don't Be Evil" got overwhelmed by size and market dominance years ago. We don't see much to suggest Google is especially loyal or honest to its own people, so why should it feel more compelled to be open and truthful with you? Being in court is a permanent state of play for Google, and even there it apparently struggles to be exact with the facts.

The result of all this can only be to further distance the publishing industry from the search giant which eats the planet's advertising money, a scenario unlikely to overly trouble Google amidst its attempted pivot to become the world's prime destination for traffic and information. 

I'd like to proffer one example from the leak to suggest why Google's targeted goal of becoming The AI Generated Last Word on Anything and Everything isn't guaranteed to win people's trust, and it's not to do with using glue on your pizza or unleaded spaghetti.

In the documentation are pointers to Google identifying authors and treating them as entities in its system, even though it repeatedly told us that wasn't the case.

Quelle surprise: actual human authors have value! Google knows it, and so does anyone who ignored its denials, went ahead with author biographies and saw improved results.

Gen-AI content is only as good as its inputs, and killing off an industry that creates higher-quality information looks like the kind of scorched-earth policy that will leave Google starving too.

Ultimately, there is no spectacular revelation in the leak akin to exposing the algorithms, although I do assume OpenAI has tipped it into its maw and will use it to try to reverse-engineer an improved search of its own.

The leak will eventually be recorded as one of the many milestones in the erosion of trust which Google ultimately relies upon to conduct its (advertising) business. 

Trust is reputation, and reputation arrives on foot but leaves on horseback. 

Anyway, on with the edition…