Google's "Panda search algorithm" leaked < DeBunkur >

FYI --

Someone at Google "accidentally" leaked an internal document that may shed light on how the algorithm works. Michael King (iPullRank) found the leak and dug into it over the Memorial Day weekend.

Read the whole thing here: https://ipullrank.com/google-algo-leak

Some takeaways I found interesting:

1. Google has what they call a "Golden Document" flag, which suggests they can manually increase the weighting of a page that fits their "gold standard" in case the algo isn't doing the job.

2. The leak includes over 14,000 possible ranking signals/features, which would back up Google's claim that it uses a large number of them.

3. Google does have an internal "Authority" figure they refer to as "siteAuthority".

4. Google has a system called "NavBoost" which appears to adjust the rankings based on user click data.

5. There's something referred to as "Baby Panda" added to the algo. It could be an indicator of the HCU (Helpful Content Update) or possibly just a stripped-down version of Panda.

6. There actually is a sandbox, though it appears to be focused on keeping spam from ranking.

7. Chrome traffic is valued in the algorithm and appears to have a score of its own.

8. Google calls non-core algorithms "Twiddlers": systems that adjust the rankings produced by the core algorithm before they are presented to a user. It appears a lot of Twiddlers exist or have existed. Mike even references a tweet where an ex-Googler talks about disabling Twiddlers and bringing down YouTube search by accident.

9. Google uses various demotion algorithms to push URLs down in the search results, including demotions for anchor text mismatch, exact-match domains, navigation, product reviews, location, porn, and SERPs.

10. Google stores every change ever made to a page but appears to use only the last twenty snapshots/changes in ranking calculations.

11. The index only considers a certain number of tokens on a page for ranking. Once that maximum is hit, it appears the rest of the content may not affect the algorithm.

12. Google uses a specific YMYL (Your Money or Your Life) score.

13. Content is vectorized into embeddings to gauge how on- or off-topic a page is relative to the rest of the website.

14. Google has a classifier for "smallPersonalSite".

15. If a website has 50% or more video pages it is considered a "video-focused" site.
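To make the "Twiddler" idea from item 8 a bit more concrete, here's a toy Python sketch of how a post-core re-ranking step could work in principle: the core stage hands over a scored list, each twiddler adjusts it, and the final list is re-sorted. None of this comes from the leak. The `Result` fields, the scores, the hypothetical `emd_demotion` (loosely modeled on the exact-match-domain demotion in item 9) and `golden_boost` (loosely modeled on the "Golden Document" flag in item 1), and the 0.5 / +1.0 adjustments are all made-up assumptions for illustration only.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Result:
    url: str
    score: float                      # hypothetical score from the "core" ranking stage
    exact_match_domain: bool = False  # hypothetical flag, illustration only
    golden: bool = False              # hypothetical "gold standard" flag, illustration only


# A twiddler takes the core-ranked list and returns an adjusted copy.
Twiddler = Callable[[List[Result]], List[Result]]


def emd_demotion(results: List[Result]) -> List[Result]:
    """Hypothetical demotion: halve the score of exact-match-domain results."""
    return [replace(r, score=r.score * 0.5) if r.exact_match_domain else r for r in results]


def golden_boost(results: List[Result]) -> List[Result]:
    """Hypothetical boost: add a fixed bonus to results flagged as 'golden'."""
    return [replace(r, score=r.score + 1.0) if r.golden else r for r in results]


def rerank(results: List[Result], twiddlers: List[Twiddler]) -> List[Result]:
    """Apply each twiddler in order, then sort by the adjusted score (highest first)."""
    for twiddle in twiddlers:
        results = twiddle(results)
    return sorted(results, key=lambda r: r.score, reverse=True)


core = [
    Result("https://best-widgets.example", 2.0, exact_match_domain=True),
    Result("https://reviews.example", 1.5),
    Result("https://guide.example", 1.2, golden=True),
]

for r in rerank(core, [emd_demotion, golden_boost]):
    print(r.url, r.score)
```

In this toy setup the exact-match domain drops from first to last and the "golden" page jumps to the top, even though the core scores never changed. That also hints at why switching Twiddlers off (like in the YouTube story) could noticeably change what users see.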

If this is real, it is super interesting. Y'all go read Mike's full article while I stay up all night trying to find this leak and dig into it more.


https://forums.craigslist.org/?ID=330988793