
[MozCon 2019] Fixing the Indexability Challenge: A Data-Based Framework

[MozCon 2019] How do you turn an unwieldy 2.5 million-URL website into a manageable and indexable site of just 20,000 pages? Areej will share the methodology and takeaways used to restructure a job aggregator site which, like many large websites, had huge problems with indexability and the rules used to direct robot crawling. This talk will tackle tough crawling and indexing issues, diving into the case study with flow charts to explain the full approach and how to implement it.


  1. Fixing the Indexability Challenge: A Data-Based Framework Areej AbuAli #MozCon 2019 slideshare.net/areejabuali @areej_abuali
  2. I’m here to talk to you about a framework that I came up with to fix a client’s indexability challenge
  3. But more importantly… (And I only realised this after I’d finished preparing this talk)
  4. I’m also here to talk to you about how “Technical Problems are People Problems”
  5. I first met this client back in 2017
  6. They’re a job aggregator site
  7. Their website was struggling
  8. 89% YoY decrease in organic visibility [chart]
  9. Organic Traffic: Y1 = X, Y2 = Y, change = -46%
  10. They barely ranked…
  11. Competitor Organic Visibility: Client vs. indeed.co.uk, monster.co.uk and reed.co.uk [chart]
  12. And their site was so massive that we struggled to crawl it
  13. To make this work, we need to fix the fundamentals first
  14. We need to fix their tech
  15. So this talk is about my 18-month relationship with this client
  16. It’s about what worked, what didn’t work and what I would have done differently
  17. The Initial Findings
  18. I started working on a comprehensive audit: Links, Tech, Content
  19. I ended up with a 70-page document
  20. There was a total of 50 recommendations
  21. Some of the main things included were…
  22. 72% of backlinks came from only 3 referring domains
  23. Their on-page content was full of duplication and missing the basics
  24. There were NO sitemaps
  25. Canonical tags were not set up correctly
  26. And their internal linking structure was a nightmare
  27. Every recommendation was outlined using a traffic light system
  28. And every section was split into Problem, Effect & Solution
  29. We had a half-day audit handover meeting where I walked them through all of our recommendations
  30. Everyone was in good spirits
  31. Yet I couldn’t help but feel it was not enough…
  32. Something was missing
  33. Everything I recommended up till now was solid
  34. But I had a gut feeling that due to the nature of their site…
  35. These recommendations wouldn’t quite cut it
  36. I had to go back to the drawing board
  37. The Supplementary Findings
  38. Also Known As…
  39. The Findings I Should’ve Found The First Time Round But Didn’t So I’m Choosing To Call It Supplementary Findings To Sound Like An Expert.
  40. They’re a job aggregator site – in essence, they’re a job search engine
  41. That means that every single search conducted could potentially be crawled and/or indexed if the site wasn’t built right
  42. That’s equivalent to an infinite number of potential searches!
  43. Which part was their site not getting right?
  44. I knew that it was impossible to crawl
  45. The one time I tried to fully crawl the site, it returned over 2.5 million URLs
  46. And I could only crawl it by excluding massive sections
  47. And if their pages couldn’t be crawled, then they would never be indexed properly…
  48. And they would never rank
  49. So there were three problems that needed to be fixed
  50. Crawling · Indexing · Ranking
  51. It was apparent that there were no rules in place to help direct robots
  52. Unique URLs were created for every possible filter combination
  53. This can create a potentially unlimited number of URLs
  54. And all the pages looked exactly the same – they just had a list of jobs!
  55. Google was wasting crawl budget by crawling duplicate thin pages and attempting to index them
  56. The ‘Aha!’ Moment
  57. It was apparent that there were no rules in place to help direct robots
  58. I needed to create a customised framework that instructs bots on what to do and what not to do
  59. The job aggregator industry seemed to be doing one of two things
  60. Limiting indexable pages → miss out on ranking opportunity
  61. Limiting indexable pages → miss out on ranking opportunity; not limiting indexable pages → wind up with reduced link equity
  62. My framework was going to do neither
  63. Instead it would use search volume data to determine which pages are valuable to search engines
  64. So how exactly would this work?
  65. It starts off by passing the search query through a keyword combination script
  66. This script outputs different combinations for the search conducted
  67. It does that by changing the order of keywords to see all possible combinations
  68. Digital Marketing Manager London
  69. Digital Marketing Manager London · London Digital Marketing Manager · Digital Marketing London Manager
  70. 4! = 24 combinations (see the sketch below)
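A minimal sketch of what such a keyword combination script could look like (hypothetical Python; the talk doesn’t share the actual implementation):

```python
from itertools import permutations

def keyword_combinations(query: str) -> list[str]:
    """Return every ordering of the words in a search query."""
    words = query.split()
    return [" ".join(p) for p in permutations(words)]

combos = keyword_combinations("Digital Marketing Manager London")
print(len(combos))   # 4 words -> 4! = 24 combinations
print(combos[:3])    # a few of the orderings, as shown on the slide
```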
  71. These combinations will increase based on the search query and filters applied
  72. Even though Google will regard most of these combinations as the same…
  73. The script will help avoid creating duplicate pages that are just different versions of the same thing
  74. It then searches the database to see if this job is available
  75. If (Job ≠ Available): load a page stating so and no-index it
  76. If (Job = Available): search the keyword database for all keyword combinations from the script
  77. Fetch search volume data for these keyword combinations
  78. For search volume data, we recommended using the keywordtool.io API
  79. If (SearchVolume > 50): a page is then created for the highest-search-volume keyword combination, and that page is both crawled and indexed
  80. If (SearchVolume < 50): load the page for users but no-index it
  81. Your search volume cut-off can be updated at any time and is based on what makes sense for your industry
  82. There is always the possibility of errors occurring
  83. What if there’s a tie? (Several keyword combinations having the same search volume)
  84. Create an indexed, crawlable page based on the keyword used in the user’s search query
  85. What if the API was down? (And you’re unable to generate search volume data)
  86. Load a no-index page for usability and don’t store the query in the database (see the sketch below)
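Pulled together, slides 74–86 describe a decision flow that could be sketched roughly like this (hypothetical Python; `job_available` and `fetch_search_volume` stand in for the client’s database and the keywordtool.io API):

```python
from itertools import permutations

class ApiDownError(Exception):
    """Raised by fetch_search_volume when the API can't be reached."""

SEARCH_VOLUME_CUTOFF = 50  # the talk's example threshold; tune per industry

def handle_search(query: str, job_available, fetch_search_volume) -> dict:
    """Decide whether a job search page gets created, crawled and indexed."""
    combos = [" ".join(p) for p in permutations(query.split())]

    # If (Job ≠ Available): load a page stating so and no-index it
    if not job_available(query):
        return {"page": "no-jobs-found", "index": False}

    # If (Job = Available): fetch search volume for every combination
    try:
        volumes = {c: fetch_search_volume(c) for c in combos}
    except ApiDownError:
        # API down: load a no-index page for usability; don't store the query
        return {"page": query, "index": False, "store_query": False}

    best = max(volumes.values())

    # If (SearchVolume < 50): load the page for users but no-index it
    # (the slides leave a volume of exactly 50 unspecified)
    if best < SEARCH_VOLUME_CUTOFF:
        return {"page": query, "index": False}

    # Tie: several combinations share the top volume, so fall back
    # to the keyword order the user actually searched
    winners = [c for c, v in volumes.items() if v == best]
    chosen = query if len(winners) > 1 else winners[0]

    # If (SearchVolume > 50): create a crawlable, indexable page
    return {"page": chosen, "index": True}
```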
  87. As for their internal linking structure…
  88. This was the status of their header
  89. This is what we recommended
  90. We also provided internal linking recommendations for their footer and job advertisement pages
  91. And an exact breakdown of their filter system on job search pages
  92. As for their sitemaps – they had none
  93. We recommended creating and splitting them up
  94. Blog Sitemap · Job Advertisements Sitemap · Job Results Sitemap · Ancillary Sitemap (see the sketch below)
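One common way to wire split sitemaps together is a sitemap index file that points at each one; a minimal sketch (hypothetical Python, with example.com and the file names standing in for the client’s actual setup):

```python
SITEMAPS = [
    "sitemap-blog.xml",
    "sitemap-job-advertisements.xml",
    "sitemap-job-results.xml",
    "sitemap-ancillary.xml",
]

def sitemap_index(base_url: str) -> str:
    """Render a sitemap index pointing at each split sitemap."""
    entries = "\n".join(
        f"  <sitemap><loc>{base_url}/{name}</loc></sitemap>" for name in SITEMAPS
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</sitemapindex>"
    )

print(sitemap_index("https://www.example.com"))
```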
  95. The final step was to help them sort out their content
  96. Even if they were only indexing high search volume pages, their content was very thin
  97. Their chances to rank would still be minimal
  98. They had the same H1, title tag & meta description for every filtered indexed page
  99. Most competitors automatically generate optimised meta tags
  100. They needed to do the same for indexable filter pages (see the sketch below)
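A sketch of the kind of template such auto-generated meta tags could follow (hypothetical Python; the field names and wording are illustrative, not taken from the talk):

```python
def filter_page_meta(keyword: str, location: str, job_count: int) -> dict:
    """Generate a unique H1, title tag and meta description per filter page."""
    kw = keyword.title()
    return {
        "h1": f"{kw} Jobs in {location}",
        "title": f"{kw} Jobs in {location} | {job_count} Vacancies",
        "meta_description": (
            f"Browse {job_count} {kw} jobs in {location}. "
            f"Apply today for the latest {kw} vacancies."
        ),
    }

print(filter_page_meta("digital marketing manager", "London", 42))
```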
  101. Other than a company page and a handful of blog posts, there were no core content pages
  102. So we performed in-depth keyword research and opportunity analysis to see what content generates traffic
  103. And we provided a content audit and strategy to go with it
  104. My ‘Aha!’ Moment felt complete
  105. This was the piece of the puzzle that was missing
  106. Four Months Later
  107. The client confirmed that they had implemented everything
  108. The first thing that caught my attention was that their site went from over 2.5M crawled pages to 20K
  109. Which initially felt like good news…
  110. Until I realised that their traffic had declined…
  111. Remember this? Organic Traffic: Y1 = X, Y2 = Y, change = -46%
  112. Organic Traffic: Y1 = X, Y2 = Y, Y3 = Z, change = -86%
  113. The Mini Audit Findings
  114. Also Known As…
  115. The Findings I’m Rushing To Find In A Panic To Prove That They Haven’t Implemented My Recommendations Accurately, Hence I Am Still An Expert.
  116. I went through the original list of 50 recommended actions
  117. At that point, there were 29 that had not been implemented
  118. I also discovered ten new issues that were affecting their indexability
  119. Google was choosing to only index 20% of the submitted sitemap
  120. Googlebot will choose to visit their site less if their indexability is not in check
  121. The client explained that they’d added canonical tags and felt that was enough
  122. They’re relying on Googlebot to: 1) Crawl the pages
  123. They’re relying on Googlebot to: 1) Crawl the pages 2) Find the canonical tags
  124. They’re relying on Googlebot to: 1) Crawl the pages 2) Find the canonical tags 3) Then choose to ignore them
  125. Canonical tags are simply hints for bots
  126. Google might decide to ignore the tags and pick other pages to index
  127. Almost 80K pages had been indexed despite not being submitted via the sitemap
  128. And some pages were included in the sitemap but not indexed
  129. /job/finance-manager-liverpool
  130. /job/finance-manager-liverpool?index=57271b7c~4&utm_campaign=job-detail&utm_source=search-result&utm_medium=website
  131. Because this page was the one discovered via internal links
  132. Over 5K similar pages with parameters were getting indexed (see the sketch below)
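One quick way to audit pages like these is to check whether each parameterised URL declares the clean canonical, while remembering the tag is only a hint (a sketch assuming Python with beautifulsoup4; the URL is the example from the slides):

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

def declared_canonical(html: str):
    """Return the URL declared in <link rel="canonical">, or None."""
    tag = BeautifulSoup(html, "html.parser").find("link", rel="canonical")
    return tag.get("href") if tag else None

# HTML as it might be served on the parameterised URL
html = (
    "<html><head>"
    '<link rel="canonical" href="/job/finance-manager-liverpool">'
    "</head><body>...</body></html>"
)

assert declared_canonical(html) == "/job/finance-manager-liverpool"
# Even when this check passes, the tag is only a hint: Google discovered
# the parameterised URLs first via internal links and indexed them anyway.
```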
  133. Your main goal is to maximise crawl budget
  134. You cannot use canonical tags as a sticking plaster to fix that
  135. This implementation was still incomplete
  136. I had to change my way of conveying this message
  137. I put a stop to the endless stream of emails and scheduled a face-to-face meeting
  138. We reviewed each and every single remaining task and discussed them in detail
  139. Behold the wonders of Google Sheets!
  140. We re-prioritised tasks and set estimated completion dates
  141. It was not an easy meeting but it felt productive
  142. Where are we now?
  143. On a personal level, I discovered that I suffer from Imposter Syndrome
  144. This was my constant state of mind
  145. I was working closely with the CTO throughout this project
  146. I felt he didn’t trust me or my knowledge
  147. So, what would I have done differently?
  148. If I could go back in time, I would realise what the actual problem was
  149. All technical problems are people problems
  150. The SEO recommendations were solid
  151. Getting them implemented was the hard part
  152. As a tech SEO, the most you can do is to influence priorities
  153. You have no control
  154. In this instance, I didn’t manage to persuade him to implement the recommendations
  155. I also learned that the way I’d been doing SEO audits was plain wrong
  156. I always focused on delivering a set of comprehensive actions
  157. Instead, maybe I should just deliver a SINGLE recommendation
  158. And once that’s implemented…
  159. Then, and only then, will I recommend another
  160. And maybe I shouldn’t recommend Nice-To-Do’s…
  161. Until there are only Nice-To-Do’s left to do
  162. Because they are simply a distraction from the main problem
  163. This talk does not have a happy ending…
  164. It is not a successful case study…
  165. There’s no upward visibility graph and page one rankings for me to show off…
  166. This talk is about real life
  167. It’s about a framework I created to fix indexability issues, one that I’m proud of and know in my gut *works*
  168. So I wanted to share it with you
  169. Because I can see this applied across many sites in plenty of industries
  170. And I’d love for YOU to implement it
  171. So I’m going to share my full methodology with you
  172. slideshare.net/areejabuali
  173. bit.ly/mozcon-areej
  174. And just remember…
  175. Getting the basics right is so fundamental
  176. If you can do nothing else, just do the tech.
  177. Areej AbuAli · Slides: slideshare.net/areejabuali · Tweet things @areej_abuali