SEO for Large/Enterprise Websites - Data & Tech Side

1. Doing SEO for large websites. Working on large websites, or a large number of websites. Let's talk about SEO at scale, in the enterprise.
2. 31 m
3. 1.8m & 220kg
4. 17x larger
5. 4,913x heavier
6. 1,084 T
7. x2
8. x2 x2
9. x2 x3 x2
10. SLOWER DIFFICULT TO WORK WITH
11. Working in a large organisation Working with data Technical Foundation Minimising Risk Scaling Content Reporting
12. Working in a large organisation Scaling Content Reporting Working with data Technical Foundation Minimising Risk
13. Working in a large organisation Scaling Content Working with data Technical Foundation Reporting Minimising Risk
14. Templates
15. Getting (& processing) data
16. Finding technical issues
17. Preventing technical issues
18. Templates
19. I would like a 1000 problems please.
20. “Please fix all 18,304 pages”
21. LIES
22. LIES 5 6
23. Category Home page Product Contact Us Obviously different
24. Small product number Main category page Out of stock product Extremes
25. Facet category page Reviews Page 2 Same page different URL
26. Country County City Area/District Street
27. Getting (& processing) data
28. Impressions week by week for new content
29. Pre change Post change Clicks pre and post change for site sections
30. Competing pages for a set of terms
31. SLOWER DIFFICULT TO WORK WITH SAMPLING
32. SLOWER DIFFICULT TO WORK WITH SAMPLING LIMITS
33. 1,000 rows at a time
34. SLOWER DIFFICULT TO WORK WITH SAMPLING LIMITS LAG
35. SLOWER DIFFICULT TO WORK WITH SAMPLING LIMITS LAG SEGMENTATION
36. Search console properties for a large brand.
37. Register all the things.
38. 5 sub-folders provided 260% more keywords
39. Part 1: Data Studio Part 2: Day by day data Part 3: Python Part 4: Data warehousing Get Get, Analyse Get, Store, Analyse, Report
40. Part 1: Data Studio Part 2: Day by day data Part 3: Python Part 4: Data warehousing
41. Data studio for extracting data ● Add a data source ● Create a table for it. ● Download the table. With both GA & GSC, you'll get everything in the table, no paginating.
42. Part 1: Data Studio Part 2: Day by day data Part 3: Python Part 4: Data warehousing
43. Day by day data To get even more data we have to get it day by day. ● bit.ly/search-console-data-downloader This bit is Search Console only.
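The linked downloader handles this for you; purely as a sketch of the idea, a day-by-day loop against the Search Console API looks roughly like this. The key file name, property URL and date range are made up, and on big sites you would also want to paginate each day with startRow.

    import datetime
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    # Assumes a service account that has been granted access to the property;
    # the key file and property URL below are hypothetical.
    credentials = service_account.Credentials.from_service_account_file(
        'service-account.json',
        scopes=['https://www.googleapis.com/auth/webmasters.readonly'])
    service = build('searchconsole', 'v1', credentials=credentials)

    SITE = 'https://www.example.com/'
    start, end = datetime.date(2021, 1, 1), datetime.date(2021, 1, 31)

    all_rows = []
    day = start
    while day <= end:
        body = {
            'startDate': day.isoformat(),
            'endDate': day.isoformat(),   # one day per request
            'dimensions': ['page', 'query'],
            'rowLimit': 25000,
        }
        response = service.searchanalytics().query(siteUrl=SITE, body=body).execute()
        for row in response.get('rows', []):
            all_rows.append({
                'date': day.isoformat(),
                'page': row['keys'][0],
                'query': row['keys'][1],
                'clicks': row['clicks'],
                'impressions': row['impressions'],
            })
        day += datetime.timedelta(days=1)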
44. Part 1: Data Studio Part 2: Day by day data Part 3: Python Part 4: Data warehousing
45. Getting data from APIs Pull down your analytics data. ● Daily_google_analytics_v3 ● Getting search console data from the API
46. Getting data from APIs Pull down your analytics data. ● Daily_google_analytics_v3 ● Getting search console data from the API Getting started with pandas: ● Pandas tutorial with ranking data
47. Getting data from APIs Pull down your analytics data. ● Daily_google_analytics_v3 ● Getting search console data from the API Getting started with pandas: ● Pandas tutorial with ranking data As a workflow I'd highly recommend Jupyter notebooks for getting started. ● Why use jupyter notebooks? ● SearchLove Video (paid)
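To give a flavour of the pandas workflow those tutorials cover, here is a minimal sketch. It assumes `all_rows` is the list of dicts built in the day-by-day sketch above; a CSV export from any of the linked tools would work the same way.

    import pandas as pd

    df = pd.DataFrame(all_rows)          # or: df = pd.read_csv('gsc_export.csv')
    df['date'] = pd.to_datetime(df['date'])

    # Weekly clicks per page - the kind of "week by week" view shown earlier.
    weekly = (df.groupby([pd.Grouper(key='date', freq='W'), 'page'])['clicks']
                .sum()
                .reset_index())
    print(weekly.sort_values('clicks', ascending=False).head(20))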
48. SEO Pythonistas A memorial and soon-to-be collection of Hamlet's excellent work. SEO Pythonistas - In loving memory of Hamlet Batista @DataChaz
49. Part 1: Data Studio Part 2: Day by day data Part 3: Python Part 4: Data warehousing
50. Analyse Store data Get data Report
51. Analyse Store data Get data Report Takes time & space.
52. Analyse Store data Get data Report Takes time & space.
53. A developer could do it.
54. Rolling your own JC Chouinard has built a series of excellent granular tutorials which walk you through setting up one on your own machine. Link.
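Whatever warehouse you pick, the "store" step is just getting a table of rows in. A minimal sketch with BigQuery, assuming the google-cloud-bigquery client is authorised (e.g. via GOOGLE_APPLICATION_CREDENTIALS) and the dataset already exists; the project, dataset and table names are made up, and `df` is the DataFrame from the pandas sketch above.

    from google.cloud import bigquery

    client = bigquery.Client()
    table_id = 'my-project.seo_warehouse.gsc_daily'   # hypothetical table

    job = client.load_table_from_dataframe(df, table_id)
    job.result()  # block until the load job finishes
    print(client.get_table(table_id).num_rows, 'rows in', table_id)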
55. Off the shelf Get in touch with me! I run Piped Out which is software for building SEO data warehouses.
56. Finding technical issues
57. Part 1: Templates Part 2: Logs Part 3: Crawling Big
58. Part 1: Templates Part 2: Logs Part 3: Crawling Big
59. Not the same fields as a crawl. No page title for example.
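Logs give you the request line, timestamp, status code, response size, referrer and user agent rather than on-page elements. A minimal sketch of pulling those fields out of a combined-format access log line; formats vary by server and CDN, so treat the pattern as a starting point, and note that properly verifying Googlebot needs a reverse DNS check rather than the user-agent shortcut used here.

    import re

    # One made-up line in a fairly standard "combined" log format.
    line = ('66.249.66.1 - - [10/Oct/2019:13:55:36 +0000] '
            '"GET /product/big-brown-shoe HTTP/1.1" 200 5124 '
            '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')

    pattern = re.compile(
        r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
        r'"(?P<method>\S+) (?P<path>\S+) \S+" '
        r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referrer>[^"]*)" "(?P<ua>[^"]*)"'
    )

    m = pattern.match(line)
    if m and 'Googlebot' in m.group('ua'):
        print(m.group('time'), m.group('status'), m.group('path'))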
60. ● Crawling & indexing problems
61. ● Crawling & indexing problems ● Measuring freshness
62. Time until article crawled
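One way to measure that, sketched here with made-up data: take publish timestamps from your CMS or news sitemaps, take the first Googlebot hit per URL from the parsed logs, and diff the two.

    import pandas as pd

    # Hypothetical inputs: publish times per path, and parsed Googlebot log hits.
    published = pd.DataFrame({
        'path': ['/news/article-1', '/news/article-2'],
        'published_at': pd.to_datetime(['2019-10-01 08:00', '2019-10-01 09:30']),
    })
    googlebot_hits = pd.DataFrame({
        'path': ['/news/article-1', '/news/article-1', '/news/article-2'],
        'time': pd.to_datetime(['2019-10-01 10:15', '2019-10-02 01:00', '2019-10-03 12:00']),
    })

    first_crawl = (googlebot_hits.groupby('path', as_index=False)['time'].min()
                                 .rename(columns={'time': 'first_crawled_at'}))
    freshness = published.merge(first_crawl, on='path', how='left')
    freshness['time_to_crawl'] = freshness['first_crawled_at'] - freshness['published_at']
    print(freshness[['path', 'time_to_crawl']])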
63. ● Crawling & indexing problems ● Measuring freshness ● Prioritisation
64. ● Crawling & indexing problems ● Measuring freshness ● Prioritisation ● Monitoring website changes (e.g. migrations)
65. Chart: status codes on product pages, Apr '19 to Oct '19 (200 / 301 / 302)
66. Chart: status codes on product pages, Apr '19 to Oct '19 (200 / 301 / 302), built in ELK
67. ● Crawling & indexing problems ● Measuring freshness ● Prioritisation ● Monitoring website changes (e.g. migrations) ● Debugging
68. Hi x

I'm {x} from {y} and we've been asked to do some log analysis to understand better how Google is behaving on the website. I was hoping you could help with some questions about the log set-up (as well as with getting the logs!).

What time period do we want?
What we'd ideally like is 3-6 months of historical logs for the website. Our goal is to look at all the different pages search engines are crawling on our website, discover where they're spending their time, the status code errors they're finding etc. We can absolutely do analysis with a month or so (we've even done it with just a week or two), but it means we lose historical context and obviously we're more likely to miss things on a larger site.

There are also some things that are really helpful for us to know when getting logs.

Do the logs have any personal information in them?
We're only concerned with the various search crawler bots like Google and Bing; we don't need any logs from users, so any logs with emails, telephone numbers etc. can be removed.

Can we get logs from as close to the edge as possible?
It's pretty likely you've got a couple of different layers of your network that might log. Ideally we want logs from as close to the edge as possible. This prevents a couple of issues:
● If you've got caching going on, like a CDN or Varnish, and we get logs from after them, we won't see any of the requests they answer.
● If you've got a load balancer distributing to several servers, sometimes the external IP gets lost (perhaps X-Forwarded-For isn't working), which we need to verify Googlebot, or we accidentally only get logs from a couple of servers.

Are there any sub-parts of your site which log to a different place?
Have you got anything like an embedded WordPress blog which logs to a different location? If so, we'll need those logs as well. (Although of course if you're sending us CDN logs this won't matter.)

How do you log hostname and protocol?
It's very helpful for us to be able to see hostname & protocol. How do you distinguish those in the log files? Do you log HTTP & HTTPS to separate files? Do you log hostname at all? This is one of the problems that's often solved by getting logs closer to the edge: while many servers won't give you those by default, load balancers and CDNs often will.

Where would we like the logs?
In an ideal world, they would be files in an S3 bucket and we can pull them down from there. If possible, we'd also ask that multiple files aren't zipped together for upload, because that makes processing harder. (No problem with compressed logs, just not zipping multiple log files into a single archive.)

Is there anything else we should know?

Best,
{x}
69. Part 1: Templates Part 2: Logs Part 3: Crawling Big
70. Sampling your crawl ● Limit your crawl percentage per template, e.g. ● 20% of product pages ● 30% of category pages
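If you're assembling the sample yourself (for example, to feed a crawler a list of URLs), per-template sampling can be as simple as the sketch below; the template patterns, percentages and URLs are made up for illustration.

    import random
    import re

    # Hypothetical template rules: a URL pattern per template and the fraction
    # of matching URLs to keep in the crawl.
    RULES = [
        (re.compile(r'^/product/'), 0.20),   # 20% of product pages
        (re.compile(r'^/category/'), 0.30),  # 30% of category pages
    ]

    def keep(path, default_fraction=1.0):
        """Decide whether a URL path makes it into the sampled crawl list."""
        for pattern, fraction in RULES:
            if pattern.match(path):
                return random.random() < fraction
        return random.random() < default_fraction

    # e.g. `all_paths` could come from sitemaps or a previous crawl export.
    all_paths = ['/product/big-brown-shoe', '/category/shoes', '/about-us']
    sample = [path for path in all_paths if keep(path)]
    print(sample)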
71. Low memory crawler Runs locally on your machine and allows you to crawl with a very low memory footprint. Doesn't render JS or process data however.
72. Run SF in the cloud You can purchase a super high memory computer in the cloud, install SF on it and run it at maximum speed.
73. Preventing technical issues
74. Search console properties for a large brand.
75. Part 1: Manually crawling Part 2: Automating assertions Part 3: Unit testing
76. Change detection with SF
77. Change detection with SF
78. Part 1: Manually crawling change detection Part 2: Automating assertions Part 3: Unit testing
79. <meta name="robots" content="noindex">
80. <meta name="robots" content="noindex,nofollow"> <meta name="robots" content="noindex">
81. Is it different?
82. Is it the value I want? Is it different?
83. <meta name="robots" content="noindex,nofollow"> <meta name="robots" content="noindex">
84. Assertions per element:
Element | Equals
Title | Big Brown Shoe - £12.99 - Example.com
Status Code | 200
H1 | Big Brown Shoe
Canonical | <link rel="canonical" href="https://example.com/product/big-brown-shoe" />
CSS Selector: #review-counter | Any number
CSS Selector: #product-data | { "@context": "https://schema.org/", "@type": "Product", "name": "Big Brown Shoe", "description": "The biggest brownest shoe you can find.", "sku": "0446310786", "mpn": "925872" }
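A rough sketch of checking a table like this with requests and BeautifulSoup; the URL and expected values are the example ones above, and a real set-up would read them from a config or spreadsheet rather than hard-coding them.

    import requests
    from bs4 import BeautifulSoup

    URL = 'https://example.com/product/big-brown-shoe'  # hypothetical template URL

    response = requests.get(URL, timeout=30)
    soup = BeautifulSoup(response.text, 'html.parser')

    canonical = next((link.get('href') for link in soup.find_all('link')
                      if 'canonical' in (link.get('rel') or [])), None)
    review_counter = soup.select_one('#review-counter')

    checks = {
        'status code is 200': response.status_code == 200,
        'title': soup.title is not None
                 and soup.title.get_text(strip=True) == 'Big Brown Shoe - £12.99 - Example.com',
        'h1': soup.h1 is not None and soup.h1.get_text(strip=True) == 'Big Brown Shoe',
        'canonical': canonical == 'https://example.com/product/big-brown-shoe',
        'review counter is any number': review_counter is not None
                                        and review_counter.get_text(strip=True).isdigit(),
    }

    for name, passed in checks.items():
        print('PASS' if passed else 'FAIL', '-', name)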
85. Asserting with Google sheets
86. Asserting with Google sheets
87. Part 1: Manually crawling Part 2: Automating assertions Part 3: Unit testing
88. Unit tests
89. Create code Test code Deployment
90. Create code Test code Deployment All our hard work.
91. Create code Test code Deployment All our hard work.
92. Create code Test code Deployment
93. endtest.io
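Whether that test step is a service like endtest.io or a handful of pytest checks in CI, the idea is the same: assert the SEO-critical bits of each template before the deploy goes out. A minimal, illustrative sketch; the staging URLs are made up and the substring-based noindex check is deliberately crude.

    import pytest
    import requests

    # Hypothetical staging URLs and expectations; in a real pipeline this would
    # run against the build before deployment.
    PAGES = {
        'https://staging.example.com/product/big-brown-shoe': {'status': 200, 'indexable': True},
        'https://staging.example.com/search?q=shoes': {'status': 200, 'indexable': False},
    }

    @pytest.mark.parametrize('url,expected', list(PAGES.items()))
    def test_template_seo_basics(url, expected):
        response = requests.get(url, timeout=30)
        assert response.status_code == expected['status']
        # Crude check: treat any "noindex" in the HTML as a noindexed page.
        noindexed = 'noindex' in response.text.lower()
        assert noindexed != expected['indexable'], f'{url}: indexability looks wrong'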
94. Conclusions
95. @dom_woodman bit.ly/seo-for-large-websites www.pipedout.com @dom_woodman
