
Better Sitemap (Mozilla Drumbeat)

A project proposal on how Sitemap 0.90 can be improved.

  1. Better Sitemap, by U-Zyn Chua [email_address], December 12, 2009, Mozilla Drumbeat Challenge Singapore.
      This work is licensed under a Creative Commons Attribution 3.0 License. All other trademarks, logos and copyrights are the property of their respective owners.
  2. Sitemap 0.90
  3. Sitemap 0.90:
      - XML
      - A list of URLs
      - Used for URL discovery
      - Robot-friendly
      - At most 10 MB or 50,000 URLs per file
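Because of these per-file limits, a large site has to split its URL list across several sitemap files. A minimal Python sketch of that splitting, assuming the limits apply to the uncompressed UTF-8 output (10 MB taken as 10,485,760 bytes) and emitting only `<loc>`; `url_entry` and `split_sitemaps` are illustrative names, not part of any library:

```python
from xml.sax.saxutils import escape

# Sitemap 0.90 per-file limits (uncompressed): 50,000 URLs and 10 MB.
MAX_URLS = 50_000
MAX_BYTES = 10 * 1024 * 1024  # 10,485,760 bytes

HEADER = ('<?xml version="1.0" encoding="UTF-8"?>\n'
          '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
FOOTER = "</urlset>\n"


def url_entry(loc: str) -> str:
    """Render one <url> element, XML-escaping characters such as '&'."""
    return f"  <url><loc>{escape(loc)}</loc></url>\n"


def split_sitemaps(urls, max_urls=MAX_URLS, max_bytes=MAX_BYTES):
    """Yield complete sitemap documents that each respect both limits."""
    overhead = len((HEADER + FOOTER).encode("utf-8"))
    entries, size = [], overhead
    for loc in urls:
        entry = url_entry(loc)
        entry_size = len(entry.encode("utf-8"))
        # Start a new file if adding this entry would break either limit.
        if entries and (len(entries) >= max_urls
                        or size + entry_size > max_bytes):
            yield HEADER + "".join(entries) + FOOTER
            entries, size = [], overhead
        entries.append(entry)
        size += entry_size
    if entries:
        yield HEADER + "".join(entries) + FOOTER


urls = [f"http://www.example.com/page{i}.html" for i in range(120_000)]
files = list(split_sitemaps(urls))
print(len(files), [f.count("<url>") for f in files])  # 3 [50000, 50000, 20000]
```

Each yielded document would be saved as its own file and listed in a sitemap index file, which is the protocol's mechanism for multi-file sitemaps. Note that a robot still has to fetch every file to see every URL.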
  4. Excerpt from google.com's sitemap:
      ```xml
      <?xml version="1.0" encoding="UTF-8"?>
      <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
        <url><loc>http://www.google.com/</loc><priority>1.000</priority></url>
        <url><loc>http://www.google.com/3dwh_dmca.html</loc><priority>0.5000</priority></url>
        <url><loc>http://www.google.com/a</loc><priority>0.5000</priority></url>
        <url><loc>http://www.google.com/a/cpanel/domain</loc><priority>0.5000</priority></url>
        <url><loc>http://www.google.com/a/edu/</loc><priority>0.5000</priority></url>
        <url><loc>http://www.google.com/a/help/intl/en/admins/new.html</loc><priority>0.5000</priority></url>
        <url><loc>http://www.google.com/a/help/intl/en/admins/overview.html</loc><priority>0.5000</priority></url>
        <url><loc>http://www.google.com/a/help/intl/en/admins/privacy.html</loc><priority>0.5000</priority></url>
        <url><loc>http://www.google.com/a/help/intl/en/admins/program_policies.html</loc><priority>0.5000</priority></url>
        <url><loc>http://www.google.com/a/help/intl/en/admins/seminars.html</loc><priority>0.5000</priority></url>
        <url><loc>http://www.google.com/a/help/intl/en/admins/terms.html</loc><priority>0.5000</priority></url>
        <url><loc>http://www.google.com/a/help/intl/en/admins/testimonials.html</loc><priority>0.5000</priority></url>
        <url><loc>http://www.google.com/a/help/intl/en/admins/tour.html</loc><priority>0.5000</priority></url>
        <url><loc>http://www.google.com/a/help/intl/en/edu/administration.html</loc><priority>0.5000</priority></url>
        <url><loc>http://www.google.com/a/help/intl/en/edu/benefits.html</loc><priority>0.5000</priority></url>
        <url><loc>http://www.google.com/a/help/intl/en/edu/calendar.html</loc><priority>0.5000</priority></url>
        <url><loc>http://www.google.com/a/help/intl/en/edu/customers/asu.html</loc><priority>0.5000</priority></url>
        <url><loc>http://www.google.com/a/help/intl/en/edu/customers/pdfs/asu_success_story.pdf</loc><priority>0.5000</priority></url>
        <url><loc>http://www.google.com/a/help/intl/en/edu/details.html</loc><priority>0.5000</priority></url>
        <url><loc>http://www.google.com/a/help/intl/en/edu/features.html</loc><priority>0.5000</priority></url>
        <url><loc>http://www.google.com/a/help/intl/en/edu/gmail.html</loc><priority>0.5000</priority></url>
        <url><loc>http://www.google.com/a/help/intl/en/edu/pagecreator.html</loc><priority>0.5000</priority></url>
        <url><loc>http://www.google.com/a/help/intl/en/edu/seminars.html</loc><priority>0.5000</priority></url>
        <url><loc>http://www.google.com/a/help/intl/en/edu/startpage.html</loc><priority>0.5000</priority></url>
        <url><loc>http://www.google.com/a/help/intl/en/edu/talk.html</loc><priority>0.5000</priority></url>
      ```
      - Messy
      - Huge (google.com's is 3.9 MB)
      - Useless for humans
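The XML above serves robots rather than people: a client has to parse the whole document just to enumerate its URLs. A minimal sketch of that robot-side work using Python's standard `xml.etree.ElementTree`, assuming the file fits in memory; a missing `<priority>` defaults to 0.5 as in the protocol, and `parse_sitemap` is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(data: bytes):
    """Return [{'loc', 'priority', 'lastmod'}, ...] from a Sitemap 0.90 document."""
    root = ET.fromstring(data)
    entries = []
    for url in root.findall("sm:url", NS):
        entries.append({
            "loc": url.findtext("sm:loc", default="", namespaces=NS).strip(),
            # The protocol's default priority is 0.5 when the tag is absent.
            "priority": float(url.findtext("sm:priority", default="0.5",
                                           namespaces=NS)),
            # None when the optional <lastmod> tag is absent.
            "lastmod": url.findtext("sm:lastmod", namespaces=NS),
        })
    return entries


sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.google.com/</loc><priority>1.000</priority></url>
  <url><loc>http://www.google.com/a</loc><priority>0.5000</priority></url>
</urlset>"""

for entry in parse_sitemap(sample):
    print(entry["loc"], entry["priority"])
```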
  5. Improvements
  6. Aims
      - For robots:
        - Faster
        - More efficient
      - For humans:
        - More useful
        - At least readable by a human's web client, i.e. the browser
        - A browser already spends about 5 KB of bandwidth downloading favicons. Why not spend that bandwidth on more useful material?
  7. Hierarchical
      - Site map
      - Parent page
      - Sibling pages
      - Child pages
      - Parseable by web browsers
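One way to derive these relationships from an ordinary flat URL list is to assume that the URL path mirrors the site hierarchy. That assumption is ours, not part of the proposal or of Sitemap 0.90; the sketch below uses hypothetical helpers (`path_of`, `parent_path`, `neighbours`) to find a page's parent, siblings and children by path:

```python
from typing import Optional
from urllib.parse import urlsplit


def path_of(url: str) -> str:
    """URL path without a trailing slash; the site root becomes '/'."""
    return urlsplit(url).path.rstrip("/") or "/"


def parent_path(path: str) -> Optional[str]:
    """'/a/edu' -> '/a', '/a' -> '/', '/' -> None."""
    if path == "/":
        return None
    return path.rsplit("/", 1)[0] or "/"


def neighbours(urls, page):
    """Parent, siblings and children of `page`, assuming the URL path
    mirrors the site hierarchy (an assumption, not part of Sitemap 0.90)."""
    by_path = {path_of(u): u for u in urls}
    me = path_of(page)
    up = parent_path(me)
    others = [(p, u) for p, u in by_path.items() if p != me]
    return {
        "parent": by_path.get(up) if up else None,
        "siblings": sorted(u for p, u in others
                           if up is not None and parent_path(p) == up),
        "children": sorted(u for p, u in others if parent_path(p) == me),
    }


urls = [
    "http://www.google.com/",
    "http://www.google.com/3dwh_dmca.html",
    "http://www.google.com/a",
    "http://www.google.com/a/cpanel/domain",
    "http://www.google.com/a/edu/",
]
print(neighbours(urls, "http://www.google.com/a"))
```

Under this rule a page whose parent path is not itself listed (such as `/a/cpanel/domain` when `/a/cpanel` is absent) simply has no parent, so a format built on it would need some other signal, such as an explicit parent entry.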
  8. Hierarchical: the browser can tell users where they are within the site.
  9. Chronological
      - `<lastmod>` is already in Sitemap 0.90
      - But entries are not sorted by it
      - Present the sitemap in chronological order
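Sorting by `<lastmod>` is simple once the values are parsed as times rather than compared as strings, because W3C Datetime values may carry different UTC offsets. A minimal sketch, assuming entries are dicts with an optional `"lastmod"` string in either `YYYY-MM-DD` or full date-time form (the coarser `YYYY` and `YYYY-MM` forms are not handled here); the helper names are hypothetical:

```python
from datetime import datetime, timezone


def parse_lastmod(value: str) -> datetime:
    """Parse 'YYYY-MM-DD' or a full W3C date-time into an aware datetime.
    Date-only values are assumed to mean midnight UTC."""
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def chronological(entries):
    """Newest first by <lastmod>; entries without one go last."""
    dated = [e for e in entries if e.get("lastmod")]
    undated = [e for e in entries if not e.get("lastmod")]
    dated.sort(key=lambda e: parse_lastmod(e["lastmod"]), reverse=True)
    return dated + undated


entries = [
    {"loc": "http://www.example.com/about.html", "lastmod": "2009-06-01"},
    {"loc": "http://www.example.com/news.html",
     "lastmod": "2009-12-12T09:00:00+08:00"},
    {"loc": "http://www.example.com/contact.html"},
    {"loc": "http://www.example.com/blog/", "lastmod": "2009-12-12T02:00:00Z"},
]
# blog/ sorts before news.html: 02:00Z is later than 09:00+08:00 (01:00Z),
# even though a plain string comparison would put news.html first.
for e in chronological(entries):
    print(e["loc"])
```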
  10. Chronological: the browser showing newly updated pages.
  11. Chronological
      - Robots:
        - Do not have to download huge sitemap files every time
        - Only download the first few chunks
      - Browsers:
        - Easily tell surfers where newly updated content is located
        - Unlike RSS, not limited to blogs or blog-like sites
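If a chronological sitemap were published newest-first in chunks, a robot could stop as soon as it reached an entry no newer than its previous crawl. A sketch of that crawl loop; the slides do not specify a chunk format, so `fetch_chunk` is a hypothetical stand-in for however chunks would be served, and `changed_since` is an illustrative name:

```python
from datetime import datetime, timezone


def changed_since(fetch_chunk, last_crawl):
    """Return (URLs modified after last_crawl, number of chunks fetched).

    fetch_chunk(i) is a hypothetical callable returning chunk i of a
    newest-first sitemap as a list of (loc, lastmod) pairs, or None when
    there are no more chunks.
    """
    changed, fetched = [], 0
    while True:
        chunk = fetch_chunk(fetched)
        if chunk is None:
            return changed, fetched
        fetched += 1
        for loc, lastmod in chunk:
            if lastmod <= last_crawl:
                # Everything from here on is older: stop downloading.
                return changed, fetched
            changed.append(loc)


def utc(*args):
    return datetime(*args, tzinfo=timezone.utc)


chunks = [
    [("http://www.example.com/c", utc(2009, 12, 12)),
     ("http://www.example.com/b", utc(2009, 12, 10))],
    [("http://www.example.com/a", utc(2009, 11, 1)),
     ("http://www.example.com/z", utc(2009, 1, 1))],
    [("http://www.example.com/y", utc(2008, 1, 1))],
]
fetch = lambda i: chunks[i] if i < len(chunks) else None
print(changed_since(fetch, utc(2009, 12, 5)))  # the third chunk is never requested
```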
  12. More Efficient (Draft)
      - Multiple versions:
        - Chronological
          - Robots do not have to download the whole sitemap for each crawl
        - Hierarchical
      - Seekable:
        - With a header index
        - Only download the needed portions
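The draft leaves the seekable format open. As one illustration only, here is a hypothetical layout: a single JSON header line mapping each section to a byte offset and length, followed by the sections themselves, so a reader loads the header and then seeks straight to the part it needs. The section names, the JSON header and the one-URL-per-line body are all assumptions, as are the helper names:

```python
import io
import json


def write_indexed(sections):
    """Serialize {name: [url, ...]} as one JSON header line mapping each
    name to [offset, length] (bytes, relative to the end of the header),
    followed by each section's URLs, one per line."""
    index, body, offset = {}, [], 0
    for name, urls in sections.items():
        blob = "".join(u + "\n" for u in urls).encode("utf-8")
        index[name] = [offset, len(blob)]
        body.append(blob)
        offset += len(blob)
    header = (json.dumps(index) + "\n").encode("utf-8")
    return header + b"".join(body)


def read_section(f, name):
    """Read the header, then seek straight to one section and read only it."""
    f.seek(0)
    header = f.readline()
    offset, length = json.loads(header)[name]
    f.seek(len(header) + offset)
    return f.read(length).decode("utf-8").splitlines()


data = write_indexed({
    "admins": ["http://www.google.com/a/help/intl/en/admins/new.html",
               "http://www.google.com/a/help/intl/en/admins/tour.html"],
    "edu": ["http://www.google.com/a/help/intl/en/edu/gmail.html"],
})
print(read_section(io.BytesIO(data), "edu"))
```

Over HTTP, the same two reads could in principle be made with `Range` requests, provided the server supports them.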
  13. More Efficient (Draft)
      - Smarter:
        - Each page serves a sitemap based on where the client/user is in the site
        - No need to download the whole sitemap
        - No need to parse the whole sitemap
        - Keeps the file size small (approx. 5 KB) so browsers load it quickly
      - Switch away from XML?
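A per-page sitemap only needs the page's immediate neighbourhood, which makes a small size budget realistic. This sketch assumes JSON as one possible non-XML alternative and treats the approx. 5 KB figure as a hard budget of 5,120 bytes; when a neighbourhood is too large, it drops entries from the longer of the sibling and child lists and marks the result as truncated. `page_sitemap` and its field names are hypothetical:

```python
import json

BUDGET = 5 * 1024  # the slide's "approx. 5KB", taken here as a hard limit


def page_sitemap(page, parent, siblings, children, budget=BUDGET):
    """Compact JSON sitemap for one page: itself, its parent, siblings and
    children. Entries are dropped from the longer list until it fits."""
    doc = {"self": page, "parent": parent, "siblings": list(siblings),
           "children": list(children), "truncated": False}

    def encoded():
        return json.dumps(doc, separators=(",", ":")).encode("utf-8")

    while len(encoded()) > budget and (doc["siblings"] or doc["children"]):
        key = ("siblings" if len(doc["siblings"]) >= len(doc["children"])
               else "children")
        doc[key].pop()
        doc["truncated"] = True
    return encoded().decode("utf-8")


print(page_sitemap("http://www.google.com/a", "http://www.google.com/",
                   ["http://www.google.com/3dwh_dmca.html"],
                   ["http://www.google.com/a/edu/"]))
```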
  14. Project Summary: Better Sitemap
      - For robots and humans alike
      - Chronological
      - Hierarchical
      - Seekable
      - Smarter
