Web crawling and database tables
• We want to crawl/scrape web
pages and extract the relevant
content to build standardized
database tables.
What can we use?
Google Search Tools
* Google uses structured data that it finds on the web to
understand the content of the page, as well as to gather
information about the web and the world in general.
* Structured data is a standardized format for providing
information about a page and classifying the page content;
for example, on a recipe page, what are the ingredients,
the cooking time and temperature, the calories, and so on.
Google search tools
• https://schema.org
Schema.org is a collaboration
between Google, Microsoft,
Yahoo! and Yandex, large search
engines that use this marked-up
data from web pages.
* schema.org provides a standard
vocabulary of types, properties and
descriptions for structured data
tags.
• The Google Structured Data
Testing Tool is an easy and
useful tool for validating your
structured data, and in some
cases, previewing a feature in
Google Search.
https://search.google.com/structured-data/testing-tool/
@type
@id
url
name
image
dateModified
totalTime
recipeYield
recipeIngredient
recipeInstructions
recipeCategory
keywords
recipeCuisine
cookTime
prepTime
"recipeIngredient": [
  "1 (15 ounce) package double crust ready-to-use pie crust",
  "6 cups thinly sliced, peeled apples (6 medium)",
  "3/4 cup sugar",
  "2 tablespoons all-purpose flour",
  "3/4 teaspoon ground cinnamon",
  "1/4 teaspoon salt",
  "1/8 teaspoon ground nutmeg",
  "1 tablespoon lemon juice"
]
These are examples of the structured data format and properties
for a recipe.
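A minimal sketch of reading such a property once the structured data has been obtained as JSON text. The payload below is hypothetical: the `@context`, `@type` and `name` values are assumptions added for illustration, and only the ingredient strings come from the example above.

```python
import json

# Hypothetical JSON-LD payload; only the ingredient strings come from the slide.
recipe_jsonld = """
{
  "@context": "https://schema.org",
  "@type": "Recipe",
  "name": "Apple pie",
  "recipeIngredient": [
    "3/4 cup sugar",
    "2 tablespoons all-purpose flour",
    "1 tablespoon lemon juice"
  ]
}
"""

recipe = json.loads(recipe_jsonld)

# .get() with a default keeps the code working when a site omits the property.
ingredients = recipe.get("recipeIngredient", [])
for ingredient in ingredients:
    print(ingredient)
```

Each property in the list above can be read the same way, which is what makes this format convenient for filling database columns.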
Inspecting source code with the Google Structured Data Testing Tool
from the structured data point of view
• Google search results for 'yemek tarif' (Turkish for 'food recipe').
Websites on the first page (03.03.2020, 14:00):
1. Yemek.com
2. Lezzet.com.tr
3. Refikaninmutfagi.com
4. Nefisyemektarifleri.com
Inspecting the web page's source code
** A common issue with yemek.com, nefisyemektarifleri.com and lezzet.com.tr: there is
no match on the main page unless the (JavaScript) code is run first.
In the page source (Ctrl+F):
https://yemek.com/ // no match for 'recipeIngredient'
https://yemek.com/tarif/narenciyeli-hashasli-kek/ // match for 'recipeIngredient'
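The manual Ctrl+F check can be automated. The sketch below uses only the Python standard library and assumes the structured data is embedded as JSON-LD in a `<script type="application/ld+json">` tag, which the slides do not confirm for these sites. The class and function names and the two sample pages are hypothetical. It does not handle `@graph` wrappers or a list-valued `@type`.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def find_recipes(html):
    """Return every top-level JSON-LD object whose @type is "Recipe"."""
    collector = JsonLdCollector()
    collector.feed(html)
    recipes = []
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the crawl
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and item.get("@type") == "Recipe":
                recipes.append(item)
    return recipes


# Hypothetical stand-ins for a home page and a recipe page.
home_page = "<html><head><title>Home</title></head><body></body></html>"
recipe_page = (
    '<html><head><script type="application/ld+json">'
    '{"@type": "Recipe", "name": "Cake", "recipeIngredient": ["2 eggs"]}'
    "</script></head><body></body></html>"
)

print(len(find_recipes(home_page)))
print(len(find_recipes(recipe_page)))
```

A page that yields no recipes here is either a non-recipe page, like the home page, or one whose data is only produced after scripts run.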
Website                   Useful structured data
Yemek.com                 +
Lezzet.com.tr             +
Nefisyemektarifleri.com   +
Refikaninmutfagi.com      -
** yemek.com, nefisyemektarifleri.com and lezzet.com.tr have useful structured
data.
We crawl/scrape these sites with the same settings and send the output to a JSON file, a CSV file or a
database.
** refikaninmutfagi.com does not have useful structured data. We set up a site-specific
crawl format for it.
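One way the "same settings, then a CSV file or database" step could look, sketched with the standard library only. The column choice, the `to_row` helper and the sample records are assumptions for illustration, not data taken from the sites.

```python
import csv
import io
import json
import sqlite3

# Assumed column choice for the standardized table.
COLUMNS = ["site", "name", "recipeIngredient", "cookTime"]


def to_row(site, recipe):
    """Flatten one schema.org Recipe dict into the fixed set of columns."""
    return {
        "site": site,
        "name": recipe.get("name", ""),
        # Lists are stored as JSON text so they fit in a single column.
        "recipeIngredient": json.dumps(
            recipe.get("recipeIngredient", []), ensure_ascii=False
        ),
        "cookTime": recipe.get("cookTime", ""),
    }


# Hypothetical scraped records.
rows = [
    to_row("yemek.com", {"name": "Kek", "recipeIngredient": ["2 eggs"], "cookTime": "PT40M"}),
    to_row("lezzet.com.tr", {"name": "Corba", "recipeIngredient": ["1 onion"]}),
]

# CSV output, written to an in-memory buffer; a real file works the same way.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
writer.writeheader()
writer.writerows(rows)

# Database output, using an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recipes (site TEXT, name TEXT, recipeIngredient TEXT, cookTime TEXT)"
)
conn.executemany(
    "INSERT INTO recipes VALUES (:site, :name, :recipeIngredient, :cookTime)", rows
)
count = conn.execute("SELECT COUNT(*) FROM recipes").fetchone()[0]
print(count)
```

Because every site is flattened by the same function, a missing property becomes an empty cell rather than a different table layout.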
yemek.com           lezzet.com.tr       nefisyemektarifleri.com  refikaninmutfagi.com
@type               @type               @type                    @type
@id                 name                @id                      @id
url                 image               url                      url
name                description         mainEntityOfPage         inLanguage
image               recipeYield         name
image               recipeIngredient    name                     datePublished
image               recipeInstructions  headline                 dateModified
dateModified        prepTime            description              description
totalTime           cookTime            datePublished            isPartOf
recipeYield         author              dateModified
recipeIngredient    aggregateRating     url
recipeInstructions  keywords            mainEntityOfPage
recipeCategory      nutrition           recipeYield
keywords            recipeCategory      prepTime
recipeCuisine       recipeCuisine       cookTime
cookTime            video               totalTime
prepTime                                recipeIngredient
description                             ingredients
author                                  recipeInstructions
aggregateRating                         author
nutrition                               aggregateRating
                                        keywords
                                        nutrition
                                        recipeCategory
                                        recipeCuisine
                                        video
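To pick the columns of a shared table, it can help to compute which properties all three usable sites expose. The sketch below transcribes the property names from the comparison above into sets; the column alignment is a best-effort reading of that comparison, and the variable names are hypothetical.

```python
# Property names per site, transcribed from the comparison above
# (duplicates collapse because these are sets).
site_properties = {
    "yemek.com": {
        "@type", "@id", "url", "name", "image", "dateModified", "totalTime",
        "recipeYield", "recipeIngredient", "recipeInstructions",
        "recipeCategory", "keywords", "recipeCuisine", "cookTime", "prepTime",
        "description", "author", "aggregateRating", "nutrition",
    },
    "lezzet.com.tr": {
        "@type", "name", "image", "description", "recipeYield",
        "recipeIngredient", "recipeInstructions", "prepTime", "cookTime",
        "author", "aggregateRating", "keywords", "nutrition",
        "recipeCategory", "recipeCuisine", "video",
    },
    "nefisyemektarifleri.com": {
        "@type", "@id", "url", "mainEntityOfPage", "name", "headline",
        "description", "datePublished", "dateModified", "recipeYield",
        "prepTime", "cookTime", "totalTime", "recipeIngredient", "ingredients",
        "recipeInstructions", "author", "aggregateRating", "keywords",
        "nutrition", "recipeCategory", "recipeCuisine", "video",
    },
}

# Properties present on every site are safe candidates for shared columns.
common = set.intersection(*site_properties.values())
print(sorted(common))

# Properties found on only some sites need a nullable column or a default.
optional = set.union(*site_properties.values()) - common
print(sorted(optional))
```

Under this reading, recipe-specific properties such as `recipeIngredient` and `cookTime` are shared, while `@id` and `video` are not available everywhere.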
• We extract (schema.org) microdata using Scrapy.
https://blog.scrapinghub.com/2014/06/18/extracting-schema-org-microdata-using-scrapy-selectors-and-xpath
* Alternative ways to scrape websites (Schema.org microdata, JSON-LD,
internal JavaScript variables, and XHRs).
https://blog.apify.com/web-scraping-in-2018-forget-html-use-xhrs-metadata-or-javascript-variables-8167f252439c
• End-to-end Scrapy tutorial, parts I-IV (September 2019).
https://towardsdatascience.com/a-minimalist-end-to-end-scrapy-tutorial-part-i-11e350bcdec0
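For readers without Scrapy installed, the idea behind microdata extraction can be sketched with the standard library's `html.parser` in place of Scrapy selectors and XPath. This is a simplified stand-in, not the approach from the linked posts: it reads `itemprop` values from `content`, `src` or `href` attributes, or from the element's text, and ignores nested items. The class name and sample markup are hypothetical.

```python
from html.parser import HTMLParser


class ItempropCollector(HTMLParser):
    """Collect itemprop values from schema.org microdata markup."""

    def __init__(self):
        super().__init__()
        self.properties = {}   # itemprop name -> list of values
        self._current = None   # itemprop whose text is being captured
        self._tag = None       # tag name that opened the capture
        self._depth = 0        # nesting depth of same-named tags
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._current is not None:
            # Inside a captured element: only track nesting of the same tag.
            if tag == self._tag:
                self._depth += 1
            return
        prop = attrs.get("itemprop")
        if prop is None:
            return
        # meta, img, a and link carry their value in an attribute.
        for key in ("content", "src", "href"):
            if key in attrs:
                self.properties.setdefault(prop, []).append(attrs[key])
                return
        self._current, self._tag, self._depth, self._text = prop, tag, 1, []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._current is not None and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace in the captured text.
                value = " ".join("".join(self._text).split())
                self.properties.setdefault(self._current, []).append(value)
                self._current = None


# Hypothetical microdata markup for a recipe.
sample = """
<div itemscope itemtype="https://schema.org/Recipe">
  <h1 itemprop="name">Apple pie</h1>
  <meta itemprop="cookTime" content="PT45M">
  <ul>
    <li itemprop="recipeIngredient">3/4 cup sugar</li>
    <li itemprop="recipeIngredient">1 tablespoon lemon juice</li>
  </ul>
</div>
"""

collector = ItempropCollector()
collector.feed(sample)
print(collector.properties)
```

In a Scrapy spider the same lookup would typically be a selector query on `itemprop` attributes; the parsing logic here only illustrates what such a query returns.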
