Hadoop Based SQL and Big Data Analytics
Solution
Hadoop Data Tagging and
Metadata Extension
What is MetaData?
• Metadata is simply “data about data”.
• In a file system, metadata is information about a file, such as its size,
creation time, last-modified time, type, and owner.
• The file system manages access to both the content of files and the metadata
about those files.
• Metadata characterizes data. It provides documentation so that data can be
understood and more readily consumed by your organization. Metadata answers
the who, what, when, where, why, and how questions for users of the data.
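As a rough illustration of the file-system metadata listed above, the sketch below reads the size, modification time, owner id, and type of a file. It uses Python's `os.stat` on the local file system rather than HDFS, and the `file_metadata` helper and its field names are assumptions made for this example only.

```python
import os
import tempfile
import time


def file_metadata(path):
    """Return basic file-system metadata for `path` as a plain dict."""
    st = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": st.st_size,
        # Last-modified time, formatted as local time for readability.
        "modified": time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(st.st_mtime)),
        # Numeric owner id as reported by the operating system.
        "owner_uid": st.st_uid,
        # File type guessed from the extension only (a simplification).
        "type": os.path.splitext(path)[1].lstrip(".") or "unknown",
    }


# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"hello metadata\n")
    sample_path = tmp.name

info = file_metadata(sample_path)
print(info)
os.remove(sample_path)
```

None of these values come from the file's content; they describe the file itself, which is what makes them metadata.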
MetaData Extension with QueryIO
• QueryIO provides an on-ingest metadata extraction service: extended metadata
is extracted from files as they are ingested, so you don't need to run costly
batch jobs later. This makes unstructured data on the cluster searchable as
soon as it is ingested.
• To help make unstructured data searchable on a Hadoop cluster, QueryIO stores
the metadata for each file stored on Hadoop in a relational database.
• Since all the metadata and tags associated with a file are kept in a relational
database, you can leverage the existing infrastructure built around SQL to search
the data on the Hadoop cluster.
• It understands dozens of file formats, including PDF, XLS, and DOC documents,
image files, and audio and video files.
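To show why keeping file metadata in a relational database makes it searchable with ordinary SQL, here is a minimal sketch using Python's built-in `sqlite3` module. The `file_metadata` table, its columns, and the sample rows are hypothetical and are not QueryIO's actual schema.

```python
import sqlite3

# In-memory database standing in for the relational metadata store.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE file_metadata (
        path       TEXT PRIMARY KEY,  -- hypothetical HDFS path
        file_type  TEXT,
        size_bytes INTEGER,
        owner      TEXT
    )
    """
)

# Hypothetical rows, as if extracted when each file was ingested.
rows = [
    ("/data/reports/q1.pdf", "pdf", 120000, "alice"),
    ("/data/reports/q2.pdf", "pdf", 450, "bob"),
    ("/data/media/intro.mp4", "mp4", 9000000, "alice"),
]
conn.executemany("INSERT INTO file_metadata VALUES (?, ?, ?, ?)", rows)

# Plain SQL finds files by metadata without scanning the files themselves.
large_pdfs = [
    row[0]
    for row in conn.execute(
        "SELECT path FROM file_metadata "
        "WHERE file_type = ? AND size_bytes > ? ORDER BY path",
        ("pdf", 1000),
    )
]
print(large_pdfs)
```

The query touches only the metadata table, so the cost of a search does not depend on the size of the files it describes.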
What are Data Tags?
• A tag is a label attached to someone or something for identification or to
give other information.
• A data tag is a tag attached to data or a file to provide extra information
about it.
• Data tags can be used to categorize data by various criteria, which helps
manage vast amounts of data. The data can then be extracted, sorted, and
processed based on these categories.
• Adding data tags to data, either conditionally or unconditionally, is called
data tagging.
Data Tagging with QueryIO
• QueryIO provides advanced manual and automated data tagging features that
allow you to define properties for files as they are being written to HDFS. It
automatically stores the basic metadata of files stored in HDFS and further
extends the metadata layer by enabling you to define additional metadata (data
tags).
• Here again, the tags defined for all the files on the cluster are stored in a
relational database. QueryIO keeps the metadata and tags stored in the database
in sync with the files stored on the Hadoop cluster.
• Data tagging lets you define the data tag and operator that should be applied
to files on the cluster. You can define data tags using a table you have
already created with Hive DDL, or choose the system-defined MetaStore tables for
different file formats. You can also provide expressions to apply tags
conditionally, either on ingest or at a scheduled time.
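One way to picture tags that stay in sync with the files they describe is a tags table whose rows are removed automatically when the file's row is removed. The sketch below uses `sqlite3` with a foreign key and `ON DELETE CASCADE`. The table names, columns, and the cascade approach are assumptions for illustration; the slides do not say how QueryIO implements synchronization.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE tags (
        path      TEXT NOT NULL REFERENCES files(path) ON DELETE CASCADE,
        tag_name  TEXT NOT NULL,
        tag_value TEXT,
        PRIMARY KEY (path, tag_name)
    )
    """
)

# Two hypothetical files, each carrying one tag.
conn.executemany(
    "INSERT INTO files VALUES (?)", [("/data/a.csv",), ("/data/b.csv",)]
)
conn.executemany(
    "INSERT INTO tags VALUES (?, ?, ?)",
    [
        ("/data/a.csv", "department", "sales"),
        ("/data/b.csv", "department", "hr"),
    ],
)

# Removing a file's row also removes its tags, so no orphan tags remain.
conn.execute("DELETE FROM files WHERE path = ?", ("/data/b.csv",))
remaining = conn.execute(
    "SELECT path, tag_name, tag_value FROM tags ORDER BY path"
).fetchall()
print(remaining)
```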
Unconditional Data Tagging
• QueryIO provides both conditional and unconditional tagging. With
unconditional tagging, the user can tag hand-picked files or all files in a
particular folder.
• All the user needs to do is open the HDFS data browser, choose the files to
tag, and click the “add tag” button.
• Unconditional tagging is useful when you want to tag files whose HDFS
location is already known to you and the tagging does not depend on other
file attributes such as file type or file length.
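Unconditional tagging, as described above, amounts to attaching the same tag to a known set of paths regardless of their attributes. The sketch below models this with a plain dictionary. The `tag_files` and `files_in_folder` helpers, the paths, and the tag names are hypothetical, and the code does not use QueryIO's data browser or API.

```python
import posixpath


def files_in_folder(all_paths, folder):
    """Return the paths that sit directly inside `folder`."""
    folder = folder.rstrip("/") or "/"
    return [p for p in all_paths if posixpath.dirname(p) == folder]


def tag_files(tags, paths, key, value):
    """Attach `key=value` to every path, with no condition on the file."""
    for path in paths:
        tags.setdefault(path, {})[key] = value


# Hypothetical HDFS-style paths.
all_paths = ["/logs/2014/a.log", "/logs/2014/b.log", "/docs/readme.txt"]
tags = {}

# Tag every file in one folder...
tag_files(tags, files_in_folder(all_paths, "/logs/2014"), "project", "audit")
# ...and one hand-picked file.
tag_files(tags, ["/docs/readme.txt"], "reviewed", "yes")

print(tags)
```

Only the location of the files matters here; size, type, and content are never inspected.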
Conditional Data Tagging
• Conditions can be defined using the values of file attributes (e.g., if
Length > 1000) or by parsing the content of the file (e.g., if NumberOfLines > 100).
• The tag value can also be obtained by parsing the content of the file (e.g.,
NumberOfWords, ifWord(“QueryIO”)Exist, ifPatternExists).
• Conditional data tags can be added to chosen file types or to all files
present on the HDFS cluster.
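The conditions above can be sketched as a small rule function that derives tags from a file's length and content. The thresholds mirror the examples on the slide (Length > 1000, NumberOfLines > 100), but the function name, the tag names, and the use of plain Python instead of QueryIO's expression syntax are assumptions for this example.

```python
def conditional_tags(length, content):
    """Derive tags from a file attribute (length) and from parsed content."""
    lines = content.splitlines()
    words = content.split()
    tags = {}
    # Condition on a file attribute.
    if length > 1000:
        tags["SizeClass"] = "large"
    # Condition on parsed content.
    if len(lines) > 100:
        tags["LongFile"] = "true"
    # Tag values computed from the content itself.
    tags["NumberOfWords"] = len(words)
    tags["MentionsQueryIO"] = "QueryIO" in words
    return tags


small = "QueryIO stores tags\nin a database\n"
small_tags = conditional_tags(len(small.encode("utf-8")), small)

big = "x\n" * 600
big_tags = conditional_tags(len(big.encode("utf-8")), big)

print(small_tags)
print(big_tags)
```

The small file receives only the computed tags, while the large one also picks up both conditional tags because it crosses both thresholds.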
Download QueryIO Now!
http://QueryIO.com/download/big-data-analytics-download.html
OR
Take a Demo
http://demo.QueryIO.com/queryio
“It's Free”

Hadoop Data Tagging and Metadata Extension

  • 1. Hadoop Based SQL and Big Data Analytics Solution
  • 2. Hadoop Data Tagging and Metadata Extension
  • 3-7. What is Metadata?
      • Metadata is simply “data about data”.
      • In a file system, metadata is the information about a file, such as its size, the time it was created, the time it was last modified, its type, and its owner.
      • The file system manages access to both the content of files and the metadata about those files.
      • Metadata characterizes data. It provides documentation so that data can be understood and more readily consumed by your organization. Metadata answers the who, what, when, where, why, and how questions for users of the data.
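As a minimal sketch of the file system metadata described above, the following Python reads a file's size, modification time, owner, and type with the standard `os.stat` call. The function name `file_metadata` and the dictionary keys are illustrative choices, not part of QueryIO or HDFS.

```python
import os
from datetime import datetime, timezone


def file_metadata(path):
    """Return basic file system metadata for `path` as a plain dict."""
    st = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": st.st_size,
        # Last-modified time, converted to an aware UTC datetime.
        "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
        # Numeric owner id as reported by the operating system.
        "owner_uid": st.st_uid,
        # File type approximated by the lower-cased extension.
        "extension": os.path.splitext(path)[1].lower(),
    }
```

Calling `file_metadata("report.pdf")` on an existing file returns the "data about data" for that file without reading its content.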
  • 8-12. Metadata Extension with QueryIO
      • QueryIO provides an on-ingest metadata extraction service: extended metadata is extracted from files as they are ingested, so you do not need to run costly batch jobs later. This makes unstructured data on the cluster searchable as soon as it is ingested.
      • To help make unstructured data searchable on the Hadoop cluster, QueryIO stores the metadata for each file stored on Hadoop in a relational database.
      • Since all the metadata and tags associated with a file are kept in a relational database, you can leverage the existing infrastructure built around SQL to search the data on the Hadoop cluster.
      • QueryIO understands dozens of file formats, including PDF, XLS, and DOC documents, image files, and audio and video files.
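To illustrate why keeping metadata in a relational database makes files searchable with SQL, here is a small sketch using Python's built-in `sqlite3` module. The slides do not show QueryIO's actual schema, so the `file_metadata` table, its columns, and the sample rows are all hypothetical.

```python
import sqlite3


def build_demo_store():
    """Create an in-memory metadata table; the schema is hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE file_metadata ("
        " path TEXT PRIMARY KEY,"
        " file_type TEXT,"
        " size_bytes INTEGER,"
        " author TEXT)"
    )
    # Made-up rows standing in for metadata extracted on ingest.
    rows = [
        ("/data/reports/q1.pdf", "pdf", 120000, "alice"),
        ("/data/reports/q2.pdf", "pdf", 98000, "bob"),
        ("/data/images/logo.png", "png", 4500, "alice"),
    ]
    conn.executemany("INSERT INTO file_metadata VALUES (?, ?, ?, ?)", rows)
    return conn


def find_paths(conn, file_type, min_size):
    """Return sorted paths of files of `file_type` with at least `min_size` bytes."""
    cur = conn.execute(
        "SELECT path FROM file_metadata"
        " WHERE file_type = ? AND size_bytes >= ? ORDER BY path",
        (file_type, min_size),
    )
    return [row[0] for row in cur]
```

The query never touches file content: the search runs entirely against the metadata rows, which is what allows ordinary SQL tooling to locate files on the cluster.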
  • 13-18. What are Data Tags?
      • A tag is a label attached to someone or something for the purpose of identification or to give other information.
      • A data tag is a tag attached to data or a file to provide extra information about that data or file.
      • Data tags can be used to categorize data by various criteria, which helps in managing vast amounts of data. The data can then be extracted, sorted, and processed based on these categories.
      • Adding data tags to data, either based on some condition or unconditionally, is called data tagging.
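The categorization idea above can be sketched with plain dictionaries: each file path maps to its tags, and files are grouped by the value of one tag. The tag names and paths used with this function are invented for illustration.

```python
from collections import defaultdict


def group_by_tag(file_tags, tag_name):
    """Group file paths by the value of `tag_name`; untagged files are skipped."""
    groups = defaultdict(list)
    for path, tags in file_tags.items():
        if tag_name in tags:
            groups[tags[tag_name]].append(path)
    # Sort each group so the result is deterministic.
    return {value: sorted(paths) for value, paths in groups.items()}
```

Once files are grouped this way, each category can be extracted, sorted, or processed separately.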
  • 19-25. Data Tagging with QueryIO
      • QueryIO provides advanced manual and automated data tagging, which lets you define properties for files as they are being written to HDFS. It automatically stores the basic metadata of files stored in HDFS and further extends the metadata layer by letting you define additional metadata (data tags).
      • The tags defined for all files on the cluster are likewise stored in a relational database. QueryIO keeps the metadata and tags stored in the database in sync with the files stored on the Hadoop cluster.
      • Data tagging lets you define the data tag and operator to apply to files on the cluster. You can define data tags using a table you have already created with Hive DDL, or choose system-defined MetaStore tables for different file formats. You can also provide expressions to apply tags conditionally, either on ingest or at a scheduled time.
  • 26-31. Unconditional Data Tagging
      • QueryIO provides both conditional and unconditional tagging. You can choose to tag hand-picked files or the files in a particular folder.
      • To do so, open the HDFS data browser, choose the files you want to tag, and click the “add tag” button.
      • Unconditional tagging is useful when you already know the HDFS location of the files you want to tag and the tagging does not depend on other file attributes such as file type or file length.
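In QueryIO this is done through the HDFS data browser; the sketch below only models the selection logic, assuming a tag store represented as a dictionary from path to tags. The function `tag_unconditionally` and its parameters are hypothetical and do not correspond to a QueryIO API.

```python
def tag_unconditionally(tag_store, all_paths, tag, value, files=(), folder=None):
    """Tag hand-picked `files` and/or every path under `folder`.

    No file attribute or content is inspected: selection depends only
    on location, which is what makes the tagging unconditional.
    """
    selected = set(files)
    if folder is not None:
        # Trailing slash ensures "/logs" does not match "/logsold/...".
        prefix = folder.rstrip("/") + "/"
        selected.update(p for p in all_paths if p.startswith(prefix))
    for path in selected:
        tag_store.setdefault(path, {})[tag] = value
    return sorted(selected)
```

The function returns the tagged paths so the caller can confirm which files were affected.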
  • 32-38. Conditional Data Tagging
      • Conditions can be defined using the values of file attributes (e.g. if Length > 1000) or by parsing the content of the file (e.g. if NumberOfLines > 100).
      • The tag value can also be obtained by parsing the content of the file (e.g. NumberOfWords, ifWord(“QueryIO”)Exist, ifPatternExists).
      • Conditional data tags can be applied to chosen file types or to all files present on the HDFS cluster.
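The conditions and content-derived tag values above can be sketched as follows. The thresholds (Length > 1000, NumberOfLines > 100) and the name NumberOfWords come from the slides; the tag names `IsLarge`, `IsLong`, `HasWord`, and `HasPattern`, the default date pattern, and the function names are assumptions made for this example and are not QueryIO's expression syntax.

```python
import re


def content_stats(text):
    """Compute the attributes used in the example conditions."""
    return {
        "Length": len(text.encode("utf-8")),  # size in bytes
        "NumberOfLines": len(text.splitlines()),
        "NumberOfWords": len(text.split()),
    }


def apply_conditional_tags(text, word="QueryIO", pattern=r"\d{4}-\d{2}-\d{2}"):
    """Return tags for `text`; some are conditional, some are parsed values."""
    stats = content_stats(text)
    tags = {}
    # Condition on a file attribute.
    if stats["Length"] > 1000:
        tags["IsLarge"] = True
    # Condition derived by parsing the content.
    if stats["NumberOfLines"] > 100:
        tags["IsLong"] = True
    # Tag values obtained directly from the content.
    tags["NumberOfWords"] = stats["NumberOfWords"]
    tags["HasWord"] = re.search(r"\b" + re.escape(word) + r"\b", text) is not None
    tags["HasPattern"] = re.search(pattern, text) is not None
    return tags
```

A file that fails a condition simply does not receive the corresponding tag, while the content-derived values are always recorded.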
  • 39. Download QueryIO now: http://QueryIO.com/download/big-data-analytics-download.html or take a demo: http://demo.QueryIO.com/queryio. “It's free.”