QueryIO provides an advanced manual and automated data tagging feature that lets you define properties for files as they are written to HDFS. It automatically stores the basic MetaData of files stored in HDFS and extends the MetaData layer by letting you define additional MetaData. It understands dozens of file formats, including PDF, XLS, and DOC documents as well as image, audio, and video files.
4. What is MetaData?
Hadoop Based SQL and Big Data Analytics
Solution
• Metadata is simply “Data about Data”.
• In a file system, metadata is information about files, such as the file size,
creation time, last-modified time, file type, and owner.
• The file system manages access to both the content of files and the metadata
about those files.
• Metadata characterizes data. It is used to provide documentation such that data
can be understood and more readily consumed by your organization. Metadata
answers the who, what, when, where, why, and how questions for users of the
data.
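As an illustration of the file attributes listed above, here is a minimal Python sketch that reads basic metadata from the local file system with `os.stat`. It does not talk to HDFS or QueryIO. The helper name `file_metadata` and the dictionary keys are assumptions made for this example only.

```python
import os
import tempfile
import time


def file_metadata(path):
    """Return basic file-system metadata for `path` as a dict (illustrative helper)."""
    st = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": st.st_size,                  # size of file
        "modified": time.ctime(st.st_mtime),       # last-modified time
        "owner_uid": st.st_uid,                    # numeric owner id (0 on Windows)
        "extension": os.path.splitext(path)[1],    # rough indicator of file type
    }


# Write a small sample file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("hello metadata")
    sample_path = f.name

meta = file_metadata(sample_path)
print(meta)
os.remove(sample_path)
```

The file content is untouched by this call; only the information about the file is read, which is the distinction the bullets above draw between data and metadata.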
9. MetaData Extension with QueryIO
• QueryIO provides an on-ingest metadata extraction service: extended metadata is
extracted from files as they are ingested, so you don't need to run costly batch
jobs later. This makes unstructured data on the cluster searchable as soon as it
is ingested.
• To help make unstructured data searchable on the Hadoop cluster, QueryIO stores
the metadata for each file stored on Hadoop in a relational database.
• Since all the metadata and tags associated with a file are kept in a relational
database, you can leverage the existing infrastructure built around SQL to search
the data on the Hadoop cluster.
• QueryIO understands dozens of file formats, including PDF, XLS, and DOC
documents as well as image, audio, and video files.
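To make the idea of searching file metadata with SQL concrete, the following sketch uses Python's built-in `sqlite3` module as a stand-in relational database. The table name `file_metadata`, its columns, and the sample rows are hypothetical and are not QueryIO's actual schema.

```python
import sqlite3

# In-memory database standing in for the relational metadata store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file_metadata ("
    " path TEXT PRIMARY KEY, file_type TEXT, size_bytes INTEGER, author TEXT)"
)

# Hypothetical metadata rows, as if extracted from files on ingest.
rows = [
    ("/docs/report.pdf", "pdf", 204800, "alice"),
    ("/docs/budget.xls", "xls", 51200, "bob"),
    ("/media/intro.mp4", "mp4", 10485760, "alice"),
]
conn.executemany("INSERT INTO file_metadata VALUES (?, ?, ?, ?)", rows)


def find_files(conn, author, min_size):
    """Return paths of files by `author` that are at least `min_size` bytes."""
    cur = conn.execute(
        "SELECT path FROM file_metadata"
        " WHERE author = ? AND size_bytes >= ? ORDER BY path",
        (author, min_size),
    )
    return [row[0] for row in cur.fetchall()]


matches = find_files(conn, "alice", 100000)
print(matches)
```

The query never opens the files themselves. It only consults the metadata table, which is why ordinary SQL tooling can be used to locate unstructured files.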
13. What are Data Tags?
• A tag is a label attached to someone or something for identification or to give
other information.
• A Data Tag is a tag attached to the data or file to provide extra information about
the data or file.
• Data tags can be used to categorize data by various criteria, which helps manage
vast amounts of data. The data can then be extracted, sorted, and processed
based on these categories.
• Adding data tags to data, either conditionally or unconditionally, is called
Data Tagging.
19. Data Tagging with QueryIO
• QueryIO provides an advanced manual and automated data tagging feature that
lets you define properties for files as they are written to HDFS. It
automatically stores the basic MetaData of files stored in HDFS and extends
the MetaData layer by letting you define additional MetaData (Data Tags).
• As with metadata, the tags defined for files on the cluster are stored in a
relational database. QueryIO keeps the metadata and tags in the database in
sync with the files stored on the Hadoop cluster.
• Data tagging lets you define the data tag and the operator to apply to files on
the cluster. You can define data tags using a table you have already created
with Hive DDL, or choose the system-defined MetaStore tables for different file
formats. You can also provide expressions to apply tags conditionally, either
on ingest or at a scheduled time.
27. Unconditional Data Tagging
• QueryIO provides both conditional and unconditional tagging. Users can choose to
tag hand-picked files or the files in a particular folder.
• To do so, all the user needs to do is open the HDFS data browser, choose the
files to tag, and click the “add tag” button.
• Unconditional tagging is useful when you already know the HDFS location of the
files you want to tag and the tagging does not depend on other file attributes
such as file type or file length.
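The behaviour described above can be sketched in a few lines of Python. This is only an illustration of tagging by known location. The in-memory `tag_store`, the helper names `tag_files` and `tag_folder`, and the sample paths are all hypothetical; in QueryIO this is done through the HDFS data browser rather than through code like this.

```python
from collections import defaultdict

# Hypothetical in-memory tag store: file path -> {tag name: tag value}.
tag_store = defaultdict(dict)

# Hypothetical list of file paths known to exist on the cluster.
all_paths = [
    "/data/logs/app1.log",
    "/data/logs/app2.log",
    "/data/reports/q1.pdf",
]


def tag_files(paths, name, value):
    """Attach the tag to every hand-picked path, with no condition checked."""
    for path in paths:
        tag_store[path][name] = value


def tag_folder(known_paths, folder, name, value):
    """Attach the tag to every known path that lies under `folder`."""
    prefix = folder.rstrip("/") + "/"
    tag_files([p for p in known_paths if p.startswith(prefix)], name, value)


tag_files(["/data/reports/q1.pdf"], "project", "finance")   # hand-picked file
tag_folder(all_paths, "/data/logs", "retention", "30d")     # whole folder
print(dict(tag_store))
```

Note that neither helper inspects file type, length, or content. The only input is where the file lives, which is what makes the tagging unconditional.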
33. Conditional Data Tagging
• Conditions can be defined using the values of file attributes (e.g., if
Length > 1000) or by parsing the content of the file (e.g., if
NumberOfLines > 100).
• The tag value can also be obtained by parsing the content of the file (e.g.,
NumberOfWords, ifWord(“QueryIO”)Exist, ifPatternExists).
• Conditional data tags can be applied to chosen file types or to all files on the
HDFS cluster.
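The rules above can be sketched as a small rule evaluator in Python. This is not QueryIO's expression syntax or API. The function names (`number_of_lines`, `number_of_words`, `word_exists`, `apply_rules`), the rule format, and the sample file are assumptions chosen to mirror the examples in the bullets (Length > 1000, NumberOfLines > 100, NumberOfWords, a word-exists check).

```python
import re


def number_of_lines(text):
    return len(text.splitlines())


def number_of_words(text):
    return len(text.split())


def word_exists(text, word):
    # Whole-word match, so "QueryIO" does not match inside a longer token.
    return re.search(r"\b" + re.escape(word) + r"\b", text) is not None


def apply_rules(path, text, rules):
    """Return the tags produced by every rule whose condition holds."""
    attrs = {"path": path, "length": len(text.encode("utf-8"))}
    tags = {}
    for tag_name, condition, value_fn in rules:
        if condition(attrs, text):
            tags[tag_name] = value_fn(attrs, text)
    return tags


# Each rule: (tag name, condition, function that computes the tag value).
rules = [
    # Condition on a file attribute.
    ("large", lambda a, t: a["length"] > 1000, lambda a, t: True),
    # Condition obtained by parsing the content.
    ("long", lambda a, t: number_of_lines(t) > 100, lambda a, t: True),
    # Restricted to a chosen file type; the tag value comes from the content.
    ("word_count", lambda a, t: a["path"].endswith(".txt"),
     lambda a, t: number_of_words(t)),
    # Applied to all files; tag set when a given word is present.
    ("mentions_queryio", lambda a, t: word_exists(t, "QueryIO"),
     lambda a, t: True),
]

sample = "QueryIO tags files on ingest\n" * 50   # 50 lines, 1450 bytes
tags = apply_rules("/data/notes.txt", sample, rules)
print(tags)
```

In this sample the file is larger than 1000 bytes but has only 50 lines, so the `large` tag is applied and the `long` tag is not.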
36. Conditional Data Tagging
Hadoop Based SQL and Big Data Analytics
Solution
• Conditions can be defined using the values of file attributes (Ex: if Length > 1000)
OR by parsing the content of the file (Ex: if NumberOfLines > 100).
• Also the tag value can be obtained by parsing the content of file (Ex:
NumberOfWords, ifWord(“QueryIO”)Exist, ifPatternExists etc.)
• Conditional data tags can be added on chosen file types or on all files present on
the HDFS cluster.
37. Conditional Data Tagging
Hadoop Based SQL and Big Data Analytics
Solution
• Conditions can be defined using the values of file attributes (Ex: if Length > 1000)
OR by parsing the content of the file (Ex: if NumberOfLines > 100).
• Also the tag value can be obtained by parsing the content of file (Ex:
NumberOfWords, ifWord(“QueryIO”)Exist, ifPatternExists etc.)
• Conditional data tags can be added on chosen file types or on all files present on
the HDFS cluster.
38. Conditional Data Tagging
Hadoop Based SQL and Big Data Analytics
Solution
• Conditions can be defined using the values of file attributes (Ex: if Length > 1000)
OR by parsing the content of the file (Ex: if NumberOfLines > 100).
• Also the tag value can be obtained by parsing the content of file (Ex:
NumberOfWords, ifWord(“QueryIO”)Exist, ifPatternExists etc.)
• Conditional data tags can be added on chosen file types or on all files present on
the HDFS cluster.