Big Data
What is Big data?
 ‘Big Data’ is similar to ‘small data’, but bigger in size.
 Handling data at that scale, however, requires different
approaches: new techniques, tools and architectures.
 Big data is a term for data sets that are so large or complex
that traditional data processing applications are inadequate
to deal with them.
Sources of Big Data
Social Media Data
Black Box Data
Stock Exchange Data
Transport Data
Power Grid Data
Search Engine Data
 Social Media Data: Social media such as Facebook and Twitter
hold information and views posted by millions of people across
the globe.
 Black Box Data: The black box is a component of helicopters,
airplanes, jets, etc. It captures the voices of the flight crew,
recordings from microphones and earphones, and the performance
information of the aircraft.
 Stock Exchange Data: Stock exchange data holds information
about the ‘buy’ and ‘sell’ decisions that customers make on
shares of different companies.
 Transport Data: Transport data includes model, capacity,
distance and availability of a vehicle.
 Search Engine Data: Search engines retrieve lots of data
from different databases.
 Power Grid Data: The power grid data holds information about
the power consumed by a particular node with respect to a base
station.
Three Vs of Big Data
 Velocity – data speed
 Volume – data quantity
 Variety – data types
Velocity
 High-frequency stock trading algorithms reflect market
changes within microseconds.
 Machine-to-machine processes exchange data between
billions of devices.
 Online gaming systems support millions of concurrent users,
each producing multiple inputs per second.
Volume
• A typical PC might have had 10 gigabytes of storage in 2000.
• Today, Facebook ingests 600 terabytes of new data every
day.
• Smartphones, and the data they create and consume, together
with sensors embedded into everyday objects, will soon result
in billions of new, constantly updated data feeds containing
environmental, location, and other information, including
video.
Variety
 Big Data isn't just numbers, dates, and strings. Big Data is
also geospatial data, 3D data, audio and video, and
unstructured text, including log files and social media.
 Traditional database systems were designed to handle
smaller volumes of structured data, with fewer updates and a
predictable, consistent data structure.
 Big Data analysis, in contrast, must accommodate all of these
types of data.
Challenges
Storage
Searching
Sharing
Transfer
Analysis
Hadoop
History of Hadoop
 Hadoop was created by computer scientists Doug Cutting and
Mike Cafarella in 2005.
 It was inspired by Google's MapReduce, a software framework
in which an application is broken down into numerous small
parts.
 Doug named it after his son’s toy elephant.
 In November 2016 Apache Hadoop became a registered
trademark of the Apache Software Foundation.
What is Hadoop?
 Hadoop is an open source, Java-based programming framework
that supports the processing and storage of extremely large data
sets in a distributed computing environment.
 Hadoop runs applications using the MapReduce algorithm, where
the data is processed in parallel on different CPU nodes.
 Its distributed file system enables rapid data transfer
among nodes and allows the system to continue operating in
case of a node failure.
 Hadoop can perform statistical analysis over huge
amounts of data.
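To make the MapReduce idea mentioned above concrete, here is a minimal single-machine sketch in Python. Real Hadoop distributes these phases across cluster nodes; here the "cluster" is just ordinary Python collections, and the function names are illustrative, not Hadoop APIs.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    pairs = []
    for doc in documents:
        for word in doc.split():
            pairs.append((word, 1))
    return pairs

def shuffle_phase(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["big data is big", "data is everywhere"]
word_counts = reduce_phase(shuffle_phase(map_phase(docs)))
print(word_counts)  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

In Hadoop, each map and reduce call would run on a different node, and the shuffle would move data over the network between them.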
Hadoop Architecture
Hadoop consists of four main modules:
 MapReduce (distributed computation)
 HDFS (distributed storage)
 YARN (resource-management framework)
 Common (shared utilities)
HADOOP COMMON:
 Common refers to the collection of common utilities and
libraries that support the other Hadoop modules.
 These libraries provide file system and OS-level abstractions
and contain the necessary Java files and scripts required to
start Hadoop.
HADOOP YARN:
 Yet Another Resource Negotiator
 a resource-management platform responsible for managing
computing resources in clusters and scheduling users'
applications on them
HDFS
 Hadoop Distributed File System.
 A file system that runs on top of each node's native file system
 Designed to handle very large files with streaming data
access patterns
 Uses blocks to store a file or parts of a file.
HDFS - Blocks
File Blocks
 64MB (default), 128MB (recommended) – compare to 4 KB in
UNIX
 Behind the scenes, 1 HDFS block is supported by multiple
operating system (OS) blocks
 Fits well with replication to provide fault tolerance and
availability
(Diagram: one 128 MB HDFS block is backed by many small OS blocks.)
Advantages of blocks
 Fixed size – easy to calculate how many fit on a disk
 A file can be larger than any single disk in the network
 If a file or a chunk of a file is smaller than the block
size, only the space needed is used. E.g., a 420 MB file is
split as: 128 MB + 128 MB + 128 MB + 36 MB
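The 420 MB example above can be sketched with a few lines of Python; this is only the splitting arithmetic, not how HDFS actually stores blocks.

```python
def split_into_blocks(file_size_mb, block_size_mb=128):
    """Return the sizes of the HDFS blocks a file occupies (in MB).
    The last block uses only the space it needs."""
    blocks = []
    remaining = file_size_mb
    while remaining > 0:
        blocks.append(min(block_size_mb, remaining))
        remaining -= block_size_mb
    return blocks

# The 420 MB file from the slide:
print(split_into_blocks(420))  # [128, 128, 128, 36]
```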
HDFS - Replication
 Blocks with data are replicated to multiple nodes
 Allows for node failure without data loss
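A toy sketch of replica placement, assuming a replication factor of 3. The node names and the round-robin scheme are made up for illustration; real HDFS uses rack-aware placement decided by the NameNode.

```python
def place_replicas(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (round-robin)."""
    placement = {}
    for i, block in enumerate(block_ids):
        # rotate the starting node so load spreads across the cluster
        placement[block] = [nodes[(i + k) % len(nodes)] for k in range(replication)]
    return placement

nodes = ["node1", "node2", "node3", "node4"]
placement = place_replicas(["blk_1", "blk_2"], nodes)
# Losing any single node still leaves at least 2 copies of every block:
for replicas in placement.values():
    assert len(set(replicas)) == 3
print(placement)
```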
Writing a file to HDFS
Apache PIG
What is Pig?
 Pigs Eat Anything
Pig can operate on data whether it has metadata or not. It
can operate on data that is relational, nested, or
unstructured. And it can easily be extended to operate on
data beyond files, including key/value stores, databases, etc.
 Pigs Live Anywhere
Pig is intended to be a language for parallel data
processing. It is not tied to one particular parallel
framework. It was implemented first on Hadoop, but it is not
intended to run only on Hadoop.
 Pigs Are Domestic Animals
Pig is designed to be easily controlled and modified by its
users.
 Pig Latin was designed to fit in a sweet spot between the
declarative style of SQL, and the low-level, procedural style
of MapReduce.
 Apache Pig is a platform for analyzing large data sets that
consists of a high-level language for expressing data analysis
programs, coupled with infrastructure for evaluating these
programs.
 Pig's infrastructure layer consists of a compiler that
produces sequences of MapReduce programs.
 Pig's language layer currently consists of a textual
language called Pig Latin.
KEY PROPERTIES OF PIG LATIN
Ease of programming. It is trivial to achieve parallel
execution of simple, "embarrassingly parallel" data
analysis tasks. Complex tasks composed of multiple
interrelated data transformations are explicitly
encoded as data flow sequences, making them easy to
write, understand, and maintain.
Optimization opportunities. The way in which tasks
are encoded permits the system to optimize their
execution automatically, allowing the user to focus on
semantics rather than efficiency.
Extensibility. Users can create their own functions to
do special-purpose processing.
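As an illustration of these properties, here is the canonical word-count dataflow written in Pig Latin (the file paths are placeholders):

```pig
-- Load raw lines; 'input.txt' is a placeholder path.
lines   = LOAD 'input.txt' AS (line:chararray);
-- Split each line into words, one record per word.
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
-- Group identical words, then count each group.
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group, COUNT(words);
STORE counts INTO 'wordcount_out';
```

Each statement names an intermediate relation, so the script reads as an explicit data-flow sequence; Pig's compiler turns it into one or more MapReduce jobs.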
JAQL
INTRODUCTION
 Jaql (JAQL) is a functional data processing and query language
most commonly used for JSON query processing on big data.
 It started as an Open Source project at Google.
 IBM took it over as primary data processing language for
their Hadoop software package BigInsights.
 It supports a variety of other data sources like CSV, TSV, XML.
 Jaql is one of the languages that helps to abstract
complexities of MapReduce programming framework within
Hadoop.
 It’s a loosely typed functional language with lazy evaluation
(meaning that Jaql functions are not materialized until they
are needed).
 Jaql allows us to process both structured and nontraditional
data.
 Jaql’s query language was inspired by many programming
and query languages, including Lisp, SQL, XQuery, and Pig.
What can we do with Jaql?
 Access and load data from different sources (local file
system, web, Twitter, HDFS, HBase, …)
 Query data (databases)
 Transform, aggregate and filter data
 Write data into different places (local file system, HDFS,
HBase, databases, …)
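A small illustrative pipeline in the Jaql shell style used below. The records and field names are invented for the example; the `->` pipe and the filter/transform operators are core Jaql:

```
jaql> employees = [ {name: "Ana", dept: "IT"},
                    {name: "Ben", dept: "HR"},
                    {name: "Cal", dept: "IT"} ];
jaql> employees -> filter $.dept == "IT" -> transform $.name;
[ "Ana", "Cal" ]
```

Here `$` refers to the current record as it flows through the pipe: filter keeps the IT records, and transform projects just their names.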
Setup to run Jaql
There are two choices for your Jaql environment:
 Command prompt
 Eclipse environment
TO RUN JAQL FROM A COMMAND
WINDOW
 Open a command window (Terminal).
 Change to the Jaql bin directory: cd $BIGINSIGHTS_HOME/jaql/bin
 Start the Jaql shell: ./jaqlshell
Jaql basics
Statement, assignment and comments :
 jaql> "Hello world";
"Hello world"
 jaql> a = 10*2;
jaql> a;
20
 jaql> // This is a comment
jaql> /* and this is also
a comment */
 Double and single quotes are treated the same. Semicolon
terminates a statement.
Data Types
 null – null
 boolean – true, false
 string – "hi"
 long – 10
 double – 10.2, 10d, 10e-2
 array – [1, 2, 3]
 record – {a : 1, b : 2}
 others as jaql extensions – decfloat, binary, date,
schema, function, comparator, regex