Latest Trends in Big Data and Career Opportunities
1. Sujay Chungath
Founder Director, Netscitus Corporation
© Netscientium All Rights Reserved 2015 Email:careermanager@netscientium.com
2. AGENDA
Time : 9 AM to 10 AM IST
Table of Contents
● Netscientium - Who are we?
● What is Big Data and its relevance
● Latest trends
● Career opportunities
● Q&A
● Our offerings
● Contact
3. NETSCIENTIUM, WHO ARE WE ?
● Netscientium is the Knowledge Initiative of Netscitus Corporation, a company with bases in India and the USA
● Netscientium specializes in online and offline training in Big Data technologies
4. WHAT IS BIG DATA AND ITS RELEVANCE
● Health information exchange, gene sequencing, serialization, healthcare service quality improvements, drug safety
● Banks and financial services: modeling true risk, threat analysis, fraud detection, trade surveillance, credit scoring and analysis
● Retail: point-of-sale transaction analysis, customer churn analysis, sentiment analysis
5. LIMITATIONS OF EXISTING TECHNOLOGIES
Architecture layers:
● BI Reports + Interactive Apps
● RDBMS (Aggregated Data)
● ETL Compute Grid (processing)
● Storage-only Grid (original raw data; storage)
● Collection (mostly append)
Limitations:
1. Can’t explore original high-fidelity raw data.
2. Moving data to compute doesn’t scale.
3. Premature data death.
Data availability:
● A meagre 10% of the 2 PB of data is available for BI
● 90% of the 2 PB is archived
6. HADOOP ADVANTAGE
Architecture layers:
● BI Reports + Interactive Apps
● RDBMS (Aggregated Data)
● Hadoop: Storage + Compute Grid (both storage and processing; no data archiving)
● Collection (mostly append)
● Instrumentation
Advantages:
1. Data exploration & advanced analytics
2. Scalable throughput for ETL & aggregation
3. Keep data alive forever
Data availability:
● Entire 2 PB of data is available for processing
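The idea of processing data where it is stored, rather than moving it to a separate compute grid, can be sketched in a few lines. This is only an illustration in plain Python, not Hadoop code: the `partitions` list, the store names, and the helper names `map_partition` and `merge` are all hypothetical. Each partition stands in for a block of raw data on one node; it is aggregated locally, and only the small partial summaries are combined.

```python
from collections import Counter
from functools import reduce

# Hypothetical raw point-of-sale records, split into partitions the way a
# distributed file system might spread blocks across nodes.
partitions = [
    [("store_a", 120), ("store_b", 80), ("store_a", 30)],
    [("store_b", 20), ("store_c", 50)],
    [("store_a", 10), ("store_c", 40), ("store_c", 5)],
]

def map_partition(records):
    """Aggregate one partition locally, where the data lives."""
    totals = Counter()
    for store, amount in records:
        totals[store] += amount
    return totals

def merge(left, right):
    """Combine two partial results; only these small summaries are moved."""
    return left + right

# "Map" step: one partial aggregate per partition.
partials = [map_partition(p) for p in partitions]

# "Reduce" step: fold the partial aggregates into the final totals.
sales_by_store = dict(reduce(merge, partials, Counter()))
print(sales_by_store)
```

In a real cluster the map step would run in parallel on the nodes holding each block, which is why the approach can scale with the data instead of with the network.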
7. CAREER OPPORTUNITIES - DATA SCIENTIST
Data Scientist
The [big] data scientist needs to be able to program in Python, R, Java, Ruby, Clojure, Matlab, Pig or SQL.
They need to have an understanding of Hadoop, Hive and/or MapReduce.
In addition, they need to be familiar with disciplines such as:
● Natural Language Processing: the interactions between computers and human language;
● Machine learning: using computers to improve as well as develop algorithms;
● Conceptual modeling: to be able to share and articulate models;
● Statistical analysis: to understand and work around possible limitations in models;
● Predictive modeling: most big data problems are about being able to predict future outcomes.
8. CAREER OPPORTUNITIES - BIG DATA ENGINEER
Role - Big Data Engineer / Big Data Development / Big Data Architect
• A software engineer who is expert in Java / C / C++ => HADOOP (APIs, MR Coding, Ecosystem &
Admin) => HIVE/PIG/IMPALA/ML => OOZIE plus monitoring tools.
• Architect, design & develop Big Data based software from scratch / upgrade / maintain.
• A software engineer who is expert in ORACLE / PL/SQL / MS SQL / TERADATA / DATA WAREHOUSING
=> HADOOP (APIs, MR Coding, Ecosystem & Admin) => HIVE/PIG/IMPALA/ML => OOZIE plus
monitoring tools.
• Architect, design & develop Big Data based data warehouses.
9. CAREER OPPORTUNITIES - HADOOP DBA
● Role - Big Data DBA
• Design and development of data models.
• Hadoop ecosystem installation and configuration.
• DR / cluster-to-cluster database backup and recovery.
• Database connectivity and security.
• Performance monitoring and tuning; configuration based.
• Disk space management.
• Software patches and upgrades for Unix as well as Hadoop.
10. CAREER OPPORTUNITIES - HADOOP ADMINISTRATOR
● Role - Big Data Admin
• Good Linux and shell Scripting background
• Good knowledge of Hadoop Ecosystem and technologies.
• Understanding of Hadoop design principles and factors that affect distributed system
performance, including hardware and network considerations.
• Experience in providing infrastructure recommendations and capacity planning, and in developing
utilities to monitor the cluster better
• Experience around managing large clusters with huge volumes of data
• Experience with cluster maintenance tasks such as creation and removal of nodes, cluster
monitoring and troubleshooting. Manage and review Hadoop log files.
• Experience installing and implementing security for Hadoop clusters.
• Installing Hadoop Updates, patches and version upgrades.
11. CAREER OPPORTUNITIES - HADOOP OPERATIONS
BigData – Production Support / Operations
• Good Linux and shell Scripting background
• Good knowledge of Hadoop Ecosystem and technologies.
• Cluster maintenance
• Job Management / Job failures / Investigation / Restart
• Autosys / Oozie integration
• Data analysis – Data recovery
• Cluster to Cluster data movement
• Escalations
• Operations management.
12. CONTACT US
❖ India
➢ Email
■ careermanager@netscientium.com
■ smitha@netscientium.com
➢ Phone
■ +91 9008587999
❖ USA
➢ Email
■ careermanager@netscientium.com
➢ Phone
❖ Website
➢ http://netscientium.com/
13. WRITE TO US FOR NEXT WEBINAR
Note: Please email us your feedback and the following details to receive updates on the next
webinar.
Use coupon code ‘WEBINAR-11’ together with the email ID you used today to get Rs. 2000/- off our training courses in
August and September 2015.