SPARK MACHINE LEARNING
Certification Course Academic Year (2017-2018)
Done by:
K Teja Sreenivas
INTRODUCTION:
– Machine learning is a type of artificial intelligence (AI) that
allows software applications to become more accurate in
predicting outcomes without being explicitly programmed.
The basic premise of machine learning is to build
algorithms that can receive input data and use statistical
learning to predict an output value within an acceptable
range.
– Machine learning algorithms are often categorized as
being supervised or unsupervised.
MACHINE LEARNING TYPES:
LIFE CYCLE IN DESIGNING A
MACHINE LEARNING MODEL
1. Data collection
2. Data processing
3. Feature Engineering
4. Model Building
5. Model Evaluation
6. Model Deployment
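The first five steps of the life cycle can be sketched on a toy problem. This is only an illustration in plain Python, not the Spark workflow used later: the tiny CSV text, the column names (weight, annual_return) and the mean-predicting baseline model are all assumptions made for the example. Deployment is left out.

```python
import csv
import io

# Hypothetical miniature data set; the column names are illustrative only.
RAW_CSV = """weight,annual_return
0.0,0.10
0.5,0.20
1.0,0.30
0.25,0.15
"""


def collect(text):
    """1. Data collection: read rows from CSV text as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(text)))


def process(rows):
    """2. Data processing: cast every field from string to float."""
    return [{key: float(value) for key, value in row.items()} for row in rows]


def engineer(rows):
    """3. Feature engineering: split each row into a feature vector and a label."""
    return [([row["weight"]], row["annual_return"]) for row in rows]


def build(examples):
    """4. Model building: a baseline that always predicts the mean training label."""
    mean_label = sum(label for _, label in examples) / len(examples)
    return lambda features: mean_label


def evaluate(model, examples):
    """5. Model evaluation: mean absolute error of the model on the examples."""
    return sum(abs(model(x) - y) for x, y in examples) / len(examples)


examples = engineer(process(collect(RAW_CSV)))
train, test = examples[:3], examples[3:]  # hold out the last row for evaluation
model = build(train)
error = evaluate(model, test)
print(f"held-out mean absolute error: {error:.3f}")
```

Each function corresponds to one phase, so a stronger model can replace the baseline in step 4 without touching the other steps.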
SPARK FOR MACHINE LEARNING:
• Spark is a distributed data processing engine that is often used in place of Hadoop MapReduce. Big Data workloads
run over network clusters and have become essential in several industries. The broad
use of Hadoop and MapReduce shows how such technology is
constantly evolving, and the growing use of Apache Spark
is testament to this fact.
• Apache Spark offers stronger capabilities for Big Data applications than
older technologies such as Hadoop MapReduce. The Apache
Spark features are as follows:
1. Holistic framework
2. Speed
3. Easy to use
4. Enhanced support
PROBLEM STATEMENT:
Prediction of annual returns
from sets of stock-picking concept
weights, whose performance is
simulated on historical US stock
market data.
DATA SET ATTRIBUTE INFORMATION:
• The inputs are the weights of the stock-picking concepts as follows
X1=the weight of the Large B/P concept
X2=the weight of the Large ROE concept
X3=the weight of the Large S/P concept
X4=the weight of the Large Return Rate in the last quarter concept
X5=the weight of the Large Market Value concept
X6=the weight of the Small systematic Risk concept
• The outputs are the investment performance indicators (normalized) as follows
Y1=Annual Return
Y2=Excess Return
Y3=Systematic Risk
Y4=Total Risk
Y5=Abs. Win Rate
Y6=Rel. Win Rate
TERMINOLOGY:
• P/B ratio: The price-to-book ratio, or P/B ratio, is a financial ratio used to compare a company's current market price to its
book value. It is also sometimes known as the market-to-book ratio. The B/P (book-to-price) input in the data set is its reciprocal.
• ROE: Return on equity (ROE) is the amount of net income returned as a percentage of shareholder equity. Return on
equity measures a corporation's profitability by revealing how much profit a company generates with the money
shareholders have invested.
• S&P 500: The S&P 500 index tracks the stocks of 500 of the largest corporations by market capitalization listed on the New York
Stock Exchange or Nasdaq. Standard & Poor's intention is to have an index that provides a quick look at the stock
market and economy. Note that the S/P input in the data set appears to denote the sales-to-price ratio rather than this index.
• Return Rate: A rate of return is the gain or loss on an investment over a specified time period, expressed as a percentage
of the investment's cost. Gains on investments are defined as income received plus any capital gains realized on the sale of
the investment.
• Market Value: The amount for which something can be sold on a given market.
• Systematic Risk: Systematic risk is the risk inherent to the entire market or market segment. Systematic risk, also known
as “undiversifiable risk,” “volatility,” or “market risk,” affects the overall market, not just a particular stock or industry. This
type of risk is both unpredictable and impossible to completely avoid.
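The ratio definitions above can be written out as short formulas. The sketch below is illustrative only: the function names and the sample figures are made up for the example and do not come from the data set.

```python
def price_to_book(market_price_per_share, book_value_per_share):
    """P/B ratio: market price divided by book value (both per share)."""
    return market_price_per_share / book_value_per_share


def return_on_equity(net_income, shareholder_equity):
    """ROE: net income as a percentage of shareholder equity."""
    return net_income / shareholder_equity * 100.0


def rate_of_return(initial_cost, final_value, income=0.0):
    """Rate of return: capital gain plus income, as a percentage of the cost."""
    return (final_value - initial_cost + income) / initial_cost * 100.0


# Hypothetical figures chosen only to keep the arithmetic easy to follow.
pb = price_to_book(50.0, 20.0)                    # 2.5
bp = 1.0 / pb                                     # book-to-price, the reciprocal
roe = return_on_equity(12.5, 100.0)               # 12.5 (percent)
ror = rate_of_return(200.0, 240.0, income=10.0)   # 25.0 (percent)
print(pb, bp, roe, ror)
```

A share bought for 200, sold for 240 and paying 10 in income gains 50 on a cost of 200, which is the 25 percent rate of return computed above.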
SOFTWARE TOOLS USED:
• SPARK
• SPYDER
• ANACONDA
• JUPYTER
• PYTHON
• VIRTUAL MACHINE
• HDFS
from pyspark import SparkContext, SQLContext

# Reuse the running SparkContext (for example in the pyspark shell) or create one.
sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

# Data collection: load the training set.
data = sqlContext.read.csv('/home/tej/Documents/ML with spark/train.csv', header=True, sep=',')
data.show(n=5)

# Keep the feature columns and the label, casting each from string to float.
columns = ['Large_BnP', 'Large_ROE', 'Large_SnP', 'Large_Return_Rate_last_quarter',
           'Large_Market_Value', 'Small_systematic_Risk', 'systematic_risks', 'Annual_Return']
X_train = data.select([data[c].cast('float').alias(c) for c in columns])
from pyspark.ml.feature import VectorAssembler, VectorIndexer

# Feature engineering: combine the input columns into a single vector column.
assembler = VectorAssembler(
    inputCols=['Large_BnP', 'Large_ROE', 'Large_SnP', 'Large_Return_Rate_last_quarter',
               'Large_Market_Value', 'Small_systematic_Risk', 'systematic_risks'],
    outputCol='features')
X_train = assembler.transform(X_train)

# Treat any feature with at most 7 distinct values as categorical; the rest stay continuous.
featureIndexer = VectorIndexer(inputCol='features', outputCol='indexedFeatures',
                               maxCategories=7).fit(X_train)
X_train = featureIndexer.transform(X_train)
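The effect of maxCategories can be checked by hand. VectorIndexer declares a feature categorical when it has at most maxCategories distinct values. The plain-Python sketch below imitates only that decision rule; it is a simplified stand-in, not Spark's implementation, and the function name and sample vectors are made up for the example.

```python
def categorical_feature_indices(vectors, max_categories):
    """Return the indices of features with at most max_categories distinct values."""
    n_features = len(vectors[0])
    categorical = []
    for j in range(n_features):
        distinct = {vector[j] for vector in vectors}
        if len(distinct) <= max_categories:
            categorical.append(j)
    return categorical


# Hypothetical feature vectors: feature 0 takes two values, feature 1 takes four.
vectors = [
    [0.0, 0.12],
    [1.0, 0.37],
    [0.0, 0.55],
    [1.0, 0.81],
]
flagged_strict = categorical_feature_indices(vectors, max_categories=2)  # [0]
flagged_loose = categorical_feature_indices(vectors, max_categories=7)   # [0, 1]
print(flagged_strict, flagged_loose)
```

With only four rows, the continuous second feature is also flagged once the threshold is 7. On a small data set, continuous weights can therefore end up indexed as categories, which is worth checking before training.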
from pyspark.ml.regression import LinearRegression

# Model building: fit a linear regression of Annual_Return on the indexed features.
linear_reg = LinearRegression(labelCol='Annual_Return', featuresCol='indexedFeatures')
linear_reg_model = linear_reg.fit(X_train)
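With its default settings (no regularization), LinearRegression fits an ordinary least-squares model. For a single feature the solution has a closed form, sketched below in plain Python. The function name and the sample points are assumptions for illustration; Spark itself solves the multi-feature problem with its own solvers.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical points lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

The fitted model then predicts slope * x + intercept for a new x, which is what the model's transform step does across all features at once.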
# Load the test set and apply the same column selection and casts as for training.
test_data = sqlContext.read.csv('/home/tej/Documents/ML with spark/test.csv', header=True, sep=',')
test_columns = ['Large_BnP', 'Large_ROE', 'Large_SnP', 'Large_Return_Rate_last_quarter',
                'Large_Market_Value', 'Small_systematic_Risk', 'systematic_risks', 'Annual_Return']
X_test = test_data.select([test_data[c].cast('float').alias(c) for c in test_columns])

# Reuse the assembler and the indexer fitted on the training data, so that the
# test features are encoded exactly as the model saw them during training.
X_test = assembler.transform(X_test)
X_test = featureIndexer.transform(X_test)
linear_predictions = linear_reg_model.transform(X_test)
linear_predictions.show()
linear_predictions.select('Annual_Return','prediction').show()
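The average error quoted in the conclusion can be computed from the label and prediction pairs. The sketch below is a plain-Python helper under stated assumptions: the function name and the sample pairs are hypothetical, and the deck does not say which error measure was actually used.

```python
def mean_absolute_error(pairs):
    """Average absolute difference over an iterable of (label, prediction) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one (label, prediction) pair is required")
    return sum(abs(label - prediction) for label, prediction in pairs) / len(pairs)


# Hypothetical (Annual_Return, prediction) pairs, not taken from the real output.
sample = [(0.50, 0.45), (0.30, 0.40), (0.70, 0.70)]
mae = mean_absolute_error(sample)
print(f"mean absolute error: {mae:.3f}")
```

Rows collected with linear_predictions.select('Annual_Return', 'prediction').collect() unpack like tuples, so they can be passed to this helper directly. For large data sets, Spark's RegressionEvaluator with metricName='mae' computes the same quantity without collecting the rows.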
CONCLUSION:
• From the final output it is clear that, by training a linear model on the data set, we have
obtained predictions of annual returns with less than 0.1 unit
error on average.
KEY LEARNING:
• We have learnt the basic uses of machine learning and the use of Spark
in the implementation of a machine learning model.
• The various phases involved in designing a machine learning model were
understood and implemented using a linear regression model.
THANK YOU!
