Advanced Analytics with R and SAP HANA
Jitender Aswani and Jens Doerpmund

R & HANA: Let's Talk Code
Analytics from SAP
Four Topic Areas for DKOM

     Diagram: SAP's analytics portfolio spans three areas – Data Sources, Analytic Applications and Access –
     and covers Business Intelligence, Enterprise Performance Management, Collaboration, Enterprise
     Information Management, Governance, Risk, and Compliance, and Data Warehousing.
Business Intelligence
Innovate. Accelerate. Simplify.




     Diagram: the Business Intelligence portfolio covers Reporting and Analysis, Data Discovery, and
     Dashboards and Applications, with four deployment options – BI Platform, SAP HANA, Embedded and Cloud.



Code Agenda



   R and HANA – Why choose this topic for DKOM?


   Basic R


   Advanced R


   R and HANA Integration – A match made in tech wonder-world


   R, HANA, XS Engine, JSON/HTML5


   Airlines App Demo


R and HANA:
Why choose this topic for DKOM?


   Free, Fun and Fantastic


   Unbelievably simple yet amazingly powerful


   Unparalleled platform for analytical and statistical programming


   Vibrant community – 3,670 packages contributed


   In-memory Technology


   Excellent support in SAP HANA for advanced analytics on big-data

Basic R:
Setup and Explore Preloaded Data Sets

   Setup is trivial
       Download R (http://www.r-project.org/) and/or RStudio (http://rstudio.org/)
   Run RStudio
   Start playing with the preloaded datasets
       library(help="datasets") lists them
   Type cars
       A dataset with 50 records on two columns (speed and dist)
   > ?cars
       Prints more information on the cars dataset

   > names(cars)
       Print the column names
   > cars$speed
       Access the speed column of the cars dataset
   > mycars <- cars
       Copy the cars dataset over to mycars
   > mycars$product <- mycars$speed * mycars$dist
       Add a new column to the mycars dataset
   > names(mycars), or just type > mycars
   > Titanic
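Taken together, the commands above form a tiny end-to-end session; a minimal sketch you could paste into the RStudio console (nothing beyond the built-in cars dataset is assumed):

    # Explore the built-in cars dataset
    ?cars                                           # documentation
    names(cars)                                     # column names: speed, dist
    mycars <- cars                                  # work on a copy
    mycars$product <- mycars$speed * mycars$dist    # add a derived column
    head(mycars)                                    # first rows of the extended dataset
    summary(mycars)                                 # quick numeric summary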

Basic R:
Visualization in R

   Plot a scatterplot (correlation – the higher the speed, the longer the stopping distance required)
     > plot(cars)
   Line chart
     > lines(cars)
   Let's increase the complexity…
   Motor Trend Car Road Tests (1974)
     > mtcars
   Structure of mtcars
     > str(mtcars)
   > plot(mtcars$wt, mtcars$mpg)
   Plot a bar chart on the gear column
     counts <- table(mtcars$gear)
     barplot(counts, main="Car Distribution", xlab="Number of Gears")
   Let's make a colorful stacked bar chart
     counts <- table(mtcars$vs, mtcars$gear)
     barplot(counts, main="Car Distribution by Gears and VS", xlab="Number of Gears",
             col=c("darkblue","red"), legend = rownames(counts))
   Increase the complexity of your visualization
     coplot(mpg ~ disp | as.factor(cyl), data = mtcars, panel = panel.smooth, rows = 1)
   Glance over your data before you start analyzing
     pairs(mtcars, main = "mtcars data")
Basic R (Visualization)


   Boxplot of tooth growth in guinea pigs using the ToothGrowth dataset
     boxplot(len~supp*dose, data=ToothGrowth, notch=FALSE, col=(c("gold","darkgreen")),
             main="Tooth Growth", xlab="Supplement and Dose")




We have barely scratched the surface on the topic of
visualization in R….




Basic R:
data.frame and data.table

   data.frame
     Single most important data type – a columnar dataset
   data.table (likely replacement for the plyr package)
     A very powerful package that extends data.frame and simplifies data massaging and aggregation
     You just need to learn these two, data.frame and data.table, to get most of your analytics tasks done
     install.packages("data.table") to install
     library("data.table")
   A group-by example with data.table is sketched below.

   > str(mtcars) or class(mtcars)
   Create a simple data frame
     dkom <- data.frame(Age=c(23,24,25,26), NetWorth=c(100,200,300,400), Names=c("A", "B", "C", "D"))
   Load a csv file into a data.frame/data.table
     setwd("C:/Users/i827456/Desktop/DKOM-2012/data")
     allairports <- data.table(read.csv("airlines/airports.csv", header=TRUE, sep=",", stringsAsFactors=FALSE))
     allairports
   Write the data.frame to a csv file
     write.csv(allairports, "output/dkomairports.csv", row.names=FALSE)
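The slide stops at loading and writing CSVs; a small hedged sketch of the data.table group-by syntax it alludes to, using the allairports table loaded above (the state column is an assumption about the airports.csv layout):

    # Number of airports per state, grouped and sorted in one data.table expression
    airportsByState <- allairports[, list(AirportCount=.N), by=state][order(-AirportCount)]
    head(airportsByState)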
     Advanced R:
     SP100 - XML Parsing and Historical Stock Data

     SAP.Monthly.Data <- to.monthly(getSymbols("SAP", auto.assign=FALSE, from="2007-01-01"))
     plot(SAP.Monthly.Data)               # Plot monthly stock close since 2007
     plot(OpCl(SAP.Monthly.Data)*100)     # Plot monthly returns

   Perform this analysis on all SP100 components

     #1 – Get the S&P 100 component list
     library(XML)       # For XML parsing
     library(plyr)      # For data.frame aggregation
     library(quantmod)  # For downloading financial information from Yahoo, Google
     # Get the list of symbols from Wikipedia – view the Wikipedia page to get an understanding
     sp100.list <- readHTMLTable('http://en.wikipedia.org/wiki/S%26P_100')[[2]]
     sp100.list <- as.vector(sp100.list$Symbol)

     #2 – Get monthly returns for every component
     getMonthlyReturns <- function(sym) {
       y <- to.monthly(getSymbols(sym, auto.assign=FALSE, from="2007-01-01"))
       as.vector(OpCl(y)*100)
     }
     SP100.MR <- ldply(sp100.list, getMonthlyReturns, .progress="text")

     #3 – Transpose so that tickers become columns and dates become rows
     SP100.MR.T <- t(SP100.MR)
     colnames(SP100.MR.T) <- sp100.list
     rownames(SP100.MR.T) <- seq(as.Date("2007-01-01"), today, by="mon")   # today assumed defined, e.g. today <- Sys.Date()
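With the ticker-by-month matrix in place, one illustrative way to eyeball the result (not on the slide) is a boxplot of each component's monthly returns:

     # One-glance comparison of monthly return distributions across the S&P 100
     boxplot(as.data.frame(SP100.MR.T), las=2, cex.axis=0.5,
             main="S&P 100 monthly returns since 2007 (%)")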


Advanced R:
Geo Code Your Data – Google Maps API

  Geo-code an Address – Get Lat/Lng

getGeoCode <- function(gcStr) {
  library("RJSONIO")                 #Load library
  gcStr <- gsub(' ','%20', gcStr)    #Encode URL parameters
  #Open connection
  connectStr <- paste('http://maps.google.com/maps/api/geocode/json?sensor=false&address=', gcStr, sep="")
  con <- url(connectStr)
  data.json <- fromJSON(paste(readLines(con), collapse=""))
  close(con)
  #Flatten the received JSON
  data.json <- unlist(data.json)
  if(data.json["status"]=="OK") {
    lat <- data.json["results.geometry.location.lat"]
    lng <- data.json["results.geometry.location.lng"]
    gcodes <- c(lat, lng)
    names(gcodes) <- c("Lat", "Lng")
    return (gcodes)
  }
}
geoCodes <- getGeoCode("Palo Alto,California")

  Reverse Geo-code – Get Address

reverseGeoCode <- function(latlng) {
  latlngStr <- gsub(' ','%20', paste(latlng, collapse=","))   #Collapse and encode URL parameters
  library("RJSONIO")                                          #Load library
  #Open connection
  connectStr <- paste('http://maps.google.com/maps/api/geocode/json?sensor=false&latlng=', latlngStr, sep="")
  con <- url(connectStr)
  data.json <- fromJSON(paste(readLines(con), collapse=""))
  close(con)
  #Flatten the received JSON
  data.json <- unlist(data.json)
  if(data.json["status"]=="OK")
    address <- data.json["results.formatted_address"]
  return (address)
}
address <- reverseGeoCode(c(37.4418834, -122.1430195))

# Geo-code every airport address and attach the resulting Lat/Lng columns
airports <- cbind(airports, ldply(airports$Adr, getGeoCode))
Advanced R:
Cluster Analysis using K-Means

#Perform K-Means clustering on the iris dataset

irisdf <- iris
m <- as.matrix(cbind(irisdf$Petal.Length, irisdf$Petal.Width), ncol=2)
kcl <- kmeans(m, 3)
kcl$size
kcl$withinss
irisdf$cluster <- factor(kcl$cluster)
centers <- as.data.frame(kcl$centers)

#Plot the clusters and the data points in each cluster
library(ggplot2)
ggplot(data=irisdf, aes(x=Petal.Length, y=Petal.Width, color=cluster)) +
  geom_point() +
  geom_point(data=centers, aes(x=V1, y=V2, color='Center')) +
  geom_point(data=centers, aes(x=V1, y=V2, color='Center'), size=52, alpha=.3, legend=FALSE)
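A quick, hedged way to sanity-check the clustering that isn't on the slide is to cross-tabulate the assigned clusters against the known species labels:

# How well do the k-means clusters line up with the actual species?
table(irisdf$cluster, iris$Species)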

                                                                        Source: R-Chart
  Advanced R:
  Sentiment Analysis on #DKOM and a WordCloud
   TwitterTag   TotalTweetsFetched   PositiveTweets   NegativeTweets   AverageScore   TotalTweets   Sentiment
   #DKOM                        64               19                9              0            28        0.68
#Populate the list of sentiment words from Hu and Liu (http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html)
huliu.pwords <- scan('opinion-lexicon/positive-words.txt', what='character', comment.char=';')
huliu.nwords <- scan('opinion-lexicon/negative-words.txt', what='character', comment.char=';')

# Add some words
huliu.nwords <- c(huliu.nwords,'wtf','wait','waiting','epicfail', 'crash', 'bug', 'bugy', 'bugs', 'slow', 'lie')
#Remove some words
huliu.nwords <- huliu.nwords[!huliu.nwords=='sap']
huliu.nwords <- huliu.nwords[!huliu.nwords=='cloud']
#which('sap' %in% huliu.nwords)

twitterTag <- "#DKOM"
# Get 1500 tweets - an individual is only allowed to fetch 1500 tweets
tweets <- searchTwitter(twitterTag, n=1500)
tweets.text <- laply(tweets, function(t) t$getText())
sentimentScoreDF <- getSentimentScore(tweets.text)
sentimentScoreDF$TwitterTag <- twitterTag

# Get rid of tweets that have zero score and separate +ve from -ve tweets
sentimentScoreDF$posTweets <- as.numeric(sentimentScoreDF$SentimentScore >= 1)
sentimentScoreDF$negTweets <- as.numeric(sentimentScoreDF$SentimentScore <= -1)

#Summarize findings
summaryDF <- ddply(sentimentScoreDF, "TwitterTag", summarise,
        TotalTweetsFetched=length(SentimentScore),
        PositiveTweets=sum(posTweets), NegativeTweets=sum(negTweets),
        AverageScore=round(mean(SentimentScore),3))

summaryDF$TotalTweets <- summaryDF$PositiveTweets + summaryDF$NegativeTweets

#Get Sentiment Score
summaryDF$Sentiment <- round(summaryDF$PositiveTweets/summaryDF$TotalTweets, 2)
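The getSentimentScore() helper called above is not shown on the slide; a minimal sketch of the word-matching scorer it implies, following the Hu and Liu lexicon approach referenced above (treat it as an assumption rather than the authors' exact code):

# Hypothetical scorer: positive minus negative lexicon words per tweet
getSentimentScore <- function(tweets.text) {
  scores <- sapply(tweets.text, function(txt) {
    words <- unlist(strsplit(gsub("[^a-z ]", " ", tolower(txt)), "\\s+"))
    sum(words %in% huliu.pwords) - sum(words %in% huliu.nwords)
  }, USE.NAMES = FALSE)
  data.frame(Text = tweets.text, SentimentScore = scores, stringsAsFactors = FALSE)
}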



                                    Blog: Big Four and the Battle of Sentiments - Oracle, IBM, Microsoft and SAP                                     Source: R-Bloggers
Advanced R:
Visualization Using Google Visualizations Libraries

library("googleVis")
data(Fruits)
motionChart <- gvisMotionChart(Fruits, idvar="Fruit", timevar="Year")
plot(motionChart)




   Again, we have barely scratched the surface on the
   advanced capabilities of R…




                                                                                     Source: googleVis
R – Let’s recap


   Basic data munging and visualization

   Advanced data munging, mash-ups, aggregation, visualization, sentiment analysis and more…

   There is a package for everything you can imagine in the R community – if not, go and create one

   We are just getting warmed up!!!
   Please learn more about R at www.r-project.org

        Packages used along the way: data.table, googleVis, wordcloud, XML, quantmod, twitteR, RJSONIO, plyr

R and HANA Integration – Big Data Setup


   Airlines sector in the travel industry

   22 years (1987–2008) of on-time performance data for US airlines

   123 million records
   Extract, Transform, Load – ETL work to combine this data with airport and carrier data, to set up
    for big-data analysis in R and HANA

   D20 with 96GB of RAM and 24 cores
   Massive amount of data crunching using R and HANA

   Let's dive in…

R and HANA Integration – Two Alternative Approaches


#1  Outside-In (JDBC/ODBC): your R session connects to HANA over plain JDBC/ODBC, or via the RHANA package.
#2  Inside-Out: a SQL stored procedure / calc model runs embedded R inside HANA, which talks to an Rserve (TCP/IP) daemon.

   Use HANA as "just another database" and connect using JDBC/ODBC (not recommended)
     Row-Vector-Column
     Column-Vector-Row
   Use the SAP RHANA package to transfer large amounts of columnar datasets (recommended)
   From inside HANA, transfer huge amounts of data to R, leverage the advanced analytical and
    statistical capabilities of R, and get aggregated result sets back

R and HANA Integration – Outside In (ODBC Approach)

library("RODBC")
# open a new connection
conn <- odbcConnect("LocalHANA", uid="system", pwd="manager")
# create a new table in HANA (note: sqlSave creates a row table – a no-no for HANA!)
sqlSave(conn, iris, "iris", rownames = FALSE)
# Read iris back from HANA into a data.frame
newIris <- sqlFetch(conn, "iris")

# Try saving again with append = TRUE
sqlSave(conn, newIris, "iris", rownames = FALSE, append = TRUE)
# now we have twice as many records in the table

# let's add a new column to the data frame
newIris$Useless <- newIris$Sepal.Length * newIris$Sepal.Width

#Drop the table and save the extended data frame
sqlDrop(conn, "SYSTEM.newIris")
sqlSave(conn, newIris, "SYSTEM.newIris", rownames = FALSE)
#Cleanup
close(conn)
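Pulling whole tables back and forth is exactly what the deck later warns against; as an illustrative alternative (not on the slide, run before close(conn) above), the same RODBC connection can push an aggregation down to HANA and fetch only the small result:

# Let HANA do the grouping and return only the aggregate
# (exact quoting/case of table and column names depends on how sqlSave created them)
speciesCounts <- sqlQuery(conn, 'SELECT "Species", COUNT(*) AS "N" FROM "iris" GROUP BY "Species"')
speciesCounts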

R and HANA Integration – Outside In (RHANA Approach)

Use RHANA to Read and Write Dataframes

library("RHANA")
library("RJDBC")
jdbcDriver <- JDBC("com.sap.db.jdbc.Driver", "C:/Program Files/sap/hdbclient/ngdbc.jar", identifier.quote="`")
#setup connection to RDKOM server
conn_server <- dbConnect(jdbcDriver, "jdbc:sap:rdkom12.dhcp.pal.sap.corp:30015", "system", "manager")
#write this airlines data to HANA
system.time(air08 <- read.csv("airlines/2008-short.csv", header=TRUE, sep=",", na.strings = "NA",
                              stringsAsFactors=FALSE))
writeDataFrame(conn_server, "air08", "SYSTEM", "Airlines_08_1M", force=TRUE)
#Read the data
sql <- 'select * from "Airlines_08_1M" '
system.time(getDataFrame(conn_server, sql, "airlines081m"))

Data transfer between two HANA machines

#Read 5y historical data for bay area airports from the HANA server to this local HANA server
sql <- 'select * from airlines_hist_5y_ba'
system.time(getDataFrame(conn_server, sql, "airlines5yba"), gcFirst=FALSE)
str(airlines5yba)

#setup connection to local host
conn_local <- dbConnect(jdbcDriver, "jdbc:sap:localhost:30015", "system", "manager")

writeDataFrame(conn_local, "airlines5yba", "SYSTEM", "airlines_hist_5y_ba", force=TRUE)

R and HANA Integration – Using RHANA to ETL Big Data in HANA
Eight Simple Steps

#1   Acquire Airlines Data (12GB)
#2   Load 123M records into an R data.frame
#3   Upload this data.frame into HANA using RHANA
#4   Read the Airport Information (all the airports in the US, including major airports) into R
#5   Geo-code the airports data using the Google Geocoding APIs (code you saw earlier)
#6   Merge the two datasets to extract major-airport information (not interested in Palo Alto or Hayward airports)
#7   Upload All Airports and Major Airports into HANA
#8   Ready, set, go on your real-time big-data analytics mission!!!



   R and HANA Integration – Using RHANA to ETL Big Data in HANA
   R Script (Just 25 Lines of Code)
##############################################################################
#Read all the airlines data, 120M records, and upload this into HANA
##############################################################################
setwd("C:/Users/i827456/Desktop/DKOM-2012/data")
#load libraries
library("RHANA")
library("RJDBC")
library("data.table")

#Read all CSV files from 1987 - 2008, 120M records, create a data frame to hold 120M records
filenames <- list.files(path="../data/airlines/test", pattern="*.csv")
airlines <- data.frame()
for(i in filenames) {
    file <- file.path("../data/airlines/test", i)
    airlines <- rbind(airlines, read.csv(file, header=TRUE, sep=","))
}

#Setup connection to HANA Box - 96Gig, 24Core
jdbcDriver <- JDBC("com.sap.db.jdbc.Driver", "C:/Program Files/sap/hdbclient/ngdbc.jar", identifier.quote="`")
conn <- dbConnect(jdbcDriver, "jdbc:sap:rdkom12.dhcp.pal.sap.corp:30015", "system", "manager")

##############################################################################
#Load this airlines data using RHANA into HANA
##############################################################################
writeDataFrame(conn, "airlines", "SYSTEM", "Airlines_Hist_Perf", force=TRUE)

# That is it, 120M records uploaded into HANA. Let's take a look at HANA Studio

##############################################################################
#Read this airlines data using RHANA from HANA
##############################################################################
sql <- 'select * from "Airlines_Hist_Perf"'
getDataFrame(conn, sql, "airlinesDB")
#Read this data frame into a data.table
airlinesDB <- data.table(airlinesDB)
setkey(airlinesDB, Origin, UniqueCarrier, Year, Month)

##############################################################################
#ETL - Read the AIRPORT information, extract the major-airport information and
#upload this transformed dataset into HANA
##############################################################################
allairports <- data.table(read.csv("airlines/airports.csv", header=TRUE, sep=",", stringsAsFactors=FALSE))
setkey(allairports, iata)

# Extract major airport codes from the 120M records
majorairportscode <- airlinesDB[, union(Origin, Dest)]

#Build the Major Airports dataset
majorairports <- allairports[majorairportscode]
majorairports$iata <- as.character(majorairports$iata)

#write both the "All Airports" and "Major Airports" dataframes to HANA
writeDataFrame(conn, "majorairports", "SYSTEM", "Major_Airports", force=TRUE)
writeDataFrame(conn, "allairports", "SYSTEM", "All_Airports", force=TRUE)

# Export to local disk
write.csv(majorairports, "output/MajorAirports.csv", row.names=FALSE)
# Completed - extract, transform, load


                   In a few hours, you can get your big-data story moving…
R and HANA Integration – Using RHANA to Read/Aggregate
Big Data in HANA
##################################################################################
#Analysis Time - Summarize data - Airlines performance by month for all of 22 years
##################################################################################
system.time(averageDelayOverYears <- airlinesDB[, list(
        CarriersCount=length(unique(UniqueCarrier)),
        FlightCount=prettyNum(length(FlightNum), big.mark=","),
        AirportCount_Origin=length(unique(Origin)),
        DistanceFlew_miles=prettyNum(sum(as.numeric(Distance), na.rm=TRUE), big.mark=","),
        AvgArrDelay_mins=round(mean(ArrDelay, na.rm=TRUE), digits=2),
        AvgDepDelay_mins=round(mean(DepDelay, na.rm=TRUE), digits=2)
    ), by=list(Year, Month)][order(Year, Month)]
)
###   user  system elapsed
### 17.842   1.596  19.487
write.csv(averageDelayOverYears, "output/AggrHViewofAirlines_YM.csv", row.names=FALSE)
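As an illustrative follow-up not shown on the slide, the aggregated monthly view can be eyeballed directly from R, for example by plotting the average arrival delay across the 22 years:

# Quick look at the trend of average arrival delay across all year/month buckets
plot(averageDelayOverYears$AvgArrDelay_mins, type="l",
     xlab="Month index (1987-2008)", ylab="Avg arrival delay (mins)",
     main="Average arrival delay by month, 1987-2008")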

In just under 20 seconds, 123 MILLION records were
analyzed, aggregated and sorted!!!
R and HANA Integration – Using RHANA to Address a Business
Problem with HANA

###########################################################################################
#Get monthly data for Southwest (WN)
###########################################################################################
system.time(averageDelayOverMonths <- airlinesDB[UniqueCarrier=="WN", list(
        FlightCount=prettyNum(length(FlightNum), big.mark=","),
        AirportCount_Origin=length(unique(Origin)),
        DistanceFlew_miles=prettyNum(sum(as.numeric(Distance), na.rm=TRUE), big.mark=","),
        AvgArrDelay_mins=round(mean(ArrDelay, na.rm=TRUE), digits=2),
        AvgDepDelay_mins=round(mean(DepDelay, na.rm=TRUE), digits=2)
    ), by=list(Year, Month)][order(Year, Month)]
)
# user system elapsed
# 43.214 6.804 50.143

In 50 seconds, you can address a business question for
Southwest on its growth over the last 20 years!!!
R and HANA Integration – Inside Out
Embedding R in HANA Stored Procedure
drop table json_out; create table json_out (res CLOB);
drop table json_in; create table json_in (req VARCHAR(250));
delete from json_in; insert into json_in values('{"lat": 40.730885, "lng":-73.997383}');

DROP PROCEDURE getNearByAirports;
create procedure getNearByAirports(IN airports "Major_Airports", IN request json_in, OUT response json_out)
LANGUAGE RLANG AS
BEGIN
library(RJSONIO)
library(fossil)
library(data.table)
req <- fromJSON(as.character(request[[1]]))
clat <- as.numeric(req["lat"])
clng <- as.numeric(req["lng"])
#cat(paste(clat, clng, sep=","), file="outputlog1.txt", sep=",")
#Load airports table
airportsDT <- data.table(airports)
#Get nearby airports (deg.dist from the fossil package gets the distance on the earth's spherical surface)
nba <- airportsDT[, list(AirportCode=iata,
        Distance=round(deg.dist(clat, clng, lat, long), digits=0),
        Address=paste(airport, city, state, sep=", "),
        Lat=lat, Lng=long)][order(Distance)][1:3]
#setkey(nba, AirportCode)
#cat(toJSON(nba), file="outputlog2.txt", sep=",")
objs <- apply(nba, 1, toJSON)
nbajson <- paste('[', paste(objs, collapse=', '), ']')
response <- head(data.frame(RES=nbajson))
END;

call getNearByAirports("Major_Airports", json_in, json_out) with overview;
select * from json_out;

            In milliseconds, you can get nearby airports based on lat/lng and advanced functions in R!!!
R, HANA, XS Engine, JSON/HTML5

                  Diagram: a browser or mobile client (HTML5, client-side JavaScript) exchanges JSON with
                  the XS Engine over http(s); the Internet Communication Manager routes requests to
                  server-side JavaScript (XSJS), which talks over HdbNet to the HANA DB (index server –
                  SQL, SQLScript, L, …), and HANA reaches the Rserve / R daemon over TCP/IP.




R, HANA, XS Engine, JSON/HTML5
Get Nearby Airports Use Case – R, HANA, XS, JSON/HTML5/jQuery
function handleGet() {
  var body = '';
  response.setContentType('application/json');
  var deleteJsonIn = "delete from json_in";
  var insertJsonIn = "insert into json_in values ('{\"lat\": 40.730885, \"lng\":-73.997383}')";
  var callNBAR = 'call getNearByAirports("Major_Airports", json_in, null)';
  var deleteJsonOut = "delete from json_out";
  var readJsonOut = "select top 1 * from json_out";
  var conn = getConnection();
  var pstmt1 = conn.prepareStatement(deleteJsonIn);
  var pstmt2 = conn.prepareStatement(insertJsonIn);
  var cstmt = conn.prepareCall(callNBAR);
  var pstmt3 = conn.prepareStatement(deleteJsonOut);
  var pstmt4 = conn.prepareStatement(readJsonOut);
  var r1 = pstmt1.execute();
  var r2 = pstmt2.execute();
  var r3 = pstmt3.execute();
  conn.commit();
  var r4 = cstmt.execute();
  var rs = pstmt4.executeQuery();
  rs.next();
  var jsonout = rs.getClob(1);
  trace.error(jsonout);
  rs.close();
  pstmt1.close(); pstmt2.close(); cstmt.close(); pstmt3.close(); pstmt4.close();
  conn.close();
  body = JSON.stringify({ "jsonout": jsonout });
  response.addBody(body);
  response.setReturnCode(200);
}

JSON Response

[
    {
        "AirportCode": "LGA",
        "Distance": "14",
        "Address": "LaGuardia, New York, NY",
        "Lat": "40.77724",
        "Lng": "-73.87261"
    },
    {
        "AirportCode": "EWR",
        "Distance": "19",
        "Address": "Newark Intl, Newark, NJ",
        "Lat": "40.69250",
        "Lng": "-74.16866"
    },
    {
        "AirportCode": "JFK",
        "Distance": "24",
        "Address": "John F Kennedy Intl, New York, NY",
        "Lat": "40.63975",
        "Lng": "-73.77893"
    }
]
R and HANA Integration – Nothing is impossible!



A match made in heaven – the two highly sophisticated in-
memory technologies for advanced data analytics and data
visualization….



 Nothing in the world of analytics and visualization is
 impossible with R and SAP HANA!


Airlines App Use Cases


   A travel app which will advise you:
     Which airport you should fly out from or fly into
     Which day and which time of the day are the best to fly
     Which airlines you should avoid
     What is the most efficient way to get to your destination – hassle-free, trouble-free!

   Airlines performance data, airports data, weather data, twitter data, financial data and more

   Geocode and get nearby airports

   Show ontime performance of these airports

   Show which days are the best to fly out from a given airport

   And on and on and on…



iPad Demo of HTML5 App (with R and HANA)




Credits


   All the R Bloggers

   SAP R Project team

   SAP RHANA team

   SAP tRex team

   SAP XSE team

   SAP IT

   DKOM Organization Committee

   BI and HANA Solution and Product Management
    Teams




Legal Disclaimer


The information in this presentation is confidential and proprietary to SAP and may not be disclosed without the permission of
SAP. This presentation is not subject to your license agreement or any other service or subscription agreement with SAP. SAP
has no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or
release any functionality mentioned therein. This document, or any related presentation and SAP's strategy and possible
future developments, products and or platforms directions and functionality are all subject to change and may be changed by
SAP at any time for any reason without notice. The information on this document is not a commitment, promise or legal
obligation to deliver any material, code or functionality. This document is provided without a warranty of any kind, either
express or implied, including but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or
non-infringement. This document is for informational purposes and may not be incorporated into a contract. SAP assumes no
responsibility for errors or omissions in this document, except if such damages were caused by SAP intentionally or grossly
negligent.
All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially
from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only
as of their dates, and they should not be relied upon in making purchasing decisions.




© 2011 SAP AG. All rights reserved.


No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG. The information contained herein may be changed without prior notice.
Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors.
Microsoft, Windows, Excel, Outlook, and PowerPoint are registered trademarks of Microsoft Corporation.
IBM, DB2, DB2 Universal Database, System i, System i5, System p, System p5, System x, System z, System z10, System z9, z10, z9, iSeries, pSeries, xSeries, zSeries, eServer, z/VM, z/OS, i5/OS, S/390, OS/390, OS/400, AS/400, S/390 Parallel Enterprise Server, PowerVM, Power Architecture, POWER6+, POWER6, POWER5+, POWER5, POWER, OpenPower, PowerPC, BatchPipes, BladeCenter, System Storage, GPFS, HACMP, RETAIN, DB2 Connect, RACF, Redbooks, OS/2, Parallel Sysplex, MVS/ESA, AIX, Intelligent Miner, WebSphere, Netfinity, Tivoli and Informix are trademarks or registered trademarks of IBM Corporation.
Linux is the registered trademark of Linus Torvalds in the U.S. and other countries.
Adobe, the Adobe logo, Acrobat, PostScript, and Reader are either trademarks or registered trademarks of Adobe Systems Incorporated in the United States and/or other countries.
Oracle and Java are registered trademarks of Oracle and/or its affiliates.
UNIX, X/Open, OSF/1, and Motif are registered trademarks of the Open Group.
Citrix, ICA, Program Neighborhood, MetaFrame, WinFrame, VideoFrame, and MultiWin are trademarks or registered trademarks of Citrix Systems, Inc.
HTML, XML, XHTML and W3C are trademarks or registered trademarks of W3C®, World Wide Web Consortium, Massachusetts Institute of Technology.
SAP, R/3, SAP NetWeaver, Duet, PartnerEdge, ByDesign, SAP BusinessObjects Explorer, StreamWork, and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and other countries.
Business Objects and the Business Objects logo, BusinessObjects, Crystal Reports, Crystal Decisions, Web Intelligence, Xcelsius, and other Business Objects products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of Business Objects Software Ltd. Business Objects is an SAP company.
Sybase and Adaptive Server, iAnywhere, Sybase 365, SQL Anywhere, and other Sybase products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of Sybase, Inc. Sybase is an SAP company.
All other product and service names mentioned are the trademarks of their respective companies. Data contained in this document serves informational purposes only. National product specifications may vary.
The information in this document is proprietary to SAP. No part of this document may be reproduced, copied, or transmitted in any form or for any purpose without the express prior written permission of SAP AG.


Tech Trends Report 2024 Future Today Institute.pdf
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

Advanced Analytics with SAP HANA and R

  • 7. R & HANA: LETS TALK CODE – Basic R: Visualization in R
     Plot a scatterplot (correlation – the higher the speed, the more stopping distance is required)
       > plot(cars)
     Line chart
       > lines(cars)
     Let's increase the complexity…
     Motor Trend Car Road Tests (1974)
       > mtcars
     Structure of mtcars
       str(mtcars)
       > plot(mtcars$wt, mtcars$mpg)
     Plot a bar chart on the gears column
       counts <- table(mtcars$gear)
       barplot(counts, main="Car Distribution", xlab="Number of Gears")
     Let's make a colorful stacked bar chart
       counts <- table(mtcars$vs, mtcars$gear)
       barplot(counts, main="Car Distribution by Gears and VS", xlab="Number of Gears", col=c("darkblue","red"), legend = rownames(counts))
     Increase the complexity of your visualization
       coplot(mpg ~ disp | as.factor(cyl), data = mtcars, panel = panel.smooth, rows = 1)
     Glance over your data before you start analyzing
       pairs(mtcars, main = "mtcars data")
  • 8. R & HANA: LETS TALK CODE – Basic R: Visualization
     Boxplot of tooth growth in guinea pigs using the ToothGrowth dataset
       boxplot(len~supp*dose, data=ToothGrowth, notch=FALSE, col=(c("gold","darkgreen")), main="Tooth Growth", xlab="Supplement and Dose")
    We have barely scratched the surface on the topic of visualization in R…
  • 9. R & HANA: LETS TALK CODE – Basic R: data.frame and data.table
     data.frame
       The single most important data type – a columnar data set
       > str(mtcars) or class(mtcars)
     data.table (a likely replacement for the plyr package)
       A very powerful package that extends data.frame and simplifies data massaging and aggregation
       You just need to learn these two, data.frame and data.table, to get most of your analytics tasks done
       install.packages("data.table") to install, then library("data.table")
     Create a simple data frame
       dkom <- data.frame(Age=c(23,24,25,26), NetWorth=c(100,200,300,400), Names=c("A", "B", "C", "D"))
     Load a csv file into a data.frame/data.table
       setwd("C:/Users/i827456/Desktop/DKOM-2012/data")
       allairports <- data.table(read.csv("airlines/airports.csv", header=TRUE, sep=",", stringsAsFactors=FALSE))
       allairports
     Write the data.frame to a csv file
       write.csv(allairports, "output/dkomairports.csv", row.names=FALSE)
  • 10. R & HANA: LETS TALK CODE – Advanced R: SP100 – XML Parsing and Historical Stock Data
     Download and plot SAP's monthly stock data
       SAP.Monthly.Data <- to.monthly(getSymbols("SAP", auto.assign=FALSE, from="2007-01-01"))
       plot(SAP.Monthly.Data)            # Plot monthly stock close since 2007
       plot(OpCl(SAP.Monthly.Data)*100)  # Plot monthly returns
     1. Load libraries and get the list of symbols from Wikipedia – view the Wikipedia page to get an understanding
       library(XML)       # For XML parsing
       library(plyr)      # For data.frame aggregating
       library(quantmod)  # For downloading financial information from Yahoo, Google
       sp100.list <- readHTMLTable('http://en.wikipedia.org/wiki/S%26P_100')[[2]]
       sp100.list <- as.vector(sp100.list$Symbol)
     2. Perform this analysis on all SP100 components
       # Get monthly returns
       getMonthlyReturns <- function(sym) {
         y <- to.monthly(getSymbols(sym, auto.assign=FALSE, from="2007-01-01"))
         as.vector(OpCl(y)*100)
       }
       SP100.MR <- ldply(sp100.list, getMonthlyReturns, .progress="text")
     3. Transpose to have tickers as columns and dates as rows
       today <- Sys.Date()
       SP100.MR.T <- t(SP100.MR)
       colnames(SP100.MR.T) <- sp100.list
       rownames(SP100.MR.T) <- seq(as.Date("2007-01-01"), today, by="mon")
  • 11. R & HANA: LETS TALK CODE – Advanced R: Geo-Code Your Data – Google Maps API
     Geo-code an address – get Lat/Lng
       getGeoCode <- function(gcStr) {
         library("RJSONIO")                # Load library
         gcStr <- gsub(' ', '%20', gcStr)  # Encode URL parameters
         # Open connection
         connectStr <- paste('http://maps.google.com/maps/api/geocode/json?sensor=false&address=', gcStr, sep="")
         con <- url(connectStr)
         data.json <- fromJSON(paste(readLines(con), collapse=""))
         close(con)
         # Flatten the received JSON
         data.json <- unlist(data.json)
         if (data.json["status"] == "OK") {
           lat <- data.json["results.geometry.location.lat"]
           lng <- data.json["results.geometry.location.lng"]
           gcodes <- c(lat, lng)
           names(gcodes) <- c("Lat", "Lng")
           return(gcodes)
         }
       }
       geoCodes <- getGeoCode("Palo Alto,California")
     Reverse geo-code – get the address
       reverseGeoCode <- function(latlng) {
         latlngStr <- gsub(' ', '%20', paste(latlng, collapse=","))  # Collapse and encode URL parameters
         library("RJSONIO")                # Load library
         # Open connection
         connectStr <- paste('http://maps.google.com/maps/api/geocode/json?sensor=false&latlng=', latlngStr, sep="")
         con <- url(connectStr)
         data.json <- fromJSON(paste(readLines(con), collapse=""))
         close(con)
         # Flatten the received JSON
         data.json <- unlist(data.json)
         if (data.json["status"] == "OK")
           address <- data.json["results.formatted_address"]
         return(address)
       }
       address <- reverseGeoCode(c(37.4418834, -122.1430195))
     Geo-code the whole airports table
       airports <- with(airports, data.frame(Code, Adr, laply(Adr, function(val) { getGeoCode(val) })))
  • 12. R & HANA: LETS TALK CODE – Advanced R: Cluster Analysis Using K-Means
      # Perform K-Means clustering on the iris dataset
      irisdf <- iris
      m <- as.matrix(cbind(irisdf$Petal.Length, irisdf$Petal.Width), ncol=2)
      kcl <- kmeans(m, 3)
      kcl$size
      kcl$withinss
      irisdf$cluster <- factor(kcl$cluster)
      centers <- as.data.frame(kcl$centers)
      # Plot the clusters and the data in the clusters
      library(ggplot2)
      ggplot(data=irisdf, aes(x=Petal.Length, y=Petal.Width, color=cluster)) +
        geom_point() +
        geom_point(data=centers, aes(x=V1, y=V2, color='Center')) +
        geom_point(data=centers, aes(x=V1, y=V2, color='Center'), size=52, alpha=.3, legend=FALSE)
    Source: R-Chart
  • 13. R & HANA: LETS TALK CODE – Advanced R: Sentiment Analysis on #DKOM and a WordCloud
    Results:
      TwitterTag | TotalTweetsFetched | PositiveTweets | NegativeTweets | AverageScore | TotalTweets | Sentiment
      #DKOM      | 64                 | 19             | 9              | 0            | 28          | 0.68
      # Populate the list of sentiment words from Hu and Liu (http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html)
      huliu.pwords <- scan('opinion-lexicon/positive-words.txt', what='character', comment.char=';')
      huliu.nwords <- scan('opinion-lexicon/negative-words.txt', what='character', comment.char=';')
      # Add some words
      huliu.nwords <- c(huliu.nwords, 'wtf', 'wait', 'waiting', 'epicfail', 'crash', 'bug', 'bugy', 'bugs', 'slow', 'lie')
      # Remove some words
      huliu.nwords <- huliu.nwords[!huliu.nwords=='sap']
      huliu.nwords <- huliu.nwords[!huliu.nwords=='cloud']
      #which('sap' %in% huliu.nwords)
      twitterTag <- "#DKOM"
      # Get 1500 tweets - an individual is only allowed to get 1500 tweets
      tweets <- searchTwitter(twitterTag, n=1500)
      tweets.text <- laply(tweets, function(t) t$getText())
      sentimentScoreDF <- getSentimentScore(tweets.text)
      sentimentScoreDF$TwitterTag <- twitterTag
      # Get rid of tweets that have zero score and separate +ve from -ve tweets
      sentimentScoreDF$posTweets <- as.numeric(sentimentScoreDF$SentimentScore >= 1)
      sentimentScoreDF$negTweets <- as.numeric(sentimentScoreDF$SentimentScore <= -1)
      # Summarize findings
      summaryDF <- ddply(sentimentScoreDF, "TwitterTag", summarise,
                         TotalTweetsFetched=length(SentimentScore),
                         PositiveTweets=sum(posTweets), NegativeTweets=sum(negTweets),
                         AverageScore=round(mean(SentimentScore), 3))
      summaryDF$TotalTweets <- summaryDF$PositiveTweets + summaryDF$NegativeTweets
      # Get the sentiment score
      summaryDF$Sentiment <- round(summaryDF$PositiveTweets/summaryDF$TotalTweets, 2)
    Blog: Big Four and the Battle of Sentiments - Oracle, IBM, Microsoft and SAP
    Source: R-Bloggers
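    Note: the code above calls getSentimentScore(), which is not shown anywhere in the deck. A minimal sketch, assuming it simply counts Hu–Liu positive and negative word matches per tweet (the helper name and its exact scoring rule are assumptions, not the presenters' implementation), could be:
      # Hypothetical helper (not shown on the slide): score = positive word hits - negative word hits
      getSentimentScore <- function(texts) {
        scoreOne <- function(txt) {
          words <- unlist(strsplit(tolower(gsub("[[:punct:]]", " ", txt)), "\\s+"))
          sum(words %in% huliu.pwords) - sum(words %in% huliu.nwords)
        }
        data.frame(Text = texts, SentimentScore = sapply(texts, scoreOne), stringsAsFactors = FALSE)
      }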
  • 14. R & HANA: LETS TALK CODE – Advanced R: Visualization Using the Google Visualization Libraries
      library("googleVis")
      data(Fruits)
      motionChart <- gvisMotionChart(Fruits, idvar="Fruit", timevar="Year")
      plot(motionChart)
    Again, we have barely scratched the surface on the advanced capabilities of R…
    Source: googleVis
  • 15. R & HANA: LETS TALK CODE – R: Let's Recap
     Basic data munging and visualization
     Advanced data munging, mash-ups, aggregation, visualization, sentiment analysis and more…
     There is a package for everything you can imagine in the R community; if not, go and create one
     We are just getting warmed up!!!
     Please learn more about R at www.r-project.org
    Packages used so far: wordcloud, XML, data.table, googleVis, quantmod, twitteR, RJSONIO, plyr
  • 16. R & HANA: LETS TALK CODE – R and HANA Integration: Big Data Setup
     Airlines sector in the travel industry
     22 years (1987-2008) of on-time performance data on US airlines
     123 million records
     Extract-Transform-Load (ETL) work to combine this data with data on airports and data on carriers, to set up for big-data analysis in R and HANA (a hedged sketch of the carrier lookup follows below)
     D20 server with 96GB of RAM and 24 cores
     Massive amount of data crunching using R and HANA
     Let's dive in…
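    The carrier part of that ETL is not shown later in the deck; a minimal sketch, assuming a carriers.csv lookup file with Code and Description columns (both the file name and the column names are assumptions) next to the airports.csv used elsewhere, and the airlinesDB data.table built on the ETL slides, might be:
      # Assumed lookup file and column names; airlinesDB is the on-time data.table built later in the deck
      library("data.table")
      carriers <- data.table(read.csv("airlines/carriers.csv", header=TRUE, stringsAsFactors=FALSE))
      # Attach the full carrier name next to the UniqueCarrier code
      airlinesNamed <- merge(airlinesDB, carriers, by.x="UniqueCarrier", by.y="Code", all.x=TRUE)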
  • 17. R & HANA: LETS TALK CODE – R and HANA Integration: Two Alternative Approaches
    (Diagram: your R session connects to HANA either over JDBC/ODBC with plain SQL, or via the RHANA package over Rserve (TCP/IP); inside-out, a SQL stored procedure / calc model drives the embedded R daemon in HANA.)
    #1 Outside-In
      Use HANA as "just another database" and connect using JDBC/ODBC (not recommended)
        Row-Vector-Column
        Column-Vector-Row
      Use the SAP RHANA package to transfer large amounts of columnar datasets (recommended)
      A minimal JDBC round trip is sketched below
    #2 Inside-Out
      From inside HANA, transfer huge amounts of data to R, leverage the advanced analytical and statistical capabilities of R, and get aggregated result sets back
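    For the outside-in path, a minimal JDBC round trip could look like the sketch below; it reuses the HANA driver jar loaded on the later slides, while the host name, credentials and row limit are placeholders. This is fine for small result sets, not for 123 million rows:
      library("RJDBC")
      jdbcDriver <- JDBC("com.sap.db.jdbc.Driver", "C:/Program Files/sap/hdbclient/ngdbc.jar", identifier.quote="`")
      conn <- dbConnect(jdbcDriver, "jdbc:sap:myhanahost:30015", "system", "manager")   # placeholder host/credentials
      sample <- dbGetQuery(conn, 'select top 10 * from "Airlines_Hist_Perf"')           # ordinary SQL over JDBC
      dbDisconnect(conn)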
  • 18. R & HANA: LETS TALK CODE – R and HANA Integration: Outside-In (ODBC Approach)
      library("RODBC")
      # open a new connection
      conn <- odbcConnect("LocalHANA", uid="system", pwd="manager")
      # create a new table in HANA
      sqlSave(conn, iris, "iris", rownames = FALSE)
      # Read iris back from HANA into a data.frame
      newIris <- sqlFetch(conn, "iris")
      # Try saving again with append = TRUE; creates new rows
      sqlSave(conn, newIris, "iris", rownames = FALSE, append = TRUE)
      # now we have twice as many records in the "SYSTEM.DFTEST" table - no, no for HANA!!!
      # let's add a new column to the data frame
      newIris$Useless <- newIris$Sepal.Length * newIris$Sepal.Width
      # Drop the table and save the modified data frame
      sqlDrop(conn, "SYSTEM.newIris")
      sqlSave(conn, newIris, "SYSTEM.newIris", rownames = FALSE)
      # Cleanup
      close(conn)
  • 19. R & HANA: LETS TALK CODE – R and HANA Integration: Outside-In (RHANA Approach)
    Use RHANA to read and write data frames
      library("RHANA")
      library("RJDBC")
      jdbcDriver <- JDBC("com.sap.db.jdbc.Driver", "C:/Program Files/sap/hdbclient/ngdbc.jar", identifier.quote="`")
      # setup connection to the RDKOM server
      conn_server <- dbConnect(jdbcDriver, "jdbc:sap:rdkom12.dhcp.pal.sap.corp:30015", "system", "manager")
      # write this airlines data to HANA
      system.time(air08 <- read.csv("airlines/2008-short.csv", header=TRUE, sep=",", na.strings = "NA", stringsAsFactors=FALSE))
      writeDataFrame(conn_server, "air08", "SYSTEM", "Airlines_08_1M", force=TRUE)
      # Read the data
      sql <- 'select * from "Airlines_08_1M" '
      system.time(getDataFrame(conn_server, sql, "airlines081m"))
    Data transfer between two HANA machines
      # Read 5y historical data for bay area airports from the HANA server to this local HANA server
      sql <- 'select * from airlines_hist_5y_ba'
      system.time(getDataFrame(conn_server, sql, "airlines5yba"), gcFirst=FALSE)
      str(airlines5yba)
      # setup connection to the local host
      conn_local <- dbConnect(jdbcDriver, "jdbc:sap:localhost:30015", "system", "manager")
      writeDataFrame(conn_local, "airlines5yba", "SYSTEM", "airlines_hist_5y_ba", force=TRUE)
  • 20. R & HANA: LETS TALK CODE – R and HANA Integration: Using RHANA to ETL Big Data into HANA
    Eight simple steps
    #1 Acquire the airlines data (12GB)
    #2 Load 123M records into an R data.frame
    #3 Upload this data.frame into HANA using RHANA
    #4 Read the airport information (all the airports in the US, including major airports) into R
    #5 Geo-code the airports data using the Google Geocoding APIs (code you saw earlier)
    #6 Merge the two datasets to extract the major airports' information (not interested in Palo Alto or Hayward airports)
    #7 Upload All Airports and Major Airports into HANA
    #8 Ready, set, go on your real-time big-data analytics mission!!!
  • 21. R & HANA: LETS TALK CODE – R and HANA Integration: Using RHANA to ETL Big Data into HANA
    R script (just 25 lines of code)
      ##############################################################################
      # Read all the airlines data, 120M records, and upload this into HANA
      ##############################################################################
      setwd("C:/Users/i827456/Desktop/DKOM-2012/data")
      # load libraries
      library("RHANA")
      library("RJDBC")
      library("data.table")
      # Read all CSV files from 1987 - 2008, 120M records; create a data frame to hold 120M records
      filenames <- list.files(path="../data/airlines/test", pattern="*.csv")
      airlines <- data.frame()
      for(i in filenames) {
        file <- file.path("../data/airlines/test", i)
        airlines <- rbind(airlines, read.csv(file, header=TRUE, sep=","))
      }
      # Setup connection to the HANA box - 96Gig, 24 cores
      jdbcDriver <- JDBC("com.sap.db.jdbc.Driver", "C:/Program Files/sap/hdbclient/ngdbc.jar", identifier.quote="`")
      conn_server <- dbConnect(jdbcDriver, "jdbc:sap:rdkom12.dhcp.pal.sap.corp:30015", "system", "manager")
      ##############################################################################
      # Load this airlines data using RHANA into HANA
      ##############################################################################
      writeDataFrame(conn_server, "airlines", "SYSTEM", "Airlines_Hist_Perf", force=TRUE)
      # That is it, 120M records uploaded into HANA. Let's take a look at HANA Studio
      ##############################################################################
      # Read this airlines data using RHANA from HANA
      ##############################################################################
      sql <- 'select * from "Airlines_Hist_Perf"'
      getDataFrame(conn_server, sql, "airlinesDB")
      # Read this data frame into a data.table
      airlinesDB <- data.table(airlinesDB)
      setkey(airlinesDB, Origin, UniqueCarrier, Year, Month)
      ##############################################################################
      # ETL - Read the airport information, extract the major airport information and
      # upload this transformed dataset into HANA
      ##############################################################################
      allairports <- data.table(read.csv("airlines/airports.csv", header=TRUE, sep=",", stringsAsFactors=FALSE))
      setkey(allairports, iata)
      # Extract the major airport codes from the 120M records
      majorairportscode <- airlinesDB[, union(Origin, Dest)]
      # Build the major airports dataset
      majorairports <- allairports[majorairportscode]
      majorairports$iata <- as.character(majorairports$iata)
      # write both the "All Airports" and "Major Airports" data frames to HANA
      writeDataFrame(conn_server, "majorairports", "SYSTEM", "Major_Airports", force=TRUE)
      writeDataFrame(conn_server, "allairports", "SYSTEM", "All_Airports", force=TRUE)
      # Export to local disk
      write.csv(majorairports, "output/MajorAirports.csv", row.names=FALSE)
      # Completed - extract, transform, load
    In a few hours, you can get your big-data story moving…
  • 22. R & HANA: LETS TALK CODE – R and HANA Integration: Using RHANA to Read/Aggregate Big Data in HANA
      ##################################################################################
      # Analysis time - summarize the data: airlines performance by month for all of 22 years
      ##################################################################################
      system.time(averageDelayOverYears <- airlinesDB[, list(
          CarriersCount=length(unique(UniqueCarrier)),
          FlightCount=prettyNum(length(FlightNum), big.mark=","),
          AirportCount_Origin=length(unique(Origin)),
          DistanceFlew_miles=prettyNum(sum(as.numeric(Distance), na.rm=TRUE), big.mark=","),
          AvgArrDelay_mins=round(mean(ArrDelay, na.rm=TRUE), digits=2),
          AvgDepDelay_mins=round(mean(DepDelay, na.rm=TRUE), digits=2)
        ), by=list(Year, Month)][order(Year, Month)]
      )
      ###   user  system elapsed
      ### 17.842   1.596  19.487
      write.csv(averageDelayOverYears, "output/AggrHViewofAirlines_YM.csv", row.names=FALSE)
    In just under 20 seconds, 123 MILLION records were analyzed, aggregated and sorted!!!
  • 23. R & HANA: LETS TALK CODE – R and HANA Integration: Using RHANA to Address a Business Problem with HANA
      ###########################################################################################
      # Get monthly data for Southwest (WN)
      ###########################################################################################
      system.time(averageDelayOverMonths <- airlinesDB[UniqueCarrier=="WN", list(
          FlightCount=prettyNum(length(FlightNum), big.mark=","),
          AirportCount_Origin=length(unique(Origin)),
          DistanceFlew_miles=prettyNum(sum(as.numeric(Distance), na.rm=TRUE), big.mark=","),
          AvgArrDelay_mins=round(mean(ArrDelay, na.rm=TRUE), digits=2),
          AvgDepDelay_mins=round(mean(DepDelay, na.rm=TRUE), digits=2)
        ), by=list(Year, Month)][order(Year, Month)]
      )
      #   user  system elapsed
      # 43.214   6.804  50.143
    In 50 seconds, you can address a business question for Southwest on its growth over the last 20 years!!!
  • 24. R & HANA: LETS TALK CODE – R and HANA Integration: Inside-Out – Embedding R in a HANA Stored Procedure
      drop table json_out;
      create table json_out (res CLOB);
      drop table json_in;
      create table json_in (req VARCHAR(250));
      delete from json_in;
      insert into json_in values('{"lat": 40.730885, "lng":-73.997383}');
      DROP PROCEDURE getNearByAirports;
      create procedure getNearByAirports(IN airports "Major_Airports", IN request json_in, OUT response json_out)
      LANGUAGE RLANG AS
      BEGIN
        library(RJSONIO)
        library(fossil)      # deg.dist() gets the distance on the earth's spherical surface
        library(data.table)
        req <- fromJSON(as.character(request[[1]]))
        clat <- as.numeric(req["lat"])
        clng <- as.numeric(req["lng"])
        #cat(paste(clat, clng, sep=","), file="outputlog1.txt", sep=",")
        # Load the airports table
        airportsDT <- data.table(airports)
        # Get nearby airports
        nba <- airportsDT[, list(AirportCode=iata,
                                 Distance=round(deg.dist(clat, clng, lat, long), digits=0),
                                 Address=paste(airport, city, state, sep=", "),
                                 Lat=lat, Lng=long)][order(Distance)][1:3]
        #setkey(nba, AirportCode)
        #cat(toJSON(nba), file="outputlog2.txt", sep=",")
        objs <- apply(nba, 1, toJSON)
        nbajson <- paste('[', paste(objs, collapse=', '), ']')
        response <- head(data.frame(RES=nbajson))
      END;
      call getNearByAirports("Major_Airports", json_in, json_out) with overview;
      select * from json_out;
    In milliseconds, you can get nearby airports based on lat/lng and advanced functions in R!!!
  • 25. R & HANA: LETS TALK CODE – R, HANA, XS Engine, JSON/HTML5
    Architecture, top to bottom: Browser / Mobile (HTML5, client-side JavaScript, JSON) → http(s) → Internet Communication Manager → XS Engine (server-side JavaScript, XSJS) → HdbNet → HANA DB (Index Server) with SQL, SQL Script, L, … → TCP/IP → Rserve / R
  • 26. R & HANA: LETS TALK CODE – R, HANA, XS Engine, JSON/HTML5
    Get Nearby Airports use case – R, HANA, XS, JSON/HTML5/jQuery
    Server-side JavaScript (XSJS):
      function handleGet() {
        var body = '';
        response.setContentType('application/json');
        var deleteJsonIn = "delete from json_in";
        var insertJsonIn = "insert into json_in values ('{\"lat\": 40.730885, \"lng\":-73.997383}')";
        var callNBAR = "call getNearByAirports(\"Major_Airports\", json_in, null)";
        var deleteJsonOut = "delete from json_out";
        var readJsonOut = "select top 1 * from json_out";
        var conn = getConnection();
        var pstmt1 = conn.prepareStatement(deleteJsonIn);
        var pstmt2 = conn.prepareStatement(insertJsonIn);
        var cstmt = conn.prepareCall(callNBAR);
        var pstmt3 = conn.prepareStatement(deleteJsonOut);
        var pstmt4 = conn.prepareStatement(readJsonOut);
        var r1 = pstmt1.execute();
        var r2 = pstmt2.execute();
        var r3 = pstmt3.execute();
        conn.commit();
        var r4 = cstmt.execute();
        var rs = pstmt4.executeQuery();
        rs.next();
        var jsonout = rs.getClob(1);
        trace.error(jsonout);
        rs.close();
        pstmt1.close(); pstmt2.close(); cstmt.close(); pstmt3.close(); pstmt4.close();
        conn.close();
        body = JSON.stringify({ "jsonout": jsonout });
        response.addBody(body);
        response.setReturnCode(200);
      }
    JSON response:
      [
        { "AirportCode": "LGA", "Distance": "14", "Address": "LaGuardia, New York, NY", "Lat": "40.77724", "Lng": "-73.87261" },
        { "AirportCode": "EWR", "Distance": "19", "Address": "Newark Intl, Newark, NJ", "Lat": "40.69250", "Lng": "-74.16866" },
        { "AirportCode": "JFK", "Distance": "24", "Address": "John F Kennedy Intl, New York, NY", "Lat": "40.63975", "Lng": "-73.77893" }
      ]
  • 27. R & HANA: LETS TALK CODE – R and HANA Integration: Nothing is impossible! A match made in heaven – two highly sophisticated in-memory technologies for advanced data analytics and data visualization. Nothing in the world of analytics and visualization is impossible with R and SAP HANA!
  • 28. R & HANA: LETS TALK CODE – Airlines App Use Cases
     A travel app which will advise you:
       Which airport you should fly out from or fly into
       Which day and which time of the day are the best to fly
       Which airlines you should avoid
       What the most efficient way is to get to your destination – hassle free, trouble free!
     Airlines performance data, airports data, weather data, Twitter data, financial data and more
     Geocode and get nearby airports
     Show the on-time performance of these airports
     Show which days are the best to fly out from a given airport (a hedged sketch follows below)
     And on and on and on…
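    As one illustration of the "best day to fly" use case, a minimal sketch over the airlinesDB data.table from the ETL slides, assuming the on-time dataset's standard DayOfWeek column (1 = Monday) is present – the airport code and column name are assumptions for the example:
       # Assumes airlinesDB from the ETL slides and the dataset's DayOfWeek column (1 = Monday)
       bestDaySFO <- airlinesDB[Origin == "SFO",
                                list(Flights = length(FlightNum),
                                     AvgDepDelay_mins = round(mean(DepDelay, na.rm = TRUE), 2)),
                                by = DayOfWeek][order(AvgDepDelay_mins)]
       bestDaySFO[1]   # the day of week with the lowest average departure delay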
  • 29. R & HANA: LETS TALK CODE iPad Demo of HTML5 App (with R and HANA) 29
  • 30. R & HANA: LETS TALK CODE Credits  All the R Bloggers  SAP R Project team  SAP RHANA team  SAP tRex team  SAP XSE team  SAP IT  DKOM Organization Committee  BI and HANA Solution and Product Management Teams 30
  • 31. R & HANA: LETS TALK CODE Legal Disclaimer The information in this presentation is confidential and proprietary to SAP and may not be disclosed without the permission of SAP. This presentation is not subject to your license agreement or any other service or subscription agreement with SAP. SAP has no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or release any functionality mentioned therein. This document, or any related presentation and SAP's strategy and possible future developments, products and or platforms directions and functionality are all subject to change and may be changed by SAP at any time for any reason without notice. The information on this document is not a commitment, promise or legal obligation to deliver any material, code or functionality. This document is provided without a warranty of any kind, either express or implied, including but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. This document is for informational purposes and may not be incorporated into a contract. SAP assumes no responsibility for errors or omissions in this document, except if such damages were caused by SAP intentionally or grossly negligent. All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions. 31
  • 32. R & HANA: LETS TALK CODE – © 2011 SAP AG. All rights reserved. No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG. The information contained herein may be changed without prior notice. SAP, R/3, SAP NetWeaver, Duet, PartnerEdge, ByDesign, SAP BusinessObjects Explorer, StreamWork, and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and other countries. All other product and service names mentioned are the trademarks of their respective companies. Data contained in this document serves informational purposes only. National product specifications may vary.