- 1 - Copyright © 1999-2013 TIBCO Software Inc. ALL RIGHTS RESERVED.
Memory Management in
TIBCO® Enterprise Runtime for R
Michael Sannella
msannell@tibco.com
UseR!2013
What is TERR?
• TERR: TIBCO® Enterprise Runtime for R
• A new R-compatible statistics engine.
• The latest in a family of scripting engines.
– S, S-PLUS®, R, TERR
• Recently released with Spotfire® 5.0.
– Used to implement “predictive analytic tools”
• Free Developer's Edition available at:
www.TIBCOmmunity.com/community/products/analytics/terr
• Commercially available for Spotfire and custom integration.
Building TERR
• We rebuilt the engine internals from scratch.
– Redesigned data object representation
– Redesigned memory management facilities
– Attempted to address long-standing problems with S, S-PLUS®, R
• This talk will compare R and TERR designs for:
– Data representation
– Reference counting
– Garbage collection
R: Data Representation
• Basic R data objects have a fixed representation in memory.
– The contents of a vector are represented as a C array
– Data must be in memory
– Only one representation for a given type
• Example: An R numeric vector is represented as an object containing a C double* pointer to an array in memory.
• Object elements can be accessed via simple memory access operations.
– If USE_RINTERNALS is defined, REAL(pObj) will be compiled as a simple memory access
TERR: Data Representation
• TERR implements data objects via abstract C++ classes with one or more representations.
– Manipulate object via C++ methods
– Doesn’t support direct access
– USE_RINTERNALS is not supported
• Scalar vs small vector vs big vector
– Different representations with different length fields
– Matrix of basic types and representations implemented via C++ templates
TERR: Data Representation
• Sequence objects: 1:1e6
> x <- 1:1e6
> object.size(x)
104 bytes
> x[1] <- 0
> object.size(x)
8000088 bytes
• String sequences: as.character(1:1e6)
– Used for data.frame row names
• Two representations for logical vectors:
– Normal logical vectors have 1-byte elements
– C code NEW_LOGICAL(n) creates logical vectors with 4-byte elements
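The sequence example above can be repeated in any R-compatible engine. This is a minimal sketch using only base R. The byte counts reported by object.size() depend on the engine and platform, so the code prints them instead of assuming the TERR figures shown on the slide.

```r
x <- 1:1e6
type_before <- typeof(x)       # "integer": the sequence holds integers
size_before <- object.size(x)

x[1] <- 0                      # 0 is a double, so x is converted to double
type_after <- typeof(x)        # "double"
size_after <- object.size(x)

print(size_before)             # engine-dependent
print(size_after)              # at least 8 bytes per element
```

After the assignment the vector must hold one million 8-byte doubles, which is why the size jumps regardless of how compactly the original sequence was stored.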
TERR: Potential Future Data Representation
• Extend preliminary support for out-of-memory vectors.
– Transparent “bigdata” objects
– Only use external files when necessary
• Access shared memory between applications.
– Example: Send data between Spotfire and TERR
• Access database tables.
• Access streaming data.
• Represent deferred evaluation “futures”
– Example: An object representing the result of adding two vectors B and C, only performing the addition when needed
– See: Riposte runtime, Talbot et al., UseR!2012
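The deferred-evaluation idea can be illustrated at the R level with a closure. This is only a sketch of the concept, not how TERR or Riposte implement it; make_deferred_sum and its fields are hypothetical names introduced for this example.

```r
# Hypothetical helper: wraps lhs + rhs so the addition runs only on first use.
make_deferred_sum <- function(lhs, rhs) {
  value <- NULL
  list(
    is_forced = function() !is.null(value),
    get = function() {
      if (is.null(value)) value <<- lhs + rhs   # compute once, then cache
      value
    }
  )
}

B <- c(1, 2, 3)
C <- c(10, 20, 30)
d <- make_deferred_sum(B, C)

forced_before <- d$is_forced()   # FALSE: nothing has been added yet
result <- d$get()                # the addition happens here
forced_after <- d$is_forced()    # TRUE: the result is now cached
```

An engine-level version would hide this behind the ordinary vector interface, so user code would not need to call get() explicitly.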
Reference Counts
• R and TERR use reference counts to efficiently implement object modification semantics.
• If an object has only one reference, we can modify it without copying it.
x <- rnorm(1e6)
x[1] <- 0 # only one reference: modify in place
y <- x
x[2] <- 0 # two references: copy before modify
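Whether or not the engine copies, the visible result must be the same: modifying x never changes y. A small sketch of that guarantee, using short vectors so the values are easy to check:

```r
x <- c(1, 2, 3)
x[1] <- 0        # only x refers to the vector
y <- x           # x and y now refer to the same vector
x[2] <- 0        # x changes; y must keep the old contents

print(x)         # 0 0 3
print(y)         # 0 2 3
```

The reference count only decides how the engine preserves this behavior: in place when it is safe, by copying when another reference could observe the change.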
Reference Count Field
• R objects contain a 2-bit reference count
– “named” field
– Easy to make examples that set the reference count to a “max” value that will never be decreased
– Max reference count → copy on change
• TERR objects contain 16-bit reference counts
– Tracks number of references up and down
– Identifies more objects that can be modified in place, without copying
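The difference between the two schemes can be sketched with a toy model. This is not either engine's implementation: simulate_copies is a hypothetical function that tracks a single object through "add" (new reference), "drop" (reference removed) and "modify" events. It assumes a saturating counter stops decreasing once it reaches its maximum, as the slide describes.

```r
# Toy model: count the copies triggered by a sequence of events.
simulate_copies <- function(events, saturating, max_count = 2L) {
  count <- 1L    # the object starts with one reference
  copies <- 0L
  for (ev in events) {
    if (ev == "add") {
      count <- if (saturating) min(count + 1L, max_count) else count + 1L
    } else if (ev == "drop") {
      # a saturated count is stuck at its maximum and is never decreased
      if (!(saturating && count >= max_count)) count <- max(count - 1L, 1L)
    } else if (ev == "modify") {
      if (count > 1L) {
        copies <- copies + 1L
        count <- 1L   # the fresh copy has a single reference
      }
    }
  }
  copies
}

# Five rounds of: add a temporary reference, drop it, then modify.
events <- rep(c("add", "drop", "modify"), times = 5)
copies_saturating <- simulate_copies(events, saturating = TRUE)    # 5
copies_full <- simulate_copies(events, saturating = FALSE)         # 0
```

With the saturating counter every modification copies, because the dropped reference is never accounted for. The full counter returns to one reference before each modification and so never copies.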
One Reference: No Copy
refcnt.0 <- function(n) {
  z <- as.numeric(1:n)
  for (x in 1:n) {
    # only one reference to vector
    # so no copy is done
    z[x] <- z[x] + 1
  }
  z
}
Two References
refcnt.1 <- function(n) {
  z <- as.numeric(1:n)
  for (x in 1:n) {
    z2 <- z     # temporarily add a second reference
    z2 <- NULL  # then drop it again
    z[x] <- z[x] + 1
  }
  z
}
Two References via Function Call
refcnt.2 <- function(n) {
  z <- as.numeric(1:n)
  fn <- function(vec, idx) vec[idx] + 1
  for (x in 1:n) {
    # temporarily add second reference
    # while calling function
    z[x] <- fn(z, x)
  }
  z
}
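The three test functions can be timed with system.time() to repeat the comparison on a given engine. This is a sketch: the functions are restated compactly from the slides so the block runs on its own, n is an arbitrary size chosen for a quick run, and no timings are claimed here because they depend on the engine and machine.

```r
refcnt.0 <- function(n) {
  z <- as.numeric(1:n)
  for (x in 1:n) z[x] <- z[x] + 1
  z
}
refcnt.1 <- function(n) {
  z <- as.numeric(1:n)
  for (x in 1:n) {
    z2 <- z       # temporary second reference
    z2 <- NULL
    z[x] <- z[x] + 1
  }
  z
}
refcnt.2 <- function(n) {
  z <- as.numeric(1:n)
  fn <- function(vec, idx) vec[idx] + 1
  for (x in 1:n) z[x] <- fn(z, x)
  z
}

n <- 2000   # arbitrary; increase to make differences easier to see
timings <- sapply(
  list(refcnt.0 = refcnt.0, refcnt.1 = refcnt.1, refcnt.2 = refcnt.2),
  function(f) system.time(f(n))[["elapsed"]]
)
print(timings)   # elapsed seconds per function
```

All three functions return the same vector, so any difference in elapsed time comes from how the engine handles the extra references.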
Reference Count Tests: Performance
[Two performance charts; images not reproduced in this transcript.]
R: Garbage Collection
• R uses a mark-and-sweep garbage collector to reclaim storage for unused objects.
• The R garbage collector is a generational garbage collector.
– Quick sweep through “young” objects to reclaim temporary objects
TERR: Garbage Collection
• TERR uses a mark-and-sweep garbage collector along with reference counts to reclaim storage.
– When an object's reference count goes to zero, the temporary object can be reclaimed immediately
– Full mark-and-sweep garbage collection is called infrequently
• After engine initialization:
– “Static” objects are set to the max reference count
– They are not scanned in future garbage collections
Row-By-Row Test: Translated SAS®
• Suppose we have a SAS® “data step” with complex logic for processing records one row at a time.
• Suppose we hand-translate each line directly into R code.
– Of course, this is not going to be efficient, since R is not suited to row-by-row operations
– However, the user may not want to entirely recode in a vectorized form, distorting the original logic
• Question: How fast will this “non-optimized” code run in R and TERR?
Row-By-Row Test: SAS® Code
. . .
ELSE IF event='complaint' AND startDate NE . THEN DO;
  lastComplaintDate = date;
  weeknum = 1+(date-startDate)/(7*24*60*60);
  IF weeknum<2 THEN DO;
    comp1 = comp1+1;
  END;
  ELSE IF weeknum<3 THEN DO;
    comp2 = comp2+1;
  END;
  ELSE DO;
    compN = compN+1;
  END;
. . .
Row-By-Row Test: R Code
. . .
} else if (df.event[i] == "complaint" && !is.na(startDate)) {
  lastComplaintDate <- df.date[i]
  weeknum <- 1 + as.numeric(difftime(df.date[i], startDate, units = "days")) / 7
  if (weeknum < 2) {
    comp1 <- comp1 + 1
  } else if (weeknum < 3) {
    comp2 <- comp2 + 1
  } else {
    compN <- compN + 1
  }
. . .
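The excerpt above is only a fragment of a larger loop. To try the row-by-row pattern end to end, a self-contained sketch might look like the following. The sample data, the "start" event, and the branch that sets startDate are assumptions made for illustration and are not part of the original test; only the complaint branch follows the slide.

```r
# Hypothetical sample data; the vector names follow the slide's df.event / df.date.
df.event <- c("start", "complaint", "complaint", "other", "complaint")
df.date <- as.POSIXct("2013-01-01", tz = "UTC") +
  c(0, 3, 10, 12, 30) * 24 * 60 * 60   # offsets in days, converted to seconds

startDate <- as.POSIXct(NA)
lastComplaintDate <- as.POSIXct(NA)
comp1 <- 0
comp2 <- 0
compN <- 0

for (i in seq_along(df.event)) {
  if (df.event[i] == "start") {
    # assumed branch: remember when this customer started
    startDate <- df.date[i]
  } else if (df.event[i] == "complaint" && !is.na(startDate)) {
    lastComplaintDate <- df.date[i]
    weeknum <- 1 + as.numeric(difftime(df.date[i], startDate, units = "days")) / 7
    if (weeknum < 2) {
      comp1 <- comp1 + 1
    } else if (weeknum < 3) {
      comp2 <- comp2 + 1
    } else {
      compN <- compN + 1
    }
  }
}

print(c(comp1 = comp1, comp2 = comp2, compN = compN))
```

With this sample data the complaints fall on days 3, 10 and 30 after the start date, so each of the three counters ends at 1.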
Row-By-Row Test: Performance
[Two performance charts; images not reproduced in this transcript.]
Row-By-Row Test: Garbage Collection
• R-3.0.1:
# before processing 1M rows
Garbage collection 67 = 26+6+35 (level 2)
# after processing 1M rows
Garbage collection 40300 = 30966+7745+1589 (level 2)
• TERR:
# before processing 1M rows
40 collections
# after processing 1M rows
42 collections
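The R lines above have the format printed when gcinfo(TRUE) is enabled. Assuming the three addends are the counts of level-0, level-1 and level-2 collections (an interpretation of the format that the slide does not spell out), the number of collections triggered while processing the 1M rows can be worked out from the figures on the slide:

```r
# Figures copied from the slide; the level names are labels chosen for this example.
r_before <- c(level0 = 26, level1 = 6, level2 = 35)
r_after <- c(level0 = 30966, level1 = 7745, level2 = 1589)

r_during <- r_after - r_before      # collections that happened during the loop
r_total <- sum(r_during)            # 40233
terr_total <- 42 - 40               # 2

rows_per_gc <- 1e6 / r_total        # roughly one collection every 25 rows

print(r_during)
print(c(R = r_total, TERR = terr_total))
```

Under that reading, R ran about 40,000 collections during the loop, most of them at the youngest level, while TERR ran two.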
Row-By-Row Test: Analysis
• We strongly suspect that the R performance problem is due to frequent garbage collections to reclaim temporary objects.
• In TERR, most temporary objects are immediately reclaimed using reference counts.
Conclusion
• TERR internals were redesigned from scratch to support better memory management and performance.
• TERR internal data representation supports further experiments with more efficient objects.
• Full reference counting reduces unnecessary copies and improves garbage collection performance.

 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
Introduction to Quantum Computing
Introduction to Quantum ComputingIntroduction to Quantum Computing
Introduction to Quantum Computing
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
Spring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdfSpring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdf
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
PicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer ServicePicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer Service
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 

Sannella use r2013-terr-memory-management

  • 1. - 1 - Copyright © 1999-2013 TIBCO Software Inc. ALL RIGHTS RESERVED. Memory Management in TIBCO® Enterprise Runtime for R Michael Sannella msannell@tibco.com UseR!2013
  • 2. What is TERR?
    • TERR: TIBCO® Enterprise Runtime for R
    • A new R-compatible statistics engine.
    • The latest in a family of scripting engines.
      – S, S-PLUS®, R, TERR
    • Recently released with Spotfire® 5.0.
      – Used to implement “predictive analytic tools”
    • Free Developer's Edition available at: www.TIBCOmmunity.com/community/products/analytics/terr
    • Commercially available for Spotfire and custom integration.
  • 3. Building TERR
    • We rebuilt the engine internals from scratch.
      – Redesigned data object representation
      – Redesigned memory management facilities
      – Attempted to address long-standing problems with S, S-PLUS®, R
    • This talk will compare R and TERR designs for:
      – Data representation
      – Reference counting
      – Garbage collection
  • 4. R: Data Representation
    • Basic R data objects have a fixed representation in memory.
      – The contents of a vector are represented as a C array
      – Data must be in memory
      – Only one representation for a given type
    • Example: An R numeric vector is represented as an object containing a C double* pointer to an array in memory.
    • Object elements can be accessed via simple memory access operations.
      – If USE_RINTERNALS is defined, REAL(pObj) will be compiled as a simple memory access
  • 5. TERR: Data Representation
    • TERR implements data objects via abstract C++ classes with one or more representations.
      – Manipulate object via C++ methods
      – Doesn’t support direct access
      – USE_RINTERNALS is not supported
    • Scalar vs small vector vs big vector
      – Different representations with different length fields
      – Matrix of basic types and representations implemented via C++ templates
  • 6. TERR: Data Representation
    • Sequence objects: 1:1e6
        > x <- 1:1e6
        > object.size(x)
        104 bytes
        > x[1] <- 0
        > object.size(x)
        8000088 bytes
    • String sequences: as.character(1:1e6)
      – Used for data.frame row names
    • Two representations for logical vectors:
      – Normal logical vectors have 1-byte elements
      – C code NEW_LOGICAL(n) creates logical vectors with 4-byte elements
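
The sequence-object behavior on this slide (a compact form that expands only when an element is modified) can be illustrated with a toy model in plain R. This is a minimal sketch and not TERR's C++ implementation: the names `compact_seq`, `seq_get`, and `seq_set` are hypothetical and exist only for this illustration.

```r
# Toy model of a compact sequence (hypothetical helpers, not TERR internals).
# Only the endpoints are stored until the first write.
compact_seq <- function(from, to) {
  list(from = from, to = to, data = NULL)
}

# Read element i: computed from the endpoints while the sequence is compact.
seq_get <- function(s, i) {
  if (is.null(s$data)) s$from + i - 1L else s$data[i]
}

# Write element i: the first write materializes the full numeric vector.
seq_set <- function(s, i, value) {
  if (is.null(s$data)) s$data <- as.numeric(s$from:s$to)
  s$data[i] <- value
  s
}

x <- compact_seq(1L, 1000000L)
is.null(x$data)          # TRUE while no element has been modified
x <- seq_set(x, 1L, 0)
length(x$data)           # the vector is now fully materialized
```

The design choice mirrored here is that reads never force materialization, so a sequence that is only iterated over stays small.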
  • 7. TERR: Potential Future Data Representation
    • Extend preliminary support for out-of-memory vectors.
      – Transparent “bigdata” objects
      – Only use external files when necessary
    • Access shared memory between applications.
      – Example: Send data between Spotfire and TERR
    • Access database tables.
    • Access streaming data.
    • Represent deferred evaluation “futures”
      – Example: An object representing the result of adding two vectors B and C, only performing the addition when needed
      – See: Riposte runtime, Talbot et al., UseR2012
  • 8. Reference Counts
    • R and TERR use reference counts to efficiently implement object modification semantics.
    • If an object has only one reference, we can modify it without copying it.
        x <- rnorm(1e6)
        x[1] <- 0   # only one reference: modify in place
        y <- x
        x[2] <- 0   # two references: copy before modify
  • 9. Reference Count Field
    • R objects contain a 2-bit reference count
      – “named” field
      – Easy to make examples that set the reference count to a “max” value that will never be decreased
      – Max reference count → copy on change
    • TERR objects contain 16-bit reference counts
      – Tracks number of references up and down
      – Identify more objects that can be modified in place, without copying
  • 10. One Reference: No Copy
        refcnt.0 <- function(n) {
          z <- as.numeric(1:n)
          for(x in 1:n) {
            # only one reference to vector
            # so no copy is done
            z[x] <- z[x]+1
          }
          z
        }
  • 11. Two References
        refcnt.1 <- function(n) {
          z <- as.numeric(1:n)
          for(x in 1:n) {
            z2 <- z      # temporarily add
            z2 <- NULL   # second reference
            z[x] <- z[x]+1
          }
          z
        }
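
The difference between a saturating field that is never decreased and a count that is tracked up and down can be sketched for the loop above. This is a toy simulation under stated assumptions, not R or TERR internals: `count_copies` and its `sticky` flag are hypothetical names, and the model only counts how many copies the "add a second reference, drop it, then modify" pattern would trigger.

```r
# Toy simulation (hypothetical, not engine internals): count the copies
# triggered by n iterations of "z2 <- z; z2 <- NULL; z[x] <- ...".
count_copies <- function(n, sticky) {
  refs <- 1L        # live references to the current vector
  shared <- FALSE   # sticky marker: set once a second reference was ever seen
  copies <- 0L
  for (i in seq_len(n)) {
    refs <- refs + 1L   # z2 <- z adds a second reference
    shared <- TRUE
    refs <- refs - 1L   # z2 <- NULL drops it again
    # A sticky field cannot see the drop; a full count can.
    must_copy <- if (sticky) shared else refs > 1L
    if (must_copy) {
      copies <- copies + 1L
      refs <- 1L        # the fresh copy starts with a single reference
      shared <- FALSE
    }
  }
  copies
}

count_copies(1000, sticky = TRUE)    # one copy per iteration
count_copies(1000, sticky = FALSE)   # no copies
```

In this model the sticky variant copies the whole vector on every iteration, which is the quadratic behavior the two-reference test is designed to expose.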
  • 12. Two References via Function Call
        refcnt.2 <- function(n) {
          z <- as.numeric(1:n)
          fn <- function(vec,idx) vec[idx]+1
          for(x in 1:n) {
            # temporarily add second reference
            # while calling function
            z[x] <- fn(z,x)
          }
          z
        }
  • 13. Reference Count Tests: Performance
  • 14. Reference Count Tests: Performance
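
The three test functions from the preceding slides can be timed with a small harness like the one below. This is a sketch for reproducing the comparison locally, not the benchmark setup used in the talk: the vector length `n` and the use of `system.time` are assumptions, and the timings will vary by engine and version.

```r
# Test functions as given on the slides.
refcnt.0 <- function(n) {
  z <- as.numeric(1:n)
  for (x in 1:n) z[x] <- z[x] + 1          # single reference throughout
  z
}

refcnt.1 <- function(n) {
  z <- as.numeric(1:n)
  for (x in 1:n) {
    z2 <- z                                # temporarily add
    z2 <- NULL                             # second reference
    z[x] <- z[x] + 1
  }
  z
}

refcnt.2 <- function(n) {
  z <- as.numeric(1:n)
  fn <- function(vec, idx) vec[idx] + 1
  for (x in 1:n) z[x] <- fn(z, x)          # second reference during the call
  z
}

# Assumed harness: time each variant on the same input size.
n <- 1e4
tests <- list(refcnt.0 = refcnt.0, refcnt.1 = refcnt.1, refcnt.2 = refcnt.2)
timings <- sapply(tests, function(f) system.time(f(n))[["elapsed"]])
print(timings)
```

All three functions return the same vector, so any timing difference comes from how the engine handles the extra reference rather than from the arithmetic.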
  • 15. R: Garbage Collection
    • R uses a mark-and-sweep garbage collector to reclaim storage for unused objects.
    • The R garbage collector is a generational garbage collector.
      – Quick sweep through “young” objects to reclaim temporary objects
  • 16. TERR: Garbage Collection
    • TERR uses a mark-and-sweep garbage collector along with reference counts to reclaim storage.
      – When an object reference count goes to zero, can immediately reclaim temporary object
      – Calls full mark-and-sweep garbage collection (infrequently)
    • After engine initialization:
      – Set “static” objects to have the max reference count
      – Don’t scan on future garbage collections
  • 17. Row-By-Row Test: Translated SAS®
    • Suppose we have a SAS® “data step” with complex logic for processing records one row at a time.
    • Suppose we hand-translate each line directly into R code.
      – Of course, this is not going to be efficient, since R is not suited to row-by-row operations
      – However, the user may not want to entirely recode in a vectorized form, distorting the original logic
    • Question: How fast will this “non-optimized” code run in R and TERR?
  • 18. Row-By-Row Test: SAS® Code
        . . .
        ELSE IF event='complaint' AND startDate NE . THEN DO;
          lastComplaintDate = date;
          weeknum = 1+(date-startDate)/(7*24*60*60);
          IF weeknum<2 THEN DO;
            comp1 = comp1+1;
          END;
          ELSE IF weeknum<3 THEN DO;
            comp2 = comp2+1;
          END;
          ELSE DO;
            compN = compN+1;
          END;
        . . .
  • 19. Row-By-Row Test: R Code
        . . .
        } else if (df.event[i]=="complaint" && !is.na(startDate)) {
          lastComplaintDate <- df.date[i]
          weeknum <- 1 + as.numeric(difftime(df.date[i], startDate, units="days"))/7
          if (weeknum<2) {
            comp1 <- comp1+1
          } else if (weeknum<3) {
            comp2 <- comp2+1
          } else {
            compN <- compN+1
          }
        . . .
  • 20. Row-By-Row Test: Performance
  • 21. Row-By-Row Test: Performance
  • 22. Row-By-Row Test: Garbage Collection
    • R-3.0.1:
        # before processing 1M rows
        Garbage collection 67 = 26+6+35 (level 2)
        # after processing 1M rows
        Garbage collection 40300 = 30966+7745+1589 (level 2)
    • TERR:
        # before processing 1M rows
        40 collections
        # after processing 1M rows
        42 collections
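
In R, report lines of the form "Garbage collection N = ... (level L)" are printed when `gcinfo(TRUE)` is enabled, so a similar before/after comparison can be sketched as follows. The loop here is a hypothetical stand-in for the translated data step (the original test code and data are not shown on the slides), and `row_by_row` is an assumed name; collection counts depend on the R version and session state.

```r
# Hypothetical stand-in for the translated row-by-row logic: each iteration
# creates a few small temporary objects, as the real data step would.
row_by_row <- function(n) {
  comp1 <- 0
  for (i in seq_len(n)) {
    weeknum <- 1 + (i %% 30) / 7
    if (weeknum < 2) comp1 <- comp1 + 1
  }
  comp1
}

old <- gcinfo(TRUE)      # report each collection as it happens
invisible(gc())          # baseline report before processing
result <- row_by_row(1e5)
invisible(gc())          # final report after processing
invisible(gcinfo(old))   # restore the previous reporting setting
```

Comparing the collection number in the first and last report gives the number of collections that ran during the loop.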
  • 23. Row-By-Row Test: Analysis
    • We strongly suspect that the R performance problem is due to frequent garbage collections to reclaim temporary objects.
    • In TERR, most temporary objects are immediately reclaimed using reference counts.
  • 24. Conclusion
    • TERR internals were redesigned from scratch to support better memory management and performance.
    • TERR internal data representation supports further experiments with more efficient objects.
    • Full reference counting reduces unnecessary copies, and improves garbage collection performance.