5. Zeppelin
2012. 12 Data analytics solution based on AMP Lab Spark/Shark
2013. 10 Open-sourced the interactive analytics feature as ‘Zeppelin’
6. Zeppelin
2014. 12 ASF incubation
Incubation Status http://incubator.apache.org/projects/zeppelin.html
7. Zeppelin
2016. 03 110 contributors worldwide
1,355 stars on the GitHub repo
3 releases
One of the most popular projects in the ASF
8. Zeppelin
A web-based notebook that enables interactive data analytics. You can
make beautiful data-driven, interactive and collaborative documents with
SQL, Scala and more.
10. Zeppelin
Example interpreters: JDBC, Markdown, Shell
Interpreter : pluggable layer for language / processing backend integration
More than 20 interpreters are officially supported
(As of 2016. 03: interpreters in the Zeppelin source tree, not including third-party interpreters)
12. Zeppelin
Interpreter : Easy to extend
public abstract class Interpreter {
  public abstract void open();
  public abstract void close();
  public abstract InterpreterResult interpret(String st, InterpreterContext context);
  public abstract void cancel(InterpreterContext context);
  public abstract int getProgress(InterpreterContext context);
  public abstract List<String> completion(String buf, int cursor);
  public abstract FormType getFormType();
  public abstract Scheduler getScheduler();
}
The slide brackets these methods into three groups: Must have, Good to have, and Advanced.
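To make the "easy to extend" claim concrete, here is a minimal sketch of a toy interpreter. It does not use the real Zeppelin classes: SketchInterpreter is a hypothetical, trimmed-down stand-in for the Interpreter class above (no InterpreterResult or InterpreterContext), and UpperCaseInterpreter is an invented backend that simply upper-cases its input.

```java
import java.util.Locale;

// Demo entry point for the sketch.
class InterpreterSketch {
    public static void main(String[] args) {
        SketchInterpreter interp = new UpperCaseInterpreter();
        interp.open();
        System.out.println(interp.interpret("hello zeppelin"));
        interp.close();
    }
}

// Hypothetical stand-in for the Interpreter class on the slide,
// reduced to three lifecycle methods. Not the real Zeppelin API.
abstract class SketchInterpreter {
    public abstract void open();
    public abstract void close();
    public abstract String interpret(String st);
}

// Toy backend: "evaluates" a paragraph by upper-casing it.
class UpperCaseInterpreter extends SketchInterpreter {
    private boolean opened = false;

    @Override public void open() { opened = true; }

    @Override public void close() { opened = false; }

    @Override public String interpret(String st) {
        if (!opened) {
            throw new IllegalStateException("interpreter is not open");
        }
        return st.toUpperCase(Locale.ROOT);
    }
}
```

A real interpreter would acquire its backend connection in open(), release it in close(), and return a structured result instead of a plain string.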
13. Zeppelin
Notebook Repo : pluggable layer for notebook persistence
4+ Notebook repos are supported officially
(As of 2016. 03: notebook repos in the Zeppelin source tree, not including third-party notebook repos)
14. Zeppelin
Notebook Repo : Easy to extend
public interface NotebookRepo {
  public List<NoteInfo> list() throws IOException;
  public Note get(String noteId) throws IOException;
  public void save(Note note) throws IOException;
  public void remove(String noteId) throws IOException;
  public void checkpoint(String noteId, String checkPointName) throws IOException;
  public void close();
}
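As an illustration of how small a notebook persistence layer can be, the sketch below keeps notes in memory. It is written against hypothetical stand-ins, not the real Zeppelin types: SketchNote replaces Note, SketchNotebookRepo mirrors the interface above but returns note ids instead of NoteInfo and omits checkpoint, and InMemoryNotebookRepo is an invented implementation.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Demo entry point for the sketch.
class NotebookRepoSketch {
    public static void main(String[] args) throws IOException {
        SketchNotebookRepo repo = new InMemoryNotebookRepo();
        repo.save(new SketchNote("note-1", "%md Hello"));
        System.out.println(repo.list());
        System.out.println(repo.get("note-1").text);
        repo.remove("note-1");
        repo.close();
    }
}

// Hypothetical stand-in for Zeppelin's Note: just an id and a body.
final class SketchNote {
    final String id;
    final String text;

    SketchNote(String id, String text) {
        this.id = id;
        this.text = text;
    }
}

// Hypothetical, trimmed-down version of the NotebookRepo interface on the slide.
interface SketchNotebookRepo {
    List<String> list() throws IOException;
    SketchNote get(String noteId) throws IOException;
    void save(SketchNote note) throws IOException;
    void remove(String noteId) throws IOException;
    void close();
}

// Keeps every note in a map; nothing survives a restart.
class InMemoryNotebookRepo implements SketchNotebookRepo {
    private final Map<String, SketchNote> notes = new ConcurrentHashMap<>();

    @Override public List<String> list() {
        return new ArrayList<>(notes.keySet());
    }

    @Override public SketchNote get(String noteId) throws IOException {
        SketchNote note = notes.get(noteId);
        if (note == null) {
            throw new IOException("no such note: " + noteId);
        }
        return note;
    }

    @Override public void save(SketchNote note) { notes.put(note.id, note); }

    @Override public void remove(String noteId) { notes.remove(noteId); }

    @Override public void close() { notes.clear(); }
}
```

Swapping the map for a file system, object store, or version-control backend changes only the method bodies, which is the point of keeping the layer pluggable.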
15. Zeppelin
Visualizations: 6 built-in visualizations come with pivot support
Table, Bar, Pie, Area, Line, Scatter
You are free to draw any custom visualization inside a notebook
…
16. Helium
A platform for data analytics applications that makes visualizations, and more, pluggable.
Proposal: https://cwiki.apache.org/confluence/display/ZEPPELIN/Helium+proposal
Umbrella issue: http://issues.apache.org/jira/browse/ZEPPELIN-533
Pull request: https://github.com/apache/incubator-zeppelin/pull/836
Makes Zeppelin fly!
17. Helium
ZeppelinServer: RESTful API, WebSocket
Interpreter: Spark, Flink, Geode, JDBC, …
Notebook Storage: FileSystem, AmazonS3, Git, …
Interpreters and notebook storage are pluggable
18. Helium
ZeppelinServer
Interpreter: Spark, Flink, Geode, JDBC, …
Notebook Storage: FileSystem, AmazonS3, Git, …
Visualizations: Map, WordCloud, …
We want visualizations to be pluggable as well
19. Helium
Interpreter: Spark, Flink, Geode, JDBC, …
Notebook Storage: FileSystem, AmazonS3, Git, …
Application: Visualizations (Map, WordCloud, …), Analytics (ML, …)
Resource Pool: SparkContext, Flink Environment, JDBC connection, …, User object
Extend pluggable visualizations to pluggable analytics applications
20. Helium
A Helium application is an interaction between a view, an algorithm, and resources
Application = View + Algorithm, on top of Zeppelin-provided resources
21. Helium
Components: Zeppelin Server; Web browser (View); Interpreter Process (Algorithm); Resource pools
Resource pools are connected
A Helium application runs where the resource exists
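The idea that an application runs where its resource lives can be sketched as a lookup across connected pools. Everything below is hypothetical and only illustrates the placement idea: SketchResourcePool, SketchPoolNetwork, the process ids, and the resource names are invented for this example and are not Zeppelin classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Demo entry point for the sketch.
class ResourcePoolSketch {
    public static void main(String[] args) {
        SketchResourcePool spark = new SketchResourcePool("spark-interpreter");
        SketchResourcePool jdbc = new SketchResourcePool("jdbc-interpreter");
        spark.put("sparkContext", new Object());
        jdbc.put("jdbcConnection", new Object());

        SketchPoolNetwork network = new SketchPoolNetwork();
        network.connect(spark);
        network.connect(jdbc);

        // The application should be started in the process that holds the resource.
        System.out.println(network.locate("sparkContext").orElse("not found"));
    }
}

// Hypothetical per-process pool of named resources.
class SketchResourcePool {
    final String processId;
    private final Map<String, Object> resources = new HashMap<>();

    SketchResourcePool(String processId) { this.processId = processId; }

    void put(String name, Object resource) { resources.put(name, resource); }

    boolean has(String name) { return resources.containsKey(name); }
}

// Hypothetical view over all connected pools.
class SketchPoolNetwork {
    private final List<SketchResourcePool> pools = new ArrayList<>();

    void connect(SketchResourcePool pool) { pools.add(pool); }

    // Returns the id of the first process whose pool holds the named resource.
    Optional<String> locate(String resourceName) {
        return pools.stream()
                .filter(p -> p.has(resourceName))
                .map(p -> p.processId)
                .findFirst();
    }
}
```

Placing the algorithm next to the resource avoids shipping objects such as a SparkContext between processes; only the view needs to reach the browser.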
22. Helium Application: Easy to extend
public abstract class Application {
  private final ApplicationContext context;

  public Application(ApplicationContext context) {
    this.context = context;
  }

  public abstract void run(ResourceSet args);
  public abstract void unload();
}
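A minimal sketch of what extending this class might look like is shown below. It uses hypothetical stand-ins rather than the real types: SketchApplication replaces Application, a String replaces ApplicationContext, and a Map replaces ResourceSet. WordCountApp and the "rows" resource name are invented; the app computes the word frequencies a word-cloud view could render.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Demo entry point for the sketch.
class ApplicationSketch {
    public static void main(String[] args) {
        WordCountApp app = new WordCountApp("paragraph-1");
        app.run(Map.of("rows", List.of("spark flink", "spark jdbc")));
        System.out.println(app.counts());
        app.unload();
    }
}

// Hypothetical stand-in for the Application class on the slide.
abstract class SketchApplication {
    protected final String context;   // stand-in for ApplicationContext

    SketchApplication(String context) { this.context = context; }

    public abstract void run(Map<String, Object> args);   // stand-in for ResourceSet

    public abstract void unload();
}

// Algorithm half of a word-cloud style application: counts words in table rows.
class WordCountApp extends SketchApplication {
    private final Map<String, Integer> counts = new TreeMap<>();

    WordCountApp(String context) { super(context); }

    @Override public void run(Map<String, Object> args) {
        counts.clear();
        Object rows = args.get("rows");
        if (!(rows instanceof List)) {
            return;   // nothing to process
        }
        for (Object row : (List<?>) rows) {
            for (String word : String.valueOf(row).trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
    }

    @Override public void unload() { counts.clear(); }

    // Defensive copy so callers cannot mutate internal state.
    Map<String, Integer> counts() { return new TreeMap<>(counts); }
}
```

The split between run and unload keeps the application restartable: run can be called again with fresh resources, and unload drops whatever state it built.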
23. Helium
Interpreter: Spark, Flink, Geode, JDBC, …
Notebook Storage: FileSystem, AmazonS3, Git, …
Application: Map, WordCloud, …
Resource Pool: SparkContext, Flink Environment, JDBC connection, …, User object
Maven: online repository for pluggable modules
Download and load on the fly
24. Helium
Helium registry (registries: zeppelin-packages, my company, + Add)
Wordcloud (Visualization): Turn your table output into a word cloud. [Install]
R Interpreter: R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS. [Install]
ZeppelinHub Notebook Storage: Save your notebook in ZeppelinHub. You can control access and share your notebook online. [Install]
Registry for pluggable modules
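A registry of pluggable modules is, at its simplest, a typed list of package descriptions. The sketch below models the three entries on this slide. SketchPackageType, SketchPackage, and SketchRegistry are hypothetical names chosen for illustration and do not reflect the actual Helium registry format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Demo entry point for the sketch.
class RegistrySketch {
    public static void main(String[] args) {
        SketchRegistry registry = new SketchRegistry();
        registry.add(new SketchPackage("Wordcloud", SketchPackageType.VISUALIZATION,
                "Turn your table output into a word cloud"));
        registry.add(new SketchPackage("R Interpreter", SketchPackageType.INTERPRETER,
                "Statistical computing and graphics"));
        registry.add(new SketchPackage("ZeppelinHub Notebook Storage",
                SketchPackageType.NOTEBOOK_STORAGE, "Save your notebook in ZeppelinHub"));

        System.out.println(registry.namesOfType(SketchPackageType.VISUALIZATION));
    }
}

// Hypothetical module kinds, matching the three entries shown on the slide.
enum SketchPackageType { VISUALIZATION, INTERPRETER, NOTEBOOK_STORAGE }

// Hypothetical registry entry: what a user sees before pressing Install.
final class SketchPackage {
    final String name;
    final SketchPackageType type;
    final String description;

    SketchPackage(String name, SketchPackageType type, String description) {
        this.name = name;
        this.type = type;
        this.description = description;
    }
}

// Hypothetical in-memory registry with a simple filter by module kind.
class SketchRegistry {
    private final List<SketchPackage> packages = new ArrayList<>();

    void add(SketchPackage pkg) { packages.add(pkg); }

    List<String> namesOfType(SketchPackageType type) {
        return packages.stream()
                .filter(p -> p.type == type)
                .map(p -> p.name)
                .collect(Collectors.toList());
    }
}
```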
25. Helium
Conclusion
Helium aims to take Zeppelin
from a notebook to an analytics application platform.
You can build and distribute not only your visualizations but also
analytics applications that use cluster resources provided by Zeppelin
interpreters.
The next challenge is to enrich the Helium registry.