SlideShare uma empresa Scribd logo
1 de 53
YARN: HADOOP
BEYOND MAPREDUCE
Andreas Neumann @anew68
TerenceYim @chtyim
Andreas Neumann, Terence Yim
IN THIS TALK, YOU WILL
• Learn about Hadoop
• Learn about other distributed compute paradigms
• Learn about YARN and how it works
• Learn how to get more value out of your Hadoop cluster
• Learn about Weave
Andreas Neumann, Terence Yim
A WEB-APP
Database
Presentation
(stateless)
Business Logic
(stateless)
Andreas Neumann, Terence Yim
A SCALABLE WEB-APP
Database
Presentation
(stateless)
Business Logic
(stateless)
Andreas Neumann, Terence Yim
A LOGGING WEB-APP
Presentation
(stateless)
Business Logic
(stateless) log log log
Database
Andreas Neumann, Terence Yim
WHAT TO DO WITH MY LOGS?
Analyzing web logs can create real business value
• Optimize for observed user behavior
• Personalize each user’s experience
• ...
Many web logs are simply being discarded
• Too expensive?
• Too Hard to analyze?
• Too much heavy lifting?
Andreas Neumann, Terence Yim
WHAT TO DO WITH MY LOGS?
Andreas Neumann, Terence Yim
APACHE
• Open source and free
• Runs on commodity hardware - cheap to store
• Programming Paradigm: Map/Reduce (Google 2004)
• Batch processing of very large files in a distributed file system
• Simple yet powerful
• Turn your web app into a data-driven app
Andreas Neumann, Terence Yim
A MAP/REDUCE APP
Reducers
Mappers
(shuffle)
split split split
part part part
Andreas Neumann, Terence Yim
MAP/REDUCE CLUSTER
split split split
part part part
split split split
part part part
split split
part part
split split
part part
Andreas Neumann, Terence Yim
I HAVE HADOOP - WHAT NEXT?
What if:
• Map/Reduce is not the only thing I want to run?
• I would like to do real-time analysis?
• Or run a message-passing algorithm over my data?
• My cluster has more compute capacity than my big data needs?
Can I utilize its idle resources for other, compute-intensive tasks?
Andreas Neumann, Terence Yim
A MESSAGE PASSING (MPI) APP
data
data data
data data
data
Andreas Neumann, Terence Yim
A STREAMING APP
Database
events
Andreas Neumann, Terence Yim
A DISTRIBUTED LOAD TEST
test testtest
test test test
test test test
test
test
test
Web
Service
Andreas Neumann, Terence Yim
HADOOP AT CONTINUUITY
Continuuity AppFabric
• Developer centric big data app platform
• Runs many different types of jobs in Hadoop cluster
• Real-time stream processing
• Ad-hoc OLAP queries
• Map/Reduce
Andreas Neumann, Terence Yim
A MULTI-PURPOSE CLUSTER
log log log
Database
spli
t spli spli
par par par
data
data data
data data
data
test testtest
test test test
test test test
test
test
test
Web
Service
Database
events
Andreas Neumann, Terence Yim
THE ANSWER: YARN
• The Resource Manager of Hadoop 2.0
• Separates resource management from programming paradigm
• Allows (almost) all kinds of distributed application in Hadoop cluster
• Multi-tenant, secure
• Scales to many thousands of nodes
• Pluggable scheduling algorithms
Andreas Neumann, Terence Yim
YARN APPLICATION MASTER
log log log
Database
spli
t spli spli
par par par
data
data data
data data
data
test testtest
test test test
test test test
test
test
test
Web
Service
Database
events
AM AM AM
AM
AM
YARN
Resource
Manager
Andreas Neumann, Terence Yim
YARN - HOW IT WORKS
• Every node (machine) in the cluster has several slots.
• A node manager runs on each node to manages the node’s slots.
• The resource manager assigns the slots of the cluster to
applications.
• Applications must have a master.
• The master requests slots from the resource manager and starts
tasks.
Andreas Neumann, Terence Yim
YARN - HOW IT WORKS
Protocols
1) Client-RM: Submit the app master
2) RM-NM: Start the app master
3) AM-RM: Request + release containers
4) RM-NM: Start tasks in containers
YARN
Resource
Manager
Node Mgr
AM
Node Mgr
Node MgrNode Mgr
Task Task
Task
Task
TaskTask
YARN
Client
1)
2)
3)
4)
Andreas Neumann, Terence Yim
BACKGROUND: HADOOP+HDFS
Local
file system
Local
file system
. . .
Name
Node
HDFS distributed file system
Data
Node
• Every node contributes part of its
local file system to HDFS.
• Tasks can only depend on the
local file system
(JVM class path does not
understand HDFS protocol)
Data
Node
Andreas Neumann, Terence Yim
BACKGROUND: HADOOP+HDFS
YARN
Resource
Manager AM
YARN
Client
Local
file system
Local
file system
. . .
Name
Node
HDFS distributed file system
Local
file system
AM.jar
Node
Mgr
Node
Mgr
Data
Node
Data
Node
X
Andreas Neumann, Terence Yim
BACKGROUND: HADOOP+HDFS
YARN
Resource
Manager AM
YARN
Client
Local
file system
. . .
Name
Node
HDFS distributed file system
Local
file system
AM.jar
1) copy
to HDFS
AM.jar
2) copy
to local
AM.jar
3)
Node
Mgr
Node
Mgr
Data
Node
Data
Node
Andreas Neumann, Terence Yim
WRITING THE YARN CLIENT
1) Connect to the Resource Manager.
2) Request a new application ID.
3) Create a submission context and a container launch context.
4) Define the local resources for the AM.
5) Define the environment for the AM.
6) Define the command to run for the AM.
7) Define the resource limits for the AM.
8) Submit the request to start the app master.
Andreas Neumann, Terence Yim
WRITING THE YARN CLIENT
Connect to the Resource Manager:
YarnConfiguration yarnConf = new YarnConfiguration(conf);
InetSocketAddress rmAddress =
NetUtils.createSocketAddr(yarnConf.get(
YarnConfiguration.RM_ADDRESS,
YarnConfiguration.DEFAULT_RM_ADDRESS));
LOG.info("Connecting to ResourceManager at " + rmAddress);
configuration rmServerConf = new Configuration(conf);
rmServerConf.setClass(
YarnConfiguration.YARN_SECURITY_INFO,
ClientRMSecurityInfo.class, SecurityInfo.class);
ClientRMProtocol resourceManager = ((ClientRMProtocol) rpc.getProxy(
ClientRMProtocol.class, rmAddress, appsManagerServerConf));
Andreas Neumann, Terence Yim
WRITING THE YARN CLIENT
Request an application ID, create a submission context and a launch
context:
GetNewApplicationRequest request =
Records.newRecord(GetNewApplicationRequest.class);
GetNewApplicationResponse response =
resourceManager.getNewApplication(request);
LOG.info("Got new ApplicationId=" + response.getApplicationId());
ApplicationSubmissionContext appContext =
Records.newRecord(ApplicationSubmissionContext.class);
appContext.setApplicationId(appId);
appContext.setApplicationName(appName);
ContainerLaunchContext amContainer =
Andreas Neumann, Terence Yim
WRITING THE YARN CLIENT
Define the local resources:
Map<String, LocalResource> localResources = Maps.newHashMap();
// assume the AM jar is here:
Path jarPath; // <- known path to jar file
// Create a resource with location, time stamp and file length
LocalResource amJarRsrc = Records.newRecord(LocalResource.class);
amJarRsrc.setType(LocalResourceType.FILE);
amJarRsrc.setResource(ConverterUtils.getYarnUrlFromPath(jarPath));
FileStatus jarStatus = fs.getFileStatus(jarPath);
amJarRsrc.setTimestamp(jarStatus.getModificationTime());
amJarRsrc.setSize(jarStatus.getLen());
localResources.put("AppMaster.jar", amJarRsrc);
amContainer.setLocalResources(localResources);
Andreas Neumann, Terence Yim
WRITING THE YARN CLIENT
Define the environment:
// Set up the environment needed for the launch context
Map<String, String> env = new HashMap<String, String>();
// Setup the classpath needed.
// Assuming our classes are available as local resources in the
// working directory, we need to append "." to the path.
String classPathEnv = "$CLASSPATH:./*:";
env.put("CLASSPATH", classPathEnv);
// setup more environment
env.put(...);
amContainer.setEnvironment(env);
Andreas Neumann, Terence Yim
WRITING THE YARN CLIENT
Define the command to run for the AM:
// Construct the command to be executed on the launched container
String command =
"${JAVA_HOME}" + /bin/java" +
" MyAppMaster" +
" arg1 arg2 arg3" +
" 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout" +
" 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr";
List<String> commands = new ArrayList<String>();
commands.add(command);
// Set the commands into the container spec
amContainer.setCommands(commands);
Andreas Neumann, Terence Yim
WRITING THE YARN CLIENT
Define the resource limits for the AM:
// Define the resource requirements for the container.
// For now, YARN only supports memory constraints.
// If the process takes more memory, it is killed by the framework.
Resource capability = Records.newRecord(Resource.class);
capability.setMemory(amMemory);
amContainer.setResource(capability);
// Set the container launch content into the submission context
appContext.setAMContainerSpec(amContainer);
Andreas Neumann, Terence Yim
WRITING THE YARN CLIENT
Submit the request to start the app master:
// Create the request to send to the Resource Manager
SubmitApplicationRequest appRequest =
Records.newRecord(SubmitApplicationRequest.class);
appRequest.setApplicationSubmissionContext(appContext);
// Submit the application to the ApplicationsManager
resourceManager.submitApplication(appRequest);
Andreas Neumann, Terence Yim
YARN: APPLICATION MASTER
YARN
Resource
Manager AM
YARN
Client
Local
file system
Local
file system
. . .
Name
Node
HDFS distributed file system
Node
Mgr
Node
Mgr
Data
Node
Data
Node
Andreas Neumann, Terence Yim
YARN - APPLICATION MASTER
• Now we have an Application Master running in a container.
• Next, the master starts the application’s tasks in containers
• For each task, similar procedure as starting the master
Andreas Neumann, Terence Yim
YARN: APPLICATION MASTER
YARN
Resource
Manager AM
YARN
Client
Local
file system
Local
file system
. . .
Name
Node
HDFS distributed file system
Node
Mgr
Node
Mgr
Data
Node
Data
Node
Task
TaskTask
Andreas Neumann, Terence Yim
YARN - APPLICATION MASTER
1) Connect with Resource Manager and register itself.
2) Loop:
• Send request to Resource Manager, to:
a) Send heartbeat
b) Request containers
c) Release containers
• Receive Response from Resource Manager
a) Receive containers
b) Receive notification of containers that terminated
• For each container received, send request to Node Manager to start task.
Andreas Neumann, Terence Yim
YARN - APPLICATION MASTER
• Master should terminate after all containers have terminated.
• If master crashes (or fails to send heart beat), all containers are killed.
• In the future, YARN will support restart of the master.
• Starting a task in a container
• Very similar to YARN client starting the master.
• Node manager starts and monitors task.
• If task crashes, Node Manager informs Resource Manager
• Resource Manager informs master in next response
• Master must still release the container.
Andreas Neumann, Terence Yim
YARN - APPLICATION MASTER
• Resource Manager assigns containers asynchronously:
• Requested containers are returned at the earliest in the next
call.
• Master must send empty requests until all it receives all
containers.
• Subsequent requests are incremental and can ask for additional
containers.
• Master must keep track of what it has requested and received.
Andreas Neumann, Terence Yim
YARN - APPLICATION LOGS
YARN
Resource
Manager AM
Task
TaskTask
YARN
Client
Local
file system
. . .
Name
Node
HDFS distributed file system
2) copy
to HDFS
logs
4) log access through Name Node
logs
1) log4j,
stdout,
stderr
3)
query
RM for
logs
location
Node
Mgr
Node
Mgr
Data
Node
Data
Node
Andreas Neumann, Terence Yim
EXAMPLE - DISTRIBUTED SHELL
• Example application in the YARN/Hadoop distribution
• Runs a shell command in a number of containers
• hadoop-2.0.0-alpha: 1681 lines of code
• some simplification in 2.0.3-alpa: 1543 lines
• Conclusion: YARN is complex!
Andreas Neumann, Terence Yim
YARN IS NOT SIMPLE!
Andreas Neumann, Terence Yim
YARN - SHORTCOMINGS
• Complexity
• Protocols are at very low level and very verbose.
• Client must prepare all dependent jars on HDFS.
• Client must set up environment, class path etc.
• Logs are only collected after application terminates
• What about long-running apps?
• Applications don’t survive master crash: SPOF!
• No built-in communication between containers and master.
• Hard to debug
Andreas Neumann, Terence Yim
TYPICAL YARN APP
• Many distributed applications are simple.
• For example: Distributed load test for a queuing system:
• “Run n producers and m consumers for k seconds”
• Neither producer nor consumer know that they run in YARN
• This would be easy if we run them as threads in a single JVM
• Can we make YARN as simple as Java Threads?
Andreas Neumann, Terence Yim
TYPICAL NEEDS ...
... of a distributed application:
• Monitor and restart containers
• Collect logs and metrics
• Elastic scale (dynamic adding/removing of instances)
• Long-running jobs
Andreas Neumann, Terence Yim
WEAVE IT!
Andreas Neumann, Terence Yim
HERE COMES WEAVE
• Open-Source, Apache License
• Does not replace YARN, but complements it
• Generic App Master
+ a DSL for specifying applications
+ an API for writing container tasks
+ built-in service discovery, logging, metrics, monitoring
Andreas Neumann, Terence Yim
SIMPLICITY
140. public class ApplicationMaster {
141.
.
150. private AMRMClientAsync resourceManager;
.
435. public boolean run() throws YarnRemoteException {
436. LOG.info(“Starting ApplicationMaster”);
.
498. }
.
500. public void finish() {
504. for(Thread launchThread: launchThreads) {
. ...
511. }
517. FinalApplicationStatus appStatus;
518. String appMessage = null;
538. }
.
790. public ContainerRequest setupContainerAskForRM(int numContainers) {
.
800. Resource capability = Records.newRecord(Resource.class);
801. capability.setMemory(containerMemory);
.
803. ContainerRequest request = new ContainerRequest(capability, ..);
807. }
808. }
.
.
1102. public class Client extends YarnClientImpl {
.
1336. public boolean run() throws IOException {
.
1354. GetNewApplicationResponse newApp = super.getNewApplication();
1355. ApplicationId appId = newApp.getApplicationId();
.
1583. return monitorApplication(appId);
1585. }
.
1594. private boolean monitorApplication(ApplicationId appId) throws ... {
.
1596. while(true) {
.
1606. ApplicationReport report = super.getApplicationReport(appId);
.
1647. }
1648. }
.
1657. private void forceKillApplication(ApplicationId appId) throws ... {
.
1664. super.killApplication(appId);
.
1665. }
.
1667. }
001. public class DistributedShell extends AbstractWeaveRunnable {
002. static Logger LOG = LoggerFactory.getLogger(DistributedShell.class);
003. private String command;
004.
005. public DistributedShell(String command) {
006. this.command = command;
007. }
008.
009. @Override
010. public WeaveRunnableSpecification configure() {
011. return WeaveRunnableSpecification.builder().with()
012. .setName("Executor " + command)
013. .withArguments(ImmutableMap.of("cmd", this.command))
014. .build();
015. }
016.
017. @Override
018. public void initialize(WeaveContext context) {
019. this.command = context.getSpecification().getArguments("cmd");
020. }
021.
022. @Override
023. public void run() {
024. try {
025. Process process = Runtime.getRuntime().exec(this.command);
026. InputStream stdin = proc.getInputStream();
027. BufferedReader br = new BufferedReader(new InputStreamReader(stdin));
028.
029. String line;
030. while( (line = br.readline()) != null) {
031. LOG.info(line);
032. }
033. } catch (Throwable t) {
034. LOG.error(t.getMessage());
035. }
036. }
037. }
038.
039. public static main(String[] args) {
040. String zkStr = args[0];
041. String command = args[1];
042. yarnConfig = ...
043. WeaveRunnerService weaveService = new YarnWeaveRunnerService(yarnConfig, zkStr);
044. weaveService.startAndWait();
045. WeaveController controller = weaveService.prepare(new DistributedShell(command))
046. .addLogHandler(new PrinterLogHandler(new PrintWriter(System.out)))
047. .start();
048. }
Distributed Shell - Vanilla YARN Distributed Shell - WEAVE
~ 500 lines ~ 50 lines
Andreas Neumann, Terence Yim
WEAVE
YARN
Resource
Manager
Node Mgr Node Mgr
Node MgrNode Mgr
Task
Weave
Client
...
AM+
Kafka
Task
Task
Task
Task Task
Weave
Runnable
Weave
Runner
Weave
Runnable
log
ZooKeeper
ZooKeeper
ZooKeeper
state +
messages
log stream
This is the only
programming
interface you need
Andreas Neumann, Terence Yim
RUNNABLE
Define a task:
public class EchoServer implements WeaveRunnable
{
static Logger LOG = ...
private WeaveContext context;
//....
@Override
public void run() {
ServerSocket serverSocket = new ServerSocket(0);
context.announce("echo", serverSocket.getLocalPort());
while (isRunning()) {
Socket socket = serverSocket.accept();
//...
}
}
}
WeaveRunnable = single task
SLF4J can be used for logging
- collected via Kafka
- available to client
WeaveRunnable can be run in
- Thread
- YARN container
Service endpoint announcement
Andreas Neumann, Terence Yim
RUNNER
Start a task:
WeaveRunner runner = ...;
WeaveController controller =
runner.prepare(new EchoServer())
.addLogHandler(new PrinterLogHandler(
new PrintWriter(System.out)
)
).start();
//...
controller.sendCommand(
Commands.create("stat", "incoming"));
//...
controller.stop();
WeaveRunner
- packages and its dependencies
- plus specified resources
- attaches log collector to all containers
WeaveController is used to manage
the lifecycle of a runnable or application
Controller also provides ability to
send commands to running containers
Andreas Neumann, Terence Yim
APPLICATION
Define an application:
public class WebServer implements WeaveApplication {
@Override
public WeaveSpecification configure() {
return WeaveSpecification.Builder.with()
.setName(“Jetty WebServer”)
.withRunnables()
.add(new JettyWebServer())
.withLocalFiles()
.add(“www-html”, new File(“/var/html”))
.apply()
.add(new LogsCollector())
.anyOrder()
.build();
}
}
WeaveController controller =
runner.prepare(new WebServer()).start();
WeaveApplication = collection of tasks
Application is given by
a WeaveSpecification
Application can specify additional local
directories to be made available.
Weave starts containers in no order.
WeaveRunner can also start
an application
Andreas Neumann, Terence Yim
APPLICATION
Define an application with order:
public class DataApplication implements WeaveApplication
{
@Override
public WeaveSpecification configure() {
return WeaveSpecification.Builder.with()
.setName(“Cool Data Application”)
.withRunnables()
.add(“reader”, new InputReader())
.add(“processor”, new Processor())
.add(“writer”, new OutputWriter())
.order()
.first(“reader”)
.next(“processor”)
.next(“writer”)
.build();
}
}
starts containers in given order.
Andreas Neumann, Terence Yim
SUMMARY
• Web apps generate data.
• Data driven apps with Hadoop.
• More value out of Hadoop with YARN.
• Easier with Weave.
• Now every Java developer can write distributed apps.
Andreas Neumann, Terence Yim
THANK YOU
https://github.com/continuuity/weave
http://hadoop.apache.org/docs/current/hadoop-yarn/
hadoop-yarn-site/YARN.html
Yes, we are hiring:
careers@continuuity.com

Mais conteúdo relacionado

Último

Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 

Último (20)

Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 

Destaque

Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Applitools
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at WorkGetSmarter
 

Destaque (20)

Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 

YARN: Hadoop Beyond MapReduce - Andreas Neumann

  • 1. YARN: HADOOP BEYOND MAPREDUCE Andreas Neumann @anew68 TerenceYim @chtyim
  • 2. Andreas Neumann, Terence Yim IN THIS TALK, YOU WILL • Learn about Hadoop • Learn about other distributed compute paradigms • Learn about YARN and how it works • Learn how to get more value out of your Hadoop cluster • Learn about Weave
  • 3. Andreas Neumann, Terence Yim A WEB-APP Database Presentation (stateless) Business Logic (stateless)
  • 4. Andreas Neumann, Terence Yim A SCALABLE WEB-APP Database Presentation (stateless) Business Logic (stateless)
  • 5. Andreas Neumann, Terence Yim A LOGGING WEB-APP Presentation (stateless) Business Logic (stateless) log log log Database
  • 6. Andreas Neumann, Terence Yim WHAT TO DO WITH MY LOGS? Analyzing web logs can create real business value • Optimize for observed user behavior • Personalize each user’s experience • ... Many web logs are simply being discarded • Too expensive? • Too Hard to analyze? • Too much heavy lifting?
  • 7. Andreas Neumann, Terence Yim WHAT TO DO WITH MY LOGS?
  • 8. Andreas Neumann, Terence Yim APACHE • Open source and free • Runs on commodity hardware - cheap to store • Programming Paradigm: Map/Reduce (Google 2004) • Batch processing of very large files in a distributed file system • Simple yet powerful • Turn your web app into a data-driven app
  • 9. Andreas Neumann, Terence Yim A MAP/REDUCE APP Reducers Mappers (shuffle) split split split part part part
  • 10. Andreas Neumann, Terence Yim MAP/REDUCE CLUSTER split split split part part part split split split part part part split split part part split split part part
  • 11. Andreas Neumann, Terence Yim I HAVE HADOOP - WHAT NEXT? What if: • Map/Reduce is not the only thing I want to run? • I would like to do real-time analysis? • Or run a message-passing algorithm over my data? • My cluster has more compute capacity than my big data needs? Can I utilize its idle resources for other, compute-intensive tasks?
  • 12. Andreas Neumann, Terence Yim A MESSAGE PASSING (MPI) APP data data data data data data
  • 13. Andreas Neumann, Terence Yim A STREAMING APP Database events
  • 14. Andreas Neumann, Terence Yim A DISTRIBUTED LOAD TEST test testtest test test test test test test test test test Web Service
  • 15. Andreas Neumann, Terence Yim HADOOP AT CONTINUUITY Continuuity AppFabric • Developer centric big data app platform • Runs many different types of jobs in Hadoop cluster • Real-time stream processing • Ad-hoc OLAP queries • Map/Reduce
  • 16. Andreas Neumann, Terence Yim A MULTI-PURPOSE CLUSTER log log log Database spli t spli spli par par par data data data data data data test testtest test test test test test test test test test Web Service Database events
  • 17. Andreas Neumann, Terence Yim THE ANSWER: YARN • The Resource Manager of Hadoop 2.0 • Separates resource management from programming paradigm • Allows (almost) all kinds of distributed application in Hadoop cluster • Multi-tenant, secure • Scales to many thousands of nodes • Pluggable scheduling algorithms
  • 18. Andreas Neumann, Terence Yim YARN APPLICATION MASTER log log log Database spli t spli spli par par par data data data data data data test testtest test test test test test test test test test Web Service Database events AM AM AM AM AM YARN Resource Manager
  • 19. Andreas Neumann, Terence Yim YARN - HOW IT WORKS • Every node (machine) in the cluster has several slots. • A node manager runs on each node to manages the node’s slots. • The resource manager assigns the slots of the cluster to applications. • Applications must have a master. • The master requests slots from the resource manager and starts tasks.
  • 20. Andreas Neumann, Terence Yim YARN - HOW IT WORKS Protocols 1) Client-RM: Submit the app master 2) RM-NM: Start the app master 3) AM-RM: Request + release containers 4) RM-NM: Start tasks in containers YARN Resource Manager Node Mgr AM Node Mgr Node MgrNode Mgr Task Task Task Task TaskTask YARN Client 1) 2) 3) 4)
  • 21. Andreas Neumann, Terence Yim BACKGROUND: HADOOP+HDFS Local file system Local file system . . . Name Node HDFS distributed file system Data Node • Every node contributes part of its local file system to HDFS. • Tasks can only depend on the local file system (JVM class path does not understand HDFS protocol) Data Node
  • 22. Andreas Neumann, Terence Yim BACKGROUND: HADOOP+HDFS YARN Resource Manager AM YARN Client Local file system Local file system . . . Name Node HDFS distributed file system Local file system AM.jar Node Mgr Node Mgr Data Node Data Node X
  • 23. Andreas Neumann, Terence Yim BACKGROUND: HADOOP+HDFS YARN Resource Manager AM YARN Client Local file system . . . Name Node HDFS distributed file system Local file system AM.jar 1) copy to HDFS AM.jar 2) copy to local AM.jar 3) Node Mgr Node Mgr Data Node Data Node
  • 24. Andreas Neumann, Terence Yim WRITING THE YARN CLIENT 1) Connect to the Resource Manager. 2) Request a new application ID. 3) Create a submission context and a container launch context. 4) Define the local resources for the AM. 5) Define the environment for the AM. 6) Define the command to run for the AM. 7) Define the resource limits for the AM. 8) Submit the request to start the app master.
  • 25. Andreas Neumann, Terence Yim WRITING THE YARN CLIENT Connect to the Resource Manager: YarnConfiguration yarnConf = new YarnConfiguration(conf); InetSocketAddress rmAddress = NetUtils.createSocketAddr(yarnConf.get( YarnConfiguration.RM_ADDRESS, YarnConfiguration.DEFAULT_RM_ADDRESS)); LOG.info("Connecting to ResourceManager at " + rmAddress); configuration rmServerConf = new Configuration(conf); rmServerConf.setClass( YarnConfiguration.YARN_SECURITY_INFO, ClientRMSecurityInfo.class, SecurityInfo.class); ClientRMProtocol resourceManager = ((ClientRMProtocol) rpc.getProxy( ClientRMProtocol.class, rmAddress, appsManagerServerConf));
  • 26. Andreas Neumann, Terence Yim WRITING THE YARN CLIENT Request an application ID, create a submission context and a launch context: GetNewApplicationRequest request = Records.newRecord(GetNewApplicationRequest.class); GetNewApplicationResponse response = resourceManager.getNewApplication(request); LOG.info("Got new ApplicationId=" + response.getApplicationId()); ApplicationSubmissionContext appContext = Records.newRecord(ApplicationSubmissionContext.class); appContext.setApplicationId(appId); appContext.setApplicationName(appName); ContainerLaunchContext amContainer =
  • 27. Andreas Neumann, Terence Yim WRITING THE YARN CLIENT Define the local resources: Map<String, LocalResource> localResources = Maps.newHashMap(); // assume the AM jar is here: Path jarPath; // <- known path to jar file // Create a resource with location, time stamp and file length LocalResource amJarRsrc = Records.newRecord(LocalResource.class); amJarRsrc.setType(LocalResourceType.FILE); amJarRsrc.setResource(ConverterUtils.getYarnUrlFromPath(jarPath)); FileStatus jarStatus = fs.getFileStatus(jarPath); amJarRsrc.setTimestamp(jarStatus.getModificationTime()); amJarRsrc.setSize(jarStatus.getLen()); localResources.put("AppMaster.jar", amJarRsrc); amContainer.setLocalResources(localResources);
  • 28. Andreas Neumann, Terence Yim WRITING THE YARN CLIENT Define the environment: // Set up the environment needed for the launch context Map<String, String> env = new HashMap<String, String>(); // Setup the classpath needed. // Assuming our classes are available as local resources in the // working directory, we need to append "." to the path. String classPathEnv = "$CLASSPATH:./*:"; env.put("CLASSPATH", classPathEnv); // setup more environment env.put(...); amContainer.setEnvironment(env);
  • 29. Andreas Neumann, Terence Yim WRITING THE YARN CLIENT Define the command to run for the AM: // Construct the command to be executed on the launched container String command = "${JAVA_HOME}" + /bin/java" + " MyAppMaster" + " arg1 arg2 arg3" + " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout" + " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr"; List<String> commands = new ArrayList<String>(); commands.add(command); // Set the commands into the container spec amContainer.setCommands(commands);
  • 30. Andreas Neumann, Terence Yim WRITING THE YARN CLIENT Define the resource limits for the AM: // Define the resource requirements for the container. // For now, YARN only supports memory constraints. // If the process takes more memory, it is killed by the framework. Resource capability = Records.newRecord(Resource.class); capability.setMemory(amMemory); amContainer.setResource(capability); // Set the container launch content into the submission context appContext.setAMContainerSpec(amContainer);
  • 31. Andreas Neumann, Terence Yim WRITING THE YARN CLIENT Submit the request to start the app master: // Create the request to send to the Resource Manager SubmitApplicationRequest appRequest = Records.newRecord(SubmitApplicationRequest.class); appRequest.setApplicationSubmissionContext(appContext); // Submit the application to the ApplicationsManager resourceManager.submitApplication(appRequest);
  • 32. Andreas Neumann, Terence Yim YARN: APPLICATION MASTER YARN Resource Manager AM YARN Client Local file system Local file system . . . Name Node HDFS distributed file system Node Mgr Node Mgr Data Node Data Node
  • 33. Andreas Neumann, Terence Yim YARN - APPLICATION MASTER • Now we have an Application Master running in a container. • Next, the master starts the application’s tasks in containers • For each task, similar procedure as starting the master
  • 34. Andreas Neumann, Terence Yim YARN: APPLICATION MASTER YARN Resource Manager AM YARN Client Local file system Local file system . . . Name Node HDFS distributed file system Node Mgr Node Mgr Data Node Data Node Task TaskTask
  • 35. Andreas Neumann, Terence Yim YARN - APPLICATION MASTER 1) Connect with Resource Manager and register itself. 2) Loop: • Send request to Resource Manager, to: a) Send heartbeat b) Request containers c) Release containers • Receive Response from Resource Manager a) Receive containers b) Receive notification of containers that terminated • For each container received, send request to Node Manager to start task.
  • 36. Andreas Neumann, Terence Yim YARN - APPLICATION MASTER • Master should terminate after all containers have terminated. • If master crashes (or fails to send heart beat), all containers are killed. • In the future, YARN will support restart of the master. • Starting a task in a container • Very similar to YARN client starting the master. • Node manager starts and monitors task. • If task crashes, Node Manager informs Resource Manager • Resource Manager informs master in next response • Master must still release the container.
  • 37. Andreas Neumann, Terence Yim YARN - APPLICATION MASTER • Resource Manager assigns containers asynchronously: • Requested containers are returned at the earliest in the next call. • Master must send empty requests until all it receives all containers. • Subsequent requests are incremental and can ask for additional containers. • Master must keep track of what it has requested and received.
  • 38. Andreas Neumann, Terence Yim YARN - APPLICATION LOGS YARN Resource Manager AM Task TaskTask YARN Client Local file system . . . Name Node HDFS distributed file system 2) copy to HDFS logs 4) log access through Name Node logs 1) log4j, stdout, stderr 3) query RM for logs location Node Mgr Node Mgr Data Node Data Node
  • 39. Andreas Neumann, Terence Yim EXAMPLE - DISTRIBUTED SHELL • Example application in the YARN/Hadoop distribution • Runs a shell command in a number of containers • hadoop-2.0.0-alpha: 1681 lines of code • some simplification in 2.0.3-alpa: 1543 lines • Conclusion: YARN is complex!
  • 40. Andreas Neumann, Terence Yim YARN IS NOT SIMPLE!
  • 41. Andreas Neumann, Terence Yim YARN - SHORTCOMINGS • Complexity • Protocols are at very low level and very verbose. • Client must prepare all dependent jars on HDFS. • Client must set up environment, class path etc. • Logs are only collected after application terminates • What about long-running apps? • Applications don’t survive master crash: SPOF! • No built-in communication between containers and master. • Hard to debug
  • 42. Andreas Neumann, Terence Yim TYPICAL YARN APP • Many distributed applications are simple. • For example: Distributed load test for a queuing system: • “Run n producers and m consumers for k seconds” • Neither producer nor consumer know that they run in YARN • This would be easy if we run them as threads in a single JVM • Can we make YARN as simple as Java Threads?
  • 43. Andreas Neumann, Terence Yim TYPICAL NEEDS ... ... of a distributed application: • Monitor and restart containers • Collect logs and metrics • Elastic scale (dynamic adding/removing of instances) • Long-running jobs
  • 44. Andreas Neumann, Terence Yim WEAVE IT!
  • 45. Andreas Neumann, Terence Yim HERE COMES WEAVE • Open-Source, Apache License • Does not replace YARN, but complements it • Generic App Master + a DSL for specifying applications + an API for writing container tasks + built-in service discovery, logging, metrics, monitoring
  • 46. Andreas Neumann, Terence Yim SIMPLICITY 140. public class ApplicationMaster { 141. . 150. private AMRMClientAsync resourceManager; . 435. public boolean run() throws YarnRemoteException { 436. LOG.info(“Starting ApplicationMaster”); . 498. } . 500. public void finish() { 504. for(Thread launchThread: launchThreads) { . ... 511. } 517. FinalApplicationStatus appStatus; 518. String appMessage = null; 538. } . 790. public ContainerRequest setupContainerAskForRM(int numContainers) { . 800. Resource capability = Records.newRecord(Resource.class); 801. capability.setMemory(containerMemory); . 803. ContainerRequest request = new ContainerRequest(capability, ..); 807. } 808. } . . 1102. public class Client extends YarnClientImpl { . 1336. public boolean run() throws IOException { . 1354. GetNewApplicationResponse newApp = super.getNewApplication(); 1355. ApplicationId appId = newApp.getApplicationId(); . 1583. return monitorApplication(appId); 1585. } . 1594. private boolean monitorApplication(ApplicationId appId) throws ... { . 1596. while(true) { . 1606. ApplicationReport report = super.getApplicationReport(appId); . 1647. } 1648. } . 1657. private void forceKillApplication(ApplicationId appId) throws ... { . 1664. super.killApplication(appId); . 1665. } . 1667. } 001. public class DistributedShell extends AbstractWeaveRunnable { 002. static Logger LOG = LoggerFactory.getLogger(DistributedShell.class); 003. private String command; 004. 005. public DistributedShell(String command) { 006. this.command = command; 007. } 008. 009. @Override 010. public WeaveRunnableSpecification configure() { 011. return WeaveRunnableSpecification.builder().with() 012. .setName("Executor " + command) 013. .withArguments(ImmutableMap.of("cmd", this.command)) 014. .build(); 015. } 016. 017. @Override 018. public void initialize(WeaveContext context) { 019. this.command = context.getSpecification().getArguments("cmd"); 020. } 021. 022. @Override 023. public void run() { 024. try { 025. Process process = Runtime.getRuntime().exec(this.command); 026. InputStream stdin = proc.getInputStream(); 027. BufferedReader br = new BufferedReader(new InputStreamReader(stdin)); 028. 029. String line; 030. while( (line = br.readline()) != null) { 031. LOG.info(line); 032. } 033. } catch (Throwable t) { 034. LOG.error(t.getMessage()); 035. } 036. } 037. } 038. 039. public static main(String[] args) { 040. String zkStr = args[0]; 041. String command = args[1]; 042. yarnConfig = ... 043. WeaveRunnerService weaveService = new YarnWeaveRunnerService(yarnConfig, zkStr); 044. weaveService.startAndWait(); 045. WeaveController controller = weaveService.prepare(new DistributedShell(command)) 046. .addLogHandler(new PrinterLogHandler(new PrintWriter(System.out))) 047. .start(); 048. } Distributed Shell - Vanilla YARN Distributed Shell - WEAVE ~ 500 lines ~ 50 lines
  • 47. Andreas Neumann, Terence Yim WEAVE YARN Resource Manager Node Mgr Node Mgr Node MgrNode Mgr Task Weave Client ... AM+ Kafka Task Task Task Task Task Weave Runnable Weave Runner Weave Runnable log ZooKeeper ZooKeeper ZooKeeper state + messages log stream This is the only programming interface you need
  • 48. Andreas Neumann, Terence Yim RUNNABLE Define a task: public class EchoServer implements WeaveRunnable { static Logger LOG = ... private WeaveContext context; //.... @Override public void run() { ServerSocket serverSocket = new ServerSocket(0); context.announce("echo", serverSocket.getLocalPort()); while (isRunning()) { Socket socket = serverSocket.accept(); //... } } } WeaveRunnable = single task SLF4J can be used for logging - collected via Kafka - available to client WeaveRunnable can be run in - Thread - YARN container Service endpoint announcement
  • 49. Andreas Neumann, Terence Yim RUNNER Start a task: WeaveRunner runner = ...; WeaveController controller = runner.prepare(new EchoServer()) .addLogHandler(new PrinterLogHandler( new PrintWriter(System.out) ) ).start(); //... controller.sendCommand( Commands.create("stat", "incoming")); //... controller.stop(); WeaveRunner - packages and its dependencies - plus specified resources - attaches log collector to all containers WeaveController is used to manage the lifecycle of a runnable or application Controller also provides ability to send commands to running containers
  • 50. Andreas Neumann, Terence Yim APPLICATION Define an application: public class WebServer implements WeaveApplication { @Override public WeaveSpecification configure() { return WeaveSpecification.Builder.with() .setName(“Jetty WebServer”) .withRunnables() .add(new JettyWebServer()) .withLocalFiles() .add(“www-html”, new File(“/var/html”)) .apply() .add(new LogsCollector()) .anyOrder() .build(); } } WeaveController controller = runner.prepare(new WebServer()).start(); WeaveApplication = collection of tasks Application is given by a WeaveSpecification Application can specify additional local directories to be made available. Weave starts containers in no order. WeaveRunner can also start an application
  • 51. Andreas Neumann, Terence Yim APPLICATION Define an application with order: public class DataApplication implements WeaveApplication { @Override public WeaveSpecification configure() { return WeaveSpecification.Builder.with() .setName(“Cool Data Application”) .withRunnables() .add(“reader”, new InputReader()) .add(“processor”, new Processor()) .add(“writer”, new OutputWriter()) .order() .first(“reader”) .next(“processor”) .next(“writer”) .build(); } } starts containers in given order.
  • 52. Andreas Neumann, Terence Yim SUMMARY • Web apps generate data. • Data driven apps with Hadoop. • More value out of Hadoop with YARN. • Easier with Weave. • Now every Java developer can write distributed apps.
  • 53. Andreas Neumann, Terence Yim THANK YOU https://github.com/continuuity/weave http://hadoop.apache.org/docs/current/hadoop-yarn/ hadoop-yarn-site/YARN.html Yes, we are hiring: careers@continuuity.com