Saturday, March 09, 2013

Brief notes on MapReduce 2 (YARN)

Modules:
YARN stands for “Yet Another Resource Negotiator”. The fundamental idea of MRv2 is to split the two major functions of the JobTracker, resource management and job scheduling/monitoring, into separate daemons:
- A ResourceManager (RM) that manages the global assignment of compute resources to applications.
- A per-application ApplicationMaster (AM) that manages the application’s life cycle.
Because each application gets its own AM, this model isolates applications from one another and scales better than the single JobTracker of MR1.

The ResourceManager has two main components:
- Scheduler
Responsible for allocating resources to the running applications, subject to familiar constraints such as capacities and queues.

- ApplicationsManager (AsM)
Responsible for accepting job submissions, negotiating the first container for executing the application-specific AM, and providing the service that restarts the AM container on failure.
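
To make the Scheduler's role concrete, here is a minimal sketch of capacity-constrained allocation. It is a toy model, not YARN code: the names ToyScheduler and Queue, and the idea of tracking only memory in MB, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    """Hypothetical queue with a hard memory cap (not a YARN class)."""
    name: str
    capacity_mb: int   # most memory this queue may hold at once
    used_mb: int = 0


class ToyScheduler:
    """Grants memory to applications subject to per-queue capacity limits."""

    def __init__(self, cluster_mb, queues):
        self.free_mb = cluster_mb
        self.queues = {q.name: q for q in queues}

    def allocate(self, queue_name, request_mb):
        """Return True and book the memory if the request fits, else False."""
        q = self.queues[queue_name]
        fits_queue = q.used_mb + request_mb <= q.capacity_mb
        fits_cluster = request_mb <= self.free_mb
        if not (fits_queue and fits_cluster):
            return False
        q.used_mb += request_mb
        self.free_mb -= request_mb
        return True


scheduler = ToyScheduler(8192, [Queue("prod", 6144), Queue("dev", 2048)])
results = [
    scheduler.allocate("dev", 1024),   # fits
    scheduler.allocate("dev", 2048),   # rejected: dev would exceed its cap
    scheduler.allocate("prod", 6144),  # fits exactly
]
print(results, scheduler.free_mb)
```

The point of the sketch is only that the Scheduler decides who gets resources; it does not monitor or restart anything, which is left to the AsM and the AMs.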

NodeManager (NM): per-node
The per-node slave of the ResourceManager. It is responsible for launching containers, monitoring their resource usage (CPU, memory, disk, network), and reporting it to the ResourceManager/Scheduler. The design also allows long-running, application-specific auxiliary services to be plugged into the NMs at startup; shuffle is a typical auxiliary service loaded by the NMs.
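
The monitor-and-report duty of the NM can be pictured roughly as below. This is a hedged toy model: ToyNodeManager, its method names, and the shape of the heartbeat dictionary are all made up for illustration, and only memory is tracked.

```python
class ToyNodeManager:
    """Hypothetical per-node agent that tracks container memory (not YARN code)."""

    def __init__(self, node_id, total_mb):
        self.node_id = node_id
        self.total_mb = total_mb
        self.containers = {}  # container_id -> memory in MB

    def used_mb(self):
        return sum(self.containers.values())

    def start_container(self, container_id, mem_mb):
        """Launch a container if the node still has room for it."""
        if self.used_mb() + mem_mb > self.total_mb:
            raise RuntimeError("not enough memory on node")
        self.containers[container_id] = mem_mb

    def stop_container(self, container_id):
        self.containers.pop(container_id, None)

    def heartbeat(self):
        """Build the usage report that would be sent to the RM."""
        used = self.used_mb()
        return {
            "node": self.node_id,
            "used_mb": used,
            "free_mb": self.total_mb - used,
            "containers": sorted(self.containers),
        }


nm = ToyNodeManager("node-1", 4096)
nm.start_container("c1", 1024)
nm.start_container("c2", 2048)
report = nm.heartbeat()
print(report)
```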

ApplicationMaster (AM): per-application (job)
Responsible for negotiating appropriate resource containers from the Scheduler, tracking their status, and monitoring progress. Each AM manages its application’s individual tasks, and each task runs within a container on one of the nodes.

Resource Allocation Process:
Resources are requested in the form of containers, where each container has a number of attributes that are not fixed in advance, such as the amount of memory.
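
As a rough picture of what such a request might carry, here is a small sketch. The class ContainerRequest and its fields (memory_mb, count, host, priority) are assumptions chosen for illustration, not the actual YARN record layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerRequest:
    """Hypothetical request record; field names are illustrative only."""
    memory_mb: int      # memory asked for each container
    count: int = 1      # how many containers of this shape
    host: str = "*"     # "*" means any node is acceptable
    priority: int = 0


def total_memory(requests):
    """Total memory in MB the AM is asking for across all requests."""
    return sum(r.memory_mb * r.count for r in requests)


requests = [
    ContainerRequest(1024, count=4),
    ContainerRequest(2048, host="node-7", priority=1),
]
total = total_memory(requests)
print(total)
```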

Client - RM
The client requests a new application from the RM/AsM; once the RM responds, the client sends the application submission context.
The client can also ask the RM/AsM for an application report and receives it in response.

RM - AM
The AM registers itself with the RM, and the RM returns the resource status of the cluster.
The AM then sends resource allocation requests to the RM, and the RM/Scheduler responds with a list of containers.

AM - NM
The AM sends container start requests to the NMs that host the allocated containers.
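
The whole exchange (Client - RM, RM - AM, AM - NM) can be walked through with a toy simulation. Everything here is a sketch under stated assumptions: ToyRM, its method names, the first-fit placement, and the log strings are invented for illustration and only loosely mirror the steps above.

```python
class ToyRM:
    """Hypothetical ResourceManager that records each protocol step."""

    def __init__(self, node_free_mb):
        self.node_free_mb = dict(node_free_mb)  # node -> free memory in MB
        self.apps = {}                          # app_id -> state
        self.log = []
        self._next_app = 1

    # --- Client - RM ---
    def get_new_application(self):
        app_id = f"app_{self._next_app}"
        self._next_app += 1
        self.log.append("client->RM: new application")
        return app_id

    def submit_application(self, app_id, context):
        self.apps[app_id] = "SUBMITTED"
        self.log.append("client->RM: submission context")

    def get_application_report(self, app_id):
        return {"app_id": app_id, "state": self.apps[app_id]}

    # --- RM - AM ---
    def register_am(self, app_id):
        self.apps[app_id] = "RUNNING"
        self.log.append("AM->RM: register")
        return dict(self.node_free_mb)  # cluster resource status

    def allocate(self, app_id, mem_mb, count):
        """First-fit placement over nodes in name order (an assumption)."""
        granted = []
        for node in sorted(self.node_free_mb):
            while len(granted) < count and self.node_free_mb[node] >= mem_mb:
                self.node_free_mb[node] -= mem_mb
                granted.append((f"{app_id}_c{len(granted) + 1}", node))
        self.log.append("AM->RM: allocate")
        return granted


rm = ToyRM({"node-1": 2048, "node-2": 1024})
app_id = rm.get_new_application()
rm.submit_application(app_id, {"name": "wordcount"})
report = rm.get_application_report(app_id)
cluster = rm.register_am(app_id)
containers = rm.allocate(app_id, mem_mb=1024, count=3)

# --- AM - NM ---
for container_id, node in containers:
    rm.log.append(f"AM->NM: start {container_id} on {node}")

print("\n".join(rm.log))
```

Note that the RM only hands out containers; starting them is a direct AM to NM conversation, which is why the last step does not go through any ToyRM method.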
