Hadoop Cluster Automation with Ansible!!

What is Ansible❓❓

Ansible is an open-source automation tool (or platform) from Red Hat, used for IT tasks such as configuration management, application deployment, intra-service orchestration, and provisioning. It aims to provide large productivity gains across a wide variety of automation challenges. The tool is simple to use yet powerful enough to automate complex multi-tier IT application environments!!

What is Hadoop❓❓

Apache Hadoop is a collection of open-source software utilities that make it possible to use a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. It offers massive storage for any kind of data, enormous processing power, and the ability to handle a very large number of concurrent tasks or jobs.


The NameNode is the centerpiece of an HDFS file system in Hadoop. It keeps the directory tree of all files in the file system and tracks where across the cluster the file data is kept. It does not store the data of these files itself. Unless NameNode high availability is configured, the NameNode is a single point of failure for the HDFS cluster.


DataNodes store the data in a Hadoop cluster; DataNode is also the name of the daemon that manages that data on each node. File data is replicated across multiple DataNodes for reliability and so that computation can run close to the data. Within a cluster, DataNodes should be uniform.


A client in Hadoop is the interface used to communicate with the Hadoop filesystem. Hadoop ships with different types of clients for different tasks. The basic filesystem client, hdfs dfs, is used to connect to a Hadoop filesystem and perform basic file-related tasks.
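To make the split between the three roles concrete, here is a toy Python model. It is only a sketch, not how Hadoop is implemented: the class and method names, the replication factor of 2, the 4-byte block size, and the round-robin placement are all assumptions chosen to keep the example short.

```python
# Toy model of the NameNode / DataNode / client roles; not Hadoop's real code.
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Stores block contents; knows nothing about file names."""
    name: str
    blocks: dict = field(default_factory=dict)  # block_id -> bytes


@dataclass
class NameNode:
    """Keeps the directory tree and block locations, never the file data itself."""
    datanodes: list
    replication: int = 2  # assumed value for this sketch
    block_size: int = 4   # tiny on purpose so a short string spans several blocks
    files: dict = field(default_factory=dict)      # path -> [block_id, ...]
    locations: dict = field(default_factory=dict)  # block_id -> [DataNode name, ...]

    def write_file(self, path, data):
        block_ids = []
        for offset in range(0, len(data), self.block_size):
            block_id = f"{path}#{offset // self.block_size}"
            chunk = data[offset:offset + self.block_size]
            # Pick `replication` DataNodes round-robin and copy the block to each.
            start = len(self.locations) % len(self.datanodes)
            targets = [self.datanodes[(start + i) % len(self.datanodes)]
                       for i in range(self.replication)]
            for node in targets:
                node.blocks[block_id] = chunk
            self.locations[block_id] = [node.name for node in targets]
            block_ids.append(block_id)
        self.files[path] = block_ids

    def read_file(self, path, failed=()):
        """What a client does: look up block locations, then fetch from a live DataNode."""
        by_name = {node.name: node for node in self.datanodes}
        data = b""
        for block_id in self.files[path]:
            live = [n for n in self.locations[block_id] if n not in failed]
            data += by_name[live[0]].blocks[block_id]
        return data


nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
namenode = NameNode(datanodes=nodes)
namenode.write_file("/demo.txt", b"hello hadoop")
# The file is still readable with dn1 down, because every block has a second replica.
print(namenode.read_file("/demo.txt", failed={"dn1"}))
```

The NameNode object here holds only paths and block locations, while the bytes live on the DataNode objects. Losing one DataNode leaves the file readable, whereas losing the NameNode's dictionaries would make every block unfindable, which is why it is described as a single point of failure.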

How Ansible Works❓❓

Ansible works by connecting to your nodes and pushing out small programs, called “Ansible modules”, to them. … Ansible then executes these modules and removes them when finished. The library of modules can reside on any machine, and no servers, daemons, or databases are required. Ansible uses playbooks to carry out multiple management tasks. An Ansible playbook contains one or more plays, each of which defines the work to be done to configure a managed server. Ansible playbooks are written in YAML.
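The shape of a playbook (a list of plays, each with target hosts and an ordered list of tasks) can be sketched as plain Python data. This is only an illustration of the structure, not one of the playbooks used in this article: the group names, task names, package name, and the `summarize` helper are hypothetical, and real tasks can carry more keys than the two shown here.

```python
# A playbook modelled as Python data that mirrors the YAML layout.
# Host groups, task names, and module arguments are hypothetical placeholders.
playbook = [
    {
        "name": "Configure the NameNode",
        "hosts": "namenode",
        "tasks": [
            {"name": "Install Hadoop",
             "ansible.builtin.package": {"name": "hadoop", "state": "present"}},
            {"name": "Start the NameNode daemon",
             "ansible.builtin.command": "hadoop-daemon.sh start namenode"},
        ],
    },
    {
        "name": "Configure the DataNodes",
        "hosts": "datanode",
        "tasks": [
            {"name": "Install Hadoop",
             "ansible.builtin.package": {"name": "hadoop", "state": "present"}},
            {"name": "Start the DataNode daemon",
             "ansible.builtin.command": "hadoop-daemon.sh start datanode"},
        ],
    },
]


def task_module(task):
    """Return the module key of a task: in this sketch, the one key that is not 'name'."""
    return next(key for key in task if key != "name")


def summarize(playbook):
    """List (hosts, task name, module) for every task, play by play and in task order."""
    return [(play["hosts"], task["name"], task_module(task))
            for play in playbook
            for task in play["tasks"]]


for hosts, name, module in summarize(playbook):
    print(f"{hosts}: {name} [{module}]")
```

Each play maps a group of hosts to tasks, and each task names one module plus its arguments; those modules are the small programs that Ansible pushes to the node.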

💢Task Completion💢

⏩ At first, Hadoop is not installed on the NameNode, DataNode, or client:

Run the NameNode playbook with the command:

ansible-playbook <name.yml>

Now check whether the master service has started on the NameNode with the command:


Run the DataNode playbook:

⏩ Check the admin report on the DataNode with the command:

hadoop dfsadmin -report

Run the client playbook:

List the files in the root directory from the client with the command:

hadoop fs -ls /

Browse the web UI of the Hadoop cluster:


The whole Hadoop cluster setup is completed successfully using Ansible automation✨✨
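The run order used above (NameNode first, then DataNode, then client) could be wrapped in a small Python driver. This is a minimal sketch under assumptions: the playbook file names and the optional inventory path are hypothetical, since the article only shows `ansible-playbook <name.yml>`.

```python
import subprocess

# Hypothetical playbook file names, listed in the order the article runs them.
PLAYBOOKS = ["namenode.yml", "datanode.yml", "client.yml"]


def build_commands(playbooks=PLAYBOOKS, inventory=None):
    """Build one `ansible-playbook` command line per playbook, preserving order."""
    commands = []
    for playbook in playbooks:
        cmd = ["ansible-playbook"]
        if inventory:
            cmd += ["-i", inventory]  # -i selects the inventory file
        cmd.append(playbook)
        commands.append(cmd)
    return commands


def run_all(commands, runner=subprocess.run):
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        # check=True makes subprocess.run raise CalledProcessError on a non-zero exit.
        runner(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them; call run_all(...) to execute.
    for cmd in build_commands():
        print(" ".join(cmd))
```

Running the NameNode playbook first means the master is already up when the DataNode playbook starts its daemon, and stopping at the first failure avoids configuring a client against a half-built cluster.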

🔰Code Link:


