In the previous post, I explained the working architecture of Hadoop along with its components, such as the name node and the data node. In this post, I will explain how to capacity plan a Hadoop cluster in terms of data nodes. Name node capacity planning will be covered in upcoming posts. I am assuming that the deployment starts in 2013, and the data and data node capacity projections run through 2017. Below are the assumptions considered while capacity planning the Hadoop cluster:
Based on the assumptions listed above, we start with 1 TB of daily data in 2013. For capacity building, I assume 5% data growth per month from 2014 onwards. In 2013 we have 1080 TB of data, and by the end of 2017 we have 8711 TB, roughly eight times the starting year's volume. The table below shows how many UCS servers (data nodes) will be required to handle 8711 TB of data.
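As a rough illustration of the arithmetic behind a projection like this, here is a minimal Python sketch. Several inputs are my own assumptions for the example and not figures confirmed by this post: a 360-day year and a replication factor of 3 (which together happen to turn 1 TB per day into 1080 TB per year), a hypothetical usable capacity of 24 TB per data node, and plain monthly compounding for the growth helper. The function names are illustrative, and the growth helper will not necessarily reproduce the 8711 TB figure, since that depends on exactly how the 5% monthly growth is applied.

```python
import math


def first_year_storage_tb(daily_ingest_tb, days_per_year=360, replication_factor=3):
    """Raw storage consumed in one year at a constant daily ingest rate.

    days_per_year and replication_factor are assumed defaults for this sketch.
    """
    return daily_ingest_tb * days_per_year * replication_factor


def grow_monthly(start_tb, monthly_growth_rate, months):
    """Compound a starting volume by a fixed monthly growth rate.

    Plain compounding is only one possible reading of "5% growth per month".
    """
    return start_tb * (1 + monthly_growth_rate) ** months


def data_nodes_required(total_tb, usable_tb_per_node):
    """Round up, because a partially filled node is still a whole server."""
    return math.ceil(total_tb / usable_tb_per_node)


if __name__ == "__main__":
    base_tb = first_year_storage_tb(1)       # 1 TB/day under the assumed defaults
    print(base_tb)                           # 1080
    # 24 TB usable per node is a hypothetical value, not a UCS specification.
    print(data_nodes_required(8711, 24))
```

Swapping in your own replication factor, per-node usable capacity, and growth convention is the point of keeping these as separate parameters.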
Graphical view of the projections: