Jason4Zhu: Memory Configuration In Hadoop

Tuesday, October 21, 2014

This post gives recommendations for configuring YARN and MapReduce memory allocation settings based on each node's hardware specifications.

YARN takes into account all of the available compute resources on each machine in the cluster. Based on the available resources, YARN negotiates resource requests from applications (such as MapReduce) running in the cluster. YARN then provides processing capacity to each application by allocating Containers. A Container is the basic unit of processing capacity in YARN, and is an encapsulation of resource elements (memory, CPU, etc.).

In a Hadoop cluster, it is vital to balance the usage of memory (RAM), processors (CPU cores) and disks so that processing is not constrained by any one of these cluster resources. As a general recommendation, allowing for two Containers per disk and per core gives the best balance for cluster utilization.

When determining the appropriate YARN and MapReduce memory configurations for a cluster node, start with the available hardware resources. Specifically, note the following values on each node:

RAM (Amount of memory)
CORES (Number of CPU cores)
DISKS (Number of disks)

The total available RAM for YARN and MapReduce should take into account the Reserved Memory. Reserved Memory is the RAM needed by system processes and other Hadoop processes (such as HBase).

Reserved Memory = Reserved for system memory + Reserved for HBase memory (if HBase is on the same node)

Use the following table to determine the Reserved Memory per node.

Reserved Memory Recommendations

Total Memory per Node	Recommended Reserved System Memory	Recommended Reserved HBase Memory
4 GB	1 GB	1 GB
8 GB	2 GB	1 GB
16 GB	2 GB	2 GB
24 GB	4 GB	4 GB
48 GB	6 GB	8 GB
64 GB	8 GB	8 GB
72 GB	8 GB	8 GB
96 GB	12 GB	16 GB
128 GB	24 GB	24 GB
256 GB	32 GB	32 GB
512 GB	64 GB	64 GB
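
The table lookup above can be sketched in Python. This is only an illustration: the function name `reserved_memory_gb` is hypothetical, and the handling of node sizes that fall between two rows (use the nearest row at or below) is an assumption that the table itself does not state.

```python
import bisect

# (total GB, reserved system GB, reserved HBase GB), copied from the table above.
RESERVED_TABLE = [
    (4, 1, 1), (8, 2, 1), (16, 2, 2), (24, 4, 4), (48, 6, 8), (64, 8, 8),
    (72, 8, 8), (96, 12, 16), (128, 24, 24), (256, 32, 32), (512, 64, 64),
]


def reserved_memory_gb(total_gb, with_hbase=False):
    """Return the reserved memory in GB for a node with total_gb of RAM.

    Assumption (not from the table): a size between two rows uses the
    nearest row at or below it; sizes under 4 GB use the first row.
    """
    sizes = [row[0] for row in RESERVED_TABLE]
    idx = max(bisect.bisect_right(sizes, total_gb) - 1, 0)
    _, system_gb, hbase_gb = RESERVED_TABLE[idx]
    return system_gb + (hbase_gb if with_hbase else 0)


if __name__ == "__main__":
    print(reserved_memory_gb(48))                   # system memory only
    print(reserved_memory_gb(48, with_hbase=True))  # system + HBase
```

For a 48 GB node this gives 6 GB without HBase and 14 GB with HBase, matching the table row.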

The next calculation is to determine the maximum number of containers allowed per node. The following formula can be used:

# of containers = min (2*CORES, 1.8*DISKS, (Total available RAM) / MIN_CONTAINER_SIZE)

Where MIN_CONTAINER_SIZE is the minimum container size (in RAM). This value is dependent on the amount of RAM available -- in smaller memory nodes, the minimum container size should also be smaller. The following table outlines the recommended values:

Total RAM per Node	Recommended Minimum Container Size
Less than 4 GB	256 MB
Between 4 GB and 8 GB	512 MB
Between 8 GB and 24 GB	1024 MB
Above 24 GB	2048 MB
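
As a rough sketch, the container formula and the minimum container size table could be coded as follows. The function names are hypothetical. Two choices here are assumptions rather than part of the recommendation: a node with exactly 8 GB (which the table lists in two bands) is given 512 MB, and a fractional container count is rounded down.

```python
def min_container_size_mb(total_ram_gb):
    """Minimum container size in MB, following the table above.

    Assumption: exactly 8 GB appears in two bands of the table;
    this sketch puts it in the lower band (512 MB).
    """
    if total_ram_gb < 4:
        return 256
    if total_ram_gb <= 8:
        return 512
    if total_ram_gb <= 24:
        return 1024
    return 2048


def num_containers(cores, disks, available_ram_gb, min_container_mb):
    """# of containers = min(2*CORES, 1.8*DISKS, available RAM / MIN_CONTAINER_SIZE).

    Assumption: a fractional result is rounded down to a whole container.
    """
    ram_limit = available_ram_gb * 1024 / min_container_mb
    return int(min(2 * cores, 1.8 * disks, ram_limit))


if __name__ == "__main__":
    size_mb = min_container_size_mb(48)
    # 48 GB node with 6 GB reserved, so 42 GB is available to YARN.
    print(size_mb, num_containers(12, 12, 48 - 6, size_mb))
```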

The final calculation is to determine the amount of RAM per container:

RAM-per-container = max(MIN_CONTAINER_SIZE, (Total Available RAM) / containers)
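
A minimal sketch of this step, assuming memory is tracked in whole megabytes and the quotient is rounded down (the function name `ram_per_container_mb` is hypothetical):

```python
def ram_per_container_mb(available_ram_gb, containers, min_container_mb):
    """RAM-per-container = max(MIN_CONTAINER_SIZE, available RAM / containers).

    Assumption: values are whole megabytes, so the division rounds down.
    """
    per_container_mb = available_ram_gb * 1024 // containers
    return max(min_container_mb, per_container_mb)


if __name__ == "__main__":
    # 42 GB available, 21 containers, 2048 MB minimum container size.
    print(ram_per_container_mb(42, 21, 2048))
```

Because the container count is already capped at available RAM divided by the minimum container size, the quotient should not drop below the minimum; the `max` mainly acts as a guard.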

With these calculations, the YARN and MapReduce configurations can be set:


Configuration File	Configuration Setting	Value Calculation
yarn-site.xml	yarn.nodemanager.resource.memory-mb	= containers * RAM-per-container
yarn-site.xml	yarn.scheduler.minimum-allocation-mb	= RAM-per-container
yarn-site.xml	yarn.scheduler.maximum-allocation-mb	= containers * RAM-per-container
mapred-site.xml	mapreduce.map.memory.mb	= RAM-per-container
mapred-site.xml	mapreduce.reduce.memory.mb	= 2 * RAM-per-container
mapred-site.xml	mapreduce.map.java.opts	= 0.8 * RAM-per-container
mapred-site.xml	mapreduce.reduce.java.opts	= 0.8 * 2 * RAM-per-container
mapred-site.xml	yarn.app.mapreduce.am.resource.mb	= 2 * RAM-per-container
mapred-site.xml	yarn.app.mapreduce.am.command-opts	= 0.8 * 2 * RAM-per-container

Note: After installation, both yarn-site.xml and mapred-site.xml are located in the /etc/hadoop/conf folder.
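
The value calculations in the table can be collected into one small helper. This is a sketch, not a definitive tool: the function name `yarn_mapreduce_settings` is hypothetical, and rendering the `java.opts` and `command-opts` values as `-Xmx<N>m` JVM heap options (rounded down to whole megabytes) is an assumption about how the 0.8 factor is meant to be applied.

```python
def yarn_mapreduce_settings(containers, ram_mb):
    """Map each setting name to its value, given the container count and
    the RAM per container in MB.

    Assumption: the 0.8 * RAM values are written as -Xmx heap options,
    rounded down to whole megabytes.
    """
    def heap_opt(container_mb):
        # 80% of the container size, using integer arithmetic.
        return f"-Xmx{container_mb * 8 // 10}m"

    return {
        "yarn.nodemanager.resource.memory-mb": containers * ram_mb,
        "yarn.scheduler.minimum-allocation-mb": ram_mb,
        "yarn.scheduler.maximum-allocation-mb": containers * ram_mb,
        "mapreduce.map.memory.mb": ram_mb,
        "mapreduce.reduce.memory.mb": 2 * ram_mb,
        "mapreduce.map.java.opts": heap_opt(ram_mb),
        "mapreduce.reduce.java.opts": heap_opt(2 * ram_mb),
        "yarn.app.mapreduce.am.resource.mb": 2 * ram_mb,
        "yarn.app.mapreduce.am.command-opts": heap_opt(2 * ram_mb),
    }


if __name__ == "__main__":
    for name, value in yarn_mapreduce_settings(21, 2048).items():
        print(f"{name} = {value}")
```

Keeping the JVM heap at about 80% of the container size leaves headroom for non-heap memory inside the same container.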

Examples

Cluster nodes have 12 CPU cores, 48 GB RAM, and 12 disks.

Reserved Memory = 6 GB reserved for system memory + (if HBase) 8 GB for HBase

Min container size = 2 GB

If there is no HBase:

# of containers = min (2*12, 1.8*12, (48-6)/2) = min (24, 21.6, 21) = 21

RAM-per-container = max (2, (48-6)/21) = max (2, 2) = 2

Configuration	Value Calculation
yarn.nodemanager.resource.memory-mb	= 21 * 2 = 42*1024 MB
yarn.scheduler.minimum-allocation-mb	= 2*1024 MB
yarn.scheduler.maximum-allocation-mb	= 21 * 2 = 42*1024 MB
mapreduce.map.memory.mb	= 2*1024 MB
mapreduce.reduce.memory.mb	= 2 * 2 = 4*1024 MB
mapreduce.map.java.opts	= 0.8 * 2 = 1.6*1024 MB
mapreduce.reduce.java.opts	= 0.8 * 2 * 2 = 3.2*1024 MB
yarn.app.mapreduce.am.resource.mb	= 2 * 2 = 4*1024 MB
yarn.app.mapreduce.am.command-opts	= 0.8 * 2 * 2 = 3.2*1024 MB

If HBase is included:

# of containers = min (2*12, 1.8*12, (48-6-8)/2) = min (24, 21.6, 17) = 17

RAM-per-container = max (2, (48-6-8)/17) = max (2, 2) = 2

Configuration	Value Calculation
yarn.nodemanager.resource.memory-mb	= 17 * 2 = 34*1024 MB
yarn.scheduler.minimum-allocation-mb	= 2*1024 MB
yarn.scheduler.maximum-allocation-mb	= 17 * 2 = 34*1024 MB
mapreduce.map.memory.mb	= 2*1024 MB
mapreduce.reduce.memory.mb	= 2 * 2 = 4*1024 MB
mapreduce.map.java.opts	= 0.8 * 2 = 1.6*1024 MB
mapreduce.reduce.java.opts	= 0.8 * 2 * 2 = 3.2*1024 MB
yarn.app.mapreduce.am.resource.mb	= 2 * 2 = 4*1024 MB
yarn.app.mapreduce.am.command-opts	= 0.8 * 2 * 2 = 3.2*1024 MB
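
Both examples can be re-checked with a few lines of Python. This is a sketch under the same assumptions as the formulas above: the helper name `plan_node` is hypothetical, fractional container counts are rounded down, and the reserved memory (6 GB, or 6 + 8 GB with HBase) and the 2048 MB minimum container size are taken directly from the example.

```python
def plan_node(cores, disks, total_gb, reserved_gb, min_container_mb):
    """Return (containers, RAM per container in MB, total YARN memory in MB)."""
    available_mb = (total_gb - reserved_gb) * 1024
    # Assumption: a fractional container count is rounded down.
    containers = int(min(2 * cores, 1.8 * disks, available_mb / min_container_mb))
    ram_mb = max(min_container_mb, available_mb // containers)
    return containers, ram_mb, containers * ram_mb


if __name__ == "__main__":
    # 12 cores, 12 disks, 48 GB RAM, as in the example above.
    for label, reserved_gb in (("no HBase", 6), ("with HBase", 6 + 8)):
        containers, ram_mb, node_mb = plan_node(12, 12, 48, reserved_gb, 2048)
        print(f"{label}: {containers} containers x {ram_mb} MB = {node_mb} MB")
```

This yields 21 containers of 2048 MB (43008 MB in total) without HBase and 17 containers of 2048 MB (34816 MB in total) with HBase, in line with the two tables.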

Linked from http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.6.0/bk_installing_manually_book/content/rpm-chap1-11.html


© 2014-2017 jason4zhu.blogspot.com All Rights Reserved
If reposting, please credit the original source: Jason4Zhu

Comments:

Unknown, April 14, 2016 at 6:28 AM
Hi, this post is great, but I have some questions.
1) Why use the factor 0.8 for [map|reduce].java.opts? Any references?
2) yarn.app.mapreduce.am.[resource.mb|command-opts] seem to belong in mapred-site.xml. What's the difference if I put them in yarn-site.xml?

Thanks ^_^
