More HBase GC tuning

2015-01-13



By Lars Hofhansl


My article on hbase-gc-tuning-observations explores how to configure the garbage collector for HBase.

There is actually a bit more to it, especially when block encoding is enabled for a column family and the predominant access is via the scan API with row caching.

Block encoding currently requires HBase to materialize each KeyValue after decoding during scanning. This has the potential to produce a lot of garbage for each scan RPC, especially when the scan response is large, as might be the case when scanner caching is set to a larger value (see Scan.getCaching()).

My experiments show that in that case it is better to run with a larger young gen of 512MB (-Xmn512m) and, crucially, to make sure that all per-RPC garbage across all handlers actively performing scans fits into the survivor space.

(Note that this statement is true whether or not block encoding is used. Block encoding just produces a lot more garbage.)

HBase actually has a way to limit the size of an individual scan response: set hbase.client.scanner.max.result.size.

Quick recap:

The Hotspot JVM divides the heap into PermGen, Tenured Gen, and the Young Gen. YoungGen itself is divided into Eden and two survivor spaces.

By default the survivor ratio is 8 (i.e. each survivor space is 1/8 of Eden, and Eden plus the two survivor spaces together make up the configured young gen size). Each survivor space is therefore 1/10 of the young gen.

What to do?

With -Xmn512m this comes to ~51MB for each of the two survivor spaces.
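
The ~51MB figure follows from how HotSpot interprets the survivor ratio: Eden is SurvivorRatio times the size of one survivor space, so the young generation splits into SurvivorRatio + 2 equal parts. A minimal sketch of that arithmetic in Python (the function name is purely illustrative and not part of any JVM or HBase tooling; real sizes can differ slightly because the JVM aligns regions and may resize them adaptively):

```python
MB = 1024 * 1024

def survivor_space_bytes(young_gen_bytes, survivor_ratio=8):
    """Approximate size of ONE survivor space.

    Assumes the layout young = eden + 2 * survivor with
    eden = survivor_ratio * survivor, and ignores JVM alignment.
    """
    return young_gen_bytes // (survivor_ratio + 2)

survivor = survivor_space_bytes(512 * MB)   # -Xmn512m, default ratio of 8
print(f"{survivor / MB:.1f} MB per survivor space")  # 51.2 MB per survivor space
```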








Now you want to set hbase.client.scanner.max.result.size such that the expected number of handler threads times the max.result.size is less than each of the survivor spaces.

With 30 handlers (default in HBase as of 0.98) this comes to 1.7MB per handler. Since not all handlers will always scan using the full buffer, 2MB is probably a good setting.
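
The 1.7MB figure is just the survivor space divided evenly among the handlers. A small sketch of that division, assuming a ~51.2MB survivor space (512MB young gen, survivor ratio 8) and treating max.result.size as the worst-case garbage of one scan RPC, which is a simplification; the helper name is hypothetical:

```python
MB = 1024 * 1024

def max_result_size_budget(survivor_bytes, handlers):
    """Largest per-scan response size such that every handler could hold
    a full response in one survivor space at the same time."""
    return survivor_bytes // handlers

survivor_bytes = (512 * MB) // 10   # assumed: -Xmn512m and SurvivorRatio=8
budget = max_result_size_budget(survivor_bytes, handlers=30)
print(f"{budget / MB:.2f} MB per handler")  # 1.71 MB per handler
```

Rounding this up to 2MB trades a strict guarantee for larger scan responses, on the expectation that some handlers are idle or serving smaller requests at any given moment.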










Makes sense, doesn't it? If per-scan results across all active handlers cannot fit into the survivor space, the collector has no choice but to promote to the tenured generation. That is exactly the scenario one would like to avoid, as we would slowly pollute the tenured gen with per-RPC garbage, eventually requiring a full GC to defragment.
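
The condition can be written as a simple check. The sketch below is only a back-of-the-envelope model: it assumes a 512MB young gen with survivor ratio 8 and a 2MB max.result.size, and it treats max.result.size as the full amount of garbage one scan RPC produces. The function name is illustrative.

```python
MB = 1024 * 1024

def fits_in_survivor(active_scanners, max_result_size, survivor_bytes):
    """True if the worst-case scan garbage of all actively scanning
    handlers is smaller than one survivor space."""
    return active_scanners * max_result_size < survivor_bytes

survivor_bytes = (512 * MB) // 10   # assumed: -Xmn512m, SurvivorRatio=8
max_result_size = 2 * MB            # 2097152 bytes

# Largest number of handlers that can scan with a full buffer at once
# while still satisfying the strict inequality above.
max_concurrent = (survivor_bytes - 1) // max_result_size

print(fits_in_survivor(30, max_result_size, survivor_bytes))              # False
print(fits_in_survivor(max_concurrent, max_result_size, survivor_bytes))  # True
print(max_concurrent)                                                     # 25
```

Under these assumptions a 2MB limit keeps scan garbage inside the survivor space as long as no more than 25 of the 30 handlers are scanning with a full buffer at the same time.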


































TL;DR:



When using block encoding make sure #handlers * max.result.size < survivor space, and use a slightly larger young generation:

-Xmn512m (in hbase-env.sh)

hbase.client.scanner.max.result.size = 2097152 (in hbase-site.xml)




