Elasticsearch Index Optimization

Shard Parameter Configuration Plan

Query the hardware configuration: GET /_nodes/stats

Node count:

"_nodes" : {

"total" : 40,

"successful" : 40,

"failed" : 0

}

Node memory:

"mem" : {

"total_in_bytes" : 66899951616,

"free_in_bytes" : 1588084736,

"used_in_bytes" : 65311866880,

"free_percent" : 2,

"used_percent" : 98

}

JVM heap memory:

"jvm" : {

"timestamp" : 1744340357005,

"uptime_in_millis" : 15410642521,

"mem" : {

"heap_used_in_bytes" : 4387023328,

"heap_used_percent" : 13,

"heap_committed_in_bytes" : 32212254720,

"heap_max_in_bytes" : 32212254720,

"non_heap_used_in_bytes" : 162215208,

"non_heap_committed_in_bytes" : 214827008

}

}

Node disk:

"fs" : {

"timestamp" : 1744340357005,

"total" : {

"total_in_bytes" : 540991217664,

"free_in_bytes" : 467969482752,

"available_in_bytes" : 440465022976

},

"data" : [

{

"path" : "/premium_ssd0/960072765296218116/data/nodes/0",

"mount" : "/premium_ssd0 (/dev/vdb1)",

"type" : "ext4",

"total_in_bytes" : 540991217664,

"free_in_bytes" : 467969482752,

"available_in_bytes" : 440465022976

}

]

}

Derived hardware configuration: 40 nodes, each with 64GB of RAM, a 500GB SSD, and a 32GB JVM heap

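1. Calculating the maximum number of shards per node

Formula: shards per node = usable storage capacity / target shard size, where usable capacity = disk capacity × 0.7 (reserve 20%-30% as headroom) and the target shard size is typically 20-50GB (50-100GB on SSD)

Usable storage capacity: 500GB SSD × (1 - 30%) = 350GB (30% reserved as headroom)

Target shard size: 30GB, a mid-range value within 20-50GB

Recommended shards per node by disk:

350GB / 30GB ≈ 12 shards/node

The JVM heap also limits the count: the heap is 32GB (50% of the 64GB of RAM), and each shard is assumed to use about 4GB of heap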

Scenario	Recommended shard size	Estimated heap per shard
Logging (low query load)	≤50GB	3-5GB
Search (high query load)	≤30GB	2-4GB
Mixed (balanced load)	30-40GB	2.5-4GB

32GB / 4GB = 8 shards/node

Take the smaller of the two values: 8 shards/node
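The arithmetic in this step can be reproduced with a short Python sketch. The function name and parameters are illustrative, and the 4GB-per-shard heap figure is the article's working assumption rather than a measured value.

```python
def max_shards_per_node(disk_gb, headroom_pct, shard_gb, heap_gb, heap_per_shard_gb):
    """Return (usable_gb, shards_by_disk, shards_by_heap, limit) for one node."""
    # Usable disk after reserving headroom (30% in the article).
    usable_gb = disk_gb * (100 - headroom_pct) / 100
    # Disk-bound estimate, rounded to the nearest whole shard as in the text.
    shards_by_disk = round(usable_gb / shard_gb)
    # Heap-bound estimate, assuming a fixed heap cost per shard.
    shards_by_heap = heap_gb // heap_per_shard_gb
    # The tighter of the two constraints wins.
    return usable_gb, shards_by_disk, shards_by_heap, min(shards_by_disk, shards_by_heap)


usable_gb, by_disk, by_heap, limit = max_shards_per_node(
    disk_gb=500, headroom_pct=30, shard_gb=30, heap_gb=32, heap_per_shard_gb=4
)
print(usable_gb, by_disk, by_heap, limit)
```

With the article's numbers the heap constraint is the binding one, which is why the result is 8 rather than 12.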

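2. Calculating the number of primary shards

Formula: primary shards = estimated peak data volume / target shard size × 1.2 (20% reserved for growth)

Assuming an estimated peak data volume of 100GB:

Primary shards = 100GB / 30GB × 1.2 = 4 primary shards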

Upper limit on total shards: keep the total at or below node count × 50

Node count × 50 = 40 × 50 = 2000 shards

The total shard count must stay ≤ 2000

Total shards (with 1 replica):

Formula: total shards = primary shards × (replicas + 1)

4 primary shards × (1 + 1) = 8 shards
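As a quick check of the formulas above, here is a minimal Python sketch. The helper names are hypothetical, and rounding up to a whole shard is an assumption that the article's example (which lands exactly on 4) does not exercise.

```python
import math


def primary_shards(peak_gb, shard_gb, growth_pct=20):
    """Primary shard count with growth headroom, rounded up to a whole shard."""
    # Integer-friendly form of peak / shard_size * 1.2 to avoid float drift.
    return math.ceil(peak_gb * (100 + growth_pct) / (shard_gb * 100))


def total_shards(primaries, replicas):
    """Total shard copies: every primary plus its replicas."""
    return primaries * (replicas + 1)


primaries = primary_shards(peak_gb=100, shard_gb=30)
total = total_shards(primaries, replicas=1)
cluster_limit = 40 * 50  # node count x 50, as recommended above
print(primaries, total, total <= cluster_limit)
```

The total of 8 shards is far below the 2000-shard ceiling, so the cluster-wide limit is not a concern for this index on its own.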

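3. Calculating the number of replicas

More replicas let query requests spread across more shard copies, which improves parallelism; too many replicas waste storage and other resources.

Balance-point formula: recommended replicas = min(node count − 1, (node count / primary shards) − 1)

Replicas = min(40 − 1, 40 / 4 − 1) = min(39, 9) = 9, but given hardware resource limits the practical recommendation is 3-5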

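Factor	Analysis
Resource utilization	Shards per node: with 100GB of data, 4 primary shards, and 5 replicas, total shards = 4 × (1 + 5) = 24, so shards per node = 24 / 40 ≈ 0.6. Resource usage is very low (no pressure on heap or disk I/O)
Query parallelism	The copies of each primary shard are spread across multiple nodes, so queries can be served by any of 5 + 1 = 6 copies (primary + replicas), giving up to 6 times the read capacity per shard
Fault tolerance and recovery speed	With 5 replicas, each shard survives the simultaneous loss of up to 5 of the nodes holding its copies, and recovery is faster because more copies can serve as recovery sources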
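The replica calculation and the resulting per-node load can be sketched as follows. The practical cap of 5 is the article's own recommendation; the function name and the use of integer division are illustrative assumptions.

```python
def recommended_replicas(nodes, primaries, practical_cap=5):
    """Return (theoretical, chosen) replica counts."""
    # Balance-point formula from the text.
    theoretical = min(nodes - 1, nodes // primaries - 1)
    # Apply the practical upper bound suggested in the article (3-5).
    return theoretical, min(theoretical, practical_cap)


theoretical, chosen = recommended_replicas(nodes=40, primaries=4)
shard_copies = 4 * (1 + chosen)          # all copies of all primaries
shards_per_node = shard_copies / 40      # average load per node
print(theoretical, chosen, shard_copies, shards_per_node)
```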

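3. Key parameter configuration

index.auto_expand_replicas: 0-5 // automatically expand the number of replicas with cluster size (while enabled, Elasticsearch manages number_of_replicas itself)

number_of_shards: 8 // number of primary shards

number_of_replicas: 5 // number of replicas

max_merged_segment: 30gb // maximum segment size

total_shards_per_node: 8 // maximum shards per node; a value of 1 gives the fastest search and suits low-concurrency workloads with ≤30GB of data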

Command to apply the settings:

PUT /mathgpt_question_perfect_search_5/_settings

{

"index.merge.policy": {

"segments_per_tier": 15,

"max_merged_segment": "30gb",

"floor_segment": "100mb"

},

"index.auto_expand_replicas": "0-5",

"number_of_replicas": 1,

"index.routing.allocation.total_shards_per_node": 8

}

Force-merge the index; run this during off-peak hours

POST /mathgpt_question_perfect_search_5/_forcemerge?max_num_segments=1&flush=true

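Parameters that can only be set at index creation:

number_of_shards: 4 // number of primary shards

routing_partition_size: 2 // affects how documents with the same custom routing value are spread across shards; defaults to 1 and must be less than number_of_shards

Rule of thumb: 1/3 to 1/2 of the primary shard count (5-10 is suggested where the shard count allows it)

routing_partition_size = max(2, min(primary shards − 1, desired parallelism of the shard group))

Target shard = (hash(routing) + hash(_id) % routing_partition_size) % number of primary shards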
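To see what the target-shard formula does, the sketch below applies it to many documents that share one routing value. It is only an illustration: Elasticsearch uses its own Murmur3-based hash, so zlib.crc32 here is a stand-in, and the routing value, document IDs, partition size of 2, and 4 primary shards are all hypothetical.

```python
import zlib


def stand_in_hash(value: str) -> int:
    """Stand-in hash; NOT the hash Elasticsearch actually uses."""
    return zlib.crc32(value.encode("utf-8"))


def target_shard(routing, doc_id, partition_size, primary_shards):
    """Simplified target-shard formula from the text above."""
    offset = stand_in_hash(doc_id) % partition_size
    return (stand_in_hash(routing) + offset) % primary_shards


# All documents share one routing value; only the _id varies.
shards_used = {
    target_shard("user-42", f"doc-{i}", partition_size=2, primary_shards=4)
    for i in range(1000)
}
print(sorted(shards_used))
```

Documents with the same routing value land on at most routing_partition_size shards, which spreads a large routing group without fanning searches out to every shard.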

4. Avoiding overloaded nodes: verifying shard distribution

Run the following command to check how evenly the shards are distributed

GET _cat/shards/mathgpt_question_perfect_search_5?v&h=index,shard,node,prirep

Tune shard balancing to avoid hot nodes:

PUT /_cluster/settings

{

"persistent": {

"cluster.routing.allocation.balance.shard": 0.5,

"cluster.routing.allocation.balance.index": 0.5

}

}

Check the number of shards on each node

GET /_cat/allocation?v&h=node,shards,disk.used,disk.percent
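The text output of the _cat/shards command above can be summarized per node with a few lines of Python. This is a rough sketch: the sample output and node names are made up for illustration, and the parser assumes every shard is assigned (an unassigned shard has an empty node column and would need separate handling).

```python
from collections import Counter

# Hypothetical output of: GET _cat/shards/<index>?v&h=index,shard,node,prirep
SAMPLE_OUTPUT = """\
index                             shard node   prirep
mathgpt_question_perfect_search_5 0     node-a p
mathgpt_question_perfect_search_5 0     node-b r
mathgpt_question_perfect_search_5 1     node-b p
mathgpt_question_perfect_search_5 1     node-c r
mathgpt_question_perfect_search_5 2     node-a p
mathgpt_question_perfect_search_5 2     node-b r
"""


def shards_per_node(cat_output: str) -> Counter:
    """Count shard copies per node from whitespace-separated _cat output."""
    lines = cat_output.strip().splitlines()
    node_col = lines[0].split().index("node")  # locate the column by header
    return Counter(line.split()[node_col] for line in lines[1:])


counts = shards_per_node(SAMPLE_OUTPUT)
spread = max(counts.values()) - min(counts.values())
print(dict(counts), "spread:", spread)
```

A large spread between the busiest and the least busy node is a hint that the balance settings or total_shards_per_node may need adjusting.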

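5. Ongoing monitoring and tuning:

Monitor indexing latency with GET /_nodes/stats/indices, and write rejections with GET /_nodes/stats/thread_pool (the rejected counter of the write thread pool)

If disk usage exceeds 70%, add capacity or purge old data

Keep the JVM heap no larger than about 32GB (the compressed object pointer threshold) to avoid JVM performance degradation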
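The disk and heap thresholds above can be checked against the node stats quoted at the start of this article. The sketch below is a minimal example under two assumptions: the function names are hypothetical, and "32GB" is interpreted as 32 GiB.

```python
GIB = 1024 ** 3


def disk_used_percent(total_bytes, available_bytes):
    """Percentage of the data path that is no longer available."""
    return (total_bytes - available_bytes) * 100 / total_bytes


def check_node(fs_total, fs_available, heap_max,
               disk_limit_pct=70, heap_limit_bytes=32 * GIB):
    """Return a list of warnings for one node; empty means within limits."""
    warnings = []
    if disk_used_percent(fs_total, fs_available) > disk_limit_pct:
        warnings.append("disk usage above limit: add capacity or purge old data")
    if heap_max > heap_limit_bytes:
        warnings.append("heap larger than the assumed 32 GiB limit")
    return warnings


# Values taken from the fs and jvm sections of the node stats shown earlier.
node_warnings = check_node(
    fs_total=540991217664, fs_available=440465022976, heap_max=32212254720
)
print(round(disk_used_percent(540991217664, 440465022976), 1), node_warnings)
```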

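Scaling out as data grows (_split): setting number_of_routing_shards in advance allows the index to be split multiple times. The source index must be made read-only (index.blocks.write: true) before it is split.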

POST /mathgpt_question_perfect_search_5/_split/mathgpt_question_perfect_search_5_new

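{

"settings": {

"index.number_of_shards": 20,

"index.blocks.write": null // clear the write block so the new index accepts writes after the split

}

}

Shrinking after data is archived (_shrink): reduce the shard count to improve cluster performance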

POST /mathgpt_question_perfect_search_5/_shrink/mathgpt_question_perfect_search_5_new

{

"settings": {

"index.number_of_shards": 2,

"index.number_of_replicas": 1

}

}
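Both operations restrict which target shard counts are allowed: a split target must be a multiple of the source shard count and a factor of number_of_routing_shards, and a shrink target must be a factor of the source shard count. The sketch below lists the valid targets; the helper names and the number_of_routing_shards values of 30 and 40 are hypothetical examples.

```python
def valid_split_targets(source_shards, routing_shards):
    """Shard counts a _split may target for the given source index."""
    return [
        n for n in range(source_shards + 1, routing_shards + 1)
        if n % source_shards == 0 and routing_shards % n == 0
    ]


def valid_shrink_targets(source_shards):
    """Shard counts a _shrink may target for the given source index."""
    return [n for n in range(1, source_shards) if source_shards % n == 0]


split_from_5 = valid_split_targets(source_shards=5, routing_shards=30)
split_from_4 = valid_split_targets(source_shards=4, routing_shards=40)
shrink_from_4 = valid_shrink_targets(source_shards=4)
print(split_from_5, split_from_4, shrink_from_4)
```

Under these assumptions, splitting a 4-shard index to 20 shards works only if number_of_routing_shards was created as a multiple of 20, and shrinking it to 2 shards is always allowed.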

Optimizing write performance during a full data sync:

Stage 1: create the index (write-optimized settings)

PUT /mathgpt_question_perfect_search_5

{

"settings": {

"index": {

"number_of_shards": 10,

"number_of_replicas": 0,

"refresh_interval": "-1",

"translog": {

"durability": "async",

"sync_interval": "120s",

"flush_threshold_size": "2gb"

}

}

}

}

Stage 2: write the data in batches with the bulk API

Stage 3: switch to search mode

PUT /mathgpt_question_perfect_search_5/_settings

{

"index": {

"number_of_replicas": 5,

"refresh_interval": "1s"

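}

}

Run a segment merge, then clear the fielddata cache (_cache/clear empties the cache rather than warming it; caches refill as queries run)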

POST /mathgpt_question_perfect_search_5/_forcemerge

POST /mathgpt_question_perfect_search_5/_cache/clear?fielddata=true
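Stage 2 above relies on the bulk API, whose request body is newline-delimited JSON: one action line followed by one document line, with a trailing newline at the end. The sketch below only builds that body with the standard library; the document fields and IDs are hypothetical, and sending the request (for example via POST /_bulk) is left out.

```python
import json


def build_bulk_body(index_name, docs):
    """Build an NDJSON body for the bulk API from (doc_id, source) pairs."""
    lines = []
    for doc_id, source in docs:
        # Action line: index this document under the given _id.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself, kept on a single line.
        lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


sample_docs = [
    ("1", {"question": "1 + 1 = ?", "grade": 1}),   # hypothetical fields
    ("2", {"question": "2 × 3 = ?", "grade": 2}),
]
body = build_bulk_body("mathgpt_question_perfect_search_5", sample_docs)
print(body)
```

Batching a few thousand documents per request is a common starting point; the right batch size depends on document size and should be found by testing.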

2025-04-11 15:06:35
