OpenInfra Day
Ceph Issue Case Study
2019.07.11
Youngju Lee, Open Source Consulting
( yjlee@osci.kr )
Contents
01. Architecture
02. The Issue
03. Resolution
Architecture
01
01. Architecture
● Overall architecture
[Diagram: Deploy node, Controller Node, Compute Node, Storage Node, Firewall, Router]
01. Architecture
● Ceph architecture
[Diagram: OSD nodes ceph-osd1–ceph-osd3 hosting osd.0–osd.8; monitors ceph-mon1, ceph-mon2, ceph-mon3]
01. Architecture
● Ceph object flow
PG (Placement Group): the group of OSDs that stores an object; the number of members matches the replica count.
OSD (Object Storage Daemon): the daemon that ultimately stores the objects.
Monitor: monitors changes in the Ceph OSDs and builds the CRUSH map.
[root@ceph-osd01 ~]# rados ls -p vms
rbd_data.1735e637a64d5.0000000000000000
rbd_header.1735e637a64d5
rbd_directory
rbd_children
rbd_info
rbd_data.1735e637a64d5.0000000000000003
rbd_data.1735e637a64d5.0000000000000002
rbd_id.893f4f3d-f6d9-4521-997c-72caa861ac24_disk
rbd_data.1735e637a64d5.0000000000000001
rbd_object_map.1735e637a64d5
[root@ceph-osd01 ~]#
The default object size is 4 MB.
CRUSH (Controlled Replication Under Scalable Hashing): the algorithm that determines how objects are distributed across OSDs.
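Not on the original slide, but a quick way to see CRUSH in action: ceph osd map shows which PG and which OSDs a given object hashes to (object name taken from the rados listing above):
# Trace object -> PG -> up/acting OSD set (read-only)
$ ceph osd map vms rbd_data.1735e637a64d5.0000000000000000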
The Issue
02
02. The Issue
Ceph architecture
[Diagram: OSD nodes ceph-osd1–ceph-osd3 hosting osd.0–osd.8; monitors ceph-mon1, ceph-mon2, ceph-mon3]
One of the OSDs hit 90% utilization (the full_ratio), so reads and writes were blocked.
[root@osc-ceph01 ~]# ceph pg dump |grep -i full_ratio
dumped all in format plain
full_ratio 0.9
nearfull_ratio 0.8
[root@osc-ceph01 ~]# ceph daemon mon.`hostname` config show |grep -i osd_full_ratio
"mon_osd_full_ratio": "0.9",
[root@osc-ceph01 ~]#
02. The Issue
- Ceph community troubleshooting guide
Reference: http://docs.ceph.com/docs/jewel/rados/troubleshooting/troubleshooting-osd/#no-free-drive-space
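The guide's first step is to work out which OSDs are actually full. Not shown on the slide, but the usual read-only checks look like this (verify command availability against your release):
# Per-OSD utilization; highlights OSDs near the full_ratio
$ ceph osd df
# Per-pool usage summary
$ ceph df
# Lists near-full / full OSDs explicitly
$ ceph health detail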
02. The Issue
Ceph architecture
[Diagram: OSD nodes ceph-osd1–ceph-osd3 hosting osd.0–osd.8; monitors ceph-mon1, ceph-mon2, ceph-mon3]
pg 1.11f on osd.8 was deleted.
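The slide does not show how the PG copy was removed. For reference only, a hedged sketch of how a single PG copy is typically exported and deleted from one stopped filestore OSD with ceph-objectstore-tool (paths and flags are illustrative and vary by release; keeping the export is essential, because removing PG copies is exactly what triggers the trouble below):
# Stop the OSD so its object store can be opened offline
$ systemctl stop ceph-osd@8
# Keep a backup copy of the PG before touching anything
$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-8 --pgid 1.11f --op export --file /root/pg-1.11f.export
# Remove this OSD's local copy of the PG (newer releases require --force)
$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-8 --pgid 1.11f --op remove
$ systemctl start ceph-osd@8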
02. The Issue
[root@ceph-mon02 ~]# ceph -s
cluster f5078395-0236-47fd-ad02-8a6daadc7475
health HEALTH_ERR
1 pgs are stuck inactive for more than 300 seconds
162 pgs backfill_wait
37 pgs backfilling
322 pgs degraded
1 pgs down
2 pgs peering
4 pgs recovering
119 pgs recovery_wait
1 pgs stuck inactive
322 pgs stuck unclean
199 pgs undersized
recovery 592647/43243812 objects degraded (1.370%)
recovery 488046/43243812 objects misplaced (1.129%)
1 mons down, quorum 0,2 ceph-mon01,ceph-mon03
monmap e1: 3 mons at {ceph-mon01=10.10.50.201:6789/0,ceph-mon02=10.10.50.202:6789/0,ceph-mon03=10.10.50.203:6789/0}
election epoch 480, quorum 0,2 ceph-mon01,ceph-mon03
osdmap e27606: 128 osds: 125 up, 125 in; 198 remapped pgs
flags sortbitwise
pgmap v58287759: 10240 pgs, 4 pools, 54316 GB data, 14076 kobjects
157 TB used, 71440 GB / 227 TB avail
592647/43243812 objects degraded (1.370%)
488046/43243812 objects misplaced (1.129%)
9916 active+clean
162 active+undersized+degraded+remapped+wait_backfill
119 active+recovery_wait+degraded
37 active+undersized+degraded+remapped+backfilling
4 active+recovering+degraded
1 down+peering
1 peering
1 PG has been unreachable for more than 300 seconds (1.11f)...
162 PGs are waiting for backfill because their OSDs went down
37 PGs are backfilling because they fell outside the pglog range
322 PGs are degraded: they do not have all 3 copies, so performance drops
The 1 problematic down PG... (1.11f)
2 PGs are peering (deciding between recovery and backfill)
119 PGs are waiting for recovery
4 PGs are recovering from the pglog (I/O to those PGs is blocked)
1 PG is inactive because none of its OSDs is up (1.11f)
322 PGs are stuck unclean, short of full 3-way replication
199 PGs are undersized (below the pool's replica count)
1 monitor is down
The 3 OSDs holding pg 1.11f are down
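To narrow a HEALTH_ERR like this down to the offending PGs and OSDs, a few read-only queries are usually enough (hedged addition; the slide reaches the same conclusions from ceph -s and the notes above):
# Lists the failing health checks with the exact PG ids
$ ceph health detail
# PGs stuck inactive / unclean and the OSDs they map to
$ ceph pg dump_stuck inactive
$ ceph pg dump_stuck unclean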
02. The Issue
Architecture
[Diagram: OSD nodes ceph-osd1–ceph-osd3 hosting osd.0–osd.8; monitors ceph-mon1, ceph-mon2, ceph-mon3]
Images pool: holds the OpenStack images.
Volumes pool: pg 1.11f holds a small piece of every OpenStack volume.
If even one PG is down, none of the data in that pool can be used.
[root@osc-ceph01 ~]# ceph pg dump |head
dumped all in format plain
...
pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary
last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
1.11f 0 0 0 0 0 0 3080 3080 active+clean 2019-07-10 08:12:46.623592 921'8580 10763:10591 [8,4,7] 8 [8,4,7] 8
921'8580 2019-07-10 08:12:46.623572 921'8580 2019-07-07 19:44:32.652377
...
The primary PG handles all I/O.
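To see which OSDs serve pg 1.11f and which of them is primary (the [8,4,7] / 8 columns in the dump above carry the same information), two read-only commands can be used:
# Up/acting OSD set and the primary for a single PG
$ ceph pg map 1.11f
# Detailed peering and recovery state of the PG (verbose JSON)
$ ceph pg 1.11f query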
Resolution
03
03. Resolution
writeout_from: 30174'649621, trimmed:
-1> 2018-10-24 15:28:44.487997 7fb622e2d700 5 write_log with: dirty_to: 0'0,
dirty_from: 4294967295'18446744073709551615, dirty_divergent_priors: false, divergent_priors: 0,
writeout_from: 30174'593316, trimmed:
0> 2018-10-24 15:28:44.502006 7fb61de23700 -1 osd/SnapMapper.cc: In function
'void SnapMapper::add_oid(const hobject_t&,
const std::set<snapid_t>&, MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
thread 7fb61de23700 time 2018-10-24 15:28:44.497739
osd/SnapMapper.cc: 228: FAILED assert(r == -2)
Analysis result...
The 3 replicated copies of the PG conflict with each other, and the OSDs holding that PG go down.
This is fixed in Red Hat Ceph 3.1 (Luminous), so upgrade!!
However...
- Red Hat OpenStack 9 (Mitaka) does not support Red Hat Ceph 3.1.
- Before upgrading to Red Hat Ceph 3.1, OpenStack must first be upgraded to 10 (Newton).
- Red Hat OpenStack 9 is deployed with TripleO (the upgrade process is very complicated...).
- The Red Hat Ceph upgrade would have to be done with the cluster in an error state.
- Argh...
03. Resolution
OpenStack upgrade
- Failed...
- Reinstalled, then recovered every VM
Ceph 3.1 upgrade
- Upgraded manually, without ceph-ansible
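The manual upgrade steps themselves are not on the slide. A rough, hedged outline of a hand-rolled Jewel-to-Luminous rolling upgrade (package names are illustrative; follow the official Red Hat Ceph 3.1 upgrade notes rather than this sketch):
# Keep CRUSH from rebalancing while daemons restart
$ ceph osd set noout
# Upgrade packages and restart monitors one at a time, then OSD nodes one at a time
$ yum update -y ceph-mon ceph-osd ceph-common     # package set is illustrative
$ systemctl restart ceph-mon@ceph-mon01
$ systemctl restart ceph-osd@8
# Luminous also needs ceph-mgr daemons (they show up in the later ceph -s output)
# Once every daemon runs Luminous:
$ ceph osd require-osd-release luminous
$ ceph osd unset noout
$ ceph versions     # all daemons should report the same release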
03. Resolution
[Diagram: OSD nodes ceph-osd1–ceph-osd3 hosting osd.0–osd.8]
vms pool: stores the VM images created by Nova
12345_disk: the existing VM's RBD image
67890_disk: the new VM's RBD image
New VM1, ID=67890
Existing VM1, ID=12345
Recovery procedure:
- Create a new VM (ID 67890)
- Delete the RBD image 67890_disk from the vms pool
- Rename 12345_disk to 67890_disk
- Repeat this for every VM (a scripted sketch follows the command output below)...
[root@ceph01 ~]# rbd list -p vms
12345_disk
67890_disk
[root@ceph01 ~]# rbd rm -p vms 67890_disk
Removing image: 100% complete...done.
[root@ceph01 ~]#
[root@ceph01 ~]# rbd mv -p vms 12345_disk 67890_disk
[root@ceph01 ~]# rbd ls -p vms
67890_disk
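A hedged sketch of scripting this rename across many VMs, assuming a hand-made mapping file (vm-id-map.txt, hypothetical) with one "old_id new_id" pair per line; verify each pair before deleting anything:
#!/bin/bash
# For each pair: drop the new VM's empty disk, then rename the old disk into its place
while read old new; do
  rbd rm -p vms "${new}_disk"
  rbd mv -p vms "${old}_disk" "${new}_disk"
done < vm-id-map.txt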
03. Resolution
After the Red Hat Ceph 3.1 upgrade...
- A similar problem occurred
- The OSDs holding pg 1.11f kept flapping up and down
[root@ceph-mon01 osc]# ceph -s
cluster:
id: f5078395-0236-47fd-ad02-8a6daadc7475
health: HEALTH_ERR
noscrub,nodeep-scrub flag(s) set
5 scrub errors
Possible data damage: 1 pg inconsistent
services:
mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
mgr: ceph-mon01(active), standbys: ceph-mon02, ceph-mon03
osd: 128 osds: 128 up, 128 in
flags noscrub,nodeep-scrub
data:
pools: 4 pools, 10240 pgs
objects: 12200k objects, 46865 GB
usage: 137 TB used, 97628 GB / 232 TB avail
pgs: 10239 active+clean
1 active+clean+inconsistent
io:
client: 0 B/s rd, 1232 kB/s wr, 19 op/s rd, 59 op/s wr
[root@ceph-mon01 osc]# ceph health detail
HEALTH_ERR noscrub,nodeep-scrub flag(s) set; 5 scrub errors; Possible data
damage: 1 pg inconsistent
OSDMAP_FLAGS noscrub,nodeep-scrub flag(s) set
OSD_SCRUB_ERRORS 5 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
pg 1.11f is active+clean+inconsistent, acting [113,105,10]
[root@ceph-mon01 osc]#
OTL...
03. Resolution
This time, however, the problematic object could be pinpointed.
[root@ceph-mon01 ~]# rados list-inconsistent-obj 1.11f --format=json-pretty
{
"epoch": 34376,
"inconsistents": [
{
"object": {
"name": "rbd_data.39edab651c7b53.0000000000003600",
"nspace": "",
"locator": "",
03. Resolution
Object rbd_data.39edab651c7b53.0000000000003600 belonged to the root filesystem volume of a customer's DB service VM.
Fortunately the DB data itself was unaffected...
The RBD image holding that DB VM's root filesystem was deleted, but the cluster still reported HEALTH_ERR...
[root@ceph-mon01 osc]# ceph -s
cluster:
id: f5078395-0236-47fd-ad02-8a6daadc7475
health: HEALTH_ERR
4 scrub errors
Possible data damage: 1 pg inconsistent, 1 pg snaptrim_error
services:
mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
mgr: ceph-mon01(active), standbys: ceph-mon02, ceph-mon03
osd: 128 osds: 128 up, 128 in
data:
pools: 4 pools, 10240 pgs
objects: 12166k objects, 46731 GB
usage: 136 TB used, 98038 GB / 232 TB avail
pgs: 10239 active+clean
1 active+clean+inconsistent+snaptrim_error
io:
client: 0 B/s rd, 351 kB/s wr, 15 op/s rd, 51 op/s wr
[root@ceph-mon01 osc]# ceph health detail
HEALTH_ERR 4 scrub errors; Possible data damage: 1 pg inconsistent, 1 pg
snaptrim_error
OSD_SCRUB_ERRORS 4 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent, 1 pg snaptrim_error
pg 1.11f is active+clean+inconsistent+snaptrim_error, acting [113,105,10]
[root@ceph-mon01 osc]#
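For reference, the image deletion mentioned above was presumably along these lines (hedged sketch; <volume-uuid> is a placeholder, the real volume name is not shown, and deleting through Cinder is the cleaner path when the volume still exists there):
# Remove the image's snapshots first, then the image itself
# (protected snapshots must be unprotected, and clones flattened, beforehand)
$ rbd snap purge volumes/<volume-uuid>
$ rbd rm volumes/<volume-uuid>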
03. Resolution
- Snapshot id 54 (0x36) of the problematic object is what keeps triggering the error.
- ??? But we already deleted it??
2018-11-16 18:45:00.163319 7fb827aca700 -1 log_channel(cluster) log [ERR] : 1.11f shard 10: soid 1:f886c0a3:::rbd_data.39edab651c7b53.0000000000003600:36
data_digest 0x43d61c5d != data_digest 0x86baff34 from auth oi 1:f886c0a3::: rbd_data.39edab651c7b53.0000000000003600:36(14027'236814 osd.113.0:29524 [36]
dirty|data_digest|omap_digest s 4194304 uv 235954 dd 86baff34 od ffffffff alloc_hint [0 0 0])
2018-11-16 18:45:00.163330 7fb827aca700 -1 log_channel(cluster) log [ERR] : 1.11f shard 105: soid 1:f886c0a3:::rbd_data.39edab651c7b53.0000000000003600:36
data_digest 0x43d61c5d != data_digest 0x86baff34 from auth oi 1:f886c0a3::: rbd_data.39edab651c7b53.0000000000003600:36(14027'236814 osd.113.0:29524 [36]
dirty|data_digest|omap_digest s 4194304 uv 235954 dd 86baff34 od ffffffff alloc_hint [0 0 0])
2018-11-16 18:45:00.163333 7fb827aca700 -1 log_channel(cluster) log [ERR] : 1.11f shard 113: soid 1:f886c0a3:::rbd_data.39edab651c7b53.0000000000003600:36
data_digest 0x43d61c5d != data_digest 0x86baff34 from auth oi 1:f886c0a3::: rbd_data.39edab651c7b53.0000000000003600:36(14027'236814 osd.113.0:29524 [36]
dirty|data_digest|omap_digest s 4194304 uv 235954 dd 86baff34 od ffffffff alloc_hint [0 0 0])
$ printf "%d\n" 0x36
54
[root@ceph-osd08 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-113 --pgid 1.11f --op list | grep 39edab651c7b53
Error getting attr on : 1.11f_head,#-3:f8800000:::scrub_1.11f:head#, (61) No data available
["1.11f",{"oid":"rbd_data.39edab651c7b53.0000000000003600","key":"","snapid":54,"hash":3305333023,"max":0,"pool":1,"namespace":"","max":0}]
["1.11f",{"oid":"rbd_data.39edab651c7b53.0000000000003600","key":"","snapid":63,"hash":3305333023,"max":0,"pool":1,"namespace":"","max":0}]
["1.11f",{"oid":"rbd_data.39edab651c7b53.0000000000003600","key":"","snapid":-2,"hash":3305333023,"max":0,"pool":1,"namespace":"","max":0}]
03. Resolution
- Let's find the RBD image that owns the problematic object!
[root@ceph-mon01 osc]# rbd info volume-13076ffc-6520-4db8-b238-1ba6108bfe52 -p volumes
rbd image 'volume-13076ffc-6520-4db8-b238-1ba6108bfe52':
size 53248 MB in 13312 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.62cb510d494de
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:
[root@ceph-mon01 osc]#
[root@ceph-mon01 osc]# cat rbd-info.sh
#!/bin/bash
for i in `rbd list -p volumes`
do
rbd info volumes/$i |grep rbd_data.39edab651c7b53
echo --- $i done ----
done
[root@ceph-mon01 osc]# bash rbd-info.sh
rbd info shows the object name prefix of each image.
A script that searches every RBD image for the problematic object prefix.
[root@ceph-mon01 osc]# bash rbd-info.sh
--- rbdtest done ----
--- volume-00b0de1a-bfab-40e0-a444-b6c2d0de3905 done ----
--- volume-02d9c884-fc30-4700-87fd-950855ae361d done ----
...
[root@ceph-mon01 osc]#
The result... as expected, nothing...
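The prefix could in principle belong to an image in another pool, so a hedged extension of the same idea searches every pool instead of only volumes (here it would also come back empty, since the image had already been deleted):
#!/bin/bash
# Search every pool for the image whose block_name_prefix matches
for pool in $(ceph osd pool ls); do
  for img in $(rbd ls -p "$pool"); do
    rbd info "$pool/$img" | grep -q rbd_data.39edab651c7b53 && echo "found: $pool/$img"
  done
done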
03. Resolution
- The volume that held that snapshot has been deleted, so the condition causing the error no longer exists.
- We were advised to run repair again.
[root@ceph-mon01 ~]# date ; ceph pg repair 1.11f
Wed Nov 28 18:16:25 KST 2018
instructing pg 1.11f on osd.113 to repair
[root@ceph-mon01 ~]# ceph health detail
HEALTH_ERR noscrub,nodeep-scrub flag(s) set; Possible data damage: 1 pg repair
OSDMAP_FLAGS noscrub,nodeep-scrub flag(s) set
PG_DAMAGED Possible data damage: 1 pg repair
pg 1.11f is active+clean+scrubbing+deep+repair, acting [113,105,10]
[root@ceph-mon01 ~]# ceph -s
cluster:
id: f5078395-0236-47fd-ad02-8a6daadc7475
health: HEALTH_ERR
noscrub,nodeep-scrub flag(s) set
Possible data damage: 1 pg repair
services:
mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
mgr: ceph-mon01(active), standbys: ceph-mon02, ceph-mon03
osd: 128 osds: 128 up, 128 in
flags noscrub,nodeep-scrub
data:
pools: 4 pools, 10240 pgs
objects: 12321k objects, 47365 GB
usage: 138 TB used, 96138 GB / 232 TB avail
pgs: 10239 active+clean
1 active+clean+scrubbing+deep+repair
io:
client: 598 kB/s rd, 1145 kB/s wr, 18 op/s rd, 63 op/s wr
pg 1.11f is being repaired
03. Resolution
- Checking the Ceph log.
[root@ceph-mon01 ~]# ceph -w
...
2018-11-28 18:21:26.654955 osd.113 [ERR] 1.11f repair stat mismatch, got 3310/3312 objects, 91/92 clones, 3243/3245
dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 67/68 whiteouts, 13579894784/13584089088 bytes, 0/0 hit_set_archive bytes.
2018-11-28 18:21:26.655657 osd.113 [ERR] 1.11f repair 1 errors, 1 fixed
2018-11-28 18:19:28.979704 mon.ceph-mon01 [INF] Health check cleared: PG_DAMAGED (was: Possible data damage: 1 pg repair)
2018-11-28 18:20:30.652593 mon.ceph-mon01 [WRN] Health check update: nodeep-scrub flag(s) set (OSDMAP_FLAGS)
2018-11-28 18:20:35.394445 mon.ceph-mon01 [INF] Health check cleared: OSDMAP_FLAGS (was: nodeep-scrub flag(s) set)
2018-11-28 18:20:35.394457 mon.ceph-mon01 [INF] Cluster is now healthy
Huh..?! Fixed???
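The OSDMAP_FLAGS messages above show the scrub flags being cleared around the same time; presumably they had been set earlier during troubleshooting and were removed with the usual commands:
# Re-enable scrubbing once the repair is done
$ ceph osd unset noscrub
$ ceph osd unset nodeep-scrub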
03. Resolution
- HEALTH_OK
[root@ceph-mon01 ~]# ceph -s
cluster:
id: f5078395-0236-47fd-ad02-8a6daadc7475
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
mgr: ceph-mon01(active), standbys: ceph-mon02, ceph-mon03
osd: 128 osds: 128 up, 128 in
data:
pools: 4 pools, 10240 pgs
objects: 12321k objects, 47366 GB
usage: 138 TB used, 96138 GB / 232 TB avail
pgs: 10216 active+clean
24 active+clean+scrubbing+deep
io:
client: 424 kB/s rd, 766 kB/s wr, 18 op/s rd, 72 op/s wr
Q&A
Youngju Lee, Open Source Consulting
( yjlee@osci.kr )
Thank you
감사합니다
Cloud & Collaboration
T. 02-516-0711 E. sales@osci.kr
5F, 32 Teheran-ro 83-gil, Gangnam-gu, Seoul (Samseong-dong, Narakium Samseong-dong A Bldg.)
www.osci.kr
