title: "Pool Operations"
date: 2020-05-03T18:33:40
slug: pool-operations
Create Pool:
ceph osd pool create nextcloud 32 32
pool 'nextcloud' created
Delete Pool (the pool name has to be given twice):
ceph osd pool delete nextcloud nextcloud --yes-i-really-really-mean-it
pool 'nextcloud' removed
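If the delete is refused because pool deletion is disabled on the monitors, it has to be allowed first. A minimal sketch, assuming a recent Ceph release where ceph config set is available (run from the Rook toolbox):
ceph config set mon mon_allow_pool_delete true
ceph osd pool delete nextcloud nextcloud --yes-i-really-really-mean-it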
title: "Troubleshooting Techniques"
date: 2020-04-22T09:15:36
slug: troubleshooting-techniques
https://github.com/rook/rook/blob/master/Documentation/ceph-common-issues.md
title: "Edit replication size"
date: 2020-04-18T09:15:29
slug: edit-replication-size
kubectl edit CephFilesystem myfs -n rook-ceph
spec:
  dataPools:
  - compressionMode: ""
    crushRoot: ""
    deviceClass: ""
    erasureCoded:
      algorithm: ""
      codingChunks: 0
      dataChunks: 0
    failureDomain: ""
    replicated:
      requireSafeReplicaSize: false
      size: 3
      targetSizeRatio: 0
  metadataPool:
    compressionMode: ""
    crushRoot: ""
    deviceClass: ""
    erasureCoded:
      algorithm: ""
      codingChunks: 0
      dataChunks: 0
    failureDomain: ""
    replicated:
      requireSafeReplicaSize: false
      size: 3
      targetSizeRatio: 0
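To verify that the new size is applied, check the replica count of the underlying pools from the toolbox. A short sketch, assuming the default Rook pool naming for a filesystem called myfs (myfs-metadata and myfs-data0); confirm the actual names with ceph osd pool ls:
ceph osd pool ls
ceph osd pool get myfs-metadata size
ceph osd pool get myfs-data0 size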
title: "Replace OSD"
date: 2020-04-18T08:54:15
slug: replace-osd
Exec in Toolbox:
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash
Useful Commands:
ceph status
ceph osd tree
ceph osd ls (for getting OSD ID)
Remove the faulty OSD (in this example the OSD with ID 1; a consolidated script sketch follows after these commands):
# ceph osd out osd.1
marked out osd.1.
# ceph osd crush remove osd.1
removed item id 1 name 'osd.1' from crush map
# ceph auth del osd.1
updated
# ceph osd rm osd.1
removed osd.1
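The same removal steps as a small bash sketch, with the OSD ID in a variable (run inside the toolbox; OSD_ID=1 matches the example above):
OSD_ID=1
ceph osd out osd.${OSD_ID}
ceph osd crush remove osd.${OSD_ID}
ceph auth del osd.${OSD_ID}
ceph osd rm osd.${OSD_ID}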
Remove the Deployment for the faulty OSD (it will be recreated in the next step):
# kubectl delete deployment -n rook-ceph rook-ceph-osd-1
deployment.extensions "rook-ceph-osd-1" deleted
Scale the operator down and up again to detect new OSDs:
# kubectl scale deployment rook-ceph-operator --replicas=0 -n rook-ceph
deployment.extensions/rook-ceph-operator scaled
# kubectl get pods --all-namespaces -o wide|grep operator
# kubectl scale deployment rook-ceph-operator --replicas=1 -n rook-ceph
deployment.extensions/rook-ceph-operator scaled
# kubectl get pods --all-namespaces -o wide|grep operator
rook-ceph-system rook-ceph-operator-76cf7f88f-g9pxr 0/1 ContainerCreating 0 2s kube-ceph02
New OSD prepare PODs should be created:
kubectl get pods -n rook-ceph -o wide
Check with:
ceph status
ceph osd tree
ceph osd ls
In case of problems:
On the OSD node:
view /var/lib/rook/rook-ceph/log/ceph-volume.log
Check Container logs:
kubectl logs -n rook-ceph rook-ceph-osd-prepare-k8s-node01-hl4rj
Check MGR Container for Recovery:
kubectl logs -f -n rook-ceph rook-ceph-mgr-a-7cb4ccffc6-dnz2q
debug 2020-04-18 08:57:59.352 7ffb18b9b700 0 log_channel(cluster) log [DBG] : pgmap v940: 64 pgs: 2 active+undersized+degraded+remapped+backfilling, 55 active+clean, 7 active+undersized+degraded+remapped+backfill_wait; 14 GiB data, 39 GiB used, 708 GiB / 750 GiB avail; 799 KiB/s rd, 5.0 MiB/s wr, 231 op/s; 2648/33588 objects degraded (7.884%); 5.7 MiB/s, 4 objects/s recovering
192.168.1.7 - - [18/Apr/2020:08:58:01] "GET / HTTP/1.1" 200 155 "" "kube-probe/1.18"
debug 2020-04-18 08:58:01.352 7ffb18b9b700 0 log_channel(cluster) log [DBG] : pgmap v942: 64 pgs: 2 active+undersized+degraded+remapped+backfilling, 55 active+clean, 7 active+undersized+degraded+remapped+backfill_wait; 14 GiB data, 39 GiB used, 708 GiB / 750 GiB avail; 738 KiB/s rd, 6.0 MiB/s wr, 270 op/s; 2648/33588 objects degraded (7.884%); 7.1 MiB/s, 5 objects/s recovering
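Recovery progress can also be followed directly from the toolbox instead of tailing the MGR log; ceph -w streams the cluster log and ceph pg stat summarizes the PG states (both are standard Ceph commands, no Rook-specific assumptions):
ceph -w
ceph pg stat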
title: "Namespace is in termination state"
date: 2020-04-13T07:43:26
slug: namespace-is-in-termination-state
Get more namespace info (the relevant part of .status is shown below):
kubectl get namespace rook-ceph -o json
{
  "lastTransitionTime": "2020-04-13T06:53:57Z",
  "message": "Some resources are remaining: cephfilesystems.ceph.rook.io has 1 resource instances",
  "reason": "SomeResourcesRemain",
  "status": "True",
  "type": "NamespaceContentRemaining"
},
{
  "lastTransitionTime": "2020-04-13T06:53:57Z",
  "message": "Some content in the namespace has finalizers remaining: cephfilesystem.ceph.rook.io in 1 resource instances",
  "reason": "SomeFinalizersRemain",
  "status": "True",
  "type": "NamespaceFinalizersRemaining"
}
],
"phase": "Terminating"
}
The CRD can be deleted by setting its finalizers to []:
kubectl get crd
NAME CREATED AT
alertmanagers.monitoring.coreos.com 2020-04-11T11:06:08Z
cephfilesystems.ceph.rook.io 2020-04-11T21:12:52Z
kubectl edit crd cephfilesystems.ceph.rook.io
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  creationTimestamp: "2020-04-11T21:12:52Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2020-04-13T07:08:32Z"
  finalizers:
  - customresourcecleanup.apiextensions.k8s.io
  generation: 1
  managedFields:
  - apiVersion: apiextensions.k8s.io/v1beta1
Modify to:
  finalizers: []
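The same change can be made non-interactively with kubectl patch (a JSON merge patch replaces the finalizers list wholesale; no assumptions beyond the CRD name above):
kubectl patch crd cephfilesystems.ceph.rook.io --type=merge -p '{"metadata":{"finalizers":[]}}'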
title: "quicker detection of a Node down"
date: 2020-04-13T06:53:00
slug: quicker-detection-of-a-node-down
In your Kubernetes cluster a node can die or reboot.
Tools like Kubernetes are highly available, designed to be robust and to auto-recover in such scenarios, and Kubernetes accomplishes this very well.
However, you might notice that when a node goes down, the pods of the broken node are still seen as running for some time, they still receive requests, and those requests will fail.
That time can be reduced, because in my opinion the default is too high. There are a bunch of parameters to tweak in the kubelet and in the controller manager.
This is the workflow of what happens when a node gets down:
1- The Kubelet posts its status to the masters using --node-status-update-frequency=10s
2- A node dies
3- The kube controller manager is the one monitoring the nodes; using --node-monitor-period=5s it checks, on the masters, the node status reported by the Kubelet.
4- The kube controller manager will see the node is unresponsive, and has the grace period --node-monitor-grace-period=40s until it considers the node unhealthy. This parameter should be N times node-status-update-frequency, where N is the number of retries allowed for the Kubelet to post node status. N is a constant in the code equal to 5, check var nodeStatusUpdateRetry in https://github.com/kubernetes/kubernetes/blob/e54ebe5ebd39181685923429c573a0b9e7cd6fd6/pkg/kubelet/kubelet.go
Note that the default values don’t fulfill what the documentation says, because:
node-status-update-frequency x N != node-monitor-grace-period (10 x 5 != 40)
But from what I can understand, 5 post attempts of 10s each are done within 40s: the first one at second zero, the second one at second 10, and so on until the fifth and last one at second 40.
So the real equation would be:
node-status-update-frequency x (N-1) = node-monitor-grace-period
5- Once the node is marked as unhealthy, the kube controller manager will remove its pods based on --pod-eviction-timeout=5m0s
This is a very important timeout; by default it's 5m, which in my opinion is too high, because although the node is already marked as unhealthy, the kube controller manager won't remove the pods yet, so they will still be addressed through their service and the requests will fail.
6- Kube proxy has a watcher over the API, so the very first moment the pods are evicted the proxy will notice and update the iptables of the node, removing the endpoints from the services so the failing pods won’t be accessible anymore.
These values can be tweaked so you will get fewer failed requests if a node goes down.
I've set these in my cluster:
kubelet: node-status-update-frequency=4s (from 10s)
controller-manager: node-monitor-period=2s (from 5s)
controller-manager: node-monitor-grace-period=16s (from 40s)
controller-manager: pod-eviction-timeout=30s (from 5m)
The results are quite good: we've moved from a node-down detection time of 5m40s down to 46s.
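A sketch of where these values could be set on a kubeadm-based cluster (assumption: kubeadm defaults; on other setups the flags go directly into the kubelet and kube-controller-manager invocations):
In /etc/kubernetes/manifests/kube-controller-manager.yaml, add to the command list:
- --node-monitor-period=2s
- --node-monitor-grace-period=16s
- --pod-eviction-timeout=30s
In /var/lib/kubelet/config.yaml (KubeletConfiguration), set the following and restart the kubelet:
nodeStatusUpdateFrequency: 4s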
title: "Update Dashboard Password"
date: 2020-04-12T07:22:41
slug: update-dashboard-password
[root@rook-ceph-operator-9bd79cdcf-npkm8 /]# ceph -c /var/lib/rook/rook-ceph/rook-ceph.config dashboard set-login-credentials admin admin
******************************************************************
*** WARNING: this command is deprecated. ***
*** Please use the ac-user-* related commands to manage users. ***
******************************************************************
Username and password updated
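As the warning says, the ac-user-* commands are the non-deprecated way. A sketch, assuming a Nautilus-era release where the password can still be passed inline (newer releases expect it via -i <password-file>):
ceph dashboard ac-user-set-password admin admin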
title: "Erase Ceph Disc"
date: 2020-04-11T21:12:04
slug: erase-ceph-disc
sgdisk --zap-all /dev/vdb
dd if=/dev/zero of=/dev/vdb bs=1M count=100 oflag=direct,dsync
ls /dev/mapper/ceph-* | xargs -I% -- dmsetup remove %
rm -rf /dev/ceph-*
rm -rf /var/lib/rook
fdisk -l
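Optionally, wipe any remaining filesystem signatures and double-check the disk afterwards (assumption: the data disk is /dev/vdb as above):
wipefs --all /dev/vdb
lsblk -f /dev/vdb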
title: "Proxy Port to local"
date: 2020-04-11T08:01:44
slug: proxy-port-to-local
kubectl port-forward --namespace logging $POD_NAME 5601:5601
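For reference, POD_NAME can be looked up with a label selector first; a sketch assuming a Kibana pod in the logging namespace labeled app=kibana (adjust the selector to your deployment):
POD_NAME=$(kubectl get pods --namespace logging -l "app=kibana" -o jsonpath='{.items[0].metadata.name}')
kubectl port-forward --namespace logging $POD_NAME 5601:5601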
title: "Backup and Restore ETCD"
date: 2020-04-06T13:27:03
slug: backup-and-restore-etcd
Get etcdctl Tool:
https://github.com/etcd-io/etcd/releases/download/v3.4.7/etcd-v3.4.7-linux-amd64.tar.gz
Create a Snapshot
ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save /tmp/snapshot-pre-boot.db
Status of Snapshot:
ETCDCTL_API=3 etcdctl snapshot status /tmp/snapshot-pre-boot.db -w table
Restore ETCD Snapshot to a new folder
ETCDCTL_API=3 etcdctl snapshot restore -h
ETCDCTL_API=3 etcdctl \
  --endpoints=https://[127.0.0.1]:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --name=master \
  --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key \
  --data-dir /var/lib/etcd-from-backup \
  --initial-cluster=master=https://127.0.0.1:2380 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-advertise-peer-urls=https://127.0.0.1:2380 \
  snapshot restore /tmp/snapshot-pre-boot.db
Modify /etc/kubernetes/manifests/etcd.yaml:
- --data-dir=/var/lib/etcd-from-backup
- --initial-cluster-token=etcd-cluster-1
- mountPath: /var/lib/etcd-from-backup (volumeMount for etcd-data)
  path: /var/lib/etcd-from-backup (hostPath for etcd-data)
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://172.17.0.45:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd-from-backup
    - --initial-cluster-token=etcd-cluster-1
    - --initial-advertise-peer-urls=https://172.17.0.45:2380
    - --initial-cluster=master=https://172.17.0.45:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://172.17.0.45:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://172.17.0.45:2380
    - --name=master
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    ...
    volumeMounts:
    - mountPath: /var/lib/etcd-from-backup
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-cluster-critical
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd-from-backup
      type: DirectoryOrCreate
    name: etcd-data
status: {}
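After the kubelet restarts the static etcd pod with the new data dir, a quick sanity check (a sketch; the etcd pod follows the kubeadm naming etcd-<node-name>):
kubectl get pods -n kube-system | grep etcd
kubectl get nodes
ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member list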