
Pool Operations


title: “Pool Operations”
date: 2020-05-03T18:33:40
slug: pool-operations


Create Pool:
ceph osd pool create nextcloud 32 32
pool 'nextcloud' created
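
Depending on how the pool will be consumed, Ceph may warn that no application is enabled on it. Assuming the pool is meant for RBD (an assumption, the usage is not stated above), it can be tagged like this:

ceph osd pool application enable nextcloud rbd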

Delete Pool:
ceph osd pool delete 32 32 --yes-i-really-really-mean-it
pool '32' removed
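
If the monitors refuse to delete a pool, deletion first has to be allowed (a standard Ceph option; consider switching it back to false afterwards):

ceph config set mon mon_allow_pool_delete true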

Edit replication size


title: “Edit replication size”
date: 2020-04-18T09:15:29
slug: edit-replication-size


kubectl edit CephFilesystem myfs -n rook-ceph
spec:
  dataPools:
  - compressionMode: ""
    crushRoot: ""
    deviceClass: ""
    erasureCoded:
      algorithm: ""
      codingChunks: 0
      dataChunks: 0
    failureDomain: ""
    replicated:
      requireSafeReplicaSize: false
      size: 3
      targetSizeRatio: 0
  metadataPool:
    compressionMode: ""
    crushRoot: ""
    deviceClass: ""
    erasureCoded:
      algorithm: ""
      codingChunks: 0
      dataChunks: 0
    failureDomain: ""
    replicated:
      requireSafeReplicaSize: false
      size: 3
      targetSizeRatio: 0
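
Whether the new size is actually applied can be checked from the toolbox; the pool names below assume Rook's default naming for a filesystem called myfs:

ceph osd pool get myfs-metadata size
ceph osd pool get myfs-data0 size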

Replace OSD


title: “Replace OSD”
date: 2020-04-18T08:54:15
slug: replace-osd


Exec in Toolbox:

kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash

Useful Commands:

ceph status
ceph osd tree
ceph osd ls (for getting OSD ID)

Remove the faulty OSD (in this example, the OSD with ID 1):

# ceph osd out osd.1
marked out osd.1.

# ceph osd crush remove osd.1
removed item id 1 name 'osd.1' from crush map

# ceph auth del osd.1
updated

# ceph osd rm osd.1
removed osd.1
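
On recent Ceph releases, the crush remove / auth del / osd rm steps can usually be collapsed into a single purge once the OSD has been marked out (a shortcut, not the sequence used above):

ceph osd purge 1 --yes-i-really-mean-it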

Remove the Deployment for the faulty OSD (it will be recreated in the next step):

# kubectl delete deployment -n rook-ceph rook-ceph-osd-1
deployment.extensions "rook-ceph-osd-1" deleted

Scale the operator down and up again to detect new OSDs:

# kubectl scale deployment rook-ceph-operator --replicas=0 -n rook-ceph
deployment.extensions/rook-ceph-operator scaled

# kubectl get pods --all-namespaces -o wide|grep operator

# kubectl scale deployment rook-ceph-operator --replicas=1 -n rook-ceph
deployment.extensions/rook-ceph-operator scaled

# kubectl get pods --all-namespaces -o wide|grep operator
rook-ceph-system rook-ceph-operator-76cf7f88f-g9pxr 0/1 ContainerCreating 0 2s kube-ceph02

New OSD prepare pods should be created:

kubectl get pods -n rook-ceph -o wide

Check with:

ceph status
ceph osd tree
ceph osd ls

In Case of problems:

On OSD Node:

view /var/lib/rook/rook-ceph/log/ceph-volume.log

Check Container logs:

kubectl logs -n rook-ceph rook-ceph-osd-prepare-k8s-node01-hl4rj

Check MGR Container for Recovery:

kubectl logs -f -n rook-ceph rook-ceph-mgr-a-7cb4ccffc6-dnz2q
debug 2020-04-18 08:57:59.352 7ffb18b9b700 0 log_channel(cluster) log [DBG] : pgmap v940: 64 pgs: 2 active+undersized+degraded+remapped+backfilling, 55 active+clean, 7 active+undersized+degraded+remapped+backfill_wait; 14 GiB data, 39 GiB used, 708 GiB / 750 GiB avail; 799 KiB/s rd, 5.0 MiB/s wr, 231 op/s; 2648/33588 objects degraded (7.884%); 5.7 MiB/s, 4 objects/s recovering
192.168.1.7 - - [18/Apr/2020:08:58:01] "GET / HTTP/1.1" 200 155 "" "kube-probe/1.18"
debug 2020-04-18 08:58:01.352 7ffb18b9b700 0 log_channel(cluster) log [DBG] : pgmap v942: 64 pgs: 2 active+undersized+degraded+remapped+backfilling, 55 active+clean, 7 active+undersized+degraded+remapped+backfill_wait; 14 GiB data, 39 GiB used, 708 GiB / 750 GiB avail; 738 KiB/s rd, 6.0 MiB/s wr, 270 op/s; 2648/33588 objects degraded (7.884%); 7.1 MiB/s, 5 objects/s recovering
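
Instead of tailing the mgr container, recovery can also be followed from the toolbox by streaming the cluster log (an alternative, not used above):

ceph -w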

Namespace is in termination state


title: “Namespace is in termination state”
date: 2020-04-13T07:43:26
slug: namespace-is-in-termination-state


Get more Namespace info:

kubectl get namespace rook-ceph -o json
    {
      "lastTransitionTime": "2020-04-13T06:53:57Z",
      "message": "Some resources are remaining: cephfilesystems.ceph.rook.io has 1 resource instances",
      "reason": "SomeResourcesRemain",
      "status": "True",
      "type": "NamespaceContentRemaining"
    },
    {
      "lastTransitionTime": "2020-04-13T06:53:57Z",
      "message": "Some content in the namespace has finalizers remaining: cephfilesystem.ceph.rook.io in 1 resource instances",
      "reason": "SomeFinalizersRemain",
      "status": "True",
      "type": "NamespaceFinalizersRemaining"
    }
  ],
  "phase": "Terminating"
}
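
To find out which namespaced resources are still blocking the deletion, a generic query like this helps (not Rook specific):

kubectl api-resources --verbs=list --namespaced -o name \
 | xargs -n 1 kubectl get --show-kind --ignore-not-found -n rook-ceph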

The CRD can be deleted by setting the finalizers to []:

kubectl get crd
NAME                                  CREATED AT
alertmanagers.monitoring.coreos.com   2020-04-11T11:06:08Z
cephfilesystems.ceph.rook.io          2020-04-11T21:12:52Z

kubectl edit crd cephfilesystems.ceph.rook.io

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  creationTimestamp: "2020-04-11T21:12:52Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2020-04-13T07:08:32Z"
  finalizers:
  - customresourcecleanup.apiextensions.k8s.io
  generation: 1
  managedFields:
  - apiVersion: apiextensions.k8s.io/v1beta1

Modify to:

  finalizers: []
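
The same change can be applied non-interactively with a patch (equivalent to the edit above):

kubectl patch crd cephfilesystems.ceph.rook.io --type merge -p '{"metadata":{"finalizers":[]}}'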

quicker detection of a Node down


title: “quicker detection of a Node down”
date: 2020-04-13T06:53:00
slug: quicker-detection-of-a-node-down


In a Kubernetes cluster a node can die or reboot at any time.

Tools like Kubernetes are designed to be highly available, robust, and able to recover automatically in such scenarios, and Kubernetes accomplishes this very well.

However, you might notice that when a node goes down, the pods of the broken node keep running for some time and still receive requests, and those requests will fail.

That time can be reduced, because in my opinion the defaults are too high. There are a bunch of parameters to tweak in the kubelet and in the controller manager.

This is the workflow of what happens when a node goes down:

1- The Kubelet posts its status to the masters using --node-status-update-frequency=10s

2- A node dies

3- The kube controller manager is the one monitoring the nodes. Using --node-monitor-period=5s it checks, on the masters, the node status reported by the Kubelet.

4- The kube controller manager will see the node is unresponsive and has this grace period --node-monitor-grace-period=40s until it considers the node unhealthy. This parameter should be N times node-status-update-frequency, where N is the number of retries the Kubelet is allowed for posting its node status. N is a constant in the code equal to 5; check var nodeStatusUpdateRetry in https://github.com/kubernetes/kubernetes/blob/e54ebe5ebd39181685923429c573a0b9e7cd6fd6/pkg/kubelet/kubelet.go

Note that the default values don't fulfill what the documentation says, because:

node-status-update-frequency x N != node-monitor-grace-period   (10 x 5 != 40)

But as far as I understand it, the 5 post attempts of 10s each are done within 40s: the first one at second zero, the second at second 10, and so on until the fifth and last one at second 40.

So the real equation would be:

node-status-update-frequency x (N-1) = node-monitor-grace-period

More info:

https://github.com/kubernetes/kubernetes/blob/3d1b1a77e4aca2db25d465243cad753b913f39c4/pkg/controller/node/nodecontroller.go

5- Once the node is marked as unhealthy, the kube controller manager will remove its pods based on --pod-eviction-timeout=5m0s

This is a very important timeout. By default it's 5m, which in my opinion is too high, because even though the node is already marked as unhealthy, the kube controller manager won't remove the pods yet, so they will still be reachable through their service and requests will fail.

6- Kube-proxy has a watcher over the API, so the very moment the pods are evicted it will notice and update the iptables of the node, removing the endpoints from the services, so the failing pods won't be accessible anymore.

These values can be tweaked so you get fewer failed requests when a node goes down.

I've set these in my cluster:

kubelet: node-status-update-frequency=4s (from 10s)

controller-manager: node-monitor-period=2s (from 5s)
controller-manager: node-monitor-grace-period=16s (from 40s)
controller-manager: pod-eviction-timeout=30s (from 5m)

The results are quite good: we've moved from a node-down detection time of 5m40s to 46s.
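
For reference, on a kubeadm-based cluster (an assumption here; other setups configure this differently) these values are typically changed in two places:

Add to the kubelet arguments (e.g. KUBELET_EXTRA_ARGS in /etc/sysconfig/kubelet or the systemd drop-in):
--node-status-update-frequency=4s

Add to the command list in /etc/kubernetes/manifests/kube-controller-manager.yaml:
- --node-monitor-period=2s
- --node-monitor-grace-period=16s
- --pod-eviction-timeout=30s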

Update Dashboard Password


title: “Update Dashboard Password”
date: 2020-04-12T07:22:41
slug: update-dashboard-password


[root@rook-ceph-operator-9bd79cdcf-npkm8 /]# ceph -c /var/lib/rook/rook-ceph/rook-ceph.config dashboard set-login-credentials admin admin
******************************************************************
*** WARNING: this command is deprecated. ***
*** Please use the ac-user-* related commands to manage users. ***
******************************************************************
Username and password updated
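
Since the command warns that it is deprecated, the ac-user-* variant would look roughly like this (a sketch; depending on the Ceph release the password is passed inline or via -i with a file):

echo -n 'admin' > /tmp/dashboard-password
ceph -c /var/lib/rook/rook-ceph/rook-ceph.config dashboard ac-user-set-password admin -i /tmp/dashboard-password
rm /tmp/dashboard-password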

Erase Ceph Disc


title: “Erase Ceph Disc”
date: 2020-04-11T21:12:04
slug: erase-ceph-disc


sgdisk --zap-all /dev/vdb
dd if=/dev/zero of=/dev/vdb bs=1M count=100 oflag=direct,dsync
ls /dev/mapper/ceph-* | xargs -I% -- dmsetup remove %
rm -rf /dev/ceph-*
rm -rf /var/lib/rook
fdisk -l
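
To double-check that no filesystem or LVM signatures are left before Rook reuses the disk, wipefs can be used as a complement to the dd above:

wipefs -a /dev/vdb
lsblk -f /dev/vdb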

Backup and Restore ETCD


title: “Backup and Restore ETCD”
date: 2020-04-06T13:27:03
slug: backup-and-restore-etcd


Get etcdctl Tool:
https://github.com/etcd-io/etcd/releases/download/v3.4.7/etcd-v3.4.7-linux-amd64.tar.gz

Create a Snapshot

ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 \
 --cacert=/etc/kubernetes/pki/etcd/ca.crt \
 --cert=/etc/kubernetes/pki/etcd/server.crt \
 --key=/etc/kubernetes/pki/etcd/server.key \
 snapshot save /tmp/snapshot-pre-boot.db

Status of Snapshot:

ETCDCTL_API=3 etcdctl snapshot status /tmp/snapshot-pre-boot.db -w table

Restore ETCD Snapshot to a new folder

ETCDCTL_API=3 etcdctl snapshot restore -h

ETCDCTL_API=3 etcdctl \
 --endpoints=https://[127.0.0.1]:2379 \
 --cacert=/etc/kubernetes/pki/etcd/ca.crt \
 --name=master \
 --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key \
 --data-dir /var/lib/etcd-from-backup \
 --initial-cluster=master=https://127.0.0.1:2380 \
 --initial-cluster-token etcd-cluster-1 \
 --initial-advertise-peer-urls=https://127.0.0.1:2380 \
 snapshot restore /tmp/snapshot-pre-boot.db

Modify /etc/kubernetes/manifests/etcd.yaml:
- --data-dir=/var/lib/etcd-from-backup
- --initial-cluster-token=etcd-cluster-1
- mountPath: /var/lib/etcd-from-backup
  path: /var/lib/etcd-from-backup

spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://172.17.0.45:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd-from-backup
    - --initial-cluster-token=etcd-cluster-1
    - --initial-advertise-peer-urls=https://172.17.0.45:2380
    - --initial-cluster=master=https://172.17.0.45:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://172.17.0.45:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://172.17.0.45:2380
    - --name=master
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
...
    volumeMounts:
    - mountPath: /var/lib/etcd-from-backup
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-cluster-critical
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd-from-backup
      type: DirectoryOrCreate
    name: etcd-data
status: {}
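
Once the static pod has restarted with the new data directory, the restore can be verified (pod names and output depend on the cluster):

kubectl get pods -n kube-system | grep etcd
ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 \
 --cacert=/etc/kubernetes/pki/etcd/ca.crt \
 --cert=/etc/kubernetes/pki/etcd/server.crt \
 --key=/etc/kubernetes/pki/etcd/server.key \
 member list
kubectl get all --all-namespaces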