Complete IaaS Platform Deployment Manual

  • Date: 23/01/2023 12:12
    Tags: aPaaS, iPaaS, IaaS, SaaS
Complete IaaS Platform Deployment Manual (Multi-Cloud / Multi-Network Heterogeneous Management Edition)

3-node Ubuntu 22.04 + Kubernetes + full CNCF stack

Build a complete IaaS platform on CNCF open-source projects, with multi-cloud and multi-network heterogeneous management, as a replacement for OpenStack.

Item                   Value
Cluster size           3 nodes (mixed Master + Worker)
K8s version            v1.28.2
Node specs             4 vCPU / 8 GB RAM
Node IPs               10.10.10.121 / 122 / 123
VIP                    10.10.10.198
Multi-cloud mgmt       Karmada + Cluster API
Multi-network mgmt     Kube-OVN + Submariner + Cilium
Written on             XXXX-XX-XX

1. Overall Architecture

┌───────────────────────────────────────────────────────────┐
│              Unified management plane (iaas-1)            │
│     Karmada multi-cluster federation + Backstage portal   │
├───────────────────────────────────────────────────────────┤
│   API gateway: Envoy (Emissary-Ingress) + OPA policy      │
├──────────────┬─────────────────────┬──────────────────────┤
│ Cluster A    │ Cluster B           │ Cluster C            │
│ (self-hosted)│ (Alibaba Cloud ACK) │ (Huawei Cloud CCE)   │
│ iaas-1/2/3   │ Cluster API managed │ Cluster API managed  │
│ KubeVirt     │ ECS VMs             │ ECS VMs              │
│ Kube-OVN     │ Flannel/Terway      │ Canal/Yangtse        │
├──────────────┴─────────────────────┴──────────────────────┤
│           Submariner cross-cluster networking             │
├───────────────────────────────────────────────────────────┤
│ Storage: Longhorn/Rook-Ceph │ Registry: Harbor            │
│                             │ (cross-cluster replication) │
├───────────────────────────────────────────────────────────┤
│ Observability: Thanos (cross-cluster metrics) + Grafana   │
│                + Jaeger                                   │
├───────────────────────────────────────────────────────────┤
│ Security: OPA Gatekeeper + Falco + cert-manager           │
└───────────────────────────────────────────────────────────┘

2. Node Plan

Hostname  IP            Role            Notes
iaas-1    10.10.10.121  Master+Worker   etcd + control plane + Karmada control plane
iaas-2    10.10.10.122  Master+Worker   etcd + control plane + workload Pods
iaas-3    10.10.10.123  Master+Worker   etcd + control plane + workload Pods
  • VIP: 10.10.10.198 (HAProxy + Keepalived)
  • Pod CIDR: 10.244.0.0/16
  • Service CIDR: 10.96.0.0/12

⚠️ 4 vCPU / 8 GB is tight; the managed public-cloud clusters run on free-trial or pay-as-you-go plans and consume no local resources.


3. System Initialization (run on all 3 nodes)

3.1 Set hostnames

# --- 10.10.10.121 ---
sudo hostnamectl set-hostname iaas-1

# --- 10.10.10.122 ---
sudo hostnamectl set-hostname iaas-2

# --- 10.10.10.123 ---
sudo hostnamectl set-hostname iaas-3

3.2 Configure /etc/hosts

cat << EOF | sudo tee -a /etc/hosts
10.10.10.198 k8s-vip
10.10.10.121 iaas-1
10.10.10.122 iaas-2
10.10.10.123 iaas-3
EOF

3.3 Disable swap + kernel parameters + KVM support

# Disable swap
sudo swapoff -a
sudo sed -i '/swap/d' /etc/fstab

# Load kernel modules
cat << EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter

# KubeVirt virtualization module
sudo modprobe vhost_net
echo "vhost_net" | sudo tee -a /etc/modules-load.d/kubevirt.conf

# Kernel parameters
cat << EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sudo sysctl --system

3.4 Verify virtualization support

egrep -c '(vmx|svm)' /proc/cpuinfo
# greater than 0 means the CPU supports virtualization

ls /dev/kvm
# if it does not exist:
sudo apt-get install -y qemu-kvm
sudo modprobe kvm
sudo modprobe kvm_intel  # Intel CPUs (use kvm_amd on AMD)

⚠️ In a Parallels VM, first enable "nested virtualization" in the Parallels VM settings.
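The checks above can be run across all three nodes in one pass. A small sketch, assuming the hostnames from 3.2 resolve and SSH access is available (passwordless SSH is set up in 3.7):

```shell
# Pre-flight check: can every node host hardware-accelerated VMs?
for node in iaas-1 iaas-2 iaas-3; do
  echo "=== $node ==="
  ssh "$node" '
    # vmx (Intel) or svm (AMD) must appear in the CPU flags
    if grep -Eq "(vmx|svm)" /proc/cpuinfo; then
      echo "hardware virtualization: OK"
    else
      echo "hardware virtualization: MISSING (KubeVirt will need useEmulation)"
    fi
    # /dev/kvm must exist for KVM acceleration
    [ -e /dev/kvm ] && echo "/dev/kvm: OK" || echo "/dev/kvm: MISSING"
  '
done
```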

3.5 Install containerd

sudo apt-get update
sudo apt-get install -y containerd

sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml

sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' \
  /etc/containerd/config.toml

sudo systemctl restart containerd
sudo systemctl enable containerd

3.6 Install kubeadm / kubelet / kubectl v1.28.2

sudo apt-get install -y apt-transport-https ca-certificates curl gpg
sudo mkdir -p /etc/apt/keyrings

curl -fsSL https://mirrors.aliyun.com/kubernetes-new/core/\
stable/v1.28/deb/Release.key | \
  sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg

echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] \
  https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.28/deb/ \
  /" | sudo tee /etc/apt/sources.list.d/kubernetes.list

sudo apt-get update
sudo apt-get install -y kubelet=1.28.2-1.1 kubeadm=1.28.2-1.1 \
  kubectl=1.28.2-1.1
sudo apt-mark hold kubelet kubeadm kubectl

3.7 Passwordless SSH (run on iaas-1)

ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
ssh-copy-id iaas-2
ssh-copy-id iaas-3
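A quick check that key-based login works before the later multi-node loops depend on it:

```shell
# Each command should print the peer's hostname without a password prompt
for node in iaas-2 iaas-3; do
  ssh -o BatchMode=yes "$node" hostname
done
```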

4. HAProxy + Keepalived (iaas-1 and iaas-2)

4.1 Install

sudo apt-get install -y haproxy keepalived psmisc

4.2 HAProxy configuration (identical on both nodes)

cat << 'EOF' | sudo tee /etc/haproxy/haproxy.cfg
global
    log /dev/log local0
    maxconn 4096
    daemon

defaults
    mode tcp
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend k8s-api
    bind *:8443
    default_backend k8s-masters

backend k8s-masters
    balance roundrobin
    option tcp-check
    server iaas-1 10.10.10.121:6443 check inter 3s fall 3 rise 2
    server iaas-2 10.10.10.122:6443 check inter 3s fall 3 rise 2
    server iaas-3 10.10.10.123:6443 check inter 3s fall 3 rise 2

listen stats
    bind *:9090
    mode http
    stats enable
    stats uri /stats
EOF

sudo systemctl restart haproxy
sudo systemctl enable haproxy

4.3 Keepalived configuration

iaas-1 (MASTER):

cat << 'EOF' | sudo tee /etc/keepalived/keepalived.conf
global_defs {
    router_id LVS_IAAS
}

vrrp_script check_haproxy {
    script "/usr/bin/killall -0 haproxy"
    interval 3
    # |weight| must exceed the MASTER/BACKUP priority gap (100 - 90),
    # otherwise the VIP will not move when haproxy dies
    weight -20
    fall 10
    rise 2
}

vrrp_instance VI_1 {
    state MASTER
    interface ens33
    virtual_router_id 52
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass IaaS2026
    }
    virtual_ipaddress {
        10.10.10.198/24
    }
    track_script {
        check_haproxy
    }
}
EOF

sudo systemctl restart keepalived
sudo systemctl enable keepalived

⚠️ Replace interface ens33 with the actual NIC name (check with ip addr).

iaas-2 (BACKUP): same as above, changing only state BACKUP and priority 90.

4.4 Verify the VIP

ping -c 3 10.10.10.198
ip addr show | grep 10.10.10.198
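Failover is worth exercising once as well. A sketch, assuming the vrrp_script weight is large enough to offset the MASTER/BACKUP priority gap; run it from a third machine so you keep connectivity while the VIP moves:

```shell
# Simulate a HAProxy failure on the current MASTER
ssh iaas-1 'sudo systemctl stop haproxy'

# Within a few seconds the VIP should surface on iaas-2
sleep 10
ssh iaas-2 'ip addr show | grep 10.10.10.198'

# Restore the original state
ssh iaas-1 'sudo systemctl start haproxy'
```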

5. Bootstrap the Kubernetes Cluster

5.1 Initialize the first master (iaas-1)

sudo kubeadm init \
  --kubernetes-version=v1.28.2 \
  --control-plane-endpoint='k8s-vip:8443' \
  --apiserver-advertise-address=10.10.10.121 \
  --upload-certs \
  --pod-network-cidr=10.244.0.0/16 \
  --service-cidr=10.96.0.0/12 \
  --image-repository=registry.aliyuncs.com/google_containers

mkdir -p $HOME/.kube
sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

⚠️ Save the join commands printed in the output.

5.2 Join iaas-2 and iaas-3

# iaas-2
sudo kubeadm join k8s-vip:8443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane --certificate-key <cert-key> \
  --apiserver-advertise-address=10.10.10.122

# iaas-3
sudo kubeadm join k8s-vip:8443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane --certificate-key <cert-key> \
  --apiserver-advertise-address=10.10.10.123

# configure kubectl on both nodes
mkdir -p $HOME/.kube
sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

5.3 Allow masters to schedule Pods

kubectl taint nodes --all node-role.kubernetes.io/control-plane-

5.4 Install the Calico CNI

kubectl apply -f https://raw.githubusercontent.com/projectcalico/\
  calico/v3.26.4/manifests/calico.yaml

kubectl set env daemonset/calico-node -n kube-system \
  IP_AUTODETECTION_METHOD=cidr=10.10.10.0/24

watch kubectl get nodes

If images cannot be pulled, fetch them from the Huawei Cloud mirror and retag them; see the image handling in chapter 19.

5.5 Install Helm

curl https://mirrors.huaweicloud.com/helm/v3.13.3/\
helm-v3.13.3-linux-amd64.tar.gz -o helm.tar.gz
tar -zxvf helm.tar.gz
sudo mv linux-amd64/helm /usr/local/bin/helm
helm version

6. Storage Layer: Longhorn

6.1 Prerequisites (run on all 3 nodes)

for node in iaas-1 iaas-2 iaas-3; do
  ssh $node 'sudo apt-get install -y open-iscsi nfs-common'
  ssh $node 'sudo systemctl enable iscsid && sudo systemctl start iscsid'
done

6.2 Install Longhorn

helm repo add longhorn https://charts.longhorn.io
helm repo update

helm install longhorn longhorn/longhorn \
  --namespace longhorn-system --create-namespace \
  --set defaultSettings.defaultReplicaCount=2 \
  --set defaultSettings.defaultDataLocality=best-effort \
  --set longhornUI.replicas=1

kubectl patch storageclass longhorn -p \
  '{"metadata":{"annotations":{\
  "storageclass.kubernetes.io/is-default-class":"true"}}}'

kubectl -n longhorn-system get pods -w

6.3 Expose the Longhorn UI

kubectl -n longhorn-system patch svc longhorn-frontend -p \
  '{"spec":{"type":"NodePort","ports":[{\
  "port":80,"targetPort":8000,"nodePort":30800}]}}'

Longhorn UI: http://10.10.10.121:30800
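Before anything else depends on Longhorn, it is worth provisioning one volume end to end. A minimal smoke test; the PVC and pod names are illustrative:

```shell
# Create a 1Gi PVC against the default (longhorn) StorageClass
# plus a pod that writes to it
cat << 'YAML' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-smoke
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: longhorn
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: longhorn-smoke
spec:
  containers:
    - name: writer
      image: busybox
      command: [sh, -c, 'echo ok > /data/ok && sleep 3600']
      volumeMounts:
        - name: vol
          mountPath: /data
  volumes:
    - name: vol
      persistentVolumeClaim:
        claimName: longhorn-smoke
YAML

# Expect the PVC Bound and the pod Running
kubectl get pvc longhorn-smoke
kubectl get pod longhorn-smoke

# Clean up
kubectl delete pod/longhorn-smoke pvc/longhorn-smoke
```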


7. Compute Layer: KubeVirt (the IaaS core)

7.1 Install KubeVirt

export KUBEVIRT_VERSION=v1.3.1

kubectl create -f https://github.com/kubevirt/kubevirt/releases/\
download/${KUBEVIRT_VERSION}/kubevirt-operator.yaml

kubectl create -f https://github.com/kubevirt/kubevirt/releases/\
download/${KUBEVIRT_VERSION}/kubevirt-cr.yaml

# If hardware virtualization is unavailable (nested virtualization disabled),
# fall back to software emulation
kubectl -n kubevirt patch kubevirt kubevirt --type=merge -p \
  '{"spec":{"configuration":{"developerConfiguration":{\
  "useEmulation":true}}}}'

kubectl -n kubevirt wait kv kubevirt \
  --for condition=Available --timeout=300s

7.2 Install virtctl

curl -L -o virtctl \
  https://github.com/kubevirt/kubevirt/releases/download/\
${KUBEVIRT_VERSION}/virtctl-${KUBEVIRT_VERSION}-linux-amd64

chmod +x virtctl
sudo mv virtctl /usr/local/bin/

7.3 Install CDI (image import)

export CDI_VERSION=v1.59.0

kubectl create -f https://github.com/kubevirt/\
containerized-data-importer/releases/download/\
${CDI_VERSION}/cdi-operator.yaml

kubectl create -f https://github.com/kubevirt/\
containerized-data-importer/releases/download/\
${CDI_VERSION}/cdi-cr.yaml

7.4 Create the first virtual machine

cat << 'YAML' | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-a
  labels:
    tenant-id: tenant-a
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-01
  namespace: tenant-a
spec:
  running: true
  template:
    spec:
      domain:
        cpu:
          cores: 1
        memory:
          guest: 1Gi
        devices:
          disks:
            - name: containerdisk
              disk:
                bus: virtio
            - name: cloudinit
              disk:
                bus: virtio
          interfaces:
            - name: default
              masquerade: {}
      networks:
        - name: default
          pod: {}
      volumes:
        - name: containerdisk
          containerDisk:
            image: quay.io/kubevirt/cirros-container-disk-demo
        - name: cloudinit
          cloudInitNoCloud:
            userData: |
              #cloud-config
              hostname: vm-01
              password: VM@2026
              chpasswd:
                expire: false
              ssh_pwauth: true
YAML
# Verify the VM
kubectl -n tenant-a get vm
kubectl -n tenant-a get vmi

# Open the serial console
virtctl -n tenant-a console vm-01

# Lifecycle management
virtctl -n tenant-a start vm-01
virtctl -n tenant-a stop vm-01
virtctl -n tenant-a restart vm-01
virtctl -n tenant-a migrate vm-01
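To log in over SSH instead of the serial console, virtctl can wrap the VM in a Service. A sketch; the Service name is an arbitrary choice and the NodePort is allocated by Kubernetes:

```shell
# Expose TCP/22 of the VM as a NodePort Service
virtctl -n tenant-a expose virtualmachine vm-01 \
  --name vm-01-ssh --port 22 --type NodePort

# Look up the allocated NodePort, then connect through any node IP
kubectl -n tenant-a get svc vm-01-ssh
# ssh cirros@10.10.10.121 -p <nodePort>   (password from the cloud-init above)
```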

8. Network Layer: Kube-OVN (VPCs / Subnets / Floating IPs)

⚠️ Kube-OVN replaces Calico and adds VPC-level network isolation. If resources are tight, skip this chapter for now and keep Calico.

8.1 Remove Calico

kubectl delete -f https://raw.githubusercontent.com/projectcalico/\
  calico/v3.26.4/manifests/calico.yaml

8.2 Install Kube-OVN

wget https://raw.githubusercontent.com/kubeovn/kube-ovn/\
release-1.12/dist/images/install.sh

sed -i 's/MASTER_NODES=""/MASTER_NODES="iaas-1,iaas-2,iaas-3"/' install.sh
sed -i 's/POD_CIDR="10.16.0.0\/16"/POD_CIDR="10.244.0.0\/16"/' install.sh
sed -i 's/SVC_CIDR="10.96.0.0\/12"/SVC_CIDR="10.96.0.0\/12"/' install.sh

bash install.sh

kubectl -n kube-system get pods | grep kube-ovn

8.3 Create VPCs (tenant network isolation)

Note: the tenant-b namespace referenced below was never created; run kubectl create namespace tenant-b first.

cat << 'YAML' | kubectl apply -f -
apiVersion: kubeovn.io/v1
kind: Vpc
metadata:
  name: tenant-a-vpc
spec:
  namespaces:
    - tenant-a
---
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: tenant-a-subnet
spec:
  vpc: tenant-a-vpc
  protocol: IPv4
  cidrBlock: 172.16.1.0/24
  gateway: 172.16.1.1
  namespaces:
    - tenant-a
  excludeIps:
    - 172.16.1.1
---
apiVersion: kubeovn.io/v1
kind: Vpc
metadata:
  name: tenant-b-vpc
spec:
  namespaces:
    - tenant-b
---
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: tenant-b-subnet
spec:
  vpc: tenant-b-vpc
  protocol: IPv4
  cidrBlock: 172.16.2.0/24
  gateway: 172.16.2.1
  namespaces:
    - tenant-b
  excludeIps:
    - 172.16.2.1
YAML
kubectl get vpc
kubectl get subnet
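Isolation can be verified with two throwaway pods, one per tenant: traffic across VPCs should fail. A sketch, assuming both namespaces exist and are bound to their subnets as above:

```shell
# One pod per tenant; each receives an IP from its namespace's subnet
kubectl -n tenant-a run ping-a --image=busybox --restart=Never -- sleep 3600
kubectl -n tenant-b run ping-b --image=busybox --restart=Never -- sleep 3600
kubectl -n tenant-a wait --for=condition=Ready pod/ping-a --timeout=120s
kubectl -n tenant-b wait --for=condition=Ready pod/ping-b --timeout=120s

# Expect 172.16.1.x and 172.16.2.x respectively
B_IP=$(kubectl -n tenant-b get pod ping-b -o jsonpath='{.status.podIP}')

# Cross-VPC ping should time out
kubectl -n tenant-a exec ping-a -- ping -c 2 -W 2 "$B_IP" \
  && echo "NOT isolated" || echo "isolated as expected"

# Clean up
kubectl -n tenant-a delete pod ping-a
kubectl -n tenant-b delete pod ping-b
```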

8.4 Floating IPs

(CRD names differ across Kube-OVN releases; recent versions expose floating IPs through the VPC NAT gateway's IptablesEIP / IptablesFIPRule resources, so check kubectl api-resources | grep kubeovn before applying.)

cat << 'YAML' | kubectl apply -f -
apiVersion: kubeovn.io/v1
kind: IptablesFloatingIP
metadata:
  name: tenant-a-eip-01
spec:
  internalIp: 172.16.1.10
  externalIp: 10.10.10.200
YAML

9. Multi-Cloud Management: Karmada (CNCF incubating)

Karmada is the core of multi-cloud heterogeneous management; deploy its control plane on iaas-1.

9.1 Install the karmadactl CLI

curl -sL https://raw.githubusercontent.com/karmada-io/karmada/\
master/hack/install-cli.sh | bash

karmadactl version

9.2 Install the Karmada control plane

# Option 1: karmadactl init (recommended)
sudo karmadactl init \
  --karmada-apiserver-advertise-address=10.10.10.121 \
  --etcd-storage-mode=hostPath \
  --karmada-data=/var/lib/karmada

# Option 2: Helm install
helm repo add karmada https://raw.githubusercontent.com/\
karmada-io/karmada/master/charts
helm repo update

helm install karmada karmada/karmada \
  --namespace karmada-system --create-namespace \
  --set apiServer.hostNetwork=true \
  --set apiServer.serviceType=NodePort
# Verify
kubectl --kubeconfig /etc/karmada/karmada-apiserver.config \
  get clusters

# Add an alias for convenience
echo 'alias kctl="kubectl --kubeconfig /etc/karmada/karmada-apiserver.config"' \
  >> ~/.bashrc
source ~/.bashrc

9.3 Register the self-hosted cluster (host cluster)

karmadactl --kubeconfig /etc/karmada/karmada-apiserver.config \
  join iaas-cluster \
  --cluster-kubeconfig=$HOME/.kube/config \
  --cluster-context=kubernetes-admin@kubernetes

# Verify
kctl get clusters
# Expected: iaas-cluster   Ready

9.4 Register the Alibaba Cloud ACK cluster

# Prerequisite: an ACK cluster already exists in Alibaba Cloud,
# with its kubeconfig downloaded and saved as ~/.kube/aliyun-ack.config

karmadactl --kubeconfig /etc/karmada/karmada-apiserver.config \
  join aliyun-cluster \
  --cluster-kubeconfig=~/.kube/aliyun-ack.config

kctl get clusters
# Expected: iaas-cluster   Ready
#           aliyun-cluster Ready

9.5 Register the Huawei Cloud CCE cluster

# Prerequisite: a CCE cluster already exists in Huawei Cloud,
# with its kubeconfig saved as ~/.kube/huawei-cce.config

karmadactl --kubeconfig /etc/karmada/karmada-apiserver.config \
  join huawei-cluster \
  --cluster-kubeconfig=~/.kube/huawei-cce.config

kctl get clusters
# Expected: iaas-cluster    Ready
#           aliyun-cluster  Ready
#           huawei-cluster  Ready

9.6 Multi-cloud scheduling policies

# Policy 1: schedule by tenant tier (financial customers → self-hosted,
# regular customers → public cloud)
cat << 'YAML' | kctl apply -f -
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: tenant-finance-policy
  namespace: tenant-a
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
  placement:
    clusterAffinity:
      clusterNames:
        - iaas-cluster
---
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: tenant-normal-policy
  namespace: tenant-b
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
  placement:
    clusterAffinity:
      clusterNames:
        - aliyun-cluster
        - huawei-cluster
YAML
# Policy 2: cross-cloud disaster recovery (primary: self-hosted, standby: Alibaba Cloud)
cat << 'YAML' | kctl apply -f -
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: disaster-recovery-policy
  namespace: production
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: core-service
  placement:
    clusterAffinity:
      clusterNames:
        - iaas-cluster
        - aliyun-cluster
    replicaScheduling:
      replicaSchedulingType: Divided
      weightPreference:
        staticWeightList:
          - targetCluster:
              clusterNames: [iaas-cluster]
            weight: 3
          - targetCluster:
              clusterNames: [aliyun-cluster]
            weight: 1
YAML
# Policy 3: automatic failover
cat << 'YAML' | kctl apply -f -
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: failover-policy
  namespace: production
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: core-service
  failover:
    clusterFailover:
      decisionConditions:
        tolerationSeconds: 30
      purgeMode: Graciously
  placement:
    clusterAffinity:
      clusterNames:
        - iaas-cluster
        - aliyun-cluster
        - huawei-cluster
YAML
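Propagation can be checked with a throwaway Deployment named core-service, so that the disaster-recovery policy's resourceSelector matches it. A sketch; with the 3:1 static weights above, 4 replicas should split 3 to the self-hosted cluster and 1 to Alibaba Cloud:

```shell
# Create the namespace and a matching Deployment via the Karmada API server
kctl create namespace production
kctl -n production create deployment core-service --image=nginx --replicas=4

# Karmada's view: the resource template and its scheduling result
kctl -n production get deployment core-service
kctl -n production get resourcebinding

# Member-cluster view: expect 3 replicas on the self-hosted cluster
kubectl -n production get deployment core-service
```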

10. Multi-Cloud Cluster Lifecycle: Cluster API (CNCF incubating)

10.1 Install clusterctl

curl -L https://github.com/kubernetes-sigs/cluster-api/releases/\
latest/download/clusterctl-linux-amd64 -o clusterctl

chmod +x clusterctl
sudo mv clusterctl /usr/local/bin/
clusterctl version

10.2 Initialize Cluster API

# Initialize the core components
clusterctl init

# Install cloud providers as needed
# OpenStack
clusterctl init --infrastructure=openstack

# AWS
clusterctl init --infrastructure=aws

# Azure
clusterctl init --infrastructure=azure

10.3 Declaratively create a public-cloud K8s cluster

# Example: create a K8s cluster on OpenStack
cat << 'YAML' | kubectl apply -f -
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: openstack-prod-cluster
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
    services:
      cidrBlocks: ["10.128.0.0/12"]
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: openstack-prod-cp
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: OpenStackCluster
    name: openstack-prod
YAML
# Check cluster status
kubectl get clusters
clusterctl describe cluster openstack-prod-cluster

# Once provisioning completes, register the new cluster with Karmada
karmadactl join openstack-cluster \
  --kubeconfig /etc/karmada/karmada-apiserver.config \
  --cluster-kubeconfig=<generated-kubeconfig>

11. Cross-Cluster Networking: Submariner (CNCF sandbox)

11.1 Install the subctl CLI

curl -Ls https://get.submariner.io | VERSION=v0.17.0 bash

subctl version

11.2 Deploy the broker (on iaas-1)

subctl deploy-broker --kubeconfig $HOME/.kube/config

# This produces a broker-info.subm file
ls broker-info.subm

11.3 Join each cluster to Submariner

# Join the self-hosted cluster
subctl join --kubeconfig $HOME/.kube/config \
  broker-info.subm \
  --clusterid iaas-cluster \
  --natt=false

# Join the Alibaba Cloud cluster
subctl join --kubeconfig ~/.kube/aliyun-ack.config \
  broker-info.subm \
  --clusterid aliyun-cluster

# Join the Huawei Cloud cluster
subctl join --kubeconfig ~/.kube/huawei-cce.config \
  broker-info.subm \
  --clusterid huawei-cluster

11.4 Verify cross-cluster networking

# Show all connections
subctl show all

# Cross-cluster DNS:
# Pods in the self-hosted cluster can reach Services in the Alibaba
# Cloud cluster directly, e.g.:
# curl http://my-service.my-namespace.svc.clusterset.local

# Cross-cluster connectivity test
subctl verify --kubeconfig $HOME/.kube/config \
  --toconfig ~/.kube/aliyun-ack.config \
  --only connectivity

11.5 Export Services across clusters

# Export the self-hosted cluster's MySQL so other clusters can reach it
cat << 'YAML' | kubectl apply -f -
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: mysql
  namespace: apaas
YAML

# From the Alibaba Cloud cluster, connect directly:
# mysql -h mysql.apaas.svc.clusterset.local -P 3306
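The export can be double-checked from a consuming cluster; Submariner's Lighthouse controller materializes a matching ServiceImport there. A sketch:

```shell
# On the exporting (self-hosted) cluster
kubectl -n apaas get serviceexport mysql

# On the Alibaba Cloud cluster a ServiceImport should appear
kubectl --kubeconfig ~/.kube/aliyun-ack.config -n apaas get serviceimport

# ...and the clusterset DNS name should resolve from any pod there
kubectl --kubeconfig ~/.kube/aliyun-ack.config \
  run dns-test --rm -it --image=busybox --restart=Never -- \
  nslookup mysql.apaas.svc.clusterset.local
```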

12. Multi-Network Heterogeneous Management

12.1 Architecture

Each cluster runs the CNI that fits its workloads; Karmada schedules by network requirement:

┌─────────────────────────────────────────────────────┐
│              Karmada federated scheduling           │
├──────────────┬──────────────┬───────────────────────┤
│ Self-hosted  │ High-perf    │ Public cloud          │
│ Kube-OVN     │ Cilium       │ Flannel / vendor CNI  │
│ VPC isolation│ eBPF accel.  │ Managed networking    │
│ Deterministic│ µs latency   │ General workloads     │
│              │              │                       │
│ Use cases:   │ Use cases:   │ Use cases:            │
│ power/finance│ AI inference │ OA/CRM/ordinary SaaS  │
│              │ / 5G         │                       │
├──────────────┴──────────────┴───────────────────────┤
│          Submariner cross-cluster networking        │
└─────────────────────────────────────────────────────┘

12.2 Scheduling policies by network requirement

cat << 'YAML' | kctl apply -f -
# Deterministic-network workloads → Kube-OVN cluster
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: detnet-policy
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      labelSelector:
        matchLabels:
          network-type: deterministic
  placement:
    clusterAffinity:
      clusterNames:
        - iaas-cluster
---
# High-performance network workloads → Cilium cluster
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: highperf-policy
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      labelSelector:
        matchLabels:
          network-type: high-performance
  placement:
    clusterAffinity:
      clusterNames:
        - cilium-cluster
---
# General workloads → public-cloud clusters
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: general-policy
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      labelSelector:
        matchLabels:
          network-type: general
  placement:
    clusterAffinity:
      clusterNames:
        - aliyun-cluster
        - huawei-cluster
YAML

12.3 Example workload deployment

# Power-dispatch system: requires a deterministic network
cat << 'YAML' | kctl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: power-dispatch
  namespace: production
  labels:
    network-type: deterministic
spec:
  replicas: 3
  selector:
    matchLabels:
      app: power-dispatch
  template:
    metadata:
      labels:
        app: power-dispatch
        network-type: deterministic
    spec:
      containers:
        - name: dispatch
          image: harbor.local/iaas/power-dispatch:v1.0
          resources:
            requests: { cpu: '2', memory: 4Gi }
YAML
# Karmada schedules it to iaas-cluster automatically (Kube-OVN deterministic network)

13. Registry Layer: Harbor (cross-cluster image replication)

13.1 Install Harbor

helm repo add harbor https://helm.goharbor.io
helm repo update

helm install harbor harbor/harbor \
  --namespace harbor --create-namespace \
  --set expose.type=nodePort \
  --set expose.nodePort.ports.http.nodePort=30003 \
  --set externalURL=http://10.10.10.121:30003 \
  --set persistence.persistentVolumeClaim.registry.size=50Gi \
  --set persistence.persistentVolumeClaim.database.size=5Gi \
  --set persistence.persistentVolumeClaim.redis.size=2Gi \
  --set harborAdminPassword=Harbor@2026

kubectl -n harbor get pods -w

Harbor UI: http://10.10.10.121:30003 (admin / Harbor@2026)

13.2 Configure cross-cluster image replication

Harbor UI → Registries → New Endpoint →
  Provider: Aliyun ACR (use Harbor only when the target is another Harbor)
  Name: aliyun-acr
  Endpoint URL: https://registry.cn-hangzhou.aliyuncs.com
  Credentials: Alibaba Cloud ACR username and password

Harbor UI → Replications → New Replication Rule →
  Name: sync-to-aliyun
  Mode: Push-based
  Source resource filter: harbor.local/iaas/**
  Destination registry: aliyun-acr
  Trigger mode: Event Based (sync automatically on push)

After an image is pushed to the self-hosted Harbor, it is replicated to Alibaba Cloud ACR automatically, and the public-cloud clusters pull straight from ACR.
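The pipeline can be exercised with a single push. A sketch, assuming a project named iaas exists in Harbor and the Docker daemon is configured to allow this insecure (plain-HTTP) registry:

```shell
# Log in to the local Harbor, then push one image through the pipeline
docker login 10.10.10.121:30003 -u admin -p 'Harbor@2026'

docker pull nginx:alpine
docker tag nginx:alpine 10.10.10.121:30003/iaas/nginx:alpine
docker push 10.10.10.121:30003/iaas/nginx:alpine

# The push event should trigger the sync-to-aliyun rule;
# check Harbor UI → Replications → Executions for the result
```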


14. Security Layer

14.1 OPA Gatekeeper

helm repo add gatekeeper \
  https://open-policy-agent.github.io/gatekeeper/charts
helm repo update

helm install gatekeeper gatekeeper/gatekeeper \
  --namespace gatekeeper-system --create-namespace \
  --set replicas=1

Multi-cloud security policy:

cat << 'YAML' | kubectl apply -f -
# Require a tenant-id label on all managed namespaces
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items: { type: string }
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels
        violation[{"msg": msg}] {
          provided := {l | input.review.object.metadata.labels[l]}
          required := {l | l := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("Missing: %v", [missing])
        }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-tenant-label
spec:
  match:
    kinds:
      - apiGroups: ['']
        kinds: [Namespace]
    namespaceSelector:
      matchLabels:
        apaas-managed: 'true'
  parameters:
    labels: ['tenant-id']
YAML
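The constraint can be smoke-tested by creating a namespace that opts in via the apaas-managed label but omits tenant-id; admission should reject it. A sketch:

```shell
# Missing the required tenant-id label: expect a denial from Gatekeeper
cat << 'YAML' | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: gatekeeper-smoke
  labels:
    apaas-managed: 'true'
YAML

# With the label added, the same namespace is admitted
cat << 'YAML' | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: gatekeeper-smoke
  labels:
    apaas-managed: 'true'
    tenant-id: smoke-test
YAML

kubectl delete namespace gatekeeper-smoke
```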

14.2 Falco (runtime security)

helm repo add falcosecurity https://falcosecurity.github.io/charts
helm install falco falcosecurity/falco \
  --namespace falco --create-namespace \
  --set falcosidekick.enabled=true

14.3 cert-manager

helm repo add jetstack https://charts.jetstack.io
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager --create-namespace \
  --set installCRDs=true \
  --set replicas=1

15. Observability Layer: Unified Cross-Cluster Monitoring

15.1 Prometheus + Grafana (deploy in every cluster)

helm repo add prometheus-community \
  https://prometheus-community.github.io/helm-charts
helm repo update

helm install monitoring \
  prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  --set grafana.adminPassword=admin2026 \
  --set grafana.service.type=NodePort \
  --set grafana.service.nodePort=31301 \
  --set prometheus.service.type=NodePort \
  --set prometheus.service.nodePort=31090 \
  --set prometheus.prometheusSpec.retention=3d \
  --set prometheus.prometheusSpec.resources.requests.memory=512Mi

Grafana: http://10.10.10.121:31301 (admin / admin2026)

Prometheus: http://10.10.10.121:31090

15.2 Thanos: cross-cluster metrics aggregation (CNCF incubating)

helm repo add bitnami https://charts.bitnami.com/bitnami

# Deploy Thanos Query on iaas-1 to aggregate each cluster's Prometheus
helm install thanos bitnami/thanos \
  --namespace monitoring \
  --set query.enabled=true \
  --set query.service.type=NodePort \
  --set query.service.nodePort=31191 \
  --set query.stores[0]=prometheus-kube-prometheus-prometheus.monitoring.svc:10901 \
  --set queryFrontend.enabled=true \
  --set compactor.enabled=false \
  --set storegateway.enabled=false
# Enable the Thanos sidecar on each member cluster's Prometheus
# by adding to the kube-prometheus-stack Helm values:
# prometheus.prometheusSpec.thanos:
#   objectStorageConfig:
#     name: thanos-objstore-secret
#     key: objstore.yml

# Thanos Query then aggregates metrics from every cluster.
# In the Thanos Query UI you can:
# - see a unified view across all clusters
# - run cross-cluster PromQL queries
Thanos Query: http://10.10.10.121:31191

15.3 KubeVirt dashboard

Grafana UI → Dashboards → Import → ID: 11748
→ shows CPU, memory, network, and disk metrics for every VM

16. API Gateway

helm repo add datawire https://app.getambassador.io
helm repo update

helm install emissary datawire/emissary-ingress \
  --namespace emissary --create-namespace \
  --set replicaCount=1 \
  --set service.type=NodePort \
  --set service.nodePorts.http=30080

17. Verification Checklist

#  Check         Command                                        Expected
1  K8s nodes     kubectl get nodes                              3 Ready
2  VIP           ping 10.10.10.198                              replies
3  HAProxy       curl http://10.10.10.198:9090/stats            200
4  CNI           kubectl -n kube-system get pods | grep calico  Running
5  Longhorn      kubectl -n longhorn-system get pods            Running
6  KubeVirt      kubectl -n kubevirt get pods                   Running
7  CDI           kubectl -n cdi get pods                        Running
8  VM created    kubectl -n tenant-a get vmi                    Running
9  Karmada       kctl get clusters                              clusters Ready
10 Submariner    subctl show all                                Connected
11 Kube-OVN      kubectl get vpc                                VPC list
12 Harbor        curl http://10.10.10.121:30003                 200
13 OPA           kubectl -n gatekeeper-system get pods          Running
14 Falco         kubectl -n falco get pods                      Running
15 cert-manager  kubectl -n cert-manager get pods               Running
16 Grafana       curl http://10.10.10.121:31301                 200
17 Prometheus    curl http://10.10.10.121:31090                 200
18 Thanos        curl http://10.10.10.121:31191                 200
19 Emissary      kubectl -n emissary get pods                   Running

18. Access Endpoints

Service          URL                              Account  Password
Longhorn UI      http://10.10.10.121:30800        -        -
Harbor           http://10.10.10.121:30003        admin    Harbor@2026
Grafana          http://10.10.10.121:31301        admin    admin2026
Prometheus       http://10.10.10.121:31090        -        -
Thanos Query     http://10.10.10.121:31191        -        -
HAProxy Stats    http://10.10.10.198:9090/stats   -        -
Emissary gateway http://10.10.10.121:30080        -        -

19. Handling Image-Pull Failures

Generic workarounds for when Docker Hub images cannot be pulled from mainland-China networks:

Option 1: Huawei Cloud mirror

# Template
sudo ctr -n k8s.io image pull \
  swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/<image>:<tag>
sudo ctr -n k8s.io image tag \
  swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/<image>:<tag> \
  docker.io/<image>:<tag>

Option 2: pull on a Mac, then transfer

# On the Mac (with VPN access)
docker pull <image>:<tag>
docker save <image>:<tag> -o image.tar
scp image.tar root@10.10.10.121:~/

# Import on the node
sudo ctr -n k8s.io image import ~/image.tar

Option 3: bulk sync script

#!/bin/bash
# sync-images.sh (run on the Mac)
IMAGES=(
  "longhornio/longhorn-manager:v1.6.2"
  "longhornio/longhorn-engine:v1.6.2"
  "quay.io/kubevirt/virt-operator:v1.3.1"
  "quay.io/kubevirt/virt-api:v1.3.1"
  "quay.io/kubevirt/virt-controller:v1.3.1"
  "quay.io/kubevirt/virt-handler:v1.3.1"
  "quay.io/kubevirt/virt-launcher:v1.3.1"
)

for img in "${IMAGES[@]}"; do
  docker pull "$img"
done

docker save "${IMAGES[@]}" -o all-images.tar

for node in 10.10.10.121 10.10.10.122 10.10.10.123; do
  scp all-images.tar root@$node:~/
  ssh root@$node 'sudo ctr -n k8s.io image import ~/all-images.tar'
done

20. Deployment Order and Time Estimates

#  Step                                   Time    Priority  Chapter
1  System init + containerd + kubeadm     30 min  P0        Ch. 3
2  HAProxy + Keepalived                   15 min  P0        Ch. 4
3  kubeadm init + join + Calico           20 min  P0        Ch. 5
4  Longhorn storage                       10 min  P0        Ch. 6
5  KubeVirt + CDI + first VM              15 min  P0        Ch. 7
6  Kube-OVN (replacing Calico)            20 min  P1        Ch. 8
7  Karmada multi-cloud management         20 min  P1        Ch. 9
8  Cluster API cluster lifecycle          15 min  P1        Ch. 10
9  Submariner cross-cluster networking    15 min  P1        Ch. 11
10 Harbor registry                        10 min  P1        Ch. 13
11 OPA + Falco + cert-manager             10 min  P2        Ch. 14
12 Prometheus + Grafana + Thanos          15 min  P2        Ch. 15
13 Emissary gateway                        5 min  P2        Ch. 16
   Total                                 ≈ 3.5 h
  • P0 = minimum viable IaaS (K8s + storage + KubeVirt + VM creation)
  • P1 = multi-cloud / multi-network capabilities (Karmada + Submariner + Kube-OVN + Harbor)
  • P2 = observability + security + gateway

21. Feature Mapping to OpenStack

OpenStack                This stack              Multi-cloud gain
Nova (compute)           KubeVirt                ✅ Karmada cross-cloud VM scheduling
Cinder (block storage)   Longhorn                ✅ cross-cluster PVC migration
Neutron (networking)     Kube-OVN                ✅ Submariner cross-cloud connectivity
Glance (images)          Harbor + CDI            ✅ Harbor cross-cloud image replication
Keystone (identity)      OPA + K8s RBAC          ✅ Karmada unified RBAC
Horizon (dashboard)      Grafana + Longhorn UI   ✅ Thanos unified cross-cloud monitoring
Heat (orchestration)     Helm + Karmada          ✅ define once, distribute to every cloud
Ceilometer (telemetry)   Prometheus + Thanos     ✅ cross-cloud metrics aggregation
Octavia (load balancing) Emissary                ✅ cross-cloud traffic management
Magnum (cluster mgmt)    Cluster API             ✅ declarative multi-cloud clusters
(no equivalent)          Karmada                 ✅ multi-cloud federated scheduling
(no equivalent)          Submariner              ✅ cross-cloud network connectivity

22. Multi-Cloud / Multi-Network Scenario Summary

Scenario                Network need            Scheduling target                    Key components
Power dispatch          deterministic latency   self-hosted Kube-OVN cluster         Karmada + Kube-OVN
AI inference            eBPF high throughput    Cilium cluster                       Karmada + Cilium
Enterprise OA/CRM       ordinary networking     public-cloud clusters                Karmada + vendor CNI
Dedicated finance       VPC isolation           self-hosted cluster                  Kube-OVN VPC
Cross-cloud DR          inter-cluster traffic   primary self-hosted + cloud standby  Submariner + Karmada failover
Multi-cloud image sync  automatic replication   -                                    Harbor replication
Unified monitoring      metrics aggregation     -                                    Thanos + Grafana

