ClickHouse Cluster Setup

ClickHouse service configuration

  • ch_id must be unique; it identifies each ClickHouse service. Run the block below once per service, setting ch_id accordingly.
ch_id=1;

helm template /media/app-audit/helm-charts/component/clickhouse-1.3.0.tgz \
  --set-string cfg.namespace=app-audit \
  --set-string cfg.prod_name=app-audit \
  --set-string cfg.host_opt_base_dir=/opt/dsd/ \
  --set-string cfg.dns_domain=qzprod \
  --set-string cfg.cluster_regi_url=registry.common.svc.qzprod:8092/ \
  --set cfg.enabled.secret=true \
  --set-string resources.limits.cpu=1000m \
  --set-string resources.requests.cpu=100m \
  --set-string cfg.clickhouse.max_memory_usage=2G \
  --set cfg.name.svc=clickhouse-server-${ch_id} \
  --set cfg.name.hl_svc=clickhouse-headless-service-${ch_id} \
  --set cfg.name.deploy=clickhouse-app-${ch_id} \
  --set cfg.name.pod=clickhouse-app-${ch_id} \
  --set cfg.name.sn=tabix-secret-${ch_id} \
  --set cfg.name.ing=clickhouse-ingress-${ch_id} \
  --set cfg.name.cm=clickhouse-conf-${ch_id} \
  --set cfg.name.initdb=init-clickhouse-db-${ch_id} \
  --set cfg.name.metrica=clickhouse-metrica-${ch_id} \
  --set nodeSelector.clickhouse=${ch_id} \
  --set cfg.labels.pod.ins=clickhouse-${ch_id} \
  --set cfg.labels.svc.app=clickhouse-${ch_id} \
  --set cfg.labels.deploy.app=clickhouse-${ch_id} \
  --set cfg.labels.hl_svc.app=clickhouse-${ch_id} \
  | akapply -f -
ch_id=2;

helm template /media/app-audit/helm-charts/component/clickhouse-1.3.0.tgz \
  --set-string cfg.namespace=app-audit \
  --set-string cfg.prod_name=app-audit \
  --set-string cfg.host_opt_base_dir=/opt/dsd/ \
  --set-string cfg.dns_domain=qzprod \
  --set-string cfg.cluster_regi_url=registry.common.svc.qzprod:8092/ \
  --set cfg.enabled.secret=true \
  --set-string resources.limits.cpu=1000m \
  --set-string resources.requests.cpu=100m \
  --set-string cfg.clickhouse.max_memory_usage=2G \
  --set cfg.name.svc=clickhouse-server-${ch_id} \
  --set cfg.name.hl_svc=clickhouse-headless-service-${ch_id} \
  --set cfg.name.deploy=clickhouse-app-${ch_id} \
  --set cfg.name.pod=clickhouse-app-${ch_id} \
  --set cfg.name.sn=tabix-secret-${ch_id} \
  --set cfg.name.ing=clickhouse-ingress-${ch_id} \
  --set cfg.name.cm=clickhouse-conf-${ch_id} \
  --set cfg.name.initdb=init-clickhouse-db-${ch_id} \
  --set cfg.name.metrica=clickhouse-metrica-${ch_id} \
  --set nodeSelector.clickhouse=${ch_id} \
  --set cfg.labels.pod.ins=clickhouse-${ch_id} \
  --set cfg.labels.svc.app=clickhouse-${ch_id} \
  --set cfg.labels.deploy.app=clickhouse-${ch_id} \
  --set cfg.labels.hl_svc.app=clickhouse-${ch_id} \
  | akapply -f -
  • Check the clickhouse-server services:
 akget svc | grep clickhouse
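Given the cfg.name.svc and cfg.name.hl_svc flags above, the listing should show one ClusterIP service plus one headless service per instance. A sketch of the expected names (not verbatim output):
 # clickhouse-server-1   clickhouse-headless-service-1
 # clickhouse-server-2   clickhouse-headless-service-2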

Label the nodes

# Give each node a selector label, one per ClickHouse deployment
# clickhouse-app-1 expects node selector clickhouse=1, clickhouse-app-2 expects clickhouse=2, and so on
kubectl label node <node-1>  clickhouse=1 --overwrite
kubectl label node <node-2>  clickhouse=2 --overwrite
...
  • Check that pods and labeled nodes match up:
 akget po -o wide | grep clickhouse
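To double-check the labels on the nodes themselves with plain kubectl (each selector should return exactly the node intended for that instance):
 kubectl get nodes -l clickhouse=1
 kubectl get nodes -l clickhouse=2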

Generate the shared cluster configuration

akget cm clickhouse-conf-1 -o yaml > /tmp/clickhouse-cluster-conf.yaml
  • Edit the YAML:
 vi /tmp/clickhouse-cluster-conf.yaml
apiVersion: v1
data:
  config.xml: "<?xml version=\"1.0\"?>\n<yandex>\n    <path>/var/lib/clickhouse/data/</path>\n
    \   <tmp_path>/var/lib/clickhouse/data/tmp/</tmp_path>\n    <user_files_path>/var/lib/clickhouse/data/user_files/</user_files_path>\n
    \   <format_schema_path>/var/lib/clickhouse/data/format_schemas/</format_schema_path>\n
    \   \n    <include_from>/etc/clickhouse-server/metrica.d/metrica.xml</include_from>\n\n
    \   <users_config>users.xml</users_config>\n    \n    <display_name>clickhouse</display_name>\n
    \   <listen_host>0.0.0.0</listen_host>\n    <http_port>8123</http_port>\n    <tcp_port>9000</tcp_port>\n
    \   <interserver_http_port>9009</interserver_http_port>\n    <max_connections>4096</max_connections>\n
    \   <keep_alive_timeout>28800</keep_alive_timeout>\n    <max_concurrent_queries>4096</max_concurrent_queries>\n
    \   <uncompressed_cache_size>8589934592</uncompressed_cache_size>\n    <mark_cache_size>5368709120</mark_cache_size>\n
    \   <timezone>Asia/Shanghai</timezone>\n    <umask>022</umask>\n    <mlock_executable>false</mlock_executable>\n
    \   <remote_servers incl=\"clickhouse_remote_servers\" optional=\"true\" />\n
    \   <zookeeper incl=\"zookeeper-servers\" optional=\"true\" />\n    <macros incl=\"macros\"
    optional=\"true\" />\n    <builtin_dictionaries_reload_interval>3600</builtin_dictionaries_reload_interval>\n
    \   <max_session_timeout>36000</max_session_timeout>\n    <default_session_timeout>6000</default_session_timeout>\n
    \   <disable_internal_dns_cache>1</disable_internal_dns_cache>\n    <merge_tree>\n
    \       <parts_to_delay_insert>300</parts_to_delay_insert>\n        <parts_to_throw_insert>900</parts_to_throw_insert>\n
    \   </merge_tree>\n    <query_log>\n        <database>system</database>\n        <table>query_log</table>\n
    \       <partition_by>toYYYYMM(event_date)</partition_by>\n        <flush_interval_milliseconds>7500</flush_interval_milliseconds>\n
    \   </query_log>\n\n    <query_thread_log>\n        <database>system</database>\n
    \       <table>query_thread_log</table>\n        <partition_by>toYYYYMM(event_date)</partition_by>\n
    \       <flush_interval_milliseconds>7500</flush_interval_milliseconds>\n    </query_thread_log>\n
    \   <max_table_size_to_drop>0</max_table_size_to_drop>\n    <max_partition_size_to_drop>0</max_partition_size_to_drop>\n
    \   <distributed_ddl>\n        <path>/clickhouse/task_queue/ddl</path>\n    </distributed_ddl>\n
    \   <logger>\n        <level>warning</level>\n        <log>/var/lib/clickhouse/log//clickhouse-server.log</log>\n
    \       <errorlog>/var/lib/clickhouse/log//clickhouse-server.err.log</errorlog>\n
    \       <size>500M</size>\n        <count>10</count>\n    </logger>\n    <max_open_files>2.62144e+06</max_open_files>\n</yandex>"
  remote_servers.xml: |-
    <yandex>
      <include_from>/etc/clickhouse-server/metrica.d/metrica.xml</include_from>
      <remote_servers incl="clickhouse_remote_servers">
        <clicks_cluster>   <!-- custom cluster name -->
            <shard>  <!-- shard; add one <shard> element per shard -->
              <internal_replication>false</internal_replication>   <!-- default false: the distributed table itself writes each insert to every replica of the shard; true writes to one replica and leaves replication to the replicated table engine -->
              <replica>     <!-- replica; a shard may hold several replicas, each in its own <replica> element -->
                  <host>clickhouse-app-1-0.clickhouse-headless-service-1.app-audit.svc.qzprod</host>
                  <user>audit</user>    <!-- username -->
                  <password>f4b858bd47d88e99f0b7ad6adb4d2631ce28370929f1891179eb2186c7dd0a10</password>  <!-- password -->
                  <port>9000</port>   <!-- port -->
              </replica>
             </shard>
             <shard>
              <internal_replication>false</internal_replication>
              <replica>
                  <host>clickhouse-app-2-0.clickhouse-headless-service-2.app-audit.svc.qzprod</host>
                  <user>audit</user>
                  <password>f4b858bd47d88e99f0b7ad6adb4d2631ce28370929f1891179eb2186c7dd0a10</password>
                  <port>9000</port>
              </replica>
            </shard>
        </clicks_cluster>
      </remote_servers>
      <listen_host>0.0.0.0</listen_host>
      <path>/var/lib/clickhouse/data/</path>
      <tmp_path>/var/lib/clickhouse/data/tmp/</tmp_path>
      <user_files_path>/var/lib/clickhouse/data/user_files/</user_files_path>
      <logger>
          <level>debug</level>
          <log>/var/lib/clickhouse/log/clickhouse-server.log</log>
          <errorlog>/var/lib/clickhouse/log/clickhouse-server.err.log</errorlog>
          <console>1</console>
      </logger>
      <compression incl="clickhouse_compression">
          <case>
              <method>lz4</method>
          </case>
      </compression>
      <distributed_ddl>
          <path>/clickhouse/task_queue/ddl</path>
      </distributed_ddl>
      <macros incl="macros">
      </macros>
      <zookeeper incl="zookeeper-servers">
        <node index="1">   <!-- ZooKeeper node; to configure several, add more <node> elements with unique index values -->
            <host>zookeeper-server</host>   <!-- host -->
            <port>2182</port>  <!-- port -->
        </node>
      </zookeeper>
    </yandex>
kind: ConfigMap
metadata:
  annotations:
    chart_version: 1.3.0
    git-branch: charts-clickhouse-1.3.0
    git-commit_hash: 3266d024f6fb59bb4d6b5f11af0536808a5c985c
    git-committer_date: "2021-10-26T11:01:11+08:00"
    git-committer_email: 2037461051@qq.com
    git-committer_name: shihuai
    git-repository: git@gitee.com:QuanZhiKeJi/olive-charts.git
    git-revision: 3266d024f6fb59bb4d6b5f11af0536808a5c985c
  labels:
    app: clickhouse-1
  name: clickhouse-cluster-conf
  namespace: app-audit
  • Apply the configuration:
    akapply -f /tmp/clickhouse-cluster-conf.yaml
  • List the ConfigMaps; if clickhouse-cluster-conf appears, the apply succeeded:
    akget cm

Edit every clickhouse-metrica ConfigMap and update its macros. The macros must be unique: each instance carries its own shard/replica pair.

 akedit cm clickhouse-metrica-1   (ConfigMap name)
     <macros>
        <shard>01</shard>
        <replica>clickhouse-app-1-0.clickhouse-headless-service-1.app-audit.svc.qzprod</replica>
     </macros>
    </yandex>
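The second instance needs its own unique macros. A sketch for clickhouse-metrica-2, with the shard number and replica host taken from the remote_servers configuration above:
 akedit cm clickhouse-metrica-2
     <macros>
        <shard>02</shard>
        <replica>clickhouse-app-2-0.clickhouse-headless-service-2.app-audit.svc.qzprod</replica>
     </macros>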

Edit all StatefulSets

Each StatefulSet must pick up the new shared configuration (presumably by mounting the clickhouse-cluster-conf ConfigMap created above; adjust to however the chart wires in config.xml):

akedit sts clickhouse-app-1

Once every configuration change is in place, restart all services:

 akdelete pod clickhouse-app-1-0   (pod name; repeat for every instance)
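With two instances the restart can be scripted; the StatefulSets recreate the deleted pods automatically. A minimal sketch, assuming akdelete wraps kubectl delete:
 for ch_id in 1 2; do
     akdelete pod clickhouse-app-${ch_id}-0
 done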

Verification

  • Create a database:
    create database dbName on cluster clicks_cluster;
  • Create the local table:

    CREATE TABLE dbName.test_local on cluster clicks_cluster (`id` String) ENGINE = ReplicatedMergeTree PARTITION BY id ORDER BY (id) SETTINGS index_granularity = 8192
  • Create the distributed table:

    create table dbName.test_all on cluster clicks_cluster as dbName.test_local ENGINE = Distributed(clicks_cluster,dbName,test_local,rand())
  • Insert data:

    insert into dbName.test_local values('1');
  • Query the distributed table; when everything is wired correctly, the result is the union of all local tables' data (a count-based check follows):
    select * from dbName.test_all
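A quick way to see the fan-out is to compare local and distributed counts from inside a pod. A sketch, assuming akexec wraps kubectl exec:
 akexec -it clickhouse-app-1-0 bash
 clickhouse-client --query "select count() from dbName.test_local"   # rows on this shard only
 clickhouse-client --query "select count() from dbName.test_all"     # rows across all shards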

Shards and replicas

  • shard denotes a shard; replica denotes a replica.
The following describes 1 shard with 0 additional replicas (a single copy of the data):
<shard>
      <replica> ... </replica>
</shard>
The following describes 2 shards with 1 additional replica each (two copies per shard):
<shard>
      <replica> ... </replica>
      <replica> ... </replica>
</shard>
<shard>
      <replica> ... </replica>
      <replica> ... </replica>
</shard>
  • The replica element stands for one replica; a shard with N replicas contains N replica elements.

    • internal_replication: when true, an insert through the distributed table goes to a single replica and the replicated table engine copies it to the rest of the shard; when false, the distributed table itself writes to every replica
    • default_database: the default database for the replica
    • host: address (service name or IP mapping) of the ClickHouse server
    • port: port number
    • user: database user name
    • password: database password
  • ZooKeeper settings

    • host: address of the ZooKeeper server
    • port: port number
  • macros settings (verified at runtime in the sketch below)
    • shard: the shard number
    • replica: the replica identifier
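To confirm which macros a running instance actually loaded, query the built-in system.macros table from inside the pod (a sketch):
 akexec -it clickhouse-app-1-0 bash
 clickhouse-client --query "select * from system.macros"
 # expect shard=01 and the replica host configured in clickhouse-metrica-1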

Growing/shrinking the cluster

Adding a shard/replica

  • Deploy the additional ClickHouse service(s)
  • Edit clickhouse-cluster-conf and add the new shard/replica elements; update the clickhouse-metrica ConfigMaps and the clickhouse-app StatefulSets accordingly

  • Restart all services
  • akexec -it clickhouse-app-1-0 bash   (any ClickHouse pod)
  • clickhouse-client   (opens the ClickHouse client inside the pod)
  • Run select * from system.clusters; if the new shards/replicas are listed, the configuration is in effect (see the focused query below)
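A narrower query is easier to read; system.clusters carries one row per replica, with its shard and replica numbers (a sketch):
 clickhouse-client --query "select cluster, shard_num, replica_num, host_name from system.clusters where cluster = 'clicks_cluster'"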

Removing a shard/replica

  • Delete the corresponding shard/replica elements from clickhouse-cluster-conf
  • Restart all services
  • akexec -it clickhouse-app-1-0 bash   (any ClickHouse pod)
  • clickhouse-client   (opens the ClickHouse client inside the pod)
  • Run select * from system.clusters and check that the number of entries and their host_name values match the new layout
