Greenplum 计算能力估算

1.如何评估Greenplum master 空间以及segment元数据占用的空间?

Greenplum master节点是用来存储元数据的,包括: 序列,表,临时表,分区,函数,视图,类型,操作符,规则,触发器 等。

segment 上也会存储部分元数据,序列,表,临时表,函数,视图,类型,操作符,规则,触发器 等。

master比segment更多的信息包括:分布策略,分区表,以及一些特殊的配置元数据。

相关的表:

  • gp_distribution_policy
  • pg_partition
  • pg_partition_encoding
  • pg_partition_rule
  • pg_statistic

仅仅从元数据的角度来看,master比segment存储的信息略多一些,主要是表的分布策略和表分区的定义。

1.1.如何评估master的空间?

主要考虑几个因素 :

  1. 定义多少个对象
  • 序列对应的元表:pg_class , pg_statistic, pg_attribute 平均每个序列一条记录 10 万个序列,约占用30万条元数据。
  • 表对应的元表:pg_class (2), pg_statistic ( 64, only on master ) , pg_attribute ( 64 ) ,gp_distribution_policy (1)。(有变长字段,会新增TOAST元数据) 1000万张表(含分区表),约占用14亿条元数据。
  • 临时表对应的元表:pg_class (2), pg_statistic ( 64, only on master ) , pg_attribute ( 64 ) 。(有变长字段,会新增TOAST元数据) ,1万张临时表,约占用130万条元数据。
  • 分区:pg_partition (每个表1条), pg_partition_encoding (一般0), pg_partition_rule (每个分区表一条) ,2万主表,900万个分区表,约占用902万条元数据。
  • 函数:pg_proc (每个函数1条) ,10万函数,约占用10万条元数据。
  • 视图:pg_class10万视图,约占用10万条元数据。
  • 类型:pg_type ,1万类型,约占用1万条元数据。
  • 操作符:pg_operator, pg_op… 1万操作符,约占用5万条元数据。
  • 规则:pg_rewrite ,1万规则,约占用1万条元数据。
  • 触发器:pg_trigger ,1万个触发器,约占用1万条元数据。
  1. 是否使用临时对象 临时表,会产生元数据,会话关闭后,自动释放,从而产生垃圾,可能导致元数据膨胀。
  2. 膨胀率 不断的新增,删除表。或修改字段定义。会导致元数据变化,可能导致元数据膨胀。 特别是存在长事务时,由于只能回收到该事务起点以前的事务产生的垃圾,这样容易造成垃圾积累。 假设膨胀率为30%,正常情况下比这个要少点。

1.2.如何推算master节点需要多少空间?

首先需要评估每个元表的平均记录大小, 单位字节:

代码语言:javascript
复制
postgres=# select relname,relkind,round((relpages::numeric*8*1024)/reltuples::numeric,2) from pg_class where relpages<>0 and reltuples<>0 and relkind='r' and reltuples>100 order by 1;
           relname           | relkind |  round  
-----------------------------+---------+---------
 gp_distribution_policy      | r       |   40.96
 gp_fastsequence             | r       |   47.63
 gp_persistent_relation_node | r       |   33.57
 gp_relation_node            | r       |   39.77
 pg_aggregate                | r       |   60.68
 pg_amop                     | r       |   29.20
 pg_amproc                   | r       |   31.51
 pg_appendonly               | r       |  163.84
 pg_attrdef                  | r       |  160.63
 pg_attribute                | r       |   93.85
 pg_attribute_encoding       | r       |   83.22
 pg_cast                     | r       |   30.57
 pg_class                    | r       |  137.23
 pg_constraint               | r       |  548.95
 pg_conversion               | r       |   62.06
 pg_depend                   | r       |   21.42
 pg_description              | r       |   17.75
 pg_index                    | r       |   77.14
 pg_inherits                 | r       |   42.67
 pg_opclass                  | r       |   58.10
 pg_operator                 | r       |   48.19
 pg_partition_rule           | r       |  341.33
 pg_proc                     | r       |   50.83
 pg_rewrite                  | r       | 1079.57
 pg_stat_last_operation      | r       |  138.51
 pg_statistic                | r       |   78.21
 pg_type                     | r       |   93.19
 pg_window                   | r       |   28.44
 sql_features                | r       |   25.24
 supplier                    | r       |   38.89

其次,需要告知在集群中有多少元数据。

假设用户需要在GP集群中创建 : 10万个序列,1000万张表(包含分区表),同时存在1万张临时表,10万函数,10万视图,1万自定义类型,1万自定义操作符,1万条规则,1万个触发器。

需要约14.1090亿条元数据,平均每条元数据假设200字节(实际可能更小,参考各个元表的relpages81024/reltuples 得到的一个参考值),约260GB。 算上膨胀率,Master约占用空间338GB空间。

segment的元数据大小评估: 需要扣除一下表中的数据:

代码语言:javascript
复制
gp_distribution_policy 
pg_partition 
pg_partition_encoding 
pg_partition_rule 
pg_statistic

上面的例子,约比master少7亿数据。约占170GB元数据空间。

2.Greenplum 计算能力估算 - 暨多大表需要分区,单个分区多大适宜

在数据仓库业务中,单表的数据量通常是非常巨大的。Greenplum在这方面做了很多的优化

  1. 支持列存储
  2. 支持向量计算
  3. 支持分区表
  4. 支持btree, bitmap, gist索引接口
  5. 执行并行计算
  6. 支持HASH JOIN

提高数据筛选的效率是一个较为低廉有效的优化手段,比如表分区。 但是分区是不是越多越好呢? 实际上分区过多也会引入导致优化器生成执行计划较慢,元数据过多,SYSCACHE过大等问题。 设置多大分区应该权衡影响,同时又要考虑计算能力。

2.1.单个SEGMENT多大数据量合适

GPDB是一个分布式数据库,执行一条复杂QUERY时,所有的SEGMENT可能并行参与计算。 那么最慢的SEGMENT就成为了整个SQL的瓶颈,单个SEGMENT多少记录合适呢? 可以做一个简单的测试,生成一份测试报告,以供参考。

创建数据库

代码语言:javascript
复制
postgres=# create database test;
CREATE DATABASE
postgres=#

创建3种常用字段类型,分别测试这几种类型的聚合统计能力,JOIN能力。

2.1.1.int8类型

代码语言:javascript
复制
postgres=# \c test gpadmin
You are now connected to database "test" as user "gpadmin".
test=#
test=# create temp table t1 (id int8) with (APPENDONLY=true, ORIENTATION=column);
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
test=# create temp table t2 (id int8) with (APPENDONLY=true, ORIENTATION=column);
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
test=# \timing
Timing is on.
test=# insert into t1 select generate_series(1,10000000000);
INSERT 0 10000000000
Time: 2706383.532 ms
test=#
test=# insert into t2 select * from t1;
INSERT 0 10000000000
Time: 2032211.224 ms
test=#

2.1.2.text类型

代码语言:javascript
复制
test=# create temp table tt1 (id text) with (APPENDONLY=true, ORIENTATION=column);  
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
Time: 1464.706 ms
test=# create temp table tt2 (id text) with (APPENDONLY=true, ORIENTATION=column); 
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
Time: 728.658 ms
test=# insert into tt1 select id from t1;  
INSERT 0 10000000000
Time: 2928301.434 ms
test=# 
test=# insert into tt2 select id from tt1;
INSERT 0 10000000000
Time: 3018062.771 ms
test=#

2.1.3.numeric类型

代码语言:javascript
复制
test=# create temp table ttt1 (id numeric) with (APPENDONLY=true, ORIENTATION=column); 
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
Time: 1001.412 ms
test=#  create temp table ttt2 (id numeric) with (APPENDONLY=true, ORIENTATION=column);
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
Time: 849.584 ms
test=#  insert into ttt1 select id from t1;
INSERT 0 10000000000
Time: 3018062.771 ms
postgres=> insert into ttt2 select id from t1;  
INSERT 0 10000000000
Time: 3018062.771 ms

测试环境为单物理机(64线程机器),48个SEGMENT。1亿记录。

2.2.聚合查询

2.2.1.int8类型

代码语言:javascript
复制
postgres=> explain analyze select count(*),sum(id),avg(id),min(id),max(id),stddev(id) from t1;  
                                                                QUERY PLAN                                                                  
------------------------------------------------------------------------------------------------------------------------------------------  
 Aggregate  (cost=2524480.55..2524480.56 rows=1 width=120)  
   Rows out:  1 rows with 0.002 ms to first row, 159 ms to end, start offset by 1.624 ms.  
   ->  Gather Motion 48:1  (slice1; segments: 48)  (cost=2524480.02..2524480.52 rows=1 width=120)  
         Rows out:  48 rows at destination with 221 ms to end, start offset by 1.626 ms.  
         ->  Aggregate  (cost=2524480.02..2524480.02 rows=1 width=120)  
               Rows out:  Avg 1.0 rows x 48 workers.  Max 1 rows (seg0) with 0.001 ms to first row, 134 ms to end, start offset by 10 ms.  
               ->  Append-only Columnar Scan on t1  (cost=0.00..1024480.00 rows=2083334 width=8)  
                     Rows out:  0 rows (seg0) with 10 ms to end, start offset by 37 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 315K bytes.  
   (slice1)    Executor memory: 378K bytes avg x 48 workers, 378K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  optimizer=off  
 Optimizer status: legacy query optimizer  
 Total runtime: 382.093 ms  
(16 rows)

2.2.2.text类型

代码语言:javascript
复制
postgres=> explain analyze select count(*),sum(t1.id::int8),avg(t1.id::int8),min(t1.id::int8),max(t1.id::int8),stddev(t1.id::int8) from tt1 t1;  
                                                                QUERY PLAN                                                                  
------------------------------------------------------------------------------------------------------------------------------------------  
 Aggregate  (cost=2527178.57..2527178.58 rows=1 width=120)  
   Rows out:  1 rows with 0.003 ms to first row, 798 ms to end, start offset by 1.382 ms.  
   ->  Gather Motion 48:1  (slice1; segments: 48)  (cost=2527178.02..2527178.53 rows=1 width=120)  
         Rows out:  48 rows at destination with 1006 ms to end, start offset by 1.385 ms.  
         ->  Aggregate  (cost=2527178.02..2527178.04 rows=1 width=120)  
               Rows out:  Avg 1.0 rows x 48 workers.  Max 1 rows (seg0) with 0.003 ms to first row, 926 ms to end, start offset by 14 ms.  
               ->  Append-only Columnar Scan on tt1 t1  (cost=0.00..1027178.00 rows=2083334 width=8)  
                     Rows out:  0 rows (seg0) with 16 ms to end, start offset by 36 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 315K bytes.  
   (slice1)    Executor memory: 378K bytes avg x 48 workers, 378K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  optimizer=off  
 Optimizer status: legacy query optimizer  
 Total runtime: 1805.789 ms  
(16 rows)

2.2.3.numeric类型

代码语言:javascript
复制
postgres=> explain analyze select count(*),sum(id),avg(id),min(id),max(id),stddev(id) from ttt1;  
                                                                QUERY PLAN                                                                  
------------------------------------------------------------------------------------------------------------------------------------------  
 Aggregate  (cost=2527512.55..2527512.56 rows=1 width=168)  
   Rows out:  1 rows with 0.001 ms to first row, 1712 ms to end, start offset by 1.292 ms.  
   ->  Gather Motion 48:1  (slice1; segments: 48)  (cost=2527512.02..2527512.52 rows=1 width=168)  
         Rows out:  48 rows at destination with 1926 ms to end, start offset by 1.293 ms.  
         ->  Aggregate  (cost=2527512.02..2527512.02 rows=1 width=168)  
               Rows out:  Avg 1.0 rows x 48 workers.  Max 1 rows (seg0) with 0 ms to first row, 1849 ms to end, start offset by 4.436 ms.  
               ->  Append-only Columnar Scan on ttt1  (cost=0.00..1027512.00 rows=2083334 width=8)  
                     Rows out:  0 rows (seg0) with 7.385 ms to end, start offset by 53 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 315K bytes.  
   (slice1)    Executor memory: 378K bytes avg x 48 workers, 378K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  optimizer=off  
 Optimizer status: legacy query optimizer  
 Total runtime: 3640.108 ms  
(16 rows)

2.3 JOIN 聚合

2.3.1.int8类型

代码语言:javascript
复制
    postgres=> explain analyze select count(*),sum(t1.id),avg(t1.id),min(t1.id),max(t1.id),stddev(t1.id) from t1 join t2 using (id);  
                                                                              QUERY PLAN                                                                            
    --------------------------------------------------------------------------------------------------------------------------------------------------------------  
     Aggregate  (cost=6689588.56..6689588.57 rows=1 width=120)  
       Rows out:  1 rows with 0.003 ms to first row, 908 ms to end, start offset by 1.505 ms.  
       ->  Gather Motion 48:1  (slice1; segments: 48)  (cost=6689588.02..6689588.52 rows=1 width=120)  
             Rows out:  48 rows at destination with 1517 ms to end, start offset by 1.508 ms.  
             ->  Aggregate  (cost=6689588.02..6689588.03 rows=1 width=120)  
                   Rows out:  Avg 1.0 rows x 48 workers.  Max 1 rows (seg0) with 0.002 ms to first row, 81 ms to end, start offset by 11 ms.  
                   ->  Hash Join  (cost=2372137.00..5189588.00 rows=2083334 width=8)  
                         Hash Cond: t1.id = t2.id  
                         Rows out:  Avg 2083333.3 rows x 48 workers.  Max 2083479 rows (seg42) with 0.013 ms to first row, 1359 ms to end, start offset by 38 ms.  
                         Executor memory:  65105K bytes avg, 65109K bytes max (seg42).  
                         Work_mem used:  65105K bytes avg, 65109K bytes max (seg42). Workfile: (0 spilling, 0 reused)  
                         ->  Append-only Columnar Scan on t1  (cost=0.00..1024480.00 rows=2083334 width=8)  
                               Rows out:  0 rows (seg0) with 0.003 ms to end, start offset by 38 ms.  
                         ->  Hash  (cost=1024480.00..1024480.00 rows=2083334 width=8)  
                               Rows in:  (No row requested) 0 rows (seg0) with 0 ms to end.  
                               ->  Append-only Columnar Scan on t2  (cost=0.00..1024480.00 rows=2083334 width=8)  
                                     Rows out:  0 rows (seg0) with 30 ms to end, start offset by 54 ms.  
     Slice statistics:  
       (slice0)    Executor memory: 315K bytes.  
       (slice1)    Executor memory: 378K bytes avg x 48 workers, 378K bytes max (seg0).  Work_mem: 65109K bytes max.  
     Statement statistics:  
       Memory used: 128000K bytes  
     Settings:  optimizer=off  
     Optimizer status: legacy query optimizer  
     Total runtime: 2426.790 ms  
    (25 rows)

2.3.2.text类型

代码语言:javascript
复制
postgres=> explain analyze select count(*),sum(t1.id::int8),avg(t1.id::int8),min(t1.id::int8),max(t1.id::int8),stddev(t1.id::int8) from tt1 t1 join tt2 using (id);  
                                                                         QUERY PLAN                                                                            
-------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Aggregate  (cost=6694984.57..6694984.58 rows=1 width=120)  
   Rows out:  1 rows with 0.001 ms to first row, 2068 ms to end, start offset by 1.423 ms.  
   ->  Gather Motion 48:1  (slice1; segments: 48)  (cost=6694984.02..6694984.53 rows=1 width=120)  
         Rows out:  48 rows at destination with 3169 ms to end, start offset by 1.425 ms.  
         ->  Aggregate  (cost=6694984.02..6694984.04 rows=1 width=120)  
               Rows out:  Avg 1.0 rows x 48 workers.  Max 1 rows (seg0) with 0.004 ms to first row, 1049 ms to end, start offset by 11 ms.  
               ->  Hash Join  (cost=2374835.00..5194984.00 rows=2083334 width=8)  
                     Hash Cond: t1.id = tt2.id  
                     Rows out:  Avg 2083333.3 rows x 48 workers.  Max 2084068 rows (seg4) with 0.012 ms to first row, 2240 ms to end, start offset by 60 ms.  
                     Executor memory:  65105K bytes avg, 65128K bytes max (seg4).  
                     Work_mem used:  65105K bytes avg, 65128K bytes max (seg4). Workfile: (0 spilling, 0 reused)  
                     ->  Append-only Columnar Scan on tt1 t1  (cost=0.00..1027178.00 rows=2083334 width=8)  
                           Rows out:  0 rows (seg0) with 0.003 ms to end, start offset by 11 ms.  
                     ->  Hash  (cost=1027178.00..1027178.00 rows=2083334 width=8)  
                           Rows in:  (No row requested) 0 rows (seg0) with 0 ms to end.  
                           ->  Append-only Columnar Scan on tt2  (cost=0.00..1027178.00 rows=2083334 width=8)  
                                 Rows out:  0 rows (seg0) with 37 ms to end, start offset by 43 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 315K bytes.  
   (slice1)    Executor memory: 378K bytes avg x 48 workers, 378K bytes max (seg0).  Work_mem: 65128K bytes max.  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  optimizer=off  
 Optimizer status: legacy query optimizer  
 Total runtime: 5249.571 ms  
(25 rows)

2.3.3.numeric类型

代码语言:javascript
复制
postgres=> explain analyze select count(*),sum(t1.id),avg(t1.id),min(t1.id),max(t1.id),stddev(t1.id) from ttt1 t1 join ttt2 using (id);  
                                                                          QUERY PLAN                                                                            
--------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Aggregate  (cost=6695652.56..6695652.57 rows=1 width=168)  
   Rows out:  1 rows with 0.003 ms to first row, 2661 ms to end, start offset by 1.406 ms.  
   ->  Gather Motion 48:1  (slice1; segments: 48)  (cost=6695652.02..6695652.52 rows=1 width=168)  
         Rows out:  48 rows at destination with 4696 ms to end, start offset by 1.409 ms.  
         ->  Aggregate  (cost=6695652.02..6695652.03 rows=1 width=168)  
               Rows out:  Avg 1.0 rows x 48 workers.  Max 1 rows (seg0) with 0.004 ms to first row, 2770 ms to end, start offset by 4.078 ms.  
               ->  Hash Join  (cost=2375169.00..5195652.00 rows=2083334 width=8)  
                     Hash Cond: t1.id = ttt2.id  
                     Rows out:  Avg 2083333.3 rows x 48 workers.  Max 2083627 rows (seg10) with 0.015 ms to first row, 3745 ms to end, start offset by 35 ms.  
                     Executor memory:  65105K bytes avg, 65114K bytes max (seg10).  
                     Work_mem used:  65105K bytes avg, 65114K bytes max (seg10). Workfile: (0 spilling, 0 reused)  
                     ->  Append-only Columnar Scan on ttt1 t1  (cost=0.00..1027512.00 rows=2083334 width=8)  
                           Rows out:  0 rows (seg0) with 0.012 ms to end, start offset by 45 ms.  
                     ->  Hash  (cost=1027512.00..1027512.00 rows=2083334 width=8)  
                           Rows in:  (No row requested) 0 rows (seg0) with 0 ms to end.  
                           ->  Append-only Columnar Scan on ttt2  (cost=0.00..1027512.00 rows=2083334 width=8)  
                                 Rows out:  0 rows (seg0) with 30 ms to end, start offset by 46 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 315K bytes.  
   (slice1)    Executor memory: 378K bytes avg x 48 workers, 378K bytes max (seg0).  Work_mem: 65114K bytes max.  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  optimizer=off  
 Optimizer status: legacy query optimizer  
 Total runtime: 7369.522 ms  
(25 rows)

2.4.性能指标

类型

总记录数

segment数

单segment记录数

单表聚合耗时

多表JOIN+聚合耗时

每segment每秒聚合记录数

每segment每秒JOIN+聚合记录数

INT8

1亿行

48

208万行

0.38秒

2.4秒

547万行

86万行 * 2

TEXT

1亿行

48

208万行

1.8秒

5.2秒

115万行

40万行 * 2

NUMERIC

1亿行 48

208万行

3.6秒

7.37秒

57万行

28万行 * 2

小结 设置多少个分区,除了业务逻辑层面的因素(比如按日、月、年,或者按LIST等),另外还应该考虑两方面的因素:

  1. 分区过多也会引入导致优化器生成执行计划较慢,元数据过多,SYSCACHE过大等问题。
  2. 单个SEGMENT的计算能力。(将分区后单个SEGMENT的单个分区内的记录数压缩到可以接受的范围。)例如:

100亿条记录,1000个SEGMENT,不分区的情况下,一个SEGMENT有1000万条记录。如果要满足在输入WHERE条件过滤数据后(假设过滤后要计算的记录数小于50亿条)INT8类型字段聚合1秒响应,根据以上性能测试数据,建议至少分成2个区。

数值类型的选择,除非精度要求,建议不要使用numeric。建议使用int, int8, float, float8等类型。从以上测试可以看出性能差异巨大。