# Big Data Processing Black Magic: Unveiling the Parallel Computing Technology of the PB-Scale Data Warehouse GaussDB(DWS)

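GaussDB(DWS) uses cost estimation, driven by the data-volume statistics of each table, to choose a suitable degree of parallelism for every plan fragment. Taking TPC-DS Q48 as an example, let's see what a GaussDB(DWS) parallel plan looks like.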

```sql
select sum (ss_quantity)
from   store_sales, store, customer_demographics, customer_address, date_dim
where  s_store_sk = ss_store_sk
and    ss_sold_date_sk = d_date_sk and d_year = 1998
and
(
  (
    cd_demo_sk = ss_cdemo_sk
    and
    cd_marital_status = 'M'
    and
    cd_education_status = '4 yr Degree'
    and
    ss_sales_price between 100.00 and 150.00
  )
  or
  (
    cd_demo_sk = ss_cdemo_sk
    and
    cd_marital_status = 'D'
    and
    cd_education_status = 'Primary'
    and
    ss_sales_price between 50.00 and 100.00
  )
  or
  (
    cd_demo_sk = ss_cdemo_sk
    and
    cd_marital_status = 'U'
    and
    -- the cd_education_status predicate of this branch is missing in the source
    ss_sales_price between 150.00 and 200.00
  )
)
and
(
  (
    ss_addr_sk = ca_address_sk
    and
    ca_country = 'United States'
    and
    ca_state in ('KY', 'GA', 'NM')
    and ss_net_profit between 0 and 2000
  )
  or
  (
    ss_addr_sk = ca_address_sk
    and
    ca_country = 'United States'
    and
    ca_state in ('MT', 'OR', 'IN')
    and ss_net_profit between 150 and 3000
  )
  or
  (
    ss_addr_sk = ca_address_sk
    and
    ca_country = 'United States'
    and
    ca_state in ('WI', 'MO', 'WV')
    and ss_net_profit between 50 and 25000
  )
)
;
```
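
One way to look at the plan chosen for a statement like this is to prefix it with `EXPLAIN`. The snippet below is a minimal sketch on a trimmed-down, two-table variant of Q48; the shortened query is only an illustration and is not part of the benchmark. In a parallel plan, the data exchange between threads appears as `Streaming(type: ...)` operators.

```sql
-- Illustrative, shortened variant of Q48: only store_sales and date_dim.
-- EXPLAIN prints the execution plan without running the query.
EXPLAIN
SELECT sum(ss_quantity)
FROM   store_sales, date_dim
WHERE  ss_sold_date_sk = d_date_sk
AND    d_year = 1998;
```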

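1) Streaming(type: SPLIT REDISTRIBUTE): the same as Redistribute in the serial scenario, except that data is transferred at thread granularity, and each tuple is sent to exactly one destination thread.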

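3) Streaming(type: LOCAL REDISTRIBUTE): redistributes data inside a DN according to the current distribution key. It is usually applied right after a base table scan: the scan divides work among threads by page, so the rows held by each thread are not yet distributed by the DN's distribution key, and this intra-DN redistribution step has to be added.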

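#### GaussDB(DWS) adds the parameter query_dop to control the degree of parallelism of a statement. Its values are as follows:

1) query_dop=1: serial execution (the default)

2) query_dop=[2..N]: parallel execution with the specified degree of parallelism

3) query_dop=0: adaptive tuning; the degree of parallelism is chosen adaptively based on system resources and statement complexity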
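
The three settings can be tried out per session. This is a minimal sketch, assuming `query_dop` can be changed with an ordinary `SET` statement in the current session; the value 4 is an arbitrary example, not a recommendation.

```sql
-- Serial execution (the default).
SET query_dop = 1;

-- Fixed degree of parallelism; 4 is an arbitrary example value.
SET query_dop = 4;

-- Adaptive tuning: the system picks the degree of parallelism.
SET query_dop = 0;

-- Check which value is currently in effect for this session.
SHOW query_dop;
```

A fixed value gives predictable resource usage for a known workload, while 0 leaves the choice to the system, which may suit mixed workloads better.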