分析PostgreSQL中的大表连接

2024-04-02 19:04:59 282人浏览泡泡鱼

摘要

这篇文章主要介绍“分析postgresql中的大表连接”，在日常操作中，相信很多人在分析Postgresql中的大表连接问题上存在疑惑，小编查阅了各式资料，整理出简单好用的操作方法，希望对大家解答”分析Po

这篇文章主要介绍“分析postgresql中的大表连接”，在日常操作中，相信很多人在分析Postgresql中的大表连接问题上存在疑惑，小编查阅了各式资料，整理出简单好用的操作方法，希望对大家解答”分析PostgreSQL中的大表连接”的疑惑有所帮助！接下来，请跟着小编一起来学习吧！

数据库配置
主机CPU 4核，内存4G，PG共享缓存128MB，work_mem 4MB。

测试数据
创建4张表，每张表1000w行，数据量约1G，是PG共享内存的8倍。

drop table t_big_1;
drop table t_big_2;
drop table t_big_3;
drop table t_big_4;
create table t_big_1(id int,c1 varchar(30),c2 varchar(30),c3 varchar(30));
create table t_big_2(id int,c1 varchar(30),c2 varchar(30),c3 varchar(30));
create table t_big_3(id int,c1 varchar(30),c2 varchar(30),c3 varchar(30));
create table t_big_4(id int,c1 varchar(30),c2 varchar(30),c3 varchar(30));
insert into t_big_1 select x,rpad('c1'||x,30,'c1'),rpad('c2'||x,30,'c2'),rpad('c3'||x,30,'c3') from generate_series(1,10000000) as x;
insert into t_big_2 select x,rpad('c1'||x,30,'c1'),rpad('c2'||x,30,'c2'),rpad('c3'||x,30,'c3') from generate_series(1,10000000) as x;
insert into t_big_3 select x,rpad('c1'||x,30,'c1'),rpad('c2'||x,30,'c2'),rpad('c3'||x,30,'c3') from generate_series(1,10000000) as x;
insert into t_big_4 select x,rpad('c1'||x,30,'c1'),rpad('c2'||x,30,'c2'),rpad('c3'||x,30,'c3') from generate_series(1,10000000) as x;
show shared_buffers;
show effective_cache_size;
show work_mem;
select pg_size_pretty(pg_table_size('t_big_1'));
select pg_size_pretty(pg_table_size('t_big_2'));
select pg_size_pretty(pg_table_size('t_big_3'));
select pg_size_pretty(pg_table_size('t_big_4'));
analyze t_big_1,t_big_2,t_big_3,t_big_4;
explain verbose
select a.*
from t_big_1 a join t_big_2 b on a.c1 = b.c1;
explain verbose
select a.id,b.c1,c.c2,d.c3 
from t_big_1 a,t_big_2 b,t_big_3 c,t_big_4 d
where a.id = b.id and b.id = c.id and c.id = d.id;
explain verbose
select a.id,b.c1,c.c2,d.c3 
from t_big_1 a,t_big_2 b,t_big_3 c,t_big_4 d
where a.id = b.id and b.c1 = c.c1 and c.c2 = d.c2;

大表连接
未分析数据表前

[local:/data/run/pg12]:5120 pg12@testdb=# explain verbose
pg12@testdb-# select a.id,b.c1,c.c2,d.c3 
pg12@testdb-# from t_big_1 a,t_big_2 b,t_big_3 c,t_big_4 d
pg12@testdb-# where a.id = b.id and b.c1 = c.c1 and c.c2 = d.c2;
                                                   QUERY PLAN                                                    
-----------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=164722831406.26..1096915306139605248.00 rows=73127676034285903872 width=238)
   Output: a.id, b.c1, c.c2, d.c3
   Merge Cond: ((b.c1)::text = (c.c1)::text)
   ->  Sort  (cost=58799667920.13..59102008117.66 rows=120936079012 width=82)
         Output: a.id, b.c1
         Sort Key: b.c1
         ->  Merge Join  (cost=2124653.55..1816202724.10 rows=120936079012 width=82)
               Output: a.id, b.c1
               Merge Cond: (a.id = b.id)
               ->  Sort  (cost=894232.27..906527.40 rows=4918050 width=4)
                     Output: a.id
                     Sort Key: a.id
                     ->  Seq Scan on public.t_big_1 a  (cost=0.00..213115.50 rows=4918050 width=4)
                           Output: a.id
               ->  Materialize  (cost=1230421.27..1255011.52 rows=4918050 width=82)
                     Output: b.c1, b.id
                     ->  Sort  (cost=1230421.27..1242716.40 rows=4918050 width=82)
                           Output: b.c1, b.id
                           Sort Key: b.id
                           ->  Seq Scan on public.t_big_2 b  (cost=0.00..213115.50 rows=4918050 width=82)
                                 Output: b.c1, b.id
   ->  Materialize  (cost=105923163486.13..106527843881.19 rows=120936079012 width=234)
         Output: c.c2, c.c1, d.c3
         ->  Sort  (cost=105923163486.13..106225503683.66 rows=120936079012 width=234)
               Output: c.c2, c.c1, d.c3
               Sort Key: c.c1
               ->  Merge Join  (cost=3066006.55..1817144077.10 rows=120936079012 width=234)
                     Output: c.c2, c.c1, d.c3
                     Merge Cond: ((c.c2)::text = (d.c2)::text)
                     ->  Sort  (cost=1533003.27..1545298.40 rows=4918050 width=156)
                           Output: c.c2, c.c1
                           Sort Key: c.c2
                           ->  Seq Scan on public.t_big_3 c  (cost=0.00..213115.50 rows=4918050 width=156)
                                 Output: c.c2, c.c1
                     ->  Materialize  (cost=1533003.27..1557593.52 rows=4918050 width=156)
                           Output: d.c3, d.c2
                           ->  Sort  (cost=1533003.27..1545298.40 rows=4918050 width=156)
                                 Output: d.c3, d.c2
                                 Sort Key: d.c2
                                 ->  Seq Scan on public.t_big_4 d  (cost=0.00..213115.50 rows=4918050 width=156)
                                       Output: d.c3, d.c2
(41 rows)

可以看到，未分析前，执行计划使用merge join，计划的cost是一个大数。

执行分析后

[local:/data/run/pg12]:5120 pg12@testdb=# explain (analyze,buffers,verbose)
select a.id,b.c1,c.c2,d.c3 
from t_big_1 a,t_big_2 b,t_big_3 c,t_big_4 d
where a.id = b.id and b.c1 = c.c1 and c.c2 = d.c2;
                                                                              QUERY PLAN                                                                               
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=896126.19..2564935.91 rows=9999844 width=97) (actual time=393803.655..404902.025 rows=10000000 loops=1)
   Output: a.id, b.c1, c.c2, d.c3
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=266 read=655676 dirtied=486717 written=486414, temp read=473954 written=486068
   ->  Parallel Hash Join  (cost=895126.19..1563951.51 rows=4166602 width=97) (actual time=393672.896..398825.027 rows=3333333 loops=3)
         Output: a.id, b.c1, c.c2, d.c3
         Hash Cond: ((c.c2)::text = (d.c2)::text)
         Buffers: shared hit=266 read=655676 dirtied=486717 written=486414, temp read=473954 written=486068
         Worker 0: actual time=393629.565..399028.498 rows=3549817 loops=1
           Buffers: shared hit=118 read=218079 dirtied=161599 written=161495, temp read=162307 written=161880
         Worker 1: actual time=393585.994..399049.295 rows=3609509 loops=1
           Buffers: shared hit=119 read=217313 dirtied=161014 written=160913, temp read=163324 written=160736
         ->  Parallel Hash Join  (cost=592683.65..1070481.02 rows=4166681 width=66) (actual time=328335.871..378143.916 rows=3333333 loops=3)
               Output: a.id, b.c1, c.c2
               Hash Cond: ((b.c1)::text = (c.c1)::text)
               Buffers: shared hit=63 read=491773 dirtied=352782 written=352575, temp read=267125 written=274312
               Worker 0: actual time=328475.430..378240.528 rows=3325497 loops=1
                 Buffers: shared hit=25 read=164024 dirtied=117445 written=117373, temp read=88941 written=91448
               Worker 1: actual time=328084.038..377943.176 rows=3311112 loops=1
                 Buffers: shared hit=29 read=163900 dirtied=117550 written=117481, temp read=88747 written=91320
               ->  Parallel Hash Join  (cost=290238.33..609558.42 rows=4166681 width=35) (actual time=158380.042..198763.345 rows=3333333 loops=3)
                     Output: a.id, b.c1
                     Hash Cond: (a.id = b.id)
                     Buffers: shared hit=63 read=327838 dirtied=218847 written=218710, temp read=98317 written=100856
                     Worker 0: actual time=158518.764..199077.411 rows=3331104 loops=1
                       Buffers: shared hit=25 read=109394 dirtied=72893 written=72845, temp read=32790 written=33668
                     Worker 1: actual time=158520.409..198920.394 rows=3332824 loops=1
                       Buffers: shared hit=29 read=109323 dirtied=73002 written=72956, temp read=32934 written=33560
                     ->  Parallel Seq Scan on public.t_big_1 a  (cost=0.00..205601.81 rows=4166681 width=4) (actual time=239.830..75704.152 rows=3333333 loops=3)
                           Output: a.id
                           Buffers: shared read=163935 dirtied=109449 written=109391
                           Worker 0: actual time=239.584..75677.703 rows=3327794 loops=1
                             Buffers: shared read=54554 dirtied=36489 written=36468
                           Worker 1: actual time=240.355..75258.837 rows=3347802 loops=1
                             Buffers: shared read=54882 dirtied=36486 written=36467
                     ->  Parallel Hash  (cost=205601.81..205601.81 rows=4166681 width=35) (actual time=65812.428..65812.431 rows=3333333 loops=3)
                           Output: b.c1, b.id
                           Buckets: 65536  Batches: 256  Memory Usage: 3328kB
                           Buffers: shared hit=32 read=163903 dirtied=109398 written=109319, temp written=70136
                           Worker 0: actual time=65812.900..65812.904 rows=3345876 loops=1
                             Buffers: shared hit=11 read=54840 dirtied=36404 written=36377, temp written=23428
                           Worker 1: actual time=65812.873..65812.875 rows=3321816 loops=1
                             Buffers: shared hit=15 read=54441 dirtied=36516 written=36489, temp written=23320
                           ->  Parallel Seq Scan on public.t_big_2 b  (cost=0.00..205601.81 rows=4166681 width=35) (actual time=1.490..47839.237 rows=3333333 loops=3)
                                 Output: b.c1, b.id
                                 Buffers: shared hit=32 read=163903 dirtied=109398 written=109319
                                 Worker 0: actual time=1.464..47814.446 rows=3345876 loops=1
                                   Buffers: shared hit=11 read=54840 dirtied=36404 written=36377
                                 Worker 1: actual time=1.470..47104.413 rows=3321816 loops=1
                                   Buffers: shared hit=15 read=54441 dirtied=36516 written=36489
               ->  Parallel Hash  (cost=205601.81..205601.81 rows=4166681 width=62) (actual time=113720.080..113720.080 rows=3333333 loops=3)
                     Output: c.c2, c.c1
                     Buckets: 65536  Batches: 512  Memory Usage: 2432kB
                     Buffers: shared read=163935 dirtied=133935 written=133865, temp written=103856
                     Worker 0: actual time=113719.124..113719.124 rows=3332395 loops=1
                       Buffers: shared read=54630 dirtied=44552 written=44528, temp written=34648
                     Worker 1: actual time=113720.557..113720.558 rows=3329197 loops=1
                       Buffers: shared read=54577 dirtied=44548 written=44525, temp written=34576
                     ->  Parallel Seq Scan on public.t_big_3 c  (cost=0.00..205601.81 rows=4166681 width=62) (actual time=0.126..80608.068 rows=3333333 loops=3)
                           Output: c.c2, c.c1
                           Buffers: shared read=163935 dirtied=133935 written=133865
                           Worker 0: actual time=0.260..80737.065 rows=3332395 loops=1
                             Buffers: shared read=54630 dirtied=44552 written=44528
                           Worker 1: actual time=0.049..80943.448 rows=3329197 loops=1
                             Buffers: shared read=54577 dirtied=44548 written=44525
         ->  Parallel Hash  (cost=205601.02..205601.02 rows=4166602 width=62) (actual time=10279.722..10279.722 rows=3333333 loops=3)
               Output: d.c3, d.c2
               Buckets: 65536  Batches: 512  Memory Usage: 2400kB
               Buffers: shared hit=32 read=163903 dirtied=133935 written=133839, temp written=103004
               Worker 0: actual time=10222.812..10222.812 rows=3297904 loops=1
                 Buffers: shared hit=9 read=54055 dirtied=44154 written=44122, temp written=34236
               Worker 1: actual time=10222.839..10222.839 rows=3258559 loops=1
                 Buffers: shared hit=6 read=53413 dirtied=43464 written=43432, temp written=33504
               ->  Parallel Seq Scan on public.t_big_4 d  (cost=0.00..205601.02 rows=4166602 width=62) (actual time=0.163..7282.409 rows=3333333 loops=3)
                     Output: d.c3, d.c2
                     Buffers: shared hit=32 read=163903 dirtied=133935 written=133839
                     Worker 0: actual time=0.108..7244.071 rows=3297904 loops=1
                       Buffers: shared hit=9 read=54055 dirtied=44154 written=44122
                     Worker 1: actual time=0.034..7223.191 rows=3258559 loops=1
                       Buffers: shared hit=6 read=53413 dirtied=43464 written=43432
 Planning Time: 1.134 ms
 Execution Time: 405878.841 ms
(83 rows)
[local:/data/run/pg12]:5120 pg12@testdb=#

可以看到，执行计划中的成本回归一个正常的数值，算法使用Hash Join。由于内存不足，PG把数据拆分为N份，使用临时表来临时缓存Hash Table，使用不同的Batch来执行Join。

到此，关于“分析PostgreSQL中的大表连接”的学习就结束了，希望能够解决大家的疑惑。理论与实践的搭配能更好的帮助大家学习，快去试试吧！若想继续学习更多相关知识，请继续关注编程网网站，小编会继续努力为大家带来更多实用的文章！

您可能感兴趣的文档:

点击免费下载>>软考高级考试备考技巧/历年真题/备考精华资料

--结束END--

本文标题: 分析PostgreSQL中的大表连接

本文链接: https://www.lsjlt.com/news/63073.html(转载时请注明来源链接)

有问题或投稿请发送至: 邮箱/279061341@qq.com QQ/279061341

本篇文章演示代码以及资料文档资料下载

下载Word文档到电脑，方便收藏和打印～

下载Word文档

去做题

猜你喜欢

分析PostgreSQL中的大表连接

这篇文章主要介绍“分析PostgreSQL中的大表连接”，在日常操作中，相信很多人在分析PostgreSQL中的大表连接问题上存在疑惑，小编查阅了各式资料，整理出简单好用的操作方法，希望对大家解答”分析Po...

99+

2022-10-18
MySQL中多表连接的示例分析

这篇文章主要介绍MySQL中多表连接的示例分析，文中介绍的非常详细，具有一定的参考价值，感兴趣的小伙伴们一定要看完！连接可用于查询,更新,建立事实外键(指人为建立的两张表的对应关系,相对的,FORGIEN ...

99+

2022-10-18
PHP中连接符大于号、等号大于号的示例分析

小编给大家分享一下PHP中连接符大于号、等号大于号的示例分析，相信大部分人都还不怎么了解，因此分享这篇文章给大家参考一下，希望大家阅读完这篇文章后大有收获，下面让我们一起去了解一下吧！在学习PHP中，遇到了->和=>这两个符号。...

99+

2023-06-14
Qt5连接并操作PostgreSQL数据库的实现示例分析

Qt5连接并操作PostgreSQL数据库的实现示例分析，针对这个问题，这篇文章详细介绍了相对应的分析和解答，希望可以帮助更多想解决这个问题的小伙伴找到更简单易行的方法。1. 浏览器搜索Qt，打开第一个连接。鼠标悬浮Learning，点击...

99+

2023-06-22
如何进行sql表连接查询的分析

如何进行sql表连接查询的分析，很多新手对此不是很清楚，为了帮助大家解决这个难题，下面小编将为大家详细讲解，有这方面需求的人可以来学习下，希望你能有所收获。连接查询:连接查询是指基于两个或者两个以上的表或者...

99+

2022-10-19
分析PostgreSQL中的tuple locks

这篇文章主要介绍“分析PostgreSQL中的tuple locks”，在日常操作中，相信很多人在分析PostgreSQL中的tuple locks问题上存在疑惑，小编查阅了各式资料，整理出简单好用的操作方...

99+

2022-10-18
mysql中三张表连接建立视图的示例分析

这篇文章主要为大家展示了“mysql中三张表连接建立视图的示例分析”，内容简而易懂，条理清晰，希望能够帮助大家解决疑惑，下面让小编带领大家一起研究并学习一下“mysql中三张表连接建立视图的示例分析”这篇文...

99+

2022-10-18
Oracle中sql语句(+)符号代表连接的示例分析

小编给大家分享一下Oracle中sql语句(+)符号代表连接的示例分析，相信大部分人都还不怎么了解，因此分享这篇文章给大家参考一下，希望大家阅读完这篇文章后大有收获，下面让我们一起去了解一下吧！oracle...

99+

2022-10-18
SQL中自连接的示例分析

这篇文章给大家分享的是有关SQL中自连接的示例分析的内容。小编觉得挺实用的，因此分享给大家做个参考，一起跟随小编过来看看吧。自连接是什么那我们如何理解自连接呢自连接说白了其实就是两张表结构和数据内容完全一样...

99+

2022-10-18
分析PostgreSQL中的synchronous_commit参数

本篇内容主要讲解“分析PostgreSQL中的synchronous_commit参数”，感兴趣的朋友不妨来看看。本文介绍的方法操作简单快捷，实用性强。下面就让小编来带大家学习“分析PostgreSQL中的...

99+

2022-10-18
PostgreSQL中PlannedStmt的跟踪分析

这篇文章主要为大家展示了“PostgreSQL中PlannedStmt的跟踪分析”，内容简而易懂，条理清晰，希望能够帮助大家解决疑惑，下面让小编带领大家一起研究并学习一下“PostgreSQL中Planne...

99+

2022-10-19
PostgreSQL中的ProcessRepliesIfAny函数分析

本篇内容主要讲解“PostgreSQL中的ProcessRepliesIfAny函数分析”，感兴趣的朋友不妨来看看。本文介绍的方法操作简单快捷，实用性强。下面就让小编来带大家学习“PostgreSQL中的P...

99+

2022-10-18
MySQL数据库中多表查询之内连接，外连接，子查询的示例分析

小编给大家分享一下MySQL数据库中多表查询之内连接，外连接，子查询的示例分析，相信大部分人都还不怎么了解，因此分享这篇文章给大家参考一下，希望大家阅读完这篇文章后大有收获，下面让我们一起去了解一下吧！多表...

99+

2022-10-18
Redis中连接错误的示例分析

这篇文章主要介绍Redis中连接错误的示例分析，文中介绍的非常详细，具有一定的参考价值，感兴趣的小伙伴们一定要看完！前言最近由于流量增大，redis 出现了一连串错误，比如：LOADING Redis is...

99+

2022-10-18
SQL中的连接查询实例分析

这篇文章主要讲解了“SQL中的连接查询实例分析”，文中的讲解内容简单清晰，易于学习与理解，下面请大家跟着小编的思路慢慢深入，一起来研究和学习“SQL中的连接查询实例分析”吧！Join 连接 (SQL Join)SQL J...

99+

2023-07-02
MySQL5.7数据库中表连接、子查询、外键的示例分析

小编给大家分享一下MySQL5.7数据库中表连接、子查询、外键的示例分析，希望大家阅读完这篇文章之后都有所收获，下面让我们一起去探讨吧！文章目录表连接自关联外键内连接左连接右连接子查询外键介绍创建表时设置外...

99+

2022-10-18
怎么在postgresql中查询表的大小

在postgresql中查询表大小的方法：1.启动postgresql服务；2.登录postgresql数据库；3.使用数据库；4.使用pg_relation_size函数查询；具体步骤如下：首先，在命令行中启动postgresql服务；n...

99+

2022-10-04
分析PostgreSQL中的distinct和group by

本篇内容介绍了“分析PostgreSQL中的distinct和group by”的有关知识，在实际案例的操作过程中，不少人都会遇到这样的困境，接下来就让小编带领大家学习一下如何处理这些情况吧！希望大家仔细阅...

99+

2022-10-18
分析PostgreSQL CreateFunction中的interpret_function_parameter_list函数

这篇文章主要介绍“分析PostgreSQL CreateFunction中的interpret_function_parameter_list函数”，在日常操作中，相信很多人在分析PostgreSQL Cr...

99+

2022-10-18
分析PostgreSQL CreateFunction中的ProcedureCreate函数

本篇内容介绍了“分析PostgreSQL CreateFunction中的ProcedureCreate函数”的有关知识，在实际案例的操作过程中，不少人都会遇到这样的困境，接下来就让小编带领大家学习一下如何...

99+

2022-10-18