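Using pg_stat_replication in PostgreSQL: a detailed guide
pg_stat_replication is a view used mainly to monitor streaming replication setups, and it is the first place to look when checking replication on a system. (Note: this article is based on PostgreSQL 10.0; in earlier versions some column names differ, for example replay_lsn was called replay_location.) The view contains the following information: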
\d pg_stat_replication
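Meaning of each column:
pid: process ID of the WAL sender process
usesysid: OID of the user logged into this WAL sender process
usename: name of that user
application_name: name of the application connected to this WAL sender
client_addr: IP address of the connected client (null for a Unix socket connection)
client_hostname: host name of the connected client, if available
client_port: TCP port the client uses for the connection (-1 for a Unix socket connection)
backend_start: time at which the client connected to this WAL sender
backend_xmin: the standby's xmin horizon as reported by hot_standby_feedback
state: current WAL sender state, for example startup, catchup, streaming or backup
sent_lsn: last WAL location sent on this connection
write_lsn: last WAL location written to disk by the standby
flush_lsn: last WAL location flushed to disk by the standby
replay_lsn: last WAL location replayed into the database on the standby
write_lag, flush_lag, replay_lag: time elapsed between flushing recent WAL locally and being notified that the standby has written, flushed or replayed it
sync_priority: priority of this standby for being chosen as the synchronous standby
sync_state: synchronous state of this standby
[Figure: 20210115092111.jpg, image not available]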
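On Linux, the process list shows that the process not only has its own role (here, wal sender) but also carries the name of the connecting user and the related network connection information. In the figure above, someone has connected to the master from 192.168.47.127 (the client_addr column of pg_stat_replication) through port 51519 (the client_port column of pg_stat_replication).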
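Bonus:
replay_lsn is the last transaction log (WAL) location replayed on the slave.
pg_current_wal_lsn() returns the current WAL write location.
pg_wal_lsn_diff() computes the difference between two WAL locations, in bytes.
So, in a high-availability setup, the following query, run on the master, reports a slave's replication lag in MB (the two '%s' placeholders stand for the slave's client_addr and application_name):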
SELECT pg_wal_lsn_diff(A.c1, replay_lsn) / (1024 * 1024) AS slave_latency_MB
FROM pg_stat_replication, pg_current_wal_lsn() AS A(c1)
WHERE client_addr = '%s' AND application_name = '%s'
ORDER BY slave_latency_MB
LIMIT 1;
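To make the arithmetic behind the query concrete, here is a minimal Python sketch of what the LSN subtraction amounts to. It assumes the usual text form of an LSN, two hexadecimal numbers separated by a slash, where the first is the high 32 bits and the second the low 32 bits of a 64-bit byte position. The function names lsn_to_bytes, wal_lsn_diff and lag_mb are hypothetical helpers for illustration, not part of PostgreSQL or of any client library.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert an LSN such as '16/B374D848' to an absolute byte position.

    Assumption: the text form is '<high 32 bits in hex>/<low 32 bits in hex>'.
    """
    high, low = lsn.strip().split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_lsn_diff(lsn_a: str, lsn_b: str) -> int:
    """Byte distance lsn_a - lsn_b, mirroring what pg_wal_lsn_diff() computes."""
    return lsn_to_bytes(lsn_a) - lsn_to_bytes(lsn_b)


def lag_mb(current_lsn: str, replay_lsn: str) -> float:
    """Replication lag in MB, same scaling as the query: bytes / (1024 * 1024)."""
    return wal_lsn_diff(current_lsn, replay_lsn) / (1024 * 1024)


if __name__ == "__main__":
    # The master has written up to 0/5000000, the slave has replayed up to 0/4000000.
    print(lag_mb("0/5000000", "0/4000000"))  # 16.0
```

This can be handy when only the raw LSN values are available, for example from a log or a monitoring export, and the lag has to be computed outside the database.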
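Supplement: an introduction to sync_state in pg_stat_replication
Synchronous replication was introduced in PostgreSQL 9.1. In the 9.x release tested below, the sync_state column of pg_stat_replication has three possible values:
sync
async
potential
They denote, respectively, the synchronous standby, an asynchronous standby, and a standby that can become the synchronous one.
These values are produced by the function pg_stat_get_wal_senders.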
[Test]
Environment:
1 primary, 3 standbys.
Configuration 1:
primary configuration (postgresql.conf)
synchronous_standby_names = 'test1,test2,test3'
standby1 configuration
primary_conninfo = 'application_name=test1 host=127.0.0.1 port=1999 user=postgres keepalives_idle=60'
standby2 configuration
primary_conninfo = 'application_name=test2 host=127.0.0.1 port=1999 user=postgres keepalives_idle=60'
standby3 configuration
primary_conninfo = 'application_name=test3 host=127.0.0.1 port=1999 user=postgres keepalives_idle=60'
Query on the primary
digoal=# select pid,application_name,client_addr,sync_state from pg_stat_replication;
 pid  | application_name | client_addr | sync_state
------+------------------+-------------+------------
 6311 | test1            | 127.0.0.1   | sync
 6321 | test2            | 127.0.0.1   | potential
 6391 | test3            | 127.0.0.1   | potential
(3 rows)
If the sync node goes down, the first potential node, in the order given by synchronous_standby_names, becomes sync.
pg_ctl stop -m fast -D /pgdata11999
digoal=# select pid,application_name,client_addr,sync_state from pg_stat_replication;
 pid  | application_name | client_addr | sync_state
------+------------------+-------------+------------
 6564 | test2            | 127.0.0.1   | sync
 6568 | test3            | 127.0.0.1   | potential
(2 rows)
Once test1 is started again, it becomes sync again and test2 goes back to potential.
pg93@db-172-16-3-33-> pg_ctl start -D /pgdata11999
server starting
digoal=# select pid,application_name,client_addr,sync_state from pg_stat_replication;
 pid  | application_name | client_addr | sync_state
------+------------------+-------------+------------
 6564 | test2            | 127.0.0.1   | potential
 6605 | test1            | 127.0.0.1   | sync
 6568 | test3            | 127.0.0.1   | potential
(3 rows)
Configuration 2:
primary configuration
synchronous_standby_names = 'test1,test2'
standby1 configuration unchanged
standby2 configuration unchanged
standby3 configuration unchanged
Query on the primary
digoal=# select pid,application_name,client_addr,sync_state from pg_stat_replication;
 pid  | application_name | client_addr | sync_state
------+------------------+-------------+------------
 6470 | test1            | 127.0.0.1   | sync
 6472 | test3            | 127.0.0.1   | async
 6474 | test2            | 127.0.0.1   | potential
(3 rows)
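test3 is now async, because it is not listed in synchronous_standby_names on the primary.
Configuration 3:
primary configuration
synchronous_standby_names = 'test1'
standby1 configuration unchanged
standby2 configuration unchanged
standby3 configuration unchanged
Query on the primary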
digoal=# select pid,application_name,client_addr,sync_state from pg_stat_replication;
 pid  | application_name | client_addr | sync_state
------+------------------+-------------+------------
 6519 | test2            | 127.0.0.1   | async
 6521 | test3            | 127.0.0.1   | async
 6523 | test1            | 127.0.0.1   | sync
(3 rows)
test2 and test3 are now async, because they are not listed in synchronous_standby_names on the primary.
Source reference: src/backend/replication/walsender.c
/*
 * Returns activity of walsenders, including pids and xlog locations sent to
 * standby servers.
 */
Datum
pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
{
... (omitted)
        /*
         * More easily understood version of standby state. This is purely
         * informational, not different from priority.
         */
        if (sync_priority[i] == 0)
            values[7] = CStringGetTextDatum("async");
        else if (i == sync_standby)
            values[7] = CStringGetTextDatum("sync");
        else
            values[7] = CStringGetTextDatum("potential");
... (omitted)
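The three-way branch in the excerpt above can be modeled in a few lines of Python, which makes it easy to replay the three test configurations. This is only a sketch under stated assumptions, not PostgreSQL's implementation: it assumes a single synchronous standby (as in the 9.x release tested here), distinct application_name values, a plain comma-separated synchronous_standby_names without the '*' wildcard, and that every connected standby is streaming normally. The names parse_standby_names and sync_states are hypothetical.

```python
def parse_standby_names(setting):
    """Split a comma-separated synchronous_standby_names value into an ordered list."""
    return [name.strip() for name in setting.split(",") if name.strip()]


def sync_states(setting, connected):
    """Return {application_name: sync_state} for the connected standbys.

    setting   -- value of synchronous_standby_names, e.g. 'test1,test2'
    connected -- application_name of each standby currently connected
    """
    names = parse_standby_names(setting)

    # Priority is the 1-based position in the list; 0 means "not listed".
    priority = {}
    for app in connected:
        priority[app] = names.index(app) + 1 if app in names else 0

    # Among listed standbys, the one with the lowest priority number is sync.
    candidates = [app for app in connected if priority[app] > 0]
    sync_standby = min(candidates, key=priority.get) if candidates else None

    # Same three-way branch as in the C excerpt.
    states = {}
    for app in connected:
        if priority[app] == 0:
            states[app] = "async"
        elif app == sync_standby:
            states[app] = "sync"
        else:
            states[app] = "potential"
    return states


if __name__ == "__main__":
    # Configuration 2 from the test: test3 is not listed, so it is async.
    print(sync_states("test1,test2", ["test1", "test3", "test2"]))
```

With configuration 1 and test1 stopped, sync_states("test1,test2,test3", ["test2", "test3"]) reports test2 as sync and test3 as potential, which matches the query output shown in the test.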