PostgreSQL Architecture

Hello Friends,

In This post we will discuss about the postgress in details, As Process architecture , Background process etc. , So lets get started.

Origin : 

PostgreSQL (often referred to as Postgres) is an open source object-relational database management system with a focus on extensibility and standards compliance. It is a highly used and tested solution having started in 1981 as the Ingres project at the University of California. The project was later named Postgres as it became "post Ingres", and then in 1996 the online presence at the website was launched while the project was renamed PostgreSQL to reflect its support for SQL.


The first PostgreSQL release was known as version 6.0 on January 29, 1997 and since then PostgreSQL has continued to be developed by the PostgreSQL Global Development Group, a diverse group of companies and many thousands of individual contributors. As an open source solution, it is free and released under the terms of the PostgreSQL License, a permissive software license.


The PostgreSQL project continues to make major releases (approximately annually) and minor bug fix releases, all available under its free and open-source software PostgreSQL License. It can handle workloads ranging from small single-machine applications to large Internet-facing applications with many concurrent users. PostgreSQL runs on all major operating systems including Linux, UNIX variants, and Windows, and is fully ACID-compliant, with full support for foreign keys, joins, views, triggers, and stored procedures (in multiple languages).


PostgreSQL is highly scalable both in the sheer quantity of data it can manage and in the number of concurrent users it can accommodate. There are active PostgreSQL systems in production environments that manage in excess of 4 TB of data.


As an enterprise-class database, PostgreSQL boasts features such as:

·         Multi-Version Concurrency Control (MVCC)

·         point-in-time recovery

·         tablespaces

·         asynchronous replication

·         nested transactions (savepoints)

·         online/hot backups

·         sophisticated query planner/optimizer, and

·         write ahead logging for fault tolerance.

·         It also supports international character sets, multibyte character encodings, Unicode, and is locale-aware for sorting, case-sensitivity, and formatting.


PostgreSQL Memory Components


Here is a pictorial view of a PostgreSQL instance depicting the memory areas, server background processes and the underlying database files. The interesting point to note is all the memory components which are listed below are allocated in the RAM.


* Postgres is a client server architecture. Client and server can be on different or on same host. Mostly on the different host. These hosts communicate through TCP/IP based protocol.


SHARED MEMORY: When server gets start, it occupies some of the memory from the RAM. Depending upon the various parameters, x amount of memory is occupied from RAM. Whenever we start the database or DB cluster by executing the pg_ctl utility with start option, a postgres server process starts up. Then, it allocates a shared memory area in memory, starts various background processes, start replication associated processes and background worker processes if necessary, and waits for connection requests for clients. This area is used by all processes of a PostgreSQL server.


Shared Buffers (deals with DDL statements): It is always faster to read or write data in memory than on any other media. A database server also needs memory for quick access to data, whether it is READ or WRITE access. In PostgreSQL, this is referred to as "shared buffers" and is controlled by the parameter shared_buffers. The amount of RAM required by shared buffers is always locked for the PostgreSQL instance during its lifetime. The shared buffers are accessed by all the background server and user processes connecting to the database. The data that is written or modified in this location is called "dirty data" and the unit of operation being database blocks (or pages), the modified blocks are also called "dirty blocks" or "dirty pages". Subsequently, the dirty data is written to disk containing physical files to record the data in persistent location and these files are called "data files". PostgreSQL loads pages within tables and indexes from a persistent storage to here and operates them directly.


WAL Buffers (Write Ahead Log) (deals with DML statements): The write ahead log (WAL) buffers are also called "transaction log buffers", which is an amount of memory allocation for storing WAL data. This WAL data is the metadata information about changes to the actual data and is sufficient to reconstruct actual data during database recovery operations. The WAL data is written to a set of physical files in persistent location called "WAL segments" or "checkpoint segments". The WAL buffers memory allocation is controlled by the wal_buffers parameter, and it is allocated from the operating system RAM. Although this memory area is also accessible by all the background server and user processes, it is not part of the shared buffers. The WAL buffers are external to the shared buffers and are very small when compared to shared buffers. The WAL data is first modified (dirtied) in WAL buffers before it is written to WAL segments on disk. If it is left to default settings, then it is allocated with a size of 1/16th of the shared buffers.

WAL buffers are like logbook where changes are stored. If any UPDATE command occurs, the data is updated in the shared buffer and a log entry is done in the WAL buffer which includes primary ID with old and new data both. All the incremental changes are recorded in the WAL buffer. To ensure that no data has been lost by server failures, PostgreSQL supports the WAL mechanism. WAL data (also referred to as XLOG records) are transaction log in PostgreSQL; and WAL buffer is a buffering area of the WAL data before writing to a persistent storage.


CLOG Buffers: CLOG stands for "commit log", and the CLOG buffers is an area in operating system RAM dedicated to hold commit log pages. The commit log pages contain log of transaction metadata and differ from the WAL data. The commit logs have commit status of all transactions and indicate whether or not a transaction has been completed (committed). There is no specific parameter to control this area of memory. This is automatically managed by the database engine in tiny amounts. This is a shared memory component, which is accessible to all the background server and user processes of a PostgreSQL database. Commit Log(CLOG) keeps the states of all transactions (eg: in_progress, committed, aborted) for Concurrency Control (CC) mechanism.


Memory for Locks / Lock Space: This memory component is to store all heavyweight locks used by the PostgreSQL instance. These locks are shared across all the background server and user processes connecting to the database. A non-default larger setting of two database parameters namely max_locks_per_transaction and max_pred_locks_per_transaction in a way influences the size of this memory component.


PROCESS MEMORY / MEMORY ARCHITECTURE: (Local Memory Area OR Backend Process Area): This memory area is allocated by each backend process for its own use. It is a temporary used privately by each postgres process. By default each session will take a size of 4Mb. Eg.: If there are 100 sessions, then they will consume 400 Mb.


Vacuum Buffers: This is the maximum amount of memory used by each of the autovacuum worker processes, and it is controlled by the autovacuum_work_mem database parameter. The memory is allocated from the operating system RAM and is also influenced by the autovacuum_max_workers database parameter. The setting of autovacuum_work_mem should be configured carefully as autovacuum_max_workers times this memory will be allocated from the RAM. All these parameter settings only come into play when the auto vacuum daemon is enabled, otherwise, these settings have no effect on the behaviour of VACUUM when run in other contexts. This memory component is not shared by any other background server or user process.


Work Memory: This is the amount of memory reserved for either a single sort or hash table operation in a query and it is controlled by work_mem database parameter. A sort operation could be one of an ORDER BY, DISTINCT or Merge join, and a hash table operation could be due to a hash-join, hash based aggregation or an IN subquery. A single complex query may have many numbers of such sort or hash table operations, and as many chunks of memory allocations defined by the work_mem parameter will be created for each of those operations in a user connection. It is for this reason, that work_mem should not be declared to a very big value as it might lead to aggressively utilizing all the available memory from operating system for a considerably huge query, thereby starving the operating system of RAM which might be needed for other processes.

Executor uses this area for sorting tuples by ORDER BY and DISTINCT operations, and for joining tables by merge-join and hash-join operations.


Maintenance Work Memory: This is the maximum amount of memory allocation of RAM consumed for maintenance operations. A maintenance operation could be one of the VACUUM, CREATE INDEX or adding a FOREIGN KEY to a table. The setting is controlled by the maintenance_work_mem database parameter. A database session could only execute any of the above-mentioned maintenance operations at a time and a PostgreSQL instance does not normally execute many such maintenance operations concurrently. Hence, this parameter can be set significantly larger than work_mem parameter. A point of caution is to not set this memory to a very high value, which will allocate as many portions of memory allocations as defined by the autovacuum_max_workers parameter in the event of not configuring the autovacuum_work_mem parameter. Maintenance operations like VACUUM, REINDEX uses this area.


Temp Buffers: A database may have one or more temporary tables, and the data blocks (pages) of such temporary tables need a separate allocation of memory to be processed in. The temp buffers serve this purpose by utilizing a portion of RAM, defined by the temp_buffers parameter. The temp buffers are only used for access to temporary tables in a user session. There is no relation between temp buffers in memory and the temporary files that are created under the pgsql_tmp directory during large sort and hash table operations. Executor uses this area for sorting temporary tables.


BACKGROUND PROCESS: Those processes which get started when postgres server starts. Rest other processes are different & mandatory processes.

Postmaster: Postmaster is the first process which get starts when postgres cluster starts. The postmaster process acts as a supervisor. Postmaster act as a listener at a server side, any new connection coming in, it will first connect to the postmaster and every time for every connection postmaster creates "postgres" process. Postmaster runs on the default port no. 5432 and we can change or reconfigure the port no. anytime and restart the cluster. There is one postmaster for one cluster. PostgreSQL Server process or Postmaster is a parent of all in a PostgreSQL Server.

Backend/Postgres/UserEnd Process: When a new connection request comes, postmaster will create a new process called 'postgres' once your connection is authorized and authenticated. So, for every new client connection there will be new postgres process created and the client connection will directly interact with the new postgres process from onwards.

A backend process, which is also called postgres, is started by the postmaster for each client request and handles all queries issued by the one connected client.

It communicates with the client by a single TCP connection, using semaphores and shared memory and terminates when the client gets disconnected.

As it allowed to operate only one database, we must specify a database we want to use explicitly authenticated with username and password when connecting to a PostgreSQL server.

PostgreSQL allows multiple clients to connect simultaneously; the configuration parameter max_connection controls the maximum number of the clients (default is 100).

Vacuum and Autovacuum: Whenever we run an update command, system do not change the actual data, but it inserts a new data in reference to the old data and mark the row as deleted. Similarly, when we run a delete command, it will not delete the actual data, a new row is inserted in reference to that data and that data is marked as deleted. By default, autovacuum is run in every few minutes and clean all the delete marked rows from the database.

VACUUM reclaims storage occupied by dead tuples (deleted or updated rows). Analyse will collect stats for each table. VACUUM [({FULL | FREEZE | VERBOSE | ANALYZE} [, ...])] [table [(column [, ...])]]

Plain Vacuum (without FULL) simply reclaim space and can operate in parallel with normal reading and writing of the table.

Vacuum Full locks the entire table rewrites the entire contents of the table into a new file with no space, allows unused space to be returned to the OS.

Vacuum processing has been automated with the autovacuum daemon. The daemon periodically invokes several autovacuum_worker processes. By default, it wakes every 1 min (defined by autovacuum_naptime), and invokes three workers (defined by autovacuum_max_works)

Vacuum freeze marks a table's header with a very special timestamp that tells postgres that it does not need to be vacuumed, till next update.

Autovacuum Launcher: The autovacuum-worker processes are invoked for vacuum process periodically (it request to create the autovacuum workers to the postgres server).

Stats collector: It collects the statistics of the activities which are happening in the DB. Database should know where is the table present, table size, table type, cardinality of table, where are indexes of table all such kind of metadata is collected by stats collector. This stats collector process collects the data and save into the statistic tables or statistics dictionary table in the data file itself. What sessions are coming to the database, how many users logged in to the database all of these are the stats of DB.  This stats collection is carried out after some minutes or hours. We can configure the time of stats collector accordingly. We can automate or manualize the stat collection process.

Log/Background Writer: In this process, dirty pages on the shared buffer pool are written to a persistent storage (eg: HDD, SSD) on a regular basis gradually.

Wal Writer: It writes from the Wal_buffers to the actual Wal_files. This process writes and flushes periodically the WAL data on the WAL buffer to persistent storage.

Logging Collector (logger)/Logger Process: writes into the logs file. Basically, all the errors logs are recorded by this and written into the log files.

Checkpointer Process: It is responsible to triggering the checkpoints in postgres. Checkpoint process is a trigger for write process to write dirty pages to disk.

Replication Process: It is launched when we are setting up any replication of the database server.

bgworker (logical replication): Used for replication.

Statistics collector: In this process, statistics information such as for pg_stat_activity and for pg_stat_database, etc is collected.

Archiver: In this process archiving logging is executed. It copy WAL files to archive WAL.


Logical Structure: A cluster can have 'n' no. of databases. every database will contain some schemas, and each schema will contain some tables, indexes, views, functions etc. By default, schema is named as 'public'. It also contains cluster level tablespaces. Different databases can share the same tablespaces. Users are also created at cluster level. So, different users can access different databases in a cluster. So, users are not specific to database, users can login into any databases, they have login privileges on all the databases by default.








Data Files: It contains all the data required by the user for processing. Data from data files are first moved to shared buffers and then the data is processed accordingly.


WAL Files/Segments: WAL Buffers are linked with WAL Files/Segments. WAL buffer flushes all the data into WAL Segment whenever commit command occurs. Commit is a command which ensures the end of the transaction. Commit work by default after every command in PostgreSQL which can also be changed accordingly.


Archived WAL/Files: This file segment is a kind of copy of WAL Segment which stores all the archived data of WAL Segment which can be stored for a long period. WAL segment is having limited storage capacity. Once the space is exhausted in WAL Segment, the system starts replacing the old data with new data which results in the loss of old data. Eg: We are having complete backup of a X database on every Sunday. If the X database get crash on Wednesday, we can restore the data with Sunday backup and with the help of archive restore procedures, we can re-apply all the changes happened in the database from Sunday to Wednesday.

Archives are required till full backup of the database. After that we can even delete the archive in order to gain some space in the DB.


Error Log Files: Those files which contains all the error messages, warning messages, informational messages, or messages belonging to all the major thing happening to the database. All logs related to DDL and DML changes are not stored in this space. Internal errors, internal bugs entry is stored in this file which help the admin to troubleshoot the problem.

Next Post »