Thursday, January 24, 2019

POSTGRES DIRECTORY STRUCTURE

PostgreSQL DIRECTORY STRUCTURE

All the data needed for a database cluster is stored within the cluster's data directory, commonly referred to as PGDATA.


 pg_serial is used to track summarized information about committed serializable transactions which might still become part of a serialization failure rolling back some not-yet-committed transaction to protect data integrity.

The catalog cache is information from the system tables which describes the tables, indexes, views, etc. in the database. If you had to re-read that from the system tables each time, it would be slow. Even shared memory would be clumsy for that, so each backend process has its own cache of system catalog data for fast lookup. When anything changes, all backends are sent a signal to update or reload their cache data. When pages are read or written, they go through the OS cache, which is not directly under PostgreSQL control. The optimizer needs to keep track of a lot of information while it parses and plans a query, which is why that is shown. A plan has execution nodes, some of which may need to use memory; that is where work_mem comes in -- a sort or hash table (as examples) will try not to exceed work_mem *for that node*. It is significant that one query might use quite a few nodes which each allocate memory up to work_mem. But since most queries are simpler and might not use any work_mem allocations, people often do their calculations based on an expected maximum of one allocation per backend (i.e., per connection). But that could be off by quite a bit if all connections might be running queries with five nodes allocating memory.






No comments: