AppLogic 2.3 Beta Documentation The latest production release is AppLogic 3.0.30 PGSQL - PosgreSQL Database Appliance - Implementation
Base Class
The LUX5 appliance image is used as the base class for PGSQL appliance.
Class Volumes
PGSQL itself is not sensitive to the volume configuration and will work on most any setup. Besides the PostgreSQL binaries, it does not require any additional software to be able to start. Samba client is required for logging. The setup chosen for the 1.0 release is as follows:
- boot volume, 125MB size, mounted as "/", writeable, instantiable.
- usr volume, 175MB size, mounted as "/usr", common
The boot volume contains a basic Linux installation as the one present on the LUX5 appliance.
In summary, PGSQL's boot volume contains:
- CentOS 5, as base OS image (as installed on the LUX5 appliance)
- PostgreSQL database engine version 8.2.7, for data storage and query processing
- Samba client, for accessing files through the
log terminal
- sshd, used for logging into the component through the default interface
In addition to the class volume, PGSQL requires a user-supplied data volume that is used to store the actual database:
- data volume, user-defined size, mounted as "/mnt/data", writeable, placeholder.
Packages
- Main
-
postgresql-libs-8.2.9-1PGDG.rhel5.i386.rpm: PostgreSQL client shared libraries
-
postgresql-8.2.9-1PGDG.rhel5.i386.rpm: PostgreSQL database utilities
-
postgresql-server-8.2.9-1PGDG.rhel5.i386.rpm: PostgreSQL database engine
-
postgresql-test-8.2.9-1PGDG.rhel5.i386.rpm: PostgreSQL tests package
- Dependencies - Samba Client (used for the PostgreSQL error log)
-
samba-common-3.0.24-7.fc5.i386.rpm: samba shared libraries
-
samba-client-3.0.24-7.fc5.i386.rpm: samba client
- Extra
-
dhclient-3.0.3-26.i386.rpm: Updated dhclient package
-
logrotate-3.7.4-8.i386.rpm: Updated logrotate package
-
tcpdump-3.9.4-2.2.i386.rpm: Updated tcpdump package
PGSQL does not modify PostgreSQL despite configuration templates; PGSQL uses the PostgreSQL binary RPM installation
Contents of the release directory:
Exported appliance image:
PostgreSQL? Binary RPMs: install these in the order listed to re-create the appliance:
-
postgresql-libs-8.2.9-1PGDG.rhel5.i386.rpm
-
postgresql-8.2.9-1PGDG.rhel5.i386.rpm
-
postgresql-server-8.2.9-1PGDG.rhel5.i386.rpm
-
postgresql-test-8.2.9-1PGDG.rhel5.i386.rpm
-
samba-common-3.0.24-7.fc5.i386.rpm
-
samba-client-3.0.24-7.fc5.i386.rpm
-
dhclient-3.0.3-26.i386.rpm
-
logrotate-3.7.4-8.i386.rpm
-
tcpdump-3.9.4-2.2.i386.rpm
PGSQL automated build script: use this to automatically build PGSQL appliance
-
pgsql-build.1.0.0-01.tar.bz2
PGSQL automated test suite: use this to test PGSQL appliance
-
pgsql-test.1.0.0-01.tar.bz2
Source RPMs:
-
postgresql-8.2.9.src.tar.bz2 - full PostgreSQL sources (including regression and pgbench tests)
Theory of Operation
PostgreSQL consists of one main background daemon called
postmaster (known as the "master process"). This daemon is loaded in the background when PGSQL is started.
postmaster waits for incoming connections from clients and for each new connection, it spawns a new server process named
postgres (known as a "server task" - in reality this is the same executable as
postmaster). PostgreSQL is based on a client/server model where each client process is connected to exactly one server process. The server tasks communicate with each other using semaphores and share memory to ensure data integrity for concurrent data access.
In addition to the daemon above, there are several other processes that PostgreSQL runs in the background when
postmaster is started:
- a writer process - used to perform checkpoints, flush dirty pages to disk
- a logger process
- stats collector process
PostgreSQL clients must use the PostgreSQL protocol to communicate with the
postmaster daemon. PostgreSQL provides a "C" library named
libpq which is used by a majority of the clients (for example,
pg_regress and
pgbench used in the PGSQL tests both use this library). In addition, there are other implementations of the protocol for many other types of clients such as JDBC.
Configuration
General
PostgreSQL is configured using a file called
postgresql.conf which contains all of the PostgreSQL configuration settings. PGSQL stores modified template config files (postgresql.conf.tmpl, pg_hba.conf ) files at
/appliance/conf directory. There are multiple PGSQL properties that affect multiple PostgreSQL settings; start script recreates the main config file (postgresql.conf) using the template file and properties supplied by
AppLogic.
When PGSQL is started, it generates the configuration file and then starts the
postmaster daemon. This is the only daemon to start for PostgreSQL.
postmaster; It is started using modified /etc/init.d/postgresql script or /appliance/appliance.sh script
The table below shows the mapping of PGSQL properties to PostgreSQL settings. For any PostgreSQL setting not in the table and not mentioned in this section, PGSQL leaves it as its default value.
| PGSQL Property | PostgreSQL Setting in postgresql.conf | Notes |
auto_create | Should be enabled to create a fresh database if it does not exist at /mnt/data/pg_data; needs read_only to be disabled (look below) |
read_only | stats_start_collector autovacuum | If read only, both are off otherwise on. |
max_connections | max_connections |
query_complexity | work_mem | Configured as described below. |
sync_on_write | fsync |
log_filename | log_directory log_filename redirect_stderr | If log_filename is empty, path settings are empty strings and redirect_stderr is off; otherwise they are set accordingly. |
log_age | log_rotation_age |
log_size | log_rotation_size |
log_level | log_min_messages log_min_error_statement client_min_messages log_min_duration_statement log_connections log_disconnections log_duration | log_level setting for error or warn is propagated accordingly; debug affects all settings after client_min_messages. |
log_cmd_stats | stats_start_collector stats_command_string | If enabled, both settings are on (stats_start_collector may already be enabled if not read-only database). Required to be "on" for auto-vacuum. |
In addition to the above settings, PGSQL modifies the following PostgreSQL settings:
| !PostgreSQL Setting in postgresql.conf | Notes |
data_directory | Set to /mnt/data/pg_data |
listen_addresses | Set to 'in,localhost' (terminal on which to accept connections and local loopback address) |
shared_buffers | Calculated based on assigned memory - defined below. |
maintenance_work_mem | Calculated based on assigned memory - defined below. |
work_mem | Calculated based on assigned memory - defined below. |
effective_cache_size | Calculated based on assigned memory - defined below. |
log_statement | Set to all when log_level is set to "debug" (so all SQL statements will be logged if enabled). |
Auto-creation
When PGSQL is configured to auto-create the database (
auto_create is on,
read_only is off), it first checks whether the database exists on the configured
data volume. PostgreSQL does not provide a way to determine whether a database is valid. As such, PGSQL checks the volume for a file named
PG_VERSION (PostgreSQL version file) and base/template1 folder. If this file exists and base/template1 folder is not, PGSQL assumes there is a PostgreSQL database present on the
data volume. If the file is missing, PGSQL deletes the entire volume content and installs a brand new PostgreSQL database on the volume. If file AND folder both present, PostgreSQL appliance checks for
read_only and
auto_create properties state. It removes old files and create fresh database if allowed, or stops operating and sends 'start_failed' message to controller if not.
Shared_buffers
This is an amount of memory that the database process is using for shared buffers. The exact amount of memory is calculated based on total amount of memory available to the PGSQL. Approximate values of shared memory configured are shown below:
| Total memory: | 128Mb | 192Mb | 256Mb | 384Mb | 512Mb | 1Gb | 1.5Gb | 2Gb | 3Gb | 4Gb | 6Gb | 8Gb | 16Gb | 32Gb |
| Shared buffers: | 16Mb | 24Mb | 32Mb | 48Mb | 64Mb | 128Mb | 192Mb | 256Mb | 384Mb | 512Mb | 768Mb | 1Gb | 2Gb | 4Gb |
Work_mem
This is amount of memory that PostgreSQL? uses for every connection (sort operations, hash tables) before switching to the temporary disk files. Complex queries require more memory. Work_mem is calculated by PGSQL based on total amount of memory available, max_connections and query_complexity setting. Default setting is 1Mb and the maximum is 16Mb. This setting can be changed per-connection. Amount assigned is doubled when query_complexity is set to "complex"
Maintenance_work_mem
This setting specifies the amount of memory used for maintenance operations, like VACUUM, ANALYZE, CREATE INDEX, and adding foreign keys. Since only one of these operations can be executed at a time by a database session, and an installation normally doesn't have very many of them happening concurrently, PGSQL sets this to double of work_mem.
Effective_cache_size
This setting sets the optimizer's assumption about the effective size of the disk cache (that is, the portion of the kernel's disk cache that will be used for PostgreSQL? data files). It is set roughly to the 1/4 of the amount of memory available.
Users
PGSQL configures PostgreSQL to allow any valid postgres user ('postgres' is preconfigured and could be used here) to access the database without checking password of authenticating in any other way - locally or remotely. This is accomplished by using a
pg_hba.conf file that is set up to trust all users who connect to the database either locally within the appliance or remotely through the
in terminal (only connections through the
in terminal are accepted). The client of PGSQL has to connect under 'postgres' user, or create own roles and users granting appropriate rights to them.
pg_hba.conf should contain the following:
# TYPE DATABASE USER CIDR-ADDRESS METHOD
local all all trust
host all all 0.0.0.0/0 trust
Log Files
PostgreSQL Error Log
The user of PGSQL can configure the appliance to use a database error log. If an error log is used, PGSQL expects there to be a share named
share accessed through the
log terminal (typically exposed by file storage appliances such as NAS). PGSQL creates the mount point
/mnt/log and mounts
share to the mount point using the Linux
mount utility. The share is mounted using the samba client, assuming full access and as the user "nobody".
The database error log is used for one main purpose - to log PostgreSQL warnings, errors and debug information (debug information includes things like SQL commands executed, user connection/disconnection, etc). The logging contents are controlled through PGSQL's
log_level property. In addition, the user may control the life-cycle of the log (either by time or log size) in order to prevent the log from eating up all available diskspace. This is important as logging debug information while many users are executing queries can take up a lot of diskspace.
PGSQL Log
Besides the database error log, PGSQL also maintains its own log file in
/var/log/appliance/log. This log file contains informational messages about what happened during PGSQL start and shutdown (such as if it started in maintenance mode, created a new database, etc). This is useful if the user wants to know what PGSQL did during start-up. It is also useful when PGSQL fails to start, this log likely will contain an error message pertaining to why the appliance failed to start.
Controler log
PGSQL appliance logs some critical events to Controller log using vme tool; most significant (id=alert, severity=fatal, error or warn) are exposed to dashboard. Please refer to
PGSQL data sheet, sections "Possible startup errors" and "Dashboard messages" for more details.
Appliance Implementation Details
The PGSQL appliance contains the following files in the
/appliance folder on its boot volume. These files are used to implement the appliance:
| File | Type | Description |
appliance.sh | Bash script | Appliance main controlling script - invoked when the appliance is booted. |
conf/postgresql.conf.tmpl | Text | Preconfigured template file to replace /mnt/data/pg_data/postgresql.conf |
conf/pg_hba.conf | Text | Preconfigured file to replace /mnt/data/pg_data/pg_hba.conf |
check_disk_space.sh | Bash script | Script that checks for free space available on data volume and reports to dashboard |
feed_pgsql_statistics.sh | Bash script | Script that reports the PostgreSQL? statistics to the MON |
Important Notes
- All PostgreSQL utilities must be executed as the
postgres user: su -l postgres .
- PostgreSQL requires full access to the
/tmp folder (777).
- The
/mnt/data folder must be owned by postgres:postgres.
- NAS appliance where the log files are stored must be started before PGSQL
--
EricT - 12 Mar 2006