AppLogic 2.4 Documentation The latest production release is AppLogic 3.0.30
PGSQL64 - PosgreSQL Database Appliance - Implementation
Base Class
The LUX64 appliance image is used as the base class for PGSQL64 appliance.
Class Volumes
PGSQL64 itself is not sensitive to the volume configuration and will work on most any setup. Besides the PostgreSQL binaries, it does not require any additional software to be able to start. Samba client is required for logging. The setup chosen for the 1.0 release is as follows:
- boot volume, 125MB size, mounted as "/", writeable, instantiable.
- usr volume, 212MB size, mounted as "/usr", common
The boot volume contains a basic Linux installation as the one present on the LUX64 appliance.
In summary, PGSQL's boot volume contains:
- CentOS 5, as base OS image (as installed on the LUX64 appliance)
- PostgreSQL database engine version 8.3.3, for data storage and query processing
- Samba client, for accessing files through the
log terminal
- sshd, used for logging into the component through the default interface
In addition to the class volume, PGSQL64 requires a user-supplied data volume that is used to store the actual database:
- data volume, user-defined size, mounted as "/mnt/data", writeable, placeholder.
Packages
- Main
-
postgresql-libs-8.3.3-1PGDG.rhel5.x86_64.rpm: PostgreSQL client shared libraries
-
postgresql-8.3.3-1PGDG.rhel5.x86_64.rpm: PostgreSQL database utilities
-
postgresql-server-8.3.3-1PGDG.rhel5.x86_64.rpm: PostgreSQL database engine
-
postgresql-test-8.3.3-1PGDG.rhel5.x86_64.rpm: PostgreSQL tests package
- Dependencies - Samba Client (used for the PostgreSQL error log)
-
samba-common-3.0.28-1.el5_2.1.x86_64.rpm: samba shared libraries
-
samba-client-3.0.28-1.el5_2.1.x86_64.rpm: samba client
PGSQL64 does not modify PostgreSQL despite configuration templates; PGSQL64 uses the PostgreSQL binary RPM installation
Contents of the release directory:
Exported appliance image:
PostgreSQL? Binary RPMs: install these in the order listed to re-create the appliance:
-
postgresql-libs-8.3.3-1PGDG.rhel5.x86_64.rpm
-
postgresql-8.3.3-1PGDG.rhel5.x86_64.rpm
-
postgresql-server-8.3.3-1PGDG.rhel5.x86_64.rpm
-
postgresql-test-8.3.3-1PGDG.rhel5.x86_64.rpm
-
samba-common-3.0.28-1.el5_2.1.x86_64.rpm
-
samba-client-3.0.28-1.el5_2.1.x86_64.rpm
PGSQL64 automated build script: use this to automatically build PGSQL64 appliance
-
pgsql64-build.1.0.0-01.tar.bz2
PGSQL64 automated test suite: use this to test PGSQL64 appliance
-
pgsql64-test.1.0.0-01.tar.bz2
Source RPMs:
-
postgresql-8.3.3.src.tar.bz2 - full PostgreSQL sources (including regression and pgbench tests)
Theory of Operation
PostgreSQL consists of one main background daemon called postmaster (known as the "master process"). This daemon is loaded in the background when PGSQL64 is started. postmaster waits for incoming connections from clients and for each new connection, it spawns a new server process named postgres (known as a "server task" - in reality this is the same executable as postmaster). PostgreSQL is based on a client/server model where each client process is connected to exactly one server process. The server tasks communicate with each other using semaphores and share memory to ensure data integrity for concurrent data access.
In addition to the daemon above, there are several other processes that PostgreSQL runs in the background when postmaster is started:
- a writer process - used to perform checkpoints, flush dirty pages to disk
- a logger process
- stats collector process
PostgreSQL clients must use the PostgreSQL protocol to communicate with the postmaster daemon. PostgreSQL provides a "C" library named libpq which is used by a majority of the clients (for example, pg_regress and pgbench used in the PGSQL64 tests both use this library). In addition, there are other implementations of the protocol for many other types of clients such as JDBC.
Configuration
General
PostgreSQL is configured using a file called postgresql.conf which contains all of the PostgreSQL configuration settings. PGSQL64 stores modified template config files (postgresql.conf.tmpl, pg_hba.conf ) files at /appliance/conf directory. There are multiple PGSQL64 properties that affect multiple PostgreSQL settings; start script recreates the main config file (postgresql.conf) using the template file and properties supplied by AppLogic.
When PGSQL64 is started, it generates the configuration file and then starts the postmaster daemon. This is the only daemon to start for PostgreSQL. postmaster; It is started using modified /etc/init.d/postgresql script or /appliance/appliance.sh script
The table below shows the mapping of PGSQL64 properties to PostgreSQL settings. For any PostgreSQL setting not in the table and not mentioned in this section, PGSQL64 leaves it as its default value.
| PGSQL64 Property | PostgreSQL Setting in postgresql.conf | Notes |
auto_create | Should be enabled to create a fresh database if it does not exist at /mnt/data/pg_data; needs read_only to be disabled (look below) |
read_only | autovacuum |
max_connections | max_connections |
query_complexity | work_mem | Configured as described below. |
sync_on_write | fsync synchronous_commit full_page_writes | If read_only is off and sync_on_write is on - all are on otherwise off. |
log_filename | log_directory log_filename logging_collector | If log_filename is empty, path settings are empty strings and logging_collector is off; otherwise they are set accordingly. |
log_age | log_rotation_age |
log_size | log_rotation_size |
log_level | log_min_messages log_min_error_statement client_min_messages log_min_duration_statement log_connections log_disconnections log_duration | log_level setting for error or warn is propagated accordingly; debug affects all settings after client_min_messages. |
log_cmd_stats | track_activities | Required to be "on" for auto-vacuum. |
In addition to the above settings, PGSQL64 modifies the following PostgreSQL settings:
| !PostgreSQL Setting in postgresql.conf | Notes |
data_directory | Set to /mnt/data/pg_data |
listen_addresses | Set to 'in,localhost' (terminal on which to accept connections and local loopback address) |
shared_buffers | Calculated based on assigned memory - defined below. |
maintenance_work_mem | Calculated based on assigned memory - defined below. |
work_mem | Calculated based on assigned memory - defined below. |
effective_cache_size | Calculated based on assigned memory - defined below. |
log_statement | Set to all when log_level is set to "debug" (so all SQL statements will be logged if enabled). |
Auto-creation
When PGSQL64 is configured to auto-create the database (auto_create is on, read_only is off), it first checks whether the database exists on the configured data volume. PostgreSQL does not provide a way to determine whether a database is valid. As such, PGSQL64 checks the volume for a file named PG_VERSION (PostgreSQL version file) and base/template1 folder. If this file exists and base/template1 folder is not, PGSQL64 assumes there is a PostgreSQL database present on the data volume. If the file is missing, PGSQL64 deletes the entire volume content and installs a brand new PostgreSQL database on the volume. If file AND folder both present, PostgreSQL appliance checks for read_only and auto_create properties state. It removes old files and create fresh database if allowed, or stops operating and sends 'start_failed' message to controller if not.
Shared_buffers
This is an amount of memory that the database process is using for shared buffers. The exact amount of memory is calculated based on total amount of memory available to the PGSQL64. Approximate values of shared memory configured are shown below:
| Total memory: | 128Mb | 192Mb | 256Mb | 384Mb | 512Mb | 1Gb | 1.5Gb | 2Gb | 3Gb | 4Gb | 6Gb | 8Gb | 16Gb | 32Gb |
| Shared buffers: | 16Mb | 24Mb | 32Mb | 48Mb | 64Mb | 128Mb | 192Mb | 256Mb | 384Mb | 512Mb | 768Mb | 1Gb | 2Gb | 4Gb |
Work_mem
This is amount of memory that PostgreSQL? uses for every connection (sort operations, hash tables) before switching to the temporary disk files. Complex queries require more memory. Work_mem is calculated by PGSQL64 based on total amount of memory available, max_connections and query_complexity setting. Default setting is 1Mb and the maximum is 16Mb. This setting can be changed per-connection. Amount assigned is doubled when query_complexity is set to "complex"
Maintenance_work_mem
This setting specifies the amount of memory used for maintenance operations, like VACUUM, ANALYZE, CREATE INDEX, and adding foreign keys. Since only one of these operations can be executed at a time by a database session, and an installation normally doesn't have very many of them happening concurrently, PGSQL64 sets this to double of work_mem.
Effective_cache_size
This setting sets the optimizer's assumption about the effective size of the disk cache (that is, the portion of the kernel's disk cache that will be used for PostgreSQL? data files). It is set roughly to the 1/4 of the amount of memory available.
Users
PGSQL64 configures PostgreSQL to allow any valid postgres user ('postgres' is preconfigured and could be used here) to access the database without checking password of authenticating in any other way - locally or remotely. This is accomplished by using a pg_hba.conf file that is set up to trust all users who connect to the database either locally within the appliance or remotely through the in terminal (only connections through the in terminal are accepted). The client of PGSQL64 has to connect under 'postgres' user, or create own roles and users granting appropriate rights to them.
pg_hba.conf should contain the following:
# TYPE DATABASE USER CIDR-ADDRESS METHOD
local all all trust
host all all 0.0.0.0/0 trust
Log Files
PostgreSQL Error Log
The user of PGSQL64 can configure the appliance to use a database error log. If an error log is used, PGSQL64 expects there to be a share named share accessed through the log terminal (typically exposed by file storage appliances such as NAS). PGSQL64 creates the mount point /mnt/log and mounts share to the mount point using the Linux mount utility. The share is mounted using the samba client, assuming full access and as the user "nobody".
The database error log is used for one main purpose - to log PostgreSQL warnings, errors and debug information (debug information includes things like SQL commands executed, user connection/disconnection, etc). The logging contents are controlled through PGSQL64='s =log_level property. In addition, the user may control the life-cycle of the log (either by time or log size) in order to prevent the log from eating up all available diskspace. This is important as logging debug information while many users are executing queries can take up a lot of diskspace.
PGSQL64 Log
Besides the database error log, PGSQL64 also maintains its own log file in /var/log/appliance/log. This log file contains informational messages about what happened during PGSQL64 start and shutdown (such as if it started in maintenance mode, created a new database, etc). This is useful if the user wants to know what PGSQL64 did during start-up. It is also useful when PGSQL64 fails to start, this log likely will contain an error message pertaining to why the appliance failed to start.
Controler log
PGSQL64 appliance logs some critical events to Controller log using vme tool; most significant (id=alert, severity=fatal, error or warn) are exposed to dashboard. Please refer to PGSQL64 data sheet, sections "Possible startup errors" and "Dashboard messages" for more details.
Appliance Implementation Details
The PGSQL64 appliance contains the following files in the /appliance folder on its boot volume. These files are used to implement the appliance:
| File | Type | Description |
appliance.sh | Bash script | Appliance main controlling script - invoked when the appliance is booted. |
conf/postgresql.conf.tmpl | Text | Preconfigured template file to replace /mnt/data/pg_data/postgresql.conf |
conf/pg_hba.conf | Text | Preconfigured file to replace /mnt/data/pg_data/pg_hba.conf |
check_disk_space.sh | Bash script | Script that checks for free space available on data volume and reports to dashboard |
feed_pgsql_statistics.sh | Bash script | Script that reports the PostgreSQL? statistics to the MON |
Important Notes
- All PostgreSQL utilities must be executed as the
postgres user: su -l postgres .
- PostgreSQL requires full access to the
/tmp folder (777).
- The
/mnt/data folder must be owned by postgres:postgres.
- NAS appliance where the log files are stored must be started before
PGSQL64
Copyright © CA 2005-2011. All Rights Reserved.