Setting up PGCluster (Guide)

This post is a guide on how to install, configure, and run PGCluster 1.9rc7 (a synchronous replication tool for PostgreSQL), on Centos.

It assumes that you are going to be running PGCluster on at least three distinct machines, and that you have a general understanding of its architecture. It’s key contribution is some scripts which automate the task of modifying config files for each of the three types of cluster node.

Step 1. Download, Compile, and Install

Run the following commands to download and extract the PGCluster source:

Once you’ve done this, go into the extracted directory and run the following commands (as root/with sudo):

Step 2: Create Config Files

I created a couple of scripts to generate the config files required on each cluster node. To use these, follow the instructions below (or do it manually as per this guide):

  1. Download the scripts from here.
  2. Unzip.
  3. Edit the config.cfg file with the parameters for your system. You must add your own values for the following variables:
    1. IP_ADDRESSES: IP addresses of all machines in the cluster. This is used to update PostgreSQL’s settings specifying what machines incoming connections will be allowed from.
    2. LOAD_BALANCER_NODE: The hostname (or IP address) of the node which will run the load balancer.
    3. REPLICATION_SERVER_NODE: The hostname (or IP address) of the node which will run the replication server.
    4. DATA_NODES: The hostnames (or IP addresses) of the nodes which will run data servers.
  4. Copy this config file (and the rest of the scripts) onto each node in the cluster.
  5. On the node that is to be the load balancer, run
  6. On the node that is to be replication server, run
  7. On the node(s) that are to be the data servers, run

Step 3: Start PGCluster

Before running any of the following commands, make sure that you are logged in as the database user (type ‘su postgres‘ if you aren’t).

To start each data server:

To start the replication server:

To start the load balancer:

I’ve read other instructions that say you should start the replication server before the data servers. This didn’t work for me, but if things start going wrong I’d try starting the replication server first.

To get more informative output from the replication server or load balancer add the arguments -n -v, for verbose debugging. If at any time you want to stop these processes running you can use the following commands:

Step 4: Start Querying!

Provided there were no errors during the last step, you should now be able to start using the cluster. You can send queries to the data nodes directly without breaking replication, but the system is structured so that you send queries to the load balancer.

The simplest way of testing that the system is working is to create a new database using the createdb command:

If this works you can login to the interactive console and start creating tables and issuing queries:

What to do when nothing else works…

I found this blog post on setting up PGCluster very useful, though the advice on postgresql.conf doesn’t seem to work anymore (my scripts do something different). The instructions I’ve given here follow the same approach, but I have included scripts in step 2 to automatically create the necessary config files.

The official install instructions contain a number of mistakes (specifically, in describing where config files should be placed and what config files are needed).

I came across two main sources of error when working through this:

  1. The replication server stating that it couldn’t connect to either of the data servers. This was due to an incorrectly configured pg_hba.conf file, which specifies what incoming connections are allowed.
  2. Postgresql FATAL: user “root” does not exist”. This happens if you try to run one of the services as the wrong user. You need to be logged in as the user which owns the database folder.
  3. ERROR:load_balance_main():PGR_Create_Recv_Socket failed. This happened when, after seeing the error in no. 2, I logged in as the correct user and tried starting one of the services. The service I’d tried to start as the wrong user was still running, and already listening on the PostgreSQL port.
  4. PGRget_Pgrp_Conf_Data error. This happens when the process can’t find the configuration file it needs when starting up. This will happen when you specify the wrong folder for the configuration file, and when the database user doesn’t have permissions to open the configuration file.
  5. WARNING:  This query is not permitted without running replication server. This happens when the data node has not been listed in the replication server’s configuration file.