> ## Documentation Index
> Fetch the complete documentation index at: https://www.aptible.com/docs/llms.txt
> Use this file to discover all available pages before exploring further.

# Replicate PostgreSQL to BigQuery with Datastream

> Configure Google Datastream to replicate changes from an Aptible PostgreSQL database to BigQuery.

This guide shows how to build a Change Data Capture (CDC) pipeline from an Aptible PostgreSQL Database to Google BigQuery using Google Datastream.

```text theme={null}
Aptible PostgreSQL -> Aptible Database Endpoint -> Google Datastream -> BigQuery
```

The Datastream connection uses a Database Endpoint with IP filtering and TLS certificate verification. PostgreSQL changes are read from the write-ahead log (WAL) using the `pgoutput` logical replication plugin.

## Prerequisites

* Administrator access to the Aptible Environment
* The [Aptible CLI](/reference/aptible-cli/overview)
* A Google Cloud project with billing enabled
* The Datastream API and BigQuery API enabled in Google Cloud
* A PostgreSQL Database on Aptible

## Step 1: Configure Aptible Network Access

Create a [Database Endpoint](/core-concepts/managed-databases/connecting-databases/database-endpoints) so Datastream can connect to your Database.

<Steps>
  <Step title="Create a Database Endpoint">
    In the Aptible Dashboard, navigate to the Database, open the **Endpoints** tab, and create a TCP Endpoint. Keep the Endpoint host and port available for the Google Datastream setup.
  </Step>

  <Step title="Get Google Datastream IP addresses">
    In Google Cloud, navigate to **Datastream** > **Connection Profiles** > **Create Profile**. Select PostgreSQL, choose the region where the stream will run, select **IP allowlisting** as the connectivity method, and copy the static IP addresses Google provides.
  </Step>

  <Step title="Allowlist Google IP addresses">
    Return to the Aptible Database Endpoint and add the Google Datastream static IP addresses to the Endpoint IP allowlist.
  </Step>
</Steps>

<Warning>
  Enable IP filtering on the Database Endpoint. Without IP filtering, the Database Endpoint is reachable from the public internet.
</Warning>

## Step 2: Configure PostgreSQL Replication

Use the default `aptible` database user to configure logical replication.

Create a local tunnel to your Database:

```bash theme={null}
aptible db:tunnel YOUR_DATABASE_HANDLE
```

Copy the connection URL from the tunnel output and connect with `psql` or another PostgreSQL client. Then run:

```sql theme={null}
CREATE PUBLICATION datastream_publication FOR ALL TABLES;

SELECT pg_create_logical_replication_slot('datastream_slot', 'pgoutput');

ALTER SYSTEM SET wal_sender_timeout = 0;

SELECT * FROM pg_replication_slots;
```

<Info>
  `wal_sender_timeout = 0` prevents idle replication connections from timing out when Datastream keeps the WAL stream open through the Aptible load balancer.
</Info>

## Step 3: Retrieve the Aptible CA Certificate

Download the Environment CA certificate so Google Datastream can verify the Database Endpoint certificate:

```bash theme={null}
aptible environment:ca_cert YOUR_ENVIRONMENT_HANDLE > aptible_ca.pem
```

You will upload `aptible_ca.pem` when creating the PostgreSQL source connection profile in Datastream.

## Step 4: Create Datastream Connection Profiles

### PostgreSQL Source

Create a PostgreSQL source connection profile in Google Datastream with these values:

* **Hostname**: the Aptible Database Endpoint hostname, such as `elb-xxx.aptible.in`
* **Port**: the Aptible Database Endpoint port
* **Username**: `aptible`
* **Password**: the Database credential password
* **Database name**: `db`, unless your Database uses a different name
* **Encryption**: Require SSL/TLS
* **Server verification**: enabled
* **CA certificate**: upload `aptible_ca.pem`
* **Connectivity**: IP allowlisting

Run the connection test before creating the profile.

### BigQuery Destination

Create a BigQuery destination connection profile in Datastream and select the Google Cloud project where replicated data should be written.

## Step 5: Create and Start the Stream

In Google Datastream, create a stream with the PostgreSQL source profile and BigQuery destination profile.

Use these source configuration values:

* **Replication slot name**: `datastream_slot`
* **Publication name**: `datastream_publication`

Choose the destination configuration and staleness limit that match your analytics requirements, then start the stream.

## Step 6: Validate Replication

After Datastream initializes and performs the initial backfill, open BigQuery and confirm the dataset and tables are present. You can run a query like:

```sql theme={null}
SELECT * FROM `your-project.your_dataset.your_table` LIMIT 100;
```

If the stream fails to connect, verify the Database Endpoint host and port, IP allowlist, TLS settings, and uploaded Aptible CA certificate.
