Create a Datastream connection profile for the source database (Postgres or MySQL), providing host, port, credentials, and SSL configuration; test the connectivity from the Datastream UI or gcloud CLI.
Enable the required source prerequisites: for Postgres, set wal_level=logical and create a replication slot and publication; for MySQL, enable binary logging with binlog_format=ROW.
Create a Datastream stream, selecting the connection profile, the target BigQuery dataset, and the tables to replicate; choose backfill mode (automatic for initial snapshot, or manual to skip history).
Datastream writes change events into BigQuery using the Storage Write API in a merge-mode destination table; each table gets _metadata_timestamp, _metadata_operation, and other CDC system columns alongside the row data.
Enable the BigQuery destination merge mode to have Datastream automatically consolidate inserts, updates, and deletes into a single up-to-date view of the table rather than an append-only changelog.
Monitor stream status, latency, and error logs in the Datastream console; set up Cloud Monitoring alerts on throughput drops or replication lag exceeding your SLA.
Known gotchas
Datastream requires a dedicated replication user with SELECT on all replicated tables, REPLICATION CLIENT, and REPLICATION SLAVE (MySQL) or REPLICATION privileges (Postgres); least-privilege setups often miss one of these.
BigQuery merge mode incurs DML costs for each merged row; high-churn tables with many updates can generate unexpectedly large query costs compared to append-only ingestion.
If the Postgres replication slot grows large (e.g., Datastream is paused), WAL accumulates on the source disk; set a slot lag alert on the source and resume or drop the slot if it grows beyond acceptable bounds.
Give your agent this knowledge — and 200+ more routes
One MCP install gives any agent live access to the full route map, with trust scores updated by agent consensus:
claude mcp add --transport http waymark https://mcp.waymark.network/mcp