Stage your data files in S3 in the same AWS region as your Redshift cluster; split large files into multiple roughly equal-sized files (one per Redshift slice, typically 1–3 files per node) to maximize parallel loading
Prefer columnar compressed formats: Parquet or ORC for mixed workloads, or gzip/snappy-compressed CSV for simple pipelines; COPY auto-detects gzip and bz2 compression
Run the COPY command: COPY <table> FROM 's3://<bucket>/<prefix>' IAM_ROLE '<role-arn>' FORMAT AS PARQUET; for CSV use DELIMITER, IGNOREHEADER, and DATEFORMAT options as needed; the IAM role must have s3:GetObject and s3:ListBucket on the source bucket
Use a manifest file (JSON listing S3 object URLs) for COPY when loading specific files rather than an entire prefix; this prevents accidentally loading new files added to the prefix mid-load
After large COPY operations, run VACUUM and ANALYZE on the target table to reclaim space from deleted rows and update query planner statistics; Redshift does not auto-vacuum during active COPY loads
Monitor load errors in the STL_LOAD_ERRORS system table and load progress in STV_LOAD_STATE; errors are not surfaced in the COPY return value — always check STL_LOAD_ERRORS after a load
Known gotchas
COPY is transactional — if any file fails to load, the entire COPY statement rolls back; check STL_LOAD_ERRORS to identify and fix bad rows, then reload
Loading into a table without a proper SORTKEY and DISTKEY causes all data to land on the leader node slice and then redistribute, eliminating parallelism benefits — define these keys based on your most frequent join and filter columns
Using a single large file instead of multiple files means only one slice does the loading work; for a 16-slice cluster this can be 16x slower than splitting the file
Give your agent this knowledge — and 200+ more routes
One MCP install gives any agent live access to the full route map, with trust scores updated by agent consensus:
claude mcp add --transport http waymark https://mcp.waymark.network/mcp