Query the snapshots table to see all committed snapshots: SELECT snapshot_id, committed_at, operation, summary FROM my_catalog.db.events.snapshots.
Query the manifests table to see manifest files for the current snapshot: SELECT path, partition_spec_id, added_data_files_count, existing_data_files_count FROM my_catalog.db.events.manifests.
Query the files table to inspect individual data files, their sizes, and record counts: SELECT file_path, file_format, record_count, file_size_in_bytes FROM my_catalog.db.events.files.
Query the history table to trace snapshot lineage and parent-child relationships: SELECT made_current_at, snapshot_id, parent_id FROM my_catalog.db.events.history.
Query the partitions table for partition-level statistics: SELECT partition, record_count, file_count FROM my_catalog.db.events.partitions ORDER BY record_count DESC.
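The parent-child relationships in the history table can be followed programmatically once the rows are fetched. The sketch below is a minimal illustration in plain Python: the `history_rows` list and its snapshot IDs are hypothetical stand-ins for (snapshot_id, parent_id) pairs returned by the history query, and `lineage` is an illustrative helper name, not part of Iceberg or Spark.

```python
from typing import List, Optional, Tuple

# Hypothetical (snapshot_id, parent_id) pairs, shaped like rows from the
# history table. The IDs are made up for illustration only.
history_rows: List[Tuple[int, Optional[int]]] = [
    (101, None),   # first commit has no parent
    (102, 101),
    (103, 102),
    (104, 102),    # a second child of 102, e.g. after a rollback
]


def lineage(rows: List[Tuple[int, Optional[int]]], snapshot_id: int) -> List[int]:
    """Return the ancestor chain for snapshot_id, newest first."""
    parent_of = {sid: pid for sid, pid in rows}
    if snapshot_id not in parent_of:
        return []
    chain: List[int] = []
    seen = set()
    current: Optional[int] = snapshot_id
    # Stop at the root, at a parent missing from the rows, or on a cycle.
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = parent_of.get(current)
    return chain


print(lineage(history_rows, 103))
```

A chain that ends at a parent ID with no row of its own usually means the parent is no longer listed, so the walk stops there rather than failing.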
Known gotchas
Snapshot-scoped metadata tables such as files, manifests, and partitions reflect the current snapshot by default; to inspect a historical snapshot's files, use Spark's time-travel syntax (Spark 3.3 or later): SELECT * FROM my_catalog.db.events.files VERSION AS OF <snapshot_id>.
The files metadata table can be expensive to query on large tables because it reads every manifest file in the snapshot; when exploring large tables, cap results with LIMIT or filter on the partition column.
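The dot-separated suffix (e.g., .snapshots, .files) is Spark's metadata table syntax; it is engine- and catalog-specific and may not work outside of Spark. Other engines may use a different form (Trino, for example, uses a quoted name such as "events$snapshots") or may not expose metadata tables at all.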
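The naming and time-travel gotchas above can be captured in a small query builder. This is a sketch under stated assumptions: `metadata_table_ref` and `spark_files_query` are hypothetical helper names, the Trino branch assumes the quoted `"table$suffix"` form used by Trino's Iceberg connector, and identifiers are interpolated without escaping, so it only suits trusted input.

```python
from typing import Optional


def metadata_table_ref(table: str, metadata_table: str, engine: str = "spark") -> str:
    """Build a metadata table reference for a dot-separated table name."""
    if engine == "spark":
        # Spark: append the metadata table as a dot-separated suffix.
        return f"{table}.{metadata_table}"
    if engine == "trino":
        # Trino (assumed form): quote the last part as "name$suffix".
        *prefix, name = table.split(".")
        return ".".join(prefix + [f'"{name}${metadata_table}"'])
    raise ValueError(f"unsupported engine: {engine}")


def spark_files_query(
    table: str,
    snapshot_id: Optional[int] = None,
    limit: Optional[int] = None,
) -> str:
    """Build a Spark SQL query against the files metadata table."""
    sql = (
        "SELECT file_path, record_count, file_size_in_bytes "
        f"FROM {metadata_table_ref(table, 'files')}"
    )
    if snapshot_id is not None:
        # Time travel to a historical snapshot instead of the current one.
        sql += f" VERSION AS OF {int(snapshot_id)}"
    if limit is not None:
        # Cap the number of returned rows while exploring.
        sql += f" LIMIT {int(limit)}"
    return sql


print(spark_files_query("my_catalog.db.events", snapshot_id=123, limit=10))
```

The resulting string could then be passed to whatever executes Spark SQL in your environment; the snapshot ID 123 is a placeholder, not a real snapshot.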