Install soda-core along with the warehouse-specific package (e.g., soda-core-snowflake, soda-core-bigquery); create a configuration YAML file with your data source connection details and optionally a Soda Cloud API key pair
Write checks in a SodaCL YAML file under a 'checks for <dataset_name>:' key, using built-in metrics such as 'row_count > 0', 'missing_percent(email) < 1%', or 'duplicate_count(order_id) = 0'; SodaCL supports over 25 built-in metrics and custom SQL expressions
Invoke the scan programmatically using the Python API ('from soda.scan import Scan'): instantiate a Scan object, call scan.set_data_source_name(), scan.add_configuration_yaml_file(), scan.add_sodacl_yaml_file(), then scan.execute()
Inspect results via scan.get_scan_results(), which returns a dict, or call scan.assert_no_checks_fail(), which raises AssertionError if any check fails; use this in CI/CD pipelines to block deployments on data quality failures
To push results to Soda Cloud for trend tracking, add a soda_cloud block containing api_key_id and api_key_secret to the configuration YAML; Soda Cloud stores historical scan results and supports alert notifications
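The steps above can be sketched end to end as follows. This is a minimal sketch, assuming soda-core 3.x and an existing configuration YAML; the dataset name "orders", the helper names write_checks and run_scan, and every path or data source name passed in are hypothetical. The Scan method names are those understood to exist in soda-core 3.x and should be checked against the installed version.

```python
from pathlib import Path

# SodaCL checks from step 2; the dataset name "orders" is a placeholder.
CHECKS_YAML = """\
checks for orders:
  - row_count > 0
  - missing_percent(email) < 1%
  - duplicate_count(order_id) = 0
"""


def write_checks(path: str) -> Path:
    """Write the SodaCL checks to disk and return the resulting path."""
    target = Path(path)
    target.write_text(CHECKS_YAML, encoding="utf-8")
    return target


def run_scan(data_source: str, config_path: str, checks_path: str) -> dict:
    """Run a scan and return its results; raises if any check fails."""
    # Imported lazily so this module still loads where soda-core is absent.
    from soda.scan import Scan

    scan = Scan()
    scan.set_data_source_name(data_source)
    scan.add_configuration_yaml_file(file_path=config_path)
    scan.add_sodacl_yaml_file(checks_path)
    scan.execute()
    # Assumption: in soda-core 3.x this raises AssertionError on failed checks.
    scan.assert_no_checks_fail()
    return scan.get_scan_results()
```

In a CI job, a call such as run_scan("my_warehouse", "configuration.yml", "checks.yml") would fail the job through the raised exception; all three arguments here are placeholders.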
Known gotchas
Soda Core 3.0 introduced breaking changes from the 2.x releases, including new CLI commands and a restructured Python API; scripts written for 2.x will not work without modification
The scan.execute() call is synchronous and blocks until completion; for large tables the scan can take several minutes, so run it in a separate thread or async context if you need non-blocking behavior in an orchestration DAG
SodaCL check thresholds use a specific comparison syntax (e.g., 'missing_count(col) < 5' not '< 5 missing_count'); incorrect syntax produces a parse error at scan.execute() time, not at YAML load time
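The blocking behavior of scan.execute() noted above can be worked around with a worker thread. The following is a minimal sketch using only the standard library; run_in_background and fake_scan are hypothetical names, and fake_scan merely stands in for a function that builds a Scan and calls scan.execute(). Its return value of 0 is an illustrative success code, not a documented Soda value.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable

# A single worker keeps scans from running concurrently against the warehouse.
_executor = ThreadPoolExecutor(max_workers=1)


def run_in_background(blocking_scan: Callable[[], int]) -> Future:
    """Submit a blocking scan function and return immediately with a Future."""
    return _executor.submit(blocking_scan)


def fake_scan() -> int:
    """Hypothetical stand-in for a function that wraps scan.execute()."""
    time.sleep(0.1)  # simulates a slow scan
    return 0  # illustrative success code


if __name__ == "__main__":
    future = run_in_background(fake_scan)
    # The caller is free to do other work here while the scan runs.
    exit_code = future.result(timeout=5)  # blocks only when the result is needed
    print(exit_code)
```

Any exception raised inside the scan function, including a failed assertion, surfaces when future.result() is called, so error handling stays at the point where the result is consumed.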