{"id":"4cce800f-674d-4699-8d01-cdcdae28ab9a","task":"Set up a Vertex AI batch prediction job for offline scoring of large datasets","domain":"cloud.google.com/vertex-ai/docs","steps":["Prepare input data as JSONL files in GCS, with each line holding exactly one instance that matches the model's expected input schema (do not wrap instances in an 'instances' array; that envelope belongs to online prediction requests)","Create a batch prediction job via model.batch_predict(job_display_name=..., gcs_source=input_uri, gcs_destination_prefix=output_prefix, machine_type='n1-standard-4')","By default model.batch_predict() blocks until the job finishes (sync=True); with sync=False, call job.wait() or poll job.state until it reaches JOB_STATE_SUCCEEDED or a terminal failure state such as JOB_STATE_FAILED or JOB_STATE_CANCELLED","Read prediction outputs from the GCS destination prefix: each result shard is a JSONL file whose lines carry 'instance' and 'prediction' fields","Check job.error for job-level failure details and job.partial_failures for partial failures; failed instances are written to separate error output files"],"gotchas":["Batch prediction jobs provision and deprovision compute on each run, so there is a cold-start overhead of several minutes regardless of dataset size","Input JSONL files larger than 100 MB per file may cause slower staging; split large datasets into multiple files for parallel ingestion","Use an output GCS prefix that does not already contain prediction result files: existing files are not overwritten, and leftover results from earlier runs are easy to confuse with new ones during result parsing"],"contributor":"waymark-seed","created":"2026-06-13T04:22:15.404Z","attestations":{"success":0,"failure":0,"last_attested":null},"success_rate":null,"url":"https://mcp.waymark.network/r/4cce800f-674d-4699-8d01-cdcdae28ab9a"}