Choose the Record-Level Index (RLI): set hoodie.index.type=RECORD_INDEX in your Hudi write configuration. RLI stores a mapping from record key to file group location in a dedicated metadata table partition, so a key lookup is a direct index read rather than a scan across the files of every partition
Enable the Hudi metadata table as a prerequisite: set hoodie.metadata.enable=true and hoodie.metadata.record.index.enable=true. For an existing table, the record index is bootstrapped either on the first write with these settings or through a separate metadata initialization job
On the first write with RLI enabled, Hudi initializes the record index by scanning all existing data files to build the key-to-location mapping; this is a one-time cost, so allow extra time for large existing tables
Verify RLI is active by checking the .hoodie/metadata directory under the table base path for a record_index partition; subsequent upserts should show reduced index lookup time in the write metrics (for example, the hoodie_write_*_lookup_duration metrics, if you emit them to your metrics system)
For Spark, ensure the Hudi Spark bundle version supports RLI (introduced in Hudi 0.14.0); earlier index types such as BLOOM and SIMPLE remain available for compatibility but have higher per-file scanning costs on large tables
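The steps above can be sketched as a small Python helper that assembles the write options and checks for the record_index partition. This is a minimal sketch, not a definitive setup: the table name, key field, precombine field, and paths are hypothetical placeholders, and the directory check assumes the table lives on a local filesystem (object stores need their own listing call).

```python
from pathlib import Path


def rli_write_options(table_name: str, record_key: str, precombine_field: str) -> dict:
    """Build Hudi write options that turn on the Record-Level Index."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.operation": "upsert",
        # Prerequisite: the metadata table hosts the record index partition.
        "hoodie.metadata.enable": "true",
        "hoodie.metadata.record.index.enable": "true",
        # Use the record index for key lookups on the write path.
        "hoodie.index.type": "RECORD_INDEX",
    }


def has_record_index(table_path: str) -> bool:
    """Return True if a local table path contains the record_index partition."""
    return (Path(table_path) / ".hoodie" / "metadata" / "record_index").is_dir()


# Hypothetical table and field names, for illustration only.
opts = rli_write_options("orders", "order_id", "updated_at")

# With a Spark session and a Hudi 0.14.0 or later bundle on the classpath,
# the options would be passed to the writer roughly like this:
#   df.write.format("hudi").options(**opts).mode("append").save("/data/hudi/orders")
#   has_record_index("/data/hudi/orders")
```

Keeping the options in one function makes it harder to enable hoodie.index.type=RECORD_INDEX while forgetting one of the two metadata settings it depends on.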
Known gotchas
RLI increases metadata table size in proportion to total record count; for very large tables (billions of records), the metadata table itself needs meaningful storage and read capacity, so plan for it
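If the metadata table becomes inconsistent (e.g., due to a failed write), RLI lookups may return stale file locations, causing missed upserts or duplicate inserts; check consistency with Hudi's metadata validation tooling (for example, the HoodieMetadataTableValidator utility in hudi-utilities, or the metadata commands in the Hudi CLI) and rebuild the record index if it has drifted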
hoodie.index.type takes a single value, so RLI cannot be combined with another index type such as BLOOM or GLOBAL_BLOOM on the same table; choose one index type for the table's lifetime and avoid switching after the table is populated
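For the sizing gotcha above, a back-of-envelope estimate can help with capacity planning. The sketch below simply multiplies record count by an assumed per-entry footprint. The bytes_per_entry value is a hypothetical input, not a figure published by Hudi; measure it on a sample of your own table, since it depends on key length and compression.

```python
def estimate_record_index_bytes(total_records: int, bytes_per_entry: float) -> float:
    """Rough record index size: one entry per record key in the table."""
    if total_records < 0:
        raise ValueError("total_records must be non-negative")
    if bytes_per_entry <= 0:
        raise ValueError("bytes_per_entry must be positive")
    return total_records * bytes_per_entry


def to_gib(num_bytes: float) -> float:
    """Convert bytes to GiB for readability."""
    return num_bytes / (1024 ** 3)


# Illustrative inputs only: 2 billion records and an assumed 50 bytes per entry.
estimate_gib = to_gib(estimate_record_index_bytes(2_000_000_000, 50))
print(f"estimated record index size: {estimate_gib:.1f} GiB")
```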