Metadata automation, 60× faster
Quick Facts
- Industry: AdTech / data-connectivity SaaS
- Role: Senior Consultant — OneStream platform lead
- Timeline: ~4 months, three optimisation passes
- Team: Solo build, architect-reviewed; two finance analysts validating outputs
- Impact: End-to-end metadata cycle cut from ~2 hours to under 2 minutes (60×). Manual governance retired.
Overview
The client runs a OneStream platform spanning four cubes — consolidation, planning, reporting, and a downstream data-mart feed — with dimensional complexity that grows every quarter as the business adds new properties, accounts, and partner segments. The metadata that ties those cubes together (account hierarchies, entity rollups, custom dimension properties) was maintained by hand. Each change set took an analyst the better part of a day and broke something downstream roughly once a sprint. We replaced the manual cycle with a custom automation framework, then spent the next three months making it fast: 2h → 10min → <2min, a 60× improvement, with zero correctness regressions in production.
This is the story of those three iterations: what we measured, what we changed, and which mental model unlocked the final 5× jump.
The Problem
OneStream's strength is also its trap. Each cube can have its own dimensions, rules, and rollup logic; that gives you per-domain modelling power, but it also means metadata changes have to propagate carefully or downstream consumers see partial state. On this platform a single new account had to land in the consolidation cube, the planning cube's Account dimension, the reporting cube's flattened view, and the data-mart feed's export shape — in that order, with full hierarchy fidelity at each hop.
What was happening before:
- Manual edits, ~14 touchpoints per cycle. An analyst opened the dimension editor, added the account in cube A, copy-pasted properties into cubes B and C, ran a smoke check, exported a flattened view for the data-mart, then mailed the platform team a "done" note. Anything missed surfaced as a stale rollup three days later in a finance review.
- No automated regression check. Each change relied on the analyst's memory of what to verify. "Looks right" was the test.
- Release cadence gated by the cycle. The team couldn't push more than one metadata change a week without queueing.
The brief: build an end-to-end framework that does the full lifecycle — create, propagate, validate, export — with one input file and zero human touch in the middle. The hard constraints:
- Hierarchy-aware. Parent-child relationships and rollup structures must be respected at every hop, including conditional rollups that depend on properties on the child member.
- Downstream cubes see only finalised outputs. No partial state. If iteration N fails halfway, cube N+1 must not have been touched.
- Zero breaking changes to existing business rules. The team had ~200 rules in production; we couldn't ask them to refactor any of them to accommodate our framework.
The v1 cut of the framework shipped at the four-month mark and ran the full cycle end-to-end. It was correct. It also took 2 hours and 4 minutes to complete, which was worse than the manual process for small change sets. That was the starting point for the optimisation work.
Process
Iteration 1 — 2h → ~10min (profile, then batch)
The first instinct was to instrument. OneStream's Business Rules logging gives you per-rule timing if you ask for it, and the wider Application Log captures dimension-fetch operations with timestamps. I turned both up to verbose and ran a representative change set (12 new accounts, 3 new entities, 4 property updates), then loaded the log into a spreadsheet and drew the timing histogram.
The pattern jumped out immediately: roughly 80% of wall-clock time was spent in redundant metadata fetches. The v1 framework had been written defensively — each transformation re-read the dimension state at the start of its work to "make sure" nothing had drifted underneath. With four cubes and twelve transformations per cube, that was 48 full dimension reads per cycle, against dimensions with 6,000–14,000 members each.
The fix had three moves:
- Batch the fetches. Read each cube's dimension state once at the top of the cycle, hold it in an in-memory map keyed by
cube → dimension → member, and let every transformation read from that map. Stale-state risk was traded for an end-of-cycle reconciliation: if the post-cycle state didn't match what the in-memory map predicted, the framework rolled back and re-ran the cycle from a fresh fetch. - Eliminate cross-cube round-trips inside a single dimension's transformation. Some transformations were calling out to a sibling cube to look up a property they could compute from local state with one extra rule. Inlining those saved 4–6 cross-cube hops per dimension.
- Pre-warm the export shape. The data-mart feed was being constructed row-by-row at export time. We moved the shape construction to a single pass over the in-memory map, which turned the export step from a 30-minute tail into a ~90-second one.
End of iteration 1 the cycle ran in ~10 minutes. The histogram was now flat-ish, with no single transformation dominating.
Iteration 2 — 10min → ~2min (selective skip)
Flat histograms are misleading. They tell you no single step is pathological, but they don't tell you whether you're doing necessary work. I went back to the rule logging and asked a different question: for each cascading calculation, did its downstream cascade actually change anything?
The answer was no, most of the time. A property update on a leaf member would trigger a re-roll of its parent, its grandparent, and so on up the hierarchy — but in 70%+ of cases the rolled-up totals were identical to what they had been before the update. The framework was walking the hierarchy and recomputing values that didn't need to change, because it had no way to know in advance whether a leaf-level change was material to the rollup.
The unlock was a selective-skip evaluator. Before each intermediate transformation, the framework now computes a cheap signature — a hash of the inputs that would feed into the rollup — and compares it to the signature stored from the last cycle. If they match, the transformation is skipped and its downstream cascade is short-circuited. The hash is conservative on the side of correctness: any property that could affect the rollup is included, so false negatives (incorrectly skipping a needed transformation) are not possible by construction. False positives (running a transformation when we could have skipped) are fine — they just cost time.
The skip logic also had to interact with the multi-cube export. We needed the guarantee that downstream cubes never see partial state, which meant the export shaper had to be aware that some transformations had been skipped and treat their outputs as "stable from last cycle" rather than "needs recomputation." We added a freshness flag on each transformation's output in the in-memory map, and the export shaper now only re-emits rows whose freshness changed.
End of iteration 2: ~2 minutes for the representative change set. The framework was now doing the minimum work the change-set demanded, plus the unavoidable fixed cost of the four cube traversals.
Iteration 3 — <2min, steady state (align with hierarchy traversal order)
The remaining ~2 minutes was almost entirely the four cube traversals, in serial. They had to be serial because cube B's transformations consumed cube A's outputs, and so on. But within each cube, the order in which we walked the dimension's members affected whether the rollup engine had to recompute parents before children or after.
OneStream's rollup engine is happiest when you walk a dimension in post-order traversal — children first, parents last — so that by the time the engine sees a parent it already has the finalised child totals in cache. The v1 framework had been walking in alphabetical order, which is fine for correctness but forces the engine to compute parents speculatively and revise when children arrive.
Iteration 3 swapped the walk order to post-order on each dimension and aligned the export shaper to emit in the same order. That cut the per-cube wall-clock by another 20–30 seconds and brought the steady-state cycle to under 2 minutes for typical change sets.
We also added a final correctness check: an automated regression diff that byte-compares the export output against the previous cycle's output, flagging any row that changed unexpectedly. This was the one piece I'd add earlier if I did it again — having it in iteration 1 would have saved a week of manual spot-checking.
Solution
The framework that came out of those three iterations has four components:
1. Dimension-aware fetch layer
Reads each cube's dimension state once at the start of the cycle into a typed in-memory map. Exposes a get(cube, dim, member) interface that every transformation uses instead of touching the platform directly. Handles the reconciliation pass at end-of-cycle and rolls back to a fresh read if the predicted state and observed state diverge.
2. Hierarchy-skip evaluator
Computes a conservative input signature for each transformation, compares against the previous cycle's signature, and decides skip-vs-run. Maintains the per-transformation freshness flag that the export shaper consumes downstream. Designed to be unable to produce false negatives — the worst case is wasted work, never incorrect output.
3. Export shaper
Walks each cube's dimensions in post-order, consumes the in-memory map (filtered by the freshness flags), and emits the data-mart feed in a single pass. The post-order walk is what aligned with OneStream's native rollup behaviour and gave us the iteration-3 speedup.
4. Safe-mode + regression diff
If any unexpected hierarchy structure shows up — a member with no parent that should have one, a circular reference, a property that fails the type guard — the framework short-circuits to safe-mode: rolls back the in-memory map, emits a partial-state warning to the platform team's mailbox, and exits with a non-zero status so the calling automation knows not to proceed. The byte-level regression diff runs at the end of every successful cycle and surfaces any unexpected row deltas.
Everything's wrapped in an Extender rule type so there's no external middleware dependency — the framework runs entirely within OneStream and can be triggered by the platform's native scheduling or by an analyst clicking a dashboard button.
Results
| Metric | Before (manual) | v1 (correct) | v3 (steady state) | Change |
|---|---|---|---|---|
| End-to-end runtime | ~6h (analyst day) | 2h 04min | <2min | 60× vs v1, ~180× vs manual |
| Manual touchpoints per cycle | ~14 | 2 (input + review) | 0 (input-only) | full automation |
| Cubes refreshed per cycle | 4 | 4 | 4 | unchanged — correctness preserved |
| Failure-mode rollbacks (last quarter) | n/a | n/a | 2 (both safe-handled, zero downstream impact) | predictable |
| Stale-rollup tickets / sprint | ~1 | 0 | 0 | error class eliminated |
Soft outcomes:
- Release cadence unblocked. The team can now ship multiple metadata change sets a day, which made the platform's quarterly close materially calmer.
- Governance scales as dimensional complexity grows. Adding new dimensions costs configuration, not framework rework.
- The "stale parent hierarchy in cube N" error class disappeared from the support queue. It was the third-most-common ticket category before; six months in, it has not recurred.
Learnings
What worked. Profiling each iteration before optimising kept us from chasing the wrong bottleneck. The selective-skip pattern — compute a cheap signature, short-circuit when nothing material has changed — is the most portable mental model from this project. It generalises to any multi-cube perf problem I've hit since, and it's now part of how I scope similar engagements during pre-sales.
What I'd do differently. Stand up the automated regression diff in iteration 1, not iteration 3. The week I spent manually spot-checking outputs across the first two iterations was the single biggest time sink, and it gave me less confidence than the diff did once it existed. The lesson: write the test that proves correctness before you write the optimisation that makes correctness scary.
Skill developed. Reading OneStream's internal rule timing logs as a profiler rather than a debugger. The platform exposes more performance signal than its docs advertise; if you turn on the right log levels and post-process the output, you can build a flame-graph-equivalent without any external tooling. That technique has paid for itself on every performance engagement since.