Google Drive to OneStream, parallelised
Quick Facts
- Industry: DTC outdoor lifestyle group (multi-brand consumer portfolio)
- Role: Integration Lead — OneStream ingestion architect
- Timeline: ~3 months, from prototype to production cutover
- Team: Solo build, architect-reviewed; finance-ops team validating loads cycle by cycle
- Impact: First-of-kind Google Drive ingestion pipeline with no middleware, parallel file processing, and a manual fallback inside the same framework.
Overview
A DTC outdoor lifestyle group runs its monthly close on OneStream and feeds the consolidation cube from a fleet of partner brands, each delivering flat-file extracts into a shared Google Drive folder. There is no native Google Drive connector for OneStream, and the obvious alternatives (middleware brokers, custom side-services, or "just download and import manually") each introduced operational debt the platform team didn't want to own. So we built the connector ourselves, using only what OneStream provides natively. The result was a parallel, connector-free ingestion pipeline that pulls every brand's drop into the cube in a single rule invocation, with a manual upload path folded into the same framework so finance-ops never has to learn a second workflow.
This is the story of how that pipeline came together: proving you could talk to Google Drive from inside an Extensibility rule, parallelising the file work once the path was real, and threading the manual fallback through the same code paths so it isn't a second system bolted onto the side.
The Problem
The business sources its monthly numbers from a dozen partner brands, each owning their own subledger and each producing a flat-file extract in a slightly different shape. The agreed exchange was a shared Google Drive folder per brand, with the extracts dropped on a published cadence. From there OneStream had to ingest, validate, and stage the data for the consolidation cycle.
The pain points the team was living with before we started:
- No native connector. OneStream ships connectors for a handful of common file sources and database systems; Google Drive isn't one of them. The platform's data-source abstraction has no built-in awareness of OAuth-based cloud storage APIs.
- Every alternative introduced operational debt. Standing up middleware (a small integration broker that would poll Drive and push to a OneStream-readable share) meant a new service to monitor, a new credential store, a new failure surface. A custom side-service in another language meant a deployment pipeline the platform team didn't run anywhere else. The "download to a local share and import" workaround forced an analyst to babysit the load every cycle.
- Sequential ingest was slow. Even when the team did manual download-then-import, the OneStream import rule processed each file end-to-end before moving to the next. A representative monthly drop was a dozen-plus files ranging from small property updates to large transaction extracts; doing them sequentially turned what should have been a few minutes into the better part of an hour of wall-clock time.
- Manual loads lived in a different mental model. When connectivity was flaky and someone had to upload a file by hand, they did it through a separate workflow with its own buttons, validation messages, and error surfacing. Two ways to do the same thing meant two sets of muscle memory and two sets of edge cases.
The brief was tight and specific: get partner-brand data from Google Drive into the consolidation cube without standing up middleware, do it fast enough that a full monthly drop ingests in minutes not hours, and keep manual upload as a first-class path inside the same framework — not a parallel system.
Process
Phase 1 — Prove the Extensibility rule can talk to Google Drive
The first question was whether the connector-free path was actually viable. OneStream's Extensibility rule type lets you write .NET code that runs inside the platform with access to the cube, the workflow, and crucially the standard .NET HTTP stack. If you can write an OAuth flow and an HTTP client in C#, you can hit the Google Drive API; the question was whether the platform's sandboxing and rule-execution model would let you do it cleanly.
I prototyped the smallest possible thing: an Extensibility rule that authenticates against a Google service account, lists files in a known Drive folder, and writes the list to the OneStream Application Log. No ingestion, no parsing, no error handling — just "can we see the files." That worked on the first cycle, which collapsed half the risk in the project. The OAuth flow needed a service-account JSON key stored in the OneStream file system, the access-token refresh logic had to be re-entrant in case two rule invocations overlapped, and the Drive API needed paging for folders with more than a hundred files — but none of these were structural blockers, just engineering.
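The two engineering details named above, re-entrant token refresh and folder paging, can be sketched roughly as follows. The production rule is .NET; this Python sketch only illustrates the shape and is not the rule's actual code. `TokenCache`, `fetch_token`, and `get_page` are hypothetical names: `fetch_token` stands in for the service-account token exchange, and `get_page` stands in for one HTTP GET against the Drive `files.list` endpoint. The query parameters and the `nextPageToken` field are the ones the Drive v3 API uses.

```python
import threading
import time
from typing import Callable, Dict, List, Optional, Tuple


class TokenCache:
    """Caches an access token and refreshes it under a lock, so two
    overlapping rule invocations never run two refreshes at once."""

    def __init__(self,
                 fetch_token: Callable[[], Tuple[str, float]],
                 skew_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # fetch_token is a hypothetical callable returning (token, lifetime_seconds).
        self._fetch_token = fetch_token
        self._skew = skew_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        with self._lock:
            # Refresh slightly early so a token never expires mid-request.
            if self._token is None or self._clock() >= self._expires_at - self._skew:
                token, lifetime = self._fetch_token()
                self._token = token
                self._expires_at = self._clock() + lifetime
            return self._token


def list_folder(folder_id: str,
                get_page: Callable[[Dict[str, str]], dict]) -> List[dict]:
    """Collects every file in a folder by following nextPageToken."""
    params = {
        "q": f"'{folder_id}' in parents and trashed = false",
        "pageSize": "100",
        "fields": "nextPageToken, files(id, name, size)",
    }
    files: List[dict] = []
    while True:
        page = get_page(dict(params))  # one request per page
        files.extend(page.get("files", []))
        next_token = page.get("nextPageToken")
        if not next_token:
            return files
        params["pageToken"] = next_token
```

Injecting `get_page` and `fetch_token` keeps the paging and refresh logic testable without network access, which suits a "can we see the files" prototype.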
The second prototype added the read path: pull a file's bytes down from Drive, hand them to OneStream's existing flat-file parser, and let the cube load proceed. That worked too. At that point we knew the connector-free architecture was real and the project moved from "is this possible" to "is this fast enough and operationally clean enough to ship."
Phase 2 — Parallelise the ingest
With a single-file path proven, the next question was throughput. The naive implementation pulled one file from Drive, parsed it, loaded it to the staging area, then moved to the next file. For a dozen files of varying sizes that meant every file waited behind the ones ahead of it, so a single large extract could hold up a queue of small ones that would otherwise finish in seconds.
The fix had two halves. First, the rule fans out the file list into a worker pool — a bounded number of .NET tasks each handling one file end-to-end (download, parse, stage). The pool size is tuned to the platform's available threads and the Drive API's rate limit; we settled on a value that kept both saturated without tripping either. Second, the staging writes had to be safe under concurrency: each worker writes into its own scoped staging area keyed by file identity, and a final reconciliation step rolls the scoped stages into the real staging table once every worker has reported success.
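The two halves described above can be sketched as a bounded pool plus per-file scoped staging. This is a minimal Python illustration, not the .NET rule itself: `download` and `parse` are hypothetical callables standing in for the Drive read and the flat-file parser, and the default `pool_size` of 4 is a placeholder rather than the tuned production value.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Tuple

Row = dict


def ingest_one(file_id: str,
               download: Callable[[str], bytes],
               parse: Callable[[bytes], List[Row]]) -> Tuple[str, List[Row]]:
    """One worker owns one file end-to-end: download, then parse."""
    return file_id, parse(download(file_id))


def fan_out(file_ids: Iterable[str],
            download: Callable[[str], bytes],
            parse: Callable[[bytes], List[Row]],
            pool_size: int = 4) -> Dict[str, List[Row]]:
    """Runs the workers in a bounded pool and keeps each file's rows in
    its own staging slot, keyed by file identity."""
    scoped: Dict[str, List[Row]] = {}
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = [pool.submit(ingest_one, fid, download, parse)
                   for fid in file_ids]
        for future in futures:
            file_id, rows = future.result()  # re-raises a worker exception
            scoped[file_id] = rows           # written by the dispatcher only
    return scoped


def roll_up(scoped: Dict[str, List[Row]]) -> List[Row]:
    """Merges the scoped stages into one staging list in a stable order."""
    staging: List[Row] = []
    for file_id in sorted(scoped):
        staging.extend(scoped[file_id])
    return staging
```

Workers return their rows instead of writing to shared state, so the only writer to the staging structure is the dispatcher thread. That is one simple way to get the concurrency safety the paragraph asks for.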
The reconciliation step was the part that took the most care. If any worker failed mid-flight, the cycle had to either retry that file or fail cleanly without leaving partial state in the cube. We threaded a per-worker status flag through the pool, and the reconciliation step only proceeds once every file has reported a terminal status (success or terminal-fail). A file that fails is retried up to a configured retry count; only when those retries are exhausted is it marked terminal-fail and surfaced to the operator with the underlying exception attached.
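A rough sketch of that status threading, in Python for illustration (the production rule is .NET). It assumes all-or-nothing promotion: scoped rows reach the real staging table only if every file succeeded. `WorkerResult`, `run_with_retry`, and `reconcile` are hypothetical names, and `max_retries` counts retries after the first attempt.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Sequence, Tuple


class Status(Enum):
    SUCCESS = "success"
    TERMINAL_FAIL = "terminal-fail"


@dataclass
class WorkerResult:
    file_id: str
    status: Status
    attempts: int
    rows: List[dict]
    error: Optional[Exception] = None


def run_with_retry(file_id: str,
                   work: Callable[[str], List[dict]],
                   max_retries: int) -> WorkerResult:
    """Retries a failing file; only exhausted retries become terminal-fail."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return WorkerResult(file_id, Status.SUCCESS, attempts, work(file_id))
        except Exception as exc:
            if attempts > max_retries:
                return WorkerResult(file_id, Status.TERMINAL_FAIL,
                                    attempts, [], exc)


def reconcile(results: Sequence[WorkerResult],
              expected_ids: Sequence[str]
              ) -> Tuple[List[dict], List[WorkerResult]]:
    """Returns (promoted_rows, failures). Nothing is promoted unless every
    expected file reported success."""
    by_id = {r.file_id: r for r in results}
    missing = [fid for fid in expected_ids if fid not in by_id]
    if missing:
        raise RuntimeError(f"no terminal status yet for: {missing}")
    failures = [r for r in results if r.status is Status.TERMINAL_FAIL]
    if failures:
        return [], failures  # fail cleanly: no partial state promoted
    promoted: List[dict] = []
    for fid in expected_ids:
        promoted.extend(by_id[fid].rows)
    return promoted, []
```

Refusing to reconcile while any file lacks a terminal status is what keeps a half-finished cycle from leaking into the cube.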
The wall-clock result was the one the brief was asking for. Sequential ingest for a representative monthly drop took several minutes per file and roughly an hour end-to-end; parallel ingest brought the same drop in within minutes, with time-to-completion for the smaller files dropping from minutes to seconds because they were no longer queued behind the larger ones.
Phase 3 — Fold the manual fallback into the same framework
Manual upload is the part most ingestion projects get wrong. The shape of the bug is always the same: the automated path is the "real" path, the manual path is a second system bolted on for emergencies, and the two drift apart until the manual path is the source of half the cycle's bugs.
We refused to ship a second system. The Extensibility rule was already the source of truth for "what happens when a file enters the pipeline"; manual upload should just be a different way to put a file in front of the same rule. We exposed a dashboard upload control that drops a file into the same staging Drive folder the automated pipeline reads from, and re-triggers the same Extensibility rule with the same parallel worker logic. The rule cannot tell, and does not need to tell, whether the file arrived via partner-brand drop or via finance-ops clicking the upload button.
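The single-entry-point idea can be sketched in a few lines. This is an illustrative Python model, not the OneStream dashboard or rule code: `StagingFolder` is an in-memory stand-in for the shared Drive staging folder, and `run_ingest_rule`, `on_partner_drop`, and `on_dashboard_upload` are hypothetical names.

```python
from typing import Callable, Dict, List


class StagingFolder:
    """In-memory stand-in for the shared Drive staging folder."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def put(self, name: str, content: bytes) -> None:
        self._files[name] = content

    def names(self) -> List[str]:
        return sorted(self._files)

    def read(self, name: str) -> bytes:
        return self._files[name]


def run_ingest_rule(folder: StagingFolder,
                    parse: Callable[[bytes], list]) -> Dict[str, list]:
    """The one ingest path. It sees files in the folder and nothing about
    how they arrived."""
    return {name: parse(folder.read(name)) for name in folder.names()}


def on_partner_drop(folder: StagingFolder, name: str, content: bytes) -> None:
    """Automated path: a partner brand drops a file on its cadence."""
    folder.put(name, content)


def on_dashboard_upload(folder: StagingFolder, name: str, content: bytes,
                        parse: Callable[[bytes], list]) -> Dict[str, list]:
    """Manual path: put the file in the same folder, then re-trigger the
    same rule. There is no separate manual-ingest logic."""
    folder.put(name, content)
    return run_ingest_rule(folder, parse)
```

Because the manual handler does nothing except place a file and call the shared rule, any validation or retry change made to the rule applies to both paths automatically.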
The win is that all the validation, error surfacing, retry, and staging logic is shared. When connectivity is flaky and someone uploads manually, they get the same error messages, the same retry behaviour, and the same staging guarantees as the automated path. Finance-ops learned one workflow, not two.
Solution
Four components, all running inside the OneStream Extensibility rule framework with no external dependencies:
1. Extensibility-rule Google Drive client
A .NET client built on the standard HTTP stack, talking to the Google Drive API via a service-account OAuth flow. Handles token refresh, folder paging, and per-file download. The credential blob lives in OneStream's secure file system; nothing about the integration sits outside the platform. The client is intentionally narrow — it knows how to list a folder and pull a file's bytes, and that's it. Anything richer (e.g., write-back to Drive, folder management) is out of scope by design so the surface stays auditable.
2. Parallel ingest dispatcher
The dispatcher reads the file list from the Drive client, fans it out into a bounded worker pool, and supervises the workers through to terminal status. Each worker is a single .NET task that owns one file end-to-end. The dispatcher enforces the pool size, applies the per-file retry policy, and serialises the final reconciliation step so the cube only sees finalised state. This is the component that turns a sequential hour into a parallel few-minutes.
3. In-framework manual fallback
A dashboard upload control that drops an uploaded file into the same Drive staging folder the automated path reads from, then re-triggers the same Extensibility rule. There is no separate "manual ingest" code path; the rule treats partner-brand drops and analyst uploads identically once the file is in the staging folder. This is the component that keeps the second-system bug from ever taking root.
4. Error and retry surfacing
Per-worker exceptions are captured at terminal-fail time, attached to a structured load-cycle report, and surfaced through the dashboard. The same surface handles the Drive-side errors (auth failures, paging errors, rate-limit responses) and the OneStream-side errors (parse failures, validation mismatches, staging conflicts), so finance-ops sees a single error queue regardless of which layer the failure came from. Retries are configurable per-cycle without touching the rule code.
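As an illustration of the fourth component, a structured load-cycle report that puts Drive-side and OneStream-side failures into one queue might look roughly like this. It is a Python sketch under assumed field names (`layer`, `kind`, `attempts`, `cycle_id`), not the report schema the rule actually emits.

```python
from dataclasses import asdict, dataclass, field
from typing import List

DRIVE = "drive"          # auth failures, paging errors, rate-limit responses
ONESTREAM = "onestream"  # parse failures, validation mismatches, staging conflicts


@dataclass
class LoadError:
    file_id: str
    layer: str
    kind: str
    message: str
    attempts: int


@dataclass
class LoadCycleReport:
    cycle_id: str
    max_retries: int  # read per cycle, so it can change without a code change
    errors: List[LoadError] = field(default_factory=list)

    def record(self, file_id: str, layer: str,
               exc: Exception, attempts: int) -> None:
        """Captures a terminal failure with its underlying exception."""
        self.errors.append(
            LoadError(file_id, layer, type(exc).__name__, str(exc), attempts))

    def queue(self) -> List[dict]:
        """One error queue for the dashboard, regardless of layer."""
        ordered = sorted(self.errors, key=lambda e: e.file_id)
        return [asdict(e) for e in ordered]
```

Tagging each entry with its layer keeps the queue unified for finance-ops while still letting the platform team filter by where the failure originated.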
Results
| Metric | Before | After | Change |
|---|---|---|---|
| Native Google Drive support in OneStream | none | full ingest pipeline, no middleware | first-of-kind |
| Ingest time, representative monthly drop | ~1 hour sequential | a few minutes parallel | minutes per file → seconds per file in the parallel pool |
| Middleware / side-services to operate | 1+ (any alternative path) | 0 | dependency eliminated |
| Separate workflow for manual fallback | yes (two systems) | no (one framework) | unified |
| External credentials living outside OneStream | partial (depending on alternative) | none | platform-contained |
Soft outcomes:
- The monthly close stopped waiting for ingest. What used to be "kick off the load and grab a coffee" became "kick off the load and start the next task" — the wall-clock cost stopped being a budgeted block in the close calendar.
- Operational footprint stayed flat. No new service to monitor, no new credential store to rotate, no new deployment pipeline to maintain. The platform team operates one thing — OneStream — and that didn't change.
- The manual-fallback muscle memory survived. Because the manual path runs through the same rule as the automated path, the team's mental model didn't fragment. When the rare manual upload comes up, finance-ops uses the same vocabulary and the same error queue they use every cycle.
Learnings
What worked. Starting with the smallest possible Extensibility-rule prototype — list files, log result — was the move that de-risked the whole project. OneStream's Extensibility surface is more capable than most platform teams give it credit for, and the only way to know how far you can push it is to push it on the smallest possible scope first and let what you learn dictate the next prototype. The lesson generalises: when there's no native connector and the alternatives are all middleware-shaped, spend a week proving the connector-free path before you commit to a middleware-shaped architecture.
What I'd do differently. Build the retry-and-reconciliation logic into the parallel worker pool from the first day rather than retrofitting it. The first parallel cut was happy-path-only and assumed every file would succeed; threading retries and the final reconciliation through after the fact was more work than building them in from the start would have been. The transferable form of this lesson: concurrency primitives and failure-handling primitives are the same primitives, and pretending otherwise just means you write them twice.
Skill developed. Treating OneStream's Extensibility rules as a real integration runtime rather than a glue layer. Once you accept that you can run a full HTTP client, a worker pool, and an OAuth flow inside the platform, a whole class of "we need middleware" conversations becomes "we need a rule." That reframing has changed how I scope integration work since: the default question is now "can this live in the platform?" and the answer is yes more often than people expect.