Why Your Snowflake Bill Is Unattributable (And What to Do About It)
Finance wants to know which team drove the 40% bill increase. You have no idea. Here's why — and how to fix it.
Picture this: Finance sends a Slack message — "Our Snowflake bill jumped 40% this month. What happened?" The data engineering lead opens a tab, stares at a flat-rate number in the Snowflake console, and realizes they have absolutely no idea which team, pipeline, or analyst is responsible.
This happens constantly. Snowflake's billing model is powerful and flexible, but cost attribution — connecting a dollar amount to a specific team, project, or query — is something you have to build yourself. And most teams either haven't built it, or built something that broke the moment the team scaled past 20 people.
The attribution problem
At the surface level, Snowflake gives you a credit consumption number per warehouse. But warehouses are shared. A single X-Large warehouse might serve your BI analysts, your dbt production jobs, your ML feature pipelines, and your data scientists' ad-hoc queries — all at the same time, with no native cost separation.
The data to do attribution exists, but it's scattered across views that each have their own limitations:
- WAREHOUSE_METERING_HISTORY tells you how many credits a warehouse consumed in a time window — but not which queries consumed them.
- QUERY_HISTORY has the queries, users, and execution times — but costs aren't directly attached. You have to calculate them by correlating query timing with warehouse metering data.
- ACCOUNT_USAGE views have a 45-minute latency. By the time data appears there, the spend has already happened.
- There's no native tagging system. Snowflake has no built-in way to attach a 'team' or 'project' label to queries automatically.
Doing attribution correctly means cross-referencing multiple views, normalizing time windows, and making assumptions about concurrent query costs — all with data that's already stale. It's not impossible, but it's genuinely hard to do accurately at scale.
What most teams do (and why it doesn't work)
When teams try to solve this, they almost always go through the same sequence of increasingly broken approaches:
The spreadsheet phase. Someone pulls warehouse metering data monthly, manually eyeballs which warehouse belongs to which team, and sends a summary to Finance. This works for maybe three months, until the team grows and warehouses start being shared across multiple teams. The spreadsheet becomes a quarterly exercise in guesswork.
The dbt model phase. A smart engineer writes a dbt model that joins QUERY_HISTORY with WAREHOUSE_METERING_HISTORY to allocate costs per user. It works great — for a few months, until a warehouse gets renamed, a new user joins without being tagged, or a one-off warehouse gets created for a migration project. The model silently produces wrong numbers, and nobody realizes it for two billing cycles.
The INFORMATION_SCHEMA phase. Someone discovers INFORMATION_SCHEMA.QUERY_HISTORY has no latency (unlike ACCOUNT_USAGE). They build a real-time dashboard. Then they hit the 10,000 row scan limit per call and the whole thing starts falling apart on busy days when query volume spikes.
Each of these approaches fails because attribution in Snowflake is a continuous, streaming problem — not a batch reporting problem. The data changes every minute, the organizational structure changes over time, and the queries you need to attribute don't announce themselves.
A better approach: automated cost attribution
What actually works is treating attribution as a first-class data pipeline, not an afterthought. Here's what that looks like in practice:
Cross-reference query execution with warehouse metering in real time. For every time window, calculate the fraction of warehouse compute consumed by each query based on execution overlap and bytes scanned. This gives you a per-query cost estimate that's far more accurate than any manual allocation.
Tag queries by user, role, and source system — automatically. Snowflake's QUERY_TAG session parameter lets you attach metadata to every query a service account or user runs. When dbt, Fivetran, or your internal Python scripts set a query tag before executing, that tag flows through to QUERY_HISTORY and becomes the attribution signal. Set it up once at the connection level and you get automatic tagging for every query downstream.
Map roles and users to teams in a separate metadata layer. Snowflake roles are a natural attribution boundary — ANALYST_TEAM_A, DATA_ENG_PLATFORM, ML_PROD. Maintain a simple mapping from role to team to cost center, and apply it to your query attribution. When people change teams, update the mapping rather than rebuilding your data model.
Generate chargeback reports automatically. Once attribution is working, the chargeback report is just a query. Filter by team, sum the attributed costs, and send a weekly digest. Finance gets the numbers they need, team leads can see their footprint, and the conversation shifts from "we have no idea" to "here's exactly where the spend came from."
Reality check
Building this pipeline correctly takes 2–4 weeks of engineering time, ongoing maintenance as your warehouse structure evolves, and careful handling of edge cases (concurrent queries, warehouse scaling events, shared service accounts). It's absolutely doable in-house — but it's also exactly the kind of work that gets deprioritized whenever there's a product sprint.
The real payoff
When attribution works, behavior changes. Teams that can see their own cost footprint start optimizing differently — data scientists start thinking about query efficiency, BI analysts notice when their dashboards are scanning the full table, and engineering leads have the data to justify infrastructure decisions. Cost visibility isn't just for Finance. It's a forcing function for engineering quality.
The teams we work with typically find that the first month of attribution data reveals two or three cost centers that nobody expected — a deprecated pipeline still running, a dashboard that auto-refreshes every 5 minutes, an analyst running full table scans for exploratory work. None of these are malicious. They're just invisible until you can see them.
Every dollar attributed to a query, pipeline, and team — automatically.
PrismCost handles the full attribution pipeline: query history ingestion, cost allocation, team mapping, and automatic chargeback reports. See your complete cost breakdown within 48 hours.
See your cost breakdown →