We'll also explain acronyms and terms so the whole picture is clear.
What "Good" Looks Like: The Target State
A target state is the picture of what your system, people, and processes should look like when everything is working well. It's like drawing the end goal before starting the journey.
1. Architecture (How the system is built)
When we talk about a target state, we need to start with the foundation: the architecture. Architecture defines how all the pieces of technology fit together, how data flows, and how security and governance are enforced. Without a strong foundation, even the best teams will struggle, because problems like inconsistent access, unreliable performance, or ballooning costs will slow everything down. By clearly defining the architectural components—such as Databricks Lakehouse, Google Cloud Platform, and Unity Catalog—we set the stage for a system that is both scalable and trustworthy. This section is included in the target state definition because it ensures that every decision about people, processes, and tools rests on a solid, secure, and future-proof technical base.
Key Components:
- Databricks Lakehouse: A combination of a data lake (cheap storage for raw data) and a data warehouse (fast queries). It allows you to store all kinds of data—structured, semi-structured, or unstructured—and analyze it in one place.
- Google Cloud Platform (GCP): A cloud provider offering storage, compute power, and security. Databricks runs on top of GCP so you can take advantage of both.
- Unity Catalog (UC): Databricks' governance layer. "Governance" means controlling who can access which data, tracking data lineage (where it came from), and making sure data is secure.
- External Locations: Controlled gateways that point to your actual files in Google Cloud Storage. They ensure access rules are enforced.
- Private Service Connect (PSC): A GCP feature that keeps network traffic private so it doesn't travel over the public internet. This is critical for regulated industries like banking and healthcare.
Why this matters
Without a well-designed architecture, you risk data leaks, messy permissions, or projects that don't scale. A clean, governed architecture makes it easier to trust your data and pass audits.
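To make these components concrete, here is a minimal sketch of how an external location and an access grant look in practice, run from a Databricks notebook or SQL warehouse. The bucket, credential, and group names are illustrative assumptions, not prescriptions.

```python
# A minimal governance sketch (all names are illustrative).
# `spark` is the SparkSession that Databricks notebooks provide.

# An external location is the governed gateway to files in Google Cloud Storage.
spark.sql("""
    CREATE EXTERNAL LOCATION IF NOT EXISTS finance_raw
    URL 'gs://example-bucket/finance/raw'
    WITH (STORAGE CREDENTIAL example_gcp_credential)
""")

# Unity Catalog enforces who may read files through that gateway.
spark.sql("GRANT READ FILES ON EXTERNAL LOCATION finance_raw TO `data-engineers`")
```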
2. CI/CD (Continuous Integration / Continuous Delivery)
After defining the architecture, the next part of a solid target state is how changes are delivered safely and consistently—this is where CI/CD comes in. In any modern data platform, teams make constant updates: new pipelines, transformations, dashboards, and governance rules. Without a structured way to test and deploy these updates, every change risks breaking production or slowing down delivery. CI/CD (Continuous Integration and Continuous Delivery) gives you that safety net by automating testing, packaging, and deployment. By including this section in the target state, we highlight the importance of having reliable pipelines, repeatable processes, and secure automation tools. This ensures that the platform doesn't just work once—it keeps working as the system grows, new teams join, and the business demands faster, more frequent releases.
Key Concepts:
- CI/CD means packaging, testing, and deploying changes in a safe, repeatable way.
- Continuous Integration (CI): Every change a developer commits is automatically built and tested (see the sketch after this list).
- Continuous Delivery (CD): Tested code is automatically packaged and deployed to environments like dev, test, and production.
Tools to use:
- Databricks Asset Bundles (DABs): Project bundles that package code and environment settings together, so what works in dev also works in prod.
- GitHub Actions or Cloud Build: Services that run pipelines, checking code quality, running tests, and deploying bundles.
- Terraform: A tool that treats infrastructure (servers, databases, permissions) as code.
- Workload Identity Federation (WIF): Lets GitHub pipelines authenticate to GCP with short-lived tokens instead of stored keys or passwords.
- Service Principals and OAuth: Non-human "robot" accounts and token-based authentication for secure automation.
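To show what CI looks like day to day, here is a minimal sketch of a unit test that a GitHub Actions or Cloud Build pipeline could run on every commit. The transformation being tested, `add_ingestion_date`, is a hypothetical example, not part of any library.

```python
# test_transforms.py -- the kind of test CI runs automatically on each commit.
from datetime import date

import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


def add_ingestion_date(df):
    """Hypothetical transform: stamp each row with the date it was ingested."""
    return df.withColumn("ingestion_date", F.current_date())


@pytest.fixture(scope="session")
def spark():
    # A small local session is enough for unit tests; no cluster needed.
    return SparkSession.builder.master("local[1]").getOrCreate()


def test_add_ingestion_date(spark):
    df = spark.createDataFrame([("order-1", 42.0)], ["order_id", "amount"])
    row = add_ingestion_date(df).first()
    assert row["ingestion_date"] == date.today()
```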
Why this matters
CI/CD ensures every deployment is consistent, tested, and reversible. Without it, teams may break production or spend days debugging.
3. Operations and Governance
The third pillar of the target state is operations and governance—the guardrails that keep the platform secure, cost-effective, and compliant. Even with the best architecture and CI/CD in place, things can quickly go off the rails if usage isn't monitored, permissions aren't enforced, or secrets aren't managed properly. Operations and governance provide visibility into how the system is used, protect against unnecessary spending, and safeguard sensitive information. By including this section in the target state, we make sure the platform isn't just powerful and efficient, but also controlled, auditable, and resilient enough to handle growth and regulatory demands without unpleasant surprises.
Why this matters
Without usage monitoring, enforced permissions, and proper secret management, costs balloon, access drifts, and audits become painful. Strong operations and governance keep the platform secure, cost-effective, and compliant as it grows.
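As one concrete example, Databricks system tables let you watch spend directly with SQL. This is a minimal sketch; `system.billing.usage` is a real system table, while the seven-day window is just an illustrative choice.

```python
# Daily DBU consumption for the last week, straight from system tables.
recent_spend = spark.sql("""
    SELECT usage_date, SUM(usage_quantity) AS dbus
    FROM system.billing.usage
    WHERE usage_date >= date_sub(current_date(), 7)
    GROUP BY usage_date
    ORDER BY usage_date
""")
recent_spend.show()
```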
4. Process and Way of Working
The final piece of the target state is the process and way of working—how people actually collaborate to deliver value. Technology alone won't guarantee success; the way teams are structured and coordinated makes the difference between smooth delivery and constant fire drills. By combining the Scaled Agile Framework (SAFe) with concepts from Team Topologies, you create both alignment and flexibility. SAFe ensures large groups move in sync by defining clear leadership and coordination roles like the System Architect and Release Train Engineer (RTE). Team Topologies complements this by explaining how teams should be shaped—whether as stream-aligned squads that own a product end-to-end, a platform team that provides shared services, an enabling team that teaches new skills, or a complicated subsystem team that tackles specialized challenges. Including this section in the target state makes sure people know their responsibilities, how they interact with other teams, and how work flows from idea to production—removing ambiguity and accelerating delivery.
Why this matters
- Stream-aligned squads keep responsibility close to the business, ensuring data products deliver value.
- Enabling teams prevent squads from being blocked while learning new tools.
- Complicated subsystem teams concentrate rare expertise so other teams stay focused.
Org Topology: How Teams Should Be Organized
Once the target state is clear, the next step is to decide how teams should be organized to bring it to life. This is what we call the org topology—the structure of teams, their sizes, and their responsibilities. Good technology and processes will only succeed if the right people are in the right places, working in well-defined groups. By outlining the org topology, we show how platform specialists, business-aligned squads, short-term enablers, and experts on complex systems can all fit together. This section is included because it translates theory into practice: it explains who will run the platform, who will deliver business value, who will coach and support, and who will handle specialized challenges. A clear team structure removes overlaps, avoids gaps in responsibility, and makes sure every part of the system—technical and organizational—has an owner.
1. Platform (Lakehouse Platform) Team
At the heart of the org topology is the Platform (Lakehouse Platform) Team. This group acts as the backbone of the entire data environment. With about 5–7 people, the team's mission is to build and maintain the shared Databricks and GCP foundation that all other squads depend on. They manage critical pieces like Terraform code for infrastructure, CI/CD templates that standardize deployments, and security measures such as secret management and access controls. They also keep an eye on costs through dashboards and guardrails. We include this team in the design because, without it, every data squad would be forced to solve the same problems repeatedly—wasting time, duplicating effort, and risking inconsistency. The platform team ensures that best practices are baked in once and reused everywhere, giving product squads a solid, reliable base to build on.
Size
5–7 people
Responsibilities
- Build and maintain Databricks + GCP platform
- Own Terraform code and CI/CD templates
- Manage secrets, security, and cost dashboards
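As a small illustration of the secrets piece, product squads read platform-managed secrets through Databricks secret scopes rather than hard-coding credentials. The scope and key names below are assumptions.

```python
# dbutils is available as a built-in inside Databricks notebooks.
# Scope and key names are illustrative; the platform team owns them.
db_password = dbutils.secrets.get(scope="platform-shared", key="postgres-password")

# Databricks redacts secret values in notebook output, so they never
# leak into logs -- printing shows "[REDACTED]".
```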
2. Stream-Aligned Data Product Squads
The next key building block is the Stream-Aligned Data Product Squad. Each squad, usually 5–8 people, is focused on a single business domain such as finance, marketing, or operations. Their role is to handle the full flow of data for that area—ingesting raw data, transforming it into usable formats, and serving it to analysts or applications. A Product Owner manages the backlog, making sure the team is always working on the highest-value tasks for their business stakeholders. This structure is important because it keeps responsibility and ownership close to the business, rather than separating technology from real-world needs. By aligning squads to domains, you ensure that data products are not just technically correct, but also valuable, timely, and relevant to the people who rely on them.
Size
5–8 people per squad
Responsibilities
- Own ingestion, transformation, and serving for one business domain
- Manage backlog with Product Owner
- Focus on domains like marketing or finance
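A minimal sketch of that ingest-transform-serve flow, with illustrative paths and table names, might look like this:

```python
from pyspark.sql import functions as F

# Ingest: land raw JSON from the governed external location into bronze.
raw = spark.read.json("gs://example-bucket/finance/raw/orders/")
raw.write.mode("append").saveAsTable("finance.bronze.orders")

# Transform: clean and conform the data into a silver table for serving.
silver = (
    spark.table("finance.bronze.orders")
    .where(F.col("order_id").isNotNull())
    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
)
silver.write.mode("overwrite").saveAsTable("finance.silver.orders")
```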
3. Enabling Team
The Enabling Team plays a short-term but critical role in helping the organization adopt new practices. Usually made up of 2–3 people, this team's main job is to coach and guide the stream-aligned squads on specific skills—such as setting up testing frameworks, adopting CI/CD pipelines, or implementing data quality checks. They don't own long-term delivery; instead, they transfer knowledge and then step aside once the squads are self-sufficient. Including an enabling team in the org topology is important because it prevents delivery squads from getting stuck or slowed down while trying to learn new tools on their own. By accelerating adoption of best practices, enabling teams raise the maturity of the entire organization without creating permanent overhead.
Size
2–3 people, temporary
Responsibilities
- Teach squads new practices like testing frameworks
- Coach on CI/CD pipeline adoption
- Transfer knowledge and step aside
4. Complicated Subsystem Team (Optional)
The Complicated Subsystem Team is an optional but highly valuable part of the organization when specialized challenges arise. This team takes on problems that require deep, rare expertise, such as building real-time streaming pipelines with tools like Google Pub/Sub or managing advanced change data capture (CDC) processes. These tasks are often too complex for stream-aligned squads to handle on top of their regular delivery work. By concentrating experts in one place, you ensure that tough technical issues are solved efficiently without distracting other teams from their core responsibilities. Including this team in the org topology gives the organization the flexibility to tackle specialized, high-stakes problems while allowing product squads to stay focused on delivering consistent business value.
Responsibilities
- Handle tough problems like real-time streaming with Pub/Sub
- Manage advanced change data capture (CDC) processes
- Concentrate rare expertise for specialized challenges
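To give a flavor of the CDC work such a team owns, here is a minimal sketch using Delta Lake's change data feed, one common CDC mechanism on Databricks. The table name is an assumption, and the table must have `delta.enableChangeDataFeed` enabled.

```python
# Read row-level changes (inserts, updates, deletes) from a Delta table.
changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 1)
    .table("finance.silver.orders")
)

# _change_type and _commit_version are metadata columns the change
# data feed adds, so downstream consumers can replay changes in order.
changes.select("order_id", "_change_type", "_commit_version").show()
```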
SAFe Roles Recap
To tie the org topology together, it's important to recap the SAFe (Scaled Agile Framework) roles that provide structure and alignment across all teams. These roles ensure that while each squad has autonomy, the larger program moves forward in a coordinated way. The System Architect sets the long-term technical direction and enforces consistent standards, making sure the architecture runway is clear for future work. The Release Train Engineer (RTE) acts as the master facilitator, keeping multiple teams in sync, coordinating release schedules, and helping manage dependencies. The Product Owner (PO) drives business value by managing the backlog, defining features, and deciding what "done" means for the team. Finally, the Scrum Master serves as a coach, helping the team adopt agile practices, improve collaboration, and remove blockers that slow delivery. Together, these roles form the leadership layer that ensures teams stay aligned, productive, and focused on delivering outcomes that matter.
Key SAFe Roles:
- System Architect: Defines architecture runway and enforces standards.
- Product Owner: Owns backlog, defines features, and accepts work.
- Release Train Engineer: Facilitator who ensures synchronized releases.
- Scrum Master: Helps the team adopt agile practices and remove blockers.
Why This Design Works
This organizational design works because it balances clarity, speed, and flexibility. Each team has clear responsibilities, so there's no confusion about who owns what—whether it's platform stability, business-specific pipelines, or specialized subsystems. The model also supports scalability: as demand grows, you can simply add more stream-aligned squads without redesigning the whole structure. At the same time, governance remains strong because the platform team enforces consistent standards across all squads. Business-aligned squads bring agility, delivering value faster since they stay close to stakeholder needs and own end-to-end delivery. Finally, the structure allows for flexibility—consultants or temporary enabling teams can be plugged in when rare expertise or extra capacity is needed, without disrupting the core organization. Together, these factors create a system that is both stable and adaptable, ensuring the data platform can grow and evolve alongside business priorities.
- Clear responsibilities: Each team knows what they own.
- Scalability: New squads can be added easily.
- Governance: Platform team enforces standards.
- Agility: Business-aligned squads deliver value faster.
- Flexibility: Consultants can be added for specialized work.
Practical Roadmap
It's one thing to define the target state and team structure, but the real challenge is knowing how to get there step by step. That's why we include a practical roadmap—to turn strategy into action. A roadmap breaks down the big vision into smaller, time-bound steps that can actually be delivered. It helps leaders prioritize what to do first, ensures teams don't get overwhelmed, and creates visible progress that builds confidence. By outlining milestones for the first 30, 60, and 90 days, as well as longer-term actions, we make sure the journey from concept to execution is structured and achievable. This section matters because without a clear sequence, even the best-designed target state can stall or lose momentum.
First 30 Days
- Set up dev, staging, and prod workspaces with Terraform
- Create the Unity Catalog metastore and first catalogs
- Enable system tables for monitoring
- Deploy a CI/CD "hello world" pipeline
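For the catalog step, a sketch with the Databricks SDK for Python might look like the following; the per-environment naming is just one illustrative convention.

```python
from databricks.sdk import WorkspaceClient

# Auth comes from environment variables or ~/.databrickscfg.
w = WorkspaceClient()

for env in ("dev", "staging", "prod"):
    w.catalogs.create(name=f"{env}_finance", comment=f"{env} finance domain")
```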
Days 31–60
- Form the platform team
- Onboard the first data product squad
- Build first ingestion and transformation pipelines with tests
- Implement cost guardrails
Days 61–90
- Add a second squad
- Expand CI/CD to multi-environment deployments
- Introduce data quality checks and contracts
- Build cost dashboards
- Establish release governance with the RTE
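For the data quality step, one lightweight pattern is a Delta table constraint, which turns a data contract into something the engine enforces. The table and rule below are illustrative assumptions.

```python
# Writes that violate the contract now fail fast instead of silently
# corrupting downstream dashboards.
spark.sql("""
    ALTER TABLE finance.silver.orders
    ADD CONSTRAINT positive_amount CHECK (amount > 0)
""")
```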
Beyond 90 Days
- Add more squads as demand grows
- Form a Complicated Subsystem Team if needed
- Review architecture regularly with the System Architect
- Use Enabling Teams for training new hires
Conclusion
Building a data engineering team for Databricks on Google Cloud Platform (GCP) is not simply a matter of bringing in a group of engineers and expecting results. It is a much more holistic effort that combines people, processes, and technology in a deliberate way. A successful team needs a clear target state—a vision of what the platform should look like when it's mature, governed, and delivering value. Without that north star, teams risk chasing short-term fixes that don't add up to a sustainable system.
It also requires a smart organizational setup. Technology alone cannot deliver outcomes if people are working in silos, duplicating effort, or unclear about responsibilities. By deliberately structuring teams into platform specialists, stream-aligned product squads, and supportive groups like enabling teams or complicated subsystem experts, you create an environment where everyone knows what they own and how their work contributes to the bigger picture. Layering in SAFe roles—the System Architect, Release Train Engineer, Product Owners, and Scrum Masters—ensures coordination across teams and keeps delivery aligned with both technical standards and business priorities.
Finally, this transformation requires a practical roadmap for growth. Ambition without a step-by-step plan often leads to frustration or wasted effort. By sequencing the journey—starting with platform foundations, adding the first product squad, expanding CI/CD, and gradually scaling into multiple squads—you avoid chaos and build confidence at every stage. The roadmap provides not just direction, but also momentum, ensuring that teams can celebrate quick wins while moving toward long-term success.
In short, building this kind of team is about designing for scale, trust, and agility from the very beginning. With a strong foundation, a thoughtful organizational model, and a roadmap that balances near-term execution with long-term vision, you create a data platform that is secure, cost-effective, and capable of delivering real business impact. It's this combination—technology, governance, and human collaboration—that makes the difference between a data engineering team that merely operates, and one that truly drives the business forward.
What's Next in the Series
This was Part 1: Target State and Team Topology.
In the next articles, we will cover:
- CI/CD in Action: A deep dive into pipelines, testing strategies, and deployment templates.
- Data Governance and Unity Catalog: How to manage permissions, lineage, and compliance.
- Cost and FinOps: Monitoring, optimization, and preventing runaway bills.
- Advanced Use Cases: Streaming data, machine learning, and cross-cloud integration.
By the end of the series, you'll have a full blueprint to run a modern, scalable data platform on Databricks and Google Cloud.
Ready to Transform Your Data Engineering?
Let our experts help you implement these strategies and build a world-class, scalable data platform.