Knowns, Unknowns and Why Principles Still Matter (Even in Data and AI)

Data engineering is having a moment.

Everyone suddenly cares about pipelines, lineage and “AI foundations.” It still surprises me, mostly because these are the same things no one wanted to talk about for years. They were the unglamorous parts of data work, the plumbing behind the dashboards.

Now they’re headline topics again. Progress always circles back to fundamentals — and that’s a good thing, because nothing about AI works if the data doesn’t.

AI has compressed years of technology maturity into months. We went from “that’s interesting” to “deploy it everywhere” without stopping to learn what breaks and why. That’s the work now. And it isn’t happening in model architectures or tuning algorithms. Rather, it’s happening in data engineering, in the same fundamentals we’ve always needed: clean pipelines, solid governance, traceable lineage and systems that fail gracefully.

AI is just the latest technology to pass through that same cycle of discovery and disillusionment. What makes it stick are the principles data engineers have practiced for years.

Data engineering is infrastructure for the digital world. You don’t get credit when it works, but everything stops when it doesn’t. The job isn’t just moving data from point A to point B. It’s turning raw information into something that makes sense, adding context, shaping structure and creating the connective tissue that turns data into knowledge. Just look at the findings from a recent Snowflake MIT Technology Review report, Redefining Data Engineering in the Age of AI: 72% of the 400 surveyed technology leaders deem data engineers integral to their business.

The work is invisible most of the time: saying no to shortcuts that will break later, tracing issues no one else sees and keeping systems alive through quiet discipline. That’s the craft.

How we learn what we don’t know

Every new technology follows the same pattern: excitement, confusion, failure and eventually understanding. AI is no different; it’s just moving faster.

We’re still learning what it can do, where it breaks and how to make it trustworthy. And that learning curve isn’t just technical — it’s cultural. It’s about how people share what they know and how organizations turn uncertainty into progress.

The hard part isn’t building these systems. It’s understanding what we truly know, what we only think we know and what we’ve never even questioned.

Recently, I came across a simple yet revealing framework often used for risk analysis: knowns and unknowns. It fits perfectly with where we are with data and AI. It helps us see not just what we know but what we assume, ignore or forget to ask. It shows us where the real danger lives.

The 2×2 of reality

The “known knowns” model has been around for decades. It became famous when then-Secretary of Defense Donald Rumsfeld used it during a press briefing in 2002, but the idea goes back to psychology research from the 1950s when Joseph Luft and Harrington Ingham created the Johari window, a way to think about what’s known to us, what’s known to others and what’s still hidden.

It maps neatly onto data and AI because it shows how people and systems actually learn.

                Known                             Unknown
  Known         Known knowns: What we             Known unknowns: What we know
                understand and rely on            we haven’t figured out yet
  Unknown       Unknown knowns: What someone      Unknown unknowns: The things we
                else knows but we don’t           didn’t even know we didn’t know

It looks simple, but it explains where organizations succeed, stumble and sometimes fail completely.

Known knowns: The foundations under siege

We all know the fundamentals — pipelines, governance, lineage, documentation — yet we tend to forget them the moment a new framework appears.

AI hasn’t changed their importance; it has just made it obvious when they’re missing. Think about what happens when you try to build AI on shaky ground:

  • The model hallucinates because no one validated the training data.
  • The pipeline breaks silently, feeding stale data to production.
  • A “quick prototype” becomes a business-critical dependency.

The quiet safeguards are what keep systems standing: the lineage job that catches broken dependencies; the schema test that stops bad data from spreading; or the safety switch that halts ingestion when quality drops.
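These safeguards are often only a few lines of code. Here is a minimal, illustrative sketch of a schema test plus a safety switch that halts ingestion when quality drops; the schema, threshold and function names are assumptions for the example, not any particular platform’s API:

```python
# Two "quiet safeguards": a schema check that stops bad rows from
# spreading, and a safety switch that halts the load when too many
# rows fail. Schema and threshold are illustrative.

EXPECTED_SCHEMA = {"order_id": int, "amount": float, "created_at": str}
FAILURE_RATE_LIMIT = 0.05  # halt ingestion if >5% of rows fail checks

def validate_schema(row: dict) -> bool:
    """Reject rows with missing columns or wrong types before they spread."""
    return all(
        col in row and isinstance(row[col], typ)
        for col, typ in EXPECTED_SCHEMA.items()
    )

def ingest(batch: list[dict]) -> list[dict]:
    """Load a batch, or halt entirely when quality falls below the bar."""
    valid = [row for row in batch if validate_schema(row)]
    bad_rate = 1 - len(valid) / len(batch) if batch else 0
    if bad_rate > FAILURE_RATE_LIMIT:
        # Safety switch: stale data beats silently corrupted data.
        raise RuntimeError(f"Ingestion halted: {bad_rate:.0%} of rows failed checks")
    return valid
```

The point isn’t the specific checks; it’s that the failure mode is loud. A pipeline that halts is debuggable; one that silently ships bad rows is not.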

AI didn’t make these things optional. It made them nonnegotiable. 

Strong foundations don’t just prevent outages; they control cost. Every untested job, broken lineage or stale model burns compute and time. The best-performing systems aren’t always the ones that run fastest — they’re the ones that run predictably. Efficiency starts with understanding, and understanding comes from doing the fundamentals well.

Known unknowns: The questions getting harder

Every organization has a list of things it doesn’t fully understand. In AI, that list keeps growing:

  • How do we measure explainability when models make decisions we can’t audit?

  • How do we trace lineage between training data and outputs when models retrain themselves?

  • How do we govern synthetic data?

  • How do we handle drift when failures happen in milliseconds, not hours?

These are known unknowns. The old playbook of batch jobs and predictable workflows doesn’t apply. We’re writing a new one while the system is running.

If someone asks “Do we trust and understand our data in production AI systems?” and you hesitate, that’s real data engineering. Recognizing a known unknown is the first step in turning it into a known known. It’s how we reduce uncertainty, one honest question at a time.

Unknown knowns: The answers hiding in plain sight

This is the quadrant that quietly kills projects: the things we don’t know but someone else does.

They show up everywhere:

  • The vendor’s system “optimizes” itself until performance tanks, and suddenly you’re debugging blind while they hold the map.

  • The upstream team changed their schema but never told anyone.

  • The team solved the same problem a year ago but never shared it.

  • The model works until it doesn’t, with no one remembering why it was designed that way.

Unknown knowns are the hidden debt of complexity. They appear when knowledge stops flowing across people and teams.

The fix isn’t more automation; it’s communication. Ask questions. Listen. Bring people in early. Sometimes the smartest debugging tool is asking “Has anyone seen this before?”

Work needs to be shared across disciplines. Engineers see technical risk. Product sees customer impact. Security sees threats. Business sees compliance. What’s unpredictable to one may be obvious to another.

You’d be surprised how many “AI problems” turn out to be unknown knowns, just answers sitting quietly in someone’s head all along.

Unknown unknowns: What’s coming that we can’t see

These are the surprises we don’t see coming — the ones that look obvious only in hindsight.

Imagine an AI agent “optimizing” your pipeline by dropping what it thinks are low-value tables. Or real-time inference systems that fail faster than humans can react.

This isn’t new. Every technology wave starts this way. Cloud migrations broke in unpredictable ways too, like auto-scaling that seemed perfect in testing until it torched the production bill. We learned, and then we built guardrails.

AI is moving through the same cycle, only faster. You can’t predict unknown unknowns, but you can prepare for them. How?

  • Design for failure: Assume it’s not a question of if but when. Build with rollbacks, retries and safety switches.

  • Contain the blast radius: One bad model or agent shouldn’t take down your platform.

  • Empower the front lines: The engineer who spots something off should feel safe to act.

  • Learn from incidents: A good postmortem helps you understand what really happened and why.

The goal isn’t perfection — it’s resilience. It’s building systems and cultures that can absorb surprises and adapt.
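The first two practices — design for failure and contain the blast radius — can be sketched together as a retry loop wrapped in a simple circuit breaker. This is an illustrative Python sketch under assumed names and limits, not production code:

```python
# "Design for failure" and "contain the blast radius" in one sketch:
# retry a flaky dependency with backoff, and stop calling it entirely
# after repeated failures so one bad component can't take down the rest.
import time

class CircuitBreaker:
    """Disables a dependency after repeated failures (blast-radius limit)."""

    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.failures = 0

    def call(self, fn, *args, retries: int = 2, backoff: float = 0.1):
        if self.failures >= self.max_failures:
            # Circuit is open: fail fast instead of hammering a dead system.
            raise RuntimeError("Circuit open: dependency disabled, not retried")
        for attempt in range(retries + 1):
            try:
                result = fn(*args)
                self.failures = 0  # success resets the breaker
                return result
            except Exception:
                if attempt < retries:
                    time.sleep(backoff * (2 ** attempt))  # exponential backoff
        self.failures += 1
        raise RuntimeError("Call failed after retries")
```

The design choice worth noting is that the breaker converts an unknown failure mode into a known one: instead of cascading timeouts, you get a fast, explicit “circuit open” error that a human can act on.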

The long game

AI started as an unknown unknown, distant and theoretical. Now it’s familiar but still not fully understood. Our work is to move it toward the known known, not by simplifying it but by making it explainable, reliable and trustworthy.

As AI automates more of the execution work, the craft of data engineering is shifting. The next generation will learn through context and mentorship, turning hard-won lessons into something that can be taught instead of simply inherited.

Principles still matter. They are what turn reaction into resilience and resilience into lasting progress.

Technology will keep changing, but the foundation stays the same, because the real infrastructure isn’t the platform. It’s the people who keep building, learning and passing the craft forward.

Every generation thinks it’s building for the future. In reality, we’re building for the next team who inherits what we leave behind. The best thing we can give them is clarity and the principles to keep learning.

Disclaimer: These thoughts are the author’s own, based on experience, and don’t represent the views of her current or former employers.
