Observability and human intuition in an AI world

Honeycomb is an observability platform that enables deep, high-dimensional exploration so you can debug unpredictable behavior with precision.

Resolve AI allows you to resolve incidents, optimize costs, and code with production context using AI that works across your code, infrastructure, and telemetry.

Connect with Christine on LinkedIn.

Connect with Spiros on LinkedIn.

TRANSCRIPT

[intro music]

Ryan Donovan: Hello, everyone, and welcome to the Stack Overflow podcast, a place to talk all things software and technology. Today, we have a special twofer episode recorded live from HumanX with a big observability and SRE focus. So the guests for this are Christine Yen, CEO of Honeycomb, and Spiros Xanthos, who is CEO of Resolve AI. So enjoy. I'm here on the floor of HumanX. We're going to be talking about observability in the age of AI. My guest for that is the CEO of Honeycomb, Christine Yen.

Christine Yen: Hello. I'm excited to be here. Thank you for having me.

RD: Of course. My pleasure. So I feel like observability has been one of those things that has increased in conversation over the last five, 10 years. And then with AI, it's sort of come into a different perspective and different importance. What's changing with observability because of AI?

CY: Oh, you're asking the big question up front.

RD: Yeah, yeah. Bigger than smaller.

CY: [laughter] Well, let's talk about what's happening with software first, and then I'll wrap observability in. I was talking to someone yesterday who posited that we're not going to have engineers and product managers and designers anymore. They're all just going to be builders.
And I like that idea because, you know, we used to think of the software development lifecycle as these discrete steps. And they're all getting squished together. AI agents are moving so fast that the idea of writing a spec and implementation and review, it's all just getting squashed together. And I think that what that means for observability, for testing, for different things that were CI, that also basically means it's getting squished together into a step of just validation. Did the code do what I expected it to do? We used to validate that, as humans, maybe by reading the code and thinking about it and handing it to another human to read it. Now everyone is talking about how no one's reading the code. But the code still has to be validated. Someone still has to make sure that what it does matches the intent. And when I think about how observability has evolved in the last five or ten years, it's actually really exciting. Because ten years ago, people were talking about logs, monitoring, and APM. They were different data types and people were like, well, you can do this with these data types. And when Honeycomb came on the scene, I think that we really tried to shift the conversation to: what can you do with this telemetry? I don't love getting bogged down in, like, what type of telemetry we're talking about. How do you know whether that telemetry is working for you or not? What does good mean for your service? How are you measuring that? And I think those skills, whether you're taking those and then using them to define SLOs or just using them to shape the graphs or signals you're looking for, that skill of "what does good mean, how will we know", that maps so well to this squished development cycle with intent and validation.

RD: Talk about a lot of things sort of changing, like the old APM style of just, like, getting a dash, whatever metrics, and then logging and traces.
I want to sort of get to a more fundamental question: what is telemetry at this point?

CY: I think telemetry is not going to change dramatically. And I say that partially because I have a pretty unglamorized definition of telemetry to begin with, which is just, what are the bits and exhaust that your application is putting out so that you know it's doing something? One of my least favorite things to do is sit and debate the relative merits of logs versus traces or structured events versus this label versus that label. It is all just telemetry. It's all just data. You can shape data in certain ways. You can add certain bits of metadata. I think certain bits of metadata are more useful. I'm not going to debate the types. What I'm going to say is that if you're an e-commerce business, you probably care about SKUs and shopping carts and checkout times, which is going to be different from a social media site, which cares about upload times and user IDs and likes and relationships. You should capture the things that matter. But in the end, telemetry is just... it's just data.

RD: Yeah.

CY: And also data has gravity. Even with AI, I don't think everyone is going to be really excited about completely overhauling everything that they've used before to understand applications. There's going to be an evolution. There always is.

RD: Yeah.

CY: But it's just the paper trail that you have your applications leave or your agents leave so you know what the heck they did.

RD: So it almost sounds like you're saying, like, what used to be your KPIs are now part of your telemetry, right? Your sort of business.

CY: I hope so.

RD: Yeah?

CY: I think that that's... the exciting part about – there's lots of reasons, lots of smart engineers are raising their eyebrows at fully autonomous agents, generated code, no one reading the code. I want to acknowledge that. But I think that we all feel that that's the world we're moving towards.
And in that world, it does not matter anymore how elegantly these functions were defined or how they're invoked. What matters is: did they accomplish the job that code was supposed to? And I think that is going to force more and more engineers to define the job that the code is supposed to do in the language of the business, in the, like, what is the outcome this code is supposed to drive? It's not to, like, process text. It is to make sure that this item listing is formatted in a certain way. They're going to have to connect that to their business, and I think that's great.

RD: With a lot of the AI-generated code, I've seen takes where people have found it to be not the best written performance-wise. I think Garry Tan, CEO of Y Combinator, yeah, was talking about, you know, X tens of thousands of lines of code. And somebody looked at it and was like, this is just heavy, you know, not performant code. Do you think that is part of the observability that people are going to be building in more? The more, sort of, like, "is this good," defining good in a different way?

CY: I think that's the key.

RD: Yeah.

CY: When does that facet of good matter? My co-founder, Charity, wrote a great piece earlier this year about disposable versus durable code. All respect to Garry, the code he's writing is probably pretty disposable. There aren't a whole bunch of people relying on it. His business isn't entirely relying on it. No one is going to be that upset if latency doubles. People clearly aren't upset right now. He's not upset right now.

RD: [laughter] Right.

CY: And so I think that for his application, the definition of good does not contain performant, efficient, any of those things. For something like, I don't know, like Visa or financial services companies where you want really low latency, you want really high availability and reliability, those guarantees don't change, whether it is AI-generated or not.
And those companies, and those engineering teams, are going to have to figure out, if they incorporate these coding agents, which inevitably they will, how they define guardrails so that the agents know to stay within those, and how they validate the agents' output to make sure that it stays within those guardrails.

RD: Yeah, I mean, like, how do you get an agent to write durable code? Because I think for a lot of folks who are out there vibe coding and doing AI agents, it is that sort of disposable code where it's just like, this is a toy, this is a little internal tool. Who cares if it runs well?

CY: First, there's that definition that we've talked about a lot. Adam Jacob, one of the co-founders of Chef, is out there now working on a new startup, writing a lot. He's gone through his whole AI, newly born revelations. And he really talks about building the machine to build the machine. And if you believe that you are – that the output of your code has to be durable, it is defining those characteristics that make it durable. What are the qualities that have to be in place? What are the things that should be optimized versus don't have to be optimized? And defining that upfront, just like you would for an engineering team. The way that people used to build engineering teams, if you are building an engineering team for a Visa or a Stripe, you're going to be looking for people who think or can think about code differently than a... something that is meant for toy sample apps. The brownfield versus greenfield piece is a little different.

RD: Oh, sure, yeah.

CY: Where I think for all the people who are raving and having wild success early, it's pretty consistently green code.

RD: Yeah.

CY: Greenfield, right? Whether it was durable or disposable. This idea that the agent is the one generating the code and the agent doesn't have to necessarily make sense of what the human and the agent is doing.
I think that this question of how do I take an existing code base and make it legible, make it really easily usable by agents, is an interesting one. Obviously, it is possible to point an agent at an existing code base, but I think that there are going to be some interesting standards or practices to make pieces more modular, to be more intentional about some patterns, for agents to be able to pick up on them and keep going forward.

RD: Yeah, I think I ta
