Seth Lochen of Groupon talks ownership and the bystander effect, platform engineering, and frogs in boiling water
Welcome to Level-Up, an exclusive interview series with standout engineering leaders who share what’s top of mind for them. This interview puts the spotlight on Seth Lochen, Senior Engineering Manager at Groupon. Let us know who we should talk to next!
The next time you get a local experience on Groupon, give a nod to Seth Lochen. Seth is a Senior Engineering Manager at Groupon, where he’s part of the platform and engineering tools team.
Seth has been at Groupon since its early days, so he’s watched the company evolve. He joined in 2010 and manages the CI/CD team, the logging platform team, and the Service Portal (their homegrown service catalog) team.
Before talking to Seth, I didn’t know much about Groupon’s stack. But he explained that Groupon started as a monolithic Ruby on Rails app with hundreds of developers all committing to the same code base.
As the company scaled, that monolith was broken into pieces, and Groupon developed a classic service-oriented architecture. “Little by little we built up more and more microservices,” he said. “For example, we built a framework to build a bunch of Node.js services for our front end and another framework for building Java services for API, which led to an explosion of other services.”
I was fortunate enough to sit down with Seth for a chat about service ownership, service maturity, and service level indicators. He has a fresh perspective: he believes in granular, individually oriented service ownership, sees service maturity as essential, and understands that RED metrics are important yet limited.
Seth’s ideas and experience are inspiring to any engineer looking to mature and scale their organization sustainably.
Service ownership: Who’s really responsible?
Service ownership has evolved over the years, as Groupon transitioned from a monolith to microservices. As they onboarded more and more services, they found that they were losing track of basic things like number of services, functions of services, or how services integrated with one another. As a result, service ownership emerged as a top challenge to solve.
“Initially we associated high-level owners, such as VPs or senior leaders, with the services that rolled up to them,” Seth said. “As time has gone on, we’ve found it helpful to get even more granular, getting down to individual managers or lead engineers because that allows us to assign tasks to people who are actually going to do the work.”
Seth believes that assigning ownership to individuals, rather than teams, is a surefire way to make sure that issues are resolved. He noted the bystander effect: the idea that if everyone (i.e., an entire department or team) sees something happening but shares responsibility for it, no individual will rush to fix it. However, if someone makes a direct ask of a particular individual, that person will spring into action.
Once the team had better insight into service ownership by getting more granular, they could ensure that metadata for their services was accurate. And once accuracy was improved, it allowed Groupon’s team to automate a lot of things that previously had to be done manually and better manage service sprawl.
“Service ownership has allowed us to improve the data itself, which then allows our operations and SRE teams to find the person that’s responsible for the service and take action, whether that’s to deal with an incident or simply check in on the everyday health of the service,” said Seth.
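To make the idea of individual-level ownership metadata concrete, here is a minimal sketch of a catalog lookup. It is not Groupon's Service Portal: the `ServiceRecord` fields, the service names, and the email address are all hypothetical, chosen only to show how naming one person per service lets an on-call engineer find an owner and lets tooling flag services that list only a team.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServiceRecord:
    """One hypothetical catalog entry; the field names are illustrative."""
    name: str
    team: str
    owner: Optional[str] = None  # a named individual, e.g. a manager or lead engineer


def find_owner(catalog: dict[str, ServiceRecord], service: str) -> str:
    """Return the individual to contact for a service, or raise if none is recorded."""
    record = catalog.get(service)
    if record is None:
        raise KeyError(f"unknown service: {service}")
    if not record.owner:
        # A team alone invites the bystander effect, so treat it as missing data.
        raise LookupError(f"{service} lists team '{record.team}' but no individual owner")
    return record.owner


def services_missing_owner(catalog: dict[str, ServiceRecord]) -> list[str]:
    """List services whose metadata names only a team, sorted for stable output."""
    return sorted(name for name, rec in catalog.items() if not rec.owner)


# Hypothetical catalog contents, for illustration only.
catalog = {
    "checkout-api": ServiceRecord("checkout-api", team="payments",
                                  owner="lead.engineer@example.com"),
    "deal-search": ServiceRecord("deal-search", team="discovery"),
}
```

A report built from `services_missing_owner(catalog)` is one way such metadata gaps could be surfaced and assigned before an incident forces the question.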
Caching, or why service maturity tasks matter
Every engineer wants service maturity. For Seth, it’s essential to making Groupon trustworthy. Without service maturity, the company can’t provide a stellar customer experience.
“For us, service maturity is an indicator of the health of the service,” he said. “We don’t want people using software that has vulnerabilities. We also don’t want to fall too behind because it makes service ownership and maintenance difficult.”
Seth shared a story about a recent migration of CI systems. As part of the old system, they cached a lot of dependencies that the team used for efficiency. When a new build spun up, engineers didn’t have to download the entire internet from Maven again. But as a result, they were caching a lot of old information, and the team ran into situations where their library dependencies were so old that they were no longer published.
“If you’re falling that far behind, you’re putting yourself at risk in terms of your service just being operational,” said Seth. “That’s why it’s in the best interest of the engineering organization to keep a healthy balance between service maturity and product work to keep everything up to date.”
Translating monitoring metrics into business KPIs
When it comes to service maturity, Seth is also focused on SLIs and SLOs, which many companies struggle to define. It’s also challenging to determine who owns what when it comes to SLOs: is the platform team responsible or the feature team?
At Groupon, Seth’s team is starting to focus on RED metrics, which are (1) rate (i.e., requests per second), (2) errors, and (3) duration, or latency. “We’re in an early stage where we’re defining what percentage of error rate is acceptable. No matter what, we’ll continue to go back to the service owners I mentioned before because they’ll have better context on what’s acceptable for their applications.”
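As a rough illustration of the three RED metrics, the sketch below computes them from a list of request records collected over a time window. Everything here is an assumption made for the example: the `Request` shape, the choice of a nearest-rank 95th percentile as the duration summary, and the `max_error_ratio` threshold, which in practice would come from the service owner.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """A hypothetical request record; real systems would read these from telemetry."""
    duration_ms: float
    is_error: bool


def red_metrics(requests: list[Request], window_seconds: float) -> dict[str, float]:
    """Summarize Rate, Errors, and Duration for the requests seen in one window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    total = len(requests)
    errors = sum(1 for r in requests if r.is_error)
    durations = sorted(r.duration_ms for r in requests)
    if durations:
        # Nearest-rank percentile: the smallest value with at least 95% at or below it.
        rank = max(1, math.ceil(0.95 * total))
        p95 = durations[rank - 1]
    else:
        p95 = 0.0
    return {
        "rate_per_second": total / window_seconds,
        "error_ratio": errors / total if total else 0.0,
        "p95_duration_ms": p95,
    }


def within_error_budget(metrics: dict[str, float], max_error_ratio: float) -> bool:
    """Compare the observed error ratio with a threshold chosen by the service owner."""
    return metrics["error_ratio"] <= max_error_ratio
```

Keeping the threshold as a parameter rather than a constant mirrors the point in the quote: the acceptable error rate is a per-service decision, not a platform-wide one.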
As far as getting business stakeholders on board, Seth believes that RED metrics are foundational and low-level, meaning that it’s not straightforward for those on the business side to understand them.
But he hopes that the strategy will evolve so that there are certain metrics that can be communicated with key stakeholders. “For example, we can give an actual, objective service level,” said Seth. “Although that may boil down to a bunch of counters and histograms on the back end, we’ll be able to share with business stakeholders that we’re delivering on the business agreement.”
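One way to picture how counters and histograms could roll up into a single service level is sketched below. This is an assumption-laden example, not a description of Groupon's tooling: the function names, the bucket layout (upper bound in milliseconds mapped to a non-cumulative request count), and the objective values are all hypothetical.

```python
def availability(good_requests: int, total_requests: int) -> float:
    """Fraction of requests that succeeded, taken from two counters."""
    if total_requests == 0:
        return 1.0  # no traffic means nothing failed
    return good_requests / total_requests


def fraction_within(bucket_counts: dict[float, int], threshold_ms: float) -> float:
    """Fraction of requests whose bucket upper bound is at or below the threshold.

    bucket_counts maps each bucket's upper bound (ms) to the number of requests
    that landed in that bucket (non-cumulative counts).
    """
    total = sum(bucket_counts.values())
    if total == 0:
        return 1.0
    fast = sum(count for upper, count in bucket_counts.items() if upper <= threshold_ms)
    return fast / total


def meets_objective(indicator: float, objective: float) -> bool:
    """True when a measured indicator is at or above the agreed objective."""
    return indicator >= objective


def summary(name: str, indicator: float, objective: float) -> str:
    """A one-line statement a business stakeholder could read."""
    status = "met" if meets_objective(indicator, objective) else "missed"
    return f"{name}: {indicator:.2%} against a target of {objective:.2%} ({status})"
```

For example, `summary("availability", availability(9990, 10000), 0.995)` reports a measured 99.90% against a 99.50% target, which is closer to the kind of statement the business side can act on than a raw counter value.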
When to start a dedicated platform engineering team
There’s no magic moment when you’ll need to create a dedicated platform engineering team. According to Seth, the time to make the move is when you find that your product and feature teams are spending as much time, or more, shipping features out the door as they spend actually developing them.
It’s a frog-in-boiling-water problem: the frog feels that the water is fine even as it gets hotter and hotter. Often, teams don’t know just how bad things are until someone else comes in and asks how they can live that way.
As companies scale up, people often expect to be more productive and efficient. But if they don’t have the right tools and processes in place, productivity stagnates or even goes down. That’s when it’s time to create your own platform engineering team.