How Super’s Infrastructure team scales impact with distributed ownership and Campaign automation
Challenge
Unscalable manual rollouts across a growing number of microservices that weren’t an effective use of time or talent.
Solution
Automating data entry, ownership, and search via the Kubernetes Syncer, Campaigns, and Slack integrations to complete metadata and make it easy to find and update.
Use cases
- Campaigns and service maturity
- Automated and integrated service catalog
Results
Scaled impact with distributed ownership so infra engineers can focus on more impactful initiatives and save time, while being more engaged in their work.
With OpsLevel’s automation, Super’s service owners take responsibility for their initiatives, freeing the Infra team to scale their impact across the org.
At the intersection of fintech and commerce, Super puts the power back into its users’ hands to help them spend less, save more, and build their credit through a cashback program and discounts on everything from travel to everyday items. Just like Super helps consumers control spending and stay on track, Alex Ianus and the Infrastructure team work hard to enable successful deploys, optimize developer costs, and maintain service health. At around 40 microservices, Alex found that manual processes and regressions were going up, while scalability and efficiency were going down. Super needed a better way to manage microservices.
Challenge
Manual rollouts across a growing number of microservices
“We have a lot of initiatives that get rolled out across all of our services,” Alex Ianus, Super’s Infrastructure Engineering Lead, explains. “For example, we changed how we did dependency management by switching to a tool called Poetry instead of using pip directly with pip install requirements. But then it becomes something you have to go to every service to upgrade, and you can’t do that with a small team as the number of services are rising.”
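To make the Poetry example concrete, here is a minimal sketch of the kind of per-service check such an initiative needs. It is not Super's or OpsLevel's implementation. The function names and the repository layout are hypothetical, and the heuristic assumes a project counts as migrated when its pyproject.toml contains a [tool.poetry] table (newer Poetry projects may declare metadata under [project] instead, which this sketch would miss).

```python
from __future__ import annotations

from pathlib import Path


def uses_poetry(repo_dir: Path) -> bool:
    """Return True if the repo appears to declare its dependencies through Poetry.

    Assumption: a migrated service has a pyproject.toml with a [tool.poetry] table.
    """
    pyproject = repo_dir / "pyproject.toml"
    if not pyproject.is_file():
        return False
    return "[tool.poetry]" in pyproject.read_text(encoding="utf-8")


def services_still_on_pip(repos: dict[str, Path]) -> list[str]:
    """List the names of services that have not moved to Poetry yet, sorted."""
    return sorted(name for name, path in repos.items() if not uses_poetry(path))
```

Running a check like this by hand against every repository is exactly the work that stops scaling as the service count rises, which is why it needs to be automated and owned by each service team.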
At first, Alex tried to track it in a spreadsheet, but it was a pain. “It’s too manual, it doesn’t scale, and it depends on nagging people to update it. When I stop pestering them, they just go back to doing something wrong and we’ll never find out.”
Next, he considered writing a tool that would automate initiative tracking so service owners didn’t have to fill in a spreadsheet, but before getting that far he stopped to research whether such a tool already existed. He found OpsLevel and the search was over, with no extra coding needed.
Solution
Automated accountability with Campaigns
Sometimes choosing a new tool requires an extensive evaluation process, but for Alex, OpsLevel was an obvious choice because of the Campaigns feature. It was exactly what he was looking for, and most other similar tools didn’t have it. “I just needed something that would automatically check whether a team had upgraded a given service and remind them if they hadn’t. That way, there would be some level of tracking and accountability, and I didn’t have to be at the center of it.” Alex adds, “Once an initiative is implemented in a service via a Campaign, the service maturity rubric prevents it from regressing.”
Next, Alex started a pilot with a Solutions Consultant to spin up an OpsLevel instance he could show to teammates and leadership. The Kubernetes Syncer was another standout feature to help them quickly build the catalog. “We didn’t want to choose a solution where we would have to manually enter information for each of our 40 microservices. So when I saw the OpsLevel Kubernetes Syncer, I knew that I would just have to do the work once.”
Within five days, Super had all their services and nearly all associated metadata in OpsLevel based on these automations, because, as Alex explains, “once I did one service I kind of did them all because they would all be in Kubernetes and tagged the same way.” Though they started with 40 microservices, Alex says it would have been just as easy with 400.
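The reason one mapping covers every service is that consistently labeled workloads can all be read the same way. The sketch below illustrates that idea only. It is not the Kubernetes Syncer's configuration or API, and the label keys (app, team, repo) and the CatalogEntry type are hypothetical stand-ins for whatever conventions a cluster actually uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CatalogEntry:
    name: str
    owner: Optional[str]
    repo: Optional[str]


def entry_from_deployment(deployment: dict) -> CatalogEntry:
    """Build a catalog entry from a Deployment manifest parsed into a dict.

    Assumption: every workload carries the same (hypothetical) label keys.
    """
    metadata = deployment.get("metadata", {})
    labels = metadata.get("labels", {})
    return CatalogEntry(
        # Fall back to the resource name when no "app" label is set.
        name=labels.get("app", metadata.get("name", "")),
        owner=labels.get("team"),
        repo=labels.get("repo"),
    )


def missing_metadata(entries: list) -> list:
    """Return the names of entries that still lack an owner or a repo."""
    return [e.name for e in entries if e.owner is None or e.repo is None]
```

Because the mapping is written once against the labeling convention, the same function handles 40 manifests or 400, and the list of entries with gaps shows what still has to be collected from service owners.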
Onboarding
Capturing more users and metadata with Campaigns and Slack integrations
With everything in OpsLevel, Alex’s manager and his team saw the value right away, making it easy to move to an org-wide rollout. Campaigns worked their magic again in onboarding, capturing the last pieces of metadata to ensure all services had both an owner and a repo. This allowed Super to finalize their catalog import and get end users accustomed to the Campaigns feature. “Once you send reminders and the service owner sees they’re failing, it’s pretty self-explanatory,” Alex says. The campaign rollout was paired with hands-on training offered by OpsLevel’s Customer Success team, which took the pressure off of Alex.
Thanks to OpsLevel’s native Slack integration, the team could also save time searching for service information and link back to OpsLevel without making teammates search for it elsewhere or ping one another in the process. Super also created their own Slackbots powered by OpsLevel data to enable a suite of actions, like universal search.
Use cases
How Super uses OpsLevel day to day
Since running their first campaign, Super has run many more to tackle questions and issues that come up. One of their first campaigns classified every service by the PII or PCI data it handles, for their banking partner: owners tagged each service with the level of data it processed. Currently, they’re adding HTTP header propagation to all services so that microservice logs are more complete after certain customer actions, another initiative made possible with OpsLevel.
OpsLevel also helps Super drive down insecure dependencies by flagging which services have them. And Super uses their maturity rubric to flag any service that throws too many errors, anything that runs out of memory more than two times a day, and any pull requests that have been open for more than two weeks. “OpsLevel makes it really clear to a developer why they’re not at gold [the highest maturity level in OpsLevel], why they’re getting flagged, and makes it easy for me as a manager to see where the issues are.” This rubric, combined with OpsLevel’s dashboards, is a major input in the team’s weekly sustainability reviews to ensure everybody’s improving.
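The rubric rules above amount to a handful of threshold checks per service. Here is a rough sketch of that logic, not OpsLevel's check API. The ServiceStats type and field names are hypothetical, and because the article does not say how many errors count as "too many", the error threshold is a placeholder assumption. The memory and pull request limits come from the text.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class ServiceStats:
    name: str
    errors_per_day: int
    oom_events_per_day: int
    oldest_open_pr_age: timedelta


MAX_OOM_EVENTS_PER_DAY = 2        # from the text: more than two times a day is flagged
MAX_PR_AGE = timedelta(weeks=2)   # from the text: open for more than two weeks is flagged
MAX_ERRORS_PER_DAY = 100          # assumption: the article gives no number


def rubric_flags(stats: ServiceStats, max_errors_per_day: int = MAX_ERRORS_PER_DAY) -> list:
    """Return human-readable reasons a service falls short of the rubric."""
    flags = []
    if stats.errors_per_day > max_errors_per_day:
        flags.append("too many errors")
    if stats.oom_events_per_day > MAX_OOM_EVENTS_PER_DAY:
        flags.append("out of memory more than two times a day")
    if stats.oldest_open_pr_age > MAX_PR_AGE:
        flags.append("pull request open for more than two weeks")
    return flags
```

Returning the reasons rather than a single pass or fail mirrors the point Alex makes: a developer can see exactly why a service is flagged, and a manager can see where the issues are.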
Results
Scaled impact with distributed ownership
For Alex and his team, the major benefit has been that the Infrastructure team can scale their impact across the organization by re-distributing initiatives to service owners. Now they don’t have to dedicate a teammate to one campaign at a time, which opens them up to do more impactful infrastructure work. “OpsLevel helps with retention, because if my infrastructure engineers had to spend their days making the same little change across 40 different services, they probably wouldn’t be around very long,” Alex laughs.
Super’s infrastructure engineers save 11+ hours per month by empowering product developers to self-serve more information. And though Super’s engineering org sits at 80 developers and 40 services now, it’s only continuing to grow. By getting ahead of this problem and building operations into the culture of regular developer work early, Alex can ensure the success of the engineering org for years to come.