Challenge
If a problem arose from any of its 400+ services, the Ops team had only outdated spreadsheets and messy Confluence pages to suss out next steps.
Solution
Automated catalog creation and real-time updates with the Kubernetes Syncer finally give dev teams the source of truth they needed.
Use cases
- Campaigns
- Service maturity rubric
- Kubernetes and Datadog integrations
Results
DevOps can now undertake migration projects that would have been impossible before, while reaching service maturity with Campaigns prioritized by leadership.
With nearly 400 services owned by over a dozen teams, Keller Williams had to ditch manual spreadsheets to make service ownership and maturity an org-wide priority. Now with automated Kubernetes syncs and quarterly campaigns, they finally have a service catalog they can trust to tackle day-to-day issues and large migration projects.
Keller Williams is the world’s largest real estate franchise, with more than 187,000 agents in 1,100 offices and the highest sales volume in the United States. This success can be attributed to the time it has spent cultivating an agent-centric, technology-driven, and education-based culture, so it is no surprise that the company runs large consumer-facing and agent-facing platforms. The agent-facing platform hosts a CRM and transaction management tool for agents, and it is here that most of the development complexity lies for Senior Architect Drew McAuliffe. Managing the approach to development, maturity, and architecture, Drew brings everything together to set common standards for the large Keller Williams development team.
Challenge
Hundreds of services, 20+ teams, no catalog
Keller Williams’ customer- and agent-facing platforms were first developed by independent contractors, creating silos that were only reinforced as the organization grew to more than 20 teams. “Once we got upwards of 300 services,” Drew explains, “anytime something went wrong, the Ops team would ask why, who it belonged to and it was just radio silence. Everything was a guess.”
To track services, Drew cobbled together a system of spreadsheets and searches through Confluence page dumps. But even he admits these were a miss, as “hardly anyone knew the spreadsheet existed and nobody actively managed it. Confluence is only as useful as it is organized, so if you knew where to look you could maybe get something but everybody formats and shares it differently.”
Managing the health of these services, then, was an almost unthinkable challenge. Drew was responsible for rolling out tool upgrades, deprecations, and other organization-wide initiatives, but he had no real way to track them or hold teams accountable. His best bet was creating an epic in Jira, advocating for the various engineering teams to engage with it, and hoping they’d heed his words.
Drew realized that he was looking at either a homegrown solution or an existing internal developer portal that had already solved the problem. He preferred the latter, as “there is a difference between an actual product solution versus something that an engineer cooks up in-house.” OpsLevel caught his attention right away.
Solution
Kubernetes as the automated source of service catalog truth
“Looking at the problem, I knew at the very least we needed a software and service catalog so Ops people could know who was on what team, what services they owned, and who to contact for issues. OpsLevel made it really easy for us to build our service catalog with the Kubernetes integration sync. It uses a Kube control script config file that maps out and scans Kubernetes. And I didn’t realize this until later, but this scanning happens with every Kubernetes deployment! So if it detects any new service, it maps it over to OpsLevel ensuring our catalog always stays up to date.”
Once they chose OpsLevel, Keller Williams cataloged 395 services in the first month and rolled out catalog adoption to more than 160 unique users within the first two months.
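To make the idea of a catalog that follows the cluster concrete, here is a minimal sketch of the reconcile step: compare what a scan found against what is already cataloged, and add whatever is missing. It is an illustration only, not OpsLevel's implementation or API. The `Catalog` class, the `sync` method, and the service and team names are all hypothetical, and the scan result is a hand-written dictionary rather than real Kubernetes output.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical in-memory stand-in for a service catalog."""
    services: dict = field(default_factory=dict)  # service name -> owning team

    def sync(self, discovered: dict) -> list:
        """Add any discovered workload that is not yet cataloged.

        `discovered` maps a workload name to its owner, as a scan of the
        cluster might report it. Returns the names that were newly added.
        """
        added = []
        for name, team in sorted(discovered.items()):
            if name not in self.services:
                self.services[name] = team
                added.append(name)
        return added


# Hypothetical starting state: one service already cataloged.
catalog = Catalog({"crm-api": "agent-platform"})

# Simulated scan after a deployment introduces a new service.
scan = {"crm-api": "agent-platform", "transactions": "agent-platform"}
new_services = catalog.sync(scan)
print(new_services)
```

Running the same scan a second time adds nothing, which is the property that lets a sync run on every deployment without creating duplicate entries.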
Use cases
Quarterly campaigns to drive service maturity
After Keller Williams implemented OpsLevel, the Campaigns feature launched, giving Drew and his team a way to drive service maturity, improve service health, and roll out changes across every service.
“Campaigns have changed our culture. With Campaigns, I can say, ‘here’s something you need to go and do’ then actually see who’s done it. It’s a far cry from my days of creating Jira epics that I had to beg people to care about. Now we can prioritize these initiatives with team leads each quarter so service maturity isn’t just this thing that only I’m worried about and working on.”
Keller Williams is also a big user of various integrations. Beyond the daily Kubernetes syncs, they utilize a custom two-way Datadog integration. Datadog has information about service dependencies that gets pulled into OpsLevel, while OpsLevel populates key service catalog data points, like team, owner, etc. in Datadog. “Having automatic two-way data syncs that replicate what’s visible across your tooling is such a relief. We don’t have to update anything manually or worry about which one is correct.”
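A two-way sync of this kind can be sketched as each system owning certain fields and copying them to the other. The example below assumes the catalog owns `team` and the monitoring tool owns `dependencies`. The function name, field names, and sample data are hypothetical, and no real Datadog or OpsLevel API is called.

```python
def two_way_sync(catalog: dict, monitoring: dict) -> None:
    """Copy each field from the system assumed to own it, in place.

    Assumption: the catalog owns `team`, and the monitoring tool owns
    `dependencies`. Only services present on both sides are touched.
    """
    for name in catalog.keys() & monitoring.keys():
        monitoring[name]["team"] = catalog[name]["team"]
        catalog[name]["dependencies"] = list(monitoring[name]["dependencies"])


# Hypothetical records for one service as each tool might hold them.
catalog = {"crm-api": {"team": "agent-platform", "dependencies": []}}
monitoring = {"crm-api": {"team": None, "dependencies": ["transactions"]}}

two_way_sync(catalog, monitoring)
print(catalog["crm-api"]["dependencies"], monitoring["crm-api"]["team"])
```

Giving every field a single owner is what removes the question of which tool is correct: a value is only ever edited in one place and copied to the other.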
Results
A 350% increase in visibility
“It’s hard to calculate the impact of OpsLevel, because it’s like we grew an entire arm we didn’t have before. We can do things that were literally unthinkable.” In a recent customer survey, Keller Williams engineers reported being 350% more satisfied with the visibility across their software ecosystem since using OpsLevel. With the team now in the middle of an important ASM service mesh migration that helps ensure data security, Drew says, “the fact that we have an updated service catalog that’s based on actual running code is huge. This project would be impossible without OpsLevel.”
Drew also appreciates the contrast between OpsLevel’s support team and those of other vendors they use. “With OpsLevel, any time we have a question, we can reach out in our own Slack instance and get the answer versus some random channel that’s separate from everything else. And the degree to which they’ve rolled features out since we’ve signed on has been impressive.” With each OpsLevel feature launch, Drew looks for a way to put it to use, and he loves that the offering continues to improve.