How checks work in OpsLevel
Checks are a critical component of improved production readiness (AKA “service maturity”) and set the bar for how your services are built, operated, and maintained. Having the right checks in place drives continuous improvement and helps prevent production incidents and vulnerabilities.
In this article, we’ll cover how Checks work in OpsLevel and the lifecycle of a Check.
If you’re looking for a step-by-step walkthrough of how to set up Checks in OpsLevel, check out our technical documentation.
Checks in OpsLevel
Checks are automated tests that evaluate your services against pass/fail conditions you define. They exist at the account level and are scoped to the relevant services via Filters. Because Checks run automatically, service owners instantly know which Checks apply to their services and which ones their services aren't passing. At the same time, engineering leaders get insight into the health and quality of their entire architecture by seeing which standards are (and aren't) being met.
With OpsLevel Checks, you can verify that services:
- Are using a particular version of a library or framework
- Have migrated to a new third party tool (e.g., “Are all of our services off Splunk?”)
- Have a low number of production incidents or library vulnerabilities… and more!
Check examples and data sources
Checks can be as simple as an ownership check ensuring every service has an owner, or as advanced as a custom event check that evaluates payloads from any tool against your thresholds and standards. Checks and catalogs work in both directions: your catalog can inform your Checks, and your Checks can inform your catalog.
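To make the custom-event idea concrete, here is a minimal sketch of the kind of pass/fail condition such a check might encode. This is illustrative only, not OpsLevel's actual API; the payload shape, field names, and threshold are hypothetical.

```python
# Illustrative sketch: evaluate an incoming payload from a hypothetical
# incident-management tool against a pass/fail threshold.
# (Payload structure and field names are invented for this example.)

def incident_check(payload: dict, max_open_incidents: int = 2) -> bool:
    """Pass if the service has at most `max_open_incidents` open incidents."""
    open_incidents = [
        i for i in payload.get("incidents", []) if i.get("status") == "open"
    ]
    return len(open_incidents) <= max_open_incidents

payload = {
    "service": "payments-api",
    "incidents": [
        {"id": 101, "status": "open"},
        {"id": 102, "status": "resolved"},
    ],
}

print(incident_check(payload))  # → True (one open incident, threshold is 2)
```

The key design point is that the tool sending the payload doesn't matter: as long as the event carries the data, a single condition turns it into a pass/fail result per service.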
There are three different categories of Checks in OpsLevel. Each pulls data from a different place:
- Service metadata - Checks that validate the structure and contents of your service catalog in OpsLevel. These can be helpful in driving adoption of technologies, tools, and practices, as well as general catalog adoption.
- Code and configuration - Checks that run against files in the repositories associated with your services. These can validate the use of specific libraries or versions, settings in YAML or JSON configuration files, and much more.
- Integrations and custom events - Checks on values sent from integrations with other tools (security, reliability, quality, and so on). These let you consolidate best practices from many sources into one service maturity rubric.
See more examples with explanations here.
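As a sketch of the code-and-configuration category, the snippet below checks whether a repo's requirements file pins a library at or above a minimum version. The file format, library name, and version floor are hypothetical; this illustrates the logic, not how OpsLevel implements it.

```python
# Illustrative sketch: verify a repo's requirements file pins a library
# at or above a minimum version. (File contents, library name, and the
# version floor are hypothetical.)
import re

def library_at_least(requirements_text: str, library: str, minimum: tuple) -> bool:
    """Pass if `library` is pinned at a version >= `minimum`, e.g. (2, 28)."""
    pattern = rf"^{re.escape(library)}==(\d+(?:\.\d+)*)"
    for line in requirements_text.splitlines():
        match = re.match(pattern, line.strip())
        if match:
            version = tuple(int(part) for part in match.group(1).split("."))
            return version >= minimum
    return False  # library not pinned at all: fail the check

requirements = "flask==2.2.3\nrequests==2.31.0\n"
print(library_at_least(requirements, "requests", (2, 28)))  # → True
```

Because the check reads repo files rather than catalog metadata, it catches drift at the source: a service fails the moment its dependency file falls below the agreed floor.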
Lifecycle of a Check
Though Checks themselves are automated, it’s important for the Check owner to follow through on the process to create and maintain it. For each Check you’d like to run in OpsLevel:
- Identify - Assess the requirement and determine which teams and services it applies to.
- Prioritize - Scope the work and make sure it's manageable both now and over time.
- Configure - Add the Check in OpsLevel with a clear owner, a justification, and guidance on where to start.
- Report - Owners report on service health and check in regularly to confirm everything is operating and documented properly.
- Re-evaluate - Confirm someone still cares whether this work is done and follows up when it's behind, and that the Check remains worth your service owners' time and attention.
A note on rubrics vs. campaigns
Rubrics define service maturity for your organization and measure adherence to it based on the Checks you set. A campaign, by contrast, helps you implement a specific change across your engineering org (e.g., a migration or paying down technical debt) and tracks your progress on that change. Because campaigns are time bound and include alerts, they drive teams to get their Checks passing.
Ready to create your first check? Head to the Checks section of our help documentation and choose the check you’re looking for in the dropdown.