Running scripts against production is a common engineering task that can be surprisingly difficult in a fully containerized environment like Kubernetes. To do it, your scripts must be placed in a container image and deployed, which requires a thorough understanding of the CI/CD environment and the creation of the associated YAML files. You’ll also need to write custom failure-monitoring and restart logic. Finally, some compliance standards require an audit log for any changes to production, and completely disallow changes made from local environments.
In this post, we introduce a framework that abstracts away this complexity, allowing engineers to write scripts and run them on Kubernetes without having to edit YAML files or manually build containers. This makes the system something like an internal Zapier or IFTTT for your engineering team. Though the implementation is specific to Iterable, the concept can likely be applied to any company running Kubernetes.
What is Krobs?
Enter Krobs, a framework for running scripts and cron jobs on Kubernetes, either on a schedule or just once. Krobs saves engineers from having to run long-running scripts on their local machines or as cron jobs on other servers, and from having to maintain YAML files and Helm charts.
With Krobs, you simply write the script, merge it, and run it via Harness; Krobs handles the YAML files, Harness configuration, and monitoring framework for you. Kubernetes Jobs are guaranteed to rerun until they either succeed or fail a given number of times, which is something a normal script has to handle explicitly.
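For comparison, here is roughly the retry loop a standalone script would otherwise need. This is a minimal Python sketch: the function name and defaults are our own illustration, with the delays modeled on the exponential back-off (10s, 20s, 40s, and so on, capped at six minutes) that Kubernetes applies when it recreates failed Job Pods.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    task: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 10.0,
    max_delay: float = 360.0,
) -> T:
    """Call task() until it succeeds or has failed max_attempts times.

    A hand-rolled stand-in for what a Job's backoffLimit provides: failed
    attempts are retried after an exponentially growing, capped delay.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return task()
        except Exception as exc:
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last error
            # 10s, 20s, 40s, ... with the default base_delay, never above max_delay
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            print(f"attempt {attempt} failed ({exc!r}); retrying in {delay:.0f}s")
            time.sleep(delay)
            attempt += 1
```

Inside a Krob, none of this is necessary: the script can simply exit with a non-zero status and let the Job's `backoffLimit` decide whether another attempt is made.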
How We’re Using Krobs
The original motivation for the tool came from the desire to avoid running long-running scripts on developer laptops. These kinds of scripts are used most by our Platform Services team, which is responsible for maintaining and scaling our large, multi-tenant Elasticsearch clusters and data infrastructure.
These scripts perform tasks such as draining Elasticsearch nodes, running expunges on indices, and deleting data from indices. Developers ran these scripts locally or on various EC2 hosts. The scripts could run for hours (or days!), and each of them needed monitoring and manual restarts, taking precious focus away from real engineering.
All of the tasks mentioned above now run on Krobs. Engineers on the team write scripts, deploy them with Harness, and get status updates in a Slack channel. They can basically fire and forget, unless the job fails for some unrecoverable reason. As usage of Krobs has grown, engineers have developed common libraries. We’re also using it as a replacement for custom monitoring services: for example, we use Krobs jobs (Krob for short) to monitor the state of cluster settings in Elasticsearch, a use case that doesn’t warrant a full service.
It has only been a few months since the introduction of Krobs, so usage has mostly been limited to Elasticsearch maintenance and monitoring. We also use it for running one-off scripts during incident remediation, where, for compliance reasons, we restrict access from local environments. As Krobs matures, usage may spread to other engineering teams at Iterable.
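As an illustration, the core of a settings-monitoring Krob could be a comparison like the one below. This is a minimal sketch, not the actual Krob: it assumes the script has already fetched `GET _cluster/settings?flat_settings=true&include_defaults=true` from Elasticsearch, and the `EXPECTED` values are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical desired state; a real Krob would load this from its own config.
EXPECTED = {
    "cluster.routing.allocation.enable": "all",
    "cluster.routing.rebalance.enable": "all",
}


def find_drift(settings_response: Dict[str, Any], expected: Dict[str, str]) -> List[str]:
    """List settings whose effective value differs from the expected one.

    settings_response is the JSON body of
    GET _cluster/settings?flat_settings=true&include_defaults=true.
    Later sections win: transient overrides persistent, which overrides defaults.
    """
    effective: Dict[str, Any] = {}
    for section in ("defaults", "persistent", "transient"):
        effective.update(settings_response.get(section, {}))

    problems = []
    for key, want in expected.items():
        have = effective.get(key)
        if have != want:
            problems.append(f"{key}: expected {want!r}, found {have!r}")
    return problems
```

A cron Krob running this check would only need to report a non-empty result to Slack.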
How It Works
Krobs builds on the Kubernetes Job, which ensures that one or more Pods successfully run to completion. The process of creating a Krob starts with a pull request in a dedicated Krobs repository. In this repository, there’s one folder for each programming language that Krobs supports: currently Ruby, Python, Scala (Ammonite), and Bash. Each folder contains a Dockerfile that builds the image with the given language and required packages. The Dockerfile also copies in any scripts found in the folder.
The Krobs repository looks like this:
The pull request must pass review before it can be merged. This point is worth emphasizing because it’s one of the key differences between running a Krob and running ad-hoc scripts against production. The approval process ensures that at least one other engineer has reviewed the script before it runs.
After the pull request is merged, CircleCI automatically builds the images and publishes them to AWS ECR. At this point, the script exists in the published container and can be run. We use Harness for CI/CD, which means we can trigger the Krob from a Harness deployment.
In the Harness deployment, we specify an environment in which to run the Krob (production or staging), a unique job name, the programming language, the name of the script, and the image that contains the given script. If the Krob is a cron job, we include a parameter to specify its cron schedule.
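To make the moving parts concrete, here is a minimal Python sketch of the kind of manifest a tool like Krobs could generate from those deployment parameters. It is an illustration, not Krobs’ actual template: the interpreter mapping, the `/krobs/` script path inside the image, the labels, and the example image URL are all hypothetical. The fields themselves (`backoffLimit`, `restartPolicy`, `schedule`, `concurrencyPolicy`, `jobTemplate`) are standard Kubernetes `batch/v1` Job and CronJob fields.

```python
import json
from typing import Dict, List, Optional

# Hypothetical mapping from a Krob's language folder to the command that runs a script.
INTERPRETERS: Dict[str, List[str]] = {
    "ruby": ["ruby"],
    "python": ["python3"],
    "scala": ["amm"],  # Ammonite runs .sc scripts directly
    "bash": ["bash"],
}


def build_krob_manifest(
    environment: str,
    job_name: str,
    language: str,
    script: str,
    image: str,
    cron_schedule: Optional[str] = None,
    backoff_limit: int = 3,
) -> dict:
    """Return a batch/v1 Job manifest, or a CronJob if cron_schedule is given."""
    if environment not in ("production", "staging"):
        raise ValueError(f"unknown environment: {environment!r}")
    if language not in INTERPRETERS:
        raise ValueError(f"unsupported language: {language!r}")

    container = {
        "name": "krob",
        "image": image,
        # Assumes the Dockerfile copied the language folder's scripts into /krobs.
        "command": INTERPRETERS[language] + [f"/krobs/{script}"],
    }
    job_spec = {
        "backoffLimit": backoff_limit,  # retries before the Job is marked failed
        "template": {
            "metadata": {"labels": {"app": "krob", "krob": job_name}},
            "spec": {"restartPolicy": "Never", "containers": [container]},
        },
    }
    metadata = {"name": job_name, "labels": {"app": "krob", "environment": environment}}

    if cron_schedule is None:
        return {"apiVersion": "batch/v1", "kind": "Job", "metadata": metadata, "spec": job_spec}
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": metadata,
        "spec": {
            "schedule": cron_schedule,
            "concurrencyPolicy": "Forbid",  # skip a run if the previous one is still going
            "jobTemplate": {"spec": job_spec},
        },
    }


if __name__ == "__main__":
    manifest = build_krob_manifest(
        environment="staging",
        job_name="es-settings-check",
        language="python",
        script="check_cluster_settings.py",
        image="123456789012.dkr.ecr.us-east-1.amazonaws.com/krobs-python:latest",  # hypothetical
        cron_schedule="*/15 * * * *",
    )
    print(json.dumps(manifest, indent=2))
```

Because `kubectl apply -f` accepts JSON as well as YAML, a manifest like this could be applied directly. Setting `concurrencyPolicy: Forbid` is one reasonable design choice for long-running maintenance scripts, since it keeps a slow run from overlapping with the next scheduled one.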
Monitoring
Each Krob script takes advantage of existing integrations with Slack and Datadog. We have a krob-monitor Kubernetes service that Krobs can POST Slack messages and Datadog metrics to, and the monitoring service handles the details of actually posting to those services. Here’s an example that shows how a Krob script might post a Slack update from a Bash script:
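A Python Krob could make an equivalent call with nothing but the standard library. In this minimal sketch, the service URL, endpoint path, and JSON payload fields are all hypothetical, since the krob-monitor API isn’t described here; only the `key:value` tag format follows Datadog’s own convention. Requests are built separately from being sent so they can be inspected without a cluster.

```python
import json
import urllib.request
from typing import Dict, Optional

# Hypothetical in-cluster address of the krob-monitor service.
KROB_MONITOR_URL = "http://krob-monitor/api/events"


def _post_json(payload: Dict[str, object]) -> urllib.request.Request:
    """Wrap a JSON payload in a POST request to krob-monitor (not sent yet)."""
    return urllib.request.Request(
        KROB_MONITOR_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def slack_update(job_name: str, text: str) -> urllib.request.Request:
    """Build a request asking krob-monitor to relay a Slack message."""
    return _post_json({"kind": "slack", "job": job_name, "text": text})


def datadog_metric(
    job_name: str, metric: str, value: float, tags: Optional[Dict[str, str]] = None
) -> urllib.request.Request:
    """Build a request asking krob-monitor to submit a Datadog metric."""
    tag_list = [f"{k}:{v}" for k, v in sorted((tags or {}).items())]
    return _post_json(
        {"kind": "datadog", "job": job_name, "metric": metric, "value": value, "tags": tag_list}
    )


def send(request: urllib.request.Request, timeout: float = 5.0) -> int:
    """Send a request built above and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Inside a running Krob, a call such as `send(slack_update("es-drain-node", "Draining started"))` would then deliver the message, assuming the service is reachable at that address.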
Demo
Below is an example of triggering a Krob for the script deletions-BIG-1553-c12-org-1698.sc via Harness:
After the Harness deployment is submitted, the Krob starts to run:
The job can be seen running in Kubernetes via kubectl:
A Slack notification tells the engineer that the task has started running:
Limitations
Because Krobs is built on top of Kubernetes Jobs, there are some restrictions on the kinds of jobs it can run. Kubernetes Jobs should be idempotent: a Kubernetes CronJob creates a Job only about once per execution time of its schedule (although it can be configured to run at least once), so the same work may occasionally run more than once. By extension, Krobs must also be idempotent, which may rule out Krobs for some scripts that must run exactly once (and can’t be made idempotent).
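A simple way to see the difference is to compare two ways of deleting data. In this minimal sketch, an in-memory dict stands in for an Elasticsearch index (a simplification for illustration), and both function names are hypothetical. Deleting an explicit list of document IDs can safely be repeated if the Job runs twice; “delete the oldest N documents” cannot.

```python
from typing import Dict, Iterable, List


def delete_by_id(index: Dict[str, dict], doc_ids: Iterable[str]) -> int:
    """Idempotent: deleting a document that is already gone is a no-op."""
    deleted = 0
    for doc_id in doc_ids:
        if doc_id in index:
            del index[doc_id]
            deleted += 1
    return deleted


def delete_oldest(index: Dict[str, dict], n: int) -> List[str]:
    """NOT idempotent: every rerun deletes n more documents."""
    oldest = sorted(index, key=lambda doc_id: index[doc_id]["created"])[:n]
    for doc_id in oldest:
        del index[doc_id]
    return oldest
```

Turning the second form into the first, for example by resolving the target IDs once and passing them to the script as a fixed list, is one way to make such a script safe to run as a Krob.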
Second, we enforce a rule that Krobs shouldn’t become load-bearing. If a Krob fails, there should be no production impact. Krobs should never become mission-critical parts of an application; we want to avoid having an obscure, unmaintained Krob running a critical part of the app that is hard to identify when it inevitably breaks.
Finally, to avoid the creation of a pile of miscellaneous scripts without known owners, each Krob should be thoroughly documented (its functionality, owner, and intended cron schedule if it’s a cron job).
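One way to enforce this rule is a small CI check that rejects Krob scripts missing the required documentation. The sketch below assumes a hypothetical convention, invented for this example, in which each script declares its metadata in comment lines such as `# krob-owner: platform-services` (or `//` comments for Scala).

```python
import re
from typing import Dict, List

REQUIRED_FIELDS = ("description", "owner")  # "schedule" is added for cron Krobs
# Matches hypothetical header lines like "# krob-owner: platform-services".
HEADER_LINE = re.compile(r"^\s*(?:#|//)\s*krob-(\w+):\s*(.+?)\s*$")


def parse_krob_header(source: str) -> Dict[str, str]:
    """Collect krob-* metadata fields from a script's comment lines."""
    fields: Dict[str, str] = {}
    for line in source.splitlines():
        match = HEADER_LINE.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def missing_documentation(source: str, is_cron: bool) -> List[str]:
    """Return the required fields that the script does not document."""
    fields = parse_krob_header(source)
    required = list(REQUIRED_FIELDS) + (["schedule"] if is_cron else [])
    return [name for name in required if not fields.get(name)]
```

A CI job could run this over every script in a pull request and fail the build when the returned list is non-empty.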
Conclusion
By making it easier to run scripts against production Kubernetes in a secure and auditable way, Krobs has reduced the amount of operational work required of Iterable engineers. We’ve implemented Krobs on Kubernetes Jobs and Harness, but it should be possible to build a similar tool on any stack with a Kubernetes cluster.