How to Use Bash & Python for Real DevOps Automation – Full Handbook with 5 Production Use Cases

Osomudeya Zudonu

Automation scripts often validate process completion instead of system health. A Kubernetes pod can be running while the application inside it can't authenticate to the database. A Terraform deployment can return clean while someone has manually changed infrastructure in the cloud console. A canary rollout can show zero errors while users wait five seconds for every request. The problem isn't the tooling. The problem is that the system can look healthy when it really is not.

This handbook walks through five production-style automation scenarios using Bash and Python for:

Detecting abnormal AWS spend before the monthly invoice arrives

Correlating logs across multiple services using trace IDs

Finding infrastructure drift outside Terraform

Validating secret rotation at the application level

Automatically rolling back slow deployments before users complain

By the end of this handbook, you'll be able to build small scripts that help you notice when something is wrong in a system, even when the tools say everything is fine. The scripts are intentionally small. The important part is the operational thinking behind them: what signal the script measures, what failure mode it can detect, and what assumptions the platform is making underneath.

Each use case includes a runnable demo environment, the complete script, a breakdown of the system behaviour involved, and an intentional failure you can trigger yourself. If you're new to this workflow, start with use case 1 and work forward. The later sections build on the same pattern: automation is useful when it verifies reality, not just process completion.

Prerequisites

Before you start, set up the following:

Python 3.8 or higher – check with python3 --version

A Python virtual environment – create one before installing anything:

python3 -m venv venv
source venv/bin/activate

# on Windows:
venv\Scripts\activate

This keeps your installed packages isolated from your system Python and prevents permission errors on shared machines.

pip – Python's package installer, included with Python

AWS CLI configured with a working profile – a free-tier AWS account is enough for use cases 1, 3, and 4. Verify it's working with: aws sts get-caller-identity

Docker and Docker Compose – needed for use cases 2, 4, and 5

Kind (Kubernetes in Docker) – a way to run Kubernetes locally for use cases 4 and 5. Install with brew install kind on macOS, or follow the Kind quick start guide

kubectl – the command-line tool for talking to a Kubernetes cluster. Install it separately from Kind. Once both are installed, run kind create cluster and Kind updates your kubeconfig so that kubectl points at the new cluster automatically

Helm – a package manager for Kubernetes, needed for use case 5. Install with brew install helm or the Helm install guide

Terraform – needed for use case 3. Install with brew install terraform on macOS or follow the Terraform install guide. Check with terraform version.

bc – a calculator utility used by the canary watch scripts for floating-point comparison. Install with brew install bc on macOS or apt install bc on Ubuntu. Run bc --version to confirm it is available before starting use case 5.
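The tool list above can be checked in one pass before you start. The following is a minimal sketch, not the repo's own preflight script: the executable names in REQUIRED_TOOLS are assumptions about how each tool appears on your PATH, and find_missing is a hypothetical helper name.

```python
import shutil

# Executable names are assumptions; adjust them if your install uses
# different names (for example pip3 instead of pip).
REQUIRED_TOOLS = [
    "python3", "pip", "aws", "docker", "kind",
    "kubectl", "helm", "terraform", "bc",
]


def find_missing(tools):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_TOOLS)
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools found on PATH.")
```

This only confirms that each executable exists. It doesn't check versions or whether Docker is actually running, so treat a clean result as a first pass rather than a guarantee.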

Knowledge and Skills

You should be comfortable reading Python and Bash scripts without needing to write them from scratch.

You should have basic Linux terminal comfort – navigating directories, running scripts, reading output, and so on.

You should know what Kubernetes pods and deployments are at a basic level – you don't need deep Kubernetes expertise, as use cases 4 and 5 will introduce the Kubernetes concepts they rely on as they go.

Familiarity with AWS basics, such as what EC2, IAM, and Secrets Manager are, will help with use cases 1, 3, and 4. Use case 2 runs entirely on your local machine and requires no AWS knowledge at all.

For use case 3, knowing what Terraform is and what a state file does will help. You don't need to write any Terraform, but understanding that Terraform tracks what it created is the foundation of the whole use case.

AWS IAM Permissions Required

The scripts in this article make real AWS API calls. Your IAM user or role needs the following minimum permissions. If you see an AccessDenied error, this is the first place to look.

Use Case | Required IAM Permission
1 - Cost Anomaly Detection | ce:GetCostAndUsage
3 - Drift Detection | ec2:DescribeSecurityGroups
4 - Secrets Rotation | secretsmanager:GetSecretValue, secretsmanager:PutSecretValue
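The permissions in the table can be written as a single IAM policy document. The sketch below builds one in Python and prints it as JSON, so it can be pasted into the IAM console. The function name build_policy is hypothetical, and the wildcard "Resource": "*" is an assumption made for brevity. In a real account you would likely narrow the Secrets Manager actions to specific secret ARNs.

```python
import json

# Actions copied from the permissions table above.
ACTIONS = [
    "ce:GetCostAndUsage",
    "ec2:DescribeSecurityGroups",
    "secretsmanager:GetSecretValue",
    "secretsmanager:PutSecretValue",
]


def build_policy(actions):
    """Return an IAM policy document (as a dict) allowing `actions`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                # Assumption: wildcard resource, for a lab account only.
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_policy(ACTIONS), indent=2))
```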

If you're using a fresh AWS free-tier account with AdministratorAccess attached, these permissions are already included and you can skip this step.

If you're on a restricted IAM user, here's how to add them. In the AWS Console, go to IAM, click Users, then click your username. Under the Permissions tab, click Add permissions, then Create inline policy. Switch to the JSON tab and paste a policy document granting the permissions in the table above, then save it.

If your company manages AWS through an organization and you don't have permission to edit your own IAM policies, ask your administrator to add these permissions to your role.

Companion GitHub Repository

All demo projects live at: https://github.com/irvingtalks/devops-scripting-labs

Each use case has its own numbered folder with the complete script, supporting files, a setup.sh to prepare the environment, and a break_it.sh that injects the specific failure each use case is built around. Clone the repo before starting:

git clone https://github.com/irvingtalks/devops-scripting-labs
cd devops-scripting-labs

Before running any use case, check that you have everything installed:

./preflight.sh

This checks for every tool the lab needs (Python, AWS CLI, Docker, Kind, Helm, Terraform, and bc) and tells you exactly what's missing, with the install command for each one.

Table of Contents

Use Case 1 - Cost Anomaly Detection

Use Case 2 - Log Correlation Across Services

Use Case 3 - Infrastructure Drift Detection

Use Case 4 - Secrets Rotation with Zero Downtime

Use Case 5 - Automated Canary Rollback Trigger

What You Can Do Now

Use Case 1 - Cost Anomaly Detection

Environment: AWS Cost Explorer API (read-only, available in all accounts)

Language: Python

The Production Problem

A junior engineer is testing a Kubernetes configuration. They spin up a managed node group in AWS (a set of EC2 virtual machines that the Kubernetes cluster uses to run workloads) and configure the cluster autoscaler, which is the Kubernetes component responsible for adding more machines when the cluster needs more capacity. The test goes well, and on Friday afternoon, they forget to tear the environment down. Over the weekend, the autoscaler keeps provisioning new nodes because the test workloads are still running and requesting resources. By Monday morning you have a node group that has been quietly growing for two and a half days, and nobody noticed until the invoice landed three weeks later.

The script in this use case exists because your AWS bill isn't just a monthly number. It's a time series, and you can monitor it the same way you monitor application metrics. Check it daily, know your baseline, and you catch this kind of event in hours instead of weeks.

What's Actually Happening at the System Level

What this is not: This isn't a finance dashboard. It's an operational anomaly detector and the signal it monitors is cost. But the thing it's actually detecting is unexpected infrastructure behavior such as resources left running, autoscaler events, and forgotten environments.

AWS Cost Explorer is a service that stores your billing data and exposes it through an API, and when you call it, you're running a query against your account's billing records by specifying the time range, the granularity, and how you want results grouped.

One thing to know before you start investigating any flagged cost is that AWS decides which service category to put a charge under, not you. An EBS snapshot copy running across regions might appear under the EC2 line item rather than data transfer, which means a spike in EC2 spend doesn't necessarily mean something went wrong with your EC2 instances. The script flags the spike correctly, but investigating it means asking "what changed in my infrastructure on this date" rather than "what is running in EC2 right now." The billing label is a starting point, not a diagnosis.

Set Up the Demo Environment

Navigate to 01-cost-anomaly/ in the companion repo. No cluster setup is needed for this use case because the script runs against your AWS account directly, and the only dependency is boto3:

cd 01-cost-anomaly
pip install boto3
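With boto3 installed, the kind of Cost Explorer query described earlier (a time range, a granularity, and a grouping) can be sketched as follows. This is an illustration under assumptions, not the repo's script: build_cost_query and fetch_daily_costs are hypothetical names, and the 14-day window and grouping by service are arbitrary choices. The call itself, get_cost_and_usage on the "ce" client, is the standard boto3 Cost Explorer API.

```python
from datetime import date, timedelta


def build_cost_query(end, days=14):
    """Build keyword arguments for Cost Explorer's get_cost_and_usage.

    `end` is a datetime.date. Cost Explorer treats End as exclusive, so
    the query covers the `days` days before `end`.
    """
    start = end - timedelta(days=days)
    return {
        "TimePeriod": {"Start": start.isoformat(), "End": end.isoformat()},
        "Granularity": "DAILY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [{"Type": "DIMENSION", "Key": "SERVICE"}],
    }


def fetch_daily_costs(query):
    """Run the query and return (day, service, amount) tuples.

    Needs AWS credentials and the ce:GetCostAndUsage permission.
    Pagination (NextPageToken) is ignored to keep the sketch short.
    """
    # Imported here so build_cost_query can be used without boto3 installed.
    import boto3

    ce = boto3.client("ce")
    response = ce.get_cost_and_usage(**query)
    rows = []
    for day in response["ResultsByTime"]:
        for group in day["Groups"]:
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            rows.append((day["TimePeriod"]["Start"], group["Keys"][0], amount))
    return rows
```

Once credentials are configured, calling fetch_daily_costs(build_cost_query(date.today())) would return one row per day and service. Keeping the query builder separate from the API call means the time window can be checked without touching AWS.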

Before running against your real account, make sure your AWS credentials are configured. The script uses whatever credentials the AWS CLI is set up with. If you haven't done this yet:

aws configure

This will ask for your AWS Access Key ID, Secret Access Key, default region (use us-east-1 if unsure), and output format (type json). You can find your access keys in the AWS Console under IAM → Users → your username → Security credentials → Create access key. Your account also needs the ce:GetCostAndUsage permission. If you're on a fresh account with AdministratorAccess, that's already included.

If you have an AWS account with a few weeks of billing history, you can run the script directly against your real data:

python detect_cost_anomaly.py
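To make the "know your baseline" idea concrete, here is a minimal sketch of one way to flag a cost spike from a list of daily totals. It is not the logic of detect_cost_anomaly.py, which isn't reproduced here. The function name is_anomalous, the mean-plus-three-standard-deviations rule, and the one-dollar minimum jump are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def is_anomalous(daily_totals, threshold=3.0, min_delta=1.0):
    """Decide whether the latest day's spend is unusually high.

    daily_totals: daily costs, oldest first; the last entry is the day
    under test and everything before it is the baseline.
    threshold: how many standard deviations above the baseline mean
    counts as a spike (assumed default).
    min_delta: smallest absolute increase worth flagging, so a perfectly
    flat baseline doesn't flag a tiny change (assumed default).

    Returns (flagged, limit), where limit is the spend level that the
    latest day had to exceed.
    """
    if len(daily_totals) < 3:
        raise ValueError("need at least two baseline days plus the day under test")
    *baseline, latest = daily_totals
    base = mean(baseline)
    limit = base + max(threshold * stdev(baseline), min_delta)
    return latest > limit, limit


if __name__ == "__main__":
    # Made-up numbers: a steady ~10/day baseline, then a jump.
    flagged, limit = is_anomalous([10.0, 11.0, 9.0, 10.0, 10.0, 42.0])
    print("anomaly" if flagged else "normal", "limit:", round(limit, 2))
```

A rule like this is sensitive to how noisy the baseline is. Accounts with strong weekday and weekend patterns may need a longer window or a per-weekday comparison.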

Two things to know before running against a real account. First, Cost Explorer data has a 24-hour lag. This means spend from today won't appear until tomorrow, so the script automatically excludes the most recent day to avoid incomplete results. Second, the script uses unblended costs, which is what you actually pay on a single-account
