Shell Scripting for DevOps
Published: December 2025 | Topic: Automation & Scripting for DevOps
Automation is the superpower of DevOps. Shell scripting enables you to automate repetitive tasks, manage infrastructure, deploy applications, and monitor systems efficiently. Mastering Bash scripting is essential for any DevOps professional looking to streamline operations and build reliable automation pipelines.
Why Shell Scripting Matters for DevOps
Shell scripting provides:
- Automation: Eliminate manual, repetitive tasks
- Consistency: Ensure identical execution across environments
- Infrastructure as Code: Manage infrastructure through scripts
- Rapid Prototyping: Quickly test ideas and workflows
- Cross-platform Compatibility: Run on any Unix-like system
- Integration: Connect different tools and services
- Monitoring: Create custom monitoring solutions
- Backup & Recovery: Automate data protection strategies
The DevOps Scripting Workflow
Define requirements
Write Bash script
Debug & validate
Schedule with cron
Logs & alerts
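The five steps above can be compressed into a sketch: a script fulfils a simple requirement, writes a log line that alerting can watch, and is then handed to cron. This is a minimal illustration, not a production script; the /tmp log path and the 5-minute schedule are placeholder assumptions.

```shell
#!/bin/bash
# workflow_demo.sh - minimal sketch of the define -> script -> schedule -> log cycle
# (the log path and the cron schedule below are illustrative assumptions)

LOG_FILE="${LOG_FILE:-/tmp/workflow_demo.log}"

check_root_disk() {
    # The "requirement": report root filesystem usage
    local usage
    usage="$(df / | awk 'NR==2 {print $5}')"
    echo "$(date '+%Y-%m-%d %H:%M:%S') root filesystem at $usage" >> "$LOG_FILE"
    echo "logged: root filesystem at $usage"
}

check_root_disk

# Scheduling and alerting would then be one crontab entry, e.g.:
#   */5 * * * * /usr/local/bin/workflow_demo.sh
```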
1. Introduction to Bash Scripting
What is Bash?
Bash (Bourne Again SHell): The default command-line interpreter for most Linux distributions and macOS. It's a powerful scripting language that allows you to automate system administration tasks, process data, and control program execution.
Your First Bash Script
#!/bin/bash
# hello.sh - My first Bash script
# This script prints a greeting message
echo "Hello, World!"
echo "Today is: $(date)"
echo "Current user: $USER"
echo "Working directory: $(pwd)"
Script Execution Methods
Method 1: Using bash command
# Execute with bash (no executable permission needed)
$ bash hello.sh
# Or make it executable and run it directly
$ chmod +x hello.sh
$ ./hello.sh # Requires shebang and executable permission
# Output:
Hello, World!
Today is: Mon Dec 1 10:00:01 UTC 2025
Current user: alice
Working directory: /home/alice
Method 2: Source execution
$ source hello.sh
$ . hello.sh # Shorthand for source
# Difference:
# ./script.sh - Runs in new shell
# source script.sh - Runs in current shell
# (Environment changes persist with source)
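The persistence difference is easy to demonstrate: a variable assigned by a script run with `bash` disappears with the child shell, while the same assignment made via `source` survives in the current shell. A small sketch (the helper path in /tmp is an assumption):

```shell
#!/bin/bash
# source_demo.sh - show why 'source' persists environment changes

helper="/tmp/setvar_$$.sh"
cat > "$helper" << 'EOF'
MY_VAR="set by script"
EOF

MY_VAR="original"

# Running in a child shell: the child's assignment is lost
bash "$helper"
echo "after bash: MY_VAR=$MY_VAR"    # still "original"

# Sourcing runs in the current shell: the assignment persists
source "$helper"
echo "after source: MY_VAR=$MY_VAR"  # now "set by script"

rm -f "$helper"
```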
Method 3: Debug mode
$ bash -x hello.sh
# Output with debugging:
+ echo 'Hello, World!'
Hello, World!
+ date
+ echo 'Today is: Mon Dec 1 10:00:01 UTC 2025'
Today is: Mon Dec 1 10:00:01 UTC 2025
# Debug inside script
set -x # Enable debugging
# ... script commands ...
set +x # Disable debugging
Essential Script Components
Shebang (#!)
#!/bin/bash - Most common
#!/usr/bin/env bash - More portable
#!/bin/sh - For POSIX shell scripts
#!/usr/bin/python - For Python scripts
# Why shebang matters:
1. Tells system which interpreter to use
2. Makes script self-contained
3. Allows direct execution (./script)
4. Ensures correct syntax highlighting
Comments & Documentation
# This is a comment
echo "Hello" # Inline comment
# Multi-line comments (using here-doc)
: << 'COMMENT'
This is a multi-line comment.
It can span multiple lines.
Useful for function documentation.
COMMENT
# Script header template
#!/bin/bash
#
# Script: backup.sh
# Author: DevOps Team
# Date: 2025-12-01
# Description: Automated backup script
# Usage: ./backup.sh [options]
Complete Starter Script Template
#!/bin/bash
#
# template.sh - Bash script template for DevOps
#
# Features:
# - Proper error handling
# - Command-line arguments
# - Logging
# - Configuration
#
# Usage: ./template.sh [OPTIONS]
#
# ============================================
# CONFIGURATION
# ============================================
SCRIPT_NAME="$(basename "$0")"
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
LOG_FILE="/var/log/${SCRIPT_NAME%.*}.log"
TEMP_DIR="/tmp/${SCRIPT_NAME%.*}_$(date +%s)"
# ============================================
# FUNCTIONS
# ============================================
log_message() {
local message="$1"
local level="${2:-INFO}"
local timestamp="$(date '+%Y-%m-%d %H:%M:%S')"
echo "[$timestamp] [$level] $message" | tee -a "$LOG_FILE"
}
cleanup() {
log_message "Cleaning up temporary files"
rm -rf "$TEMP_DIR"
}
show_help() {
cat << EOF
Usage: $SCRIPT_NAME [OPTIONS]
Options:
-h, --help Show this help message
-v, --verbose Enable verbose output
-d, --dry-run Simulate without making changes
Examples:
$SCRIPT_NAME --verbose
$SCRIPT_NAME --dry-run
EOF
}
# ============================================
# MAIN SCRIPT
# ============================================
# Set up error handling
set -euo pipefail
trap cleanup EXIT
trap 'log_message "Script interrupted" "ERROR"; exit 1' INT TERM
# Parse command-line arguments
VERBOSE=0
DRY_RUN=0
while [[ $# -gt 0 ]]; do
case "$1" in
-h|--help)
show_help
exit 0
;;
-v|--verbose)
VERBOSE=1
shift
;;
-d|--dry-run)
DRY_RUN=1
shift
;;
*)
log_message "Unknown option: $1" "ERROR"
show_help
exit 1
;;
esac
done
# Main script logic
log_message "Script started" "INFO"
if [[ $VERBOSE -eq 1 ]]; then
log_message "Verbose mode enabled" "DEBUG"
fi
if [[ $DRY_RUN -eq 1 ]]; then
log_message "Dry run mode - no changes will be made" "WARNING"
fi
# Create temporary directory
mkdir -p "$TEMP_DIR"
log_message "Created temp directory: $TEMP_DIR" "DEBUG"
# Your script logic goes here
log_message "Performing main task..." "INFO"
sleep 2
log_message "Script completed successfully" "INFO"
exit 0
2. Variables, Loops, and Conditionals
Variables in Bash
Variables: Named storage locations that hold data. In Bash, variables can store strings, numbers, arrays, and command outputs. Variable names are case-sensitive and should be descriptive.
Variable Declaration & Usage
name="Alice"
count=42
path="/home/alice/documents"
# Using variables
echo "Hello, $name"
echo "Count is: $count"
echo "Path: ${path}" # Curly braces for clarity
# Command substitution
current_date="$(date)"
file_count="$(ls -1 | wc -l)"
kernel_version="$(uname -r)"
Special Variables
$0 # Script name
$1, $2, ... # Arguments
$# # Number of arguments
$@ # All arguments as separate words
$* # All arguments as single word
# Special parameters
$$ # Current process ID
$! # Last background process ID
$? # Exit status of last command
# Example script with arguments
#!/bin/bash
echo "Script: $0"
echo "First argument: $1"
echo "Total arguments: $#"
echo "All arguments: $@"
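The difference between `$@` and `$*` only shows up when they are quoted: `"$@"` preserves each argument as a separate word, while `"$*"` joins them into a single word. A quick demonstration:

```shell
#!/bin/bash
# args_demo.sh - "$@" keeps arguments separate; "$*" joins them into one word

count_args() {
    echo "$#"
}

set -- "one two" "three"   # simulate two positional arguments

at_count="$(count_args "$@")"    # passes two separate words
star_count="$(count_args "$*")"  # passes one joined word

echo "\"\$@\" passed $at_count arguments"
echo "\"\$*\" passed $star_count argument(s)"
```

This is why loops over arguments should almost always use `"$@"`.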
Arrays
fruits=("apple" "banana" "cherry")
numbers=(1 2 3 4 5)
files=(*) # Glob expansion; one element per file (safer than parsing ls)
# Accessing array elements
echo "First fruit: ${fruits[0]}"
echo "All fruits: ${fruits[@]}"
echo "Number of fruits: ${#fruits[@]}"
# Adding to array
fruits+=("orange" "grape")
# Associative arrays (Bash 4+)
declare -A config
config["host"]="localhost"
config["port"]="8080"
config["user"]="admin"
echo "Host: ${config[host]}"
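Associative arrays become most useful when you iterate over their keys with `${!array[@]}`, which yields every key (in no guaranteed order). A short sketch building on the config example above:

```shell
#!/bin/bash
# assoc_iterate.sh - iterate over associative array keys (Bash 4+)

declare -A config
config["host"]="localhost"
config["port"]="8080"
config["user"]="admin"

for key in "${!config[@]}"; do
    echo "$key=${config[$key]}"
done

echo "total keys: ${#config[@]}"
```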
Conditionals (if/else/case)
If-Else Statements
if [[ -f "/etc/passwd" ]]; then
echo "File exists"
fi
# If-else statement
if [[ $USER == "root" ]]; then
echo "Running as root"
else
echo "Running as regular user"
fi
# If-elif-else statement
if [[ $1 -gt 100 ]]; then
echo "Greater than 100"
elif [[ $1 -gt 50 ]]; then
echo "Greater than 50"
else
echo "50 or less"
fi
File Test Operators
[[ -e "file" ]] # Exists
[[ -f "file" ]] # Regular file
[[ -d "dir" ]] # Directory
[[ -r "file" ]] # Readable
[[ -w "file" ]] # Writable
[[ -x "file" ]] # Executable
[[ -s "file" ]] # Not empty
# String tests
[[ -z "$str" ]] # Empty string
[[ -n "$str" ]] # Non-empty string
[[ "$str1" == "$str2" ]] # Equal
[[ "$str1" != "$str2" ]] # Not equal
# Numerical tests
[[ $a -eq $b ]] # Equal
[[ $a -ne $b ]] # Not equal
[[ $a -lt $b ]] # Less than
[[ $a -le $b ]] # Less or equal
[[ $a -gt $b ]] # Greater than
[[ $a -ge $b ]] # Greater or equal
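A classic pitfall with these operators: inside `[[ ]]`, `<` and `>` compare strings lexically, so only the `-lt`/`-gt` family compares numbers. The sketch below shows how "10 < 9" is true as a string comparison:

```shell
#!/bin/bash
# compare_pitfall.sh - '<' inside [[ ]] is lexical; -lt is numeric

a=10
b=9

if [[ $a < $b ]]; then
    lexical="10 sorts before 9"   # '1' < '9', so "10" sorts before "9"
else
    lexical="10 sorts after 9"
fi

if [[ $a -lt $b ]]; then
    numeric="10 is less than 9"
else
    numeric="10 is not less than 9"
fi

echo "string test:  $lexical"
echo "numeric test: $numeric"
```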
Case Statements
case "$1" in
start)
echo "Starting service..."
# start commands
;;
stop)
echo "Stopping service..."
# stop commands
;;
restart)
echo "Restarting service..."
# restart commands
;;
status)
echo "Service status..."
# status commands
;;
*)
echo "Usage: $0 {start|stop|restart|status}"
exit 1
;;
esac
Loops in Bash
For Loops
for i in 1 2 3 4 5; do
echo "Number: $i"
done
# For loop with range
for i in {1..10}; do
echo "Count: $i"
done
# For loop with step
for i in {0..100..10}; do
echo "Step: $i"
done
# For loop over array
fruits=("apple" "banana" "cherry")
for fruit in "${fruits[@]}"; do
echo "Fruit: $fruit"
done
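One caveat with brace ranges: brace expansion happens before variable expansion, so `{1..$n}` never expands into a range. When the bound is a variable, use a C-style for loop instead:

```shell
#!/bin/bash
# range_pitfall.sh - {1..$n} does not expand; use a C-style loop instead

n=3

broken=()
for i in {1..$n}; do            # loops once over the literal word "{1..3}"
    broken+=("$i")
done

working=()
for ((i = 1; i <= n; i++)); do  # arithmetic loop evaluates n correctly
    working+=("$i")
done

echo "broken loop iterations:  ${#broken[@]} (first item: ${broken[0]})"
echo "working loop iterations: ${#working[@]}"
```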
While Loops
counter=1
while [[ $counter -le 5 ]]; do
echo "Counter: $counter"
((counter++))
done
# While reading file
while IFS= read -r line; do
echo "Line: $line"
done < "input.txt"
# Infinite loop with break
while true; do
echo "Press Ctrl+C to exit"
sleep 1
done
Until Loops
counter=1
until [[ $counter -gt 5 ]]; do
echo "Counter: $counter"
((counter++))
done
# Wait for service to start
attempts=0
until curl -f http://localhost:8080 &>/dev/null; do
((attempts++))
if [[ $attempts -ge 10 ]]; then
echo "Service failed to start"
exit 1
fi
echo "Waiting for service... (attempt $attempts)"
sleep 2
done
echo "Service is up!"
Practical Example: System Health Check Script
# system_health.sh - Check system health metrics
# Configuration
THRESHOLD_CPU=80
THRESHOLD_MEM=85
THRESHOLD_DISK=90
LOG_FILE="/var/log/system_health.log"
# Functions
log_alert() {
local message="$1"
local timestamp="$(date '+%Y-%m-%d %H:%M:%S')"
echo "[$timestamp] ALERT: $message" >> "$LOG_FILE"
echo "ALERT: $message"
}
check_cpu() {
local cpu_usage="$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | cut -d'%' -f1)"
echo -n "CPU Usage: ${cpu_usage}% "
if [[ $(echo "$cpu_usage > $THRESHOLD_CPU" | bc) -eq 1 ]]; then
log_alert "High CPU usage: ${cpu_usage}%"
else
echo "[OK]"
fi
}
check_memory() {
local mem_total="$(free | grep Mem | awk '{print $2}')"
local mem_used="$(free | grep Mem | awk '{print $3}')"
local mem_percent="$(echo "scale=2; $mem_used * 100 / $mem_total" | bc)"
echo -n "Memory Usage: ${mem_percent}% "
if [[ $(echo "$mem_percent > $THRESHOLD_MEM" | bc) -eq 1 ]]; then
log_alert "High memory usage: ${mem_percent}%"
else
echo "[OK]"
fi
}
check_disk() {
df -h | grep -v tmpfs | tail -n +2 | while IFS= read -r line; do
local filesystem="$(echo "$line" | awk '{print $1}')"
local usage="$(echo "$line" | awk '{print $5}' | sed 's/%//')"
local mount="$(echo "$line" | awk '{print $6}')"
echo -n "Disk $filesystem ($mount): ${usage}% "
if [[ $usage -ge $THRESHOLD_DISK ]]; then
log_alert "High disk usage on $filesystem ($mount): ${usage}%"
else
echo "[OK]"
fi
done
}
check_services() {
local services=("ssh" "nginx" "postgresql")
for service in "${services[@]}"; do
if systemctl is-active --quiet "$service"; then
echo "Service $service: [RUNNING]"
else
log_alert "Service $service is not running"
fi
done
}
# Main execution
echo "=== System Health Check $(date) ==="
echo
check_cpu
check_memory
check_disk
check_services
echo
echo "=== Check Complete ==="
3. Functions & Arguments
What are Functions?
Functions: Reusable blocks of code that perform specific tasks. Functions help organize scripts, avoid code duplication, and make scripts more readable and maintainable.
Function Syntax and Usage
Basic Function Definition
function_name() {
# Function body
echo "This is a function"
}
# Method 2: Using 'function' keyword
function function_name {
echo "This is also a function"
}
# Calling a function
function_name
# Function with parameters
greet() {
local name="$1"
local time="$2"
echo "Good $time, $name!"
}
greet "Alice" "morning"
Return Values
is_even() {
local number="$1"
if (( number % 2 == 0 )); then
return 0 # Success (true)
else
return 1 # Failure (false)
fi
}
is_even 42
if [[ $? -eq 0 ]]; then
echo "Number is even"
else
echo "Number is odd"
fi
# Return data via stdout
get_timestamp() {
date '+%Y-%m-%d %H:%M:%S'
}
timestamp="$(get_timestamp)"
echo "Current time: $timestamp"
Variable Scope
GLOBAL_VAR="I'm global"
demo_scope() {
# Local variables (only in function)
local LOCAL_VAR="I'm local"
echo "Inside function:"
echo " Global: $GLOBAL_VAR"
echo " Local: $LOCAL_VAR"
# Modify global variable
GLOBAL_VAR="Modified inside function"
}
echo "Before function: $GLOBAL_VAR"
demo_scope
echo "After function: $GLOBAL_VAR"
echo "Trying to access LOCAL_VAR outside: ${LOCAL_VAR:-undefined}"
Advanced Argument Handling
Shift Operator
process_args() {
echo "Total arguments: $#"
while [[ $# -gt 0 ]]; do
echo "Argument: $1"
shift # Move to next argument
done
}
process_args "apple" "banana" "cherry"
# Output:
Total arguments: 3
Argument: apple
Argument: banana
Argument: cherry
getopts for Option Parsing
parse_options() {
local OPTIND # Reset getopts
local verbose=0
local output_file=""
while getopts ":ho:v" opt; do
case "$opt" in
h)
show_help
exit 0
;;
v)
verbose=1
;;
o)
output_file="$OPTARG"
;;
\?)
echo "Invalid option: -$OPTARG" >&2
exit 1
;;
:)
echo "Option -$OPTARG requires an argument" >&2
exit 1
;;
esac
done
shift $((OPTIND-1)) # Remove processed options
# Remaining arguments are positional
for arg in "$@"; do
echo "Positional argument: $arg"
done
echo "Verbose: $verbose"
echo "Output file: ${output_file:-not set}"
}
parse_options "$@"
Practical Function Examples for DevOps
Logging Function
log() {
local level="$1"
local message="$2"
local timestamp="$(date '+%Y-%m-%d %H:%M:%S')"
local log_file="${3:-/var/log/script.log}"
case "$level" in
INFO) local color="\033[0;32m" ;; # Green
WARN) local color="\033[0;33m" ;; # Yellow
ERROR) local color="\033[0;31m" ;; # Red
DEBUG) local color="\033[0;36m" ;; # Cyan
*) local color="\033[0m" ;; # Default
esac
local reset="\033[0m"
# Print to console with color
echo -e "${color}[$timestamp] [$level] $message${reset}"
# Write to log file without color codes
echo "[$timestamp] [$level] $message" >> "$log_file"
}
# Usage:
log "INFO" "Script started successfully"
log "WARN" "Disk space is running low"
log "ERROR" "Failed to connect to database"
Validation Functions
validate_ip() {
local ip="$1"
local stat=1
if [[ $ip =~ ^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$ ]]; then
OIFS=$IFS
IFS='.'
ip=($ip)
IFS=$OIFS
if [[ ${ip[0]} -le 255 ]] && [[ ${ip[1]} -le 255 ]] && \
[[ ${ip[2]} -le 255 ]] && [[ ${ip[3]} -le 255 ]]; then
stat=0
fi
fi
return $stat
}
validate_email() {
local email="$1"
local regex="^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
if [[ $email =~ $regex ]]; then
return 0
else
return 1
fi
}
# Usage:
validate_ip "192.168.1.1"
if [[ $? -eq 0 ]]; then
echo "Valid IP address"
else
echo "Invalid IP address"
fi
Utility Functions
is_root() {
[[ $EUID -eq 0 ]] && return 0 || return 1
}
check_dependencies() {
local deps=("$@")
local missing=()
for dep in "${deps[@]}"; do
if ! command -v "$dep" &>/dev/null; then
missing+=("$dep")
fi
done
if [[ ${#missing[@]} -gt 0 ]]; then
echo "Missing dependencies: ${missing[*]}" >&2
return 1
fi
return 0
}
send_notification() {
local subject="$1"
local message="$2"
local recipient="${3:-admin@example.com}"
if command -v mail &>/dev/null; then
echo "$message" | mail -s "$subject" "$recipient"
elif command -v curl &>/dev/null; then
# Send to webhook
curl -X POST -H "Content-Type: application/json" \
-d "{\"text\":\"$subject: $message\"}" \
https://hooks.slack.com/services/...
else
echo "Notification: $subject - $message" >&2
fi
}
Complete Function Library for DevOps
# functions.sh - DevOps utility function library
# Load this library in your scripts:
# source functions.sh
# ==================== ERROR HANDLING ====================
die() {
local msg="$1"
local code="${2:-1}"
log "ERROR" "$msg"
exit "$code"
}
check_error() {
local exit_code="$?"
local message="$1"
if [[ $exit_code -ne 0 ]]; then
die "$message (exit code: $exit_code)" "$exit_code"
fi
}
# ==================== SYSTEM CHECKS ====================
require_root() {
is_root || die "This script must be run as root"
}
check_disk_space() {
local path="${1:-/}"
local threshold="${2:-90}"
local usage="$(df "$path" | tail -1 | awk '{print $5}' | sed 's/%//')"
if [[ $usage -ge $threshold ]]; then
log "WARN" "Disk usage on $path is ${usage}% (threshold: ${threshold}%)"
return 1
fi
return 0
}
# ==================== FILE OPERATIONS ====================
backup_file() {
local file="$1"
local backup_dir="${2:-/backups}"
if [[ ! -f "$file" ]]; then
log "ERROR" "File not found: $file"
return 1
fi
mkdir -p "$backup_dir"
local timestamp="$(date '+%Y%m%d_%H%M%S')"
local backup_file="$backup_dir/$(basename "$file").$timestamp.bak"
cp -p "$file" "$backup_file"
check_error "Failed to backup $file"
log "INFO" "Backed up $file to $backup_file"
echo "$backup_file"
}
# ==================== NETWORK OPERATIONS ====================
wait_for_port() {
local host="$1"
local port="$2"
local timeout="${3:-30}"
local interval="${4:-2}"
local start_time="$(date +%s)"
local end_time="$((start_time + timeout))"
while [[ $(date +%s) -lt $end_time ]]; do
if timeout 1 bash -c "cat < /dev/null > /dev/tcp/$host/$port" 2>/dev/null; then
log "INFO" "Port $port on $host is open"
return 0
fi
sleep "$interval"
done
log "ERROR" "Timeout waiting for port $port on $host"
return 1
}
# ==================== MAIN GUARD ====================
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
echo "This is a library file. Source it in your scripts."
exit 1
fi
4. Error Handling in Shell Scripts
Why Error Handling Matters
Error Handling: The process of anticipating, detecting, and resolving programming, application, or communication errors. Proper error handling makes scripts robust, reliable, and easier to debug.
❌ Bad Practice: No Error Handling
# What could go wrong?
cd /nonexistent/directory
rm -rf *.log
echo "Cleanup complete!"
Problems: Script continues after errors, misleading success message, potential data loss.
✅ Good Practice: Proper Error Handling
# Safe version with error handling
set -euo pipefail
cleanup() {
log "ERROR" "Script failed. Cleaning up..."
# Add cleanup logic here
}
trap cleanup ERR
trap 'cleanup; exit 1' INT TERM
if ! cd /nonexistent/directory; then
die "Directory doesn't exist"
fi
if ! rm -rf *.log; then
die "Failed to remove log files"
fi
echo "Cleanup complete!"
Benefits: Stops on errors, provides clear error messages, cleans up resources, handles interrupts.
Error Handling Techniques
Exit Codes
# 0 = Success, 1-255 = Failure
ls /tmp
echo "Exit code: $?" # Usually 0
ls /nonexistent
echo "Exit code: $?" # Usually 2
# Check exit code
if ls /tmp &>/dev/null; then
echo "Command succeeded"
else
echo "Command failed"
fi
# Exit with custom code
validate_input() {
if [[ -z "$1" ]]; then
echo "Error: Input required" >&2
exit 100
fi
}
set Options
# set -e: Exit immediately if a command fails
set -e
false # Script exits here
echo "This line won't execute"
# set -u: Treat unset variables as error
set -u
echo "$UNDEFINED_VAR" # Script exits
# set -o pipefail: Pipeline fails if any command fails
set -o pipefail
false | true
echo "Exit code: $?" # 1 (not 0)
# Recommended combination
set -euo pipefail
# Temporarily disable
set +e # Disable exit on error
false
echo "This executes even though false failed"
set -e # Re-enable
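When you need the exit code of every command in a pipeline (not just the last one, and without changing behavior via pipefail), Bash records them in the `PIPESTATUS` array. It must be captured immediately, since the next command overwrites it:

```shell
#!/bin/bash
# pipestatus_demo.sh - PIPESTATUS records every exit code in a pipeline

false | true
echo "pipeline exit without pipefail: $?"   # 0, only the last command counts

false | true
codes=("${PIPESTATUS[@]}")   # capture immediately, before anything else runs
echo "PIPESTATUS: ${codes[*]}"               # exit codes of false and true
```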
trap Command
cleanup() {
echo "Cleaning up..."
rm -f /tmp/tempfile.$$
}
trap cleanup EXIT # Run on script exit
trap cleanup ERR # Run on error
trap 'echo "Interrupted"; exit 1' INT # Ctrl+C
# Multiple traps
handle_error() {
local line="$1"
local cmd="$2"
local code="${3:-0}"
echo "Error on line $line: $cmd (exit code: $code)" >&2
}
trap 'handle_error $LINENO "$BASH_COMMAND" $?' ERR
Error Recovery Strategies
Retry Logic
retry_command() {
local cmd="$1"
local max_attempts="${2:-3}"
local delay="${3:-2}"
local attempt=1
while [[ $attempt -le $max_attempts ]]; do
if eval "$cmd"; then
return 0
fi
log "WARN" "Attempt $attempt failed. Retrying in ${delay}s..."
sleep "$delay"
((attempt++))
done
log "ERROR" "Command failed after $max_attempts attempts: $cmd"
return 1
}
# Usage:
retry_command "curl -f http://example.com" 5 3
Fallback Strategy
download_with_fallback() {
local url="$1"
local output="$2"
local fallback_url="$3"
log "INFO" "Downloading from primary source: $url"
if curl -f -L -o "$output" "$url"; then
log "INFO" "Download successful from primary source"
return 0
fi
log "WARN" "Primary download failed. Trying fallback: $fallback_url"
if curl -f -L -o "$output" "$fallback_url"; then
log "INFO" "Download successful from fallback"
return 0
fi
die "Failed to download from all sources"
}
Complete Error Handling Framework
# error_framework.sh - Comprehensive error handling
# ==================== CONFIGURATION ====================
LOG_FILE="/var/log/script_errors.log"
MAX_RETRIES=3
RETRY_DELAY=2
# ==================== ERROR FUNCTIONS ====================
init_error_handling() {
set -euo pipefail
trap 'handle_error $LINENO "$BASH_COMMAND" $?' ERR
trap 'handle_exit' EXIT
trap 'handle_interrupt' INT TERM
}
handle_error() {
local line="$1"
local command="$2"
local exit_code="$3"
local timestamp="$(date '+%Y-%m-%d %H:%M:%S')"
local error_msg="[$timestamp] ERROR: Line $line: $command (exit code: $exit_code)"
# Log to file
echo "$error_msg" >> "$LOG_FILE"
# Print to stderr
echo -e "\033[0;31m$error_msg\033[0m" >&2
# Optional: Send alert
send_alert "Script Error" "$error_msg"
}
handle_exit() {
local exit_code="$?"
if [[ $exit_code -eq 0 ]]; then
log "INFO" "Script completed successfully"
else
log "ERROR" "Script failed with exit code: $exit_code"
fi
# Cleanup resources
cleanup_resources
}
handle_interrupt() {
log "WARN" "Script interrupted by user"
cleanup_resources
exit 1
}
safe_execute() {
local cmd="$1"
local error_msg="${2:-Failed to execute: $cmd}"
if ! eval "$cmd"; then
die "$error_msg"
fi
}
with_retry() {
local cmd="$1"
local max_retries="${2:-$MAX_RETRIES}"
local delay="${3:-$RETRY_DELAY}"
local attempt=1
while true; do
if eval "$cmd"; then
return 0
fi
if [[ $attempt -ge $max_retries ]]; then
die "Command failed after $max_retries attempts: $cmd"
fi
log "WARN" "Attempt $attempt failed. Retrying in ${delay}s..."
sleep "$delay"
((attempt++))
done
}
# ==================== USAGE EXAMPLE ====================
main() {
init_error_handling
log "INFO" "Starting script execution"
# Safe file operations
safe_execute "mkdir -p /tmp/test" "Failed to create directory"
# With retry logic
with_retry "curl -f http://example.com" 5 3
# More operations...
safe_execute "touch /tmp/test/file.txt"
log "INFO" "Script completed successfully"
}
main "$@"
5. Automating System Tasks
The Power of Automation
System Automation: Using scripts to perform routine system administration tasks automatically, reducing human error, saving time, and ensuring consistency across environments.
Common Automation Tasks
User Management
# user_management.sh - Automate user administration
create_user() {
local username="$1"
local ssh_key="$2"
# Check if user exists
if id "$username" &>/dev/null; then
log "WARN" "User $username already exists"
return 0
fi
# Create user with home directory
useradd -m -s /bin/bash "$username"
check_error "Failed to create user $username"
# Set up SSH key
local ssh_dir="/home/$username/.ssh"
mkdir -p "$ssh_dir"
echo "$ssh_key" >> "$ssh_dir/authorized_keys"
chmod 700 "$ssh_dir"
chmod 600 "$ssh_dir/authorized_keys"
chown -R "$username:$username" "$ssh_dir"
# Add to sudoers (optional)
echo "$username ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/"$username"
chmod 440 /etc/sudoers.d/"$username"
log "INFO" "User $username created successfully"
}
# Usage: create_user "alice" "ssh-rsa AAAAB3NzaC1yc2E..."
Package Management
# package_manager.sh - Cross-distro package management
install_packages() {
local packages=("$@")
if command -v apt-get &>/dev/null; then
# Debian/Ubuntu
apt-get update
apt-get install -y "${packages[@]}"
elif command -v yum &>/dev/null; then
# RHEL/CentOS
yum install -y "${packages[@]}"
elif command -v dnf &>/dev/null; then
# Fedora
dnf install -y "${packages[@]}"
elif command -v zypper &>/dev/null; then
# openSUSE
zypper install -y "${packages[@]}"
else
die "Unsupported package manager"
fi
}
update_system() {
if command -v apt-get &>/dev/null; then
apt-get update && apt-get upgrade -y
elif command -v yum &>/dev/null; then
yum update -y
elif command -v dnf &>/dev/null; then
dnf update -y
fi
}
# Usage: install_packages nginx postgresql redis
Service Management
# service_manager.sh - Manage system services
ensure_service_running() {
local service="$1"
if systemctl is-active --quiet "$service"; then
log "INFO" "Service $service is already running"
else
log "INFO" "Starting service $service"
systemctl start "$service"
fi
# Enable on boot
systemctl enable "$service" &>/dev/null
}
restart_service() {
local service="$1"
log "INFO" "Restarting service $service"
systemctl restart "$service"
check_error "Failed to restart $service"
}
get_service_status() {
local service="$1"
if systemctl is-active --quiet "$service"; then
echo "running"
else
echo "stopped"
fi
}
Advanced Automation Examples
Log Rotation Automation
# log_rotator.sh - Automate log rotation
LOG_DIR="/var/log/app"
MAX_DAYS=30
MAX_SIZE="100M"
rotate_logs() {
local log_dir="${1:-$LOG_DIR}"
find "$log_dir" -name "*.log" -type f | while IFS= read -r logfile; do
# Check file size
local size="$(stat -c%s "$logfile")"
local max_bytes="$(echo "$MAX_SIZE" | sed 's/M/*1024*1024/;s/K/*1024/;s/G/*1024*1024*1024/' | bc)"
if [[ $size -gt $max_bytes ]]; then
rotate_single_log "$logfile"
fi
done
# Clean old backups
find "$log_dir" -name "*.log.*" -type f -mtime +$MAX_DAYS -delete
}
rotate_single_log() {
local logfile="$1"
local timestamp="$(date '+%Y%m%d_%H%M%S')"
local backup_file="${logfile}.${timestamp}"
log "INFO" "Rotating $logfile to $backup_file"
# Copy and truncate
if cp "$logfile" "$backup_file"; then
: > "$logfile" # Truncate original
fi
# Compress backup
gzip "$backup_file"
}
Configuration Management
# config_manager.sh - Manage configuration files
deploy_config() {
local src="$1"
local dest="$2"
local backup="${3:-yes}"
if [[ ! -f "$src" ]]; then
die "Source file not found: $src"
fi
# Backup existing file
if [[ -f "$dest" ]] && [[ "$backup" == "yes" ]]; then
local backup_file="${dest}.backup.$(date +%s)"
cp -p "$dest" "$backup_file"
log "INFO" "Backed up $dest to $backup_file"
fi
# Deploy new config
cp -p "$src" "$dest"
check_error "Failed to deploy config to $dest"
# Set permissions
chmod 644 "$dest"
log "INFO" "Configuration deployed to $dest"
}
validate_config() {
local config_file="$1"
local service="$2"
case "$service" in
nginx)
nginx -t -c "$config_file"
;;
apache2|httpd)
apachectl configtest -f "$config_file"
;;
postgresql)
# Note: there is no direct config-file check here; this only verifies the server status
sudo -u postgres pg_ctl -D /var/lib/postgresql/data -t 10 status
;;
*)
log "WARN" "No validation available for $service"
;;
esac
}
Complete System Provisioning Script
# provision_server.sh - Complete server provisioning
# ==================== CONFIGURATION ====================
SSH_PORT=2222
ADMIN_USER="devops"
ADMIN_SSH_KEY="ssh-rsa AAAAB3NzaC1yc2E... devops@company.com"
PACKAGES=("nginx" "postgresql" "redis" "python3" "git" "curl" "fail2ban" "ufw")
# ==================== MAIN PROVISIONING ====================
provision_server() {
require_root
init_error_handling
log "INFO" "Starting server provisioning"
# Step 1: System update
log "INFO" "Updating system packages"
update_system
# Step 2: Install packages
log "INFO" "Installing required packages"
install_packages "${PACKAGES[@]}"
# Step 3: Create admin user
log "INFO" "Creating admin user: $ADMIN_USER"
create_user "$ADMIN_USER" "$ADMIN_SSH_KEY"
# Step 4: Secure SSH
log "INFO" "Configuring SSH security"
secure_ssh "$SSH_PORT" "$ADMIN_USER"
# Step 5: Configure firewall
log "INFO" "Configuring firewall"
setup_firewall "$SSH_PORT"
# Step 6: Configure services
log "INFO" "Configuring services"
configure_nginx
configure_postgresql
configure_redis
# Step 7: Enable monitoring
log "INFO" "Setting up monitoring"
setup_monitoring
# Step 8: Final cleanup
log "INFO" "Cleaning up temporary files"
apt-get autoremove -y
apt-get clean
log "SUCCESS" "Server provisioning completed successfully!"
echo "=========================================="
echo "Provisioning Summary:"
echo "- SSH Port: $SSH_PORT"
echo "- Admin User: $ADMIN_USER"
echo "- Installed Packages: ${PACKAGES[*]}"
echo "=========================================="
}
# Helper functions (simplified for example)
secure_ssh() {
local port="$1"
local admin_user="$2"
# Backup original config
cp /etc/ssh/sshd_config /etc/ssh/sshd_config.backup
# Apply secure configuration
cat > /etc/ssh/sshd_config << EOF
Port $port
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
AllowUsers $admin_user
X11Forwarding no
ClientAliveInterval 300
ClientAliveCountMax 2
EOF
systemctl restart sshd
}
setup_firewall() {
local ssh_port="$1"
ufw --force reset
ufw default deny incoming
ufw default allow outgoing
ufw allow "$ssh_port/tcp"
ufw allow 80/tcp
ufw allow 443/tcp
ufw --force enable
}
# Run provisioning
provision_server
6. Script Scheduling with Cron
What is Cron?
Cron: A time-based job scheduler in Unix-like operating systems. Cron enables users to schedule jobs (commands or scripts) to run periodically at fixed times, dates, or intervals.
Cron Syntax Explained
Cron Time Format
* * * * * command
| | | | |
| | | | +-- Day of week (0-7, both 0 and 7 = Sunday)
| | | +---- Month (1-12)
| | +------ Day of month (1-31)
| +-------- Hour (0-23)
+---------- Minute (0-59)

Special characters:
* = Any value
, = Value list separator (1,3,5)
- = Range of values (1-5)
/ = Step values (*/2 = every 2 units)
? = No specific value (only in some cron implementations)
L = Last day of month/week (Quartz-style, not standard cron)
W = Nearest weekday (Quartz-style, not standard cron)
Common Cron Examples
Basic Scheduling
# Run daily at 2:30 AM
30 2 * * * /path/to/script.sh
# Run every hour at minute 0
0 * * * * /path/to/script.sh
# Run every 15 minutes
*/15 * * * * /path/to/script.sh
# Run every Monday at 3:15 PM
15 15 * * 1 /path/to/script.sh
# Run on the 1st of every month at 6:00 AM
0 6 1 * * /path/to/script.sh
# Run every weekday at 9:00 AM
0 9 * * 1-5 /path/to/script.sh
Advanced Scheduling
# Run every 2 hours between 9 AM and 5 PM on weekdays
0 9-17/2 * * 1-5 /path/to/script.sh
# Run on specific days of month (1st and 15th)
0 8 1,15 * * /path/to/script.sh
# Run on last day of month at midnight
# (L is Quartz-style syntax; standard cron does not support it)
0 0 L * * /path/to/script.sh
# Run every 10 minutes during work hours on weekdays
*/10 9-17 * * 1-5 /path/to/script.sh
# Run at 5 PM on the last Friday of every month
# (5L is Quartz-style syntax; standard cron does not support it)
0 17 * * 5L /path/to/script.sh
# Run on January 1st at 00:01
1 0 1 1 * /path/to/script.sh
Special Strings
# @reboot - Run once at system startup
@reboot /path/to/script.sh
# @yearly or @annually - Run once a year
@yearly /path/to/script.sh
# Same as: 0 0 1 1 *
# @monthly - Run once a month
@monthly /path/to/script.sh
# Same as: 0 0 1 * *
# @weekly - Run once a week
@weekly /path/to/script.sh
# Same as: 0 0 * * 0
# @daily or @midnight - Run once a day
@daily /path/to/script.sh
# Same as: 0 0 * * *
# @hourly - Run once an hour
@hourly /path/to/script.sh
# Same as: 0 * * * *
# There is no special string for every minute; use the standard syntax
* * * * * /path/to/script.sh
Cron Management Commands
User Crontabs
# Edit your crontab
$ crontab -e
# List your cron jobs
$ crontab -l
# Remove all your cron jobs
$ crontab -r
# Load cron jobs from file
$ crontab cronjobs.txt
# View other user's crontab (root only)
$ sudo crontab -u username -l
# Edit root's crontab
$ sudo crontab -e
System Crontabs
/etc/crontab # System-wide crontab (has an extra user field)
/etc/cron.d/ # Directory for package cron jobs
/etc/cron.hourly/
/etc/cron.daily/
/etc/cron.weekly/
/etc/cron.monthly/
# /etc/crontab format (includes user field)
# Example:
# m h dom mon dow user command
17 * * * * root cd / && run-parts --report /etc/cron.hourly
25 6 * * * root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
47 6 * * 7 root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.weekly )
52 6 1 * * root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.monthly )
# Create a system cron file
$ sudo nano /etc/cron.d/myapp
# Add: * * * * * root /path/to/script.sh
Cron Best Practices
Script Design for Cron
# cron_script.sh - Designed for cron execution
# Set PATH explicitly
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
# Set working directory
cd /path/to/working/directory || exit 1
# Logging setup
LOG_FILE=/var/log/myscript.log
exec >> "$LOG_FILE" 2>&1
echo "=== $(date) - Script started ==="
# Main script logic
/path/to/command --option
echo "=== $(date) - Script completed ==="
Cron Environment
# Common issues and solutions:
# 1. PATH may not include what you need
# Solution: Set PATH at top of script
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
# 2. No interactive shell environment
# Solution: Source necessary files
. /etc/profile
. ~/.bash_profile
# 3. No terminal, so no $DISPLAY
# Solution: Don't use GUI commands
# Or set DISPLAY if needed
DISPLAY=:0
# 4. Working directory is user's home
# Solution: cd to script directory
cd "$(dirname "$0")" || exit 1
# 5. Mail is sent to user by default
# Solution: Redirect output or set MAILTO
MAILTO="" # Disable email
MAILTO="admin@example.com" # Send to specific address
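Putting these fixes together, a crontab can declare its own environment at the top; variables apply to every job below them (the paths and address here are illustrative):

```shell
# Hypothetical user crontab combining the environment fixes above
SHELL=/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
MAILTO=admin@example.com

# m h dom mon dow command
30 2 * * * /opt/scripts/nightly_backup.sh >> /var/log/nightly_backup.log 2>&1
```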
Preventing Overlap
acquire_lock() {
local lockfile="/tmp/$(basename "$0").lock"
if [[ -f "$lockfile" ]]; then
local pid="$(cat "$lockfile")"
if kill -0 "$pid" 2>/dev/null; then
echo "Script is already running (PID: $pid)" >&2
exit 1
fi
fi
echo $$ > "$lockfile"
trap 'rm -f "$lockfile"' EXIT
}
# Use flock command (more robust)
run_with_lock() {
local lockfile="/var/lock/$(basename "$0").lock"
(
flock -n 9 || exit 1
# Your script commands here
echo "Running exclusive code"
sleep 10
) 9>"$lockfile"
}
Complete Cron Job Management System
# cron_manager.sh - Manage and monitor cron jobs
CRON_DIR="/etc/cron.d"
BACKUP_DIR="/backup/cron"
LOG_DIR="/var/log/cronjobs"
add_cron_job() {
local name="$1"
local schedule="$2"
local command="$3"
local user="${4:-root}"
local log_file="${LOG_DIR}/${name}.log"
# Create log directory
mkdir -p "$LOG_DIR"
# Build cron entry with logging
local cron_entry="$schedule $user $command >> $log_file 2>&1"
# Write to cron file
echo "$cron_entry" > "$CRON_DIR/$name"
log "INFO" "Added cron job: $name"
}
remove_cron_job() {
local name="$1"
if [[ -f "$CRON_DIR/$name" ]]; then
# Backup before removal
mkdir -p "$BACKUP_DIR"
cp "$CRON_DIR/$name" "$BACKUP_DIR/${name}.backup.$(date +%s)"
rm -f "$CRON_DIR/$name"
log "INFO" "Removed cron job: $name"
else
log "WARN" "Cron job not found: $name"
fi
}
list_cron_jobs() {
echo "=== System Cron Jobs ==="
echo
# List system crontab
if [[ -f /etc/crontab ]]; then
echo "/etc/crontab:"
grep -v '^#' /etc/crontab | grep -v '^$'
echo
fi
# List cron.d files
if [[ -d "$CRON_DIR" ]]; then
echo "/etc/cron.d/ files:"
for file in "$CRON_DIR"/*; do
if [[ -f "$file" ]]; then
echo "$(basename "$file"):"
grep -v '^#' "$file" | grep -v '^$'
echo
fi
done
fi
}
monitor_cron_logs() {
local days="${1:-7}"
echo "=== Cron Job Execution Report (last $days days) ==="
echo
# Check system cron logs
if [[ -f /var/log/syslog ]]; then
echo "System cron executions:"
# (matches entries from the single day exactly $days days back)
grep -i "CRON" /var/log/syslog | grep "$(date --date="-$days days" +'%b %d')" | tail -20
echo
fi
# Check custom cron logs
if [[ -d "$LOG_DIR" ]]; then
echo "Custom cron job logs:"
find "$LOG_DIR" -name "*.log" -type f -mtime -"$days" | while IFS= read -r logfile; do
local job_name="$(basename "$logfile" .log)"
local last_run="$(tail -1 "$logfile" 2>/dev/null | grep -o '[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\} [0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}' | tail -1)"
local status="OK"
if tail -10 "$logfile" 2>/dev/null | grep -qi "error\|failed"; then
status="ERROR"
fi
printf "%-20s %-25s %s\n" "$job_name" "${last_run:-never}" "$status"
done
fi
}
# Usage examples
case "$1" in
add)
add_cron_job "$2" "$3" "$4" "$5"
;;
remove)
remove_cron_job "$2"
;;
list)
list_cron_jobs
;;
monitor)
monitor_cron_logs "$2"
;;
*)
echo "Usage: $0 {add <name> <schedule> <command> [user] | remove <name> | list | monitor [days]}"
exit 1
;;
esac
Cron Best Practices Checklist
- ✅ Use absolute paths - Cron has minimal PATH
- ✅ Set working directory - Use cd at script start
- ✅ Implement logging - Redirect output to log files
- ✅ Handle errors - Check exit codes in scripts
- ✅ Prevent overlaps - Use lock files or flock
- ✅ Test scripts manually - Before adding to cron
- ✅ Use descriptive names - For cron files and logs
- ✅ Monitor execution - Regularly check cron logs
- ✅ Backup crontabs - Before making changes
- ✅ Consider alternatives - systemd timers for complex scheduling
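The checklist's last point is worth a sketch: a systemd timer replacing a daily cron entry is just two small unit files (names, paths, and schedule here are illustrative):

```ini
# /etc/systemd/system/backup.service (hypothetical)
[Unit]
Description=Nightly backup

[Service]
Type=oneshot
ExecStart=/opt/scripts/backup_manager.sh

# /etc/systemd/system/backup.timer (hypothetical)
[Unit]
Description=Run nightly backup at 02:30

[Timer]
OnCalendar=*-*-* 02:30:00
Persistent=true

[Install]
WantedBy=timers.target
```

Enable it with sudo systemctl enable --now backup.timer; Persistent=true re-runs a missed schedule after downtime, which plain cron cannot do.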
7. Real-world Script Examples
Backup Script
Complete Automated Backup System
# backup_manager.sh - Enterprise-grade backup solution
# ==================== CONFIGURATION ====================
BACKUP_ROOT="/backups"
RETENTION_DAYS=30
COMPRESSION="gzip" # gzip, bzip2, xz, or none
ENCRYPTION_KEY="" # Set for encryption (optional)
REMOTE_HOST="backup-server"
REMOTE_PATH="/remote/backups"
# Backup sources (directory: exclude_patterns)
declare -A BACKUP_SOURCES=(
["/etc"]="*.tmp,*.log"
["/home"]="*.mp3,*.mp4,*.iso"
["/var/www"]="cache,*.log"
["/opt/app"]="temp,logs"
)
# Database backups (type: credentials)
declare -A DATABASES=(
["mysql"]="root:password"
["postgres"]="postgres"
["mongodb"]=""
)
# ==================== FUNCTIONS ====================
init_backup() {
local backup_type="$1"
local timestamp="$(date '+%Y%m%d_%H%M%S')"
local backup_dir="$BACKUP_ROOT/$backup_type/$timestamp"
mkdir -p "$backup_dir"
check_error "Failed to create backup directory: $backup_dir"
log "INFO" "Starting $backup_type backup to $backup_dir"
echo "$backup_dir"
}
backup_filesystem() {
local backup_dir="$1"
for source_dir in "${!BACKUP_SOURCES[@]}"; do
local exclude_patterns="${BACKUP_SOURCES[$source_dir]}"
local backup_name="$(basename "$source_dir")"
local output_file="$backup_dir/${backup_name}.tar"
log "INFO" "Backing up $source_dir"
# Build exclude options as an array so patterns survive word splitting
local exclude_opts=()
IFS=',' read -ra patterns <<< "$exclude_patterns"
for pattern in "${patterns[@]}"; do
exclude_opts+=(--exclude="$pattern")
done
# Create tar archive (array expansion avoids the quoting pitfalls of eval)
tar cf "$output_file" "${exclude_opts[@]}" "$source_dir" 2>/dev/null
check_error "Failed to backup $source_dir"
# Compress if requested
compress_file "$output_file"
done
}
backup_databases() {
local backup_dir="$1"
for db_type in "${!DATABASES[@]}"; do
local credentials="${DATABASES[$db_type]}"
case "$db_type" in
mysql)
backup_mysql "$backup_dir" "$credentials"
;;
postgres)
backup_postgres "$backup_dir" "$credentials"
;;
mongodb)
backup_mongodb "$backup_dir"
;;
esac
done
}
backup_mysql() {
local backup_dir="$1"
local credentials="$2"
if ! command -v mysqldump &>/dev/null; then
log "WARN" "mysqldump not found, skipping MySQL backup"
return
fi
log "INFO" "Backing up MySQL databases"
# Get database list
local dbs="$(mysql -u"${credentials%:*}" -p"${credentials#*:}" -e "SHOW DATABASES;" | grep -Ev "(Database|information_schema|performance_schema|mysql|sys)")"
for db in $dbs; do
local backup_file="$backup_dir/mysql_${db}.sql"
mysqldump -u"${credentials%:*}" -p"${credentials#*:}" --single-transaction --routines --triggers "$db" > "$backup_file"
check_error "Failed to backup MySQL database: $db"
compress_file "$backup_file"
done
}
sync_to_remote() {
local backup_dir="$1"
if [[ -z "$REMOTE_HOST" ]] || [[ -z "$REMOTE_PATH" ]]; then
log "INFO" "Remote backup not configured"
return
fi
log "INFO" "Syncing backup to remote server: $REMOTE_HOST"
# Use rsync for efficient transfer
rsync -avz --delete "$backup_dir" "$REMOTE_HOST:$REMOTE_PATH/"
check_error "Failed to sync backup to remote server"
}
cleanup_old_backups() {
log "INFO" "Cleaning up backups older than $RETENTION_DAYS days"
# Group the -name tests so -type f and -mtime apply to all of them
find "$BACKUP_ROOT" -type f \( -name "*.tar*" -o -name "*.sql*" -o -name "*.dump*" \) \
-mtime +"$RETENTION_DAYS" -print | while IFS= read -r file; do
rm -f "$file"
log "DEBUG" "Removed old backup: $file"
done
# Remove empty directories
find "$BACKUP_ROOT" -type d -empty -delete
}
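The retention logic hinges on find's -mtime +N test. A self-contained check of that behaviour, using GNU touch -d to fake an old timestamp (the filenames are throwaway):

```shell
# Files older than the retention window match -mtime +30; newer ones do not
tmpdir=$(mktemp -d)
touch -d "40 days ago" "$tmpdir/old.tar.gz"   # pretend this backup is 40 days old
touch "$tmpdir/new.tar.gz"                    # fresh backup
find "$tmpdir" -type f -name "*.tar*" -mtime +30 -print
rm -rf "$tmpdir"
```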
generate_backup_report() {
local backup_dir="$1"
local report_file="$backup_dir/backup_report.txt"
echo "=== BACKUP REPORT $(date) ===" > "$report_file"
echo "Backup directory: $backup_dir" >> "$report_file"
echo "Total size: $(du -sh "$backup_dir" | cut -f1)" >> "$report_file"
echo "" >> "$report_file"
echo "=== FILES BACKED UP ===" >> "$report_file"
find "$backup_dir" -type f -name "*.tar*" -exec ls -lh {} \; >> "$report_file"
echo "" >> "$report_file"
echo "=== DATABASE BACKUPS ===" >> "$report_file"
find "$backup_dir" -type f \( -name "*.sql*" -o -name "*.dump*" \) -exec ls -lh {} \; >> "$report_file"
}
# ==================== MAIN ====================
main() {
require_root
init_error_handling
local backup_type="full"
local backup_dir=""
# Parse arguments
while [[ $# -gt 0 ]]; do
case "$1" in
--incremental)
backup_type="incremental"
;;
--dry-run)
DRY_RUN=1
;;
esac
shift
done
log "INFO" "Starting $backup_type backup"
# Initialize backup directory
backup_dir=$(init_backup "$backup_type")
# Perform backups
backup_filesystem "$backup_dir"
backup_databases "$backup_dir"
# Sync to remote
sync_to_remote "$backup_dir"
# Cleanup old backups
cleanup_old_backups
# Generate report
generate_backup_report "$backup_dir"
log "SUCCESS" "Backup completed successfully: $backup_dir"
send_notification "Backup Completed" "$backup_type backup completed successfully"
}
main "$@"
Monitoring Script
Enterprise System Monitoring Script
# system_monitor.sh - Comprehensive system monitoring
# ==================== CONFIGURATION ====================
declare -A ALERT_THRESHOLDS=(
["CPU"]=80
["MEMORY"]=85
["DISK"]=90
["LOAD"]=5.0
)
SERVICES_TO_MONITOR=(
"nginx"
"postgresql"
"redis"
"ssh"
"cron"
)
declare -A PORTS_TO_CHECK=(
["80"]="HTTP"
["443"]="HTTPS"
["22"]="SSH"
["5432"]="PostgreSQL"
["6379"]="Redis"
)
declare -A LOG_PATTERNS=(
["ERROR"]="/var/log/syslog"
["Failed password"]="/var/log/auth.log"
["Out of memory"]="/var/log/kern.log"
)
# ==================== MONITORING FUNCTIONS ====================
check_cpu() {
local cpu_usage="$(top -bn1 | grep "Cpu(s)" | awk '{print $2 + $4}')"
local threshold="${ALERT_THRESHOLDS[CPU]}"
printf "CPU Usage: %.1f%%" "$cpu_usage"
if (( $(echo "$cpu_usage > $threshold" | bc -l) )); then
send_alert "High CPU Usage" "CPU usage is ${cpu_usage}% (threshold: ${threshold}%)"
return 1
fi
return 0
}
check_memory() {
local mem_info="$(free -m)"
local total="$(echo "$mem_info" | grep Mem | awk '{print $2}')"
local used="$(echo "$mem_info" | grep Mem | awk '{print $3}')"
local percentage="$(echo "scale=2; $used * 100 / $total" | bc)"
local threshold="${ALERT_THRESHOLDS[MEMORY]}"
printf "Memory: %s/%s MB (%.1f%%)" "$used" "$total" "$percentage"
if (( $(echo "$percentage > $threshold" | bc -l) )); then
send_alert "High Memory Usage" "Memory usage is ${percentage}% (threshold: ${threshold}%)"
return 1
fi
return 0
}
check_disk() {
local errors=0
local threshold="${ALERT_THRESHOLDS[DISK]}"
# Feed the loop via process substitution: a piped while runs in a
# subshell, so counter updates made there would be lost
while IFS= read -r line; do
local filesystem="$(echo "$line" | awk '{print $1}')"
local usage="$(echo "$line" | awk '{print $5}' | sed 's/%//')"
local mount="$(echo "$line" | awk '{print $6}')"
printf "Disk %s (%s): %s%%\n" "$filesystem" "$mount" "$usage"
if [[ $usage -ge $threshold ]]; then
send_alert "High Disk Usage" "Disk $filesystem ($mount) is ${usage}% full (threshold: ${threshold}%)"
((++errors)) # pre-increment keeps the exit status 0 under set -e
fi
done < <(df -h | grep -v tmpfs | tail -n +2)
return $errors
}
check_services() {
local errors=0
for service in "${SERVICES_TO_MONITOR[@]}"; do
if systemctl is-active --quiet "$service"; then
printf "Service %s: RUNNING\n" "$service"
else
printf "Service %s: STOPPED\n" "$service"
send_alert "Service Down" "Service $service is not running"
((++errors))
fi
done
return $errors
}
check_ports() {
local errors=0
for port in "${!PORTS_TO_CHECK[@]}"; do
local service_name="${PORTS_TO_CHECK[$port]}"
if timeout 1 bash -c "cat < /dev/null > /dev/tcp/127.0.0.1/$port" 2>/dev/null; then
printf "Port %s (%s): OPEN\n" "$port" "$service_name"
else
printf "Port %s (%s): CLOSED\n" "$port" "$service_name"
send_alert "Port Closed" "Port $port ($service_name) is not accessible"
((++errors))
fi
done
return $errors
}
check_logs() {
local errors=0
for pattern in "${!LOG_PATTERNS[@]}"; do
local logfile="${LOG_PATTERNS[$pattern]}"
# grep -c prints its own "0" and exits nonzero on zero matches,
# so "|| echo 0" would capture two zeros
local count
count="$(grep -c "$pattern" "$logfile" 2>/dev/null)" || true
count="${count:-0}"
if [[ $count -gt 0 ]]; then
printf "Log pattern '%s': %d matches\n" "$pattern" "$count"
if [[ $count -gt 10 ]]; then # Threshold for alerts
send_alert "Log Alert" "Found $count occurrences of '$pattern' in $logfile"
((++errors))
fi
fi
done
return $errors
}
generate_report() {
local total_errors="$1"
local report_file="/var/log/system_monitor_$(date +%Y%m%d).log"
echo "=== SYSTEM MONITORING REPORT $(date) ===" > "$report_file"
echo "Total alerts generated: $total_errors" >> "$report_file"
echo "Status: $([[ $total_errors -eq 0 ]] && echo "HEALTHY" || echo "ISSUES DETECTED")" >> "$report_file"
echo "" >> "$report_file"
}
# ==================== MAIN ====================
main() {
require_root
init_error_handling
local total_errors=0
log "INFO" "Starting system monitoring"
echo "=== SYSTEM MONITORING CHECK $(date) ==="
echo
# Run all checks (pre-increment ++var returns the new, nonzero value,
# so the arithmetic command succeeds even when the counter starts at 0
# under set -e; var++ returns the old value and would abort the script)
echo "1. CPU Usage:"
check_cpu || ((++total_errors))
echo
echo "2. Memory Usage:"
check_memory || ((++total_errors))
echo
echo "3. Disk Usage:"
check_disk || ((++total_errors))
echo
echo "4. Service Status:"
check_services || ((++total_errors))
echo
echo "5. Port Availability:"
check_ports || ((++total_errors))
echo
echo "6. Log Analysis:"
check_logs || ((++total_errors))
echo
# Generate report
generate_report "$total_errors"
if [[ $total_errors -eq 0 ]]; then
log "SUCCESS" "All system checks passed"
else
log "WARN" "Found $total_errors issues that need attention"
fi
echo "=== MONITORING COMPLETE ==="
echo "Total issues detected: $total_errors"
}
# Run monitoring
main "$@"
Production-Ready Script Checklist
- ✅ Error handling: set -euo pipefail, trap, check exit codes
- ✅ Logging: Comprehensive logging with timestamps and levels
- ✅ Configuration: External config files or environment variables
- ✅ Input validation: Validate all user inputs and arguments
- ✅ Idempotency: Script can be run multiple times safely
- ✅ Documentation: Clear usage instructions and examples
- ✅ Security: No hardcoded passwords, proper permissions
- ✅ Portability: Works across different Linux distributions
- ✅ Resource management: Clean up temporary files
- ✅ Testing: Unit tests or at least manual testing scenarios
- ✅ Monitoring: Scripts should report their own status
- ✅ Version control: All scripts in version control with change history
Shell Scripting Best Practices
Do's
- Use #!/usr/bin/env bash for portability
- Enable strict mode: set -euo pipefail
- Use functions to organize code
- Validate all inputs and arguments
- Use meaningful variable names
- Comment complex logic
- Implement proper error handling
- Use arrays for lists of items
- Quote all variable expansions
- Use [[ ]] for conditionals
- Prefer $( ) over backticks
- Use local variables in functions
- Test scripts in a safe environment
Don'ts
- Don't use ALL_CAPS for non-exported variables
- Don't parse ls output (use globs or find)
- Don't use cd without checking its return code
- Don't echo variable data without quoting
- Don't ignore exit codes
- Don't write scripts without error handling
- Don't use deprecated syntax
- Don't hardcode paths (use variables)
- Don't run as root unnecessarily
- Don't create world-writable files
- Don't use cat file | grep pattern (use grep pattern file)
- Don't forget to test edge cases
Shell Scripting Quick Reference
Essential Commands
$ bash -x script.sh # Debug mode
$ shellcheck script.sh # Lint script
$ time ./script.sh # Measure execution time
$ strace -f ./script.sh # Trace system calls
Testing & Debugging
set -x # Print commands as they execute
set -v # Print shell input lines
trap 'echo "Line $LINENO: $BASH_COMMAND"' DEBUG
PS4='+ $LINENO: ' # Custom debug prompt
Useful One-Liners
# Compare two directory listings
diff <(ls dir1) <(ls dir2)
# Read lines from file
while IFS= read -r line; do echo "$line"; done < file
# Generate sequence
for i in {1..10}; do echo "$i"; done
# Check if command exists
command -v cmd >/dev/null 2>&1
Practice Scenarios for DevOps Scripting
- Create a script that automatically provisions a new server with your preferred stack (web server, database, monitoring).
- Write a log analysis script that identifies the top 10 IP addresses with failed SSH attempts and automatically blocks them.
- Develop a backup script that supports incremental backups, encryption, and remote synchronization.
- Create a monitoring script that alerts when disk usage exceeds 90%, memory usage exceeds 85%, or CPU load exceeds 5.0.
- Write a deployment script that can rollback to a previous version if health checks fail.
- Develop a user management script that can create users, set up SSH keys, and configure sudo access from a CSV file.
- Create a certificate management script that automatically renews Let's Encrypt certificates and reloads services.
- Write a Docker container management script that can start, stop, update, and monitor containers.
- Develop a configuration drift detection script that compares current system state with a known good baseline.
- Create a script that automatically scales application instances based on CPU load or request rate.
Key Takeaways
- Automation is power: Shell scripting turns repetitive tasks into automated workflows
- Start simple, build complex: Begin with basic scripts and add features gradually
- Error handling is non-negotiable: Robust scripts handle failures gracefully
- Documentation saves time: Comment your code and provide usage instructions
- Test thoroughly: Test scripts in safe environments before production
- Security matters: Don't hardcode secrets, use proper permissions
- Reusability pays off: Build libraries of functions for common tasks
- Monitoring is essential: Scripts should log their activities and report status
- Cron is your friend: Schedule regular tasks for maintenance and monitoring
- Continuous improvement: Refactor scripts as you learn better techniques
Mastering shell scripting is one of the most valuable skills for a DevOps professional. It enables you to automate infrastructure, deploy applications, monitor systems, and respond to incidents efficiently. Remember: the goal is not just to write scripts, but to create reliable, maintainable automation that makes systems more resilient and operations more efficient.