
Shell Scripting Mastery - DevOps Automation Guide

Shell Scripting for DevOps

Published: December 2025 | Topic: Automation & Scripting for DevOps

Automation is the superpower of DevOps. Shell scripting enables you to automate repetitive tasks, manage infrastructure, deploy applications, and monitor systems efficiently. Mastering Bash scripting is essential for any DevOps professional looking to streamline operations and build reliable automation pipelines.

Why Shell Scripting Matters for DevOps

Shell scripting provides:

  • Automation: Eliminate manual, repetitive tasks
  • Consistency: Ensure identical execution across environments
  • Infrastructure as Code: Manage infrastructure through scripts
  • Rapid Prototyping: Quickly test ideas and workflows
  • Cross-platform Compatibility: Run on any Unix-like system
  • Integration: Connect different tools and services
  • Monitoring: Create custom monitoring solutions
  • Backup & Recovery: Automate data protection strategies

The DevOps Scripting Workflow

  • Planning: Define requirements
  • Scripting: Write the Bash script
  • Testing: Debug & validate
  • Automation: Schedule with cron
  • Monitoring: Logs & alerts

1. Introduction to Bash Scripting

What is Bash?

Bash (Bourne Again SHell): The default command-line interpreter for most Linux distributions and macOS. It's a powerful scripting language that allows you to automate system administration tasks, process data, and control program execution.

Your First Bash Script

#!/bin/bash
# hello.sh - My first Bash script
# This script prints a greeting message

echo "Hello, World!"
echo "Today is: $(date)"
echo "Current user: $USER"
echo "Working directory: $(pwd)"

Script Execution Methods

Method 1: Direct execution

# Execute with bash (no execute permission needed)
$ bash hello.sh

# Or make the script executable and run it directly
$ chmod +x hello.sh
$ ./hello.sh # Requires shebang and executable permission

# Output:
Hello, World!
Today is: Mon Dec 1 10:00:01 UTC 2025
Current user: alice
Working directory: /home/alice

Method 2: Source execution

# Source the script (runs in current shell)
$ source hello.sh
$ . hello.sh # Shorthand for source

# Difference:
# ./script.sh - Runs in new shell
# source script.sh - Runs in current shell
# (Environment changes persist with source)
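
For example, a variable exported by a hypothetical env_demo.sh is only visible in your current shell when the script is sourced:

$ cat env_demo.sh
#!/bin/bash
export DEMO_VAR="set by script"

$ bash env_demo.sh && echo "${DEMO_VAR:-unset}"
unset

$ source env_demo.sh && echo "${DEMO_VAR:-unset}"
set by script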

Method 3: Debug mode

# Debug with -x (trace execution)
$ bash -x hello.sh

# Output with debugging:
+ echo 'Hello, World!'
Hello, World!
++ date
+ echo 'Today is: Mon Dec 1 10:00:01 UTC 2025'
Today is: Mon Dec 1 10:00:01 UTC 2025

# Debug inside script
set -x # Enable debugging
# ... script commands ...
set +x # Disable debugging

Essential Script Components

Shebang (#!)

# Always start scripts with shebang
#!/bin/bash - Most common
#!/usr/bin/env bash - More portable
#!/bin/sh - For POSIX shell scripts
#!/usr/bin/python - For Python scripts

# Why shebang matters:
1. Tells system which interpreter to use
2. Makes script self-contained
3. Allows direct execution (./script)
4. Helps editors and tools detect the script's language

Comments & Documentation

# Single-line comments
# This is a comment
echo "Hello" # Inline comment

# Multi-line comments (using here-doc)
: << 'COMMENT'
This is a multi-line comment.
It can span multiple lines.
Useful for function documentation.
COMMENT

# Script header template
#!/bin/bash
#
# Script: backup.sh
# Author: DevOps Team
# Date: 2025-12-01
# Description: Automated backup script
# Usage: ./backup.sh [options]

Complete Starter Script Template

#!/usr/bin/env bash
#
# template.sh - Bash script template for DevOps
#
# Features:
# - Proper error handling
# - Command-line arguments
# - Logging
# - Configuration
#
# Usage: ./template.sh [OPTIONS]
#

# ============================================
# CONFIGURATION
# ============================================
SCRIPT_NAME="$(basename "$0")"
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
LOG_FILE="/var/log/${SCRIPT_NAME%.*}.log"
TEMP_DIR="/tmp/${SCRIPT_NAME%.*}_$(date +%s)"

# ============================================
# FUNCTIONS
# ============================================

log_message() {
    local message="$1"
    local level="${2:-INFO}"
    local timestamp="$(date '+%Y-%m-%d %H:%M:%S')"
    echo "[$timestamp] [$level] $message" | tee -a "$LOG_FILE"
}

cleanup() {
    log_message "Cleaning up temporary files"
    rm -rf "$TEMP_DIR"
}

show_help() {
    cat << EOF
Usage: $SCRIPT_NAME [OPTIONS]

Options:
  -h, --help    Show this help message
  -v, --verbose  Enable verbose output
  -d, --dry-run  Simulate without making changes

Examples:
  $SCRIPT_NAME --verbose
  $SCRIPT_NAME --dry-run
EOF
}

# ============================================
# MAIN SCRIPT
# ============================================

# Set up error handling
set -euo pipefail
trap cleanup EXIT
trap 'log_message "Script interrupted" "ERROR"; exit 1' INT TERM

# Parse command-line arguments
VERBOSE=0
DRY_RUN=0

while [[ $# -gt 0 ]]; do
    case "$1" in
        -h|--help)
            show_help
            exit 0
            ;;
        -v|--verbose)
            VERBOSE=1
            shift
            ;;
        -d|--dry-run)
            DRY_RUN=1
            shift
            ;;
        *)
            log_message "Unknown option: $1" "ERROR"
            show_help
            exit 1
            ;;
    esac
done

# Main script logic
log_message "Script started" "INFO"

if [[ $VERBOSE -eq 1 ]]; then
    log_message "Verbose mode enabled" "DEBUG"
fi

if [[ $DRY_RUN -eq 1 ]]; then
    log_message "Dry run mode - no changes will be made" "WARNING"
fi

# Create temporary directory
mkdir -p "$TEMP_DIR"
log_message "Created temp directory: $TEMP_DIR" "DEBUG"

# Your script logic goes here
log_message "Performing main task..." "INFO"
sleep 2

log_message "Script completed successfully" "INFO"
exit 0

2. Variables, Loops, and Conditionals

Variables in Bash

Variables: Named storage locations that hold data. In Bash, variables can store strings, numbers, arrays, and command outputs. Variable names are case-sensitive and should be descriptive.

Variable Declaration & Usage

# Basic variable assignment
name="Alice"
count=42
path="/home/alice/documents"

# Using variables
echo "Hello, $name"
echo "Count is: $count"
echo "Path: ${path}" # Curly braces for clarity

# Command substitution
current_date="$(date)"
file_count="$(ls -1 | wc -l)"
kernel_version="$(uname -r)"

Special Variables

# Positional parameters
$0 # Script name
$1, $2, ... # Arguments
$# # Number of arguments
$@ # All arguments as separate words
$* # All arguments as single word

# Special parameters
$$ # Current process ID
$! # Last background process ID
$? # Exit status of last command

# Example script with arguments
#!/bin/bash
echo "Script: $0"
echo "First argument: $1"
echo "Total arguments: $#"
echo "All arguments: $@"

Arrays

# Creating arrays
fruits=("apple" "banana" "cherry")
numbers=(1 2 3 4 5)
files=(*) # Filenames via globbing (avoid parsing ls output)

# Accessing array elements
echo "First fruit: ${fruits[0]}"
echo "All fruits: ${fruits[@]}"
echo "Number of fruits: ${#fruits[@]}"

# Adding to array
fruits+=("orange" "grape")

# Associative arrays (Bash 4+)
declare -A config
config["host"]="localhost"
config["port"]="8080"
config["user"]="admin"
echo "Host: ${config[host]}"

Conditionals (if/else/case)

If-Else Statements

# Basic if statement
if [[ -f "/etc/passwd" ]]; then
    echo "File exists"
fi

# If-else statement
if [[ $USER == "root" ]]; then
    echo "Running as root"
else
    echo "Running as regular user"
fi

# If-elif-else statement
if [[ $1 -gt 100 ]]; then
    echo "Greater than 100"
elif [[ $1 -gt 50 ]]; then
    echo "Greater than 50"
else
    echo "50 or less"
fi

File Test Operators

# File tests
[[ -e "file" ]] # Exists
[[ -f "file" ]] # Regular file
[[ -d "dir" ]] # Directory
[[ -r "file" ]] # Readable
[[ -w "file" ]] # Writable
[[ -x "file" ]] # Executable
[[ -s "file" ]] # Not empty

# String tests
[[ -z "$str" ]] # Empty string
[[ -n "$str" ]] # Non-empty string
[[ "$str1" == "$str2" ]] # Equal
[[ "$str1" != "$str2" ]] # Not equal

# Numerical tests
[[ $a -eq $b ]] # Equal
[[ $a -ne $b ]] # Not equal
[[ $a -lt $b ]] # Less than
[[ $a -le $b ]] # Less or equal
[[ $a -gt $b ]] # Greater than
[[ $a -ge $b ]] # Greater or equal
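
These tests combine with && and || inside [[ ]]. A small sketch that checks a (hypothetical) config file before using it:

config_file="/etc/myapp.conf" # Illustrative path

if [[ -f "$config_file" && -r "$config_file" && -s "$config_file" ]]; then
    echo "Config looks usable"
elif [[ ! -e "$config_file" ]]; then
    echo "Config is missing" >&2
else
    echo "Config exists but is empty or unreadable" >&2
fi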

Case Statements

# Case statement syntax
case "$1" in
    start)
        echo "Starting service..."
        # start commands
        ;;
    stop)
        echo "Stopping service..."
        # stop commands
        ;;
    restart)
        echo "Restarting service..."
        # restart commands
        ;;
    status)
        echo "Service status..."
        # status commands
        ;;
    *)
        echo "Usage: $0 {start|stop|restart|status}"
        exit 1
        ;;
esac

Loops in Bash

For Loops

# Basic for loop
for i in 1 2 3 4 5; do
    echo "Number: $i"
done

# For loop with range
for i in {1..10}; do
    echo "Count: $i"
done

# For loop with step
for i in {0..100..10}; do
    echo "Step: $i"
done

# For loop over array
fruits=("apple" "banana" "cherry")
for fruit in "${fruits[@]}"; do
    echo "Fruit: $fruit"
done

While Loops

# Basic while loop
counter=1
while [[ $counter -le 5 ]]; do
    echo "Counter: $counter"
    ((counter++))
done

# While reading file
while IFS= read -r line; do
    echo "Line: $line"
done < "input.txt"

# Infinite loop with break
while true; do
    echo "Press Ctrl+C to exit"
    sleep 1
done

Until Loops

# Until loop (opposite of while)
counter=1
until [[ $counter -gt 5 ]]; do
    echo "Counter: $counter"
    ((counter++))
done

# Wait for service to start
attempts=0
until curl -f http://localhost:8080 &>/dev/null; do
    ((attempts++))
    if [[ $attempts -ge 10 ]]; then
        echo "Service failed to start"
        exit 1
    fi
    echo "Waiting for service... (attempt $attempts)"
    sleep 2
done
echo "Service is up!"

Practical Example: System Health Check Script

#!/bin/bash
# system_health.sh - Check system health metrics

# Configuration
THRESHOLD_CPU=80
THRESHOLD_MEM=85
THRESHOLD_DISK=90
LOG_FILE="/var/log/system_health.log"

# Functions
log_alert() {
    local message="$1"
    local timestamp="$(date '+%Y-%m-%d %H:%M:%S')"
    echo "[$timestamp] ALERT: $message" >> "$LOG_FILE"
    echo "ALERT: $message"
}

check_cpu() {
    local cpu_usage="$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | cut -d'%' -f1)"
    echo -n "CPU Usage: ${cpu_usage}% "
    if [[ $(echo "$cpu_usage > $THRESHOLD_CPU" | bc) -eq 1 ]]; then
        log_alert "High CPU usage: ${cpu_usage}%"
    else
        echo "[OK]"
    fi
}

check_memory() {
    local mem_total="$(free | grep Mem | awk '{print $2}')"
    local mem_used="$(free | grep Mem | awk '{print $3}')"
    local mem_percent="$(echo "scale=2; $mem_used * 100 / $mem_total" | bc)"
    echo -n "Memory Usage: ${mem_percent}% "
    if [[ $(echo "$mem_percent > $THRESHOLD_MEM" | bc) -eq 1 ]]; then
        log_alert "High memory usage: ${mem_percent}%"
    else
        echo "[OK]"
    fi
}

check_disk() {
    df -h | grep -v tmpfs | tail -n +2 | while IFS= read -r line; do
        local filesystem="$(echo "$line" | awk '{print $1}')"
        local usage="$(echo "$line" | awk '{print $5}' | sed 's/%//')"
        local mount="$(echo "$line" | awk '{print $6}')"
        echo -n "Disk $filesystem ($mount): ${usage}% "
        if [[ $usage -ge $THRESHOLD_DISK ]]; then
            log_alert "High disk usage on $filesystem ($mount): ${usage}%"
        else
            echo "[OK]"
        fi
    done
}

check_services() {
    local services=("ssh" "nginx" "postgresql")
    for service in "${services[@]}"; do
        if systemctl is-active --quiet "$service"; then
            echo "Service $service: [RUNNING]"
        else
            log_alert "Service $service is not running"
        fi
    done
}

# Main execution
echo "=== System Health Check $(date) ==="
echo
check_cpu
check_memory
check_disk
check_services
echo
echo "=== Check Complete ==="

3. Functions & Arguments

What are Functions?

Functions: Reusable blocks of code that perform specific tasks. Functions help organize scripts, avoid code duplication, and make scripts more readable and maintainable.

Function Syntax and Usage

Basic Function Definition

# Method 1: Traditional syntax
function_name() {
    # Function body
    echo "This is a function"
}

# Method 2: Using 'function' keyword
function function_name {
    echo "This is also a function"
}

# Calling a function
function_name

# Function with parameters
greet() {
    local name="$1"
    local time="$2"
    echo "Good $time, $name!"
}

greet "Alice" "morning"

Return Values

# Functions return exit codes (0-255)
is_even() {
    local number="$1"
    if (( number % 2 == 0 )); then
        return 0 # Success (true)
    else
        return 1 # Failure (false)
    fi
}

is_even 42
if [[ $? -eq 0 ]]; then
    echo "Number is even"
else
    echo "Number is odd"
fi

# Return data via stdout
get_timestamp() {
    date '+%Y-%m-%d %H:%M:%S'
}

timestamp="$(get_timestamp)"
echo "Current time: $timestamp"

Variable Scope

# Global variables (accessible everywhere)
GLOBAL_VAR="I'm global"

demo_scope() {
    # Local variables (only in function)
    local LOCAL_VAR="I'm local"
    
    echo "Inside function:"
    echo " Global: $GLOBAL_VAR"
    echo " Local: $LOCAL_VAR"
    
    # Modify global variable
    GLOBAL_VAR="Modified inside function"
}

echo "Before function: $GLOBAL_VAR"
demo_scope
echo "After function: $GLOBAL_VAR"
echo "Trying to access LOCAL_VAR outside: ${LOCAL_VAR:-undefined}"

Advanced Argument Handling

Shift Operator

# Process arguments one by one
process_args() {
    echo "Total arguments: $#"
    
    while [[ $# -gt 0 ]]; do
        echo "Argument: $1"
        shift # Move to next argument
    done
}

process_args "apple" "banana" "cherry"

# Output:
Total arguments: 3
Argument: apple
Argument: banana
Argument: cherry

getopts for Option Parsing

# Professional option parsing
parse_options() {
    local OPTIND # Reset getopts
    local verbose=0
    local output_file=""
    
    while getopts ":ho:v" opt; do
        case "$opt" in
            h)
                show_help
                exit 0
                ;;
            v)
                verbose=1
                ;;
            o)
                output_file="$OPTARG"
                ;;
            \?)
                echo "Invalid option: -$OPTARG" >&2
                exit 1
                ;;
            :)
                echo "Option -$OPTARG requires an argument" >&2
                exit 1
                ;;
        esac
    done
    
    shift $((OPTIND-1)) # Remove processed options
    
    # Remaining arguments are positional
    for arg in "$@"; do
        echo "Positional argument: $arg"
    done
    
    echo "Verbose: $verbose"
    echo "Output file: ${output_file:-not set}"
}

parse_options "$@"

Practical Function Examples for DevOps

Logging Function

log() {
    local level="$1"
    local message="$2"
    local timestamp="$(date '+%Y-%m-%d %H:%M:%S')"
    local log_file="${3:-/var/log/script.log}"
    
    case "$level" in
        INFO) local color="\033[0;32m" ;; # Green
        WARN) local color="\033[0;33m" ;; # Yellow
        ERROR) local color="\033[0;31m" ;; # Red
        DEBUG) local color="\033[0;36m" ;; # Cyan
        *) local color="\033[0m" ;; # Default
    esac
    
    local reset="\033[0m"
    
    # Print to console with color
    echo -e "${color}[$timestamp] [$level] $message${reset}"
    
    # Write to log file without color codes
    echo "[$timestamp] [$level] $message" >> "$log_file"
}

# Usage:
log "INFO" "Script started successfully"
log "WARN" "Disk space is running low"
log "ERROR" "Failed to connect to database"

Validation Functions

validate_ip() {
    local ip="$1"
    local stat=1
    
    if [[ $ip =~ ^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$ ]]; then
        OIFS=$IFS
        IFS='.'
        ip=($ip)
        IFS=$OIFS
        if [[ ${ip[0]} -le 255 ]] && [[ ${ip[1]} -le 255 ]] && \
                [[ ${ip[2]} -le 255 ]] && [[ ${ip[3]} -le 255 ]]; then
            stat=0
        fi
    fi
    return $stat
}

validate_email() {
    local email="$1"
    local regex="^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
    
    if [[ $email =~ $regex ]]; then
        return 0
    else
        return 1
    fi
}

# Usage:
validate_ip "192.168.1.1"
if [[ $? -eq 0 ]]; then
    echo "Valid IP address"
else
    echo "Invalid IP address"
fi

Utility Functions

is_root() {
    [[ $EUID -eq 0 ]] && return 0 || return 1
}

check_dependencies() {
    local deps=("$@")
    local missing=()
    
    for dep in "${deps[@]}"; do
        if ! command -v "$dep" &>/dev/null; then
            missing+=("$dep")
        fi
    done
    
    if [[ ${#missing[@]} -gt 0 ]]; then
        echo "Missing dependencies: ${missing[*]}" >&2
        return 1
    fi
    return 0
}

send_notification() {
    local subject="$1"
    local message="$2"
    local recipient="${3:-admin@example.com}"
    
    if command -v mail &>/dev/null; then
        echo "$message" | mail -s "$subject" "$recipient"
    elif command -v curl &>/dev/null; then
        # Send to webhook
        curl -X POST -H "Content-Type: application/json" \
            -d "{\"text\":\"$subject: $message\"}" \
            https://hooks.slack.com/services/...
    else
        echo "Notification: $subject - $message" >&2
    fi
}

Complete Function Library for DevOps

#!/bin/bash
# functions.sh - DevOps utility function library

# Load this library in your scripts:
# source functions.sh

# ==================== ERROR HANDLING ====================

die() {
    local msg="$1"
    local code="${2:-1}"
    log "ERROR" "$msg"
    exit "$code"
}

check_error() {
    local exit_code="$?"
    local message="$1"
    
    if [[ $exit_code -ne 0 ]]; then
        die "$message (exit code: $exit_code)" "$exit_code"
    fi
}

# ==================== SYSTEM CHECKS ====================

require_root() {
    is_root || die "This script must be run as root"
}

check_disk_space() {
    local path="${1:-/}"
    local threshold="${2:-90}"
    
    local usage="$(df "$path" | tail -1 | awk '{print $5}' | sed 's/%//')"
    
    if [[ $usage -ge $threshold ]]; then
        log "WARN" "Disk usage on $path is ${usage}% (threshold: ${threshold}%)"
        return 1
    fi
    return 0
}

# ==================== FILE OPERATIONS ====================

backup_file() {
    local file="$1"
    local backup_dir="${2:-/backups}"
    
    if [[ ! -f "$file" ]]; then
        log "ERROR" "File not found: $file"
        return 1
    fi
    
    mkdir -p "$backup_dir"
    local timestamp="$(date '+%Y%m%d_%H%M%S')"
    local backup_file="$backup_dir/$(basename "$file").$timestamp.bak"
    
    cp -p "$file" "$backup_file"
    check_error "Failed to backup $file"
    
    log "INFO" "Backed up $file to $backup_file"
    echo "$backup_file"
}

# ==================== NETWORK OPERATIONS ====================

wait_for_port() {
    local host="$1"
    local port="$2"
    local timeout="${3:-30}"
    local interval="${4:-2}"
    
    local start_time="$(date +%s)"
    local end_time="$((start_time + timeout))"
    
    while [[ $(date +%s) -lt $end_time ]]; do
        if timeout 1 bash -c "cat < /dev/null > /dev/tcp/$host/$port" 2>/dev/null; then
            log "INFO" "Port $port on $host is open"
            return 0
        fi
        sleep "$interval"
    done

    log "ERROR" "Timeout waiting for port $port on $host"
    return 1
}

# ==================== MAIN GUARD ====================

if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
    echo "This is a library file. Source it in your scripts."
    exit 1
fi

4. Error Handling in Shell Scripts

Why Error Handling Matters

Error Handling: The process of anticipating, detecting, and resolving programming, application, or communication errors. Proper error handling makes scripts robust, reliable, and easier to debug.

❌ Bad Practice: No Error Handling

#!/bin/bash
# What could go wrong?

cd /nonexistent/directory
rm -rf *.log
echo "Cleanup complete!"

Problems: Script continues after errors, misleading success message, potential data loss.

✅ Good Practice: Proper Error Handling

#!/bin/bash
# Safe version with error handling
# (log and die come from the function library shown earlier)

source functions.sh

set -euo pipefail

cleanup() {
    log "ERROR" "Script failed. Cleaning up..."
    # Add cleanup logic here
}

trap cleanup ERR
trap 'cleanup; exit 1' INT TERM

if ! cd /nonexistent/directory; then
    die "Directory doesn't exist"
fi

if ! rm -rf *.log; then
    die "Failed to remove log files"
fi

echo "Cleanup complete!"

Benefits: Stops on errors, provides clear error messages, cleans up resources, handles interrupts.

Error Handling Techniques

Exit Codes

# Every command returns an exit code
# 0 = Success, 1-255 = Failure

ls /tmp
echo "Exit code: $?" # Usually 0

ls /nonexistent
echo "Exit code: $?" # Usually 2

# Check exit code
if ls /tmp &>/dev/null; then
    echo "Command succeeded"
else
    echo "Command failed"
fi

# Exit with custom code
validate_input() {
    if [[ -z "$1" ]]; then
        echo "Error: Input required" >&2
        exit 100
    fi
}

set Options

# set -e: Exit immediately on error
set -e
false # Script exits here
echo "This line won't execute"

# set -u: Treat unset variables as error
set -u
echo "$UNDEFINED_VAR" # Script exits

# set -o pipefail: Pipeline fails if any command fails
set -o pipefail
false | true
echo "Exit code: $?" # 1 (not 0)

# Recommended combination
set -euo pipefail

# Temporarily disable
set +e # Disable exit on error
false
echo "This executes even though false failed"
set -e # Re-enable

trap Command

# Trap errors
cleanup() {
    echo "Cleaning up..."
    rm -f /tmp/tempfile.$$
}

trap cleanup EXIT # Run on script exit
trap cleanup ERR # Run on error
trap 'echo "Interrupted"; exit 1' INT # Ctrl+C

# Multiple traps
handle_error() {
    local line="$1"
    local cmd="$2"
    local code="${3:-0}"
    echo "Error on line $line: $cmd (exit code: $code)" >&2
}

trap 'handle_error $LINENO "$BASH_COMMAND" $?' ERR

Error Recovery Strategies

Retry Logic

retry_command() {
    local cmd="$1"
    local max_attempts="${2:-3}"
    local delay="${3:-2}"
    
    local attempt=1
    
    while [[ $attempt -le $max_attempts ]]; do
        if eval "$cmd"; then
            return 0
        fi
        
        log "WARN" "Attempt $attempt failed. Retrying in ${delay}s..."
        sleep "$delay"
        ((attempt++))
    done
    
    log "ERROR" "Command failed after $max_attempts attempts: $cmd"
    return 1
}

# Usage:
retry_command "curl -f http://example.com" 5 3

Fallback Strategy

download_with_fallback() {
    local url="$1"
    local output="$2"
    local fallback_url="$3"
    
    log "INFO" "Downloading from primary source: $url"
    
    if curl -f -L -o "$output" "$url"; then
        log "INFO" "Download successful from primary source"
        return 0
    fi

    log "WARN" "Primary download failed. Trying fallback: $fallback_url"
    
    if curl -f -L -o "$output" "$fallback_url"; then
        log "INFO" "Download successful from fallback"
        return 0
    fi

    die "Failed to download from all sources"
}

Complete Error Handling Framework

#!/bin/bash
# error_framework.sh - Comprehensive error handling

# ==================== CONFIGURATION ====================
# Note: log, die, send_alert, and cleanup_resources are expected to be
# provided by the script that sources this framework (see the function
# library and logging/notification helpers earlier in this guide).
LOG_FILE="/var/log/script_errors.log"
MAX_RETRIES=3
RETRY_DELAY=2

# ==================== ERROR FUNCTIONS ====================

init_error_handling() {
    set -euo pipefail
    trap 'handle_error $LINENO "$BASH_COMMAND" $?' ERR
    trap 'handle_exit' EXIT
    trap 'handle_interrupt' INT TERM
}

handle_error() {
    local line="$1"
    local command="$2"
    local exit_code="$3"
    
    local timestamp="$(date '+%Y-%m-%d %H:%M:%S')"
    local error_msg="[$timestamp] ERROR: Line $line: $command (exit code: $exit_code)"
    
    # Log to file
    echo "$error_msg" >> "$LOG_FILE"
    
    # Print to stderr
    echo -e "\033[0;31m$error_msg\033[0m" >&2
    
    # Optional: Send alert
    send_alert "Script Error" "$error_msg"
}

handle_exit() {
    local exit_code="$?"
    
    if [[ $exit_code -eq 0 ]]; then
        log "INFO" "Script completed successfully"
    else
        log "ERROR" "Script failed with exit code: $exit_code"
    fi
    
    # Cleanup resources
    cleanup_resources
}

handle_interrupt() {
    log "WARN" "Script interrupted by user"
    cleanup_resources
    exit 1
}

safe_execute() {
    local cmd="$1"
    local error_msg="${2:-Failed to execute: $cmd}"
    
    if ! eval "$cmd"; then
        die "$error_msg"
    fi
}

with_retry() {
    local cmd="$1"
    local max_retries="${2:-$MAX_RETRIES}"
    local delay="${3:-$RETRY_DELAY}"
    
    local attempt=1
    
    while true; do
        if eval "$cmd"; then
            return 0
        fi
        
        if [[ $attempt -ge $max_retries ]]; then
            die "Command failed after $max_retries attempts: $cmd"
        fi
        
        log "WARN" "Attempt $attempt failed. Retrying in ${delay}s..."
        sleep "$delay"
        ((attempt++))
    done
}

# ==================== USAGE EXAMPLE ====================

main() {
    init_error_handling
    
    log "INFO" "Starting script execution"
    
    # Safe file operations
    safe_execute "mkdir -p /tmp/test" "Failed to create directory"
    
    # With retry logic
    with_retry "curl -f http://example.com" 5 3
    
    # More operations...
    safe_execute "touch /tmp/test/file.txt"
    
    log "INFO" "Script completed successfully"
}

main "$@"

5. Automating System Tasks

The Power of Automation

System Automation: Using scripts to perform routine system administration tasks automatically, reducing human error, saving time, and ensuring consistency across environments.

Common Automation Tasks

User Management

#!/bin/bash
# user_management.sh - Automate user administration
# (Assumes the log and check_error helpers from the function library are sourced)

create_user() {
    local username="$1"
    local ssh_key="$2"
    
    # Check if user exists
    if id "$username" &>/dev/null; then
        log "WARN" "User $username already exists"
        return 0
    fi

    # Create user with home directory
    useradd -m -s /bin/bash "$username"
    check_error "Failed to create user $username"

    # Set up SSH key
    local ssh_dir="/home/$username/.ssh"
    mkdir -p "$ssh_dir"
    echo "$ssh_key" >> "$ssh_dir/authorized_keys"
    chmod 700 "$ssh_dir"
    chmod 600 "$ssh_dir/authorized_keys"
    chown -R "$username:$username" "$ssh_dir"

    # Add to sudoers (optional)
    echo "$username ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/"$username"
    chmod 440 /etc/sudoers.d/"$username"

    log "INFO" "User $username created successfully"
}

# Usage: create_user "alice" "ssh-rsa AAAAB3NzaC1yc2E..."

Package Management

#!/bin/bash
# package_manager.sh - Cross-distro package management

install_packages() {
    local packages=("$@")
    
    if command -v apt-get &>/dev/null; then
        # Debian/Ubuntu
        apt-get update
        apt-get install -y "${packages[@]}"
    elif command -v yum &>/dev/null; then
        # RHEL/CentOS
        yum install -y "${packages[@]}"
    elif command -v dnf &>/dev/null; then
        # Fedora
        dnf install -y "${packages[@]}"
    elif command -v zypper &>/dev/null; then
        # openSUSE
        zypper install -y "${packages[@]}"
    else
        die "Unsupported package manager"
    fi
}

update_system() {
    if command -v apt-get &>/dev/null; then
        apt-get update && apt-get upgrade -y
    elif command -v yum &>/dev/null; then
        yum update -y
    elif command -v dnf &>/dev/null; then
        dnf update -y
    fi
}

# Usage: install_packages nginx postgresql redis

Service Management

#!/bin/bash
# service_manager.sh - Manage system services

ensure_service_running() {
    local service="$1"
    
    if systemctl is-active --quiet "$service"; then
        log "INFO" "Service $service is already running"
    else
        log "INFO" "Starting service $service"
        systemctl start "$service"
    fi

    # Enable on boot
    systemctl enable "$service" &>/dev/null
}

restart_service() {
    local service="$1"
    
    log "INFO" "Restarting service $service"
    systemctl restart "$service"
    check_error "Failed to restart $service"
}

get_service_status() {
    local service="$1"
    
    if systemctl is-active --quiet "$service"; then
        echo "running"
    else
        echo "stopped"
    fi
}

Advanced Automation Examples

Log Rotation Automation

#!/bin/bash
# log_rotator.sh - Automate log rotation

LOG_DIR="/var/log/app"
MAX_DAYS=30
MAX_SIZE="100M"

rotate_logs() {
    local log_dir="${1:-$LOG_DIR}"
    
    find "$log_dir" -name "*.log" -type f | while IFS= read -r logfile; do
        # Check file size
        local size="$(stat -c%s "$logfile")"
        local max_bytes="$(echo "$MAX_SIZE" | sed 's/M/*1024*1024/;s/K/*1024/;s/G/*1024*1024*1024/' | bc)"
        
        if [[ $size -gt $max_bytes ]]; then
            rotate_single_log "$logfile"
        fi
    done

    # Clean old backups
    find "$log_dir" -name "*.log.*" -type f -mtime +$MAX_DAYS -delete
}

rotate_single_log() {
    local logfile="$1"
    local timestamp="$(date '+%Y%m%d_%H%M%S')"
    local backup_file="${logfile}.${timestamp}"
    
    log "INFO" "Rotating $logfile to $backup_file"
    
    # Copy and truncate
    cp "$logfile" "$backup_file"
    if [[ $? -eq 0 ]]; then
        : > "$logfile" # Truncate original
    fi

    # Compress backup
    gzip "$backup_file"
}

Configuration Management

#!/bin/bash
# config_manager.sh - Manage configuration files

deploy_config() {
    local src="$1"
    local dest="$2"
    local backup="${3:-yes}"
    
    if [[ ! -f "$src" ]]; then
        die "Source file not found: $src"
    fi

    # Backup existing file
    if [[ -f "$dest" ]] && [[ "$backup" == "yes" ]]; then
        local backup_file="${dest}.backup.$(date +%s)"
        cp -p "$dest" "$backup_file"
        log "INFO" "Backed up $dest to $backup_file"
    fi

    # Deploy new config
    cp -p "$src" "$dest"
    check_error "Failed to deploy config to $dest"

    # Set permissions
    chmod 644 "$dest"

    log "INFO" "Configuration deployed to $dest"
}

validate_config() {
    local config_file="$1"
    local service="$2"
    
    case "$service" in
        nginx)
            nginx -t -c "$config_file"
            ;;
        apache2|httpd)
            apachectl configtest -f "$config_file"
            ;;
        postgresql)
            sudo -u postgres pg_ctl -D /var/lib/postgresql/data -t 10 status
            ;;
        *)
            log "WARN" "No validation available for $service"
            ;;
    esac
}

Complete System Provisioning Script

#!/bin/bash
# provision_server.sh - Complete server provisioning

# ==================== CONFIGURATION ====================
SSH_PORT=2222
ADMIN_USER="devops"
ADMIN_SSH_KEY="ssh-rsa AAAAB3NzaC1yc2E... devops@company.com"
PACKAGES=("nginx" "postgresql" "redis" "python3" "git" "curl" "fail2ban" "ufw")

# ==================== MAIN PROVISIONING ====================

provision_server() {
    require_root
    init_error_handling

    log "INFO" "Starting server provisioning"

    # Step 1: System update
    log "INFO" "Updating system packages"
    update_system

    # Step 2: Install packages
    log "INFO" "Installing required packages"
    install_packages "${PACKAGES[@]}"

    # Step 3: Create admin user
    log "INFO" "Creating admin user: $ADMIN_USER"
    create_user "$ADMIN_USER" "$ADMIN_SSH_KEY"

    # Step 4: Secure SSH
    log "INFO" "Configuring SSH security"
    secure_ssh "$SSH_PORT" "$ADMIN_USER"

    # Step 5: Configure firewall
    log "INFO" "Configuring firewall"
    setup_firewall "$SSH_PORT"

    # Step 6: Configure services
    log "INFO" "Configuring services"
    configure_nginx
    configure_postgresql
    configure_redis

    # Step 7: Enable monitoring
    log "INFO" "Setting up monitoring"
    setup_monitoring

    # Step 8: Final cleanup
    log "INFO" "Cleaning up temporary files"
    apt-get autoremove -y
    apt-get clean

    log "SUCCESS" "Server provisioning completed successfully!"
    echo "=========================================="
    echo "Provisioning Summary:"
    echo "- SSH Port: $SSH_PORT"
    echo "- Admin User: $ADMIN_USER"
    echo "- Installed Packages: ${PACKAGES[*]}"
    echo "=========================================="
}

# Helper functions (simplified for example; configure_nginx, configure_postgresql,
# configure_redis, and setup_monitoring are left for you to implement)
secure_ssh() {
    local port="$1"
    local admin_user="$2"
    
    # Backup original config
    cp /etc/ssh/sshd_config /etc/ssh/sshd_config.backup

    # Apply secure configuration
    cat > /etc/ssh/sshd_config << EOF
Port $port
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
AllowUsers $admin_user
X11Forwarding no
ClientAliveInterval 300
ClientAliveCountMax 2
EOF

    systemctl restart sshd
}

setup_firewall() {
    local ssh_port="$1"
    
    ufw --force reset
    ufw default deny incoming
    ufw default allow outgoing
    ufw allow "$ssh_port/tcp"
    ufw allow 80/tcp
    ufw allow 443/tcp
    ufw --force enable
}

# Run provisioning
provision_server

6. Script Scheduling with Cron

What is Cron?

Cron: A time-based job scheduler in Unix-like operating systems. Cron enables users to schedule jobs (commands or scripts) to run periodically at fixed times, dates, or intervals.

Cron Syntax Explained

Cron Time Format

Minute         0-59
Hour           0-23
Day of Month   1-31
Month          1-12
Day of Week    0-7 (0 and 7 = Sunday)

Special Characters:
* = Any value
, = Value list separator (1,3,5)
- = Range of values (1-5)
/ = Step values (*/2 = every 2 units)
? = No specific value (non-standard; only some cron implementations)
L = Last day of month/week (non-standard; Quartz-style schedulers only)
W = Nearest weekday (non-standard; Quartz-style schedulers only)

Common Cron Examples

Basic Scheduling

# Run every day at 2:30 AM
30 2 * * * /path/to/script.sh

# Run every hour at minute 0
0 * * * * /path/to/script.sh

# Run every 15 minutes
*/15 * * * * /path/to/script.sh

# Run every Monday at 3:15 PM
15 15 * * 1 /path/to/script.sh

# Run on the 1st of every month at 6:00 AM
0 6 1 * * /path/to/script.sh

# Run every weekday at 9:00 AM
0 9 * * 1-5 /path/to/script.sh

Advanced Scheduling

# Run every 2 hours during business hours (9 AM - 5 PM)
0 9-17/2 * * 1-5 /path/to/script.sh

# Run on specific days of month (1st and 15th)
0 8 1,15 * * /path/to/script.sh

# Run on last day of month at midnight
# (L is non-standard; Quartz-style schedulers only, not Vixie cron)
0 0 L * * /path/to/script.sh

# Run every 10 minutes during work hours on weekdays
*/10 9-17 * * 1-5 /path/to/script.sh

# Run at 5 PM on the last Friday of every month
# (5L is non-standard; Quartz-style schedulers only)
0 17 * * 5L /path/to/script.sh

# Run on January 1st at 00:01
1 0 1 1 * /path/to/script.sh

Special Strings

# @reboot - Run at startup
@reboot /path/to/script.sh

# @yearly or @annually - Run once a year
@yearly /path/to/script.sh
# Same as: 0 0 1 1 *

# @monthly - Run once a month
@monthly /path/to/script.sh
# Same as: 0 0 1 * *

# @weekly - Run once a week
@weekly /path/to/script.sh
# Same as: 0 0 * * 0

# @daily or @midnight - Run once a day
@daily /path/to/script.sh
# Same as: 0 0 * * *

# @hourly - Run once an hour
@hourly /path/to/script.sh
# Same as: 0 * * * *

# There is no special string for "every minute"; use the standard syntax
* * * * * /path/to/script.sh

Cron Management Commands

User Crontabs

# Edit your personal crontab
$ crontab -e

# List your cron jobs
$ crontab -l

# Remove all your cron jobs
$ crontab -r

# Load cron jobs from file
$ crontab cronjobs.txt

# View another user's crontab (root only)
$ sudo crontab -u username -l

# Edit root's crontab
$ sudo crontab -e

System Crontabs

# System-wide crontab files
/etc/crontab
/etc/cron.d/ # Directory for package cron jobs
/etc/cron.hourly/
/etc/cron.daily/
/etc/cron.weekly/
/etc/cron.monthly/

# /etc/crontab format (includes user field)
# Example:
# m h dom mon dow user command
17 *    * * *   root   cd / && run-parts --report /etc/cron.hourly
25 6    * * *   root   test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
47 6    * * 7   root   test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.weekly )
52 6    1 * *   root   test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.monthly )

# Create a system cron file
$ sudo nano /etc/cron.d/myapp
# Add: * * * * * root /path/to/script.sh

Cron Best Practices

Script Design for Cron

#!/bin/bash
# cron_script.sh - Designed for cron execution

# Set PATH explicitly
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

# Set working directory
cd /path/to/working/directory || exit 1

# Logging setup
LOG_FILE=/var/log/myscript.log
exec >> "$LOG_FILE" 2>&1

echo "=== $(date) - Script started ==="

# Main script logic
/path/to/command --option

echo "=== $(date) - Script completed ==="

Cron Environment

# Cron runs with minimal environment
# Common issues and solutions:

# 1. PATH may not include what you need
# Solution: Set PATH at top of script
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

# 2. No interactive shell environment
# Solution: Source necessary files
. /etc/profile
. ~/.bash_profile

# 3. No terminal, so no $DISPLAY
# Solution: Don't use GUI commands
# Or set DISPLAY if needed
DISPLAY=:0

# 4. Working directory is user's home
# Solution: cd to script directory
cd "$(dirname "$0")" || exit 1

# 5. Cron mails all job output to the job's owner by default
# Solution: Redirect output in the job, or set MAILTO in the crontab itself
MAILTO="" # Disable email (crontab setting, not a script variable)
MAILTO="admin@example.com" # Send output to a specific address

Preventing Overlap

# Use lock files to prevent overlapping runs
acquire_lock() {
    local lockfile="/tmp/$(basename "$0").lock"
    
    if [[ -f "$lockfile" ]]; then
        local pid="$(cat "$lockfile")"
        if kill -0 "$pid" 2>/dev/null; then
            echo "Script is already running (PID: $pid)" >&2
            exit 1
        fi
    fi

    echo $$ > "$lockfile"
    trap 'rm -f "$lockfile"' EXIT
}

# Use flock command (more robust)
run_with_lock() {
    local lockfile="/var/lock/$(basename "$0").lock"
    
    (
        flock -n 9 || exit 1
        # Your script commands here
        echo "Running exclusive code"
        sleep 10
    ) 9>"$lockfile"
}

Complete Cron Job Management System

#!/bin/bash
# cron_manager.sh - Manage and monitor cron jobs

CRON_DIR="/etc/cron.d"
BACKUP_DIR="/backup/cron"
LOG_DIR="/var/log/cronjobs"

add_cron_job() {
    local name="$1"
    local schedule="$2"
    local command="$3"
    local user="${4:-root}"
    local log_file="${LOG_DIR}/${name}.log"
    
    # Create log directory
    mkdir -p "$LOG_DIR"
    
    # Build cron entry with logging
    local cron_entry="$schedule $user $command >> $log_file 2>&1"
    
    # Write to cron file
    echo "$cron_entry" > "$CRON_DIR/$name"
    
    log "INFO" "Added cron job: $name"
}

remove_cron_job() {
    local name="$1"
    
    if [[ -f "$CRON_DIR/$name" ]]; then
        # Backup before removal
        mkdir -p "$BACKUP_DIR"
        cp "$CRON_DIR/$name" "$BACKUP_DIR/${name}.backup.$(date +%s)"
        
        rm -f "$CRON_DIR/$name"
        log "INFO" "Removed cron job: $name"
    else
        log "WARN" "Cron job not found: $name"
    fi
}

list_cron_jobs() {
    echo "=== System Cron Jobs ==="
    echo
    
    # List system crontab
    if [[ -f /etc/crontab ]]; then
        echo "/etc/crontab:"
        grep -v '^#' /etc/crontab | grep -v '^$'
        echo
    fi

    # List cron.d files
    if [[ -d "$CRON_DIR" ]]; then
        echo "/etc/cron.d/ files:"
        for file in "$CRON_DIR"/*; do
            if [[ -f "$file" ]]; then
                echo "$(basename "$file"):"
                grep -v '^#' "$file" | grep -v '^$'
                echo
            fi
        done
    fi
}

monitor_cron_logs() {
    local days="${1:-7}"
    
    echo "=== Cron Job Execution Report (last $days days) ==="
    echo
    
    # Check system cron logs
    if [[ -f /var/log/syslog ]]; then
        echo "System cron executions:"
        grep -i "CRON" /var/log/syslog | grep "$(date --date="-$days days" +'%b %d')" | tail -20
        echo
    fi

    # Check custom cron logs
    if [[ -d "$LOG_DIR" ]]; then
        echo "Custom cron job logs:"
        find "$LOG_DIR" -name "*.log" -type f -mtime -"$days" | while IFS= read -r logfile; do
            local job_name="$(basename "$logfile" .log)"
            local last_run="$(tail -1 "$logfile" 2>/dev/null | grep -o '[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\} [0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}' | tail -1)"
            local status="OK"
            
            if tail -10 "$logfile" 2>/dev/null | grep -qi "error\|failed"; then
                status="ERROR"
            fi
            
            printf "%-20s %-25s %s\n" "$job_name" "${last_run:-never}" "$status"
        done
    fi
}

# Usage examples
case "$1" in
    add)
        add_cron_job "$2" "$3" "$4" "$5"
        ;;
    remove)
        remove_cron_job "$2"
        ;;
    list)
        list_cron_jobs
        ;;
    monitor)
        monitor_cron_logs "$2"
        ;;
    *)
        echo "Usage: $0 {add|remove|list|monitor}"
        exit 1
        ;;
esac

Cron Best Practices Checklist

  • ✅ Use absolute paths - Cron has minimal PATH
  • ✅ Set working directory - Use cd at script start
  • ✅ Implement logging - Redirect output to log files
  • ✅ Handle errors - Check exit codes in scripts
  • ✅ Prevent overlaps - Use lock files or flock
  • ✅ Test scripts manually - Before adding to cron
  • ✅ Use descriptive names - For cron files and logs
  • ✅ Monitor execution - Regularly check cron logs
  • ✅ Backup crontabs - Before making changes
  • ✅ Consider alternatives - systemd timers for complex scheduling (see the sketch below)
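
For that last item, here is a rough sketch of the systemd-timer alternative to a cron entry such as */15 * * * *. The unit names, script path, and schedule are illustrative assumptions, not part of any existing setup:

# Create a oneshot service and a matching timer (illustrative names and paths)
cat > /etc/systemd/system/health-check.service << 'EOF'
[Unit]
Description=Periodic health check

[Service]
Type=oneshot
ExecStart=/usr/local/bin/health_check.sh
EOF

cat > /etc/systemd/system/health-check.timer << 'EOF'
[Unit]
Description=Run health check every 15 minutes

[Timer]
OnCalendar=*:0/15
Persistent=true

[Install]
WantedBy=timers.target
EOF

systemctl daemon-reload
systemctl enable --now health-check.timer

Compared with cron, the timer gives you journalctl logging, dependency ordering, and Persistent=true catch-up runs after downtime.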

7. Real-world Script Examples

Backup Script

Complete Automated Backup System

#!/bin/bash
# backup_manager.sh - Enterprise-grade backup solution

# ==================== CONFIGURATION ====================
BACKUP_ROOT="/backups"
RETENTION_DAYS=30
COMPRESSION="gzip" # gzip, bzip2, xz, or none
ENCRYPTION_KEY="" # Set for encryption (optional)
REMOTE_HOST="backup-server"
REMOTE_PATH="/remote/backups"

# Backup sources (directory: exclude_patterns)
declare -A BACKUP_SOURCES=(
    ["/etc"]="*.tmp,*.log"
    ["/home"]="*.mp3,*.mp4,*.iso"
    ["/var/www"]="cache,*.log"
    ["/opt/app"]="temp,logs"
)

# Database backups (type: credentials)
declare -A DATABASES=(
    ["mysql"]="root:password"
    ["postgres"]="postgres"
    ["mongodb"]=""
)

# ==================== FUNCTIONS ====================

init_backup() {
    local backup_type="$1"
    local timestamp="$(date '+%Y%m%d_%H%M%S')"
    local backup_dir="$BACKUP_ROOT/$backup_type/$timestamp"
    
    mkdir -p "$backup_dir"
    check_error "Failed to create backup directory: $backup_dir"
    
    log "INFO" "Starting $backup_type backup to $backup_dir"
    echo "$backup_dir"
}
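
# compress_file is called by the backup functions below; a minimal assumed
# implementation driven by the COMPRESSION setting above:
compress_file() {
    local file="$1"
    case "$COMPRESSION" in
        gzip)  gzip -f "$file" ;;
        bzip2) bzip2 -f "$file" ;;
        xz)    xz -f "$file" ;;
        *)     ;; # "none" or unknown: leave the file uncompressed
    esac
}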

backup_filesystem() {
    local backup_dir="$1"
    
    for source_dir in "${!BACKUP_SOURCES[@]}"; do
        local exclude_patterns="${BACKUP_SOURCES[$source_dir]}"
        local backup_name="$(basename "$source_dir")"
        local output_file="$backup_dir/${backup_name}.tar"
    
        log "INFO" "Backing up $source_dir"
    
        # Build exclude options
        local exclude_opts=""
        IFS=',' read -ra patterns <<< "$exclude_patterns"
        for pattern in "${patterns[@]}"; do
            exclude_opts+=" --exclude='$pattern'"
        done

        # Create tar archive
        eval "tar cf $output_file $exclude_opts $source_dir 2>/dev/null"
        check_error "Failed to backup $source_dir"

        # Compress if requested
        compress_file "$output_file"
    done
}

backup_databases() {
    local backup_dir="$1"
    
    for db_type in "${!DATABASES[@]}"; do
        local credentials="${DATABASES[$db_type]}"
        
        case "$db_type" in
            mysql)
                backup_mysql "$backup_dir" "$credentials"
                ;;
            postgres)
                backup_postgres "$backup_dir" "$credentials"
                ;;
            mongodb)
                backup_mongodb "$backup_dir"
                ;;
        esac
    done
}

backup_mysql() {
    local backup_dir="$1"
    local credentials="$2"
    
    if ! command -v mysqldump &>/dev/null; then
        log "WARN" "mysqldump not found, skipping MySQL backup"
        return
    fi

    log "INFO" "Backing up MySQL databases"
    
    # Get database list
    local dbs="$(mysql -u"${credentials%:*}" -p"${credentials#*:}" -e "SHOW DATABASES;" | grep -Ev "(Database|information_schema|performance_schema|mysql|sys)")"
    
    for db in $dbs; do
        local backup_file="$backup_dir/mysql_${db}.sql"
        mysqldump -u"${credentials%:*}" -p"${credentials#*:}" --single-transaction --routines --triggers "$db" > "$backup_file"
        check_error "Failed to backup MySQL database: $db"
        compress_file "$backup_file"
    done
}

sync_to_remote() {
    local backup_dir="$1"
    
    if [[ -z "$REMOTE_HOST" ]] || [[ -z "$REMOTE_PATH" ]]; then
        log "INFO" "Remote backup not configured"
        return
    fi

    log "INFO" "Syncing backup to remote server: $REMOTE_HOST"
    
    # Use rsync for efficient transfer
    rsync -avz --delete "$backup_dir" "$REMOTE_HOST:$REMOTE_PATH/"
    check_error "Failed to sync backup to remote server"
}

cleanup_old_backups() {
    log "INFO" "Cleaning up backups older than $RETENTION_DAYS days"
    
    find "$BACKUP_ROOT" -type f -name "*.tar*" -o -name "*.sql*" -o -name "*.dump*" | \
        while IFS= read -r file; do
            if [[ $(find "$file" -mtime +$RETENTION_DAYS -print) ]]; then
                rm -f "$file"
                log "DEBUG" "Removed old backup: $file"
            fi
        done

    # Remove empty directories
    find "$BACKUP_ROOT" -type d -empty -delete
}

generate_backup_report() {
    local backup_dir="$1"
    local report_file="$backup_dir/backup_report.txt"
    
    echo "=== BACKUP REPORT $(date) ===" > "$report_file"
    echo "Backup directory: $backup_dir" >> "$report_file"
    echo "Total size: $(du -sh "$backup_dir" | cut -f1)" >> "$report_file"
    echo "" >> "$report_file"
    
    echo "=== FILES BACKED UP ===" >> "$report_file"
    find "$backup_dir" -type f -name "*.tar*" -exec ls -lh {} \; >> "$report_file"
    
    echo "" >> "$report_file"
    echo "=== DATABASE BACKUPS ===" >> "$report_file"
    find "$backup_dir" -type f -name "*.sql*" -o -name "*.dump*" -exec ls -lh {} \; >> "$report_file"
}

# ==================== MAIN ====================

main() {
    require_root
    init_error_handling

    local backup_type="full"
    local backup_dir=""

    # Parse arguments
    while [[ $# -gt 0 ]]; do
        case "$1" in
            --incremental)
                backup_type="incremental"
                ;;
            --dry-run)
                DRY_RUN=1
                ;;
        esac
        shift
    done

    log "INFO" "Starting $backup_type backup"

    # Initialize backup directory
    backup_dir=$(init_backup "$backup_type")

    # Perform backups
    backup_filesystem "$backup_dir"
    backup_databases "$backup_dir"

    # Sync to remote
    sync_to_remote "$backup_dir"

    # Cleanup old backups
    cleanup_old_backups

    # Generate report
    generate_backup_report "$backup_dir"

    log "SUCCESS" "Backup completed successfully: $backup_dir"
    send_notification "Backup Completed" "$backup_type backup completed successfully"
}

main "$@"

Monitoring Script

Enterprise System Monitoring Script

#!/bin/bash
# system_monitor.sh - Comprehensive system monitoring

# ==================== CONFIGURATION ====================
declare -A ALERT_THRESHOLDS=(
    ["CPU"]=80
    ["MEMORY"]=85
    ["DISK"]=90
    ["LOAD"]=5.0
)

SERVICES_TO_MONITOR=(
    "nginx"
    "postgresql"
    "redis"
    "ssh"
    "cron"
)

declare -A PORTS_TO_CHECK=(
    ["80"]="HTTP"
    ["443"]="HTTPS"
    ["22"]="SSH"
    ["5432"]="PostgreSQL"
    ["6379"]="Redis"
)

declare -A LOG_PATTERNS=(
    ["ERROR"]="/var/log/syslog"
    ["Failed password"]="/var/log/auth.log"
    ["Out of memory"]="/var/log/kern.log"
)

# ==================== MONITORING FUNCTIONS ====================

check_cpu() {
    local cpu_usage="$(top -bn1 | grep "Cpu(s)" | awk '{print $2 + $4}')"
    local threshold="${ALERT_THRESHOLDS[CPU]}"
    
    printf "CPU Usage: %.1f%%" "$cpu_usage"
    
    if (( $(echo "$cpu_usage > $threshold" | bc -l) )); then
        send_alert "High CPU Usage" "CPU usage is ${cpu_usage}% (threshold: ${threshold}%)"
        return 1
    fi
    return 0
}

check_memory() {
    local mem_info="$(free -m)"
    local total="$(echo "$mem_info" | grep Mem | awk '{print $2}')"
    local used="$(echo "$mem_info" | grep Mem | awk '{print $3}')"
    local percentage="$(echo "scale=2; $used * 100 / $total" | bc)"
    local threshold="${ALERT_THRESHOLDS[MEMORY]}"
    
    printf "Memory: %s/%s MB (%.1f%%)" "$used" "$total" "$percentage"
    
    if (( $(echo "$percentage > $threshold" | bc -l) )); then
        send_alert "High Memory Usage" "Memory usage is ${percentage}% (threshold: ${threshold}%)"
        return 1
    fi
    return 0
}

check_disk() {
    local errors=0
    local threshold="${ALERT_THRESHOLDS[DISK]}"

    # Read from process substitution so the errors counter is not lost in a subshell
    while IFS= read -r line; do
        local filesystem="$(echo "$line" | awk '{print $1}')"
        local usage="$(echo "$line" | awk '{print $5}' | sed 's/%//')"
        local mount="$(echo "$line" | awk '{print $6}')"

        printf "Disk %s (%s): %s%%\n" "$filesystem" "$mount" "$usage"

        if [[ $usage -ge $threshold ]]; then
            send_alert "High Disk Usage" "Disk $filesystem ($mount) is ${usage}% full (threshold: ${threshold}%)"
            errors=$((errors + 1))
        fi
    done < <(df -h | grep -v tmpfs | tail -n +2)

    return $errors
}

check_services() {
    local errors=0
    
    for service in "${SERVICES_TO_MONITOR[@]}"; do
        if systemctl is-active --quiet "$service"; then
            printf "Service %s: RUNNING\\n" "$service"
        else
            printf "Service %s: STOPPED\\n" "$service"
            send_alert "Service Down" "Service $service is not running"
            ((errors++))
        fi
    done

    return $errors
}

check_ports() {
    local errors=0
    
    for port in "${!PORTS_TO_CHECK[@]}"; do
        local service_name="${PORTS_TO_CHECK[$port]}"
        
        if timeout 1 bash -c "cat < /dev/null > /dev/tcp/127.0.0.1/$port" 2>/dev/null; then
            printf "Port %s (%s): OPEN\\n" "$port" "$service_name"
        else
            printf "Port %s (%s): CLOSED\\n" "$port" "$service_name"
            send_alert "Port Closed" "Port $port ($service_name) is not accessible"
            ((errors++))
        fi
    done

    return $errors
}

check_logs() {
    local errors=0
    
    for pattern in "${!LOG_PATTERNS[@]}"; do
        local logfile="${LOG_PATTERNS[$pattern]}"
        # grep -c prints 0 itself when there are no matches, so only default on error
        local count
        count="$(grep -c "$pattern" "$logfile" 2>/dev/null || true)"
        count="${count:-0}"

        if [[ $count -gt 0 ]]; then
            printf "Log pattern '%s': %d matches\n" "$pattern" "$count"
            if [[ $count -gt 10 ]]; then # Threshold for alerts
                send_alert "Log Alert" "Found $count occurrences of '$pattern' in $logfile"
                ((errors++))
            fi
        fi
    done

    return $errors
}

generate_report() {
    local total_errors="$1"
    local report_file="/var/log/system_monitor_$(date +%Y%m%d).log"
    
    echo "=== SYSTEM MONITORING REPORT $(date) ===" > "$report_file"
    echo "Total alerts generated: $total_errors" >> "$report_file"
    echo "Status: $([[ $total_errors -eq 0 ]] && echo "HEALTHY" || echo "ISSUES DETECTED")" >> "$report_file"
    echo "" >> "$report_file"
}

# ==================== MAIN ====================

main() {
    require_root
    init_error_handling

    local total_errors=0
    
    log "INFO" "Starting system monitoring"

    echo "=== SYSTEM MONITORING CHECK $(date) ==="
echo

    # Run all checks
    echo "1. CPU Usage:"
    check_cpu || ((total_errors++))
echo

    echo "2. Memory Usage:"
    check_memory || ((total_errors++))
echo

    echo "3. Disk Usage:"
    check_disk || ((total_errors++))
echo

    echo "4. Service Status:"
    check_services || ((total_errors++))
echo

    echo "5. Port Availability:"
    check_ports || ((total_errors++))
echo

    echo "6. Log Analysis:"
    check_logs || ((total_errors++))
echo

    # Generate report
    generate_report "$total_errors"

    if [[ $total_errors -eq 0 ]]; then
        log "SUCCESS" "All system checks passed"
    else
        log "WARN" "Found $total_errors issues that need attention"
    fi

    echo "=== MONITORING COMPLETE ==="
    echo "Total issues detected: $total_errors"
}

# Run monitoring
main "$@"

Production-Ready Script Checklist

  • ✅ Error handling: set -euo pipefail, trap, check exit codes
  • ✅ Logging: Comprehensive logging with timestamps and levels
  • ✅ Configuration: External config files or environment variables
  • ✅ Input validation: Validate all user inputs and arguments
  • ✅ Idempotency: Script can be run multiple times safely (see the sketch after this checklist)
  • ✅ Documentation: Clear usage instructions and examples
  • ✅ Security: No hardcoded passwords, proper permissions
  • ✅ Portability: Works across different Linux distributions
  • ✅ Resource management: Clean up temporary files
  • ✅ Testing: Unit tests or at least manual testing scenarios
  • ✅ Monitoring: Scripts should report their own status
  • ✅ Version control: All scripts in version control with change history
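
To illustrate the idempotency item above, a small sketch in which every step checks the current state before changing it, so the script can be re-run safely (the user name and paths are made up for the example):

#!/bin/bash
set -euo pipefail

APP_USER="deploy"           # Illustrative values
APP_DIR="/opt/example-app"

# Create the user only if it does not already exist
if ! id "$APP_USER" &>/dev/null; then
    useradd -m -s /bin/bash "$APP_USER"
fi

# mkdir -p, chown, and chmod are safe to repeat
mkdir -p "$APP_DIR"
chown "$APP_USER:$APP_USER" "$APP_DIR"
chmod 755 "$APP_DIR"

# Append a config line only if it is not already present
grep -qxF "AllowTcpForwarding no" /etc/ssh/sshd_config || \
    echo "AllowTcpForwarding no" >> /etc/ssh/sshd_config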

Shell Scripting Best Practices

Do's

  • Use #!/usr/bin/env bash for portability
  • Enable strict mode: set -euo pipefail
  • Use functions to organize code
  • Validate all inputs and arguments
  • Use meaningful variable names
  • Comment complex logic
  • Implement proper error handling
  • Use arrays for lists of items
  • Quote all variable expansions (see the example after this list)
  • Use [[ ]] for conditionals
  • Prefer $( ) over backticks
  • Use local variables in functions
  • Test scripts in a safe environment
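
A few of these guidelines in one short sketch (strict mode, quoting, [[ ]], $( ), and local variables); the file path is only an illustration:

#!/usr/bin/env bash
set -euo pipefail

count_lines() {
    local file="$1"                 # local keeps the variable scoped to the function
    if [[ -r "$file" ]]; then       # [[ ]] test with a quoted expansion
        local lines
        lines="$(wc -l < "$file")"  # $( ) instead of backticks
        echo "$file has $lines lines"
    else
        echo "Cannot read: $file" >&2
        return 1
    fi
}

count_lines "/etc/hostname"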

Don'ts

  • Don't use ALL_CAPS for non-exported variables
  • Don't parse ls output (use globs or find; see the example after this list)
  • Don't use cd without checking return code
  • Don't use echo for variable data without quoting
  • Don't ignore exit codes
  • Don't write scripts without error handling
  • Don't use deprecated syntax
  • Don't hardcode paths (use variables)
  • Don't run as root unnecessarily
  • Don't create world-writable files
  • Don't use cat file | grep pattern (use grep pattern file)
  • Don't forget to test edge cases
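
The safe alternatives for a few of these, sketched with illustrative paths:

# Instead of parsing ls output, use a glob
for f in /var/log/*.log; do
    [[ -e "$f" ]] || continue   # Skip if the glob matched nothing
    echo "Found log: $f"
done

# Instead of a bare cd, check that it succeeded
cd /opt/example-app || { echo "cd failed" >&2; exit 1; }

# Instead of: cat file | grep pattern
grep -i "error" /var/log/syslog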

Shell Scripting Quick Reference

Essential Commands

$ bash -n script.sh # Syntax check
$ bash -x script.sh # Debug mode
$ shellcheck script.sh # Lint script
$ time ./script.sh # Measure execution time
$ strace -f ./script.sh # Trace system calls

Testing & Debugging

# Debugging options
set -x # Print commands as they execute
set -v # Print shell input lines
trap 'echo "Line $LINENO: $BASH_COMMAND"' DEBUG
PS4='+ $LINENO: ' # Custom debug prompt

Useful One-Liners

# Process substitution
diff <(ls dir1) <(ls dir2)
# Read lines from file
while IFS= read -r line; do echo "$line"; done < file
# Generate sequence
for i in {1..10}; do echo "$i"; done
# Check if command exists
command -v cmd >/dev/null 2>&1

Practice Scenarios for DevOps Scripting

  1. Create a script that automatically provisions a new server with your preferred stack (web server, database, monitoring).
  2. Write a log analysis script that identifies the top 10 IP addresses with failed SSH attempts and automatically blocks them.
  3. Develop a backup script that supports incremental backups, encryption, and remote synchronization.
  4. Create a monitoring script that alerts when disk usage exceeds 90%, memory usage exceeds 85%, or CPU load exceeds 5.0.
  5. Write a deployment script that can rollback to a previous version if health checks fail.
  6. Develop a user management script that can create users, set up SSH keys, and configure sudo access from a CSV file.
  7. Create a certificate management script that automatically renews Let's Encrypt certificates and reloads services.
  8. Write a Docker container management script that can start, stop, update, and monitor containers.
  9. Develop a configuration drift detection script that compares current system state with a known good baseline.
  10. Create a script that automatically scales application instances based on CPU load or request rate.

Key Takeaways

  • Automation is power: Shell scripting turns repetitive tasks into automated workflows
  • Start simple, build complex: Begin with basic scripts and add features gradually
  • Error handling is non-negotiable: Robust scripts handle failures gracefully
  • Documentation saves time: Comment your code and provide usage instructions
  • Test thoroughly: Test scripts in safe environments before production
  • Security matters: Don't hardcode secrets, use proper permissions
  • Reusability pays off: Build libraries of functions for common tasks
  • Monitoring is essential: Scripts should log their activities and report status
  • Cron is your friend: Schedule regular tasks for maintenance and monitoring
  • Continuous improvement: Refactor scripts as you learn better techniques

Mastering shell scripting is one of the most valuable skills for a DevOps professional. It enables you to automate infrastructure, deploy applications, monitor systems, and respond to incidents efficiently. Remember: the goal is not just to write scripts, but to create reliable, maintainable automation that makes systems more resilient and operations more efficient.
