Shell Scripting for DevOps
Published: December 2025 | Topic: Automation & Scripting for DevOps
Automation is the superpower of DevOps. Shell scripting enables you to automate repetitive tasks, manage infrastructure, deploy applications, and monitor systems efficiently. Mastering Bash scripting is essential for any DevOps professional looking to streamline operations and build reliable automation pipelines.
Why Shell Scripting Matters for DevOps
Shell scripting provides:
- Automation: Eliminate manual, repetitive tasks
- Consistency: Ensure identical execution across environments
- Infrastructure as Code: Manage infrastructure through scripts
- Rapid Prototyping: Quickly test ideas and workflows
- Cross-platform Compatibility: Run on any Unix-like system
- Integration: Connect different tools and services
- Monitoring: Create custom monitoring solutions
- Backup & Recovery: Automate data protection strategies
The DevOps Scripting Workflow
Define requirements
Write Bash script
Debug & validate
Schedule with cron
Logs & alerts
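The five steps above can be compressed into a sketch: a script fulfils a simple requirement, writes a log line that alerting can watch, and is then handed to cron. This is a minimal illustration, not a production script; the /tmp log path and the 5-minute schedule are placeholder assumptions.

```shell
#!/bin/bash
# workflow_demo.sh - minimal sketch of the define -> script -> schedule -> log cycle
# (the log path and the cron schedule below are illustrative assumptions)

LOG_FILE="${LOG_FILE:-/tmp/workflow_demo.log}"

check_root_disk() {
    # The "requirement": report root filesystem usage
    local usage
    usage="$(df / | awk 'NR==2 {print $5}')"
    echo "$(date '+%Y-%m-%d %H:%M:%S') root filesystem at $usage" >> "$LOG_FILE"
    echo "logged: root filesystem at $usage"
}

check_root_disk

# Scheduling and alerting would then be one crontab entry, e.g.:
#   */5 * * * * /usr/local/bin/workflow_demo.sh
```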
1. Introduction to Bash Scripting
What is Bash?
Bash (Bourne Again SHell): The default command-line interpreter for most Linux distributions and macOS. It's a powerful scripting language that allows you to automate system administration tasks, process data, and control program execution.
Your First Bash Script
#!/bin/bash
# hello.sh - My first Bash script
# This script prints a greeting message
echo "Hello, World!"
echo "Today is: $(date)"
echo "Current user: $USER"
echo "Working directory: $(pwd)"
Script Execution Methods
Method 1: Using bash command
# Execute with bash (no executable permission needed)
$ bash hello.sh
# Or make it executable and run it directly
$ chmod +x hello.sh
$ ./hello.sh # Requires shebang and executable permission
# Output:
Hello, World!
Today is: Mon Dec 1 10:00:01 UTC 2025
Current user: alice
Working directory: /home/alice
Method 2: Source execution
$ source hello.sh
$ . hello.sh # Shorthand for source
# Difference:
# ./script.sh - Runs in new shell
# source script.sh - Runs in current shell
# (Environment changes persist with source)
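The persistence difference is easy to demonstrate: a variable assigned by a script run with `bash` disappears with the child shell, while the same assignment made via `source` survives in the current shell. A small sketch (the helper path in /tmp is an assumption):

```shell
#!/bin/bash
# source_demo.sh - show why 'source' persists environment changes

helper="/tmp/setvar_$$.sh"
cat > "$helper" << 'EOF'
MY_VAR="set by script"
EOF

MY_VAR="original"

# Running in a child shell: the child's assignment is lost
bash "$helper"
echo "after bash: MY_VAR=$MY_VAR"    # still "original"

# Sourcing runs in the current shell: the assignment persists
source "$helper"
echo "after source: MY_VAR=$MY_VAR"  # now "set by script"

rm -f "$helper"
```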
Method 3: Debug mode
$ bash -x hello.sh
# Output with debugging:
+ echo 'Hello, World!'
Hello, World!
+ date
+ echo 'Today is: Mon Dec 1 10:00:01 UTC 2025'
Today is: Mon Dec 1 10:00:01 UTC 2025
# Debug inside script
set -x # Enable debugging
# ... script commands ...
set +x # Disable debugging
Essential Script Components
Shebang (#!)
#!/bin/bash - Most common
#!/usr/bin/env bash - More portable
#!/bin/sh - For POSIX shell scripts
#!/usr/bin/python - For Python scripts
# Why shebang matters:
1. Tells system which interpreter to use
2. Makes script self-contained
3. Allows direct execution (./script)
4. Ensures correct syntax highlighting
Comments & Documentation
# This is a comment
echo "Hello" # Inline comment
# Multi-line comments (using here-doc)
: << 'COMMENT'
This is a multi-line comment.
It can span multiple lines.
Useful for function documentation.
COMMENT
# Script header template
#!/bin/bash
#
# Script: backup.sh
# Author: DevOps Team
# Date: 2025-12-01
# Description: Automated backup script
# Usage: ./backup.sh [options]
Complete Starter Script Template
#!/bin/bash
#
# template.sh - Bash script template for DevOps
#
# Features:
# - Proper error handling
# - Command-line arguments
# - Logging
# - Configuration
#
# Usage: ./template.sh [OPTIONS]
#
# ============================================
# CONFIGURATION
# ============================================
SCRIPT_NAME="$(basename "$0")"
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
LOG_FILE="/var/log/${SCRIPT_NAME%.*}.log"
TEMP_DIR="/tmp/${SCRIPT_NAME%.*}_$(date +%s)"
# ============================================
# FUNCTIONS
# ============================================
log_message() {
local message="$1"
local level="${2:-INFO}"
local timestamp="$(date '+%Y-%m-%d %H:%M:%S')"
echo "[$timestamp] [$level] $message" | tee -a "$LOG_FILE"
}
cleanup() {
log_message "Cleaning up temporary files"
rm -rf "$TEMP_DIR"
}
show_help() {
cat << EOF
Usage: $SCRIPT_NAME [OPTIONS]
Options:
-h, --help Show this help message
-v, --verbose Enable verbose output
-d, --dry-run Simulate without making changes
Examples:
$SCRIPT_NAME --verbose
$SCRIPT_NAME --dry-run
EOF
}
# ============================================
# MAIN SCRIPT
# ============================================
# Set up error handling
set -euo pipefail
trap cleanup EXIT
trap 'log_message "Script interrupted" "ERROR"; exit 1' INT TERM
# Parse command-line arguments
VERBOSE=0
DRY_RUN=0
while [[ $# -gt 0 ]]; do
case "$1" in
-h|--help)
show_help
exit 0
;;
-v|--verbose)
VERBOSE=1
shift
;;
-d|--dry-run)
DRY_RUN=1
shift
;;
*)
log_message "Unknown option: $1" "ERROR"
show_help
exit 1
;;
esac
done
# Main script logic
log_message "Script started" "INFO"
if [[ $VERBOSE -eq 1 ]]; then
log_message "Verbose mode enabled" "DEBUG"
fi
if [[ $DRY_RUN -eq 1 ]]; then
log_message "Dry run mode - no changes will be made" "WARNING"
fi
# Create temporary directory
mkdir -p "$TEMP_DIR"
log_message "Created temp directory: $TEMP_DIR" "DEBUG"
# Your script logic goes here
log_message "Performing main task..." "INFO"
sleep 2
log_message "Script completed successfully" "INFO"
exit 0
2. Variables, Loops, and Conditionals
Variables in Bash
Variables: Named storage locations that hold data. In Bash, variables can store strings, numbers, arrays, and command outputs. Variable names are case-sensitive and should be descriptive.
Variable Declaration & Usage
name="Alice"
count=42
path="/home/alice/documents"
# Using variables
echo "Hello, $name"
echo "Count is: $count"
echo "Path: ${path}" # Curly braces for clarity
# Command substitution
current_date="$(date)"
file_count="$(ls -1 | wc -l)"
kernel_version="$(uname -r)"
Special Variables
$0 # Script name
$1, $2, ... # Arguments
$# # Number of arguments
$@ # All arguments as separate words
$* # All arguments as single word
# Special parameters
$$ # Current process ID
$! # Last background process ID
$? # Exit status of last command
# Example script with arguments
#!/bin/bash
echo "Script: $0"
echo "First argument: $1"
echo "Total arguments: $#"
echo "All arguments: $@"
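The difference between `$@` and `$*` only shows up when they are quoted: `"$@"` preserves each argument as a separate word, while `"$*"` joins them into a single word. A quick demonstration:

```shell
#!/bin/bash
# args_demo.sh - "$@" keeps arguments separate; "$*" joins them into one word

count_args() {
    echo "$#"
}

set -- "one two" "three"   # simulate two positional arguments

at_count="$(count_args "$@")"    # passes two separate words
star_count="$(count_args "$*")"  # passes one joined word

echo "\"\$@\" passed $at_count arguments"
echo "\"\$*\" passed $star_count argument(s)"
```

This is why loops over arguments should almost always use `"$@"`.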
Arrays
fruits=("apple" "banana" "cherry")
numbers=(1 2 3 4 5)
files=(*) # Glob expansion; one element per file (safer than parsing ls)
# Accessing array elements
echo "First fruit: ${fruits[0]}"
echo "All fruits: ${fruits[@]}"
echo "Number of fruits: ${#fruits[@]}"
# Adding to array
fruits+=("orange" "grape")
# Associative arrays (Bash 4+)
declare -A config
config["host"]="localhost"
config["port"]="8080"
config["user"]="admin"
echo "Host: ${config[host]}"
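Associative arrays become most useful when you iterate over their keys with `${!array[@]}`, which yields every key (in no guaranteed order). A short sketch building on the config example above:

```shell
#!/bin/bash
# assoc_iterate.sh - iterate over associative array keys (Bash 4+)

declare -A config
config["host"]="localhost"
config["port"]="8080"
config["user"]="admin"

for key in "${!config[@]}"; do
    echo "$key=${config[$key]}"
done

echo "total keys: ${#config[@]}"
```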
Conditionals (if/else/case)
If-Else Statements
if [[ -f "/etc/passwd" ]]; then
echo "File exists"
fi
# If-else statement
if [[ $USER == "root" ]]; then
echo "Running as root"
else
echo "Running as regular user"
fi
# If-elif-else statement
if [[ $1 -gt 100 ]]; then
echo "Greater than 100"
elif [[ $1 -gt 50 ]]; then
echo "Greater than 50"
else
echo "50 or less"
fi
File Test Operators
[[ -e "file" ]] # Exists
[[ -f "file" ]] # Regular file
[[ -d "dir" ]] # Directory
[[ -r "file" ]] # Readable
[[ -w "file" ]] # Writable
[[ -x "file" ]] # Executable
[[ -s "file" ]] # Not empty
# String tests
[[ -z "$str" ]] # Empty string
[[ -n "$str" ]] # Non-empty string
[[ "$str1" == "$str2" ]] # Equal
[[ "$str1" != "$str2" ]] # Not equal
# Numerical tests
[[ $a -eq $b ]] # Equal
[[ $a -ne $b ]] # Not equal
[[ $a -lt $b ]] # Less than
[[ $a -le $b ]] # Less or equal
[[ $a -gt $b ]] # Greater than
[[ $a -ge $b ]] # Greater or equal
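A classic pitfall with these operators: inside `[[ ]]`, `<` and `>` compare strings lexically, so only the `-lt`/`-gt` family compares numbers. The sketch below shows how "10 < 9" is true as a string comparison:

```shell
#!/bin/bash
# compare_pitfall.sh - '<' inside [[ ]] is lexical; -lt is numeric

a=10
b=9

if [[ $a < $b ]]; then
    lexical="10 sorts before 9"   # '1' < '9', so "10" sorts before "9"
else
    lexical="10 sorts after 9"
fi

if [[ $a -lt $b ]]; then
    numeric="10 is less than 9"
else
    numeric="10 is not less than 9"
fi

echo "string test:  $lexical"
echo "numeric test: $numeric"
```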
Case Statements
case "$1" in
start)
echo "Starting service..."
# start commands
;;
stop)
echo "Stopping service..."
# stop commands
;;
restart)
echo "Restarting service..."
# restart commands
;;
status)
echo "Service status..."
# status commands
;;
*)
echo "Usage: $0 {start|stop|restart|status}"
exit 1
;;
esac
Loops in Bash
For Loops
for i in 1 2 3 4 5; do
echo "Number: $i"
done
# For loop with range
for i in {1..10}; do
echo "Count: $i"
done
# For loop with step
for i in {0..100..10}; do
echo "Step: $i"
done
# For loop over array
fruits=("apple" "banana" "cherry")
for fruit in "${fruits[@]}"; do
echo "Fruit: $fruit"
done
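One caveat with brace ranges: brace expansion happens before variable expansion, so `{1..$n}` never expands into a range. When the bound is a variable, use a C-style for loop instead:

```shell
#!/bin/bash
# range_pitfall.sh - {1..$n} does not expand; use a C-style loop instead

n=3

broken=()
for i in {1..$n}; do            # loops once over the literal word "{1..3}"
    broken+=("$i")
done

working=()
for ((i = 1; i <= n; i++)); do  # arithmetic loop evaluates n correctly
    working+=("$i")
done

echo "broken loop iterations:  ${#broken[@]} (first item: ${broken[0]})"
echo "working loop iterations: ${#working[@]}"
```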
While Loops
counter=1
while [[ $counter -le 5 ]]; do
echo "Counter: $counter"
((counter++))
done
# While reading file
while IFS= read -r line; do
echo "Line: $line"
done < "input.txt"
# Infinite loop with break
while true; do
echo "Press Ctrl+C to exit"
sleep 1
done
Until Loops
counter=1
until [[ $counter -gt 5 ]]; do
echo "Counter: $counter"
((counter++))
done
# Wait for service to start
attempts=0
until curl -f http://localhost:8080 &>/dev/null; do
((attempts++))
if [[ $attempts -ge 10 ]]; then
echo "Service failed to start"
exit 1
fi
echo "Waiting for service... (attempt $attempts)"
sleep 2
done
echo "Service is up!"
Practical Example: System Health Check Script
# system_health.sh - Check system health metrics
# Configuration
THRESHOLD_CPU=80
THRESHOLD_MEM=85
THRESHOLD_DISK=90
LOG_FILE="/var/log/system_health.log"
# Functions
log_alert() {
local message="$1"
local timestamp="$(date '+%Y-%m-%d %H:%M:%S')"
echo "[$timestamp] ALERT: $message" >> "$LOG_FILE"
echo "ALERT: $message"
}
check_cpu() {
local cpu_usage="$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | cut -d'%' -f1)"
echo -n "CPU Usage: ${cpu_usage}% "
if [[ $(echo "$cpu_usage > $THRESHOLD_CPU" | bc) -eq 1 ]]; then
log_alert "High CPU usage: ${cpu_usage}%"
else
echo "[OK]"
fi
}
check_memory() {
local mem_total="$(free | grep Mem | awk '{print $2}')"
local mem_used="$(free | grep Mem | awk '{print $3}')"
local mem_percent="$(echo "scale=2; $mem_used * 100 / $mem_total" | bc)"
echo -n "Memory Usage: ${mem_percent}% "
if [[ $(echo "$mem_percent > $THRESHOLD_MEM" | bc) -eq 1 ]]; then
log_alert "High memory usage: ${mem_percent}%"
else
echo "[OK]"
fi
}
check_disk() {
df -h | grep -v tmpfs | tail -n +2 | while IFS= read -r line; do
local filesystem="$(echo "$line" | awk '{print $1}')"
local usage="$(echo "$line" | awk '{print $5}' | sed 's/%//')"
local mount="$(echo "$line" | awk '{print $6}')"
echo -n "Disk $filesystem ($mount): ${usage}% "
if [[ $usage -ge $THRESHOLD_DISK ]]; then
log_alert "High disk usage on $filesystem ($mount): ${usage}%"
else
echo "[OK]"
fi
done
}
check_services() {
local services=("ssh" "nginx" "postgresql")
for service in "${services[@]}"; do
if systemctl is-active --quiet "$service"; then
echo "Service $service: [RUNNING]"
else
log_alert "Service $service is not running"
fi
done
}
# Main execution
echo "=== System Health Check $(date) ==="
echo
check_cpu
check_memory
check_disk
check_services
echo
echo "=== Check Complete ==="
3. Functions & Arguments
What are Functions?
Functions: Reusable blocks of code that perform specific tasks. Functions help organize scripts, avoid code duplication, and make scripts more readable and maintainable.
Function Syntax and Usage
Basic Function Definition
function_name() {
# Function body
echo "This is a function"
}
# Method 2: Using 'function' keyword
function function_name {
echo "This is also a function"
}
# Calling a function
function_name
# Function with parameters
greet() {
local name="$1"
local time="$2"
echo "Good $time, $name!"
}
greet "Alice" "morning"
Return Values
is_even() {
local number="$1"
if (( number % 2 == 0 )); then
return 0 # Success (true)
else
return 1 # Failure (false)
fi
}
is_even 42
if [[ $? -eq 0 ]]; then
echo "Number is even"
else
echo "Number is odd"
fi
# Return data via stdout
get_timestamp() {
date '+%Y-%m-%d %H:%M:%S'
}
timestamp="$(get_timestamp)"
echo "Current time: $timestamp"
Variable Scope
GLOBAL_VAR="I'm global"
demo_scope() {
# Local variables (only in function)
local LOCAL_VAR="I'm local"
echo "Inside function:"
echo " Global: $GLOBAL_VAR"
echo " Local: $LOCAL_VAR"
# Modify global variable
GLOBAL_VAR="Modified inside function"
}
echo "Before function: $GLOBAL_VAR"
demo_scope
echo "After function: $GLOBAL_VAR"
echo "Trying to access LOCAL_VAR outside: ${LOCAL_VAR:-undefined}"
Advanced Argument Handling
Shift Operator
process_args() {
echo "Total arguments: $#"
while [[ $# -gt 0 ]]; do
echo "Argument: $1"
shift # Move to next argument
done
}
process_args "apple" "banana" "cherry"
# Output:
Total arguments: 3
Argument: apple
Argument: banana
Argument: cherry
getopts for Option Parsing
parse_options() {
local OPTIND # Reset getopts
local verbose=0
local output_file=""
while getopts ":ho:v" opt; do
case "$opt" in
h)
show_help
exit 0
;;
v)
verbose=1
;;
o)
output_file="$OPTARG"
;;
\?)
echo "Invalid option: -$OPTARG" >&2
exit 1
;;
:)
echo "Option -$OPTARG requires an argument" >&2
exit 1
;;
esac
done
shift $((OPTIND-1)) # Remove processed options
# Remaining arguments are positional
for arg in "$@"; do
echo "Positional argument: $arg"
done
echo "Verbose: $verbose"
echo "Output file: ${output_file:-not set}"
}
parse_options "$@"
Practical Function Examples for DevOps
Logging Function
log() {
local level="$1"
local message="$2"
local timestamp="$(date '+%Y-%m-%d %H:%M:%S')"
local log_file="${3:-/var/log/script.log}"
case "$level" in
INFO) local color="\033[0;32m" ;; # Green
WARN) local color="\033[0;33m" ;; # Yellow
ERROR) local color="\033[0;31m" ;; # Red
DEBUG) local color="\033[0;36m" ;; # Cyan
*) local color="\033[0m" ;; # Default
esac
local reset="\033[0m"
# Print to console with color
echo -e "${color}[$timestamp] [$level] $message${reset}"
# Write to log file without color codes
echo "[$timestamp] [$level] $message" >> "$log_file"
}
# Usage:
log "INFO" "Script started successfully"
log "WARN" "Disk space is running low"
log "ERROR" "Failed to connect to database"
Validation Functions
validate_ip() {
local ip="$1"
local stat=1
if [[ $ip =~ ^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$ ]]; then
OIFS=$IFS
IFS='.'
ip=($ip)
IFS=$OIFS
if [[ ${ip[0]} -le 255 ]] && [[ ${ip[1]} -le 255 ]] && \
[[ ${ip[2]} -le 255 ]] && [[ ${ip[3]} -le 255 ]]; then
stat=0
fi
fi
return $stat
}
validate_email() {
local email="$1"
local regex="^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
if [[ $email =~ $regex ]]; then
return 0
else
return 1
fi
}
# Usage:
validate_ip "192.168.1.1"
if [[ $? -eq 0 ]]; then
echo "Valid IP address"
else
echo "Invalid IP address"
fi
Utility Functions
is_root() {
[[ $EUID -eq 0 ]] && return 0 || return 1
}
check_dependencies() {
local deps=("$@")
local missing=()
for dep in "${deps[@]}"; do
if ! command -v "$dep" &>/dev/null; then
missing+=("$dep")
fi
done
if [[ ${#missing[@]} -gt 0 ]]; then
echo "Missing dependencies: ${missing[*]}" >&2
return 1
fi
return 0
}
send_notification() {
local subject="$1"
local message="$2"
local recipient="${3:-admin@example.com}"
if command -v mail &>/dev/null; then
echo "$message" | mail -s "$subject" "$recipient"
elif command -v curl &>/dev/null; then
# Send to webhook
curl -X POST -H "Content-Type: application/json" \
-d "{\"text\":\"$subject: $message\"}" \
https://hooks.slack.com/services/...
else
echo "Notification: $subject - $message" >&2
fi
}
Complete Function Library for DevOps
# functions.sh - DevOps utility function library
# Load this library in your scripts:
# source functions.sh
# ==================== ERROR HANDLING ====================
die() {
local msg="$1"
local code="${2:-1}"
log "ERROR" "$msg"
exit "$code"
}
check_error() {
local exit_code="$?"
local message="$1"
if [[ $exit_code -ne 0 ]]; then
die "$message (exit code: $exit_code)" "$exit_code"
fi
}
# ==================== SYSTEM CHECKS ====================
require_root() {
is_root || die "This script must be run as root"
}
check_disk_space() {
local path="${1:-/}"
local threshold="${2:-90}"
local usage="$(df "$path" | tail -1 | awk '{print $5}' | sed 's/%//')"
if [[ $usage -ge $threshold ]]; then
log "WARN" "Disk usage on $path is ${usage}% (threshold: ${threshold}%)"
return 1
fi
return 0
}
# ==================== FILE OPERATIONS ====================
backup_file() {
local file="$1"
local backup_dir="${2:-/backups}"
if [[ ! -f "$file" ]]; then
log "ERROR" "File not found: $file"
return 1
fi
mkdir -p "$backup_dir"
local timestamp="$(date '+%Y%m%d_%H%M%S')"
local backup_file="$backup_dir/$(basename "$file").$timestamp.bak"
cp -p "$file" "$backup_file"
check_error "Failed to backup $file"
log "INFO" "Backed up $file to $backup_file"
echo "$backup_file"
}
# ==================== NETWORK OPERATIONS ====================
wait_for_port() {
local host="$1"
local port="$2"
local timeout="${3:-30}"
local interval="${4:-2}"
local start_time="$(date +%s)"
local end_time="$((start_time + timeout))"
while [[ $(date +%s) -lt $end_time ]]; do
if timeout 1 bash -c "cat < /dev/null > /dev/tcp/$host/$port" 2>/dev/null; then
log "INFO" "Port $port on $host is open"
return 0
fi
sleep "$interval"
done
log "ERROR" "Timeout waiting for port $port on $host"
return 1
}
# ==================== MAIN GUARD ====================
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
echo "This is a library file. Source it in your scripts."
exit 1
fi
4. Error Handling in Shell Scripts
Why Error Handling Matters
Error Handling: The process of anticipating, detecting, and resolving programming, application, or communication errors. Proper error handling makes scripts robust, reliable, and easier to debug.
❌ Bad Practice: No Error Handling
# What could go wrong?
cd /nonexistent/directory
rm -rf *.log
echo "Cleanup complete!"
Problems: Script continues after errors, misleading success message, potential data loss.
✅ Good Practice: Proper Error Handling
# Safe version with error handling
set -euo pipefail
cleanup() {
log "ERROR" "Script failed. Cleaning up..."
# Add cleanup logic here
}
trap cleanup ERR
trap 'cleanup; exit 1' INT TERM
if ! cd /nonexistent/directory; then
die "Directory doesn't exist"
fi
if ! rm -rf *.log; then
die "Failed to remove log files"
fi
echo "Cleanup complete!"
Benefits: Stops on errors, provides clear error messages, cleans up resources, handles interrupts.
Error Handling Techniques
Exit Codes
# 0 = Success, 1-255 = Failure
ls /tmp
echo "Exit code: $?" # Usually 0
ls /nonexistent
echo "Exit code: $?" # Usually 2
# Check exit code
if ls /tmp &>/dev/null; then
echo "Command succeeded"
else
echo "Command failed"
fi
# Exit with custom code
validate_input() {
if [[ -z "$1" ]]; then
echo "Error: Input required" >&2
exit 100
fi
}
set Options
# set -e: Exit immediately if a command fails
set -e
false # Script exits here
echo "This line won't execute"
# set -u: Treat unset variables as error
set -u
echo "$UNDEFINED_VAR" # Script exits
# set -o pipefail: Pipeline fails if any command fails
set -o pipefail
false | true
echo "Exit code: $?" # 1 (not 0)
# Recommended combination
set -euo pipefail
# Temporarily disable
set +e # Disable exit on error
false
echo "This executes even though false failed"
set -e # Re-enable
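When you need the exit code of every command in a pipeline (not just the last one, and without changing behavior via pipefail), Bash records them in the `PIPESTATUS` array. It must be captured immediately, since the next command overwrites it:

```shell
#!/bin/bash
# pipestatus_demo.sh - PIPESTATUS records every exit code in a pipeline

false | true
echo "pipeline exit without pipefail: $?"   # 0, only the last command counts

false | true
codes=("${PIPESTATUS[@]}")   # capture immediately, before anything else runs
echo "PIPESTATUS: ${codes[*]}"               # exit codes of false and true
```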
trap Command
cleanup() {
echo "Cleaning up..."
rm -f /tmp/tempfile.$$
}
trap cleanup EXIT # Run on script exit
trap cleanup ERR # Run on error
trap 'echo "Interrupted"; exit 1' INT # Ctrl+C
# Multiple traps
handle_error() {
local line="$1"
local cmd="$2"
local code="${3:-0}"
echo "Error on line $line: $cmd (exit code: $code)" >&2
}
trap 'handle_error $LINENO "$BASH_COMMAND" $?' ERR
Error Recovery Strategies
Retry Logic
retry_command() {
local cmd="$1"
local max_attempts="${2:-3}"
local delay="${3:-2}"
local attempt=1
while [[ $attempt -le $max_attempts ]]; do
if eval "$cmd"; then
return 0
fi
log "WARN" "Attempt $attempt failed. Retrying in ${delay}s..."
sleep "$delay"
((attempt++))
done
log "ERROR" "Command failed after $max_attempts attempts: $cmd"
return 1
}
# Usage:
retry_command "curl -f http://example.com" 5 3
Fallback Strategy
download_with_fallback() {
local url="$1"
local output="$2"
local fallback_url="$3"
log "INFO" "Downloading from primary source: $url"
if curl -f -L -o "$output" "$url"; then
log "INFO" "Download successful from primary source"
return 0
fi
log "WARN" "Primary download failed. Trying fallback: $fallback_url"
if curl -f -L -o "$output" "$fallback_url"; then
log "INFO" "Download successful from fallback"
return 0
fi
die "Failed to download from all sources"
}
Complete Error Handling Framework
# error_framework.sh - Comprehensive error handling
# ==================== CONFIGURATION ====================
LOG_FILE="/var/log/script_errors.log"
MAX_RETRIES=3
RETRY_DELAY=2
# ==================== ERROR FUNCTIONS ====================
init_error_handling() {
set -euo pipefail
trap 'handle_error $LINENO "$BASH_COMMAND" $?' ERR
trap 'handle_exit' EXIT
trap 'handle_interrupt' INT TERM
}
handle_error() {
local line="$1"
local command="$2"
local exit_code="$3"
local timestamp="$(date '+%Y-%m-%d %H:%M:%S')"
local error_msg="[$timestamp] ERROR: Line $line: $command (exit code: $exit_code)"
# Log to file
echo "$error_msg" >> "$LOG_FILE"
# Print to stderr
echo -e "\033[0;31m$error_msg\033[0m" >&2
# Optional: Send alert
send_alert "Script Error" "$error_msg"
}
handle_exit() {
local exit_code="$?"
if [[ $exit_code -eq 0 ]]; then
log "INFO" "Script completed successfully"
else
log "ERROR" "Script failed with exit code: $exit_code"
fi
# Cleanup resources
cleanup_resources
}
handle_interrupt() {
log "WARN" "Script interrupted by user"
cleanup_resources
exit 1
}
safe_execute() {
local cmd="$1"
local error_msg="${2:-Failed to execute: $cmd}"
if ! eval "$cmd"; then
die "$error_msg"
fi
}
with_retry() {
local cmd="$1"
local max_retries="${2:-$MAX_RETRIES}"
local delay="${3:-$RETRY_DELAY}"
local attempt=1
while true; do
if eval "$cmd"; then
return 0
fi
if [[ $attempt -ge $max_retries ]]; then
die "Command failed after $max_retries attempts: $cmd"
fi
log "WARN" "Attempt $attempt failed. Retrying in ${delay}s..."
sleep "$delay"
((attempt++))
done
}
# ==================== USAGE EXAMPLE ====================
main() {
init_error_handling
log "INFO" "Starting script execution"
# Safe file operations
safe_execute "mkdir -p /tmp/test" "Failed to create directory"
# With retry logic
with_retry "curl -f http://example.com" 5 3
# More operations...
safe_execute "touch /tmp/test/file.txt"
log "INFO" "Script completed successfully"
}
main "$@"
5. Automating System Tasks
The Power of Automation
System Automation: Using scripts to perform routine system administration tasks automatically, reducing human error, saving time, and ensuring consistency across environments.
Common Automation Tasks
User Management
# user_management.sh - Automate user administration
create_user() {
local username="$1"
local ssh_key="$2"
# Check if user exists
if id "$username" &>/dev/null; then
log "WARN" "User $username already exists"
return 0
fi
# Create user with home directory
useradd -m -s /bin/bash "$username"
check_error "Failed to create user $username"
# Set up SSH key
local ssh_dir="/home/$username/.ssh"
mkdir -p "$ssh_dir"
echo "$ssh_key" >> "$ssh_dir/authorized_keys"
chmod 700 "$ssh_dir"
chmod 600 "$ssh_dir/authorized_keys"
chown -R "$username:$username" "$ssh_dir"
# Add to sudoers (optional)
echo "$username ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/"$username"
chmod 440 /etc/sudoers.d/"$username"
log "INFO" "User $username created successfully"
}
# Usage: create_user "alice" "ssh-rsa AAAAB3NzaC1yc2E..."
Package Management
# package_manager.sh - Cross-distro package management
install_packages() {
local packages=("$@")
if command -v apt-get &>/dev/null; then
# Debian/Ubuntu
apt-get update
apt-get install -y "${packages[@]}"
elif command -v yum &>/dev/null; then
# RHEL/CentOS
yum install -y "${packages[@]}"
elif command -v dnf &>/dev/null; then
# Fedora
dnf install -y "${packages[@]}"
elif command -v zypper &>/dev/null; then
# openSUSE
zypper install -y "${packages[@]}"
else
die "Unsupported package manager"
fi
}
update_system() {
if command -v apt-get &>/dev/null; then
apt-get update && apt-get upgrade -y
elif command -v yum &>/dev/null; then
yum update -y
elif command -v dnf &>/dev/null; then
dnf update -y
fi
}
# Usage: install_packages nginx postgresql redis
Service Management
# service_manager.sh - Manage system services
ensure_service_running() {
local service="$1"
if systemctl is-active --quiet "$service"; then
log "INFO" "Service $service is already running"
else
log "INFO" "Starting service $service"
systemctl start "$service"
fi
# Enable on boot
systemctl enable "$service" &>/dev/null
}
restart_service() {
local service="$1"
log "INFO" "Restarting service $service"
systemctl restart "$service"
check_error "Failed to restart $service"
}
get_service_status() {
local service="$1"
if systemctl is-active --quiet "$service"; then
echo "running"
else
echo "stopped"
fi
}
Advanced Automation Examples
Log Rotation Automation
# log_rotator.sh - Automate log rotation
LOG_DIR="/var/log/app"
MAX_DAYS=30
MAX_SIZE="100M"
rotate_logs() {
local log_dir="${1:-$LOG_DIR}"
find "$log_dir" -name "*.log" -type f | while IFS= read -r logfile; do
# Check file size
local size="$(stat -c%s "$logfile")"
local max_bytes="$(echo "$MAX_SIZE" | sed 's/M/*1024*1024/;s/K/*1024/;s/G/*1024*1024*1024/' | bc)"
if [[ $size -gt $max_bytes ]]; then
rotate_single_log "$logfile"
fi
done
# Clean old backups
find "$log_dir" -name "*.log.*" -type f -mtime +$MAX_DAYS -delete
}
rotate_single_log() {
local logfile="$1"
local timestamp="$(date '+%Y%m%d_%H%M%S')"
local backup_file="${logfile}.${timestamp}"
log "INFO" "Rotating $logfile to $backup_file"
# Copy and truncate
if cp "$logfile" "$backup_file"; then
: > "$logfile" # Truncate original
fi
# Compress backup
gzip "$backup_file"
}
Configuration Management
# config_manager.sh - Manage configuration files
deploy_config() {
local src="$1"
local dest="$2"
local backup="${3:-yes}"
if [[ ! -f "$src" ]]; then
die "Source file not found: $src"
fi
# Backup existing file
if [[ -f "$dest" ]] && [[ "$backup" == "yes" ]]; then
local backup_file="${dest}.backup.$(date +%s)"
cp -p "$dest" "$backup_file"
log "INFO" "Backed up $dest to $backup_file"
fi
# Deploy new config
cp -p "$src" "$dest"
check_error "Failed to deploy config to $dest"
# Set permissions
chmod 644 "$dest"
log "INFO" "Configuration deployed to $dest"
}
validate_config() {
local config_file="$1"
local service="$2"
case "$service" in
nginx)
nginx -t -c "$config_file"
;;
apache2|httpd)
apachectl configtest -f "$config_file"
;;
postgresql)
# Note: there is no direct config-file check here; this only verifies the server status
sudo -u postgres pg_ctl -D /var/lib/postgresql/data -t 10 status
;;
*)
log "WARN" "No validation available for $service"
;;
esac
}
Complete System Provisioning Script
# provision_server.sh - Complete server provisioning
# ==================== CONFIGURATION ====================
SSH_PORT=2222
ADMIN_USER="devops"
ADMIN_SSH_KEY="ssh-rsa AAAAB3NzaC1yc2E... devops@company.com"
PACKAGES=("nginx" "postgresql" "redis" "python3" "git" "curl" "fail2ban" "ufw")
# ==================== MAIN PROVISIONING ====================
provision_server() {
require_root
init_error_handling
log "INFO" "Starting server provisioning"
# Step 1: System update
log "INFO" "Updating system packages"
update_system
# Step 2: Install packages
log "INFO" "Installing required packages"
install_packages "${PACKAGES[@]}"
# Step 3: Create admin user
log "INFO" "Creating admin user: $ADMIN_USER"
create_user "$ADMIN_USER" "$ADMIN_SSH_KEY"
# Step 4: Secure SSH
log "INFO" "Configuring SSH security"
secure_ssh "$SSH_PORT" "$ADMIN_USER"
# Step 5: Configure firewall
log "INFO" "Configuring firewall"
setup_firewall "$SSH_PORT"
# Step 6: Configure services
log "INFO" "Configuring services"
configure_nginx
configure_postgresql
configure_redis
# Step 7: Enable monitoring
log "INFO" "Setting up monitoring"
setup_monitoring
# Step 8: Final cleanup
log "INFO" "Cleaning up temporary files"
apt-get autoremove -y
apt-get clean
log "SUCCESS" "Server provisioning completed successfully!"
echo "=========================================="
echo "Provisioning Summary:"
echo "- SSH Port: $SSH_PORT"
echo "- Admin User: $ADMIN_USER"
echo "- Installed Packages: ${PACKAGES[*]}"
echo "=========================================="
}
# Helper functions (simplified for example)
secure_ssh() {
local port="$1"
local admin_user="$2"
# Backup original config
cp /etc/ssh/sshd_config /etc/ssh/sshd_config.backup
# Apply secure configuration
cat > /etc/ssh/sshd_config << EOF
Port $port
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
AllowUsers $admin_user
X11Forwarding no
ClientAliveInterval 300
ClientAliveCountMax 2
EOF
systemctl restart sshd
}
setup_firewall() {
local ssh_port="$1"
ufw --force reset
ufw default deny incoming
ufw default allow outgoing
ufw allow "$ssh_port/tcp"
ufw allow 80/tcp
ufw allow 443/tcp
ufw --force enable
}
# Run provisioning
provision_server
6. Script Scheduling with Cron
What is Cron?
Cron: A time-based job scheduler in Unix-like operating systems. Cron enables users to schedule jobs (commands or scripts) to run periodically at fixed times, dates, or intervals.
Cron Syntax Explained
Cron Time Format
* * * * * command
| | | | |
| | | | +-- Day of week (0-7, both 0 and 7 = Sunday)
| | | +---- Month (1-12)
| | +------ Day of month (1-31)
| +-------- Hour (0-23)
+---------- Minute (0-59)

Special characters:
* = Any value
, = Value list separator (1,3,5)
- = Range of values (1-5)
/ = Step values (*/2 = every 2 units)
? = No specific value (only in some cron implementations)
L = Last day of month/week (Quartz-style, not standard cron)
W = Nearest weekday (Quartz-style, not standard cron)
Common Cron Examples
Basic Scheduling
# Run daily at 2:30 AM
30 2 * * * /path/to/script.sh
# Run every hour at minute 0
0 * * * * /path/to/script.sh
# Run every 15 minutes
*/15 * * * * /path/to/script.sh
# Run every Monday at 3:15 PM
15 15 * * 1 /path/to/script.sh
# Run on the 1st of every month at 6:00 AM
0 6 1 * * /path/to/script.sh
# Run every weekday at 9:00 AM
0 9 * * 1-5 /path/to/script.sh
Advanced Scheduling
# Run every 2 hours between 9 AM and 5 PM on weekdays
0 9-17/2 * * 1-5 /path/to/script.sh
# Run on specific days of month (1st and 15th)
0 8 1,15 * * /path/to/script.sh
# Run on last day of month at midnight
# (L is Quartz-style syntax; standard cron does not support it)
0 0 L * * /path/to/script.sh
# Run every 10 minutes during work hours on weekdays
*/10 9-17 * * 1-5 /path/to/script.sh
# Run at 5 PM on the last Friday of every month
# (5L is Quartz-style syntax; standard cron does not support it)
0 17 * * 5L /path/to/script.sh
# Run on January 1st at 00:01
1 0 1 1 * /path/to/script.sh
Special Strings
# @reboot - Run once at system startup
@reboot /path/to/script.sh
# @yearly or @annually - Run once a year
@yearly /path/to/script.sh
# Same as: 0 0 1 1 *
# @monthly - Run once a month
@monthly /path/to/script.sh
# Same as: 0 0 1 * *
# @weekly - Run once a week
@weekly /path/to/script.sh
# Same as: 0 0 * * 0
# @daily or @midnight - Run once a day
@daily /path/to/script.sh
# Same as: 0 0 * * *
# @hourly - Run once an hour
@hourly /path/to/script.sh
# Same as: 0 * * * *
# There is no special string for every minute; use the standard syntax
* * * * * /path/to/script.sh
Cron Management Commands
User Crontabs
# Edit your crontab
$ crontab -e
# List your cron jobs
$ crontab -l
# Remove all your cron jobs
$ crontab -r
# Load cron jobs from file
$ crontab cronjobs.txt
# View other user's crontab (root only)
$ sudo crontab -u username -l
# Edit root's crontab
$ sudo crontab -e
System Crontabs
/etc/crontab # System-wide crontab (has an extra user field)
/etc/cron.d/ # Directory for package cron jobs
/etc/cron.hourly/
/etc/cron.daily/
/etc/cron.weekly/
/etc/cron.monthly/
# /etc/crontab format (includes user field)
# Example:
# m h dom mon dow user command
17 * * * * root cd / && run-parts --report /etc/cron.hourly
25 6 * * * root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
47 6 * * 7 root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.weekly )
52 6 1 * * root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.monthly )
# Create a system cron file
$ sudo nano /etc/cron.d/myapp
# Add: * * * * * root /path/to/script.sh
Cron Best Practices
Script Design for Cron
# cron_script.sh - Designed for cron execution
# Set PATH explicitly
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
# Set working directory
cd /path/to/working/directory || exit 1
# Logging setup
LOG_FILE=/var/log/myscript.log
exec >> "$LOG_FILE" 2>&1
echo "=== $(date) - Script started ==="
# Main script logic
/path/to/command --option
echo "=== $(date) - Script completed ==="
Cron Environment
# Common issues and solutions:
# 1. PATH may not include what you need
# Solution: Set PATH at top of script
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
# 2. No interactive shell environment
# Solution: Source necessary files
. /etc/profile
. ~/.bash_profile
# 3. No terminal, so no $DISPLAY
# Solution: Don't use GUI commands
# Or set DISPLAY if needed
DISPLAY=:0
# 4. Working directory is user's home
# Solution: cd to script directory
cd "$(dirname "$0")" || exit 1
# 5. Mail is sent to user by default
# Solution: Redirect output or set MAILTO
MAILTO="" # Disable email
MAILTO="admin@example.com" # Send to specific address
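Putting these fixes together, a crontab can declare its own environment at the top; variables apply to every job below them (the paths and address here are illustrative):

```shell
# Hypothetical user crontab combining the environment fixes above
SHELL=/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
MAILTO=admin@example.com

# m h dom mon dow command
30 2 * * * /opt/scripts/nightly_backup.sh >> /var/log/nightly_backup.log 2>&1
```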
Preventing Overlap
acquire_lock() {
local lockfile="/tmp/$(basename "$0").lock"
if [[ -f "$lockfile" ]]; then
local pid="$(cat "$lockfile")"
if kill -0 "$pid" 2>/dev/null; then
echo "Script is already running (PID: $pid)" >&2
exit 1
fi
fi
echo $$ > "$lockfile"
trap 'rm -f "$lockfile"' EXIT
}
# Use flock command (more robust)
run_with_lock() {
local lockfile="/var/lock/$(basename "$0").lock"
(
flock -n 9 || exit 1
# Your script commands here
echo "Running exclusive code"
sleep 10
) 9>"$lockfile"
}
Complete Cron Job Management System
# cron_manager.sh - Manage and monitor cron jobs
CRON_DIR="/etc/cron.d"
BACKUP_DIR="/backup/cron"
LOG_DIR="/var/log/cronjobs"
add_cron_job() {
local name="$1"
local schedule="$2"
local command="$3"
local user="${4:-root}"
local log_file="${LOG_DIR}/${name}.log"
# Create log directory
mkdir -p "$LOG_DIR"
# Build cron entry with logging
local cron_entry="$schedule $user $command >> $log_file 2>&1"
# Write to cron file
echo "$cron_entry" > "$CRON_DIR/$name"
log "INFO" "Added cron job: $name"
}
remove_cron_job() {
local name="$1"
if [[ -f "$CRON_DIR/$name" ]]; then
# Backup before removal
mkdir -p "$BACKUP_DIR"
cp "$CRON_DIR/$name" "$BACKUP_DIR/${name}.backup.$(date +%s)"
rm -f "$CRON_DIR/$name"
log "INFO" "Removed cron job: $name"
else
log "WARN" "Cron job not found: $name"
fi
}
list_cron_jobs() {
echo "=== System Cron Jobs ==="
echo
# List system crontab
if [[ -f /etc/crontab ]]; then
echo "/etc/crontab:"
grep -v '^#' /etc/crontab | grep -v '^$'
echo
fi
# List cron.d files
if [[ -d "$CRON_DIR" ]]; then
echo "/etc/cron.d/ files:"
for file in "$CRON_DIR"/*; do
if [[ -f "$file" ]]; then
echo "$(basename "$file"):"
grep -v '^#' "$file" | grep -v '^$'
echo
fi
done
fi
}
monitor_cron_logs() {
local days="${1:-7}"
echo "=== Cron Job Execution Report (last $days days) ==="
echo
# Check system cron logs
if [[ -f /var/log/syslog ]]; then
echo "System cron executions:"
# (matches entries from the single day exactly $days days back)
grep -i "CRON" /var/log/syslog | grep "$(date --date="-$days days" +'%b %d')" | tail -20
echo
fi
# Check custom cron logs
if [[ -d "$LOG_DIR" ]]; then
echo "Custom cron job logs:"
find "$LOG_DIR" -name "*.log" -type f -mtime -"$days" | while IFS= read -r logfile; do
local job_name="$(basename "$logfile" .log)"
local last_run="$(tail -1 "$logfile" 2>/dev/null | grep -o '[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\} [0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}' | tail -1)"
local status="OK"
if tail -10 "$logfile" 2>/dev/null | grep -qi "error\|failed"; then
status="ERROR"
fi
printf "%-20s %-25s %s\n" "$job_name" "${last_run:-never}" "$status"
done
fi
}
# Usage examples
case "$1" in
add)
add_cron_job "$2" "$3" "$4" "$5"
;;
remove)
remove_cron_job "$2"
;;
list)
list_cron_jobs
;;
monitor)
monitor_cron_logs "$2"
;;
*)
echo "Usage: $0 {add <name> <schedule> <command> [user] | remove <name> | list | monitor [days]}"
exit 1
;;
esac
Cron Best Practices Checklist
- ✅ Use absolute paths - Cron has minimal PATH
- ✅ Set working directory - Use cd at script start
- ✅ Implement logging - Redirect output to log files
- ✅ Handle errors - Check exit codes in scripts
- ✅ Prevent overlaps - Use lock files or flock
- ✅ Test scripts manually - Before adding to cron
- ✅ Use descriptive names - For cron files and logs
- ✅ Monitor execution - Regularly check cron logs
- ✅ Backup crontabs - Before making changes
- ✅ Consider alternatives - systemd timers for complex scheduling
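The checklist's last point is worth a sketch: a systemd timer replacing a daily cron entry is just two small unit files (names, paths, and schedule here are illustrative):

```ini
# /etc/systemd/system/backup.service (hypothetical)
[Unit]
Description=Nightly backup

[Service]
Type=oneshot
ExecStart=/opt/scripts/backup_manager.sh

# /etc/systemd/system/backup.timer (hypothetical)
[Unit]
Description=Run nightly backup at 02:30

[Timer]
OnCalendar=*-*-* 02:30:00
Persistent=true

[Install]
WantedBy=timers.target
```

Enable it with sudo systemctl enable --now backup.timer; Persistent=true re-runs a missed schedule after downtime, which plain cron cannot do.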
7. Real-world Script Examples
Backup Script
Complete Automated Backup System
# backup_manager.sh - Enterprise-grade backup solution
# ==================== CONFIGURATION ====================
BACKUP_ROOT="/backups"
RETENTION_DAYS=30
COMPRESSION="gzip" # gzip, bzip2, xz, or none
ENCRYPTION_KEY="" # Set for encryption (optional)
REMOTE_HOST="backup-server"
REMOTE_PATH="/remote/backups"
# Backup sources (directory: exclude_patterns)
declare -A BACKUP_SOURCES=(
["/etc"]="*.tmp,*.log"
["/home"]="*.mp3,*.mp4,*.iso"
["/var/www"]="cache,*.log"
["/opt/app"]="temp,logs"
)
# Database backups (type: credentials)
declare -A DATABASES=(
["mysql"]="root:password"
["postgres"]="postgres"
["mongodb"]=""
)
# ==================== FUNCTIONS ====================
init_backup() {
local backup_type="$1"
local timestamp="$(date '+%Y%m%d_%H%M%S')"
local backup_dir="$BACKUP_ROOT/$backup_type/$timestamp"
mkdir -p "$backup_dir"
check_error "Failed to create backup directory: $backup_dir"
log "INFO" "Starting $backup_type backup to $backup_dir"
echo "$backup_dir"
}
backup_filesystem() {
local backup_dir="$1"
for source_dir in "${!BACKUP_SOURCES[@]}"; do
local exclude_patterns="${BACKUP_SOURCES[$source_dir]}"
local backup_name="$(basename "$source_dir")"
local output_file="$backup_dir/${backup_name}.tar"
log "INFO" "Backing up $source_dir"
# Build exclude options as an array so patterns survive word splitting
local exclude_opts=()
IFS=',' read -ra patterns <<< "$exclude_patterns"
for pattern in "${patterns[@]}"; do
exclude_opts+=(--exclude="$pattern")
done
# Create tar archive (array expansion avoids the quoting pitfalls of eval)
tar cf "$output_file" "${exclude_opts[@]}" "$source_dir" 2>/dev/null
check_error "Failed to backup $source_dir"
# Compress if requested
compress_file "$output_file"
done
}
backup_databases() {
local backup_dir="$1"
for db_type in "${!DATABASES[@]}"; do
local credentials="${DATABASES[$db_type]}"
case "$db_type" in
mysql)
backup_mysql "$backup_dir" "$credentials"
;;
postgres)
backup_postgres "$backup_dir" "$credentials"
;;
mongodb)
backup_mongodb "$backup_dir"
;;
esac
done
}
backup_mysql() {
local backup_dir="$1"
local credentials="$2"
if ! command -v mysqldump &>/dev/null; then
log "WARN" "mysqldump not found, skipping MySQL backup"
return
fi
log "INFO" "Backing up MySQL databases"
# Get database list
local dbs="$(mysql -u"${credentials%:*}" -p"${credentials#*:}" -e "SHOW DATABASES;" | grep -Ev "(Database|information_schema|performance_schema|mysql|sys)")"
for db in $dbs; do
local backup_file="$backup_dir/mysql_${db}.sql"
mysqldump -u"${credentials%:*}" -p"${credentials#*:}" --single-transaction --routines --triggers "$db" > "$backup_file"
check_error "Failed to backup MySQL database: $db"
compress_file "$backup_file"
done
}
sync_to_remote() {
local backup_dir="$1"
if [[ -z "$REMOTE_HOST" ]] || [[ -z "$REMOTE_PATH" ]]; then
log "INFO" "Remote backup not configured"
return
fi
log "INFO" "Syncing backup to remote server: $REMOTE_HOST"
# Use rsync for efficient transfer
rsync -avz --delete "$backup_dir" "$REMOTE_HOST:$REMOTE_PATH/"
check_error "Failed to sync backup to remote server"
}
cleanup_old_backups() {
log "INFO" "Cleaning up backups older than $RETENTION_DAYS days"
# Group the -name tests so -type f and -mtime apply to all of them
find "$BACKUP_ROOT" -type f \( -name "*.tar*" -o -name "*.sql*" -o -name "*.dump*" \) \
-mtime +"$RETENTION_DAYS" -print | while IFS= read -r file; do
rm -f "$file"
log "DEBUG" "Removed old backup: $file"
done
# Remove empty directories
find "$BACKUP_ROOT" -type d -empty -delete
}
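The retention logic hinges on find's -mtime +N test. A self-contained check of that behaviour, using GNU touch -d to fake an old timestamp (the filenames are throwaway):

```shell
# Files older than the retention window match -mtime +30; newer ones do not
tmpdir=$(mktemp -d)
touch -d "40 days ago" "$tmpdir/old.tar.gz"   # pretend this backup is 40 days old
touch "$tmpdir/new.tar.gz"                    # fresh backup
find "$tmpdir" -type f -name "*.tar*" -mtime +30 -print
rm -rf "$tmpdir"
```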
generate_backup_report() {
local backup_dir="$1"
local report_file="$backup_dir/backup_report.txt"
echo "=== BACKUP REPORT $(date) ===" > "$report_file"
echo "Backup directory: $backup_dir" >> "$report_file"
echo "Total size: $(du -sh "$backup_dir" | cut -f1)" >> "$report_file"
echo "" >> "$report_file"
echo "=== FILES BACKED UP ===" >> "$report_file"
find "$backup_dir" -type f -name "*.tar*" -exec ls -lh {} \; >> "$report_file"
echo "" >> "$report_file"
echo "=== DATABASE BACKUPS ===" >> "$report_file"
find "$backup_dir" -type f \( -name "*.sql*" -o -name "*.dump*" \) -exec ls -lh {} \; >> "$report_file"
}
# ==================== MAIN ====================
main() {
require_root
init_error_handling
local backup_type="full"
local backup_dir=""
# Parse arguments
while [[ $# -gt 0 ]]; do
case "$1" in
--incremental)
backup_type="incremental"
;;
--dry-run)
DRY_RUN=1
;;
esac
shift
done
log "INFO" "Starting $backup_type backup"
# Initialize backup directory
backup_dir=$(init_backup "$backup_type")
# Perform backups
backup_filesystem "$backup_dir"
backup_databases "$backup_dir"
# Sync to remote
sync_to_remote "$backup_dir"
# Cleanup old backups
cleanup_old_backups
# Generate report
generate_backup_report "$backup_dir"
log "SUCCESS" "Backup completed successfully: $backup_dir"
send_notification "Backup Completed" "$backup_type backup completed successfully"
}
main "$@"
Monitoring Script
Enterprise System Monitoring Script
# system_monitor.sh - Comprehensive system monitoring
# ==================== CONFIGURATION ====================
declare -A ALERT_THRESHOLDS=(
["CPU"]=80
["MEMORY"]=85
["DISK"]=90
["LOAD"]=5.0
)
SERVICES_TO_MONITOR=(
"nginx"
"postgresql"
"redis"
"ssh"
"cron"
)
declare -A PORTS_TO_CHECK=(
["80"]="HTTP"
["443"]="HTTPS"
["22"]="SSH"
["5432"]="PostgreSQL"
["6379"]="Redis"
)
declare -A LOG_PATTERNS=(
["ERROR"]="/var/log/syslog"
["Failed password"]="/var/log/auth.log"
["Out of memory"]="/var/log/kern.log"
)
# ==================== MONITORING FUNCTIONS ====================
check_cpu() {
local cpu_usage="$(top -bn1 | grep "Cpu(s)" | awk '{print $2 + $4}')"
local threshold="${ALERT_THRESHOLDS[CPU]}"
printf "CPU Usage: %.1f%%" "$cpu_usage"
if (( $(echo "$cpu_usage > $threshold" | bc -l) )); then
send_alert "High CPU Usage" "CPU usage is ${cpu_usage}% (threshold: ${threshold}%)"
return 1
fi
return 0
}
check_memory() {
local mem_info="$(free -m)"
local total="$(echo "$mem_info" | grep Mem | awk '{print $2}')"
local used="$(echo "$mem_info" | grep Mem | awk '{print $3}')"
local percentage="$(echo "scale=2; $used * 100 / $total" | bc)"
local threshold="${ALERT_THRESHOLDS[MEMORY]}"
printf "Memory: %s/%s MB (%.1f%%)" "$used" "$total" "$percentage"
if (( $(echo "$percentage > $threshold" | bc -l) )); then
send_alert "High Memory Usage" "Memory usage is ${percentage}% (threshold: ${threshold}%)"
return 1
fi
return 0
}
check_disk() {
local errors=0
local threshold="${ALERT_THRESHOLDS[DISK]}"
# Feed the loop via process substitution: a piped while runs in a
# subshell, so counter updates made there would be lost
while IFS= read -r line; do
local filesystem="$(echo "$line" | awk '{print $1}')"
local usage="$(echo "$line" | awk '{print $5}' | sed 's/%//')"
local mount="$(echo "$line" | awk '{print $6}')"
printf "Disk %s (%s): %s%%\n" "$filesystem" "$mount" "$usage"
if [[ $usage -ge $threshold ]]; then
send_alert "High Disk Usage" "Disk $filesystem ($mount) is ${usage}% full (threshold: ${threshold}%)"
((++errors)) # pre-increment keeps the exit status 0 under set -e
fi
done < <(df -h | grep -v tmpfs | tail -n +2)
return $errors
}
check_services() {
local errors=0
for service in "${SERVICES_TO_MONITOR[@]}"; do
if systemctl is-active --quiet "$service"; then
printf "Service %s: RUNNING\n" "$service"
else
printf "Service %s: STOPPED\n" "$service"
send_alert "Service Down" "Service $service is not running"
((++errors))
fi
done
return $errors
}
check_ports() {
local errors=0
for port in "${!PORTS_TO_CHECK[@]}"; do
local service_name="${PORTS_TO_CHECK[$port]}"
if timeout 1 bash -c "cat < /dev/null > /dev/tcp/127.0.0.1/$port" 2>/dev/null; then
printf "Port %s (%s): OPEN\n" "$port" "$service_name"
else
printf "Port %s (%s): CLOSED\n" "$port" "$service_name"
send_alert "Port Closed" "Port $port ($service_name) is not accessible"
((++errors))
fi
done
return $errors
}
check_logs() {
local errors=0
for pattern in "${!LOG_PATTERNS[@]}"; do
local logfile="${LOG_PATTERNS[$pattern]}"
# grep -c prints its own "0" and exits nonzero on zero matches,
# so "|| echo 0" would capture two zeros
local count
count="$(grep -c "$pattern" "$logfile" 2>/dev/null)" || true
count="${count:-0}"
if [[ $count -gt 0 ]]; then
printf "Log pattern '%s': %d matches\n" "$pattern" "$count"
if [[ $count -gt 10 ]]; then # Threshold for alerts
send_alert "Log Alert" "Found $count occurrences of '$pattern' in $logfile"
((++errors))
fi
fi
done
return $errors
}
generate_report() {
local total_errors="$1"
local report_file="/var/log/system_monitor_$(date +%Y%m%d).log"
echo "=== SYSTEM MONITORING REPORT $(date) ===" > "$report_file"
echo "Total alerts generated: $total_errors" >> "$report_file"
echo "Status: $([[ $total_errors -eq 0 ]] && echo "HEALTHY" || echo "ISSUES DETECTED")" >> "$report_file"
echo "" >> "$report_file"
}
# ==================== MAIN ====================
main() {
require_root
init_error_handling
local total_errors=0
log "INFO" "Starting system monitoring"
echo "=== SYSTEM MONITORING CHECK $(date) ==="
echo
# Run all checks (pre-increment ++var returns the new, nonzero value,
# so the arithmetic command succeeds even when the counter starts at 0
# under set -e; var++ returns the old value and would abort the script)
echo "1. CPU Usage:"
check_cpu || ((++total_errors))
echo
echo "2. Memory Usage:"
check_memory || ((++total_errors))
echo
echo "3. Disk Usage:"
check_disk || ((++total_errors))
echo
echo "4. Service Status:"
check_services || ((++total_errors))
echo
echo "5. Port Availability:"
check_ports || ((++total_errors))
echo
echo "6. Log Analysis:"
check_logs || ((++total_errors))
echo
# Generate report
generate_report "$total_errors"
if [[ $total_errors -eq 0 ]]; then
log "SUCCESS" "All system checks passed"
else
log "WARN" "Found $total_errors issues that need attention"
fi
echo "=== MONITORING COMPLETE ==="
echo "Total issues detected: $total_errors"
}
# Run monitoring
main "$@"
Production-Ready Script Checklist
- ✅ Error handling: set -euo pipefail, trap, check exit codes
- ✅ Logging: Comprehensive logging with timestamps and levels
- ✅ Configuration: External config files or environment variables
- ✅ Input validation: Validate all user inputs and arguments
- ✅ Idempotency: Script can be run multiple times safely
- ✅ Documentation: Clear usage instructions and examples
- ✅ Security: No hardcoded passwords, proper permissions
- ✅ Portability: Works across different Linux distributions
- ✅ Resource management: Clean up temporary files
- ✅ Testing: Unit tests or at least manual testing scenarios
- ✅ Monitoring: Scripts should report their own status
- ✅ Version control: All scripts in version control with change history
Shell Scripting Best Practices
Do's
- Use #!/usr/bin/env bash for portability
- Enable strict mode: set -euo pipefail
- Use functions to organize code
- Validate all inputs and arguments
- Use meaningful variable names
- Comment complex logic
- Implement proper error handling
- Use arrays for lists of items
- Quote all variable expansions
- Use [[ ]] for conditionals
- Prefer $( ) over backticks
- Use local variables in functions
- Test scripts in a safe environment
Don'ts
- Don't use ALL_CAPS for non-exported variables
- Don't parse ls output (use globs or find)
- Don't use cd without checking its return code
- Don't echo variable data without quoting
- Don't ignore exit codes
- Don't write scripts without error handling
- Don't use deprecated syntax
- Don't hardcode paths (use variables)
- Don't run as root unnecessarily
- Don't create world-writable files
- Don't use cat file | grep pattern (use grep pattern file)
- Don't forget to test edge cases
Shell Scripting Quick Reference
Essential Commands
$ bash -x script.sh # Debug mode
$ shellcheck script.sh # Lint script
$ time ./script.sh # Measure execution time
$ strace -f ./script.sh # Trace system calls
Testing & Debugging
set -x # Print commands as they execute
set -v # Print shell input lines
trap 'echo "Line $LINENO: $BASH_COMMAND"' DEBUG
PS4='+ $LINENO: ' # Custom debug prompt
Useful One-Liners
# Compare two directory listings
diff <(ls dir1) <(ls dir2)
# Read lines from file
while IFS= read -r line; do echo "$line"; done < file
# Generate sequence
for i in {1..10}; do echo "$i"; done
# Check if command exists
command -v cmd >/dev/null 2>&1
Practice Scenarios for DevOps Scripting
- Create a script that automatically provisions a new server with your preferred stack (web server, database, monitoring).
- Write a log analysis script that identifies the top 10 IP addresses with failed SSH attempts and automatically blocks them.
- Develop a backup script that supports incremental backups, encryption, and remote synchronization.
- Create a monitoring script that alerts when disk usage exceeds 90%, memory usage exceeds 85%, or CPU load exceeds 5.0.
- Write a deployment script that can rollback to a previous version if health checks fail.
- Develop a user management script that can create users, set up SSH keys, and configure sudo access from a CSV file.
- Create a certificate management script that automatically renews Let's Encrypt certificates and reloads services.
- Write a Docker container management script that can start, stop, update, and monitor containers.
- Develop a configuration drift detection script that compares current system state with a known good baseline.
- Create a script that automatically scales application instances based on CPU load or request rate.
Key Takeaways
- Automation is power: Shell scripting turns repetitive tasks into automated workflows
- Start simple, build complex: Begin with basic scripts and add features gradually
- Error handling is non-negotiable: Robust scripts handle failures gracefully
- Documentation saves time: Comment your code and provide usage instructions
- Test thoroughly: Test scripts in safe environments before production
- Security matters: Don't hardcode secrets, use proper permissions
- Reusability pays off: Build libraries of functions for common tasks
- Monitoring is essential: Scripts should log their activities and report status
- Cron is your friend: Schedule regular tasks for maintenance and monitoring
- Continuous improvement: Refactor scripts as you learn better techniques
Mastering shell scripting is one of the most valuable skills for a DevOps professional. It enables you to automate infrastructure, deploy applications, monitor systems, and respond to incidents efficiently. Remember: the goal is not just to write scripts, but to create reliable, maintainable automation that makes systems more resilient and operations more efficient.