Saturday, December 13, 2025

Linux Process & System Management

Published: December 2023 | Topic: System Administration & Automation for DevOps

Mastering process and system management is crucial for DevOps engineers. You need to understand how Linux manages processes, how to monitor system resources, schedule automated tasks, and control system startup. These skills are essential for maintaining stable, performant production systems and automating operational tasks.

The Linux Process Model

In Linux, everything runs as a process. Understanding processes is fundamental to system administration:

  • Process: A running instance of a program
  • PID: Process ID - unique number identifying each process
  • PPID: Parent Process ID - the process that created this process
  • UID/GID: User/Group ID of the process owner
  • Process States: Running, Sleeping, Stopped, Zombie
  • Process Hierarchy: Tree structure starting from init/systemd (PID 1)

Process Hierarchy Visualization

systemd/init (PID 1): parent of all processes
  ├─ sshd (PID 456): SSH daemon
  ├─ nginx (PID 789): web server
  └─ bash (PID 1234): user shell

1. Understanding Processes & Jobs

What are Processes and Jobs?

A process is a running program instance managed by the kernel. A job is a process, or a pipeline of processes, that your shell started and tracks through job control. Every process has:

Process Attributes

  • PID: Unique Process ID
  • PPID: Parent Process ID
  • UID/GID: User/Group ownership
  • Priority: Nice value (-20 to 19)
  • State: Running, Sleeping, etc.
  • Memory Usage: RSS (resident memory), VSZ (virtual memory size)
  • CPU Usage: %CPU time

Process States

  • R: Running or runnable
  • S: Sleeping (interruptible)
  • D: Uninterruptible sleep (usually waiting on I/O)
  • T: Stopped by signal
  • Z: Zombie (terminated, but not yet reaped by its parent)
  • X: Dead (won't be seen)

Process Creation: fork() and exec()

Linux creates processes through two system calls:

fork() - Create Copy

Creates a duplicate of the current process (the child). Parent and child initially run the same code, each with its own PID.

# Parent process continues
Parent PID: 1000
Child PID: 1001
# Both processes run from same point

exec() - Replace Image

Replaces the current process image with a new program. The process keeps its PID and continues with the new program's code.

# Child process loads new program
bash → exec() → ls
# Same PID (1001), new program
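
The fork-then-exec sequence can be observed from bash itself. This is a minimal sketch, assuming bash (for $BASHPID) and a POSIX sh on the PATH; the function name fork_then_exec is made up for illustration. The parentheses make bash fork a subshell, and exec then replaces that subshell with sh without changing its PID.

```shell
#!/usr/bin/env bash
# Sketch: watch fork() and exec() happen from a bash script.
# fork_then_exec is a hypothetical helper name, not a standard command.

fork_then_exec() {
  (
    # We are now in a forked child; $BASHPID is the child's own PID.
    echo "forked child PID: $BASHPID"
    # exec replaces the child with sh. No new process is created,
    # so $$ inside sh reports the same PID as above.
    exec sh -c 'echo "after exec PID: $$"'
  )
}

echo "parent shell PID: $$"
fork_then_exec
```

The two child lines should print the same number, and it should differ from the parent shell's PID.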

Viewing Process Information

# Basic process listing
$ ps
PID TTY TIME CMD
1234 pts/0 00:00:00 bash
5678 pts/0 00:00:01 python

# Detailed process info
$ ps -ef
# Full format listing

# Show process tree
$ pstree
# or
$ ps -ejH

# Find process by name
$ pgrep nginx
$ pidof nginx

2. Foreground vs Background Processes

What are Foreground and Background Processes?

Shells can run processes in two modes:

Foreground Processes

  • Runs in current terminal
  • Blocks shell input until complete
  • Receives keyboard signals (Ctrl+C, Ctrl+Z)
  • Standard input/output connected to terminal
  • Use when: Interactive programs, need user input
$ vim file.txt
# Shell waits until vim exits
# Ctrl+C to interrupt

Background Processes

  • Runs without blocking the shell
  • Shell immediately returns prompt
  • Can't read keyboard input (a background job that tries to read from the terminal is stopped)
  • Output may still appear in terminal
  • Use when: Long-running tasks, don't need interaction
$ tar -czf backup.tar.gz /data &
# [1] 12345 - Job number and PID
# Shell returns immediately

Job Control Commands

Starting Jobs

$ long_task.sh &
# Start in background

# Press Ctrl+Z while a job runs in the foreground to suspend it
# [1]+ Stopped long_task.sh

Managing Jobs

$ jobs
# List shell jobs
[1] Running tar -czf backup.tar.gz /data &
[2]- Stopped vim file.txt
[3]+ Stopped top

$ fg %1
# Bring job 1 to foreground
$ bg %2
# Resume job 2 in background

Disowning Jobs

# Keep job running after logout
$ long_task.sh &
$ disown %1
# The shell no longer sends SIGHUP to the job at logout
# or start with nohup
$ nohup long_task.sh &
# Output goes to nohup.out when stdout is a terminal

Practical Job Control Workflow

# Start a backup in background
$ tar -czf /backup/data.tar.gz /data &
[1] 12345

# Check job status
$ jobs -l
[1]+ 12345 Running tar -czf /backup/data.tar.gz /data &

# Start another task, then suspend it
$ find / -name "*.log" 2>/dev/null
# Too slow, press Ctrl+Z
[2]+ Stopped find / -name "*.log" 2>/dev/null

# Resume find in background
$ bg %2
[2]+ find / -name "*.log" 2>/dev/null &

# Bring backup to foreground to monitor
$ fg %1
# Now monitoring tar output
# Press Ctrl+Z to suspend again

# Kill the find job
$ kill %2
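
Job control commands such as fg, bg, and %1 are meant for interactive shells. Inside a script, the usual equivalent is to start tasks with &, remember each PID from $!, and collect them with wait. The following is a sketch under that assumption; run_parallel is a hypothetical helper name, and each argument is treated as a command string.

```shell
#!/usr/bin/env bash
# Sketch: background tasks in a script using $! and wait.
# run_parallel is a made-up helper; it prints how many commands failed.

run_parallel() {
  local cmd pid failed=0
  local -a pids=()

  for cmd in "$@"; do
    bash -c "$cmd" &      # start the command in the background
    pids+=("$!")          # $! is the PID of the most recent background job
  done

  for pid in "${pids[@]}"; do
    # wait returns the exit status of that specific child
    wait "$pid" || failed=$((failed + 1))
  done

  echo "$failed"
}
```

For example, run_parallel 'sleep 1' 'false' 'true' starts all three at once and prints 1, because only the second command exits non-zero.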

3. Process Monitoring: ps, top, htop

What is Process Monitoring?

Monitoring tools help you understand system resource usage, identify performance bottlenecks, and troubleshoot process issues. Essential for maintaining healthy systems.

Essential Monitoring Commands

ps - Process Status

The classic process viewer. Many output formats available.

$ ps aux
# BSD style: user-oriented
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND

$ ps -ef
# Unix style: full listing

$ ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%cpu | head
# Custom format, sorted by CPU

top - Interactive Process Viewer

Real-time process monitoring with interactive controls.

# Launch top
$ top

# Top keyboard shortcuts:
h - Help
P - Sort by CPU (default)
M - Sort by memory
N - Sort by PID
k - Kill process (enter PID)
r - Renice process
1 - Toggle CPU cores
q - Quit

htop - Enhanced top

Colorful, user-friendly process viewer with more features.

# Install if not available
$ sudo apt install htop # Debian/Ubuntu
$ sudo yum install htop # RHEL/CentOS

$ htop

# Features:
• Color-coded display
• Mouse support
• Tree view (F5)
• Filter processes (F4)
• Kill with F9
• Customize with F2

Advanced Monitoring Tools

System Resource Monitoring

# Memory usage
$ free -h
# or
$ cat /proc/meminfo

# CPU information
$ lscpu
$ cat /proc/cpuinfo

# Disk I/O statistics
$ iostat -x 1
$ iotop

# Network connections
$ ss -tulpn
$ netstat -tulpn
# Legacy equivalent from the net-tools package

Process Priority: nice & renice

Control process CPU scheduling priority (-20 highest, 19 lowest).

# Start process with low priority
$ nice -n 19 compression.sh
# Nice value 19 (lowest priority)

# Start process with high priority (negative nice values require root)
$ sudo nice -n -20 important_task.sh
# Nice value -20 (highest priority)

# Change priority of running process
$ renice -n 10 -p 1234
# Change PID 1234 to nice 10

# Change priority of all processes for user (root required for other users)
$ sudo renice -n 5 -u alice
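
One detail worth checking by hand: nice -n applies an adjustment relative to the caller's current nice value, so the result only equals the number you typed when you start from 0. A small sketch, assuming the coreutils nice command (which prints the current nice value when run without arguments); the function name is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: show the nice value a child process actually ends up with.
# niceness_after_increment is a hypothetical helper name.

niceness_after_increment() {
  # The outer nice applies the adjustment; the inner nice (no arguments)
  # prints the nice value it is running with.
  nice -n "$1" nice
}

echo "current nice value:  $(nice)"
echo "child with -n 5 has: $(niceness_after_increment 5)"
```

From a normal login shell this typically shows 0 and 5; from a shell that is already niced, the second number is higher, capped at 19.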

Monitoring Best Practices for DevOps

  • Establish baselines: Know normal resource usage patterns
  • Monitor key metrics: CPU, memory, disk I/O, network
  • Set up alerts: For high resource usage or process failures
  • Use process trees: Understand parent-child relationships
  • Check for zombies: Processes stuck in zombie state
  • Monitor system load: Load average over 1, 5, 15 minutes
  • Track process lifetimes: Sudden deaths or long runtimes
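
Several of these checks can be combined into a small snapshot script. This is a sketch, assuming a Linux system with /proc mounted and the procps version of ps; the function names are made up for illustration, and real monitoring would normally feed a proper alerting system.

```shell
#!/usr/bin/env bash
# Sketch: quick health snapshot (load average, zombies, top CPU consumers).
# Assumes Linux /proc and procps ps; function names are hypothetical.

load_averages() {
  # First three fields of /proc/loadavg: 1, 5 and 15 minute load averages
  cut -d ' ' -f 1-3 /proc/loadavg
}

zombie_count() {
  # "stat=" prints the state column without a header; zombies start with Z
  ps -eo stat= | awk '$1 ~ /^Z/ { n++ } END { print n + 0 }'
}

top_cpu() {
  # Header line plus the five busiest processes
  ps -eo pid,ppid,%cpu,%mem,comm --sort=-%cpu | head -n 6
}

echo "load averages (1/5/15 min): $(load_averages)"
echo "zombie processes: $(zombie_count)"
top_cpu
```

Comparing the load averages against the number of CPU cores (nproc) gives a rough sense of whether the system is saturated.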

4. Scheduling Tasks with cron and at

What is Task Scheduling?

Linux provides tools to automate tasks to run at specific times. cron schedules recurring tasks, while at schedules one-time tasks.

cron - Recurring Task Scheduler

Understanding cron Syntax

Field order: minute hour day-of-month month day-of-week command

  • Minute (0-59)
  • Hour (0-23)
  • Day of Month (1-31)
  • Month (1-12)
  • Day of Week (0-7; both 0 and 7 mean Sunday)
  • Command to Execute

cron Special Characters

# * = any value
0 * * * * command
# Every hour at minute 0

# , = value list separator
0 0,12 * * * command
# Twice daily at midnight and noon

# - = range of values
0 9-17 * * * command
# Every hour 9am-5pm

# / = step values
*/15 * * * * command
# Every 15 minutes
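
To see what a field actually matches, it can help to expand it by hand. The function below is an illustrative sketch, not how cron itself is implemented: expand_cron_field is a made-up name, and it only covers the four special characters shown above (no month or weekday names, no 7-as-Sunday alias).

```shell
#!/usr/bin/env bash
# Sketch: expand one cron field into the values it matches.
# Usage: expand_cron_field FIELD MIN MAX   (hypothetical helper)

expand_cron_field() {
  local field=$1 min=$2 max=$3
  local part range step lo hi v
  local -a parts out=()

  IFS=',' read -r -a parts <<< "$field"    # "," separates list items
  for part in "${parts[@]}"; do
    range=${part%%/*}                      # text before an optional "/"
    step=1
    [[ $part == */* ]] && step=${part##*/} # "/" introduces a step value

    if [[ $range == '*' ]]; then           # "*" means the whole range
      lo=$min; hi=$max
    elif [[ $range == *-* ]]; then         # "-" gives an explicit range
      lo=${range%%-*}; hi=${range##*-}
    else                                   # a single value
      lo=$range; hi=$range
    fi

    for (( v = lo; v <= hi; v += step )); do
      out+=("$v")
    done
  done
  echo "${out[*]}"
}

expand_cron_field '*/15' 0 59   # prints: 0 15 30 45
expand_cron_field '0,12' 0 23   # prints: 0 12
expand_cron_field '9-17' 0 23   # prints: 9 10 11 12 13 14 15 16 17
```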

cron Shortcuts

# Common predefined schedules
@hourly command
# 0 * * * *

@daily command
# 0 0 * * * (midnight)

@weekly command
# 0 0 * * 0 (Sunday midnight)

@monthly command
# 0 0 1 * * (1st of month)

@yearly command
# 0 0 1 1 * (Jan 1st)

@reboot command
# Run at system startup

Managing cron Jobs

User crontabs

Each user has their own crontab file.

# Edit your crontab
$ crontab -e

# List your cron jobs
$ crontab -l

# Remove all your cron jobs (no confirmation prompt)
$ crontab -r

# Replace your crontab with the contents of a file
$ crontab mycron.txt

System crontabs

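System-wide cron jobs live in /etc/crontab and the /etc/cron.* directories. Entries in /etc/crontab and /etc/cron.d/ take an extra user field before the command.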

# System crontab file
/etc/crontab

# Directory for hourly scripts
/etc/cron.hourly/

# Directory for daily scripts
/etc/cron.daily/

# Directory for weekly scripts
/etc/cron.weekly/

# Directory for monthly scripts
/etc/cron.monthly/

# Additional cron files
/etc/cron.d/

at - One-time Task Scheduler

# Schedule job to run at specific time
$ at 11:30 PM tomorrow
at> /path/to/script.sh
at> <EOT>
# Press Ctrl+D to finish; at prints <EOT> and the job number
job 1 at Sat Dec 2 23:30:00 2023

# Schedule in 2 hours
$ at now + 2 hours
at> echo "Task complete" | mail -s "Reminder" user@example.com
at> <EOT>

# List scheduled jobs
$ atq
# or
$ at -l

# Remove scheduled job
$ atrm 1
# Remove job number 1

Practical cron Examples for DevOps

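# Backup database daily at 2 AM
# Credentials come from an option file (example path) so the password stays out of the crontab and the process list
0 2 * * * /usr/bin/mysqldump --defaults-extra-file=/root/.my.cnf database > /backup/db-$(date +\%Y\%m\%d).sql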

# Monitor disk space every hour
0 * * * * /usr/local/bin/check_disk_space.sh

# Rotate logs weekly on Sunday at 3 AM
0 3 * * 0 /usr/sbin/logrotate /etc/logrotate.conf

# Sync data between servers every 15 minutes during business hours
*/15 9-17 * * 1-5 /usr/bin/rsync -avz /data/ user@backup:/backup/

# Run maintenance script on first day of month at midnight
0 0 1 * * /usr/local/bin/monthly_maintenance.sh

⚠️ cron Security & Best Practices

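  • Use absolute paths: cron runs jobs with a minimal PATH
  • Redirect output: cron mails any output to the job owner; send it to a log file instead, or to /dev/null (> /dev/null 2>&1) only if you truly want to discard it
  • Test commands manually: Before adding to cron
  • Check cron logs: /var/log/cron (RHEL/CentOS) or /var/log/syslog (Debian/Ubuntu)
  • Use locking: Prevent overlapping executions with flock
  • Restrict access: Control who may use crontab with /etc/cron.allow and /etc/cron.deny
  • Monitor cron jobs: A failed job is silent unless you capture its output or exit status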
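
The locking advice can be sketched as a small wrapper. This assumes the flock utility from util-linux is installed; run_locked, the exit code 75, and any lock file path you pass are illustrative choices, not a standard.

```shell
#!/usr/bin/env bash
# Sketch: skip a run when the previous one is still holding the lock.
# Assumes flock(1) from util-linux; run_locked is a hypothetical helper.

run_locked() {
  local lockfile=$1
  shift
  (
    # -n: fail immediately instead of waiting if the lock is already held
    flock -n 9 || {
      echo "skipped: $lockfile is held by another run" >&2
      exit 75
    }
    "$@"    # run the real job while file descriptor 9 holds the lock
  ) 9>"$lockfile"
}
```

A job script would then call something like run_locked /var/lock/db-backup.lock /usr/local/bin/backup.sh (both paths are examples). For a one-line crontab entry, flock can also wrap the command directly: flock -n /var/lock/db-backup.lock /usr/local/bin/backup.sh.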

5. System Startup, Boot Targets, and Runlevels

What is the Linux Boot Process?

The Linux boot process involves multiple stages that initialize hardware, load the kernel, and start system services. Understanding this process is crucial for troubleshooting boot issues.

Linux Boot Process Stages

  1. BIOS/UEFI: Hardware initialization, POST and device detection
  2. Bootloader (GRUB2): Loads kernel and initramfs
  3. Kernel: Linux kernel initializes hardware and mounts the root filesystem
  4. Init System: systemd (or SysV init) starts system services and reaches the target/runlevel

systemd vs SysV Init

systemd (Modern)

  • Parallel startup (faster boot)
  • Socket activation
  • Dependency-based startup
  • Uses targets instead of runlevels
  • Logging via journald
  • Default on modern distributions
$ systemctl list-units
$ systemctl status nginx
$ systemctl enable nginx

SysV Init (Traditional)

  • Sequential startup
  • Uses runlevels (0-6)
  • Scripts in /etc/init.d/
  • Symbolic links in /etc/rc*.d/
  • Still found on older systems
  • Simpler but slower
$ service nginx status
$ /etc/init.d/nginx restart
$ chkconfig nginx on
# chkconfig is RHEL/CentOS; Debian/Ubuntu use update-rc.d

systemd Targets (Modern Runlevels)

Target             Purpose      Traditional Runlevel  Description
poweroff.target    Shutdown     0                     System shutdown
rescue.target      Single User  1                     Single-user mode, emergency
multi-user.target  Multi-user   3                     Text-based multi-user
graphical.target   Graphical    5                     Graphical multi-user
reboot.target      Reboot       6                     System reboot
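
When porting old SysV-era scripts, the mapping in the table can be expressed as a lookup. This is a sketch; runlevel_to_target is a made-up helper name. Runlevels 2 and 4 are not in the table above, but systemd aliases them to multi-user.target as well.

```shell
#!/usr/bin/env bash
# Sketch: translate a SysV runlevel number into its systemd target.
# runlevel_to_target is a hypothetical helper name.

runlevel_to_target() {
  case $1 in
    0)     echo poweroff.target ;;
    1)     echo rescue.target ;;
    2|3|4) echo multi-user.target ;;   # systemd treats 2, 3 and 4 alike
    5)     echo graphical.target ;;
    6)     echo reboot.target ;;
    *)     echo "unknown runlevel: $1" >&2; return 1 ;;
  esac
}

runlevel_to_target 3   # prints: multi-user.target
runlevel_to_target 5   # prints: graphical.target
```

On a running systemd machine, systemctl get-default shows which target the system boots into, and sudo systemctl set-default multi-user.target changes it.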

Managing systemd Services

Service Management

# Start/stop services
$ sudo systemctl start nginx
$ sudo systemctl stop nginx
$ sudo systemctl restart nginx
$ sudo systemctl reload nginx

# Enable/disable at boot
$ sudo systemctl enable nginx
$ sudo systemctl disable nginx

Service Status & Info

# Check service status
$ systemctl status nginx
$ systemctl is-active nginx
$ systemctl is-enabled nginx

# View service logs
$ journalctl -u nginx
$ journalctl -u nginx --since "1 hour ago"

System Control

# Change system target
$ sudo systemctl isolate multi-user.target
# Switch to multi-user mode

# Power management
$ sudo systemctl reboot
$ sudo systemctl poweroff
$ sudo systemctl suspend

Boot Process Troubleshooting

Common Boot Issues & Solutions

# View boot messages
$ dmesg | less
$ journalctl -b
# Current boot
$ journalctl -b -1
# Previous boot

# Check failed services
$ systemctl --failed
$ systemctl list-units --state=failed

# Emergency/Rescue mode
# At GRUB menu, press 'e' to edit, then append to the line starting with "linux":
systemd.unit=rescue.target
# or for emergency shell:
systemd.unit=emergency.target

# Check boot time performance
$ systemd-analyze
$ systemd-analyze blame
$ systemd-analyze critical-chain

Complete DevOps Process Management Workflow

End-to-End Process Management

A typical DevOps workflow for managing application processes:

# 1. Deploy application with proper service file
$ sudo cp myapp.service /etc/systemd/system/
$ sudo systemctl daemon-reload
$ sudo systemctl enable myapp
$ sudo systemctl start myapp

# 2. Monitor application performance
$ watch -n 1 'ps aux | grep myapp | grep -v grep'
$ htop -p "$(pgrep -d, myapp)"
# htop -p expects a comma-separated PID list, so use pgrep -d,

# 3. Set up automated health checks
# In /etc/cron.d/myapp-monitor:
*/5 * * * * root /usr/local/bin/check_myapp.sh

# 4. Configure log rotation
# In /etc/logrotate.d/myapp:
/var/log/myapp/*.log {
  daily
  rotate 30
  compress
  delaycompress
  missingok
  notifempty
  create 644 myapp myapp
  postrotate
    systemctl reload myapp
  endscript
}

# 5. Set up automated restarts on failure
# In /etc/systemd/system/myapp.service:
# Start-rate limits belong in [Unit] on current systemd versions
[Unit]
StartLimitIntervalSec=60
StartLimitBurst=3

[Service]
Restart=on-failure
RestartSec=10s
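
The health check referenced in step 3 is not shown in this post. One possible shape for /usr/local/bin/check_myapp.sh is sketched below, assuming the application runs as a systemd unit named myapp; check_service is a made-up function name, and the messages are placeholders for whatever alerting you use.

```shell
#!/usr/bin/env bash
# Sketch of a health check for a systemd unit (hypothetical script body).
# check_service UNIT: returns 0 if the unit is active, 1 otherwise.

check_service() {
  local unit=$1
  # is-active --quiet prints nothing and reports the state via exit status
  if systemctl is-active --quiet "$unit"; then
    echo "OK: $unit is active"
  else
    echo "FAIL: $unit is not active" >&2
    return 1
  fi
}
```

The script would end with a call such as check_service myapp. Because cron mails anything a job prints, the FAIL line on stderr doubles as a basic alert when mail delivery is configured.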

Essential Commands Cheat Sheet

Process Management

$ ps aux # List all processes
$ top # Interactive process viewer
$ htop # Enhanced process viewer
$ pstree # Process tree
$ kill -9 PID # Force kill process (last resort; try plain kill first)
$ killall process_name # Kill by name

Job Control

$ command & # Start in background
$ jobs # List jobs
$ fg %1 # Bring job 1 to foreground
$ bg %2 # Resume job 2 in background
Ctrl+Z # Suspend foreground job (key press, not a command)

Task Scheduling

$ crontab -e # Edit cron jobs
$ crontab -l # List cron jobs
$ at 10:30 # Schedule one-time job
$ atq # List scheduled jobs
$ atrm 1 # Remove job 1

System Control

$ systemctl start service # Start service
$ systemctl stop service # Stop service
$ systemctl restart service # Restart service
$ systemctl status service # Service status
$ systemctl enable service # Enable at boot

Practice Scenarios for DevOps Engineers

  1. A web application is consuming 100% CPU. How would you identify the problematic process and safely restart it?
  2. You need to schedule a database backup every night at 2 AM that takes 3 hours to complete. How would you ensure it doesn't overlap with other jobs?
  3. A critical service keeps crashing. How would you configure it to automatically restart and alert you when it fails?
  4. You suspect a memory leak in an application. What tools would you use to monitor memory usage over time?
  5. How would you gracefully stop a long-running data processing job and resume it later?
  6. A server won't boot after a configuration change. What steps would you take to troubleshoot and recover?
  7. You need to run a resource-intensive report during off-hours without affecting production. How would you manage its priority?
  8. How would you set up monitoring to alert you when disk space reaches 90% or when critical services stop?

Key Takeaways

  • Everything is a process: Understanding processes is fundamental to Linux administration
  • Job control enables multitasking: Manage foreground/background processes effectively
  • Monitoring is proactive: Use ps, top, htop to identify issues before they cause problems
  • Automate with cron: Schedule repetitive tasks to run automatically
  • Master systemd: Modern Linux uses systemd for service management
  • Understand the boot process: Essential for troubleshooting startup issues
  • Prioritize processes: Use nice/renice to manage CPU allocation
  • Implement proper service management: Enable, start, stop, and monitor services correctly

Mastering process and system management will make you more effective at maintaining stable, performant Linux systems in DevOps environments. These skills enable you to proactively monitor systems, automate routine tasks, and quickly resolve issues when they arise.
