Local Setup - 2026 [On Arch Linux]

2026-02-20

As an infrastructure engineer, I treat my terminal as more than a shell: it’s my control plane. Production debugging, Kubernetes firefighting, ASG scaling checks, S3 hotfixes, SSH pivots: everything happens here. This post is generated from my actual ~/.zshrc on Arch Linux: what I really run, not a wishlist.

This setup has evolved over six years of continuous refinement, shaped by daily use across different environments & roles. It has traveled with me from Windows → Ubuntu → macOS → Arch (CachyOS), picking up useful patterns & improvements along the way.

Every function & shortcut exists because it made something faster, clearer, or more repeatable. Over time, the terminal naturally became a production cockpit - a place where context, tooling, & muscle memory come together.

The stack is intentionally simple & well-tested: Zsh, Oh My Zsh, Powerlevel10k, tmux, kubectl, k9s, fzf, AWS CLI, plus a growing set of custom functions. Nothing flashy, just efficient, consistent, & built to keep me moving quickly. Like most DevOps systems, it is never really finished; it just keeps growing :).

1. Shell (Zsh), theme (Powerlevel10k), & plugins

Shell is zsh; theme is Powerlevel10k. Oh My Zsh loads the theme & plugins:

export ZSH="$HOME/.oh-my-zsh"
ZSH_THEME="powerlevel10k/powerlevel10k"

plugins=(
  git
  history-substring-search
  fzf
  kubectl
  kubectx
  helm
  docker
  docker-compose
  terraform
  aws
  systemd
  sudo
  archlinux
  ssh-agent
)
source $ZSH/oh-my-zsh.sh
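
As far as I know, most of these plugins only add aliases & completions for a CLI that has to be installed separately (kubectl, helm, terraform, aws, and so on). A minimal sketch of a check for a fresh machine; missing_tools is a hypothetical helper name, not part of Oh My Zsh or of the rc above:

```shell
# Hypothetical helper: print every command from the argument list that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example: check the CLIs used throughout this post.
missing_tools git fzf kubectl helm docker terraform aws tmux k9s
```

An empty result means everything on the list was found.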

2. SSH & SSM: secure access

SSM (recommended)

For hardened or private environments I use AWS Systems Manager instead of opening port 22. ssm_fuzzy lists running EC2 instances in a region (default $AWS_DEFAULT_REGION, fallback ap-south-1), optionally filters by a query string, then uses fzf with a small preview (InstanceId, Name, IP, AZ). Selecting one runs aws ssm start-session for that instance. Fallback without fzf: numbered list + read.

ssm_fuzzy() {
  local REGION="${AWS_DEFAULT_REGION:-ap-south-1}"
  local QUERY=""

  for arg in "$@"; do
    case "$arg" in
      --region=*) REGION="${arg#*=}" ;;
      *) QUERY="$arg" ;;
    esac
  done

  local INSTANCES
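  # A list projection ([...]) keeps the column order in text output;
  # a hash ({...}) is printed with its keys sorted alphabetically (AZ, ID, IP, Name),
  # which would put the AZ in the first column instead of the instance ID.
  INSTANCES=$(aws ec2 describe-instances \
    --region "$REGION" \
    --filters "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].[
      InstanceId,
      Tags[?Key==`Name`]|[0].Value,
      PrivateIpAddress,
      Placement.AvailabilityZone
    ]' \
    --output text)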

  [[ -z "$INSTANCES" ]] && echo "No running instances found." && return 1
  [[ -n "$QUERY" ]] && INSTANCES=$(echo "$INSTANCES" | grep -i "$QUERY")
  [[ -z "$INSTANCES" ]] && echo "No instances matched '$QUERY'" && return 1

  local SELECTED=""
  if command -v fzf >/dev/null && [[ -t 1 ]]; then
    SELECTED=$(echo "$INSTANCES" | \
      awk '{printf "%-20s %-30s %-15s %-10s\n", $1, $2, $3, $4}' | \
      fzf --prompt="Select EC2 for SSM > " \
          --preview "echo {} | awk '{print \"InstanceId: \" \$1 \"\nName: \" \$2 \"\nIP: \" \$3 \"\nAZ: \" \$4}'")
  else
    nl -w2 -s'. ' <<< "$INSTANCES"
    # zsh's `read -p` reads from a coprocess, so print the prompt separately
    local idx
    printf 'Select number: '
    read -r idx
    SELECTED=$(echo "$INSTANCES" | sed -n "${idx}p")
  fi

  [[ -z "$SELECTED" ]] && return 0

  local INSTANCE_ID
  INSTANCE_ID=$(echo "$SELECTED" | awk '{print $1}')
  echo "Starting SSM session to $INSTANCE_ID in $REGION"
  aws ssm start-session --region "$REGION" --target "$INSTANCE_ID"
}
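
The filter & selection steps can be tried without AWS credentials. A minimal sketch, assuming tab-separated rows in the order ID, Name, IP, AZ; the rows are made up & pick_instance_id is a hypothetical helper. Splitting on tabs rather than on whitespace keeps a Name tag that contains spaces in a single field:

```shell
# Hypothetical helper: read tab-separated instance rows on stdin,
# keep rows matching the query (case-insensitive), print the ID of the first match.
pick_instance_id() {
  local query="$1"
  grep -i -- "$query" | awk -F '\t' 'NR == 1 { print $1 }'
}

# Made-up sample rows: ID, Name, IP, AZ
sample=$(printf 'i-0aaa\tweb server 1\t10.0.1.10\tap-south-1a\ni-0bbb\tbatch worker\t10.0.2.20\tap-south-1b\n')

printf '%s\n' "$sample" | pick_instance_id "BATCH"   # prints: i-0bbb
```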

SSH (legacy only)

SSH is only for legacy or older long-running setups; I don’t recommend it for production (prefer SSM above). Where SSH is still needed, I don’t type ssh -i ~/.ssh/<key-prod> ubuntu@10.x.x.x; I type sshprod 10.x.x.x. One key per environment, with a small helper:

__ssh_with_key() {
  local key="$1"
  shift

  if [[ "$1" == *"@"* ]]; then
    ssh -i "$key" "$@"
  else
    ssh -i "$key" ubuntu@"$@"
  fi
}

# Replace <key-dev>, <key-stag>, <key-prod>, <key-aws> with your key paths.
# The angle brackets are placeholders: left in as-is, the shell parses them as redirections.
sshdev()  { __ssh_with_key ~/.ssh/<key-dev> "$@"; }
sshstag() { __ssh_with_key ~/.ssh/<key-stag> "$@"; }
sshprod() { __ssh_with_key ~/.ssh/<key-prod> "$@"; }
sshmum()  { __ssh_with_key ~/.ssh/<key-aws> "$@"; }

Each wrapper pins one key per environment & a default user, which makes it much harder to hit prod with a dev key by accident.
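
To see what a wrapper would run without opening a connection, a dry-run twin of the helper can print the command instead. A minimal sketch; ssh_with_key_preview & the example key path are hypothetical, & it only handles a single target argument:

```shell
# Hypothetical dry-run twin of __ssh_with_key: prints the ssh command instead of running it.
# "ubuntu" as the default user mirrors the helper above.
ssh_with_key_preview() {
  local key="$1" target="$2"
  case "$target" in
    *@*) ;;                          # user given explicitly, keep as-is
    *)   target="ubuntu@$target" ;;  # no user, fall back to the default
  esac
  printf 'ssh -i %s %s\n' "$key" "$target"
}

ssh_with_key_preview "$HOME/.ssh/example-dev-key" 10.0.0.5
ssh_with_key_preview "$HOME/.ssh/example-dev-key" ec2-user@10.0.0.5
```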

3. Incident mode: tmux war room

One command: incident. It creates a session, splits panes, & starts k9s + kubectl + a shell.

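incident() {
  local SESSION="incident"

  # Reattach if the session already exists instead of splitting it again
  if tmux has-session -t "$SESSION" 2>/dev/null; then
    tmux attach -t "$SESSION"
    return
  fi

  # Pane indexes below assume tmux's default pane-base-index of 0
  tmux new-session -d -s "$SESSION" -n INCIDENT
  tmux split-window -v -t "$SESSION"
  tmux split-window -h -t "${SESSION}:.1"
  tmux resize-pane -t "${SESSION}:.0" -y 20

  tmux send-keys -t "${SESSION}:.0" "k9s" C-m
  tmux send-keys -t "${SESSION}:.1" "kubectl get pods" C-m
  tmux send-keys -t "${SESSION}:.2" "echo 'Incident shell ready'" C-m

  tmux select-pane -t "${SESSION}:.2"
  tmux attach -t "$SESSION"
}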

Top: k9s. Bottom left: pod list. Bottom right: shell. No manual pane setup during an incident.

4. Kubernetes helpers

These are not in the default Oh My Zsh kubectl plugin — they’re custom helpers for debugging & cleanup.

Pods with node (pod name + node name):

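kpn() {
  # custom-columns avoids counting whitespace fields, which shift when RESTARTS shows "3 (5m ago)"
  kubectl get pods -o custom-columns='NAME:.metadata.name,NODE:.spec.nodeName' "$@"
}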

All crashing pods (restart count > 0). Current namespace by default; -A or --all-namespaces for all:

kcrash() {
  if [[ "$1" == "-A" ]] || [[ "$1" == "--all-namespaces" ]]; then
    kubectl get pods -A -o wide | awk 'NR==1 || $5+0>0'
  else
    kubectl get pods -o wide | awk 'NR==1 || $4+0>0'
  fi
}

Pods not ready (STATUS is anything other than Running or Completed; a Running pod whose containers are not all ready is not caught by this check). Add -A for all namespaces:

knotready() {
  if [[ "$1" == "-A" ]] || [[ "$1" == "--all-namespaces" ]]; then
    kubectl get pods -A -o wide | awk 'NR==1 || ($4!="Running" && $4!="Completed")'
  else
    kubectl get pods -o wide | awk 'NR==1 || ($3!="Running" && $3!="Completed")'
  fi
}
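
STATUS alone can miss a pod that is Running while its containers are not ready (READY shows 0/1). A minimal sketch of a READY-based filter, run here on a made-up listing so it needs no cluster; not_fully_ready is a hypothetical name & it assumes the column layout of kubectl get pods without -A:

```shell
# Hypothetical filter: keep the header plus rows whose READY column (ready/total) differs,
# ignoring Completed pods. Expects `kubectl get pods` output without -A on stdin.
not_fully_ready() {
  awk 'NR == 1 { print; next }
       $3 != "Completed" { split($2, r, "/"); if (r[1] != r[2]) print }'
}

# Made-up listing
pods='NAME    READY   STATUS      RESTARTS   AGE
api-1   1/1     Running     0          2d
api-2   0/1     Running     0          2d
job-1   0/1     Completed   0          1h'

printf '%s\n' "$pods" | not_fully_ready   # prints the header and the api-2 row
```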

Pods in bad states (ImagePullBackOff, CrashLoopBackOff, Error, Pending, ErrImagePull). -A for all namespaces:

kbad() {
  if [[ "$1" == "-A" ]] || [[ "$1" == "--all-namespaces" ]]; then
    kubectl get pods -A -o wide | awk 'NR==1 || $4~/ImagePullBackOff|CrashLoopBackOff|Error|Pending|ErrImagePull/'
  else
    kubectl get pods -o wide | awk 'NR==1 || $3~/ImagePullBackOff|CrashLoopBackOff|Error|Pending|ErrImagePull/'
  fi
}

Pods sorted by restart count (worst first). -A for all namespaces:

krestarts() {
  local wide
  if [[ "$1" == "-A" ]] || [[ "$1" == "--all-namespaces" ]]; then
    wide=$(kubectl get pods -A -o wide)
    echo "$wide" | head -1
    # No -t' ': kubectl pads columns with runs of spaces, so let sort split on whitespace
    echo "$wide" | tail -n +2 | sort -k5,5nr
  else
    wide=$(kubectl get pods -o wide)
    echo "$wide" | head -1
    echo "$wide" | tail -n +2 | sort -k4,4nr
  fi
}
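
The header-then-sort pattern can be checked on a made-up listing, including the "12 (2m ago)" form that newer kubectl versions print in RESTARTS. A minimal sketch; sort_by_restarts is a hypothetical helper that assumes the layout of kubectl get pods without -A:

```shell
# Hypothetical helper: sort a `kubectl get pods` listing (no -A) on stdin
# by the RESTARTS column, highest first, keeping the header on top.
sort_by_restarts() {
  local listing
  listing=$(cat)
  printf '%s\n' "$listing" | head -n 1
  printf '%s\n' "$listing" | tail -n +2 | sort -k4,4nr
}

# Made-up listing
pods='NAME    READY   STATUS             RESTARTS      AGE
api-1   1/1     Running            0             2d
api-2   0/1     CrashLoopBackOff   12 (2m ago)   2d
web-1   1/1     Running            3 (5h ago)    2d'

printf '%s\n' "$pods" | sort_by_restarts   # order after the header: api-2, web-1, api-1
```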

Logs from the previous container instance (handy after a crash). The Oh My Zsh kubectl plugin has kl (kubectl logs) but nothing for --previous:

klprev() {
  kubectl logs "$1" --previous "${@:2}"
}

Delete evicted pods in current namespace (or -A for all). Safe cleanup after node pressure:

kevict() {
  if [[ "$1" == "-A" ]] || [[ "$1" == "--all-namespaces" ]]; then
    kubectl get pods -A | awk 'NR>1 && $4~/Evicted/{print $1, $2}' | while read -r ns name; do kubectl delete pod -n "$ns" "$name"; done
  else
    kubectl get pods | awk 'NR>1 && $3~/Evicted/{print $1}' | while read -r name; do kubectl delete pod "$name"; done
  fi
}
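
Before deleting anything, it can help to print the plan first. A minimal dry-run sketch that reads a made-up kubectl get pods -A listing & prints the delete commands instead of running them; evicted_delete_plan is a hypothetical name:

```shell
# Hypothetical dry-run: read `kubectl get pods -A` output on stdin and print
# the delete commands for Evicted pods instead of running them.
evicted_delete_plan() {
  awk 'NR > 1 && $4 ~ /Evicted/ { printf "kubectl delete pod -n %s %s\n", $1, $2 }'
}

# Made-up listing
pods='NAMESPACE   NAME     READY   STATUS    RESTARTS   AGE
default     api-1    1/1     Running   0          2d
default     api-9    0/1     Evicted   0          3h
batch       job-7    0/1     Evicted   0          1h'

printf '%s\n' "$pods" | evicted_delete_plan
```

Against a real cluster the same filter would be fed from kubectl get pods -A instead of the sample variable.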

Even with all of this, I still use k9s a lot. k9s helps me see what’s happening; the shell is where I encode what needs to happen next & where I dig into things at a lower level.

5. AWS: ASG & EC2

EC2 by name tag:

ec2_find() {
  local NAME="$1"
  # Optional second argument overrides the region, as the usage text promises
  local REGION="${2:-${AWS_DEFAULT_REGION:-ap-south-1}}"

  if [[ -z "$NAME" ]]; then
    echo "Usage: ec2_find <name-substring> [region]"
    return 1
  fi

  aws ec2 describe-instances \
    --region "$REGION" \
    --filters "Name=tag:Name,Values=*${NAME}*" \
    --query 'Reservations[].Instances[].{
      ID: InstanceId,
      Name: Tags[?Key==`Name`]|[0].Value,
      State: State.Name,
      IP: PrivateIpAddress,
      AZ: Placement.AvailabilityZone
    }' \
    --output table
}

ASG instances (region from $AWS_DEFAULT_REGION, fallback ap-south-1):

asg_instances() {
  local ASG="$1"
  local REGION="${2:-${AWS_DEFAULT_REGION:-ap-south-1}}"

  if [[ -z "$ASG" ]]; then
    echo "Usage: asg_instances <asg-name> [region]"
    return 1
  fi

  aws ec2 describe-instances \
    --region "$REGION" \
    --filters "Name=tag:aws:autoscaling:groupName,Values=$ASG" \
    --query 'Reservations[].Instances[].{
      InstanceId: InstanceId,
      Name: Tags[?Key==`Name`]|[0].Value,
      PrivateIP: PrivateIpAddress,
      State: State.Name,
      LaunchTime: LaunchTime
    }' \
    --output table
}

ASG fuzzy (fzf):

asg_fuzzy() {
  local REGION="${AWS_DEFAULT_REGION:-ap-south-1}"
  local WATCH=""

  for arg in "$@"; do
    case "$arg" in
      --watch) WATCH=1 ;;
      --region=*) REGION="${arg#*=}" ;;
    esac
  done

  # Declare once, outside the loop: in zsh, repeating `local NAME` on a set variable prints it
  local ASG_LIST SELECTED idx
  while true; do
    ASG_LIST=$(aws autoscaling describe-auto-scaling-groups --region "$REGION" \
      --query 'AutoScalingGroups[].AutoScalingGroupName' --output text | tr '\t' '\n')

    [[ -z "$ASG_LIST" ]] && echo "No ASGs in $REGION" && return 1

    SELECTED=""
    if command -v fzf >/dev/null && [[ -t 1 ]]; then
      SELECTED=$(echo "$ASG_LIST" | fzf --prompt="ASG > " \
        --preview "aws autoscaling describe-auto-scaling-groups --region $REGION --auto-scaling-group-names {} \
          --query 'AutoScalingGroups[0].{Desired:DesiredCapacity,Min:MinSize,Max:MaxSize,Instances:length(Instances)}' --output text 2>/dev/null || echo {}")
    else
      echo "$ASG_LIST" | nl -w2 -s'. '
      # zsh's `read -p` reads from a coprocess, so print the prompt separately
      printf 'Select number: '
      read -r idx
      SELECTED=$(echo "$ASG_LIST" | sed -n "${idx}p")
    fi

    [[ -z "$SELECTED" ]] && return 0
    asg_instances "$SELECTED" "$REGION"

    [[ -z "$WATCH" ]] && return 0
    echo "Refreshing in 5s (Ctrl-C to exit)..."
    sleep 5
  done
}

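asg_fuzzy in my rc: fzf over ASG names, with a preview that shows desired/min/max/instance count. Picking an ASG prints its instances via asg_instances. With --watch, it then waits 5s & returns to the ASG list so I can pick again; Esc or Ctrl-C in fzf (an empty selection) exits, as does Ctrl-C during the wait. Same region default.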
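
The flag loop is easy to test on its own. A minimal sketch that only parses & prints; parse_asg_flags is a hypothetical name, & the region fallback mirrors the functions above:

```shell
# Hypothetical stand-alone version of the flag loop: prints the parsed values.
parse_asg_flags() {
  local region="${AWS_DEFAULT_REGION:-ap-south-1}" watch="no" arg
  for arg in "$@"; do
    case "$arg" in
      --watch)    watch="yes" ;;
      --region=*) region="${arg#*=}" ;;   # strip everything up to the first "="
    esac
  done
  printf 'region=%s watch=%s\n' "$region" "$watch"
}

parse_asg_flags --watch --region=us-east-1   # prints: region=us-east-1 watch=yes
```

Unknown arguments are silently ignored, the same as in asg_fuzzy.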

6. S3: edit in place

s3edit() {
  local S3PATH="$1"

  if [[ -z "$S3PATH" ]]; then
    echo "Usage: s3edit s3://bucket/key"
    return 1
  fi

  if [[ "$S3PATH" != s3://* ]]; then
    echo "Invalid S3 path. Must start with s3://"
    return 1
  fi

  local TMPFILE
  TMPFILE=$(mktemp /tmp/s3edit.XXXXXX)

  echo "Downloading $S3PATH ..."
  if ! aws s3 cp "$S3PATH" "$TMPFILE"; then
    echo "Download failed."
    rm -f "$TMPFILE"
    return 1
  fi

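  ${EDITOR:-vim} "$TMPFILE"

  echo "Uploading back to $S3PATH ..."
  # Keep the temp file if the upload fails so the edits are not lost
  if ! aws s3 cp "$TMPFILE" "$S3PATH"; then
    echo "Upload failed. Your edits are still in $TMPFILE"
    return 1
  fi

  rm -f "$TMPFILE"
  echo "Done."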
}

Download → edit with $EDITOR (vim fallback) → upload back. No manual copy/paste.
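
One possible refinement is to skip the upload when the editor exits without changing anything. A minimal sketch of the change check, assuming cmp is available; edited_and_changed is a hypothetical helper & is not part of s3edit above:

```shell
# Hypothetical helper: run an editor command on a file and report whether the content changed.
# Usage: edited_and_changed <file> <editor command...>
# Returns 0 if the file changed, 1 if it is identical to the pre-edit copy, 2 on setup errors.
edited_and_changed() {
  local file="$1" backup
  shift
  backup=$(mktemp) || return 2
  cp "$file" "$backup" || { rm -f "$backup"; return 2; }
  "$@" "$file"                      # the editor command, with the file appended
  if cmp -s "$file" "$backup"; then
    rm -f "$backup"
    return 1
  fi
  rm -f "$backup"
  return 0
}

# Possible use inside an edit-and-upload flow (not executed here):
#   edited_and_changed "$TMPFILE" vim && aws s3 cp "$TMPFILE" "$S3PATH"
```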

Design Philosophy Behind This Setup

Speed — Less typing. Faster execution.

Context Awareness — Git, K8s, AWS context visible at all times.

Incident Efficiency — Pre-built layouts & crash detection.

Reduced Human Error — Environment-based SSH keys, region defaults, IAM-based SSM.

Console Independence — Almost no dependency on AWS Console, Kubernetes Dashboard, or S3 UI. Everything happens in the terminal.

A Note on How I Use AI (More on This Soon)

Beyond the shell itself, I rely on AI for a large part of my day-to-day work - from coding & refactoring to incident RCA, log exploration, & pattern detection during outages. I also use AI for metrics-driven analysis across databases & applications by connecting AI agents directly to the observability & APM stack from the terminal, turning raw telemetry into actionable insights much faster than manual dashboards ever could.

That said, this deserves a deep dive of its own — how AI fits into a DevOps workflow, how agents reason over metrics & traces, & how this changes incident response. I've intentionally kept that for a separate post.