Migrating from Nginx Proxy Manager to Traefik: A Journey Through Configuration Hell


After months of using Nginx Proxy Manager (NPM) to manage my homelab's reverse proxy setup, I decided it was time to level up. Don't get me wrong—NPM is fantastic for getting started quickly with its user-friendly web interface. But as my infrastructure grew, I found myself craving something more... code-based. Something that could be version-controlled, automated, and deployed without clicking through a web UI.

Enter Traefik.

Why Migrate?

Before diving into the technical details, let me explain why I chose Traefik over NPM:

Infrastructure as Code: Everything lives in Git. No more manual configuration through a web interface that can't be easily replicated or rolled back.

Automatic Service Discovery: Traefik automatically discovers new Docker containers and configures routes based on labels. No manual proxy host creation needed.

Native Docker Integration: Labels directly in docker-compose.yaml files make the configuration self-documenting and colocated with service definitions.

Automated SSL/TLS: Let's Encrypt certificate management with DNS challenges—perfect for internal services that aren't publicly accessible.

Better Monitoring: Built-in dashboard and metrics for observability.

The Starting Point

My infrastructure was organized into two main stacks:

Prism Stack (pixel/prism): The network/proxy layer running NPM

  • Nginx Proxy Manager on ports 80, 443, and 81 (admin UI)
  • Manually configured proxy hosts via web interface
  • Let's Encrypt certificates managed through NPM

Contentarium Stack (pixel/contentarium): Media server services

  • Sonarr, Radarr, Plex (all using network_mode: host)
  • Services accessed via domain name pointing to a local IP

All deployed automatically via GitHub Actions workflows using self-hosted runners. Clean, automated Infrastructure as Code... except for that NPM web UI part.

The Migration Plan

The goal was straightforward:

  1. Replace NPM with Traefik in the prism stack
  2. Add Traefik labels to all services in the contentarium stack
  3. Switch from network_mode: host to Docker bridge networks
  4. Migrate from manual proxy configuration to automatic service discovery
  5. Shorten domain names from *.home.example.com to *.h.example.com (because brevity)

Implementation

Step 1: Traefik Configuration

First, I created a static configuration file (traefik.yml):

# API and Dashboard
api:
  dashboard: true
  insecure: false

# Entry Points
entryPoints:
  web:
    address: ":80"
    http:
      redirections:
        entryPoint:
          to: websecure
          scheme: https
          permanent: true

  websecure:
    address: ":443"
    http:
      tls:
        certResolver: letsencrypt

# Docker Provider
providers:
  docker:
    endpoint: "unix:///var/run/docker.sock"
    exposedByDefault: false
    network: proxy-network

# Let's Encrypt with DNS Challenge
certificatesResolvers:
  letsencrypt:
    acme:
      email: your-email@example.com
      storage: /acme.json
      dnsChallenge:
        provider: cloudflare
        resolvers:
          - "1.1.1.1:53"
          - "8.8.8.8:53"

Step 2: Docker Compose for Traefik

Updated stacks/pixel/prism/compose.yaml:

services:
  traefik:
    image: 'traefik:v3.2'
    container_name: traefik
    restart: unless-stopped
    ports:
      - '80:80'
      - '443:443'
      - '8080:8080'  # Unused while api.insecure is false; the dashboard is served by the router below
    environment:
      - CF_DNS_API_TOKEN=${CF_DNS_API_TOKEN}
      - CF_API_EMAIL=${CF_API_EMAIL}
      - TZ=${TZ}
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ${ROOT}/config/traefik/traefik.yml:/etc/traefik/traefik.yml:ro
      - ${ROOT}/config/traefik/acme.json:/acme.json
    networks:
      - proxy-network
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.dashboard.rule=Host(`proxy.example.com`)"
      - "traefik.http.routers.dashboard.entrypoints=websecure"
      - "traefik.http.routers.dashboard.tls.certresolver=letsencrypt"
      - "traefik.http.routers.dashboard.service=api@internal"

networks:
  proxy-network:
    external: true
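
The acme.json bind mount deserves a little preparation before the first start: the file has to exist on the host, and Traefik refuses a certificate store with permissions looser than 600. A minimal sketch of that preparation, assuming a POSIX shell on the server; the helper name ensure_acme_store is hypothetical:

```shell
# Hypothetical helper; the function name is illustrative, not part of Traefik.
# Creates the ACME storage file if needed and restricts it to mode 600.
ensure_acme_store() {
  path="$1"

  # A missing bind-mount source would be created as a directory by Docker,
  # so create the parent directory and an empty file up front.
  mkdir -p "$(dirname "$path")" || return 1
  [ -f "$path" ] || : > "$path" || return 1

  # Traefik rejects a certificate store that other users can read.
  chmod 600 "$path"
}
```

It could be called as ensure_acme_store "${ROOT}/config/traefik/acme.json" before docker compose up -d. An existing file keeps its contents; only the mode is tightened.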

Step 3: Service Labels

For each service in the contentarium stack, I added Traefik labels. Here's an example with Overseerr:

overseerr:
  container_name: overseerr
  image: lscr.io/linuxserver/overseerr:latest
  environment:
    - PUID=${PUID}
    - PGID=${PGID}
    - TZ=${TZ}
  volumes:
    - ${ROOT}/config/overseerr:/config
  ports:
    - 5055:5055
  restart: unless-stopped
  networks:
    - proxy-network
  labels:
    - "traefik.enable=true"
    - "traefik.http.routers.overseerr.rule=Host(`overseerr.example.com`)"
    - "traefik.http.routers.overseerr.entrypoints=websecure"
    - "traefik.http.routers.overseerr.tls.certresolver=letsencrypt"
    - "traefik.http.services.overseerr.loadbalancer.server.port=5055"

The Problems (Oh Boy, There Were Problems)

Problem #1: Docker Network Configuration

Error:

network proxy-network was found but has incorrect label com.docker.compose.network set to ""

Root Cause: I had manually created the proxy-network using docker network create, but Docker Compose expected to manage it.

Solution: Mark the network as external: true in the compose file:

networks:
  proxy-network:
    external: true
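
With external: true, Compose no longer creates the network, so it has to exist before the stack starts. A minimal sketch of an idempotent pre-deploy step, assuming the docker CLI is available on the runner; the function name ensure_network is hypothetical:

```shell
# Hypothetical helper: create a Docker network only if it does not exist yet.
ensure_network() {
  name="$1"

  # `docker network inspect` exits non-zero when the network is missing,
  # which is the only case where the network gets created.
  docker network inspect "$name" >/dev/null 2>&1 || docker network create "$name"
}
```

Running ensure_network proxy-network before every deployment is safe, because the create call is skipped when the network is already there.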

Problem #2: Port Conflict

Error:

Bind for 0.0.0.0:443 failed: port is already allocated

Root Cause: NPM was still running and holding ports 80 and 443. 

Solution: Manually stop NPM before deploying Traefik:

docker stop npm && docker rm npm

(Note to self: Add this to the deployment workflow for future migrations)
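
One way to act on that note, sketched under the assumption that the old container is named npm as in the command above; remove_container_if_present is a hypothetical helper name:

```shell
# Hypothetical helper: stop and remove a container, but only if it exists,
# so the deployment step does not fail on hosts where it is already gone.
remove_container_if_present() {
  name="$1"

  # List all containers (running or stopped) by name and look for an exact match.
  if docker ps -a --format '{{.Names}}' | grep -qx "$name"; then
    docker stop "$name" && docker rm "$name"
  fi
}
```

Calling remove_container_if_present npm ahead of docker compose up -d frees ports 80 and 443 on the first Traefik deployment and does nothing on later runs.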


Problem #3: Configuration File Not Found

Error:

error="command traefik error: read /etc/traefik/traefik.yml: is a directory"

Root Cause: The traefik.yml file existed in my Git repository but wasn't on the server. The volume mount created a directory instead of mounting the file.

Solution: Remove the empty directory Docker created at the mount path, then copy the configuration file to the server manually.

Lesson learned: Ensure configuration files are deployed before container startup. This could be automated in the GitHub Actions workflow in the future.
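
A pre-flight check along these lines could catch the problem before the container starts. This is only a sketch: the function name ensure_config_file and both path arguments are hypothetical, and a directory created by the Docker daemon is usually owned by root, so the removal may need elevated privileges.

```shell
# Hypothetical pre-flight check; the function name and paths are illustrative.
# Ensures a bind-mount source is a regular file before `docker compose up`.
ensure_config_file() {
  src="$1"   # file from the Git checkout
  dest="$2"  # host path that the container bind-mounts

  # Docker creates an empty directory when the bind-mount source is missing.
  # Remove it only if it is empty; rmdir refuses otherwise.
  if [ -d "$dest" ]; then
    rmdir "$dest" || return 1
  fi

  mkdir -p "$(dirname "$dest")" && cp "$src" "$dest"
}
```

In the workflow it might run as ensure_config_file config/traefik/traefik.yml "${ROOT}/config/traefik/traefik.yml", where the first path stands in for wherever the file lives in the repository.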


Problem #4: SSL Certificate Generation Failed

Error:

Unable to obtain ACME certificate for domains 
error="...unable to parse email address"

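Root Cause: In traefik.yml, I tried using email: "${CF_API_EMAIL}", expecting environment variable interpolation. Traefik does not expand variables in its static configuration file, so the literal placeholder was used as the email address.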

Solution: Hardcode the email directly in traefik.yml:

certificatesResolvers:
  letsencrypt:
    acme:
      email: your-email@example.com  # Hardcoded, not ${CF_API_EMAIL}
      storage: /acme.json
      dnsChallenge:
        provider: cloudflare

The Cloudflare API token can still be passed as an environment variable to the container (which Traefik reads at runtime).
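
If keeping the address out of the committed file matters, one option is to render traefik.yml from a template at deploy time. The sketch below assumes a hypothetical template file that contains the literal placeholder ${CF_API_EMAIL} and uses sed for the substitution; it is a deployment-side workaround, not a Traefik feature, and render_static_config is an illustrative name.

```shell
# Hypothetical helper: fill the ${CF_API_EMAIL} placeholder in a template
# and write the result to the path that gets mounted into the container.
render_static_config() {
  template="$1"
  output="$2"

  # Fail early instead of writing an empty email into the config.
  if [ -z "${CF_API_EMAIL:-}" ]; then
    echo "CF_API_EMAIL is not set" >&2
    return 1
  fi

  # The single-quoted part matches the literal text ${CF_API_EMAIL};
  # the value is spliced in between the two quoted sections.
  sed 's|\${CF_API_EMAIL}|'"$CF_API_EMAIL"'|g' "$template" > "$output"
}
```

An address containing | or & would need escaping before being handed to sed, which this sketch does not attempt.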


The Results

After resolving these issues, everything worked beautifully:

✅ All services accessible via *.example.com with valid Let's Encrypt certificates

✅ Traefik dashboard running on proxy.example.com 

✅ Automatic service discovery: adding a new service is as simple as adding labels

✅ Full Infrastructure as Code: the entire proxy configuration is versioned in Git

✅ Zero-downtime certificate renewals

✅ Clean, readable configuration colocated with service definitions

Key Takeaways

1. Environment Variables in Traefik: The static configuration file (traefik.yml) doesn't support ${VAR} interpolation. Labels in a compose file can use variables only because Docker Compose substitutes them before Traefik reads the labels.

2. Docker Network Management: Be consistent. Either let Docker Compose manage networks (external: false) or manage them manually (external: true). Don't mix approaches.

3. File vs Directory Mounts: When mounting configuration files, ensure they exist on the host before starting containers. Otherwise, Docker creates a directory instead.

4. Port Conflicts: Always check for port conflicts before deploying. In this case, stopping the old proxy (NPM) first was essential.

5. Infrastructure as Code Benefits: Despite the migration hurdles, having everything in Git makes rollbacks trivial:

git revert <commit-hash>
git push origin main

GitHub Actions automatically redeploys the previous working configuration.

Would I Do It Again?

Absolutely. The initial migration took a few hours (mostly troubleshooting), but the long-term benefits are worth it:

  • No more manual configuration through a web UI
  • Version-controlled infrastructure with full audit history
  • Automatic service discovery saves time when adding new services
  • Better monitoring with the built-in dashboard
  • Consistent deployment through GitHub Actions

If you're running a homelab with multiple services and already using Docker Compose, Traefik is a natural fit. The learning curve is steeper than NPM, but the payoff in automation and maintainability is significant.

Resources