
Private APT Repository on Cloudflare Workers

Build a private APT repository on Cloudflare Workers with OIDC auth and Git-backed metadata.

The Problem

Every homelab needs package caching. Every production environment needs custom packages. Yet most of us update from public mirrors with no offline capability.

The real pain points:

  • Latency: 50MB download for every new node
  • Air-gapped: No internet = no packages
  • Custom tooling: Internal .debs need distribution

My setup solves all three: packages live in GitHub Releases (versioned storage), metadata builds in CI (on release), and the worker serves both.

Scale: ~10 machines pulling packages. 500 requests/day before caching kicks in.

All the repositories mentioned in this post are private. Code snippets are drawn directly from the actual implementation to illustrate the design.

Public mirrors are great, but sometimes you need your own. Maybe it’s:

  • Custom-built packages for internal tooling
  • Pinned versions for stability
  • Urgent security patches that must reach every machine at once
  • Air-gapped environments that can’t reach the internet

I faced all of these when building my homelab: I wanted to distribute custom packages to all my Proxmox nodes and GitHub Actions runners without exposing them to the public internet.

The Architecture

Here’s the high-level picture:

  flowchart TB
    subgraph Producers
        Dev[Developer]
        Release[GitHub Release]
    end

    subgraph CI
        Workflow[rebuild-index]
        Branch[apt-metadata branch]
    end

    subgraph Runtime
        Auth[Auth Layer]
        Meta[Metadata API]
        Pkg[Package API]
    end

    subgraph Consumers
        Nodes[Proxmox Nodes]
        Actions[GitHub Actions]
        Apt[apt client]
    end

    Dev --> Release
    Release --> Workflow
    Workflow --> Branch
    Branch --> Meta
    Release --> Pkg
    Nodes --> Auth
    Actions --> Auth
    Auth --> Meta
    Auth --> Pkg

The key insight: static metadata in Git, dynamic packages from releases.

How It Works: The Full Story

1. Package Publishing (Manual + Automated)

When you release a .deb package, it goes to GitHub Releases:

# Upload your package
gh release upload v1.2.3 my-package_1.2.3_amd64.deb

But APT needs more than just the .deb file — it needs metadata. That’s where the rebuild workflow comes in.

2. Metadata Generation (The CI Pipeline)

A GitHub Actions workflow (rebuild-index) listens for releases and generates the metadata:

# scripts/build_index.py (from actual implementation)
import gzip

async def build_metadata(releases: list[Release]):
    """Build complete APT metadata for all releases."""

    packages = []
    for release in releases:
        for asset in release.assets:
            if asset.name.endswith('.deb'):
                pkg = parse_deb_control(asset)
                packages.append(pkg)

    # Generate Packages.gz (compressed package index)
    packages_content = "\n\n".join(p.as_apt_control() for p in packages)
    packages_gz = gzip.compress(packages_content.encode())

    # Generate the Release file (checksums over the package index)
    release_file = generate_release(packages_gz, len(packages))

    # Sign with GPG: InRelease is the clearsigned Release file,
    # Release.gpg is a detached signature over Release
    inrelease = gpg_clearsign(release_file)
    release_gpg = gpg_detach_sign(release_file)

    return {
        "Packages.gz": packages_gz,
        "Release": release_file,
        "InRelease": inrelease,
        "Release.gpg": release_gpg
    }

This runs in CI and pushes the generated metadata to a dedicated apt-metadata branch. The branch is never checked out locally — it’s just the persistence layer for the index.
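The snippet above leans on parse_deb_control and as_apt_control without showing them. Here is a minimal sketch of the stanza-emitting side, assuming the standard Debian Packages-index fields; DebPackage, its fields, and the pool path layout are illustrative assumptions, not the actual implementation:

```python
# Hypothetical sketch: turning parsed control fields into a Packages stanza.
# Field names follow the Debian Packages-index format.
import hashlib
from dataclasses import dataclass

@dataclass
class DebPackage:
    name: str
    version: str
    architecture: str
    data: bytes  # raw .deb bytes, used for Size/SHA256

    def as_apt_control(self) -> str:
        initial = self.name[0]
        return "\n".join([
            f"Package: {self.name}",
            f"Version: {self.version}",
            f"Architecture: {self.architecture}",
            # Filename must match the path the worker serves under /pool/
            f"Filename: pool/main/{initial}/{self.name}_{self.version}_{self.architecture}.deb",
            f"Size: {len(self.data)}",
            f"SHA256: {hashlib.sha256(self.data).hexdigest()}",
        ])

pkg = DebPackage("awesome", "1.0.0", "amd64", b"fake-deb-bytes")
print(pkg.as_apt_control())
```

The Filename field is the contract between the CI side and the worker: apt requests exactly that path, so the index and the /pool/ route have to agree on the layout.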

3. Runtime: The Cloudflare Worker

The worker handles three types of requests:

# src/entry.py - Main request router
from urllib.parse import urlparse

async def on_fetch(request, env):
    path = urlparse(request.url).path

    if path == "/public.key":
        return serve_public_key(env)

    if path.startswith("/dists/"):
        return await serve_metadata(request, path, env)

    if path.startswith("/pool/"):
        return await serve_package(request, path, env)

    return Response.new("Not Found", status=404)

Authentication: OIDC + Basic Auth

This is where the design got interesting. I needed two authentication modes:

GitHub Actions OIDC (Preferred)

For GitHub Actions runners and other automated systems:

# src/auth.py
async def validate_oidc_token(token: str, env) -> bool:
    """Validate GitHub Actions OIDC token."""

    # Fetch JWKS from GitHub
    jwks = await fetch_jwks("https://token.actions.githubusercontent.com/.well-known/jwks")

    # Verify signature and claims
    claims = jwt.decode(token, jwks, algorithms=["RS256"], audience=env.OIDC_AUDIENCE)

    # Check the repository and owner claims
    # (GitHub's tokens carry "repository" and "repository_owner")
    repo = claims.get("repository", "")
    owner = claims.get("repository_owner", "")

    return owner in env.ALLOWED_ORGS and repo in env.ALLOWED_REPOS

The trick: put the OIDC token in the password field of Basic Auth:

# /etc/apt/auth.conf (on runners, netrc format)
machine apt.example.com
login github-action
password <oidc_token>

This works because apt sends Basic Auth headers, and the worker detects login == "github-action" to trigger OIDC validation instead of regular Basic Auth.
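The sentinel-login dispatch described above can be sketched like this; the exact shape of parse_basic_auth in the real worker is an assumption:

```python
# Decode a Basic Auth header and route to OIDC validation when the login
# matches the "github-action" sentinel; anything else falls back to Basic Auth.
import base64

OIDC_SENTINEL = "github-action"

def parse_basic_auth(auth_header: str) -> tuple[str, str]:
    """Decode 'Basic base64(user:pass)' into (user, pass)."""
    encoded = auth_header.split(" ", 1)[1]
    user, _, password = base64.b64decode(encoded).decode().partition(":")
    return user, password

def choose_auth_mode(auth_header: str) -> str:
    user, _secret = parse_basic_auth(auth_header)
    return "oidc" if user == OIDC_SENTINEL else "basic"

header = "Basic " + base64.b64encode(b"github-action:eyJhbGciOi...").decode()
print(choose_auth_mode(header))  # → oidc
```

Note that partition(":") splits on the first colon only, which matters because JWT payloads themselves never contain colons but arbitrary passwords might.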

Basic Auth (Fallback)

For developer machines that can’t use OIDC:

import secrets

async def validate_basic_auth(auth_header: str, env) -> bool:
    """Validate username/password from Authorization header."""
    credentials = parse_basic_auth(auth_header)
    # Constant-time comparison avoids leaking credentials via timing
    return (secrets.compare_digest(credentials.username, env.APT_USER) and
            secrets.compare_digest(credentials.password, env.APT_PASS))

The Package Serving Logic

Here’s how the worker fetches packages from GitHub:

# src/packages.py
async def serve_package(path: str, env, github_token: str):
    """Serve .deb file from GitHub Releases."""

    # Parse: /pool/main/a/awesome_1.0.0_amd64.deb
    # Extract: owner, repo, version, filename

    # Fetch from GitHub Release Assets
    release_url = f"https://api.github.com/repos/{owner}/{repo}/releases/tags/v{version}"
    release = await github_fetch(release_url, token=github_token)

    # Find the matching asset, or 404 if the release doesn't carry it
    asset = next((a for a in release.assets if a.name == filename), None)
    if asset is None:
        return Response.new("Not Found", status=404)

    # Stream directly to client
    return Response.new(asset.body, headers={
        "Content-Type": "application/x-debian-package",
        "Content-Length": str(asset.size)
    })

The worker acts as a proxy — it doesn’t store packages, just streams them from GitHub.

Design Considerations

Why Git Branch for Metadata?

Using a Git branch (apt-metadata) for metadata storage means:

  • Versioning — every index update is a commit
  • Audit trail — who changed what, when
  • No external storage — no database, no R2, just Git
  • Easy rollback — git revert to go back

Why Not Store Packages in Git?

Git LFS could work, but:

  • Release assets are already in GitHub Releases
  • Streaming from Releases avoids git clone overhead
  • Separates metadata from binary storage

Metadata Freshness

The metadata is generated on release. There’s no on-the-fly generation:

Developer releases v1.2.3 → CI runs → Metadata pushed to apt-metadata branch
                                          ↓
apt client requests → Worker fetches from apt-metadata branch

This trades freshness for simplicity. If you need real-time, you’d need a different architecture.
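serve_metadata is referenced by the router but never shown. One plausible sketch, assuming the worker reads files straight off the apt-metadata branch via GitHub’s raw content endpoint (owner and repo names are placeholders):

```python
# Resolve an APT metadata request path to a file on the apt-metadata branch.
# raw.githubusercontent.com URLs take the shape /<owner>/<repo>/<ref>/<path>.
RAW_BASE = "https://raw.githubusercontent.com"

def metadata_url(owner: str, repo: str, path: str,
                 branch: str = "apt-metadata") -> str:
    """Map e.g. /dists/stable/Release to its raw file URL on the branch."""
    return f"{RAW_BASE}/{owner}/{repo}/{branch}{path}"

print(metadata_url("your-org", "apt-repo", "/dists/stable/Release"))
# → https://raw.githubusercontent.com/your-org/apt-repo/apt-metadata/dists/stable/Release
```

For a private repo the worker would attach its GitHub token to this fetch; the URL mapping itself stays the same.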

Security Model

  sequenceDiagram
    participant APT as apt client
    participant Worker as Cloudflare Worker
    participant GitHub as GitHub API
    
    APT->>Worker: GET /public.key
    Worker-->>APT: GPG public key
    
    APT->>Worker: GET /dists/stable/Release (Basic Auth)
    alt GitHub Actions OIDC
        Worker->>Worker: Validate OIDC token
        Worker->>GitHub: Fetch metadata from apt-metadata
        Worker-->>APT: APT metadata
    else Basic Auth
        Worker->>Worker: Validate username/password
        Worker->>GitHub: Fetch metadata from apt-metadata
        Worker-->>APT: APT metadata
    end
    
    APT->>Worker: GET /pool/main/a/awesome_1.2.3.deb (Auth)
    Worker->>GitHub: Stream .deb from Release
    Worker-->>APT: .deb package

The client must authenticate for both metadata and package downloads. The GPG key is public and unauthenticated.

Constraints

| Aspect    | Limit                   |
|-----------|-------------------------|
| CPU time  | 30 s per request        |
| Memory    | 128 MB                  |
| Bandwidth | Edge network to GitHub  |
| Auth rate | GitHub API limits apply |

For high-volume scenarios, add caching headers. The worker already caches GitHub tokens — package caching could be added similarly.
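The token cache mentioned above can be as simple as an in-memory TTL map. A sketch of the idea, with the caveat that Workers isolates are recycled, so per-isolate memory is best-effort, not a durability guarantee:

```python
# Minimal in-memory TTL cache for short-lived values like GitHub tokens.
import time

class TTLCache:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires, value = entry
        if time.monotonic() > expires:
            del self._store[key]  # lazily evict stale entries on read
            return None
        return value

    def set(self, key: str, value) -> None:
        self._store[key] = (time.monotonic() + self.ttl, value)

cache = TTLCache(ttl_seconds=300)
cache.set("github-token", "ghs_...")
print(cache.get("github-token"))  # → ghs_...
```

The same structure would work for small metadata files; for whole .deb packages you would rely on HTTP caching headers instead of isolate memory.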

On-Prem Deployment

Just like the Terraform registry, this runs on workerd:

# docker-compose.yml
services:
  apt-worker:
    image: cloudflare/workerd:latest
    ports:
      - "8787:8787"
    volumes:
      - ./config.workerd:/etc/workerd/config.capnp:ro
    environment:
      - APT_USER=admin
      - APT_PASS_FILE=/run/secrets/apt_password

# wrangler.toml for on-prem
name = "apt-repository"
main = "src/entry.py"

[vars]
GITHUB_OWNER = "your-org"
ALLOWED_ORGS = ["your-org"]

# Secrets are not declared in wrangler.toml; set them with
# `wrangler secret put APT_USER` and `wrangler secret put APT_PASS`

Usage

# Add the repository
echo "deb [trusted=yes] https://apt.example.com stable main" | \
    sudo tee /etc/apt/sources.list.d/your-repo.list

# Add GPG key (unauthenticated)
curl -fsSL https://apt.example.com/public.key | \
    sudo gpg --dearmor -o /etc/apt/trusted.gpg.d/your-repo.gpg

# Update and install
sudo apt update
sudo apt install my-internal-tool

The [trusted=yes] flag is needed here because we’re self-signing and skipping verification. In production, you’d drop the flag, store the key outside trusted.gpg.d (e.g. /etc/apt/keyrings/your-repo.gpg), and reference it with signed-by= in the sources line.

What Most People Get Wrong

  1. “Metadata refreshes on every request” — No. Metadata is generated on release, stored in Git. Fresh on release, stale until next release.

  2. “Packages live in the worker” — They’re in GitHub Releases. The worker proxies them. No storage cost at the edge.

  3. “OIDC is more complex than API tokens” — For CI systems, OIDC tokens are ephemeral and rotate automatically. Fewer secrets to manage.

When to Use / When NOT to Use

| Use Private APT          | Use Public Mirror          |
|--------------------------|----------------------------|
| Air-gapped environments  | Internet-connected systems |
| Custom packages          | Standard OS packages       |
| Version pinning required | Rolling releases OK        |

What’s Next

Both the Terraform registry and APT repository share the same architectural DNA:

  • Serverless on Cloudflare
  • Git-backed persistence
  • Optional on-prem via workerd
  • No external databases or storage