Build disposable cloud development environments with Docker, automatic TTL cleanup, and orchestration. Complete guide to spawning and destroying workstations on demand.
The Rise of Ephemeral Development
Gartner predicts that by 2026, 60% of cloud workloads will be built and deployed using cloud development environments. The era of "works on my machine" is ending.
Ephemeral workstations—disposable development environments that spin up instantly and destroy themselves—are transforming how teams develop software.
Companies like Spotify, Bloomberg, and Anthropic report up to 50% productivity increases after adopting ephemeral development environments.
What You'll Build
In this tutorial, we'll build a complete ephemeral workstation system from scratch. By the end, you'll have:
- Instant provisioning via Docker containers
- Automatic destruction after 1 hour (configurable TTL)
- WebSocket terminal access from the browser
- Resource limits for safe multi-tenancy
- Session persistence options
Prerequisites: Basic knowledge of Docker, TypeScript, and Node.js. You should have Docker installed on your development machine.
Architecture Overview
Before we dive into code, let's understand how all the pieces fit together:
How it works:
- User requests a workstation → The Web UI sends an authenticated request to our API
- Orchestrator spawns a container → Docker creates an isolated container with dev tools
- WebSocket bridge connects → xterm.js in the browser connects to the container's shell
- TTL timer starts → A cleanup job tracks when the container should be destroyed
- Auto-destruction → When TTL expires, the container is gracefully stopped and removed
Part 1: The Workstation Container
First, we need to create a Docker image that contains all the development tools your team needs. This image will be the foundation for every ephemeral workstation.
Understanding the Dockerfile
```dockerfile
# Dockerfile.workstation
FROM ubuntu:24.04

# Avoid interactive prompts during package installation
ENV DEBIAN_FRONTEND=noninteractive

# Install essential development tools
RUN apt-get update && apt-get install -y \
    curl \
    git \
    vim \
    neovim \
    tmux \
    zsh \
    build-essential \
    python3 \
    python3-pip \
    sudo \
    openssh-server \
    && rm -rf /var/lib/apt/lists/*

# Install Node.js
RUN curl -fsSL https://deb.nodesource.com/setup_20.x | bash - \
    && apt-get install -y nodejs

# Install Docker CLI (for Docker-in-Docker workflows)
RUN curl -fsSL https://get.docker.com | sh

# Create non-root user
RUN useradd -m -s /bin/zsh developer \
    && echo "developer ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers

# Install Oh My Zsh for the user
USER developer
WORKDIR /home/developer
RUN sh -c "$(curl -fsSL https://raw.github.com/ohmyzsh/ohmyzsh/master/tools/install.sh)" "" --unattended

# Switch back to root for entrypoint
USER root

# Copy entrypoint script
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh

EXPOSE 22 3000-3999

ENTRYPOINT ["/entrypoint.sh"]
```

Let's break down what each section does:
| Section | Purpose |
|---|---|
| `FROM ubuntu:24.04` | Uses Ubuntu as our base—stable and well-supported |
| `DEBIAN_FRONTEND=noninteractive` | Prevents apt from asking questions during build |
| Development tools | Installs git, vim, tmux, zsh—customize for your team |
| Node.js | Adds a JavaScript/TypeScript runtime |
| Docker CLI | Enables Docker-in-Docker for container workflows |
| Non-root user | Creates the `developer` user for security (never run as root!) |
| Oh My Zsh | Better shell experience via themes and plugins |
| Exposed ports | SSH (22) and common dev ports (3000-3999) |
The Entrypoint Script
The entrypoint runs when the container starts. It sets up the environment and keeps the container alive:
```bash
#!/bin/bash
# entrypoint.sh

# Start SSH server for remote connections
service ssh start

# Set up workspace directory with correct permissions
mkdir -p /home/developer/workspace
chown developer:developer /home/developer/workspace

# Keep container running indefinitely
# This allows the container to stay alive while we connect via exec
exec tail -f /dev/null
```

Why `tail -f /dev/null`? Docker containers exit when their main process exits. By running `tail -f /dev/null`, we keep a process running that does nothing but prevents the container from stopping. This lets us `docker exec` into it whenever we want.
Build and Test the Image
```bash
# Build the workstation image
docker build -t workstation:latest -f Dockerfile.workstation .

# Test it manually
docker run -d --name test-workstation workstation:latest
docker exec -it test-workstation su - developer
# You should now be in a zsh shell as the developer user

# Clean up
docker rm -f test-workstation
```

Part 2: Container Orchestration
The orchestrator is the brain of our system. It manages the complete lifecycle of workstations—creating them, tracking them, and destroying them when their time is up.
Setting Up the Project
```bash
# Initialize a new Node.js project
mkdir workstation-orchestrator && cd workstation-orchestrator
npm init -y
npm install dockerode typescript @types/node
npx tsc --init
```

Understanding the Orchestrator
Let's build the orchestrator step by step:
```typescript
// src/orchestrator.ts
import Docker from 'dockerode';
import { randomUUID } from 'crypto';

// Initialize Docker client - connects to local Docker daemon
const docker = new Docker();
```

What is `dockerode`? It's a Node.js library that wraps the Docker API, letting us create, manage, and destroy containers programmatically.
Defining Our Data Types
```typescript
// Configuration for creating a new workstation
interface WorkstationConfig {
  userId: string;      // Who owns this workstation
  ttlMinutes: number;  // How long before auto-destruction (e.g., 60)
  memoryMB: number;    // RAM limit (e.g., 2048 for 2GB)
  cpuCores: number;    // CPU limit (e.g., 1.0)
}

// Represents a running workstation
interface Workstation {
  id: string;           // Our unique ID (UUID)
  containerId: string;  // Docker's container ID
  userId: string;       // Owner
  createdAt: Date;      // When it was spawned
  expiresAt: Date;      // When it will be destroyed
  status: 'starting' | 'running' | 'stopping' | 'stopped';
}

// In-memory store for tracking workstations
// IMPORTANT: Use Redis or a database in production!
const workstations = new Map<string, Workstation>();
```

Why track workstations ourselves? Docker tracks containers, but we need additional metadata like TTL expiration, user ownership, and our custom status. This lets us build features like "list my workstations" or "time remaining until destruction."
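As a quick illustration of what this metadata enables, here is a small helper for the "time remaining until destruction" feature. It is not part of the tutorial's code—just a sketch over the same `expiresAt` field:

```typescript
interface WorkstationLike {
  expiresAt: Date;  // Same field as the Workstation interface above
}

// Minutes left before a workstation is destroyed, floored at zero for expired ones
function minutesRemaining(w: WorkstationLike, now: Date = new Date()): number {
  const ms = w.expiresAt.getTime() - now.getTime();
  return Math.max(0, Math.ceil(ms / 60_000));
}

// Example: a workstation expiring 30 minutes from a fixed "now"
const fixedNow = new Date('2025-01-01T12:00:00Z');
const example = { expiresAt: new Date('2025-01-01T12:30:00Z') };
console.log(minutesRemaining(example, fixedNow)); // 30
```

A UI could call this on each poll to render a countdown next to every workstation in the user's list.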
The Spawn Function
This is the core function that creates a new workstation:
```typescript
export async function spawnWorkstation(
  config: WorkstationConfig
): Promise<Workstation> {
  // Generate a unique ID for this workstation
  const id = randomUUID();

  // Calculate when this workstation should be destroyed
  const expiresAt = new Date(Date.now() + config.ttlMinutes * 60 * 1000);

  // Create the Docker container with security and resource limits
  const container = await docker.createContainer({
    Image: 'workstation:latest',
    name: `workstation-${id}`,

    HostConfig: {
      // Memory limit in bytes
      Memory: config.memoryMB * 1024 * 1024,

      // CPU limit in nanoseconds (1 core = 1e9 nanoseconds)
      NanoCpus: config.cpuCores * 1e9,

      // Auto-remove container when it stops (cleanup)
      AutoRemove: true,

      // Use bridge networking for isolation
      NetworkMode: 'bridge',

      // SECURITY: Drop all Linux capabilities by default
      CapDrop: ['ALL'],

      // Only add back the minimum required capabilities
      CapAdd: ['CHOWN', 'SETUID', 'SETGID'],

      // Prevent privilege escalation attacks
      SecurityOpt: ['no-new-privileges'],
    },

    // Labels for querying and cleanup
    Labels: {
      'workstation.id': id,
      'workstation.user': config.userId,
      'workstation.expires': expiresAt.toISOString(),
    },

    // Environment variables available inside the container
    Env: [
      `WORKSTATION_ID=${id}`,
      `USER_ID=${config.userId}`,
    ],
  });

  // Start the container
  await container.start();

  // Create our workstation record
  const workstation: Workstation = {
    id,
    containerId: container.id,
    userId: config.userId,
    createdAt: new Date(),
    expiresAt,
    status: 'running',
  };

  // Track it in our store
  workstations.set(id, workstation);

  // Schedule automatic destruction
  scheduleDestruction(id, config.ttlMinutes);

  return workstation;
}
```

Security Deep Dive:
| Setting | What It Does | Why It Matters |
|---|---|---|
| `CapDrop: ['ALL']` | Removes all Linux capabilities | Prevents the container from doing privileged operations |
| `CapAdd: ['CHOWN', 'SETUID', 'SETGID']` | Adds back only what's needed | Allows changing file ownership and switching users |
| `no-new-privileges` | Prevents privilege escalation | Even if code runs a setuid binary, it can't gain root |
| `AutoRemove: true` | Deletes container on stop | No orphaned containers wasting disk space |
The Destroy Function
When TTL expires or a user requests it, we need to gracefully destroy the workstation:
```typescript
export async function destroyWorkstation(id: string): Promise<void> {
  const workstation = workstations.get(id);
  if (!workstation) return;

  try {
    const container = docker.getContainer(workstation.containerId);

    // Graceful stop: send SIGTERM, wait 10 seconds, then SIGKILL
    // This gives processes time to save state and exit cleanly
    await container.stop({ t: 10 });

    // Remove the container (might already be removed due to AutoRemove)
    await container.remove({ force: true });
  } catch (error) {
    // Container might already be gone - that's OK
    console.error(`Error destroying workstation ${id}:`, error);
  } finally {
    // Always remove from our tracking, even if Docker operations failed
    workstations.delete(id);
  }
}
```

Why graceful shutdown? If a developer is in the middle of saving a file when TTL expires, we want to give their editor time to finish. The 10-second timeout balances user experience with resource cleanup.
Scheduling Destruction
```typescript
function scheduleDestruction(id: string, minutes: number): void {
  setTimeout(async () => {
    console.log(`TTL expired for workstation ${id}, destroying...`);
    await destroyWorkstation(id);
  }, minutes * 60 * 1000);
}
```

Important caveat: This simple `setTimeout` approach works for single-server deployments. For production with multiple servers, use a distributed scheduler like:
- Redis-based job queues (Bull, BullMQ)
- Kubernetes TTL controllers
- AWS Step Functions with Wait states
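Even on a single server there is a second gap worth noting: `setTimeout` timers are lost when the process restarts. A hedged workaround, assuming workstations are persisted somewhere with their `expiresAt`, is to re-arm timers at startup from the stored expiry, clamping already-expired entries to a delay of zero so they are destroyed immediately. The helpers below are a sketch, not part of the tutorial's code:

```typescript
interface TrackedWorkstation {
  id: string;
  expiresAt: Date;
}

// Delay until destruction, clamped so past-due workstations fire immediately
function destructionDelayMs(expiresAt: Date, now: Date = new Date()): number {
  return Math.max(0, expiresAt.getTime() - now.getTime());
}

// On startup, re-arm a timer for every persisted workstation.
// `destroy` stands in for destroyWorkstation from the orchestrator.
function rearmTimers(
  stored: TrackedWorkstation[],
  destroy: (id: string) => void,
  now: Date = new Date()
): void {
  for (const w of stored) {
    setTimeout(() => destroy(w.id), destructionDelayMs(w.expiresAt, now));
  }
}
```

The point is that the source of truth is `expiresAt` in storage, not the in-flight timer—the timer is just a cache of that fact.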
Query Functions
```typescript
// Get a specific workstation by ID
export async function getWorkstation(id: string): Promise<Workstation | null> {
  return workstations.get(id) || null;
}

// List all workstations for a user
export async function listUserWorkstations(
  userId: string
): Promise<Workstation[]> {
  return Array.from(workstations.values())
    .filter(w => w.userId === userId);
}
```

Part 3: WebSocket Terminal Bridge
Now we need to connect the browser to the container's shell. This is where the magic happens—users type in their browser, and keystrokes flow into the container.
How the Bridge Works
The Terminal Bridge Implementation
```typescript
// src/terminal-bridge.ts
import Docker from 'dockerode';
import { WebSocket } from 'ws';

const docker = new Docker();

export async function attachTerminal(
  containerId: string,
  ws: WebSocket
): Promise<void> {
  // Get reference to the container
  const container = docker.getContainer(containerId);

  // Create an "exec" instance - this is like running docker exec
  const exec = await container.exec({
    Cmd: ['/bin/zsh'],    // Command to run (the shell)
    AttachStdin: true,    // We'll send input
    AttachStdout: true,   // We want output
    AttachStderr: true,   // We want error output too
    Tty: true,            // Allocate a pseudo-TTY (enables colors, etc.)
    User: 'developer',    // Run as our non-root user
    WorkingDir: '/home/developer/workspace',  // Start in workspace
  });

  // Start the exec and get a bidirectional stream
  const stream = await exec.start({
    hijack: true,  // Take over the connection for raw data
    stdin: true,   // Enable stdin
    Tty: true,     // TTY mode
  });

  // PIPE 1: Container output → WebSocket → Browser
  stream.on('data', (chunk: Buffer) => {
    // Only send if WebSocket is still open
    if (ws.readyState === WebSocket.OPEN) {
      // Send as JSON message with type for client parsing
      ws.send(JSON.stringify({
        type: 'output',
        data: chunk.toString()
      }));
    }
  });

  // PIPE 2: Browser → WebSocket → Container input
  ws.on('message', (message: Buffer) => {
    try {
      const msg = JSON.parse(message.toString());

      if (msg.type === 'input') {
        // User typed something - send to container
        stream.write(msg.data);
      } else if (msg.type === 'resize') {
        // Terminal window resized - adjust PTY size
        exec.resize({ h: msg.rows, w: msg.cols });
      }
    } catch (error) {
      console.error('Error processing message:', error);
    }
  });

  // CLEANUP: When WebSocket closes, end the stream
  ws.on('close', () => {
    stream.end();
  });

  // CLEANUP: When stream ends, close WebSocket
  stream.on('end', () => {
    ws.close();
  });
}
```

Key concepts explained:
| Concept | What It Means |
|---|---|
| `docker exec` | Runs a command inside a running container |
| TTY | Pseudo-terminal—makes the shell think it's connected to a real terminal |
| `hijack: true` | Raw mode—bytes flow directly without HTTP framing |
| `resize` | When the user resizes the browser window, we tell the shell to adjust columns/rows |
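One hardening step the bridge above skips: it `JSON.parse`s client messages and trusts the fields as-is, so a malformed `resize` could pass garbage to the PTY. A small validator, sketched here as a hypothetical `parseClientMessage` helper (not in the tutorial's code), rejects anything that isn't a well-formed message before the bridge acts on it:

```typescript
type ClientMessage =
  | { type: 'input'; data: string }
  | { type: 'resize'; cols: number; rows: number };

// Parse and validate a raw client message; returns null on anything malformed
function parseClientMessage(raw: string): ClientMessage | null {
  let msg: unknown;
  try {
    msg = JSON.parse(raw);
  } catch {
    return null;  // Not JSON at all
  }
  if (typeof msg !== 'object' || msg === null) return null;
  const m = msg as Record<string, unknown>;

  if (m.type === 'input' && typeof m.data === 'string') {
    return { type: 'input', data: m.data };
  }
  if (
    m.type === 'resize' &&
    Number.isInteger(m.cols) && Number.isInteger(m.rows) &&
    (m.cols as number) > 0 && (m.rows as number) > 0
  ) {
    return { type: 'resize', cols: m.cols as number, rows: m.rows as number };
  }
  return null;  // Unknown type or bad fields
}
```

The `ws.on('message', ...)` handler would then call this first and simply drop `null` results instead of trusting raw fields.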
Message Protocol
The browser and server communicate using a simple JSON protocol:
```typescript
// Browser → Server messages
{ type: 'input', data: 'ls -la\n' }       // User typed a command
{ type: 'resize', cols: 120, rows: 40 }   // Terminal resized

// Server → Browser messages
{ type: 'output', data: 'file1.txt file2.txt\n' }  // Shell output
```

Part 4: Kubernetes TTL Controller
For production deployments at scale, Docker alone isn't enough. Kubernetes provides built-in TTL management that's more robust than our setTimeout approach.
Why Kubernetes for Production?
Kubernetes Job with TTL
In Kubernetes, we use Jobs instead of raw containers. Jobs have built-in TTL support:
```yaml
# workstation-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: workstation-WORKSTATION_ID
  labels:
    app: workstation
    user: USER_ID
spec:
  # KEY FEATURE: Automatically delete job 1 hour after it completes
  # This handles cleanup even if our orchestrator crashes
  ttlSecondsAfterFinished: 3600

  # Don't retry on failure - just let it die
  backoffLimit: 0

  template:
    metadata:
      labels:
        app: workstation
        workstation-id: WORKSTATION_ID
    spec:
      # Never restart the container
      restartPolicy: Never

      containers:
      - name: workstation
        image: workstation:latest

        # Resource requests and limits
        resources:
          requests:
            memory: "512Mi"  # Minimum guaranteed memory
            cpu: "250m"      # 0.25 CPU cores minimum
          limits:
            memory: "2Gi"    # Maximum 2GB RAM
            cpu: "1000m"     # Maximum 1 CPU core

        ports:
        - containerPort: 22

        # Security context
        securityContext:
          runAsNonRoot: true   # Must run as non-root
          runAsUser: 1000      # UID of developer user
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL              # Drop all capabilities
```

What `ttlSecondsAfterFinished` does: Kubernetes will automatically garbage-collect the Job (and its Pod) 3600 seconds after it completes. This is cluster-level TTL—it works even if your orchestrator crashes.
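The manifest above uses `WORKSTATION_ID` and `USER_ID` placeholders. Rather than string-templating YAML, the orchestrator could build the Job as a plain object and hand it to a Kubernetes client. A hedged sketch of such a builder (`buildWorkstationJob` is a hypothetical helper, and the fields shown are a subset of the full manifest):

```typescript
interface JobParams {
  workstationId: string;
  userId: string;
  ttlSeconds: number;  // Becomes ttlSecondsAfterFinished
}

// Build the Job manifest as an object rather than templating YAML strings
function buildWorkstationJob(p: JobParams) {
  return {
    apiVersion: 'batch/v1',
    kind: 'Job',
    metadata: {
      name: `workstation-${p.workstationId}`,
      labels: { app: 'workstation', user: p.userId },
    },
    spec: {
      ttlSecondsAfterFinished: p.ttlSeconds,
      backoffLimit: 0,
      template: {
        metadata: {
          labels: { app: 'workstation', 'workstation-id': p.workstationId },
        },
        spec: {
          restartPolicy: 'Never',
          containers: [{ name: 'workstation', image: 'workstation:latest' }],
        },
      },
    },
  };
}
```

Building objects instead of interpolating text avoids YAML-injection bugs when IDs or usernames contain unexpected characters.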
Active Deadline for Running Time Limit
The TTL above only kicks in after the job finishes. What if we want to kill it while it's still running?
```yaml
spec:
  # HARD LIMIT: Kill the workstation after 1 hour regardless of state
  # This prevents runaway workstations that never finish
  activeDeadlineSeconds: 3600

  # Combined with ttlSecondsAfterFinished for full lifecycle:
  # - activeDeadlineSeconds: Forces completion at 1 hour
  # - ttlSecondsAfterFinished: Cleans up 1 hour after completion
  ttlSecondsAfterFinished: 3600
```

Part 5: Container Lifecycle State Machine
To build a robust system, we need to think about all the states a workstation can be in and how it transitions between them.
Implementing the State Machine
```typescript
// Define all possible states
type WorkstationState =
  | 'pending'   // Request received, waiting for resources
  | 'creating'  // Container is being created
  | 'running'   // Container is running, user can connect
  | 'stopping'  // Graceful shutdown in progress
  | 'stopped'   // Container removed successfully
  | 'failed';   // Something went wrong

// Define valid state transitions
interface StateTransition {
  from: WorkstationState;
  to: WorkstationState;
  action: string;  // What triggered this transition
}

// Only these transitions are allowed
const validTransitions: StateTransition[] = [
  { from: 'pending',  to: 'creating', action: 'start_creation' },
  { from: 'pending',  to: 'failed',   action: 'resource_unavailable' },
  { from: 'creating', to: 'running',  action: 'container_ready' },
  { from: 'creating', to: 'failed',   action: 'creation_error' },
  { from: 'running',  to: 'stopping', action: 'ttl_expired' },
  { from: 'running',  to: 'stopping', action: 'user_stop' },
  { from: 'running',  to: 'failed',   action: 'health_check_failed' },
  { from: 'stopping', to: 'stopped',  action: 'cleanup_complete' },
];

// Check if a transition is valid
function canTransition(
  current: WorkstationState,
  target: WorkstationState
): boolean {
  return validTransitions.some(
    t => t.from === current && t.to === target
  );
}
```

Why use a state machine? It prevents bugs like:
- Trying to connect to a workstation that's still creating
- Destroying a workstation twice
- Transitioning from 'stopped' back to 'running'
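`canTransition` only answers whether a move is legal; nothing yet stops code from making an illegal one. A minimal enforcing wrapper might look like this—a sketch using the same states, with the transition table condensed into an adjacency map (`transition` is a hypothetical helper, not from the tutorial's code):

```typescript
type WorkstationState =
  | 'pending' | 'creating' | 'running' | 'stopping' | 'stopped' | 'failed';

// Same transitions as the table above, keyed by source state
const allowed: Record<WorkstationState, WorkstationState[]> = {
  pending:  ['creating', 'failed'],
  creating: ['running', 'failed'],
  running:  ['stopping', 'failed'],
  stopping: ['stopped'],
  stopped:  [],  // Terminal state
  failed:   [],  // Terminal state
};

// Apply a transition, throwing on anything the table does not allow
function transition(
  current: WorkstationState,
  target: WorkstationState
): WorkstationState {
  if (!allowed[current].includes(target)) {
    throw new Error(`Invalid transition: ${current} -> ${target}`);
  }
  return target;
}
```

Routing every status change through `transition` turns the bugs listed above into loud errors instead of silent corruption.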
Part 6: Resource Pool Management
Cold-starting containers takes time (typically 2-5 seconds). For instant workstation provisioning, we can pre-warm a pool of containers.
How Pooling Works
Pool Manager Implementation
```typescript
// src/pool-manager.ts
import Docker from 'dockerode';

const docker = new Docker();

interface PoolConfig {
  minSize: number;      // Always keep this many warm containers
  maxSize: number;      // Never exceed this many total containers
  warmupCount: number;  // How many to create at startup
}

class WorkstationPool {
  private available: string[] = [];                // Container IDs ready to use
  private inUse: Map<string, string> = new Map();  // workstationId -> containerId

  constructor(private config: PoolConfig) {}

  // Initialize pool at startup
  async initialize(): Promise<void> {
    console.log(`Initializing pool with ${this.config.warmupCount} containers...`);

    // Create warmup containers in parallel for speed
    const promises: Promise<string>[] = [];
    for (let i = 0; i < this.config.warmupCount; i++) {
      promises.push(this.createWarmContainer());
    }

    const containerIds = await Promise.all(promises);
    this.available.push(...containerIds);

    console.log(`Pool initialized with ${this.available.length} containers`);
  }

  // Get a container for a workstation (instant!)
  async acquire(workstationId: string): Promise<string> {
    // Try to get from pool first
    let containerId = this.available.pop();

    if (!containerId) {
      // Pool empty - check if we can create more
      if (this.inUse.size >= this.config.maxSize) {
        throw new Error('Pool exhausted - maximum containers reached');
      }

      // Create a new one (slower path)
      containerId = await this.createWarmContainer();
    }

    // Track that this container is now in use
    this.inUse.set(workstationId, containerId);

    // Replenish pool in background (don't await)
    this.replenish();

    return containerId;
  }

  // Return a container when workstation is destroyed
  async release(workstationId: string): Promise<void> {
    const containerId = this.inUse.get(workstationId);
    if (!containerId) return;

    this.inUse.delete(workstationId);

    // IMPORTANT: Destroy the container, don't reuse it!
    // Reusing containers is a security risk - previous user's data might remain
    await this.destroyContainer(containerId);
  }

  // Background replenishment
  private async replenish(): Promise<void> {
    while (this.available.length < this.config.minSize) {
      const containerId = await this.createWarmContainer();
      this.available.push(containerId);
    }
  }

  private async createWarmContainer(): Promise<string> {
    const container = await docker.createContainer({
      Image: 'workstation:latest',
      // Created but not started - ready to go
    });
    return container.id;
  }

  private async destroyContainer(containerId: string): Promise<void> {
    const container = docker.getContainer(containerId);
    await container.remove({ force: true });
  }
}
```

Security note: We destroy containers after use instead of recycling them. Recycling could leak data between users—one developer might see another's files or environment variables.
Part 7: Cleanup Strategies
Even with TTL and pooling, things can go wrong. Containers might get orphaned if our orchestrator crashes. We need a backup cleanup system.
Cron-Based Cleanup
This runs every 5 minutes and cleans up any expired workstations:
```typescript
// src/cleanup.ts
import cron from 'node-cron';
import Docker from 'dockerode';

const docker = new Docker();

// Run every 5 minutes
cron.schedule('*/5 * * * *', async () => {
  console.log('Running workstation cleanup scan...');

  // Get all containers with our label
  const containers = await docker.listContainers({
    all: true,  // Include stopped containers
    filters: {
      label: ['workstation.id'],  // Only our workstations
    },
  });

  const now = new Date();
  let cleaned = 0;

  for (const containerInfo of containers) {
    // Check if this container has expired
    const expires = containerInfo.Labels['workstation.expires'];
    if (!expires) continue;

    const expiresAt = new Date(expires);
    if (now > expiresAt) {
      console.log(`Cleaning up expired workstation: ${containerInfo.Id.slice(0, 12)}`);

      const container = docker.getContainer(containerInfo.Id);

      // Try graceful stop first
      await container.stop({ t: 10 }).catch(() => {
        // Might already be stopped
      });

      // Force remove
      await container.remove({ force: true });
      cleaned++;
    }
  }

  console.log(`Cleanup complete. Removed ${cleaned} expired workstations.`);
});
```

Why labels? By labeling our containers with `workstation.id` and `workstation.expires`, the cleanup job can find and evaluate them without needing access to our orchestrator's database.
Kubernetes CronJob Alternative
For Kubernetes deployments:
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: workstation-cleanup
spec:
  schedule: "*/5 * * * *"  # Every 5 minutes
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          serviceAccountName: workstation-cleanup  # Needs pod delete permissions
          containers:
          - name: cleanup
            image: bitnami/kubectl:latest
            command:
            - /bin/sh
            - -c
            - |
              # Delete completed workstation pods
              kubectl get pods -l app=workstation \
                --field-selector=status.phase=Succeeded \
                -o name | xargs -r kubectl delete

              # Delete failed workstation pods
              kubectl get pods -l app=workstation \
                --field-selector=status.phase=Failed \
                -o name | xargs -r kubectl delete
```

Part 8: Security Considerations
Security isn't optional for workstations—you're giving users shell access to your infrastructure.
Defense in Depth
| Layer | Implementation | What It Prevents |
|---|---|---|
| Process | Non-root user, dropped capabilities | Kernel exploits, privilege escalation |
| Network | Isolated bridge network | Access to other containers, host network |
| Filesystem | Read-only root, tmpfs workspace | Persistent malware, disk exhaustion |
| Resources | CPU/memory limits, PID limits | Fork bombs, resource starvation |
| Time | TTL enforcement, active deadlines | Orphaned resources, crypto mining |
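One way to keep these layers from silently drifting is a startup audit that refuses to launch containers whose `HostConfig` is missing a layer. A hedged sketch, assuming dockerode-style field names (`auditHostConfig` is a hypothetical helper, not part of the tutorial's code):

```typescript
interface HostConfigLike {
  Memory?: number;        // Bytes
  NanoCpus?: number;      // 1 core = 1e9
  PidsLimit?: number;     // Max process count
  CapDrop?: string[];
  SecurityOpt?: string[];
}

// Return a list of policy violations; an empty array means the config passes
function auditHostConfig(cfg: HostConfigLike): string[] {
  const problems: string[] = [];
  if (!cfg.Memory || cfg.Memory <= 0) problems.push('no memory limit');
  if (!cfg.NanoCpus || cfg.NanoCpus <= 0) problems.push('no CPU limit');
  if (!cfg.PidsLimit || cfg.PidsLimit <= 0) problems.push('no PID limit');
  if (!cfg.CapDrop?.includes('ALL')) problems.push('capabilities not dropped');
  if (!cfg.SecurityOpt?.includes('no-new-privileges')) {
    problems.push('privilege escalation not blocked');
  }
  return problems;
}
```

The orchestrator could call this before `docker.createContainer` and throw if the list is non-empty, making a misconfigured launch impossible rather than merely unlikely.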
Production Security Configuration
```typescript
const securityConfig = {
  HostConfig: {
    // RESOURCE LIMITS
    Memory: 2 * 1024 * 1024 * 1024,  // 2GB max
    NanoCpus: 1 * 1e9,               // 1 CPU max
    PidsLimit: 100,                  // Max 100 processes (stops fork bombs)

    // FILESYSTEM
    ReadonlyRootfs: true,  // Can't modify system files
    Tmpfs: {
      // Writable workspace in memory
      '/home/developer/workspace': 'rw,size=1g',
      '/tmp': 'rw,size=500m',
    },

    // CAPABILITIES
    CapDrop: ['ALL'],                       // Remove all capabilities
    CapAdd: ['CHOWN', 'SETUID', 'SETGID'],  // Add only what's needed
    SecurityOpt: ['no-new-privileges'],     // Prevent escalation

    // NETWORK
    NetworkMode: 'workstation-network',  // Isolated network
  },
};
```

Performance Results
Companies using ephemeral environments report significant improvements across key metrics:
| Metric | Improvement |
|---|---|
| Onboarding time | 90% reduction (weeks to minutes) |
| Dev productivity | 50% increase |
| Cloud costs | 50% reduction |
| Deployment frequency | 10x increase |
Source: Coder State of Development Environments 2025
Putting It All Together
Here's how all the pieces connect:
Request flow:
- User clicks "New Workstation" → REST API
- API calls Orchestrator → Orchestrator gets container from Pool
- Pool provides pre-warmed container → Orchestrator configures and starts it
- User connects terminal → WebSocket Server
- WebSocket bridges to container shell
- TTL expires → Cleanup destroys container
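The REST API step in this flow is the one piece the tutorial doesn't show. A hedged sketch of its core handler, written framework-free with the orchestrator injected as a dependency so the validation logic is testable on its own (`handleCreateWorkstation` and the 240-minute cap are assumptions, not from the tutorial):

```typescript
interface SpawnRequest {
  userId?: string;      // From the authenticated session
  ttlMinutes?: number;  // Optional; defaulted and capped below
}

interface SpawnDeps {
  // Stands in for spawnWorkstation from the orchestrator
  spawn: (cfg: {
    userId: string;
    ttlMinutes: number;
    memoryMB: number;
    cpuCores: number;
  }) => Promise<{ id: string }>;
}

// Validate the request, apply defaults and caps, then delegate to the orchestrator
async function handleCreateWorkstation(
  req: SpawnRequest,
  deps: SpawnDeps
): Promise<{ status: number; body: Record<string, unknown> }> {
  if (!req.userId) {
    return { status: 400, body: { error: 'userId is required' } };
  }
  // Default to 1 hour; cap at 4 hours so clients can't request unbounded TTLs
  const ttlMinutes = Math.min(req.ttlMinutes ?? 60, 240);
  const ws = await deps.spawn({
    userId: req.userId,
    ttlMinutes,
    memoryMB: 2048,
    cpuCores: 1,
  });
  return { status: 201, body: { id: ws.id, ttlMinutes } };
}
```

An Express or Fastify route would just adapt `req.body` into `SpawnRequest` and write the returned status and body.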
Brisbane Cloud Development Services
At Buun Group, we help Queensland businesses build cloud development infrastructure:
- Custom workstation images for your tech stack
- Kubernetes orchestration with auto-scaling
- Browser-based IDEs with integrated terminals
- Secure multi-tenant environments
We've deployed ephemeral workstation systems for teams across Australia.
Ready to modernize your dev environments?