
Building Scalable SaaS Platforms: Lessons from Serving 100K+ Users

Technical insights and architectural decisions from scaling platforms across three companies

Isaac Paha

Computing & IT Graduate | Tech Entrepreneur

20 December 2024
18 min read
SaaS Development · Scalability · System Architecture · Performance Optimization · Database Design · DevOps · Microservices

Introduction: The Scale Challenge

When I started building my first platform in 2023, I never imagined it would eventually serve over 100,000 users across three different companies. What began as a set of simple web applications has evolved into robust, scalable SaaS platforms powering businesses across the UK and Ghana.

This journey has taught me invaluable lessons about scalability, performance, and the technical decisions that can make or break a growing platform. In this deep dive, I'll share the architectural patterns, technology choices, and lessons learned from scaling platforms that now process thousands of transactions daily.

The Foundation: Technology Stack Decisions

Choosing the right technology stack is crucial for long-term scalability. Here's the evolution of my technology choices and the reasoning behind each decision:

Frontend Architecture

// Initial Stack (2023)
Frontend: React.js + Create React App
State: Local component state
Styling: Basic CSS

// Current Stack (2024-2025)
Frontend: Next.js 14 with App Router
State: Zustand + React Query
Styling: Tailwind CSS + shadcn/ui
Type Safety: TypeScript

Why Next.js? The transition from Create React App to Next.js was driven by several factors:

  • Server-Side Rendering (SSR): Critical for SEO, especially for oKadwuma's job listings (see the sketch after this list)
  • API Routes: Simplified backend architecture for smaller services
  • Image Optimization: Automatic optimization reduced load times by 40%
  • Bundle Optimization: Automatic code splitting improved performance
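To make the SSR benefit concrete, here's a minimal sketch of a server-rendered job listings page with the App Router (the route, endpoint, and fields are illustrative, not oKadwuma's actual code):

// app/jobs/page.js: a server component, so the HTML (including job
// titles) is rendered on the server and visible to search engine crawlers
export default async function JobsPage() {
  // Illustrative endpoint; revalidate gives cached, periodically refreshed SSR
  const res = await fetch('https://api.example.com/jobs', {
    next: { revalidate: 60 }
  });
  const jobs = await res.json();

  return (
    <ul>
      {jobs.map((job) => (
        <li key={job.id}>
          <a href={`/jobs/${job.id}`}>{job.title}</a>
        </li>
      ))}
    </ul>
  );
}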

Backend Architecture Evolution

// Monolithic Start (2023)
- Single Node.js + Express server
- SQLite database
- File-based storage

// Microservices Transition (2024)
- Service-oriented architecture
- API Gateway (Kong)
- Container orchestration (Docker + Kubernetes)
- Database per service pattern

// Current Architecture (2025)
- Event-driven microservices
- Message queues (Redis + Bull)
- Distributed caching
- Multi-region deployment
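As a concrete example of the queue-based approach above, here's a minimal Redis + Bull sketch (the queue name, payload, and sendJobAlertEmail helper are illustrative):

const Queue = require('bull');

// Bull queues are backed by Redis; the connection string is illustrative
const emailQueue = new Queue('job-alert-emails', process.env.REDIS_URL);

// Producer: enqueue work instead of doing it inside the request cycle
async function queueJobAlert(user, job) {
  await emailQueue.add(
    { to: user.email, jobId: job.id },
    { attempts: 3, backoff: { type: 'exponential', delay: 5000 } }
  );
}

// Consumer: typically a separate worker process drains the queue
emailQueue.process(async (job) => {
  await sendJobAlertEmail(job.data);
});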

Database Strategy

Database architecture has been one of the most critical scaling decisions:

Phase 1: Single Database (0-1K users)

// Simple setup
- MySQL 8.0
- Single read/write instance
- Basic indexing

Phase 2: Read Replicas (1K-10K users)

// Introduced read scaling
- Master-slave configuration
- Read replicas for queries
- Write operations to master only

// Example configuration
const dbConfig = {
  master: {
    host: 'master.db.cluster',
    user: 'admin',
    database: 'production'
  },
  slaves: [
    { host: 'slave1.db.cluster' },
    { host: 'slave2.db.cluster' }
  ]
};
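A minimal sketch of how reads and writes can be routed with this configuration (the pool setup and round-robin policy are illustrative, assuming replicas share the master's credentials):

const mysql = require('mysql2/promise');

const masterPool = mysql.createPool(dbConfig.master);
const replicaPools = dbConfig.slaves.map((slave) =>
  mysql.createPool({ ...dbConfig.master, ...slave })
);

let nextReplica = 0;

// Writes always go to the master
const write = (sql, params) => masterPool.query(sql, params);

// Reads rotate across the replicas (simple round-robin)
const read = (sql, params) => {
  const pool = replicaPools[nextReplica++ % replicaPools.length];
  return pool.query(sql, params);
};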

Phase 3: Sharding & Distribution (10K+ users)

// Database sharding strategy
const getShardKey = (userId) => {
  return userId % TOTAL_SHARDS;
};

const getDatabase = (shardKey) => {
  // Look up the connection for this shard
  return databases[`shard_${shardKey}`];
};

// Geographic distribution
const getRegionalDB = (userLocation) => {
  return userLocation.startsWith('GH') ? 
    'ghana_cluster' : 'uk_cluster';
};

Scaling Challenges and Solutions

Challenge 1: Database Performance Bottlenecks

The Problem: As oKadwuma grew to 10,000+ users, database queries became increasingly slow, especially the job search functionality.

Symptoms:

  • Search queries taking 3-5 seconds
  • High CPU usage on database server
  • User complaints about slow loading

The Solution:

-- Before: inefficient query
SELECT * FROM jobs 
WHERE LOWER(title) LIKE CONCAT('%', ?, '%') 
OR LOWER(description) LIKE CONCAT('%', ?, '%')
ORDER BY created_at DESC;

-- After: optimized with full-text search
SELECT j.*, 
       MATCH(title, description) AGAINST(? IN NATURAL LANGUAGE MODE) as relevance
FROM jobs j
WHERE MATCH(title, description) AGAINST(? IN NATURAL LANGUAGE MODE)
ORDER BY relevance DESC, created_at DESC
LIMIT 20 OFFSET ?;

-- Added proper indexing
ALTER TABLE jobs ADD FULLTEXT(title, description);
CREATE INDEX idx_jobs_location_created ON jobs(location, created_at);
CREATE INDEX idx_jobs_category_salary ON jobs(category, salary_min);

Results: Query time reduced from 3-5 seconds to 200-400ms, supporting 10x more concurrent users.

Challenge 2: Real-time Notifications at Scale

The Problem: With thousands of users expecting real-time job alerts and marketplace notifications, our simple polling system was overwhelming the servers.

Evolution of Notification System:

Version 1: Database Polling

// Inefficient polling approach
setInterval(() => {
  fetch('/api/check-notifications')
    .then(response => response.json())
    .then(notifications => {
      updateUI(notifications);
    });
}, 30000); // Check every 30 seconds

Version 2: WebSocket Implementation

// Real-time WebSocket solution
const WebSocket = require('ws');
const wss = new WebSocket.Server({ port: 8080 });

// Connection management
const userConnections = new Map();

wss.on('connection', (ws, req) => {
  const userId = getUserIdFromAuth(req);
  userConnections.set(userId, ws);
  
  ws.on('close', () => {
    userConnections.delete(userId);
  });
});

// Notification broadcasting
const broadcastToUser = (userId, notification) => {
  const connection = userConnections.get(userId);
  if (connection && connection.readyState === WebSocket.OPEN) {
    connection.send(JSON.stringify(notification));
  }
};

Version 3: Event-Driven Architecture

// Current approach using Redis and message queues
const notificationService = {
  async sendJobAlert(userId, jobData) {
    // Publish event
    await redis.publish('job-alerts', JSON.stringify({
      userId,
      type: 'job_alert',
      data: jobData,
      timestamp: Date.now()
    }));
  },

  async sendOrderUpdate(userId, orderData) {
    await redis.publish('order-updates', JSON.stringify({
      userId,
      type: 'order_update',
      data: orderData,
      timestamp: Date.now()
    }));
  }
};

// Subscriber service (runs on a dedicated Redis connection:
// a client in subscribe mode cannot issue regular commands)
redis.subscribe('job-alerts', 'order-updates');
redis.on('message', (channel, message) => {
  const notification = JSON.parse(message);
  broadcastToUser(notification.userId, notification);
});

Challenge 3: Payment Processing Reliability

The Problem: okDdwa's e-commerce platform required handling payments across multiple providers (Stripe for international, Mobile Money for Ghana) with 99.9% reliability.

Solution: Robust Payment Architecture

// Payment abstraction layer
class PaymentProcessor {
  constructor() {
    this.providers = {
      stripe: new StripeProvider(),
      momo: new MobileMoneyProvider(),
      bank: new BankTransferProvider()
    };
  }

  async processPayment(paymentData) {
    const provider = this.selectProvider(paymentData);
    
    try {
      // Attempt primary payment
      const result = await this.providers[provider].charge(paymentData);
      
      // Log successful transaction
      await this.logTransaction(result, 'success');
      
      return result;
    } catch (error) {
      // Fallback to secondary provider
      const fallbackProvider = this.getFallbackProvider(provider);
      
      if (fallbackProvider) {
        try {
          const result = await this.providers[fallbackProvider].charge(paymentData);
          await this.logTransaction(result, 'success_fallback');
          return result;
        } catch (fallbackError) {
          await this.logTransaction(paymentData, 'failed', fallbackError);
          throw new PaymentFailedError('All payment methods failed');
        }
      }
      
      throw error;
    }
  }

  selectProvider(paymentData) {
    // Intelligent provider selection
    if (paymentData.currency === 'GHS') return 'momo';
    if (paymentData.amount > 10000) return 'bank';
    return 'stripe';
  }
}
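Usage then comes down to a single call, with provider selection and fallback handled internally (the payload fields are illustrative):

// Inside an async checkout handler
const processor = new PaymentProcessor();

const result = await processor.processPayment({
  amount: 250,
  currency: 'GHS',        // routed to 'momo' by selectProvider above
  customerId: 'cus_123',  // illustrative fields
  orderId: 'ord_456'
});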

Performance Optimization Strategies

Caching Implementation

Implementing a multi-layer caching strategy was crucial for performance:

// 1. Browser caching with proper headers
app.use((req, res, next) => {
  if (req.url.match(/\.(css|js|png|jpg|jpeg|gif|svg)$/)) {
    res.setHeader('Cache-Control', 'public, max-age=31536000'); // 1 year
  } else if (req.url.includes('/api/')) {
    res.setHeader('Cache-Control', 'no-cache');
  }
  next();
});

// 2. Redis caching for API responses
const cacheMiddleware = (duration = 300) => {
  return async (req, res, next) => {
    const key = 'cache:' + req.originalUrl;
    const cached = await redis.get(key);
    
    if (cached) {
      return res.json(JSON.parse(cached));
    }
    
    res.sendResponse = res.json;
    res.json = (body) => {
      redis.setex(key, duration, JSON.stringify(body));
      res.sendResponse(body);
    };
    
    next();
  };
};

// 3. Database query caching
class QueryCache {
  static async getJobs(filters) {
    const cacheKey = 'jobs:' + JSON.stringify(filters);
    let jobs = await redis.get(cacheKey);
    
    if (!jobs) {
      jobs = await database.query(buildJobQuery(filters));
      await redis.setex(cacheKey, 300, JSON.stringify(jobs)); // 5 min cache
    } else {
      jobs = JSON.parse(jobs);
    }
    
    return jobs;
  }
}

Database Optimization Techniques

// Connection pooling
const mysql = require('mysql2/promise');

const pool = mysql.createPool({
  host: process.env.DB_HOST,
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  database: process.env.DB_NAME,
  waitForConnections: true,
  connectionLimit: 20,
  queueLimit: 0,
  connectTimeout: 60000 // note: acquireTimeout/timeout are mysql-only options that mysql2 ignores
});

// Query optimization with explain plans
const analyzeQuery = async (query) => {
  const [rows] = await pool.execute('EXPLAIN ' + query);
  console.log('Query execution plan:', rows);
  
  // Alert if full table scan detected
  const hasFullScan = Array.isArray(rows) && rows.some((row) => row.type === 'ALL');
  if (hasFullScan) {
    console.warn('Full table scan detected! Consider adding indexes.');
  }
};

// Batch operations for better performance
const batchInsertUsers = async (users) => {
  const values = users.map(user => [user.name, user.email, user.phone]);
  const query = 'INSERT INTO users (name, email, phone) VALUES ?';
  
  // Bulk VALUES ? expansion only works with query(); prepared
  // statements (execute) cannot expand nested arrays
  await pool.query(query, [values]);
};

Monitoring and Observability

Monitoring became crucial as the platforms grew. Here's our observability stack:

Application Performance Monitoring

// Custom metrics collection
class MetricsCollector {
  static async trackApiCall(endpoint, duration, statusCode) {
    const metric = {
      timestamp: Date.now(),
      endpoint,
      duration,
      statusCode,
      memory: process.memoryUsage(),
      cpu: process.cpuUsage()
    };
    
    // Send to monitoring service
    await this.sendMetric('api_call', metric);
    
    // Alert on slow responses
    if (duration > 2000) {
      await this.sendAlert('slow_response', metric);
    }
  }

  static async trackUserAction(userId, action, metadata = {}) {
    const event = {
      userId,
      action,
      metadata,
      timestamp: Date.now(),
      sessionId: metadata.sessionId
    };
    
    await this.sendMetric('user_action', event);
  }
}

// Usage in API routes
app.get('/api/jobs', async (req, res) => {
  const startTime = Date.now();
  
  try {
    const jobs = await JobService.getJobs(req.query);
    const duration = Date.now() - startTime;
    
    await MetricsCollector.trackApiCall('/api/jobs', duration, 200);
    res.json(jobs);
  } catch (error) {
    const duration = Date.now() - startTime;
    await MetricsCollector.trackApiCall('/api/jobs', duration, 500);
    throw error;
  }
});

Health Checks and Alerting

// Health check endpoint
app.get('/health', async (req, res) => {
  const health = {
    timestamp: Date.now(),
    status: 'healthy',
    checks: {}
  };

  // Database health
  try {
    await pool.execute('SELECT 1');
    health.checks.database = 'healthy';
  } catch (error) {
    health.checks.database = 'unhealthy';
    health.status = 'degraded';
  }

  // Redis health
  try {
    await redis.ping();
    health.checks.redis = 'healthy';
  } catch (error) {
    health.checks.redis = 'unhealthy';
    health.status = 'degraded';
  }

  // External services health
  try {
    await axios.get('https://api.stripe.com/v1/charges?limit=1', {
      headers: { Authorization: 'Bearer ' + process.env.STRIPE_SECRET },
      timeout: 5000
    });
    health.checks.stripe = 'healthy';
  } catch (error) {
    health.checks.stripe = 'unhealthy';
  }

  const statusCode = health.status === 'healthy' ? 200 : 503;
  res.status(statusCode).json(health);
});

Security at Scale

Security becomes more complex as you scale. Here are the key strategies I implemented:

Authentication and Authorization

// JWT with refresh token pattern
const generateTokens = (user) => {
  const accessToken = jwt.sign(
    { userId: user.id, role: user.role },
    process.env.JWT_SECRET,
    { expiresIn: '15m' }
  );

  const refreshToken = jwt.sign(
    { userId: user.id, tokenVersion: user.tokenVersion },
    process.env.REFRESH_SECRET,
    { expiresIn: '7d' }
  );

  return { accessToken, refreshToken };
};

// Rate limiting implementation
const rateLimit = require('express-rate-limit');

const apiLimiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100, // limit each IP to 100 requests per windowMs
  message: 'Too many requests from this IP',
  standardHeaders: true,
  legacyHeaders: false,
});

const strictLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 5, // stricter limit for sensitive operations
  skipSuccessfulRequests: true,
});

app.use('/api/', apiLimiter);
app.use('/api/auth/', strictLimiter);

Data Protection and Privacy

// Data encryption for sensitive information
const crypto = require('crypto');

class DataProtection {
  static encrypt(text) {
    const algorithm = 'aes-256-gcm';
    const key = Buffer.from(process.env.ENCRYPTION_KEY, 'hex');
    const iv = crypto.randomBytes(16);
    
    // GCM requires createCipheriv; the deprecated createCipher ignores the IV
    const cipher = crypto.createCipheriv(algorithm, key, iv);
    cipher.setAAD(Buffer.from('additional-data'));
    
    let encrypted = cipher.update(text, 'utf8', 'hex');
    encrypted += cipher.final('hex');
    
    const tag = cipher.getAuthTag();
    
    return {
      encrypted,
      iv: iv.toString('hex'),
      tag: tag.toString('hex')
    };
  }

  static decrypt(encryptedData) {
    const algorithm = 'aes-256-gcm';
    const key = Buffer.from(process.env.ENCRYPTION_KEY, 'hex');
    const iv = Buffer.from(encryptedData.iv, 'hex');
    
    const decipher = crypto.createDecipheriv(algorithm, key, iv);
    decipher.setAAD(Buffer.from('additional-data'));
    decipher.setAuthTag(Buffer.from(encryptedData.tag, 'hex'));
    
    let decrypted = decipher.update(encryptedData.encrypted, 'hex', 'utf8');
    decrypted += decipher.final('utf8');
    
    return decrypted;
  }

  // GDPR compliance helpers
  static async anonymizeUserData(userId) {
    const anonymizedData = {
      name: 'User_' + crypto.randomBytes(4).toString('hex'),
      email: 'deleted_' + crypto.randomBytes(8).toString('hex') + '@deleted.com',
      phone: null,
      address: null,
      deletedAt: new Date()
    };

    await database.query(
      'UPDATE users SET ? WHERE id = ?',
      [anonymizedData, userId]
    );
  }
}

Deployment and DevOps at Scale

As the platforms grew, deployment strategies became increasingly important:

Container Orchestration

# Dockerfile for production
FROM node:18-alpine AS builder

WORKDIR /app
COPY package*.json ./
# Install all dependencies; the build step below needs devDependencies
RUN npm ci

COPY . .
RUN npm run build

FROM node:18-alpine AS runner
WORKDIR /app

# Create non-root user
RUN addgroup --system --gid 1001 nodejs
RUN adduser --system --uid 1001 nextjs

COPY --from=builder /app/public ./public
COPY --from=builder --chown=nextjs:nodejs /app/.next/standalone ./
COPY --from=builder --chown=nextjs:nodejs /app/.next/static ./.next/static

USER nextjs

EXPOSE 3000
ENV PORT 3000

CMD ["node", "server.js"]
# Kubernetes deployment configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: okadwuma-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: okadwuma-api
  template:
    metadata:
      labels:
        app: okadwuma-api
    spec:
      containers:
      - name: api
        image: okadwuma/api:latest
        ports:
        - containerPort: 3000
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: database-url
        - name: REDIS_URL
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: redis-url
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health
            port: 3000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 3000
          initialDelaySeconds: 5
          periodSeconds: 5

CI/CD Pipeline


# See .github/workflows/deploy.yml for the GitHub Actions workflow configuration.

Lessons Learned and Best Practices

1. Start Simple, Scale Gradually

The biggest mistake I made early on was over-engineering solutions before they were needed. Start with the simplest solution that works, then scale based on actual requirements.

2. Monitor Everything

You can't optimize what you don't measure. Implement comprehensive monitoring from day one:

  • Application performance metrics
  • Business metrics (user engagement, conversion rates)
  • Infrastructure metrics (CPU, memory, disk usage)
  • User experience metrics (page load times, error rates)

3. Plan for Failure

Systems will fail. Design for resilience:

  • Implement circuit breakers for external services
  • Use retry mechanisms with exponential backoff (sketched below)
  • Design graceful degradation strategies
  • Maintain comprehensive backup and recovery procedures
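A minimal sketch of the retry idea from the list above (the attempt count, delays, and jitter are illustrative):

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry(fn, { retries = 3, baseDelay = 500 } = {}) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt === retries) throw error; // out of attempts, give up
      // 500ms, 1s, 2s, ... plus jitter to avoid synchronized retries
      const delay = baseDelay * 2 ** attempt + Math.random() * 100;
      await sleep(delay);
    }
  }
}

// e.g. await withRetry(() => callExternalApi()) inside an async handler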

4. Security is Not Optional

Security considerations must be built in from the beginning:

  • Use HTTPS everywhere
  • Implement proper authentication and authorization
  • Validate and sanitize all input (example after this list)
  • Keep dependencies updated
  • Regular security audits
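For input validation, here's a small sketch using express-validator (the route and rules are illustrative):

const { body, validationResult } = require('express-validator');

app.post(
  '/api/jobs',
  body('title').trim().isLength({ min: 3, max: 120 }),
  body('salary_min').optional().isInt({ min: 0 }).toInt(),
  body('description').trim().escape(), // neutralize embedded HTML
  (req, res) => {
    const errors = validationResult(req);
    if (!errors.isEmpty()) {
      return res.status(400).json({ errors: errors.array() });
    }
    // req.body is now validated and sanitized for the service layer
  }
);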

5. Documentation and Team Scaling

As your platforms grow, so does your team. Invest in:

  • Comprehensive API documentation
  • Code comments and architectural decision records
  • Onboarding processes for new developers
  • Standardized coding practices and style guides

Performance Metrics and Results

Here are the concrete results achieved through these scaling strategies:

oKadwuma.com Metrics

  • User Growth: 0 to 10,000+ active users in 12 months
  • Response Time: Average API response time under 400ms
  • Uptime: 99.9% availability over the past 6 months
  • Search Performance: Job search results in under 500ms
  • Mobile Performance: Page load times under 2 seconds on 3G

okDdwa.com Metrics

  • Transaction Volume: Processing $50,000+ monthly
  • Payment Success Rate: 99.5% payment completion rate
  • Vendor Growth: 1,200+ active vendors
  • Order Processing: Average order processing time under 30 seconds
  • Mobile Money Integration: 95% success rate for MoMo payments

iPaha Platform Metrics

  • Client Satisfaction: 150+ satisfied enterprise clients
  • System Reliability: 99.9% uptime across all client deployments
  • Performance: Sub-second response times for 95% of requests
  • Scalability: Successfully handling 10x traffic growth

Future Scaling Considerations

As we continue to grow, several areas require ongoing attention:

1. Global Expansion

// Multi-region deployment strategy
const regions = {
  'eu-west': {
    primary: true,
    database: 'eu-west-db-cluster',
    cdn: 'cloudflare-eu',
    users: ['UK', 'EU']
  },
  'africa-west': {
    primary: false,
    database: 'africa-west-db-cluster', 
    cdn: 'cloudflare-africa',
    users: ['GH', 'NG', 'SN']
  }
};

const routeRequest = (userCountry) => {
  // findOptimalRegion resolves a country code to the closest configured region
  const region = findOptimalRegion(userCountry);
  return regions[region];
};

2. AI and Machine Learning Integration

Implementing AI-powered features for better user experience:

  • Intelligent job matching algorithms
  • Personalized product recommendations
  • Automated customer support
  • Predictive analytics for business insights

3. Edge Computing

Moving computation closer to users for better performance:

  • CDN-based API responses
  • Edge-side caching strategies
  • Distributed database replicas
  • Regional processing nodes

Conclusion: The Journey Continues

Building scalable SaaS platforms serving 100,000+ users has been both challenging and rewarding. The journey from simple web applications to robust, distributed systems has taught me that scalability is not just about handling more users—it's about maintaining performance, reliability, and user experience while growing.

Key takeaways from this journey:

  1. Technology is an enabler, not the solution. Focus on solving real problems for users.
  2. Scale incrementally. Don't over-engineer early, but plan for growth.
  3. Monitor everything. Data-driven decisions are crucial for optimization.
  4. Invest in your team and processes. Technical scaling requires human scaling too.
  5. Security and reliability are non-negotiable. Users trust you with their data and business.

As I continue to scale these platforms and work toward serving millions of users, the principles remain the same: build with purpose, scale thoughtfully, and never stop learning.

The future holds exciting possibilities—from expanding across Africa to incorporating AI and machine learning capabilities. Each challenge is an opportunity to learn and improve, both as a developer and as an entrepreneur.

For fellow developers and entrepreneurs on similar journeys, remember that every large-scale platform started with a single user. Focus on delivering value, and scale will follow naturally.


Isaac Paha is a Computing & IT graduate and founder of three tech companies: iPaha Ltd, iPahaStores Ltd, and Okpah Ltd. His platforms serve over 100,000 users across the UK and Ghana. He specializes in building scalable SaaS solutions and can be reached at pahaisaac@gmail.com or LinkedIn.
