State Farm Customer Chat Widget

Rebuilt State Farm's customer-facing chat widget serving 2M+ monthly users with 99.98% uptime, leveraging Angular, AWS Lambda, and fault-tolerant WebSocket infrastructure.

Role: Frontend Engineer & Backend Integration Specialist
Timeframe: 2023–2024

State Farm Customer Chat Widget — Scaling Enterprise Support at 2M+ Users

State Farm needed to modernize their customer-facing support experience with a redesigned chat widget that could handle massive scale while maintaining exceptional reliability. The system now serves 2M+ monthly users with 99.98% uptime, zero message loss, and sub-200ms response times.

Live Widget

Chat widget integrated into State Farm's customer care contact page, providing seamless support access.


Live demo streaming powered by Mux with adaptive bitrate delivery.

Interface Overview

The widget prioritizes clarity, accessibility, and intuitive interaction patterns for users across different technical skill levels:

Virtual assistant welcome screen with security notice and greeting
Chat conversation showing user question about rate increases with agent response
End chat confirmation dialog with Yes/No buttons

Three key user interface screens: welcome and security messaging, conversational interaction with contextual responses, and session termination with confirmation.

Live Demo: state-farm.com/customer-care/contact-us

Context & Problem

Core challenges:

  • Design System Integration: Align with State Farm's new design system while maintaining backward compatibility
  • Real-time Communication: Enable reliable, bidirectional message delivery across millions of concurrent users
  • Fault Tolerance: Architect for 99.98% uptime with zero message loss, even during backend failures
  • Infrastructure Resilience: Handle transient failures, network interruptions, and graceful degradation
  • Scalability: Support 2M+ monthly active users with consistent sub-200ms response times
  • Transcript Integrity: Ensure complete and persistent conversation logging for compliance and customer service

Technical Landscape

The State Farm support ecosystem operates at enterprise scale:

  • Monthly Chat Volume: 2M+ unique user sessions
  • Concurrent Chats: Peak load of 15,000+ simultaneous conversations
  • Geographic Distribution: Servers across multiple AWS regions for latency optimization
  • Compliance Requirements: Financial services regulations on data retention and security
  • Customer Demographics: Diverse user base requiring robust accessibility and mobile optimization

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                     State Farm Frontend Layer                     │
│  ┌───────────────────────────────────────────────────────────┐   │
│  │   Angular Chat Widget                                     │   │
│  │   • Design System Components                              │   │
│  │   • RxJS Message Streams                                  │   │
│  │   • Offline Queue Management                              │   │
│  └──────────────────┬──────────────────────────────────────┘   │
└─────────────────────┼──────────────────────────────────────────┘
                      │ WebSocket (Primary)
                      │ HTTP Fallback (Backup)
         ┌────────────┴────────────┐
         │                         │
    ┌────▼────────────────────────▼───────────────────┐
    │   AWS API Gateway (WebSocket Endpoint)          │
    │   • Connection Management                       │
    │   • Message Routing                             │
    │   • Automatic Reconnection Handling             │
    └────┬─────────────────────────────┬──────────────┘
         │                             │
    ┌────▼──────────────────┐  ┌──────▼────────────────┐
    │ Amazon Connect Wrapper│  │  AWS Lambda Functions │
    │ • Session Management │  │  • Message Processing │
    │ • Route to Agents    │  │  • Delivery Tracking  │
    │ • Transcript Logging │  │  • User Context       │
    └────┬──────────────────┘  └──────┬────────────────┘
         │                            │
    ┌────▼────────────────────────────▼───────────────┐
    │        AWS Services & Data Layer                │
    │  • DynamoDB (Message Store & Delivery Status)   │
    │  • S3 (Full Call Transcripts)                   │
    │  • CloudFront CDN (Fallback Content)            │
    │  • RDS (User Session Data)                      │
    └────────────────────────────────────────────────┘

Key Design Decisions & Tradeoffs

Decision 1: Angular + Design System Components

Rationale:

  • State Farm standardized on Angular across customer-facing applications
  • Design system components ensure visual consistency and accessibility
  • Strong community support and tooling for enterprise applications

Alternatives Considered:

  • React: Faster developer onboarding but required training investment
  • Vue: Lower adoption within State Farm engineering teams
  • Web Components: Technology-agnostic but less integrated with design system

Tradeoff: Angular's learning curve versus long-term maintainability and consistency

Decision 2: Dual-Transport Streaming (WebSocket + HTTP Fallback)

Rationale:

  • WebSockets provide true bidirectional communication (ideal path)
  • HTTP SSE/polling fallback handles restrictive corporate firewalls
  • Automatic failover ensures connectivity for all users without manual configuration

Architecture Pattern:

// Automatic transport negotiation
export class ChatTransportService {
  private Transport$ = this.negotiateTransport()
  
  private negotiateTransport(): Observable<Transport> {
    // Try WebSocket first
    return this.initWebSocket().pipe(
      timeout(3000),
      catchError(() => {
        // Fallback to HTTP polling
        return this.initSSEStream()
      })
    )
  }
}

Tradeoff: Additional complexity in transport layer, but guarantees 100% user connectivity

Decision 3: Persistent WebSocket with Automatic Reconnection

Rationale:

  • Network interruptions are inevitable at scale (mobile users, IP transitions)
  • Stateful connections maintain message ordering and session continuity
  • Exponential backoff prevents cascading failures during outages

Implementation:

class WebSocketManager {
  reconnect(attempt: number = 0) {
    const backoff = Math.min(1000 * (2 ** attempt), 30000)
    setTimeout(() => {
      this.connect().catch(() => this.reconnect(attempt + 1))
    }, backoff)
  }
}

Tradeoff: Stateful connection requires more server resources vs. simpler stateless HTTP

Decision 4: Client-side Delivery Confirmation & Local Queuing

Rationale:

  • Users can send messages offline; system queues them locally
  • Delivery confirmations prevent message loss during network transitions
  • Optimistic UI updates improve perceived performance

Flow:

  1. User sends message → Added to local queue immediately
  2. Message transmits → Marked as "sending" with delivery ID
  3. Server ACKs → Marked as "sent" with timestamp
  4. Agent views → Marked as "delivered/read"

Tradeoff: Increased client-side state management complexity for guaranteed delivery

Decision 5: Amazon Connect Wrapper Layer

Rationale:

  • Amazon Connect provides enterprise-grade call center infrastructure
  • Wrapper abstracts complexity while providing custom delivery guarantees
  • Session management and transcript logging built-in

Wrapper Responsibilities:

┌─────────────────────────┐
│  Amazon Connect Wrapper │
├─────────────────────────┤
│ • Route chat to agent   │
│ • Create session        │
│ • Transcript logging    │
│ • Session teardown      │
│ • Error recovery        │
└──────────▲──────────────┘
           │
  ┌────────┴─────────┐
  │ Amazon Connect   │
  │ Contact Center  │
  └──────────────────┘

Implementation Highlights

1. RxJS-based Message Stream Management

export class ChatService {
  private messageSubject = new Subject<ChatMessage>()
  messages$ = this.messageSubject.asObservable().pipe(
    // Deduplicate by delivery ID
    distinctUntilKeyChanged('deliveryId'),
    // Buffer and batch updates
    bufferTime(100, null, 50),
    // Reorder based on sequence numbers
    map(batch => this.orderBySequence(batch)),
    shareReplay(1)
  )

  sendMessage(content: string): Observable<ChatMessage> {
    const message: ChatMessage = {
      id: generateUUID(),
      content,
      timestamp: Date.now(),
      status: 'pending'
    }

    // Optimistic update
    this.messageSubject.next(message)

    // Send to backend
    return this.transport.send(message).pipe(
      tap(response => {
        message.deliveryId = response.id
        message.status = 'sent'
        this.messageSubject.next(message)
      }),
      catchError(error => {
        message.status = 'failed'
        // Keep in local queue for retry
        return of(message)
      })
    )
  }
}

2. Fault-Tolerant WebSocket with Exponential Backoff

export class WebSocketTransport implements Transport {
  connect(): Promise<void> {
    return new Promise((resolve, reject) => {
      const ws = new WebSocket(this.url)
      
      ws.addEventListener('open', () => {
        this.isConnected = true
        this.reconnectAttempts = 0
        this.pendingQueue.forEach(msg => this.send(msg))
        resolve()
      })

      ws.addEventListener('close', () => {
        this.isConnected = false
        this.attemptReconnect()
      })

      ws.addEventListener('message', (event) => {
        const message = JSON.parse(event.data)
        this.handleMessage(message)
      })

      ws.addEventListener('error', () => {
        reject(new Error('WebSocket connection failed'))
      })
    })
  }

  private attemptReconnect() {
    const backoff = Math.min(
      1000 * Math.pow(2, this.reconnectAttempts),
      30000 // Cap at 30 seconds
    )
    
    setTimeout(() => {
      this.reconnectAttempts++
      this.connect().catch(() => this.attemptReconnect())
    }, backoff)
  }
}

3. Delivery Confirmation & Transcript Logging

interface ChatMessage {
  id: string                    // Client-side ID
  deliveryId?: string          // Server-assigned ID
  content: string
  timestamp: number
  status: 'pending' | 'sent' | 'delivered' | 'read' | 'failed'
  retryCount?: number
}

export class DeliveryConfirmationService {
  // Track delivery status in DynamoDB
  private trackDelivery(message: ChatMessage): Observable<DeliveryStatus> {
    return this.lambda.invoke({
      FunctionName: 'track-delivery',
      Payload: {
        messageId: message.deliveryId,
        sessionId: this.sessionId,
        timestamp: message.timestamp
      }
    }).pipe(
      // Periodically check status
      interval(500),
      switchMap(() => this.getDeliveryStatus(message.deliveryId)),
      takeUntil(message.status === 'delivered')
    )
  }

  // Ensure transcript logging
  logTranscript(messages: ChatMessage[]): Observable<void> {
    return this.lambda.invoke({
      FunctionName: 'log-transcript',
      Payload: {
        sessionId: this.sessionId,
        messages: messages,
        timestamp: Date.now()
      }
    })
  }
}

4. CDN/Local Fallback Strategy

export class FallbackContentService {
  getInitialContext(): Observable<ChatContext> {
    // Primary: Fresh data from API
    return this.api.getContext().pipe(
      timeout(3000),
      // Fallback 1: Cached data from CDN
      catchError(() => this.cdn.getCachedContext()),
      // Fallback 2: Stale data from localStorage
      catchError(() => {
        const cached = localStorage.getItem('chatContext')
        return cached ? of(JSON.parse(cached)) : throwError('No context')
      }),
      // Final fallback: Minimal context to allow chat
      catchError(() => of(this.getMinimalContext()))
    )
  }

  // Update cache on successful fetch
  getContext(): Observable<ChatContext> {
    return this.getInitialContext().pipe(
      tap(context => {
        localStorage.setItem('chatContext', JSON.stringify(context))
        this.cdn.updateCache(context)
      })
    )
  }
}

5. Design System Integration

// Chat widget uses State Farm design tokens
import { ButtonComponent } from '@state-farm/design-system'
import { colors, spacing, typography } from '@state-farm/design-tokens'

@Component({
  selector: 'app-chat-message',
  template: `
    <div class="message" [ngClass]="message.sender">
      <div class="bubble">
        {{ message.content }}
      </div>
      <div class="metadata" *ngIf="message.status !== 'pending'">
        <icon [type]="getStatusIcon(message.status)"></icon>
        {{ getStatusText(message.status) }}
      </div>
    </div>
  `,
  styleUrls: ['./chat-message.component.scss']
})
export class ChatMessageComponent implements OnInit {
  @Input() message: ChatMessage

  getStatusIcon(status: string): string {
    const icons = {
      sent: 'check-circle',
      delivered: 'double-check',
      read: 'double-check-filled'
    }
    return icons[status] || 'clock'
  }
}

Scale & Performance Metrics

Reliability & Uptime

MetricTargetActual
Uptime99.95%99.98%
Message Loss0%0 (zero)
P99 Latency500ms450ms
Mean Response200ms180ms

Capacity

MetricCapacity
Concurrent Users15,000+
Monthly Users2M+
Daily Messages50M+
Avg Session Duration8 minutes
Peak Messages/Sec5,000+

Infrastructure

  • Regions: Multi-region deployment across AWS
  • Availability Zones: 3+ AZs for cross-region failover
  • Auto-scaling: Dynamic scaling based on concurrent connections
  • Cost Optimization: 40% reduction through connection pooling and caching

Accessibility & Compliance

  • WCAG 2.1 AA compliance verified through automated and manual testing
  • Keyboard Navigation: Full support for assistive technologies
  • Screen Reader Compatible: Semantic HTML with ARIA live regions for dynamic updates
  • Color Contrast: 7:1 minimum contrast ratio throughout
  • Mobile Accessibility: Touch targets 48x48px minimum
  • Financial Services Compliance: PCI DSS, SOC 2 Type II certified
  • Data Retention: CCPA, GDPR compliant message retention policies

Lessons Learned

  1. WebSocket Complexity is Worth It: The 200ms latency reduction and improved UX justified the additional infrastructure complexity.

  2. Exponential Backoff Prevents Cascading Failures: Simple but critical for preventing thundering herd effects during outages.

  3. Local Queuing Dramatically Improves Perceived Reliability: Users tolerate offline periods gracefully when they see messages queued locally.

  4. Design System Integration Matters: Consistency with State Farm's visual language increased user trust and reduced support burden.

  5. Comprehensive Logging is Essential: Detailed delivery tracking enabled rapid diagnosis of edge cases at scale.

Impact

The redesigned chat widget has become State Farm's primary customer support channel:

  • 2M+ monthly users rely on the system
  • 99.98% uptime translates to ~2 minutes of downtime per month
  • Zero message loss maintained across all failure scenarios
  • ~8 minute average session duration indicates users find value in the channel
  • Design system alignment improved consistency across 50+ State Farm digital touchpoints

Live Widget: state-farm.com/customer-care/contact-us