What is Latency?
Latency refers to the time delay between when a request is initiated and when a response is received. In advertising technology and AI systems, latency is measured in milliseconds (ms) and represents how quickly your ad tech stack can process information, make decisions, and deliver results.
Think of latency as the time it takes for a conversation to happen. If you ask a question and wait three seconds for an answer, that three-second gap is latency. In advertising, those milliseconds matter tremendously.
Why Latency Matters in Ad Tech
Real-Time Bidding (RTB) happens in milliseconds. When a user loads a webpage, advertisers have roughly 100ms to decide whether to bid on that impression. If your AI model has high latency – say, 150ms – you've already missed the opportunity. Your bid won't even be submitted.
User Experience suffers with high latency. When personalization engines are slow, ads take longer to load, pages feel sluggish, and users bounce. Google's algorithms penalise slow-loading pages in rankings, making latency a direct SEO concern.
Campaign Performance depends on split-second decisions. AI models that optimise bids, target audiences, or adjust creative must operate with minimal delay. A 200ms delay in a programmatic campaign across millions of impressions compounds into missed conversions and wasted budget.
Types of Latency
Network latency is the physical time data takes to travel across the internet. This depends on server location, bandwidth, and infrastructure.
Processing latency is the time an AI model needs to analyse data and generate a prediction. Complex machine learning models naturally have higher processing latency than simple rule-based systems.
End-to-end latency measures the total time from initial request to final delivery, including all network and processing delays combined.
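The relationship between these three types can be sketched in a few lines. The code below is a minimal illustration, not production measurement tooling: the sleep durations stand in for hypothetical network and processing delays, and the stage names are assumptions chosen for this example.

```python
import time

def measure_end_to_end(request_fn):
    """Time one full request/response cycle, returning milliseconds."""
    start = time.perf_counter()
    request_fn()
    return (time.perf_counter() - start) * 1000.0

def simulated_request():
    # End-to-end latency is the sum of every stage on the path:
    time.sleep(0.02)  # network latency out (~20 ms, assumed)
    time.sleep(0.03)  # processing latency for model inference (~30 ms, assumed)
    time.sleep(0.02)  # network latency back (~20 ms, assumed)

total_ms = measure_end_to_end(simulated_request)
print(f"end-to-end latency: {total_ms:.0f} ms")  # at least ~70 ms
```

Note that the measured total is always at least the sum of the parts: you cannot hit an end-to-end target by optimising one stage while ignoring the others.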
Practical Examples
Scenario 1: Programmatic Display Advertising
A user visits a news site. The publisher's ad server receives a bid request. Your AI-powered demand-side platform (DSP) needs to:
- Identify the user
- Check brand safety rules
- Query your ML model for a bid price
- Submit the bid
If this takes 80ms, you're competitive. If it takes 300ms, you're out. The winner is usually the fastest.
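The steps above can be sketched as a pipeline running against a hard latency budget. This is a simplified illustration under assumptions: the 100ms budget matches the RTB window described earlier, while the stage timings, the `handle_bid_request` helper, and the fixed bid price are all hypothetical.

```python
import time

BID_TIMEOUT_MS = 100  # typical RTB response window from the text above

def handle_bid_request(steps, timeout_ms=BID_TIMEOUT_MS):
    """Run pipeline steps in order; return no-bid (None) if the budget runs out."""
    start = time.perf_counter()
    for step in steps:
        step()
        if (time.perf_counter() - start) * 1000.0 > timeout_ms:
            return None  # too slow: the exchange has already moved on
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return {"bid_price": 1.25, "elapsed_ms": elapsed_ms}  # hypothetical bid

# Four stages (identify, brand safety, ML query, submit), timings assumed:
fast_steps = [lambda: time.sleep(0.005)] * 4  # ~5 ms each -> competitive
slow_steps = [lambda: time.sleep(0.080)] * 4  # ~80 ms each -> out

print(handle_bid_request(fast_steps) is not None)  # True: bid submitted
print(handle_bid_request(slow_steps) is None)      # True: opportunity missed
```

The key design point is checking the clock after every stage rather than only at the end: a no-bid returned early still frees capacity for the next request.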
Scenario 2: Dynamic Creative Optimisation
Your AI personalises ad creative in real-time. A user who previously viewed running shoes sees a different ad than someone who viewed winter coats. If the system has 500ms latency, the page loads before the personalised ad appears, defeating the purpose.
Scenario 3: Fraud Detection
Your AI system screens for bot traffic and invalid impressions. High latency here means fraudulent clicks get counted before detection happens. Low latency (under 50ms) catches fraud before billing occurs.
How to Optimise Latency
Use edge computing: Move processing closer to users geographically to reduce network latency.
Simplify AI models: Complex models carry higher processing latency. Sometimes a lightweight model is better than a highly accurate but slow one.
Cache predictions: Pre-calculate common decisions (e.g., bid prices for frequent user segments) rather than computing in real-time.
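Caching pre-calculated decisions can be sketched with Python's standard `functools.lru_cache`. This is a minimal illustration: the ~50ms model cost and the pricing rule inside `bid_price_for_segment` are assumptions invented for the example.

```python
import functools
import time

@functools.lru_cache(maxsize=10_000)
def bid_price_for_segment(segment_id):
    """Expensive model call, computed once per frequent user segment."""
    time.sleep(0.05)  # stand-in for ~50 ms of model inference (assumed)
    return 0.80 + (segment_id % 10) * 0.05  # hypothetical pricing rule

start = time.perf_counter()
bid_price_for_segment(42)  # cold call: pays the full model cost
cold_ms = (time.perf_counter() - start) * 1000.0

start = time.perf_counter()
bid_price_for_segment(42)  # warm call: served from the cache
warm_ms = (time.perf_counter() - start) * 1000.0

print(f"cold: {cold_ms:.0f} ms, warm: {warm_ms:.3f} ms")
```

In a real DSP the cache would typically live in a shared store such as Redis rather than in-process memory, and entries would expire as model predictions go stale.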
Monitor continuously: Track latency metrics across your stack. Identify bottlenecks regularly.
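Continuous monitoring usually means tracking percentiles, not just averages, because a healthy mean can hide a slow tail. A minimal sketch using only the standard library (the `latency_percentiles` helper and the sample window are this example's own constructions):

```python
import random
import statistics

def latency_percentiles(samples_ms, percentiles=(50, 95, 99)):
    """Summarise a window of latency samples by percentile (ms)."""
    # quantiles(n=100) returns the 99 cut points p1..p99
    ranked = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {p: ranked[p - 1] for p in percentiles}

# Simulated window of request latencies centred near 80 ms (assumed)
samples = [random.gauss(80, 15) for _ in range(1000)]
print(latency_percentiles(samples))
```

Watching p95 and p99 against your latency budget surfaces bottlenecks that an average would smooth over: a p99 above the RTB window means roughly one impression in a hundred is silently lost.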
Acceptable Latency Benchmarks
- RTB bidding: Under 100ms (ideal: under 50ms)
- Personalisation engines: Under 200ms
- Fraud detection: Under 100ms
- Creative rendering: Under 500ms
- General analytics: Under 1,000ms
These vary by use case. Your latency requirements depend on your specific advertising goals.
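The benchmarks above can be expressed as a simple budget check, useful in alerting or load tests. The threshold table mirrors the figures listed above; the component keys and the `within_budget` helper are naming assumptions for this sketch.

```python
# Thresholds in milliseconds, taken from the benchmarks above
BENCHMARKS_MS = {
    "rtb_bidding": 100,
    "personalisation": 200,
    "fraud_detection": 100,
    "creative_rendering": 500,
    "general_analytics": 1000,
}

def within_budget(component, measured_ms, benchmarks=BENCHMARKS_MS):
    """True if a measured latency meets the benchmark for its component."""
    return measured_ms <= benchmarks[component]

print(within_budget("rtb_bidding", 80))   # True: competitive
print(within_budget("rtb_bidding", 150))  # False: bid window missed
```

In practice you would tune these thresholds to your own stack and goals rather than treating the table as fixed.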