Testing Azure Application Insights at Scale

We've got a trust issue with our logging strategy.

Right now we're logging everything to blob storage: exceptions, usage metrics, sync reports. It works, but querying JSON files in blob storage when something breaks isn't great, and it's entirely reactive. That's no good when you want to know about problems before your customers report them.

Azure Application Insights seems like the obvious solution, but we've tried it before and still lost data to sampling, despite configuring sampling to be disabled.

So I'm building a proof of concept to test it properly and potentially build a reference project for the rest of my team.

The Plan

Send 10,000+ events (exceptions, traces, metrics, etc.) over several hours, then query Application Insights to confirm every single one made it through.

I'm also setting up the POC to test realistic traffic patterns, so it stays as close as possible to what we'd see in a real-world scenario.

The app will be interactive. You give it a test identifier, then build a sequence of patterns:

  • Pattern type (steady rate or burst)
  • Telemetry types and percentages (exceptions, traces, metrics, events, dependencies)
  • Normal operations or exception spike (for testing Grafana alerts)
  • Volume and duration

Example sequence:

  1. Steady: 5,000 events over 30 minutes
  2. Burst: 500 exceptions over 2 minutes
  3. Steady: 5,000 events over 30 minutes
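The steady/burst distinction above can be sketched as a simple scheduler that spaces events evenly across each pattern's window — a burst is just a large count over a short duration. The `Pattern` type and `schedule` function here are illustrative names, not the POC's actual code:

```python
from dataclasses import dataclass


@dataclass
class Pattern:
    """One step in a test sequence (illustrative, not the POC's real types)."""
    kind: str          # "steady" or "burst"
    count: int         # events to send
    duration_s: float  # window to spread them over


def schedule(patterns):
    """Yield (pattern_index, offset_seconds) for every event in the sequence.

    Steady and burst use the same math: events are spaced evenly across
    the window, so a burst is simply a high count over a short duration.
    """
    start = 0.0
    for i, p in enumerate(patterns):
        gap = p.duration_s / p.count
        for n in range(p.count):
            yield i, start + n * gap
        start += p.duration_s


# The example sequence from above: 10,500 events across ~62 minutes.
seq = [
    Pattern("steady", 5000, 30 * 60),
    Pattern("burst", 500, 2 * 60),
    Pattern("steady", 5000, 30 * 60),
]
events = list(schedule(seq))
```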

Every piece of telemetry gets tagged with TestRunId, PatternType, PatternIndex, TelemetryType, and EventId. Query by TestRunId and compare the counts. If I sent 10,000 and got 10,000 back, we're good.
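The tagging step might look something like the sketch below: a property dictionary attached to every item, with a per-item EventId so missing events can be identified individually rather than just counted. The verification query is a KQL `union` plus `summarize count()` over the custom dimensions; the exact query shape and the `tag` helper are my own illustration, not the POC's code:

```python
import uuid


def tag(telemetry_type: str, pattern_type: str, pattern_index: int,
        test_run_id: str, counter: int) -> dict:
    """Build the custom properties attached to every telemetry item.

    EventId is unique per item, so a shortfall in the count can be traced
    back to the specific events that were dropped.
    """
    return {
        "TestRunId": test_run_id,
        "PatternType": pattern_type,
        "PatternIndex": str(pattern_index),
        "TelemetryType": telemetry_type,
        "EventId": f"{test_run_id}-{counter:06d}",
    }


run_id = str(uuid.uuid4())
props = tag("exception", "burst", 1, run_id, 42)

# Verification is then a count grouped by type across the relevant
# Application Insights tables (query shape is illustrative):
kql = f"""
union exceptions, customEvents, traces
| where customDimensions.TestRunId == "{run_id}"
| summarize received = count() by tostring(customDimensions.TelemetryType)
"""
```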

The Architecture

The pipeline goes:

  1. Application Insights
  2. Log Analytics Workspace
  3. Fabric for analytics, with Grafana for real-time dashboards and alerting.

Different telemetry types need different structures. Custom events use flattened properties so Fabric and Grafana can query efficiently. Custom metrics are pure numbers for time-series graphs. Exceptions include flattened alertable fields plus rich JSON context for investigation.
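To make that structural difference concrete, here's a minimal sketch of the two shapes — flattened properties for querying, plus a nested JSON blob only where rich context is needed. All field names and values here are invented for illustration:

```python
import json

# Custom event: every value is a flat top-level string, so Fabric and
# Grafana can filter and group on it without parsing JSON.
custom_event_props = {
    "CustomerId": "cust-001",
    "RecordsUpdated": "148",
    "DurationMs": "5321",
}

# Exception: flattened alertable fields, plus one JSON blob that is only
# read by a human investigating after an alert fires.
exception_props = {
    "ErrorCode": "DB_TIMEOUT",     # flattened: alert on this
    "CustomerId": "cust-001",      # flattened: group by this
    "Context": json.dumps({        # nested: rich context for investigation
        "retries": 3,
        "innerException": "SqlTimeoutException",
    }),
}
```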

The mock data system generates realistic failures: database timeouts, API errors, validation issues, complete with proper stack traces, inner exceptions, and process logs. The POC data follows the structure of our actual production logs.
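A mock failure generator along these lines could be as simple as picking from a catalogue of failure modes and filling in a plausible stack trace. The catalogue and field names below are invented; the real POC mirrors our production log structure, which isn't reproduced here:

```python
import random

# Illustrative failure catalogue (not our real error codes).
FAILURES = [
    ("DB_TIMEOUT", "Database call exceeded 30s timeout"),
    ("API_ERROR", "Upstream API returned 502"),
    ("VALIDATION", "Payload failed schema validation"),
]


def mock_exception(rng: random.Random) -> dict:
    """Generate one realistic-looking failure record.

    The point is that generated events are highly similar to each other,
    just like our production logs, so they exercise exactly the case
    where adaptive sampling would discard "duplicate" items.
    """
    code, message = rng.choice(FAILURES)
    return {
        "errorCode": code,
        "message": message,
        "stackTrace": f"at SyncService.Run()\n  at Worker.Execute() [{code}]",
        "innerException": f"{code}: {message}",
    }


rng = random.Random(42)  # seeded so a test run is reproducible
exc = mock_exception(rng)
```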

This matters because sampling works by discarding events from groups of hundreds or thousands that are near-identical to each other. Since we log specific edge cases and summaries of what was updated, our events are exactly that kind of highly similar data. That's where I need to prove sampling isn't having an impact.

If this becomes our reference implementation, the mock data shows the team how to structure telemetry data properly.

The Goal

If this works, we migrate off blob storage entirely. Query from Fabric for analytics, build Grafana dashboards for real-time monitoring, set up alerts that fire within seconds of an unhandled exception.

If it doesn't work, at least we'll know Application Insights can't handle our use case, and we can look into using Log Analytics directly, or something else such as Azure Data Explorer, or something self-hosted like ClickHouse.

I'll write a follow-up post with the results once it's done.