
Replay jobs and audit

The current console exposes both replay execution and administrative history.

What replay is for

Replay lets you re-run historical traffic through a chosen encoding version and rule version.

This is useful when you want to answer questions like:

  • What would have happened if this rule had existed earlier?
  • Did my encoding change improve or break interpretation of old traffic?
  • Can I test a change safely before relying on new live events alone?

Replay jobs

Replay jobs are listed with:

  • job_id
  • job_type
  • encoding_version
  • rule_version
  • time_range_start
  • time_range_end
  • status
  • events_processed
  • events_failed
  • created_at
  • completed_at

What each replay field means

| Field | Plain-English meaning | Why you care |
|---|---|---|
| job_id | The unique ID for this replay run | Useful when discussing a specific job |
| job_type | What kind of replay work Esper should perform | Controls the purpose of the run |
| encoding_version | Which encoding plan version to use | Lets you test a specific translation layer |
| rule_version | Which rule set version to use | Lets you test a specific policy state |
| time_range_start | Beginning of the historical window | Defines which past traffic is included |
| time_range_end | End of the historical window | Defines where the replay stops |
| status | Current state of the replay | Tells you whether to wait, inspect, or retry |
| events_processed | How many events were handled successfully | Main progress and throughput signal |
| events_failed | How many events could not be processed | Main signal that something may be wrong |
| created_at | When the replay was requested | Useful for operator history |
| completed_at | When the replay finished, if it has finished | Helps measure duration |
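If you pull job listings into your own tooling, a small record type keeps the fields together. The sketch below mirrors the field names in the table. The Python types, the `ReplayJob` class, and its helper methods are assumptions for illustration, not part of the product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ReplayJob:
    # Field names mirror the replay job listing; the types are assumptions.
    job_id: str
    job_type: str
    encoding_version: int
    rule_version: int
    time_range_start: datetime
    time_range_end: datetime
    status: str
    events_processed: int
    events_failed: int
    created_at: datetime
    completed_at: Optional[datetime] = None

    def duration(self) -> Optional[timedelta]:
        """Time from request to completion, or None if not finished yet."""
        if self.completed_at is None:
            return None
        return self.completed_at - self.created_at

    def failure_ratio(self) -> float:
        """Share of handled events that failed (0.0 if nothing was handled)."""
        total = self.events_processed + self.events_failed
        return self.events_failed / total if total else 0.0
```

Because `created_at` is the request time, `duration()` includes any time the job spent waiting in `Pending`, not just run time.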

Supported job types:

  • DryRun
  • StateRebuild
  • DecisionOnly

How to think about them:

| Job type | Best use |
|---|---|
| DryRun | Safest first test of a proposed change |
| StateRebuild | Recompute saved state from historical traffic |
| DecisionOnly | Focus on decision outcomes rather than broader state rebuilds |

Supported statuses:

  • Pending
  • Running
  • Complete
  • Failed
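The job types and statuses form closed sets, so client-side code can model them as enums and reject unknown values early. The mapping from status to a next step below is a hypothetical reading of "wait, inspect, or retry" from the field table. `JobType`, `JobStatus`, `next_step`, and `is_terminal` are illustrative names only.

```python
from enum import Enum


class JobType(Enum):
    DRY_RUN = "DryRun"
    STATE_REBUILD = "StateRebuild"
    DECISION_ONLY = "DecisionOnly"


class JobStatus(Enum):
    PENDING = "Pending"
    RUNNING = "Running"
    COMPLETE = "Complete"
    FAILED = "Failed"


# Hypothetical operator guidance per status. Before retrying a Failed job,
# it is worth checking events_failed to see what went wrong.
NEXT_STEP = {
    JobStatus.PENDING: "wait",
    JobStatus.RUNNING: "wait",
    JobStatus.COMPLETE: "inspect",
    JobStatus.FAILED: "retry",
}


def next_step(raw_status: str) -> str:
    """Map a status string to a suggested action (ValueError if unknown)."""
    return NEXT_STEP[JobStatus(raw_status)]


def is_terminal(raw_status: str) -> bool:
    """True once the job will not change state on its own."""
    return JobStatus(raw_status) in (JobStatus.COMPLETE, JobStatus.FAILED)
```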

Launching a replay job

The launch form currently submits to:

POST /tenants/{tenant_id}/replay-jobs

Required fields:

  • job_type
  • encoding_version
  • rule_version
  • time_range_start
  • time_range_end

The app currently pre-fills the form with UTC timestamps and sets both encoding_version and rule_version to 1.
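For scripted launches, the request body is just the five required fields. The sketch below builds (but does not send) that request with the same defaults the form uses: UTC timestamps and version 1 for both encoding and rules. The base URL, the one-hour default window, the JSON encoding, the ISO 8601 timestamp format, and integer versions are all assumptions. `build_replay_request` is a hypothetical helper.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical base URL; substitute your deployment's API host.
BASE_URL = "https://esper.example.com"


def build_replay_request(
    tenant_id: str,
    job_type: str = "DryRun",
    encoding_version: int = 1,
    rule_version: int = 1,
    window: timedelta = timedelta(hours=1),
    now: Optional[datetime] = None,
) -> urllib.request.Request:
    """Build a POST to /tenants/{tenant_id}/replay-jobs without sending it."""
    end = now or datetime.now(timezone.utc)
    start = end - window
    body = {
        "job_type": job_type,
        "encoding_version": encoding_version,
        "rule_version": rule_version,
        # Assumption: the API accepts ISO 8601 UTC timestamps.
        "time_range_start": start.isoformat(),
        "time_range_end": end.isoformat(),
    }
    tenant = urllib.parse.quote(tenant_id, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/tenants/{tenant}/replay-jobs",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` would also need whatever authentication your deployment requires, which this page does not describe.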

Good first-replay habits:

  • use a short time window first
  • prefer DryRun before heavier replay modes
  • compare events_processed and events_failed before trusting the results
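These habits can be turned into simple checks: one before launch and one before trusting the results. The thresholds below are hypothetical placeholders that you would tune to your own traffic. The page does not define any limits, and the helper names are illustrative.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds; the product does not define these.
MAX_FIRST_WINDOW = timedelta(hours=6)
MAX_FAILURE_RATIO = 0.01


def preflight_warnings(job_type: str, start: datetime, end: datetime) -> list[str]:
    """Warnings to review before launching a first replay."""
    warnings = []
    if end <= start:
        # Assumed sanity check: an empty or inverted window replays nothing.
        warnings.append("time_range_end must be after time_range_start")
    elif end - start > MAX_FIRST_WINDOW:
        warnings.append("consider a shorter window for a first replay")
    if job_type != "DryRun":
        warnings.append("consider a DryRun before heavier replay modes")
    return warnings


def results_look_trustworthy(events_processed: int, events_failed: int) -> bool:
    """Compare processed and failed counts before relying on the output."""
    total = events_processed + events_failed
    if total == 0:
        return False  # nothing was replayed, so there is nothing to trust yet
    return events_failed / total <= MAX_FAILURE_RATIO
```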

Audit history

The audit page lists tenant-scoped entries with:

  • audit_entry_id
  • actor
  • action
  • recorded_at

The overview page also uses audit entry count as part of the tenant’s operational posture summary.

What each audit field means

| Field | Plain-English meaning | Why you care |
|---|---|---|
| audit_entry_id | Unique record of an admin action | Useful when tracing specific changes |
| actor | Who performed the action | Answers ownership questions |
| action | What changed | Quick summary of the mutation |
| recorded_at | When it happened | Helps line up a change with later outcomes |
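To line up a change with a later outcome, one approach is to list the audit entries recorded in a lookback window before the moment you noticed the outcome. The sketch below does this with the four audit fields. The `AuditEntry` type, the `changes_before` helper, and the action strings in the usage test are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AuditEntry:
    # Field names mirror the audit listing; types are assumptions.
    audit_entry_id: str
    actor: str
    action: str
    recorded_at: datetime


def changes_before(
    entries: list[AuditEntry], outcome_at: datetime, lookback: timedelta
) -> list[AuditEntry]:
    """Entries recorded within `lookback` before `outcome_at`, newest first."""
    window_start = outcome_at - lookback
    hits = [e for e in entries if window_start <= e.recorded_at <= outcome_at]
    return sorted(hits, key=lambda e: e.recorded_at, reverse=True)
```

Listing the newest entries first puts the most likely cause at the top, though the closest change in time is not necessarily the one responsible.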

How to use these together

In the current product flow:

  1. Create or revise encoding and rule definitions.
  2. Launch a replay job against a historical time window.
  3. Review resulting decisions and events.
  4. Confirm the administrative changes in audit history.

This is the cleanest way for a non-technical operator to explain a change:

  • what changed
  • when it changed
  • what historical data was replayed
  • what outcome changed as a result
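One way to produce that four-part explanation is to compare decisions from two replays of the same window, one before and one after the change, and combine the result with the audit entry. This sketch assumes you can export decisions as a mapping from event ID to a decision label. The page does not describe such an export, and the labels `allow`/`block` in the test are hypothetical. `decision_changes` and `explain_change` are illustrative helpers.

```python
from collections import Counter
from datetime import datetime


def decision_changes(baseline: dict[str, str], candidate: dict[str, str]) -> Counter:
    """Count decision moves for events present in both replays."""
    moves: Counter = Counter()
    for event_id in sorted(baseline.keys() & candidate.keys()):
        before, after = baseline[event_id], candidate[event_id]
        if before != after:
            moves[f"{before} -> {after}"] += 1
    return moves


def explain_change(
    action: str,
    actor: str,
    recorded_at: datetime,
    window: tuple[datetime, datetime],
    moves: Counter,
) -> str:
    """What changed, when, which data was replayed, and what outcome moved."""
    start, end = window
    outcome = ", ".join(f"{n} x {move}" for move, n in sorted(moves.items()))
    return "\n".join([
        f"What changed: {action} (by {actor})",
        f"When it changed: {recorded_at.isoformat()}",
        f"Data replayed: {start.isoformat()} to {end.isoformat()}",
        f"Outcome: {outcome or 'no decisions changed'}",
    ])
```

Comparing only events present in both replays avoids counting events that one run failed to process as decision changes.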