🐞 fix: streamline OTLP exporter initialization and fallback handling #231

Open
ahsanfarooq210 wants to merge 2 commits into DeutscheModelUnitedNations:main from ahsanfarooq210:otel-check

ahsanfarooq210 (Contributor) commented May 24, 2025

🐞 Add Fallback Mechanism for OpenTelemetry Server
Problem
When the OpenTelemetry (OTEL) server becomes unresponsive, the application could potentially hang or crash due to lack of proper fallback mechanisms. This creates reliability issues in production environments.

Solution
Implemented a robust fallback system for OpenTelemetry tracing that ensures the application continues to function even when the OTEL server is unavailable:

  • Added a console-based fallback exporter that's always active
  • Implemented timeout and concurrency limits for OTEL operations
  • Added graceful degradation to console logging
  • Improved error handling and logging
Key Changes
  • Created PrettyConsoleSpanExporter as a reliable fallback
  • Added configuration for OTEL timeouts (5 seconds) and concurrency limits
  • Implemented environment-aware processing (Batch vs Simple)
  • Added dual processing setup with automatic fallback
  • Improved error handling and logging
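
To make the environment-aware selection and the timeout setting above more concrete, here is a minimal sketch. It is not the code from this PR: `planTracing`, `TracingPlan`, and `ExporterSettings` are hypothetical names, and the concurrency limit is passed in by the caller because the description does not state its value.

```typescript
// Hypothetical names throughout; this only models the decisions described above.
type ProcessorKind = "batch" | "simple";

interface ExporterSettings {
  url: string;
  timeoutMillis: number;
  concurrencyLimit: number;
}

interface TracingPlan {
  processor: ProcessorKind;
  // Left undefined when no OTLP endpoint is configured (console only).
  otlp?: ExporterSettings;
}

// The 5-second timeout mentioned in the description.
const OTLP_TIMEOUT_MILLIS = 5000;

function planTracing(
  nodeEnv: string,
  endpoint: string | undefined,
  concurrencyLimit: number, // assumption: supplied by the caller, not stated in the PR
): TracingPlan {
  // Batch processing in production, simple (per-span) processing elsewhere.
  const processor: ProcessorKind = nodeEnv === "production" ? "batch" : "simple";
  if (!endpoint) {
    return { processor };
  }
  return {
    processor,
    otlp: { url: endpoint, timeoutMillis: OTLP_TIMEOUT_MILLIS, concurrencyLimit },
  };
}
```

Keeping the decision in a pure function like this would let it be unit-tested without starting a tracer provider.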
Testing
To verify the changes:

  1. Start the application normally
  2. Verify spans are being exported to OTEL
  3. Disable OTEL server
  4. Verify application continues running with console logging
  5. Verify no data loss during OTEL outage
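
The manual steps above could also be approximated in an automated check. The following is a sketch under stated assumptions: `FakeExporter`, `unreachableExporter`, `recordingExporter`, and `exportToAll` are hypothetical stand-ins written for this illustration, not OpenTelemetry SDK types and not code from this PR. It only demonstrates the property being verified: a failing exporter does not stop the fallback from receiving spans.

```typescript
// Local stand-ins; they loosely mirror a callback-style exporter API.
type ExportOutcome = "success" | "failed";

interface FakeSpan {
  name: string;
}

interface FakeExporter {
  export(spans: FakeSpan[], done: (outcome: ExportOutcome) => void): void;
}

// Stands in for an OTLP exporter whose server is down: always reports failure.
const unreachableExporter: FakeExporter = {
  export: (_spans, done) => done("failed"),
};

// Stands in for the console fallback: records span names instead of printing them.
function recordingExporter(seen: string[]): FakeExporter {
  return {
    export: (spans, done) => {
      for (const span of spans) seen.push(span.name);
      done("success");
    },
  };
}

// Sends the spans to every exporter and collects each outcome; never throws.
function exportToAll(exporters: FakeExporter[], spans: FakeSpan[]): ExportOutcome[] {
  const outcomes: ExportOutcome[] = [];
  for (const exporter of exporters) {
    try {
      exporter.export(spans, (outcome) => outcomes.push(outcome));
    } catch {
      // A throwing exporter is treated like a failed export.
      outcomes.push("failed");
    }
  }
  return outcomes;
}
```

A test would register the recording exporter next to the unreachable one, export a span, and assert that the span name was recorded even though one outcome is `"failed"`.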
Impact
  • Improved system reliability
  • No service disruption during OTEL outages
  • Better observability through fallback logging
  • Zero downtime during telemetry issues
Related Issues
Fixes #158 - Application crashes when OTEL server is unresponsive


This is part 1 of 2 in a stack made with GitButler:

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced fallback console logging mechanism for tracing when the primary exporter is unavailable or initialization fails.
    • Added support for header handling and timeout/concurrency configuration options for enhanced flexibility.
  • Bug Fixes

    • Improved error handling during tracer initialization with automatic fallback to ensure tracing continues even when primary configuration fails.

m1212e (Member) commented Aug 12, 2025

Thank you very much for your contribution. The whole OTEL effort is on hold for now, since we still have to work out some implementation details that might be subject to change. As soon as we get to fully implementing and using OTEL as intended, we will get back to this!

Strehk (Member) commented Jan 18, 2026

@ahsanfarooq210 "Gut Ding will Weile haben" (a German saying meaning "good things take time")

I just revisited this PR and built on it to finally fix and implement how we use OTEL. Since this PR is quite stale, I opened a new PR at #354. If you are keen on having this PR merged as proof of your work, I can do that; otherwise I would close it for now and proceed with #354.

Let me know if that's okay for you!

Strehk (Member) commented Jan 18, 2026

Ah, never mind - I just saw that my Git tool wants me to merge this first anyway. In this case: @m1212e can you please have a look at this PR and #354 afterwards? :)

From my perspective everything works just fine, but I'd like your opinion and testing on this topic as well.

@Strehk Strehk requested a review from m1212e January 18, 2026 22:43
coderabbitai bot (Contributor) commented Jan 19, 2026

📝 Walkthrough

The tracer initialization logic was refactored to add resilience against OTLP exporter failures. A try/catch wrapper now handles OTLP endpoint initialization, with automatic fallback to console exporter if the endpoint is unavailable or initialization fails. Processor selection was reorganized to conditionally register OTLP or console processors based on environment and initialization success.
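
As an illustration of the try/catch wrapper described in this walkthrough, here is a minimal sketch. It assumes a hypothetical `createPrimary` factory that may throw and a simplified `Exporter` type; neither is taken from `tracer.ts` or from the OpenTelemetry SDK.

```typescript
// Simplified stand-in for an exporter; not the OpenTelemetry SDK type.
interface Exporter {
  readonly name: string;
}

interface InitResult {
  exporter: Exporter;
  usedFallback: boolean;
}

// `createPrimary` is a hypothetical factory that may throw (for example on a malformed endpoint).
// `warn` is injected so the sketch does not depend on any particular logger.
function initExporter(
  createPrimary: (() => Exporter) | undefined,
  fallback: Exporter,
  warn: (message: string) => void,
): InitResult {
  if (!createPrimary) {
    // No endpoint configured: use the console exporter directly.
    return { exporter: fallback, usedFallback: true };
  }
  try {
    return { exporter: createPrimary(), usedFallback: false };
  } catch (error) {
    // Initialization failed: report it and degrade instead of crashing the application.
    const reason = error instanceof Error ? error.message : String(error);
    warn("OTLP exporter initialization failed, using console fallback: " + reason);
    return { exporter: fallback, usedFallback: true };
  }
}
```

Returning `usedFallback` alongside the exporter lets the caller decide which processors to register without comparing object identities.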

Changes

| Cohort / File(s) | Summary |
|---|---|
| OTEL Tracer Resilience: src/api/resolvers/tracer.ts | Refactored tracer initialization with try/catch error handling for OTLP exporter. Introduced fallback console exporter/processor and conditional processor wiring. Changed processor selection logic: BatchSpanProcessor for production, SimpleSpanProcessor for non-production, with console fallback on initialization failure. Added headers, timeout, and concurrency configuration options. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 When telemetry falters and endpoints go dark,
A resilient tracer leaves logs like a spark,
With fallbacks and guards on the initialization road,
The app keeps on tracing, unburdened by load!

🚥 Pre-merge checks | ✅ 6 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Boy Scout Scope | ⚠️ Warning | PR includes opportunistic changes without documenting them in a 'Boy Scout changes' section as required. | Add a 'Boy Scout changes' section to the PR description documenting the blank line removal in user.ts, or remove the opportunistic change entirely. |
✅ Passed checks (6 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: streamlining OTLP exporter initialization and improving fallback handling, which directly aligns with the core modifications in the tracer.ts file. |
| Linked Issues check | ✅ Passed | The changes directly address issue #158 by implementing robust OTLP initialization with fallback console exporter, timeout/concurrency controls, and error handling to ensure application resilience during OTEL outages. |
| Out of Scope Changes check | ✅ Passed | All modifications focus on OTLP exporter initialization, fallback handling, and processor selection in the tracer configuration, which are directly aligned with the PR objectives and linked issue #158. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| German Gender-Inclusive Language | ✅ Passed | The PR only modified src/api/resolvers/tracer.ts (+38/-29 lines) with no changes to messages/de.json, so the German gender-inclusive language check is not applicable and passes by default. |

Strehk (Member) commented Jan 19, 2026

@ahsanfarooq210 would you mind merging the upstream otel-check branch into your forked branch, please? I got confused by how GitHub handles forks haha

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/api/resolvers/tracer.ts`:
- Around line 80-86: The console fallback (fallbackProcessor /
PrettyConsoleSpanExporter) is always added via provider.addSpanProcessor which
can leak PII and flood production logs; change the logic in tracer setup to only
register the fallbackProcessor when running in non-production or when an
explicit config flag is enabled (e.g., process.env.NODE_ENV !== 'production' ||
config.enableConsoleExporter) and/or apply a sanitization/redaction step to span
attributes/GraphQL args before they reach PrettyConsoleSpanExporter; update the
conditional around provider.addSpanProcessor(fallbackProcessor) and ensure
activeProcessor logic (activeProcessor !== fallbackProcessor) still applies.

Comment on lines +80 to +86
// Always add fallback processor first
provider.addSpanProcessor(fallbackProcessor);

// Add OTLP processor if available and different from fallback
if (activeProcessor && activeProcessor !== fallbackProcessor) {
  provider.addSpanProcessor(activeProcessor);
}

⚠️ Potential issue | 🟠 Major

Always-on console fallback can leak PII and flood logs.

PrettyConsoleSpanExporter logs user email and GraphQL args. Because the fallback processor is added unconditionally, this will emit in production even when OTLP is healthy, which is a privacy/compliance and log-volume risk. Consider gating the console fallback to non-production or an explicit flag, or redact sensitive fields.

🔧 Suggested guard to limit console fallback in production
-// Always add fallback processor first
-provider.addSpanProcessor(fallbackProcessor);
+// Add console fallback only when OTLP isn't active (or outside production)
+const enableConsoleFallback =
+  !activeProcessor || activeProcessor === fallbackProcessor || configPrivate.NODE_ENV !== 'production';
+if (enableConsoleFallback) {
+  provider.addSpanProcessor(fallbackProcessor);
+}
Suggested change
-// Always add fallback processor first
-provider.addSpanProcessor(fallbackProcessor);
+// Add console fallback only when OTLP isn't active (or outside production)
+const enableConsoleFallback =
+  !activeProcessor || activeProcessor === fallbackProcessor || configPrivate.NODE_ENV !== 'production';
+if (enableConsoleFallback) {
+  provider.addSpanProcessor(fallbackProcessor);
+}
 // Add OTLP processor if available and different from fallback
 if (activeProcessor && activeProcessor !== fallbackProcessor) {
   provider.addSpanProcessor(activeProcessor);
 }
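
The review comment also mentions redacting sensitive fields as an alternative to gating the console fallback. A minimal sketch of such a redaction step follows. The attribute keys `user.email` and `graphql.args` are assumptions made for illustration, since the actual attribute names used by the tracer are not shown in this PR, and `redactAttributes` is a hypothetical helper.

```typescript
type AttributeValue = string | number | boolean;
type Attributes = Record<string, AttributeValue>;

// Assumed key names for illustration only; adjust to the attributes the tracer really sets.
const SENSITIVE_KEYS: readonly string[] = ["user.email", "graphql.args"];

// Returns a copy of the attributes with sensitive values replaced; the input is not mutated.
function redactAttributes(
  attributes: Attributes,
  sensitiveKeys: readonly string[] = SENSITIVE_KEYS,
): Attributes {
  const redacted: Attributes = {};
  for (const key of Object.keys(attributes)) {
    const value = attributes[key];
    if (value === undefined) continue;
    redacted[key] = sensitiveKeys.indexOf(key) >= 0 ? "[REDACTED]" : value;
  }
  return redacted;
}
```

A console exporter could call a helper like this on each span's attributes just before printing, so the OTLP path would still receive the original values.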

ahsanfarooq210 (Contributor, Author)

> @ahsanfarooq210 would you mind merging the upstream otel-check branch into your forked branch, please? I got confused by how GitHub handles forks haha

sure

Development

Successfully merging this pull request may close these issues.

Check if otel is able to crash the application at runtime

3 participants