🐞 fix: streamline OTLP exporter initialization and fallback handling#231
🐞 fix: streamline OTLP exporter initialization and fallback handling#231ahsanfarooq210 wants to merge 2 commits intoDeutscheModelUnitedNations:mainfrom
Conversation
|
Thank you very much for your contribution, the whole OTEL stuff is on hold for now, since we'll have to work out some implementation details which might be subject to change. As soon as it comes to fully implementing and using OTEL as it is intended, we will get back to this! |
|
@ahsanfarooq210 "Gut Ding will Weile haben" (german saying for "good things take time") I just reviseted this PR and built on it to finally fix and implement how we use OTEL. Since this PR is quite stale, I opened a new PR at #354. If you are quite keen on having this PR merged as proof of your work, I can do that - otherwise I would close it for now and advance with #354. Let me know if thats okay for you! |
📝 WalkthroughWalkthroughThe tracer initialization logic was refactored to add resilience against OTLP exporter failures. A try/catch wrapper now handles OTLP endpoint initialization, with automatic fallback to console exporter if the endpoint is unavailable or initialization fails. Processor selection was reorganized to conditionally register OTLP or console processors based on environment and initialization success. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 6 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (6 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@ahsanfarooq210 would you mind merging the upstream otel-check branch into your forked branch please? I got confused how github handels forks haha |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/api/resolvers/tracer.ts`:
- Around line 80-86: The console fallback (fallbackProcessor /
PrettyConsoleSpanExporter) is always added via provider.addSpanProcessor which
can leak PII and flood production logs; change the logic in tracer setup to only
register the fallbackProcessor when running in non-production or when an
explicit config flag is enabled (e.g., process.env.NODE_ENV !== 'production' ||
config.enableConsoleExporter) and/or apply a sanitization/redaction step to span
attributes/GraphQL args before they reach PrettyConsoleSpanExporter; update the
conditional around provider.addSpanProcessor(fallbackProcessor) and ensure
activeProcessor logic (activeProcessor !== fallbackProcessor) still applies.
| // Always add fallback processor first | ||
| provider.addSpanProcessor(fallbackProcessor); | ||
|
|
||
| // Add OTLP processor if available and different from fallback | ||
| if (activeProcessor && activeProcessor !== fallbackProcessor) { | ||
| provider.addSpanProcessor(activeProcessor); | ||
| } |
There was a problem hiding this comment.
Always-on console fallback can leak PII and flood logs.
PrettyConsoleSpanExporter logs user email and GraphQL args. Because the fallback processor is added unconditionally, this will emit in production even when OTLP is healthy, which is a privacy/compliance and log-volume risk. Consider gating the console fallback to non-production or an explicit flag, or redact sensitive fields.
🔧 Suggested guard to limit console fallback in production
-// Always add fallback processor first
-provider.addSpanProcessor(fallbackProcessor);
+// Add console fallback only when OTLP isn't active (or outside production)
+const enableConsoleFallback =
+ !activeProcessor || activeProcessor === fallbackProcessor || configPrivate.NODE_ENV !== 'production';
+if (enableConsoleFallback) {
+ provider.addSpanProcessor(fallbackProcessor);
+}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Always add fallback processor first | |
| provider.addSpanProcessor(fallbackProcessor); | |
| // Add OTLP processor if available and different from fallback | |
| if (activeProcessor && activeProcessor !== fallbackProcessor) { | |
| provider.addSpanProcessor(activeProcessor); | |
| } | |
| // Add console fallback only when OTLP isn't active (or outside production) | |
| const enableConsoleFallback = | |
| !activeProcessor || activeProcessor === fallbackProcessor || configPrivate.NODE_ENV !== 'production'; | |
| if (enableConsoleFallback) { | |
| provider.addSpanProcessor(fallbackProcessor); | |
| } | |
| // Add OTLP processor if available and different from fallback | |
| if (activeProcessor && activeProcessor !== fallbackProcessor) { | |
| provider.addSpanProcessor(activeProcessor); | |
| } |
🤖 Prompt for AI Agents
In `@src/api/resolvers/tracer.ts` around lines 80 - 86, The console fallback
(fallbackProcessor / PrettyConsoleSpanExporter) is always added via
provider.addSpanProcessor which can leak PII and flood production logs; change
the logic in tracer setup to only register the fallbackProcessor when running in
non-production or when an explicit config flag is enabled (e.g.,
process.env.NODE_ENV !== 'production' || config.enableConsoleExporter) and/or
apply a sanitization/redaction step to span attributes/GraphQL args before they
reach PrettyConsoleSpanExporter; update the conditional around
provider.addSpanProcessor(fallbackProcessor) and ensure activeProcessor logic
(activeProcessor !== fallbackProcessor) still applies.
sure |
🐞 Add Fallback Mechanism for OpenTelemetry Server
Problem
When the OpenTelemetry (OTEL) server becomes unresponsive, the application could potentially hang or crash due to lack of proper fallback mechanisms. This creates reliability issues in production environments.
Solution
Implemented a robust fallback system for OpenTelemetry tracing that ensures the application continues to function even when the OTEL server is unavailable:
Added a console-based fallback exporter that's always active
Implemented timeout and concurrency limits for OTEL operations
Added graceful degradation to console logging
Improved error handling and logging
Key Changes
Created PrettyConsoleSpanExporter as a reliable fallback
Added configuration for OTEL timeouts (5 seconds) and concurrency limits
Implemented environment-aware processing (Batch vs Simple)
Added dual processing setup with automatic fallback
Improved error handling and logging
Testing
To verify the changes:
Start the application normally
Verify spans are being exported to OTEL
Disable OTEL server
Verify application continues running with console logging
Verify no data loss during OTEL outage
Impact
Improved system reliability
No service disruption during OTEL outages
Better observability through fallback logging
Zero downtime during telemetry issues
Related Issues
Fixes #158 - Application crashes when OTEL server is unresponsive
This is part 1 of 2 in a stack made with GitButler:
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
✏️ Tip: You can customize this high-level summary in your review settings.