
[Bug] IOBuf TLS block pool: double-return of a Block creates a self-loop in portal_next linked list, causing thread hang #3243

@walterzhaoJR


Describe the bug
release_tls_block() and release_tls_block_chain() in the IOBuf TLS block caching layer do not guard against a block being returned to TLS when it is already the TLS list head. This can create a self-referencing cycle (b->portal_next == b), causing any subsequent traversal of the TLS chain — such as remove_tls_block_chain() (registered via thread_atexit) or share_tls_block() — to loop infinitely, hanging the thread permanently.

In src/butil/iobuf_inl.h, release_tls_block():

[Image: release_tls_block() source, src/butil/iobuf_inl.h]

When b is already tls_data->block_head, the assignment b->u.portal_next = tls_data->block_head becomes b->u.portal_next = b, forming a single-node cycle.
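To make the failure concrete, here is a minimal single-threaded model of the unguarded push-to-front. `Block`, `TLSData`, `release_tls_block_unguarded()` and `walk_bounded()` are hypothetical simplified stand-ins, not brpc's real types or functions. The real code also checks whether the block is full, enforces a per-thread block cap, and manages reference counts, and all of that is omitted here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for IOBuf::Block and the per-thread TLS cache.
struct Block {
    Block* portal_next = nullptr;
};

struct TLSData {
    Block* block_head = nullptr;
    int num_blocks = 0;
};

// Models the push-to-front performed by release_tls_block(), with no guard.
void release_tls_block_unguarded(TLSData* tls, Block* b) {
    b->portal_next = tls->block_head;  // if b == block_head, this is b->portal_next = b
    tls->block_head = b;
    ++tls->num_blocks;
}

// Walks the list the way a teardown loop would, but stops after `limit`
// steps so the demo cannot hang. Returns the number of steps taken.
std::size_t walk_bounded(const Block* head, std::size_t limit) {
    std::size_t n = 0;
    for (const Block* p = head; p != nullptr && n < limit; p = p->portal_next) {
        ++n;
    }
    return n;
}
```

Releasing the same `Block` twice leaves `b.portal_next == &b`, and `walk_bounded()` only stops because of its step limit. In this model the counter also reports two cached blocks when only one exists.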

Similarly, in src/butil/iobuf.cpp, release_tls_block_chain():

[Image: release_tls_block_chain() source, src/butil/iobuf.cpp]

If the chain being returned includes the block that is already the TLS head, last_b->portal_next can end up pointing back to first_b (which may be last_b itself), again forming an infinite cycle.
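The chain case can be modeled the same way. The sketch below assumes the splice prepends `first_b ... last_b` in front of the current head by linking `last_b` to the old head. `release_chain_unguarded()` and the other names are hypothetical stand-ins, not the brpc source.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for IOBuf::Block and the per-thread TLS cache.
struct Block { Block* portal_next = nullptr; };
struct TLSData { Block* block_head = nullptr; int num_blocks = 0; };

// Models the splice in release_tls_block_chain(): the chain
// first_b -> ... -> last_b (nb blocks) is prepended to the TLS list.
void release_chain_unguarded(TLSData* tls, Block* first_b, Block* last_b, int nb) {
    // If first_b is already the head, last_b now points back into its own chain.
    last_b->portal_next = tls->block_head;
    tls->block_head = first_b;
    tls->num_blocks += nb;
}

// Bounded traversal so the demo cannot hang; returns the number of steps taken.
std::size_t walk_bounded(const Block* head, std::size_t limit) {
    std::size_t n = 0;
    for (const Block* p = head; p != nullptr && n < limit; p = p->portal_next) {
        ++n;
    }
    return n;
}
```

Returning a two-block chain `a -> b` while `a` is still the TLS head yields the cycle `a -> b -> a`. A single-block chain that is already the head reproduces the single-node self-loop from `release_tls_block()`.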

How the Double-Return Happens
IOBufAsZeroCopyOutputStream::BackUp() calls iobuf::release_tls_block(_cur_block) to eagerly return the block to TLS so other code can reuse it:

[Image: IOBufAsZeroCopyOutputStream::BackUp() source]

After BackUp(), the block is the TLS list head (tls_data->block_head). If a later operation (e.g., _release_block() during destruction of IOBufAsZeroCopyOutputStream, or a BackUp() in IOBufAsSnappySink) calls release_tls_block() again with the same block pointer, obtained from a BlockRef that is still live, the block is returned a second time and the self-loop forms.
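The sequence above can be sketched as a toy stream. `FakeStream`, `BlockRef`, and `back_up()` are hypothetical simplifications, not brpc's `IOBufAsZeroCopyOutputStream`. The only property they model is the one the description relies on: after the stream hands its block back and forgets it, a later call re-derives the same pointer from the IOBuf's still-live `BlockRef` and releases it again.

```cpp
#include <cassert>

// Hypothetical stand-ins; not brpc's real types.
struct Block { Block* portal_next = nullptr; };
struct TLSData { Block* block_head = nullptr; int num_blocks = 0; };
struct BlockRef { Block* block = nullptr; };  // the IOBuf still references the block

// Unguarded push-to-front, as in release_tls_block().
void release_tls_block_unguarded(TLSData* tls, Block* b) {
    b->portal_next = tls->block_head;
    tls->block_head = b;
    ++tls->num_blocks;
}

// Toy model of the stream's release path.
struct FakeStream {
    TLSData* tls;
    BlockRef* back_ref;          // last BlockRef of the wrapped IOBuf
    Block* cur_block = nullptr;  // block currently owned by the stream

    void back_up() {
        if (cur_block == nullptr) {
            cur_block = back_ref->block;  // same pointer that is already cached in TLS
        }
        release_tls_block_unguarded(tls, cur_block);
        cur_block = nullptr;  // the stream forgets the block, but the BlockRef does not
    }
};
```

The first `back_up()` makes the block the TLS head. The second one pushes the same pointer again, producing `b.portal_next == &b`.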

Impact

  • Thread hangs permanently in remove_tls_block_chain() (called at thread exit via thread_atexit), or in share_tls_block() / release_tls_block_chain() during normal I/O.
  • The hang is silent — no crash, no log, no error — making it extremely difficult to diagnose in production.
  • Any brpc application using protobuf serialization over IOBuf (which internally uses IOBufAsZeroCopyOutputStream) is potentially affected.
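Because the hang is silent, a debug-only integrity check on the TLS list can turn it into something observable. Below is a sketch using Floyd's tortoise-and-hare cycle detection over `portal_next`. `has_cycle()` and `Block` are hypothetical names, not brpc APIs. Such a check could, for example, be asserted in debug builds before the list is traversed.

```cpp
#include <cassert>

// Hypothetical stand-in for IOBuf::Block.
struct Block { Block* portal_next = nullptr; };

// Floyd's tortoise-and-hare: returns true if following portal_next from
// head ever revisits a node. O(n) time, O(1) extra space.
bool has_cycle(const Block* head) {
    const Block* slow = head;
    const Block* fast = head;
    while (fast != nullptr && fast->portal_next != nullptr) {
        slow = slow->portal_next;
        fast = fast->portal_next->portal_next;
        if (slow == fast) {
            return true;
        }
    }
    return false;
}
```

Unlike a plain traversal, this terminates on the single-node self-loop (`b->portal_next == b`) as well as on longer cycles, so it can report the corruption instead of spinning.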

To Reproduce

Expected behavior

Versions
OS:
Compiler:
brpc:
protobuf:

Additional context/screenshots

Suggested Fix

  1. Guard release_tls_block() against double-return
  [Image: proposed patch for release_tls_block()]
  2. Guard release_tls_block_chain() against self-loop after linking
  [Image: proposed patch for release_tls_block_chain()]
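The exact patch is only available as screenshots, so the following is one possible shape of the two guards on the same simplified model, not the proposed code. All names are hypothetical. Unlike the "after linking" check described for the chain, this sketch checks before splicing, so the list is never mutated into a cycle. In the real code, a skipped push would still have to dispose of the reference the caller is giving up rather than silently dropping it.

```cpp
#include <cassert>

// Hypothetical stand-ins for IOBuf::Block and the per-thread TLS cache.
struct Block { Block* portal_next = nullptr; };
struct TLSData { Block* block_head = nullptr; int num_blocks = 0; };

// Guard 1 (sketch): refuse to push a block that is already the TLS head.
// Returns false when the push is skipped.
bool release_tls_block_guarded(TLSData* tls, Block* b) {
    if (b == nullptr || b == tls->block_head) {
        return false;
    }
    b->portal_next = tls->block_head;
    tls->block_head = b;
    ++tls->num_blocks;
    return true;
}

// Guard 2 (sketch): before splicing first_b..last_b in front of the head,
// make sure the current head is not one of the chain's blocks.
bool release_chain_guarded(TLSData* tls, Block* first_b, Block* last_b, int nb) {
    for (Block* p = first_b; p != nullptr; p = p->portal_next) {
        if (p == tls->block_head) {
            return false;  // chain overlaps the TLS head
        }
        if (p == last_b) {
            break;
        }
    }
    last_b->portal_next = tls->block_head;
    tls->block_head = first_b;
    tls->num_blocks += nb;
    return true;
}
```

A head-only check catches exactly the case described in this issue, where the block is still the head. A block returned twice with another release in between (`p`, then `q`, then `p`) would slip past it and form `p -> q -> p`. Catching that case would need a full scan or a per-block "cached in TLS" marker, at extra cost.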
