⚠ This page is served via a proxy. Original site: https://github.com
This service does not collect credentials or authentication data.
Skip to content

Conversation

@sgerbino
Copy link
Collaborator

@sgerbino sgerbino commented Jan 22, 2026

Add working resolver implementation for Linux using a worker thread approach. Since POSIX getaddrinfo() is blocking and cannot be monitored with epoll, each resolution spawns a dedicated thread that runs the blocking call and posts completion back to the scheduler.

Implementation details:

  • epoll_resolve_op inherits from scheduler_op (not epoll_op) since DNS resolution doesn't use epoll file descriptors
  • Worker threads are detached and self-managing; shared_ptr capture keeps the impl alive until completion
  • Cancellation uses an atomic flag checked after getaddrinfo() returns (the call itself cannot be interrupted)
  • Follows the same shared_ptr lifetime pattern as sockets.hpp for safe cleanup during shutdown

Files changed:

  • src/corosio/src/detail/epoll/resolver_service.hpp: Major rewrite with epoll_resolve_op structure, updated epoll_resolver_impl with enable_shared_from_this, and epoll_resolver_service with scheduler integration
  • src/corosio/src/detail/epoll/resolver_service.cpp: New implementation file with flags_to_hints(), convert_results(), make_gai_error() helpers and the worker thread logic

Also adds comprehensive cross-platform unit tests for the resolver:

  • Construction and move semantics
  • Basic resolution (localhost, numeric IPv4/IPv6, service names)
  • Error handling (invalid hosts, flag violations)
  • Cancellation behavior
  • Sequential resolution operations
  • io_result and structured binding usage
  • resolve_flags bitwise operators
  • resolver_results and resolver_entry APIs

Tests use localhost and numeric IPs to avoid external DNS dependencies in CI environments.

Resolves #53.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added asynchronous DNS resolver with epoll support for non-blocking hostname and service resolution. Includes cancellation support, comprehensive error handling with mapping of DNS errors to standard error codes, and background worker thread execution.
  • Tests

    • Added comprehensive unit test suite for resolver covering construction, move semantics, resolution scenarios, error handling, cancellation behavior, and result validation.

✏️ Tip: You can customize this high-level summary in your review settings.

Add working resolver implementation for Linux using a worker thread
approach. Since POSIX getaddrinfo() is blocking and cannot be monitored
with epoll, each resolution spawns a dedicated thread that runs the
blocking call and posts completion back to the scheduler.

Implementation details:
- epoll_resolve_op inherits from scheduler_op (not epoll_op) since DNS
  resolution doesn't use epoll file descriptors
- Worker threads are detached and self-managing; shared_ptr capture
  keeps the impl alive until completion
- Cancellation uses an atomic flag checked after getaddrinfo() returns
  (the call itself cannot be interrupted)
- Follows the same shared_ptr lifetime pattern as sockets.hpp for safe
  cleanup during shutdown

Files changed:
- src/corosio/src/detail/epoll/resolver_service.hpp: Major rewrite with
  epoll_resolve_op structure, updated epoll_resolver_impl with
  enable_shared_from_this, and epoll_resolver_service with scheduler
  integration
- src/corosio/src/detail/epoll/resolver_service.cpp: New implementation
  file with flags_to_hints(), convert_results(), make_gai_error()
  helpers and the worker thread logic

Also adds comprehensive cross-platform unit tests for the resolver:
- Construction and move semantics
- Basic resolution (localhost, numeric IPv4/IPv6, service names)
- Error handling (invalid hosts, flag violations)
- Cancellation behavior
- Sequential resolution operations
- io_result and structured binding usage
- resolve_flags bitwise operators
- resolver_results and resolver_entry APIs

Tests use localhost and numeric IPs to avoid external DNS dependencies
in CI environments.
@coderabbitai
Copy link

coderabbitai bot commented Jan 22, 2026

📝 Walkthrough

Walkthrough

Implements epoll-based asynchronous DNS resolution by introducing resolver operation structs, service classes, and worker thread coordination. Uses getaddrinfo on worker threads with results posted to the epoll scheduler. Includes comprehensive unit tests covering construction, resolution scenarios, error handling, and cancellation.

Changes

Cohort / File(s) Summary
Resolver Service Core Implementation
src/corosio/src/detail/epoll/resolver_service.hpp, src/corosio/src/detail/epoll/resolver_service.cpp
Introduces epoll_resolve_op struct managing per-resolution state (coroutine, executor, host/service, flags, cancellation). Implements epoll_resolver_impl coordinating resolve operations with worker thread spawning and result handling. Defines epoll_resolver_service managing resolver lifetimes, scheduler integration (post, work_started, work_finished), and shutdown. Includes utility functions for converting resolve_flags to getaddrinfo hints and mapping getaddrinfo errors to system_error_codes.
Unit Tests
test/unit/resolver.cpp
New comprehensive test suite validating resolver construction/move semantics, localhost/numeric IPv4/IPv6 resolution, service name lookup, host/service metadata, error handling, cancellation behavior, sequential resolutions, io_result semantics, and resolver_results/entry behavior.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant epoll_resolver_impl
    participant epoll_resolver_service
    participant Worker Thread
    participant Scheduler
    participant Coroutine

    Caller->>epoll_resolver_impl: resolve(host, service, flags, token)
    epoll_resolver_impl->>epoll_resolver_service: work_started()
    epoll_resolver_impl->>epoll_resolve_op: reset() and configure
    epoll_resolver_impl->>epoll_resolve_op: start(stop_token)
    epoll_resolve_op->>Worker Thread: spawn detached thread
    epoll_resolve_op-->>epoll_resolver_impl: return

    Worker Thread->>Worker Thread: build addrinfo hints from flags
    Worker Thread->>Worker Thread: call getaddrinfo(host, service)
    alt Resolution succeeds
        Worker Thread->>Worker Thread: convert_results to resolver_results
    else Resolution fails
        Worker Thread->>Worker Thread: make_gai_error mapping
    end
    Worker Thread->>epoll_resolve_op: complete with result/error
    
    epoll_resolve_op->>epoll_resolve_op: apply result to ec_out/out
    epoll_resolve_op->>epoll_resolver_service: post(this)
    epoll_resolve_op->>epoll_resolver_service: work_finished()
    epoll_resolver_service->>Scheduler: enqueue completion operation
    
    Scheduler->>epoll_resolve_op: operator()() invoke
    epoll_resolve_op->>Coroutine: resume with results
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

The implementation involves intricate async coordination between worker threads, coroutine lifecycle management, cancellation semantics, and scheduler integration. Multiple interacting classes with non-trivial state machines and error handling require careful validation of thread-safety, lifecycle correctness, and getaddrinfo result conversion logic.

Possibly related PRs

Poem

🐰 A resolver hops through epoll fields,
Where getaddrinfo's promise yields,
Worker threads with DNS quest,
Async names put to the test,
IPv6 addresses hopping west! 🌐

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Implement async DNS resolution for epoll backend' clearly and concisely summarizes the main change: adding asynchronous DNS resolution support for the epoll-based COROSIO backend.
Linked Issues check ✅ Passed The implementation comprehensively addresses all coding requirements from issue #53: epoll_resolve_op with required fields, epoll_resolver_impl with resolve/cancel methods, flags_to_hints and convert_results utilities, complete worker-thread flow with work tracking, cancellation via atomic flag, and comprehensive tests covering all specified scenarios.
Out of Scope Changes check ✅ Passed All changes are directly scoped to implementing the epoll resolver as specified in issue #53: the two header/implementation files (resolver_service.hpp and .cpp) contain only resolver-related code, and tests are confined to resolver functionality.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

@cppalliance-bot
Copy link

An automated preview of the documentation is available at https://54.corosio.prtest3.cppalliance.org/index.html

If more commits are pushed to the pull request, the docs will rebuild at the same URL.

2026-01-22 15:48:17 UTC

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@src/corosio/src/detail/epoll/resolver_service.cpp`:
- Around line 169-184: In epoll_resolve_op::operator() ensure ec_out is
explicitly cleared on success: when ec_out is non-null and the operation is not
cancelled and gai_error == 0, set *ec_out = {}; then continue to move
stored_results into *out; keep checks using cancelled.load(...) and gai_error
as-is so only successful resolution paths clear the error code (refer to ec_out,
cancelled, gai_error, out, stored_results and epoll_resolve_op::operator()).
- Around line 250-285: The thread creation can throw and work_started() must be
balanced and post() must happen before work_finished(); wrap the std::thread
construction (the worker lambda that captures shared_from_this()) in a
try-catch(std::system_error) and in the catch call svc_.post(&op_) then
svc_.work_finished() to ensure the operation completes; also change the order
inside the worker lambda to call svc_.post(&op_) before svc_.work_finished()
after resolving addresses (where you set op_.stored_results via convert_results,
op_.gai_error, freeaddrinfo, and inspect op_.cancelled).

In `@src/corosio/src/detail/epoll/resolver_service.hpp`:
- Around line 145-158: The resolver implementation reuses a single
epoll_resolve_op (op_) and lacks protection against concurrent resolve calls;
update epoll_resolver_impl to enforce the single-inflight contract by adding a
guard (e.g., an atomic<bool> op_in_flight or a simple assert) checked at the
start of resolve() and cleared in cancel() and when the operation completes;
reference op_, resolve(), cancel(), and epoll_resolve_op when implementing so
you set the flag before starting the op and reset it on cancellation/completion
(or replace with assert(!op_in_flight) if you prefer a debug-only check), and
add a short class-level comment stating the single-inflight requirement.
🧹 Nitpick comments (1)
test/unit/resolver.cpp (1)

294-357: Keep invalid-host test hermetic.

Even with .invalid, getaddrinfo can still hit external DNS and be flaky on network-restricted CI. Consider using a deterministic failure that avoids DNS (e.g., numeric-host with an invalid literal) or marking this as an integration test.

Comment on lines +169 to +184
void
epoll_resolve_op::
operator()()
{
stop_cb.reset(); // Disconnect stop callback

if (ec_out)
{
if (cancelled.load(std::memory_order_acquire))
*ec_out = capy::error::canceled;
else if (gai_error != 0)
*ec_out = make_gai_error(gai_error);
}

if (out && !cancelled.load(std::memory_order_acquire) && gai_error == 0)
*out = std::move(stored_results);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Ensure ec_out is cleared on success.

If the caller reuses the same system::error_code, a successful resolve can leave a stale error code behind. Consider explicitly clearing it on success.

✅ Proposed fix
     if (ec_out)
     {
         if (cancelled.load(std::memory_order_acquire))
             *ec_out = capy::error::canceled;
         else if (gai_error != 0)
             *ec_out = make_gai_error(gai_error);
+        else
+            *ec_out = {};
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
void
epoll_resolve_op::
operator()()
{
stop_cb.reset(); // Disconnect stop callback
if (ec_out)
{
if (cancelled.load(std::memory_order_acquire))
*ec_out = capy::error::canceled;
else if (gai_error != 0)
*ec_out = make_gai_error(gai_error);
}
if (out && !cancelled.load(std::memory_order_acquire) && gai_error == 0)
*out = std::move(stored_results);
void
epoll_resolve_op::
operator()()
{
stop_cb.reset(); // Disconnect stop callback
if (ec_out)
{
if (cancelled.load(std::memory_order_acquire))
*ec_out = capy::error::canceled;
else if (gai_error != 0)
*ec_out = make_gai_error(gai_error);
else
*ec_out = {};
}
if (out && !cancelled.load(std::memory_order_acquire) && gai_error == 0)
*out = std::move(stored_results);
🤖 Prompt for AI Agents
In `@src/corosio/src/detail/epoll/resolver_service.cpp` around lines 169 - 184, In
epoll_resolve_op::operator() ensure ec_out is explicitly cleared on success:
when ec_out is non-null and the operation is not cancelled and gai_error == 0,
set *ec_out = {}; then continue to move stored_results into *out; keep checks
using cancelled.load(...) and gai_error as-is so only successful resolution
paths clear the error code (refer to ec_out, cancelled, gai_error, out,
stored_results and epoll_resolve_op::operator()).

Comment on lines +250 to +285
svc_.work_started();

auto self = shared_from_this();
std::thread worker([this, self = std::move(self)]() {
struct addrinfo hints{};
hints.ai_family = AF_UNSPEC;
hints.ai_socktype = SOCK_STREAM;
hints.ai_flags = flags_to_hints(op_.flags);

struct addrinfo* ai = nullptr;
int result = ::getaddrinfo(
op_.host.empty() ? nullptr : op_.host.c_str(),
op_.service.empty() ? nullptr : op_.service.c_str(),
&hints, &ai);

if (!op_.cancelled.load(std::memory_order_acquire))
{
if (result == 0 && ai)
{
op_.stored_results = convert_results(ai, op_.host, op_.service);
op_.gai_error = 0;
}
else
{
op_.gai_error = result;
}
}

if (ai)
::freeaddrinfo(ai);

svc_.work_finished();
svc_.post(&op_);
});
worker.detach();
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "resolver_service.cpp" -type f 2>/dev/null

Repository: cppalliance/corosio

Length of output: 165


🏁 Script executed:

# Check repository structure
ls -la | head -20
git ls-files | grep -i resolver | head -20

Repository: cppalliance/corosio

Length of output: 1542


🏁 Script executed:

wc -l ./src/corosio/src/detail/epoll/resolver_service.cpp

Repository: cppalliance/corosio

Length of output: 118


🏁 Script executed:

sed -n '240,295p' ./src/corosio/src/detail/epoll/resolver_service.cpp

Repository: cppalliance/corosio

Length of output: 1412


🏁 Script executed:

rg "work_started|work_finished" ./src/corosio/src/detail/epoll/resolver_service.cpp -B2 -A2

Repository: cppalliance/corosio

Length of output: 422


🏁 Script executed:

rg "svc_\s*=" ./src/corosio/src/detail/epoll/resolver_service.cpp -B2 -A2

Repository: cppalliance/corosio

Length of output: 45


🏁 Script executed:

cat -n ./src/corosio/src/detail/epoll/resolver_service.hpp | head -100

Repository: cppalliance/corosio

Length of output: 3927


🏁 Script executed:

rg "work_started|work_finished" ./src/corosio/src/detail/epoll/resolver_service.hpp

Repository: cppalliance/corosio

Length of output: 131


🏁 Script executed:

rg "work_started|work_finished" ./include/ -l

Repository: cppalliance/corosio

Length of output: 146


🏁 Script executed:

rg "class io_context|class resolver_service" ./include/ -A10

Repository: cppalliance/corosio

Length of output: 1457


🏁 Script executed:

wc -l ./src/corosio/src/detail/epoll/resolver_service.cpp
head -50 ./src/corosio/src/detail/epoll/resolver_service.cpp

Repository: cppalliance/corosio

Length of output: 1508


🏁 Script executed:

sed -n '240,300p' ./src/corosio/src/detail/epoll/resolver_service.cpp | cat -n

Repository: cppalliance/corosio

Length of output: 2048


🏁 Script executed:

cat ./src/corosio/src/detail/epoll/resolver_service.cpp

Repository: cppalliance/corosio

Length of output: 9066


Guard thread-creation failure and reorder post() before work_finished().

std::thread construction can throw std::system_error when system resources are exhausted; if unhandled, work_started() remains unbalanced and the operation never completes. Additionally, the current code calls work_finished() before post(), creating a race where the scheduler may exit after detecting all work is done but before the completion is enqueued.

Wrap the std::thread creation in try-catch, and reorder the calls in the thread lambda so post(&op_) is invoked before work_finished(). The catch handler should call both post() and work_finished() in the same order to maintain consistency.

✅ Proposed fix
     svc_.work_started();

     auto self = shared_from_this();
-    std::thread worker([this, self = std::move(self)]() {
-        struct addrinfo hints{};
-        hints.ai_family = AF_UNSPEC;
-        hints.ai_socktype = SOCK_STREAM;
-        hints.ai_flags = flags_to_hints(op_.flags);
-
-        struct addrinfo* ai = nullptr;
-        int result = ::getaddrinfo(
-            op_.host.empty() ? nullptr : op_.host.c_str(),
-            op_.service.empty() ? nullptr : op_.service.c_str(),
-            &hints, &ai);
-
-        if (!op_.cancelled.load(std::memory_order_acquire))
-        {
-            if (result == 0 && ai)
-            {
-                op_.stored_results = convert_results(ai, op_.host, op_.service);
-                op_.gai_error = 0;
-            }
-            else
-            {
-                op_.gai_error = result;
-            }
-        }
-
-        if (ai)
-            ::freeaddrinfo(ai);
-
-        svc_.work_finished();
-        svc_.post(&op_);
-    });
-    worker.detach();
+    try
+    {
+        std::thread worker([this, self = std::move(self)]() {
+            struct addrinfo hints{};
+            hints.ai_family = AF_UNSPEC;
+            hints.ai_socktype = SOCK_STREAM;
+            hints.ai_flags = flags_to_hints(op_.flags);
+
+            struct addrinfo* ai = nullptr;
+            int result = ::getaddrinfo(
+                op_.host.empty() ? nullptr : op_.host.c_str(),
+                op_.service.empty() ? nullptr : op_.service.c_str(),
+                &hints, &ai);
+
+            if (!op_.cancelled.load(std::memory_order_acquire))
+            {
+                if (result == 0 && ai)
+                {
+                    op_.stored_results = convert_results(ai, op_.host, op_.service);
+                    op_.gai_error = 0;
+                }
+                else
+                {
+                    op_.gai_error = result;
+                }
+            }
+
+            if (ai)
+                ::freeaddrinfo(ai);
+
+            svc_.post(&op_);
+            svc_.work_finished();
+        });
+        worker.detach();
+    }
+    catch (...)
+    {
+        op_.gai_error = EAI_AGAIN;
+        svc_.post(&op_);
+        svc_.work_finished();
+    }
🤖 Prompt for AI Agents
In `@src/corosio/src/detail/epoll/resolver_service.cpp` around lines 250 - 285,
The thread creation can throw and work_started() must be balanced and post()
must happen before work_finished(); wrap the std::thread construction (the
worker lambda that captures shared_from_this()) in a
try-catch(std::system_error) and in the catch call svc_.post(&op_) then
svc_.work_finished() to ensure the operation completes; also change the order
inside the worker lambda to call svc_.post(&op_) before svc_.work_finished()
after resolving addresses (where you set op_.stored_results via convert_results,
op_.gai_error, freeaddrinfo, and inspect op_.cancelled).

Comment on lines 145 to 158
void resolve(
std::coroutine_handle<>,
capy::executor_ref,
std::string_view /*host*/,
std::string_view /*service*/,
resolve_flags /*flags*/,
std::string_view host,
std::string_view service,
resolve_flags flags,
std::stop_token,
system::error_code*,
resolver_results*) override
{
throw std::logic_error("epoll resolver resolve not implemented");
}
resolver_results*) override;

void cancel() noexcept;

void cancel() noexcept { /* stub */ }
epoll_resolve_op op_;

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's understand the structure of the resolver implementation
find . -type f -name "*.hpp" -o -name "*.cpp" | grep -i resolver | head -20

Repository: cppalliance/corosio

Length of output: 406


🏁 Script executed:

# Look at the file in question and its context
head -200 src/corosio/src/detail/epoll/resolver_service.hpp

Repository: cppalliance/corosio

Length of output: 5901


🏁 Script executed:

# Search for resolver_impl class definition and usage
rg "class epoll_resolver_impl|class.*resolver_impl" -t cpp -A 10

Repository: cppalliance/corosio

Length of output: 3555


🏁 Script executed:

# Look for resolve() method definitions and their usage
rg "void resolve\(" -t cpp -B 2 -A 5

Repository: cppalliance/corosio

Length of output: 1807


🏁 Script executed:

# Search for any documentation about resolver concurrency or single-inflight
rg "single.*inflight|in-flight|concurrent.*resolve|resolve.*concurrent|one.*resolve" -t cpp -t hpp -t md -i

Repository: cppalliance/corosio

Length of output: 90


🏁 Script executed:

# Look for other resolver backends to see how they handle this
find . -type f \( -name "*resolver*" -o -name "*epoll*" \) | grep -E "\.(hpp|cpp)$" | head -30

Repository: cppalliance/corosio

Length of output: 406


🏁 Script executed:

# Look at the epoll resolver implementation
cat src/corosio/src/detail/epoll/resolver_service.cpp

Repository: cppalliance/corosio

Length of output: 9066


🏁 Script executed:

# Look at the IOCP (Windows) resolver to see how it handles multiple ops
cat src/corosio/src/detail/iocp/resolver_service.hpp

Repository: cppalliance/corosio

Length of output: 5088


🏁 Script executed:

# Check the public resolver API
cat include/boost/corosio/resolver.hpp

Repository: cppalliance/corosio

Length of output: 9485


🏁 Script executed:

# Look at test code for usage patterns
grep -n "resolve\|cancel" test/unit/resolver.cpp -B 2 -A 5 | head -100

Repository: cppalliance/corosio

Length of output: 2463


Add documentation or assertion for single-inflight resolve contract.

The epoll (and Windows) resolver implementations reuse a single operation object. While the public API documentation requires non-concurrent resolve calls on the same resolver instance, the implementation lacks defensive measures. Add either a clear comment on the epoll_resolver_impl class documenting the single-inflight contract, or implement an assertion/flag in resolve() to catch violations (e.g., assert(!op_in_flight) or use an atomic flag to detect overlapping calls).

🤖 Prompt for AI Agents
In `@src/corosio/src/detail/epoll/resolver_service.hpp` around lines 145 - 158,
The resolver implementation reuses a single epoll_resolve_op (op_) and lacks
protection against concurrent resolve calls; update epoll_resolver_impl to
enforce the single-inflight contract by adding a guard (e.g., an atomic<bool>
op_in_flight or a simple assert) checked at the start of resolve() and cleared
in cancel() and when the operation completes; reference op_, resolve(),
cancel(), and epoll_resolve_op when implementing so you set the flag before
starting the op and reset it on cancellation/completion (or replace with
assert(!op_in_flight) if you prefer a debug-only check), and add a short
class-level comment stating the single-inflight requirement.

@sgerbino sgerbino closed this Jan 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement epoll resolver

2 participants