Skip to content

test_dtrace can hang indefinitely when bpftrace usability check times out #151496

@StanFromIreland

Description

@StanFromIreland

Bug report

Bug description:

From this CentOS Buildbot run:

0:25:18 load avg: 2.68 [504/505] test_zipfile passed (3 min 58 sec) -- running (1): test_dtrace (22 min 49 sec)
0:25:48 load avg: 1.63 running (1): test_dtrace (23 min 19 sec)
0:26:18 load avg: 0.98 running (1): test_dtrace (23 min 49 sec)
0:26:48 load avg: 0.60 running (1): test_dtrace (24 min 19 sec)
0:27:18 load avg: 0.36 running (1): test_dtrace (24 min 49 sec)
0:27:48 load avg: 0.22 running (1): test_dtrace (25 min 19 sec)
0:28:18 load avg: 0.13 running (1): test_dtrace (25 min 49 sec)
0:28:48 load avg: 0.08 running (1): test_dtrace (26 min 19 sec)
0:29:18 load avg: 0.05 running (1): test_dtrace (26 min 49 sec)
0:29:48 load avg: 0.03 running (1): test_dtrace (27 min 19 sec)
0:30:18 load avg: 0.02 running (1): test_dtrace (27 min 49 sec)
0:30:48 load avg: 0.01 running (1): test_dtrace (28 min 19 sec)
0:31:18 load avg: 0.00 running (1): test_dtrace (28 min 49 sec)
0:31:48 load avg: 0.00 running (1): test_dtrace (29 min 19 sec)
0:32:18 load avg: 0.00 running (1): test_dtrace (29 min 49 sec)
0:32:48 load avg: 0.00 running (1): test_dtrace (30 min 19 sec)
0:33:18 load avg: 0.00 running (1): test_dtrace (30 min 49 sec)
0:33:48 load avg: 0.14 running (1): test_dtrace (31 min 19 sec)
0:34:18 load avg: 0.08 running (1): test_dtrace (31 min 49 sec)
0:34:48 load avg: 0.05 running (1): test_dtrace (32 min 19 sec)
0:35:18 load avg: 0.03 running (1): test_dtrace (32 min 49 sec)
0:35:48 load avg: 0.02 running (1): test_dtrace (33 min 19 sec)
0:36:18 load avg: 0.01 running (1): test_dtrace (33 min 49 sec)
0:36:48 load avg: 0.00 running (1): test_dtrace (34 min 19 sec)
0:37:18 load avg: 0.00 running (1): test_dtrace (34 min 49 sec)
0:37:48 load avg: 0.00 running (1): test_dtrace (35 min 19 sec)
0:38:18 load avg: 0.00 running (1): test_dtrace (35 min 49 sec)
0:38:48 load avg: 0.00 running (1): test_dtrace (36 min 19 sec)
0:39:18 load avg: 0.00 running (1): test_dtrace (36 min 49 sec)
0:39:48 load avg: 0.45 running (1): test_dtrace (37 min 19 sec)
0:40:18 load avg: 2.92 running (1): test_dtrace (37 min 49 sec)
0:40:48 load avg: 3.79 running (1): test_dtrace (38 min 19 sec)
0:41:18 load avg: 4.79 running (1): test_dtrace (38 min 49 sec)
0:41:48 load avg: 4.97 running (1): test_dtrace (39 min 19 sec)
0:42:18 load avg: 5.75 running (1): test_dtrace (39 min 49 sec)
0:42:48 load avg: 6.48 running (1): test_dtrace (40 min 19 sec)
0:43:18 load avg: 6.59 running (1): test_dtrace (40 min 49 sec)
0:43:48 load avg: 7.63 running (1): test_dtrace (41 min 19 sec)
0:44:18 load avg: 8.18 running (1): test_dtrace (41 min 49 sec)
0:44:48 load avg: 8.63 running (1): test_dtrace (42 min 19 sec)
0:45:18 load avg: 9.51 running (1): test_dtrace (42 min 49 sec)
0:45:48 load avg: 9.46 running (1): test_dtrace (43 min 19 sec)
0:46:18 load avg: 9.73 running (1): test_dtrace (43 min 49 sec)
0:46:48 load avg: 9.57 running (1): test_dtrace (44 min 19 sec)
0:47:18 load avg: 10.71 running (1): test_dtrace (44 min 49 sec)
0:47:29 load avg: 10.19 [505/505/1] test_dtrace worker non-zero exit code (Exit code 1)
beginning 6 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)
123:456
setUpClass (test.test_dtrace.BPFTraceNormalTests) ... skipped 'bpftrace(1) failed with code 1: ERROR: bpftrace currently only supports running as the root user.\n'
Timeout (0:45:00)!
Thread 0x00007fc88325ab80 [python] (most recent call first):
  File "/home/buildbot/buildarea/3.15.itamaro-centos-aws.refleak.nogil/build/Lib/selectors.py", line 398 in select
  File "/home/buildbot/buildarea/3.15.itamaro-centos-aws.refleak.nogil/build/Lib/subprocess.py", line 291 in _communicate_io_posix
  File "/home/buildbot/buildarea/3.15.itamaro-centos-aws.refleak.nogil/build/Lib/subprocess.py", line 2385 in _communicate
  File "/home/buildbot/buildarea/3.15.itamaro-centos-aws.refleak.nogil/build/Lib/subprocess.py", line 1389 in communicate
  File "/home/buildbot/buildarea/3.15.itamaro-centos-aws.refleak.nogil/build/Lib/test/test_dtrace.py", line 255 in assert_usable
  File "/home/buildbot/buildarea/3.15.itamaro-centos-aws.refleak.nogil/build/Lib/test/test_dtrace.py", line 296 in setUpClass
  File "/home/buildbot/buildarea/3.15.itamaro-centos-aws.refleak.nogil/build/Lib/unittest/suite.py", line 166 in _handleClassSetUp
  File "/home/buildbot/buildarea/3.15.itamaro-centos-aws.refleak.nogil/build/Lib/unittest/suite.py", line 114 in run
  File "/home/buildbot/buildarea/3.15.itamaro-centos-aws.refleak.nogil/build/Lib/unittest/suite.py", line 84 in __call__
  File "/home/buildbot/buildarea/3.15.itamaro-centos-aws.refleak.nogil/build/Lib/unittest/suite.py", line 122 in run
  File "/home/buildbot/buildarea/3.15.itamaro-centos-aws.refleak.nogil/build/Lib/unittest/suite.py", line 84 in __call__
  File "/home/buildbot/buildarea/3.15.itamaro-centos-aws.refleak.nogil/build/Lib/unittest/runner.py", line 256 in run
  File "/home/buildbot/buildarea/3.15.itamaro-centos-aws.refleak.nogil/build/Lib/test/libregrtest/single.py", line 84 in _run_suite
  File "/home/buildbot/buildarea/3.15.itamaro-centos-aws.refleak.nogil/build/Lib/test/libregrtest/single.py", line 42 in run_unittest
  File "/home/buildbot/buildarea/3.15.itamaro-centos-aws.refleak.nogil/build/Lib/test/libregrtest/single.py", line 162 in test_func
  File "/home/buildbot/buildarea/3.15.itamaro-centos-aws.refleak.nogil/build/Lib/test/libregrtest/refleak.py", line 145 in runtest_refleak
  File "/home/buildbot/buildarea/3.15.itamaro-centos-aws.refleak.nogil/build/Lib/test/libregrtest/single.py", line 114 in regrtest_runner
  File "/home/buildbot/buildarea/3.15.itamaro-centos-aws.refleak.nogil/build/Lib/test/libregrtest/single.py", line 165 in _load_run_test
  File "/home/buildbot/buildarea/3.15.itamaro-centos-aws.refleak.nogil/build/Lib/test/libregrtest/single.py", line 210 in _runtest_env_changed_exc
  File "/home/buildbot/buildarea/3.15.itamaro-centos-aws.refleak.nogil/build/Lib/test/libregrtest/single.py", line 319 in _runtest
  File "/home/buildbot/buildarea/3.15.itamaro-centos-aws.refleak.nogil/build/Lib/test/libregrtest/single.py", line 348 in run_single_test
  File "/home/buildbot/buildarea/3.15.itamaro-centos-aws.refleak.nogil/build/Lib/test/libregrtest/worker.py", line 99 in worker_process
  File "/home/buildbot/buildarea/3.15.itamaro-centos-aws.refleak.nogil/build/Lib/test/libregrtest/worker.py", line 137 in main
  File "/home/buildbot/buildarea/3.15.itamaro-centos-aws.refleak.nogil/build/Lib/test/libregrtest/worker.py", line 141 in <module>
  File "/home/buildbot/buildarea/3.15.itamaro-centos-aws.refleak.nogil/build/Lib/runpy.py", line 87 in _run_code
  File "/home/buildbot/buildarea/3.15.itamaro-centos-aws.refleak.nogil/build/Lib/runpy.py", line 201 in _run_module_as_main

Looking at the test:

# Check if bpftrace is available and can attach to USDT probes
program = f'usdt:{sys.executable}:python:function__entry {{ printf("probe: success\\n"); exit(); }}'
try:
proc = subprocess.Popen(
["bpftrace", "-e", program, "-c", f"{sys.executable} -c pass"],
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
universal_newlines=True,
)
stdout, stderr = proc.communicate(timeout=10)
except subprocess.TimeoutExpired:
proc.kill()
proc.communicate() # Clean up
raise unittest.SkipTest("bpftrace timed out during usability check")
except OSError as e:
raise unittest.SkipTest(f"bpftrace not available: {e}")

I'm guessing, after proc.communicate(timeout=10) times out, proc.kill() only kills bpftrace while its forked child survives holding the pipe open, so the unbounded cleanup proc.communicate() blocks until the test times out.

Likely introduced by 9a268e3, CC @pablogsal

CPython versions tested on:

CPython main branch

Operating systems tested on:

No response

Linked PRs

Metadata

Metadata

Assignees

No one assigned

    Labels

    testsTests in the Lib/test dirtype-bugAn unexpected behavior, bug, or error
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions