Rework get_ports.sh to make it not use a lock file
The get_ports.sh script is used for determining the range of ports a
given system test should use. It first determines the start of the port
range to return (the base port); it can either be specified explicitly
by the caller or chosen randomly. Subsequent ports are picked
sequentially, starting from the base port. To ensure no single port is
used by multiple tests, a state file (get_ports.state) containing the
last assigned port is maintained by the script. Concurrent access to
the state file is protected by a lock file (get_ports.lock); if one
instance of the script holds the lock file while another instance tries
to acquire it, the latter retries its attempt to acquire the lock file
after sleeping for 1 second; this retry process can be repeated up to 10
times before the script returns an error.
There are some problems with this approach:
- the sleep period in case of failure to acquire the lock file is
fixed, which leads to a "thundering herd" type of problem, where
(depending on how processes are scheduled by the operating system)
multiple system tests try to acquire the lock file at the same time
and subsequently sleep for 1 second, only for the same situation to
likely happen the next time around,
- the lock file is being locked and then unlocked for every single
port assignment made, not just once for the entire range of ports a
system test should use; in other words, the lock file is currently
locked and unlocked 13 times per system test; this increases the
odds of the "thundering herd" problem described above preventing a
system test from getting one or more ports assigned before the
maximum retry count is reached (assuming multiple system tests are
run in parallel); it also enables the range of ports used by a given
system test to be non-sequential (which is a rather cosmetic issue,
but one that can make log interpretation harder than necessary when
test failures are diagnosed),
- both issues described above cause unnecessary delays when multiple
system tests are started in parallel (due to high lock file
contention among the system tests being started),
- maintaining a state file requires ensuring proper locking, which
complicates the script's source code.
Rework the get_ports.sh script so that it assigns non-overlapping port
ranges to its callers without using a state file or a lock file:
- add a new command line switch, "-t", which takes the name of the
system test to assign ports for,
- ensure every instance of get_ports.sh knows how many ports all
system tests which form the test suite are going to need in total
(based on the number of subdirectories found in bin/tests/system/),
- in order to ensure all instances of get_ports.sh work on the same
global port range (so that no port range collisions happen), a
stable (throughout the expected run time of a single system test
suite) base port selection method is used instead of the random one;
specifically, the base port, unless specified explicitly using the
"-p" command line switch, is derived from the number of hours which
passed since the Unix Epoch time,
- use the name of the system test to assign ports for (passed via the
new "-t" command line switch) as a unique index into the global
system test range, to ensure all system tests use disjoint port
ranges.
This commit is contained in:
@@ -241,7 +241,4 @@ AM_LOG_FLAGS = -r
|
||||
|
||||
$(TESTS): run.sh
|
||||
|
||||
clean-local:
|
||||
-rm -f get_ports.state get_ports.lock
|
||||
|
||||
test-local: check
|
||||
|
||||
Reference in New Issue
Block a user