Defrag if slab 1/8 full to fix defrag didn't stop issue (#2656)

**Issue History:**
1. The flaky test issue "defrag didn't stop" was originally detected in
February 2025: https://github.com/valkey-io/valkey/issues/1746
Solution for 1746: https://github.com/valkey-io/valkey/pull/1762
2. Similar issue occurred recently:
https://github.com/valkey-io/valkey/actions/runs/16585350083/job/46909359496#step:5:7640

**Investigation:** 
1. First, the issue occurs specifically to Active Defrag stream in
cluster mode.
2. After investigating `test_stream` in `memefficiency.tcl`, I found the
root cause is in defrag logic rather than the test itself - There are
still failed tests with the same error even if I tried different
parameters for the test.
3. Then I looked at malloc-stats and identified potential defrag issues,
particularly in the 80B bin where utilization only reaches ~75% after
defrag instead of the expected near 100%, while other bins show proper
defrag behavior - 80B actually is the size of a new stream(confirmed in
`t_stream.c`) that we add during test.
4. For 80B, after adding 200000 streams and fragmenting, `curregs `=
100030, after a lot of defrag cycles, there are still 122
nonfull-slabs/511 slabs with the remaining 446 items not defragged
(average 4/nonfull-slab).

**Detailed malloc-stats:**
- Total slabs: 511
- Non-full slabs: 122
- Full slabs: 511-122=389
- Theoretical maximum per slab: 256 items
- Allocated items in non-full slabs: 100030-389*256=446
- Average items per non-full slab: 446/122=3.66

**Root Cause:** 
**There are some immovable items which  prevent complete defrag**

**Problems in old defrag logic:**
1. The previous condition (we don't defrag if slab utilization > 'avg
utilization' * 1.125), the 12.5% threshold doesn’t work well with low
utilizations.

- Let's imagine we have 446 items in 122 nonfull-slabs (avg 3.66
items/nonfull-slab), let's say, e.g. we have 81 slabs with 5 items each
+41 slabs with 1 item each)
- 12.5% threshold: 3.66*1.125=4.11
- If those 41 single items are immovable, they actually lower the
average, so the rest 81 slabs will be above the threshold (5>4.11) and
will not be defragged - defrag didn't stop.

2. Distribution of immovable items across slabs was causing inconsistent
defragmentation and flaky test outcome.

- If those 41 single items are movable, they will be moved and the avg
will be 5, then 12.5% threshold: 5*1.125=5.625, so the rest 81 slabs
will fall below the threshold (5<5.625) and will be defragged - defrag
success.
- This can explain why we got flaky defrag tests.


**Final solution :** 
1. Add one more condition before the old logic in `makeDefragDecision
`to trigger defragmentation when slab is less than 1/8 full (1/8
threshold (12.5%) chosen to align with existing utilization threshold
factor) - Ensures no low-utilization slabs left without defragged, and
stabilize the defrag behavior.
2. The reason why we have immovable items and how to handle them is
going to be investigate later.
3. Be sure to rebuild Valkey before testing it.


**Local test result:**
- Before fix: 
pass rate 80.8% (63/78)
- After fix:
Test only stream: pass rate 100% (200/200)
Test the whole memefficiency.tcl: pass rate 100% (100/100)

Resolves #2398 , the "defrag didn't stop" issue, with help from @JimB123
@madolson

---------

Signed-off-by: Alina Liu <liusalisa6363@gmail.com>
Signed-off-by: asagegeLiu <liusalisa6363@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
This commit is contained in:
Alina Liu 2025-10-01 20:37:15 -07:00 committed by GitHub
parent a9a65abc85
commit 307397904f
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
1 changed files with 15 additions and 5 deletions

View File

@ -349,16 +349,26 @@ unsigned long allocatorDefragGetFragSmallbins(void) {
* defragmentation is not necessary as moving regions is guaranteed not to change the fragmentation ratio.
* 2. If the number of non-full slabs (bin_usage->curr_nonfull_slabs) is less than 2, defragmentation is not performed
* because there is no other slab to move regions to.
* 3. If slab utilization < 'avg utilization'*1.125 [code 1.125 == (1000+UTILIZATION_THRESHOLD_FACTOR_MILLI)/1000]
* 3. Defrag if the slab is less than 1/8 full to ensure small slabs get defragmented even when average utilization is low.
* This also handles the case when there are items that aren't defragmented skewing the average utilization. The 1/8
* threshold (12.5%) was chosen to align with existing utilization threshold factor.
* 4. If slab utilization < 'avg utilization'*1.125 [code 1.125 == (1000+UTILIZATION_THRESHOLD_FACTOR_MILLI)/1000]
* than we should defrag. This is aligned with previous je_defrag_hint implementation.
*/
static inline int makeDefragDecision(jeBinInfo *bin_info, jemallocBinUsageData *bin_usage, unsigned long nalloced) {
unsigned long curr_full_slabs = bin_usage->curr_slabs - bin_usage->curr_nonfull_slabs;
size_t allocated_nonfull = bin_usage->curr_regs - curr_full_slabs * bin_info->nregs;
if (bin_info->nregs == nalloced || bin_usage->curr_nonfull_slabs < 2 ||
1000 * nalloced * bin_usage->curr_nonfull_slabs > (1000 + UTILIZATION_THRESHOLD_FACTOR_MILLI) * allocated_nonfull) {
return 0;
}
/* Don't defrag if the slab is full or if there's only 1 nonfull slab */
if (bin_info->nregs == nalloced || bin_usage->curr_nonfull_slabs < 2) return 0;
/* Defrag if the slab is less than 1/8 full */
if (1000 * nalloced < bin_info->nregs * UTILIZATION_THRESHOLD_FACTOR_MILLI) return 1;
/* Don't defrag if the slab usage is greater than the average usage (+ 12.5%) */
if (1000 * nalloced * bin_usage->curr_nonfull_slabs > (1000 + UTILIZATION_THRESHOLD_FACTOR_MILLI) * allocated_nonfull) return 0;
/* Otherwise, defrag! */
return 1;
}