Defrag if slab 1/8 full to fix defrag didn't stop issue (#2656)

**Issue History:**
1. The flaky test issue "defrag didn't stop" was originally detected in February 2025: https://github.com/valkey-io/valkey/issues/1746
   Solution for 1746: https://github.com/valkey-io/valkey/pull/1762
2. A similar issue occurred recently: https://github.com/valkey-io/valkey/actions/runs/16585350083/job/46909359496#step:5:7640

**Investigation:**
1. The issue occurs specifically in the Active Defrag stream test in cluster mode.
2. After investigating `test_stream` in `memefficiency.tcl`, I found that the root cause is in the defrag logic rather than the test itself: tests still failed with the same error even when I tried different test parameters.
3. I then looked at malloc-stats and identified potential defrag issues, particularly in the 80B bin, where utilization only reaches ~75% after defrag instead of the expected near 100%, while other bins show proper defrag behavior. 80B is the size of a new stream (confirmed in `t_stream.c`) that we add during the test.
4. For 80B, after adding 200000 streams and fragmenting, `curregs` = 100030. After many defrag cycles, there are still 122 non-full slabs out of 511 slabs, with the remaining 446 items not defragged (about 4 per non-full slab on average).

**Detailed malloc-stats:**
- Total slabs: 511
- Non-full slabs: 122
- Full slabs: 511-122=389
- Theoretical maximum per slab: 256 items
- Allocated items in non-full slabs: 100030-389*256=446
- Average items per non-full slab: 446/122=3.66

**Root Cause:**
**There are some immovable items which prevent complete defrag**

**Problems in old defrag logic:**
1. The previous condition (we don't defrag if slab utilization > 'avg utilization' * 1.125) uses a 12.5% threshold that doesn't work well at low utilization.
   - Suppose we have 446 items in 122 non-full slabs (avg 3.66 items per non-full slab), e.g. 81 slabs with 5 items each + 41 slabs with 1 item each.
   - 12.5% threshold: 3.66*1.125=4.11
   - If those 41 single items are immovable, they lower the average, so the remaining 81 slabs are above the threshold (5>4.11) and will not be defragged: defrag didn't stop.
2. The distribution of immovable items across slabs was causing inconsistent defragmentation and flaky test outcomes.
   - If those 41 single items are movable, they will be moved and the avg will be 5. The 12.5% threshold is then 5*1.125=5.625, so the remaining 81 slabs fall below the threshold (5<5.625) and will be defragged: defrag success.
   - This explains why we got flaky defrag tests.

**Final solution:**
1. Add one more condition before the old logic in `makeDefragDecision` to trigger defragmentation when a slab is less than 1/8 full (the 1/8 threshold (12.5%) was chosen to align with the existing utilization threshold factor). This ensures that no low-utilization slabs are left undefragged and stabilizes the defrag behavior.
2. Why we have immovable items and how to handle them will be investigated later.
3. Be sure to rebuild Valkey before testing it.

**Local test result:**
- Before fix: pass rate 80.8% (63/78)
- After fix:
  - Test only stream: pass rate 100% (200/200)
  - Test the whole memefficiency.tcl: pass rate 100% (100/100)

Resolves #2398, the "defrag didn't stop" issue, with help from @JimB123 @madolson

---------

Signed-off-by: Alina Liu <liusalisa6363@gmail.com>
Signed-off-by: asagegeLiu <liusalisa6363@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
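The two scenarios described under "Problems in old defrag logic" can be checked with a small standalone sketch of the pre-fix condition. This is an illustration, not the Valkey source: `oldDefragDecisionSketch` and its flat parameter list are hypothetical, and the value 125 for `UTILIZATION_THRESHOLD_FACTOR_MILLI` is inferred from the source comment that equates 1.125 with (1000+UTILIZATION_THRESHOLD_FACTOR_MILLI)/1000.

```c
/* Inferred from the source comment: 1.125 == (1000 + factor) / 1000. */
#define UTILIZATION_THRESHOLD_FACTOR_MILLI 125

/* Hypothetical stand-in for the pre-fix check. Returns 1 to defrag, 0 to skip.
 *   nregs             - regions per slab (256 for the 80B bin in the message)
 *   nalloced          - regions allocated in the slab being examined
 *   nonfull_slabs     - number of non-full slabs in the bin
 *   allocated_nonfull - total regions allocated across the non-full slabs */
static int oldDefragDecisionSketch(unsigned long nregs,
                                   unsigned long nalloced,
                                   unsigned long nonfull_slabs,
                                   unsigned long allocated_nonfull) {
    if (nregs == nalloced || nonfull_slabs < 2 ||
        1000 * nalloced * nonfull_slabs >
            (1000 + UTILIZATION_THRESHOLD_FACTOR_MILLI) * allocated_nonfull) {
        return 0;
    }
    return 1;
}
```

Following the commit message's simplified numbers, `oldDefragDecisionSketch(256, 5, 122, 446)` returns 0 (the 41 immovable single-item slabs pull the average down, so a 5-item slab is skipped), while `oldDefragDecisionSketch(256, 5, 81, 405)` returns 1 (once the single items are gone, the same 5-item slab is defragged).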
2025-10-01 20:37:15 -07:00 · 2025-10-01 20:37:15 -07:00 · 307397904f
parent a9a65abc85
commit 307397904f
1 changed files with 15 additions and 5 deletions
--- a/src/allocator_defrag.c
+++ b/src/allocator_defrag.c
@@ -349,16 +349,26 @@ unsigned long allocatorDefragGetFragSmallbins(void) {
 *    defragmentation is not necessary as moving regions is guaranteed not to change the fragmentation ratio.
 * 2. If the number of non-full slabs (bin_usage->curr_nonfull_slabs) is less than 2, defragmentation is not performed
 *    because there is no other slab to move regions to.
- * 3. If slab utilization < 'avg utilization'*1.125 [code 1.125 == (1000+UTILIZATION_THRESHOLD_FACTOR_MILLI)/1000]
+ * 3. Defrag if the slab is less than 1/8 full to ensure small slabs get defragmented even when average utilization is low.
+ *    This also handles the case when there are items that aren't defragmented skewing the average utilization. The 1/8
+ *    threshold (12.5%) was chosen to align with existing utilization threshold factor.
+ * 4. If slab utilization < 'avg utilization'*1.125 [code 1.125 == (1000+UTILIZATION_THRESHOLD_FACTOR_MILLI)/1000]
 *    than we should defrag. This is aligned with previous je_defrag_hint implementation.
 */
 static inline int makeDefragDecision(jeBinInfo *bin_info, jemallocBinUsageData *bin_usage, unsigned long nalloced) {
    unsigned long curr_full_slabs = bin_usage->curr_slabs - bin_usage->curr_nonfull_slabs;
    size_t allocated_nonfull = bin_usage->curr_regs - curr_full_slabs * bin_info->nregs;
-    if (bin_info->nregs == nalloced || bin_usage->curr_nonfull_slabs < 2 ||
-        1000 * nalloced * bin_usage->curr_nonfull_slabs > (1000 + UTILIZATION_THRESHOLD_FACTOR_MILLI) * allocated_nonfull) {
-        return 0;
-    }
+
+    /* Don't defrag if the slab is full or if there's only 1 nonfull slab */
+    if (bin_info->nregs == nalloced || bin_usage->curr_nonfull_slabs < 2) return 0;
+
+    /* Defrag if the slab is less than 1/8 full */
+    if (1000 * nalloced < bin_info->nregs * UTILIZATION_THRESHOLD_FACTOR_MILLI) return 1;
+
+    /* Don't defrag if the slab usage is greater than the average usage (+ 12.5%) */
+    if (1000 * nalloced * bin_usage->curr_nonfull_slabs > (1000 + UTILIZATION_THRESHOLD_FACTOR_MILLI) * allocated_nonfull) return 0;
+
+    /* Otherwise, defrag! */
    return 1;
 }
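To see how the added 1/8 check changes the outcome for the 80B bin, the post-fix decision can be sketched as a standalone function. This is a minimal sketch under assumptions: `sketchBinInfo`, `sketchBinUsage`, and `sketchDefragDecision` are hypothetical, trimmed-down stand-ins for `jeBinInfo`, `jemallocBinUsageData`, and `makeDefragDecision`, holding only the fields used in the diff, and 125 for `UTILIZATION_THRESHOLD_FACTOR_MILLI` is inferred from the comment in the diff.

```c
#include <stddef.h>

/* Inferred from the diff comment: 1.125 == (1000 + factor) / 1000. */
#define UTILIZATION_THRESHOLD_FACTOR_MILLI 125

/* Hypothetical stand-ins; the real structs may have other fields and types. */
typedef struct {
    unsigned long nregs; /* regions per slab */
} sketchBinInfo;

typedef struct {
    unsigned long curr_slabs;         /* total slabs in the bin */
    unsigned long curr_nonfull_slabs; /* slabs with free regions */
    unsigned long curr_regs;          /* allocated regions in the bin */
} sketchBinUsage;

/* Mirrors the order of checks in the diff. Returns 1 to defrag, 0 to skip. */
static int sketchDefragDecision(const sketchBinInfo *bin_info,
                                const sketchBinUsage *bin_usage,
                                unsigned long nalloced) {
    unsigned long curr_full_slabs = bin_usage->curr_slabs - bin_usage->curr_nonfull_slabs;
    size_t allocated_nonfull = bin_usage->curr_regs - curr_full_slabs * bin_info->nregs;

    /* Full slab, or no other non-full slab to move regions to. */
    if (bin_info->nregs == nalloced || bin_usage->curr_nonfull_slabs < 2) return 0;

    /* New check: always defrag a slab that is less than 1/8 full. */
    if (1000 * nalloced < bin_info->nregs * UTILIZATION_THRESHOLD_FACTOR_MILLI) return 1;

    /* Old check: skip slabs above the average non-full utilization + 12.5%. */
    if (1000 * nalloced * bin_usage->curr_nonfull_slabs >
        (1000 + UTILIZATION_THRESHOLD_FACTOR_MILLI) * allocated_nonfull) return 0;

    return 1;
}
```

With the malloc-stats from the commit message (`nregs` 256, 511 slabs, 122 non-full, `curr_regs` 100030), a slab holding 5 regions now takes the 1/8 path and is defragged, because 1000*5 is below 256*125. A slab holding 32 regions is exactly 1/8 full, so it falls through to the average-based check and is skipped.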