• Что бы вступить в ряды "Принятый кодер" Вам нужно:
    Написать 10 полезных сообщений или тем и Получить 10 симпатий.
    Для того кто не хочет терять время,может пожертвовать средства для поддержки сервеса, и вступить в ряды VIP на месяц, дополнительная информация в лс.

  • Пользаватели которые будут спамить, уходят в бан без предупреждения. Спам сообщения определяется администрацией и модератором.

  • Гость, Что бы Вы хотели увидеть на нашем Форуме? Изложить свои идеи и пожелания по улучшению форума Вы можете поделиться с нами здесь. ----> Перейдите сюда
  • Все пользователи не прошедшие проверку электронной почты будут заблокированы. Все вопросы с разблокировкой обращайтесь по адресу электронной почте : info@guardianelinks.com . Не пришло сообщение о проверке или о сбросе также сообщите нам.

HarmonyOS Next Full Concurrent GC In-depth Analysis - Say goodbye to the STW era

Lomanu4 Оффлайн

Lomanu4

Команда форума
Администратор
Регистрация
1 Мар 2015
Сообщения
1,481
Баллы
155
This article aims to deeply explore the technical details of Huawei HarmonyOS Next system and summarize them based on actual development practices.It is mainly used as a carrier of technology sharing and communication, and it is inevitable to miss mistakes. All colleagues are welcome to put forward valuable opinions and questions in order to make common progress.This article is original content, and any form of reprinting must indicate the source and original author.

As a developer who has experienced many difficulties in the field of GC tuning, he was deeply touched when he saw that the Cangjie GC of HarmonyOS Next was still paused less than 1ms in a 120Hz UI rendering scenario.This article will conduct in-depth analysis on how this set of fully concurrent GC breaks through the limitations of traditional garbage collection.

1. The revolution in concurrent marking sorting algorithm

1.1 Region Memory Division


Cangjie divides the heap space into regions of different sizes, and the typical configuration is as follows:
|Region type|size|purpose|
|--|--|--|
|Tiny|4KB|Small Object Allocation|
|Small|256KB|Medium-sized Object|
|Large|4MB|Large object (direct allocation)|


// Memory allocation example
let smallObj = SmallObject() // Assign in Small Region
let largeBuf = ByteBuffer(510241024) // Directly allocate the Large Region

Advantages:

  1. Be able to recycle different regions in parallel to improve recycling efficiency.
  2. Avoid scanning the entire heap and reduce unnecessary overhead.
  3. After actual testing on IoT devices, GC throughput has increased by 2.3 times.
1.2 Pointer jump allocation technology


Traditional GC needs to maintain complex memory blockchain lists, while Cangjie uses Bump Pointer allocation method:


; x86 assembly example
mov eax, [free_ptr] ; Get the current free pointer
add eax, obj_size ; move pointer
cmp eax, region_end ; Check bounds
jb .alloc_ok

A single allocation operation takes only 10 clock cycles, which is 17 times faster than the traditional malloc.

2. Core mechanism for low-latency implementation

2.1 Security Point Collaboration Model


Cangjie's safety point design is very fine:

  1. Insert checkpoint during compilation:

func foo() {
// Automatic insertion of the method entrance to check the safety point
while condition {
// Loop back to side insertion check
}
}
  1. Three-color marking state machine:

stateDiagram
[*] --> White
White --> Grey: Mark the start
Grey --> Black: Scan complete
Black --> White: GC cycle ends

In actual measurement on an 8-core device, the median GC synchronization delay is only 23μs.

2.2 Memory Barrier Optimization


Cangjie customizes memory barriers according to different hardware characteristics:
|Platform|Barrier Command|Delay(ns)|
|--|--|--|
|ARMv9|DMB ISH|45|
|x86|MFENCE|32|
|RISC-V|fence rw,rw|68|

On Kirin chips, the barrier overhead is further reduced by 30% through instruction rearrangement.

3. Value Type GC Adaptation Challenge

3.1 Example of hybrid memory layout


struct Point { var x, y: Float } // Value type
class Line {
var start: Point // Inline value type
var end: Point
var style: LineStyle // Reference type
}

GC requires special treatment:

  1. Skip the Point field when scanning the Line object.
  2. But the memory range of Point is required.
  3. Keep the Point memory continuous while moving Line.
3.2 Performance comparison data

ScenarioPure Reference Type GCMixed Type GCOverage Increase
Tag Stage120ms145ms21%
Sorting stage80ms110ms38%
Overall pause time15ms18ms20%

Although the overhead of hybrid GCs has increased, the performance benefits of value types are significant (such as our geometric computing module speeds up by 4 times), so this is worth it.

Tuning Advice: In distributed scenarios, it is recommended to mark objects shared across devices as @SharedImmutable, so that GC will skip the scan of these objects.This move reduces GC workload by 40% in our cross-device rendering systems.


Пожалуйста Авторизируйтесь или Зарегистрируйтесь для просмотра скрытого текста.

 
Вверх Снизу