# One Way Allocator
This is a very fast, single-threaded-only memory allocator that minimizes system calls when a task needs to make a lot of memory allocations, all of which can be freed together when the task finishes.
It has been designed to be used for netdata context queries.
To perform a context query, netdata builds a virtual chart: a chart that contains all the dimensions of the charts sharing the same context. This process requires allocating several structures for each of these dimensions in order to attach them to the virtual chart. All of this data can be freed immediately after the query finishes.
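The general idea is that of an arena (or "bump") allocator: one large buffer is requested from the system up front, and each allocation simply advances an offset inside it, so the only expensive operations are creating and destroying the arena. The sketch below is a simplified illustration of this idea, not netdata's actual implementation; all names in it are hypothetical, and alignment handling is omitted for brevity.

```c
#include <stdlib.h>

/* Hypothetical arena: one big block obtained from the system,
 * allocations are carved out of it by bumping an offset. */
typedef struct demo_arena {
    char  *buffer;   // the single large block
    size_t size;     // total size of the block
    size_t used;     // bytes handed out so far
} DEMO_ARENA;

static DEMO_ARENA *demo_arena_create(size_t size) {
    DEMO_ARENA *a = malloc(sizeof(*a));
    a->buffer = malloc(size);     // the only "expensive" allocation
    a->size = size;
    a->used = 0;
    return a;
}

static void *demo_arena_alloc(DEMO_ARENA *a, size_t size) {
    if (a->used + size > a->size)
        return NULL;              // a real allocator would grow instead
    void *p = a->buffer + a->used; // just bump the offset - no system call
    a->used += size;
    return p;
}

static void demo_arena_destroy(DEMO_ARENA *a) {
    free(a->buffer);  // everything allocated from the arena is released at once
    free(a);
}
```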
## How it works
- The caller calls `ONEWAYALLOC *owa = onewayalloc_create(sizehint)` to create an OWA. Internally this allocates the first memory buffer with a size >= `sizehint`. If `sizehint` is zero, it will allocate 1 hardware page (usually 4kb). There is no need to check for success or failure: as with `mallocz()` in netdata, a `fatal()` will be called if the allocation fails - although this will never fail, since Linux does not really check if there is memory available for `mmap()` calls.
- The caller can then perform any number of the following calls to acquire memory:
  - `onewayalloc_mallocz(owa, size)`, similar to `mallocz()`
  - `onewayalloc_callocz(owa, nmemb, size)`, similar to `callocz()`
  - `onewayalloc_strdupz(owa, string)`, similar to `strdupz()`
  - `onewayalloc_memdupz(owa, ptr, size)`, similar to `mallocz()` and then `memcpy()`
- Once the caller has done all the work with the allocated buffers, all the allocated memory can be freed with `onewayalloc_destroy(owa)`. A usage sketch is shown below.
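The following is a hedged usage sketch of the calls listed above. The `onewayalloc_*` functions and the `ONEWAYALLOC` type are the ones documented here; the header path, the structure, and the field values are assumptions made only for illustration.

```c
#include "onewayalloc.h"   // assumed header path

// hypothetical per-dimension structure attached to the virtual chart
struct dimension_copy {
    char *id;
    char *name;
};

void run_query(void) {
    // one OWA per task; the size hint is only a starting point
    ONEWAYALLOC *owa = onewayalloc_create(0);

    // allocate as many structures as needed - no individual free() calls
    struct dimension_copy *d = onewayalloc_mallocz(owa, sizeof(*d));
    d->id   = onewayalloc_strdupz(owa, "dimension1");
    d->name = onewayalloc_strdupz(owa, "Dimension 1");

    // ... perform the query using the allocated structures ...

    // when the task finishes, release everything with a single call
    onewayalloc_destroy(owa);
}
```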
## How much faster is it?
On modern hardware, for any single query the performance improvement is marginal and not noticeable at all.
We performed the following tests using the same huge context query (1000 charts, 100 dimensions each = 100k dimensions):

- using `mallocz()`, 1 caller, 256 queries (sequential)
- using `mallocz()`, 256 callers, 1 query each (parallel)
- using `OWA`, 1 caller, 256 queries (sequential)
- using `OWA`, 256 callers, 1 query each (parallel)
Netdata was configured to use 24 web threads on the 24-core server we used.
The results are as follows:
### Sequential test
| branch | transactions | time to complete | transaction rate | average response time | min response time | max response time |
|---|---|---|---|---|---|---|
| malloc() | 256 | 322.35s | 0.79/sec | 1.26s | 1.01s | 1.87s |
| OWA | 256 | 310.19s | 0.83/sec | 1.21s | 1.04s | 1.63s |
For a single query, the improvement is just marginal and not noticeable at all.
### Parallel test
| branch | transactions | time to complete | transaction rate | average response time | min response time | max response time |
|---|---|---|---|---|---|---|
| malloc() | 256 | 84.72s | 3.02/sec | 68.43s | 50.20s | 84.71s |
| OWA | 256 | 39.35s | 6.51/sec | 34.48s | 20.55s | 39.34s |
For parallel workloads, like the one executed by netdata.cloud, `OWA` provides a 54% overall speed improvement (more than double the overall user-experienced speed, including the data query itself).