
log2journal

log2journal and systemd-cat-native can be used to convert structured log files, such as the ones generated by web servers, into systemd-journal entries.

By combining these tools you can create advanced log processing pipelines sending any kind of structured text logs to systemd-journald. This is a simple, but powerful and efficient way to handle log processing.

The process involves the usual piping of shell commands, to read and process the log files in real time.

The result looks like this: nginx logs in systemd-journal.


The overall process looks like this:

tail -F /var/log/nginx/*.log |\  # outputs log lines
log2journal 'PATTERN'        |\  # outputs Journal Export Format
systemd-cat-native               # sends to local/remote journald

These are the steps:

  1. tail -F /var/log/nginx/*.log
    this command will tail all *.log files in /var/log/nginx/. We use -F instead of -f to ensure that files will still be tailed after log rotation.
  2. log2journal is a Netdata program. It reads log entries and extracts fields, according to the PCRE2 pattern given to it. It can also apply some basic operations on the fields, like injecting new fields, duplicating existing ones, or rewriting their values. The output of log2journal is in Systemd Journal Export Format, and it looks like this:
    KEY1=VALUE1 # << start of the first log line
    KEY2=VALUE2
                # << an empty line separates log entries
    KEY1=VALUE1 # << start of the second log line
    KEY2=VALUE2
  3. systemd-cat-native is a Netdata program. It can send the logs to a local systemd-journald (journal namespaces supported), or to a remote systemd-journal-remote.

Processing pipeline

The sequence of processing in Netdata's log2journal is designed to methodically transform and prepare log data for export in the systemd Journal Export Format. This transformation occurs through a pipeline of stages, each with a specific role in processing the log entries. Here's a description of each stage in the sequence:

  1. Input
    The tool reads one log line at a time from the input source. It supports different input formats such as JSON, logfmt, and free-form logs defined by PCRE2 patterns.

  2. Extract Fields and Values
    Based on the input format (JSON, logfmt, or custom pattern), it extracts fields and their values from each log line. In the case of JSON and logfmt, it automatically extracts all fields. For custom patterns, it uses PCRE2 regular expressions, and fields are extracted based on sub-expressions defined in the pattern.

  3. Transliteration
    Extracted fields are transliterated to the limited character set accepted by systemd-journal: capitals A-Z, digits 0-9, underscores.

  4. Apply Optional Prefix
    If a prefix is specified, it is added to all keys. This happens before any other processing so that all subsequent matches and manipulations take the prefix into account.

  5. Rename Fields
    Renames fields as specified in the configuration. This is used to change the names of the fields to match desired or required naming conventions.

  6. Inject New Fields
    New fields are injected into the log data. This can include constants or values derived from other fields, using variable substitution.

  7. Rewrite Field Values
    Applies rewriting rules to alter the values of the fields. This can involve complex transformations, including regular expressions and variable substitutions. The rewrite rules can also inject new fields into the data.

  8. Filter Fields
    Fields are filtered based on include and exclude patterns. This stage selects which fields are to be sent to the journal, allowing for selective logging.

  9. Output
    Finally, the processed log data is output in the Journal Export Format. This format is compatible with systemd's journaling system and can be sent to local or remote systemd journal systems, by piping the output of log2journal to systemd-cat-native.

This pipeline ensures a flexible and comprehensive approach to log processing, allowing for a wide range of modifications and customizations to fit various logging requirements. Each stage builds upon the previous one, enabling complex log transformations and enrichments before the data is exported to the systemd journal.
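
As a quick orientation, here is a minimal sketch of a YAML configuration that touches each stage of this pipeline. The field names and values are illustrative, not taken from a real log format:

```yaml
pattern: '(?<level>[^ ]+) (?<msg>.+)'  # extract fields; keys become LEVEL and MSG

prefix: 'APP_'                         # keys become APP_LEVEL and APP_MSG

rename:
  - new_key: MESSAGE                   # APP_MSG becomes the journal message body
    old_key: APP_MSG

inject:
  - key: SYSLOG_IDENTIFIER
    value: 'my-app'                    # a constant field
  - key: PRIORITY
    value: '${APP_LEVEL}'              # a copy of APP_LEVEL, rewritten below

rewrite:
  - key: PRIORITY
    match: 'error'                     # map the level to a syslog priority
    value: 3
  - key: PRIORITY
    match: '.*'                        # catch-all: everything else is info
    value: 6

filter:
  exclude: '^APP_LEVEL$'               # drop the raw level field from the output
```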

Real-life example

We have an nginx server logging in this standard combined log format:

        log_format combined '$remote_addr - $remote_user [$time_local] '
                            '"$request" $status $body_bytes_sent '
                            '"$http_referer" "$http_user_agent"';

Extracting fields with a pattern

First, let's find the right pattern for log2journal. We ask ChatGPT:

My nginx log uses this log format:

log_format access '$remote_addr - $remote_user [$time_local] '
                  '"$request" $status $body_bytes_sent '
                  '"$http_referer" "$http_user_agent"';

I want to use `log2journal` to convert this log for systemd-journal.
`log2journal` accepts a PCRE2 regular expression, using the named groups
in the pattern as the journal fields to extract from the logs.

Please give me the PCRE2 pattern to extract all the fields from my nginx
log files.

ChatGPT replies with this:

(?x) # Enable PCRE2 extended mode
^
(?<remote_addr>[^ ]+) \s - \s
(?<remote_user>[^ ]+) \s
\[
  (?<time_local>[^\]]+)
\]
\s+ "
(?<request>
  (?<request_method>[A-Z]+) \s+
  (?<request_uri>[^ ]+) \s+
  (?<server_protocol>[^"]+)
)
" \s+
(?<status>\d+) \s+
(?<body_bytes_sent>\d+) \s+
"(?<http_referer>[^"]*)" \s+
"(?<http_user_agent>[^"]*)"

Let's see what the above says:

  1. (?x): enable PCRE2 extended mode. In this mode spaces and newlines in the pattern are ignored. To match a space you have to use \s. This mode allows us to split the pattern into multiple lines and add comments to it.
  2. ^: match the beginning of the line
  3. (?<remote_addr>[^ ]+): match anything up to the first space ([^ ]+), and name it remote_addr.
  4. \s: match a space
  5. -: match a hyphen
  6. and so on...
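
Before saving the pattern to a file, you can also test a pattern by passing it directly as the first argument of log2journal. For example, this trivial pattern (unrelated to nginx) should print something like:

```
# echo 'hello world' | log2journal '^(?<first_word>[^ ]+) (?<second_word>[^ ]+)'
FIRST_WORD=hello
SECOND_WORD=world
```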

We edit nginx.yaml and add it, like this:

pattern: |
  (?x) # Enable PCRE2 extended mode
  ^
  (?<remote_addr>[^ ]+) \s - \s
  (?<remote_user>[^ ]+) \s
  \[
    (?<time_local>[^\]]+)
  \]
  \s+ "
  (?<request>
    (?<request_method>[A-Z]+) \s+
    (?<request_uri>[^ ]+) \s+
    (?<server_protocol>[^"]+)
  )
  " \s+
  (?<status>\d+) \s+
  (?<body_bytes_sent>\d+) \s+
  "(?<http_referer>[^"]*)" \s+
  "(?<http_user_agent>[^"]*)"

Let's test it with a sample line (instead of tail):

# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml
BODY_BYTES_SENT=4172
HTTP_REFERER=-
HTTP_USER_AGENT=Go-http-client/1.1
REMOTE_ADDR=1.2.3.4
REMOTE_USER=-
REQUEST=GET /index.html HTTP/1.1
REQUEST_METHOD=GET
REQUEST_URI=/index.html
SERVER_PROTOCOL=HTTP/1.1
STATUS=200
TIME_LOCAL=19/Nov/2023:00:24:43 +0000

As you can see, it extracted all the fields and converted their names to capitals, as systemd-journal expects them.

Prefixing field names

To make sure the fields are unique for nginx and do not interfere with other applications, we should prefix them with NGINX_:

pattern: |
  (?x) # Enable PCRE2 extended mode
  ^
  (?<remote_addr>[^ ]+) \s - \s
  (?<remote_user>[^ ]+) \s
  \[
    (?<time_local>[^\]]+)
  \]
  \s+ "
  (?<request>
    (?<request_method>[A-Z]+) \s+
    (?<request_uri>[^ ]+) \s+
    (?<server_protocol>[^"]+)
  )
  " \s+
  (?<status>\d+) \s+
  (?<body_bytes_sent>\d+) \s+
  "(?<http_referer>[^"]*)" \s+
  "(?<http_user_agent>[^"]*)"

prefix: 'NGINX_' # <<< we added this

And let's try it:

# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml
NGINX_BODY_BYTES_SENT=4172
NGINX_HTTP_REFERER=-
NGINX_HTTP_USER_AGENT=Go-http-client/1.1
NGINX_REMOTE_ADDR=1.2.3.4
NGINX_REMOTE_USER=-
NGINX_REQUEST=GET /index.html HTTP/1.1
NGINX_REQUEST_METHOD=GET
NGINX_REQUEST_URI=/index.html
NGINX_SERVER_PROTOCOL=HTTP/1.1
NGINX_STATUS=200
NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000

Renaming fields

Now, all fields start with NGINX_ but we want NGINX_REQUEST to be the MESSAGE of the log line, since this is the field journalctl and the Netdata dashboard show by default. Let's rename it:

pattern: |
  (?x) # Enable PCRE2 extended mode
  ^
  (?<remote_addr>[^ ]+) \s - \s
  (?<remote_user>[^ ]+) \s
  \[
    (?<time_local>[^\]]+)
  \]
  \s+ "
  (?<request>
    (?<request_method>[A-Z]+) \s+
    (?<request_uri>[^ ]+) \s+
    (?<server_protocol>[^"]+)
  )
  " \s+
  (?<status>\d+) \s+
  (?<body_bytes_sent>\d+) \s+
  "(?<http_referer>[^"]*)" \s+
  "(?<http_user_agent>[^"]*)"

prefix: 'NGINX_'

rename: # <<< we added this
  - new_key: MESSAGE # <<< we added this
    old_key: NGINX_REQUEST # <<< we added this

Let's test it:

# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml
MESSAGE=GET /index.html HTTP/1.1 # <<< renamed !
NGINX_BODY_BYTES_SENT=4172
NGINX_HTTP_REFERER=-
NGINX_HTTP_USER_AGENT=Go-http-client/1.1
NGINX_REMOTE_ADDR=1.2.3.4
NGINX_REMOTE_USER=-
NGINX_REQUEST_METHOD=GET
NGINX_REQUEST_URI=/index.html
NGINX_SERVER_PROTOCOL=HTTP/1.1
NGINX_STATUS=200
NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000

Injecting new fields

To have complete log entries in the journals we need 3 fields: MESSAGE, PRIORITY and SYSLOG_IDENTIFIER. We have already added MESSAGE by renaming NGINX_REQUEST. We can also inject SYSLOG_IDENTIFIER and PRIORITY.

Ideally, we would want the 5xx errors to be red in our journalctl output and the dashboard. To achieve that we need to set the PRIORITY field to the right log level. Log priorities are numeric and follow the syslog priorities. Checking /usr/include/sys/syslog.h we can see these:

#define LOG_EMERG       0       /* system is unusable */
#define LOG_ALERT       1       /* action must be taken immediately */
#define LOG_CRIT        2       /* critical conditions */
#define LOG_ERR         3       /* error conditions */
#define LOG_WARNING     4       /* warning conditions */
#define LOG_NOTICE      5       /* normal but significant condition */
#define LOG_INFO        6       /* informational */
#define LOG_DEBUG       7       /* debug-level messages */

Avoid setting priority to 0 (LOG_EMERG), because these messages are broadcast to your terminal (the journal uses wall to notify of such events). A good priority for errors is 3 (red), or 4 (yellow).
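
Once PRIORITY is set correctly (we will do this below), errors can be isolated with a simple journalctl field match, like this:

```
# show only the entries we marked as errors (priority 3)
journalctl SYSLOG_IDENTIFIER=nginx-log PRIORITY=3
```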

To set the PRIORITY field in the output, we can use NGINX_STATUS. We will do this in 2 steps: a) inject the PRIORITY field as a copy of NGINX_STATUS and then b) use a pattern on its value to rewrite it to the priority level we want.

First, let's inject SYSLOG_IDENTIFIER and PRIORITY:

pattern: |
  (?x) # Enable PCRE2 extended mode
  ^
  (?<remote_addr>[^ ]+) \s - \s
  (?<remote_user>[^ ]+) \s
  \[
    (?<time_local>[^\]]+)
  \]
  \s+ "
  (?<request>
    (?<request_method>[A-Z]+) \s+
    (?<request_uri>[^ ]+) \s+
    (?<server_protocol>[^"]+)
  )
  " \s+
  (?<status>\d+) \s+
  (?<body_bytes_sent>\d+) \s+
  "(?<http_referer>[^"]*)" \s+
  "(?<http_user_agent>[^"]*)"

prefix: 'NGINX_'

rename:
  - new_key: MESSAGE
    old_key: NGINX_REQUEST

inject: # <<< we added this
  - key: PRIORITY # <<< we added this
    value: '${NGINX_STATUS}' # <<< we added this

  - key: SYSLOG_IDENTIFIER # <<< we added this
    value: 'nginx-log' # <<< we added this

Let's see what this does:

# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml
MESSAGE=GET /index.html HTTP/1.1
NGINX_BODY_BYTES_SENT=4172
NGINX_HTTP_REFERER=-
NGINX_HTTP_USER_AGENT=Go-http-client/1.1
NGINX_REMOTE_ADDR=1.2.3.4
NGINX_REMOTE_USER=-
NGINX_REQUEST_METHOD=GET
NGINX_REQUEST_URI=/index.html
NGINX_SERVER_PROTOCOL=HTTP/1.1
NGINX_STATUS=200
NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000
PRIORITY=200 # <<< PRIORITY added
SYSLOG_IDENTIFIER=nginx-log # <<< SYSLOG_IDENTIFIER added

Rewriting field values

Now we need to rewrite PRIORITY to the right syslog level, based on its value (the copy of NGINX_STATUS). We will assign priority 6 (info) when the status is 1xx, 2xx or 3xx, priority 5 (notice) when the status is 4xx, priority 3 (error) when the status is 5xx, and priority 4 (warning) to anything else. Let's do it:

pattern: |
  (?x) # Enable PCRE2 extended mode
  ^
  (?<remote_addr>[^ ]+) \s - \s
  (?<remote_user>[^ ]+) \s
  \[
    (?<time_local>[^\]]+)
  \]
  \s+ "
  (?<request>
    (?<request_method>[A-Z]+) \s+
    (?<request_uri>[^ ]+) \s+
    (?<server_protocol>[^"]+)
  )
  " \s+
  (?<status>\d+) \s+
  (?<body_bytes_sent>\d+) \s+
  "(?<http_referer>[^"]*)" \s+
  "(?<http_user_agent>[^"]*)"

prefix: 'NGINX_'

rename:
  - new_key: MESSAGE
    old_key: NGINX_REQUEST

inject:
  - key: PRIORITY
    value: '${NGINX_STATUS}'

  - key: SYSLOG_IDENTIFIER
    value: 'nginx-log'

rewrite: # <<< we added this
  - key: PRIORITY # <<< we added this
    match: '^[123]' # <<< we added this
    value: 6 # <<< we added this

  - key: PRIORITY # <<< we added this
    match: '^4' # <<< we added this
    value: 5 # <<< we added this

  - key: PRIORITY # <<< we added this
    match: '^5' # <<< we added this
    value: 3 # <<< we added this

  - key: PRIORITY # <<< we added this
    match: '.*' # <<< we added this
    value: 4 # <<< we added this

Rewrite rules are processed in order, and by default the first rule that matches a field stops further processing for that field. This is why the last rule, which matches everything, does not always change the priority to 4.

Let's test it:

# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml
MESSAGE=GET /index.html HTTP/1.1
NGINX_BODY_BYTES_SENT=4172
NGINX_HTTP_REFERER=-
NGINX_HTTP_USER_AGENT=Go-http-client/1.1
NGINX_REMOTE_ADDR=1.2.3.4
NGINX_REMOTE_USER=-
NGINX_REQUEST_METHOD=GET
NGINX_REQUEST_URI=/index.html
NGINX_SERVER_PROTOCOL=HTTP/1.1
NGINX_STATUS=200
NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000
PRIORITY=6 # <<< PRIORITY rewritten here
SYSLOG_IDENTIFIER=nginx-log
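
To verify the 5xx branch of the rewrite rules, we can feed a line with status 500; with the same configuration, PRIORITY should come out as 3:

```
# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 500 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml | grep ^PRIORITY
PRIORITY=3
```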

Rewrite rules are powerful. You can have named groups in them, like in the main pattern, to extract sub-fields from the values, which you can then use in variable substitutions. You can use rewrite rules to anonymize the URLs, e.g. to remove customer IDs or transaction details from them, as shown below.
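
For example, a hypothetical rule like the following (the URL layout here is an assumption for illustration) would mask numeric customer IDs in the request URI, using named groups and variable substitution:

```yaml
rewrite:
  # hypothetical: /customers/12345/orders becomes /customers/CUSTOMER_ID/orders
  - key: NGINX_REQUEST_URI
    match: '^(?<before>/customers/)\d+(?<after>.*)$'
    value: '${before}CUSTOMER_ID${after}'
```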

Sending logs to systemd-journal

Now the message is ready to be sent to a systemd-journal. For this we use systemd-cat-native. This command can send such messages to a journal running on the localhost, a local journal namespace, or a systemd-journal-remote running on another server. By just appending | systemd-cat-native to the command, the message will be sent to the local journal.

# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 "-" "Go-http-client/1.1"' | log2journal -f nginx.yaml | systemd-cat-native 
# no output

# let's find the message
# journalctl -r -o verbose SYSLOG_IDENTIFIER=nginx-log
Wed 2023-12-06 13:23:07.083299 EET [s=5290f0133f25407aaa1e2c451c0e4756;i=57194;b=0dfa96ecc2094cecaa8ec0efcb93b865;m=b133308867;t=60bd59346a289;x=5c1bdacf2b9c4bbd]
PRIORITY=6
_UID=0
_GID=0
_CAP_EFFECTIVE=1ffffffffff
_SELINUX_CONTEXT=unconfined
_BOOT_ID=0dfa96ecc2094cecaa8ec0efcb93b865
_MACHINE_ID=355c8eca894d462bbe4c9422caf7a8bb
_HOSTNAME=lab-logtest-src
_RUNTIME_SCOPE=system
_TRANSPORT=journal
MESSAGE=GET /index.html HTTP/1.1
NGINX_BODY_BYTES_SENT=4172
NGINX_HTTP_REFERER=-
NGINX_HTTP_USER_AGENT=Go-http-client/1.1
NGINX_REMOTE_ADDR=1.2.3.4
NGINX_REMOTE_USER=-
NGINX_REQUEST_METHOD=GET
NGINX_REQUEST_URI=/index.html
NGINX_SERVER_PROTOCOL=HTTP/1.1
NGINX_STATUS=200
NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000
SYSLOG_IDENTIFIER=nginx-log
_PID=114343
_COMM=systemd-cat-nat
_AUDIT_SESSION=253
_AUDIT_LOGINUID=1000
_SYSTEMD_CGROUP=/user.slice/user-1000.slice/session-253.scope
_SYSTEMD_SESSION=253
_SYSTEMD_OWNER_UID=1000
_SYSTEMD_UNIT=session-253.scope
_SYSTEMD_SLICE=user-1000.slice
_SYSTEMD_USER_SLICE=-.slice
_SYSTEMD_INVOCATION_ID=c59e33ead8c24880b027e317b89f9f76
_SOURCE_REALTIME_TIMESTAMP=1701861787083299

So, the log line, with all its fields parsed, ended up in systemd-journal. Now we can send all the nginx logs to systemd-journal like this:

tail -F /var/log/nginx/access.log |\
log2journal -f nginx.yaml |\
systemd-cat-native

Best practices

Create a systemd service unit: add the above commands to a systemd unit file, so that you can start/stop the pipeline and check its status. Furthermore, you can use the LogNamespace= directive of systemd service units to isolate your nginx logs from the logs of the rest of the system. Here is how to do it:

Create the file /etc/systemd/system/nginx-logs.service (change /path/to/nginx.yaml to the right path):

[Unit]
Description=NGINX Log to Systemd Journal
After=network.target

[Service]
ExecStart=/bin/sh -c 'tail -F /var/log/nginx/access.log | log2journal -f /path/to/nginx.yaml | systemd-cat-native'
LogNamespace=nginx-logs
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

Reload systemd to grab this file:

sudo systemctl daemon-reload

Enable and start the service:

sudo systemctl enable nginx-logs.service
sudo systemctl start nginx-logs.service

To see the logs of the namespace, use:

journalctl -f --namespace=nginx-logs

Netdata will automatically pick up the new namespace and present it in the list of sources on the dashboard.

You can also instruct systemd-cat-native to log to a remote system, sending the logs to a systemd-journal-remote instance running on another server. Check the manual of systemd-cat-native.

Performance

log2journal and systemd-cat-native have been designed to process hundreds of thousands of log lines per second. They both utilize high-performance indexing hashtables to speed up lookups, and queues that dynamically adapt to the number of log lines offered, providing a smooth and fast experience under all conditions.

In our tests, the combined CPU utilization of log2journal and systemd-cat-native versus promtail with a similar configuration is 1 to 5; that is, log2journal and systemd-cat-native combined are 5 times faster than promtail.

PCRE2 patterns

The key factor influencing the performance of a log processing pipeline built with these tools is the quality of the PCRE2 patterns used. Poorly written PCRE2 patterns can make processing significantly slower and more CPU consuming.

The pattern .* has the biggest impact on CPU consumption, especially when multiple .* appear in the same pattern.

Usually we use .* to match everything up to some delimiter, e.g. up to the next space. By replacing it with [^ ]+ (meaning: match one or more characters that are not a space), the regular expression engine can be a lot more efficient, reducing the overall CPU utilization significantly.
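
As an illustration, compare these two ways of writing the beginning of the nginx pattern; both match the same lines, but the second gives the engine far less room for backtracking:

```
# slower: unanchored .* forces the engine to backtrack on every field
^(?<remote_addr>.*) - (?<remote_user>.*) \[(?<time_local>.*)\]

# faster: negated character classes stop exactly at the delimiter
^(?<remote_addr>[^ ]+) - (?<remote_user>[^ ]+) \[(?<time_local>[^\]]+)\]
```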

Performance of systemd journals

The ingestion pipeline of logs, from tail to systemd-journald or systemd-journal-remote, is very efficient in all aspects. CPU utilization is better than that of any other system we tested, and RAM usage is independent of the number of fields indexed, making systemd-journal one of the most efficient log management engines for ingesting high volumes of structured logs.

High field cardinality does not have a noticeable impact on systemd-journal. The number of fields indexed and the number of unique values per field have a linear and predictable effect on the resource utilization of systemd-journald and systemd-journal-remote. This is unlike other log management solutions, like Loki, whose RAM requirements grow exponentially as cardinality increases, making it impractical for them to index the amount of information systemd journals can index.

However, the number of fields added to journals influences the overall disk footprint. Fewer fields mean more log entries per journal file, a smaller overall disk footprint, and faster queries.

systemd-journal files are primarily designed for security and reliability. This comes at the cost of disk footprint. The internal structure of journal files is such that, in case of corruption, data loss is minimized. To achieve this unique characteristic, certain data within the files need to be aligned at predefined boundaries, so that the non-corrupted parts of a damaged journal file can still be recovered.

Despite the fact that systemd-journald employs several techniques to optimize disk footprint, like deduplication of log entries, shared indexes for fields and their values, compression of long log entries, etc., the disk footprint of journal files is generally about 10x larger compared to other monitoring solutions, like Loki.

This can be improved by storing journal files on a compressed filesystem. In our tests, a compressed filesystem can save up to 75% of the space required by journal files. The journal files will still be bigger than the overall disk footprint of other solutions, but the flexibility (index any number of fields), reliability (minimal potential data loss) and security (tampering protection and sealing) features of systemd-journal justify the difference.
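
As a sketch of what this can look like: if /var/log/journal lives on a btrfs filesystem, transparent compression can be enabled with a mount option (the device name and path here are assumptions):

```
# mount the volume holding the journal files with zstd compression
mount -o compress=zstd /dev/sdb1 /var/log/journal
```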

When using versions of systemd prior to 254 and centralizing logs to a remote system, systemd-journal-remote creates very small files (32MB each). This results in increased duplication of information across the files, increasing the overall disk footprint. systemd versions 254+ added options to systemd-journal-remote to control the maximum size per file, which can significantly reduce this duplication.

Another limitation of the systemd-journald ecosystem is the uncompressed transmission of logs across systems. systemd-journal-remote, up to version 254 that we tested, accepts encrypted but uncompressed data. This means that when centralizing logs to a logs server, the bandwidth required is higher compared to other log management solutions.

Security Considerations

log2journal and systemd-cat-native are used to convert log files to structured logs in the systemd-journald ecosystem.

Systemd-journal is a logs management solution designed primarily for security and reliability. When configured properly, it can reliably and securely store your logs, ensuring they will be available and unchanged for as long as you need them.

When sending logs to a remote system, systemd-cat-native can be configured the same way systemd-journal-upload is configured, using HTTPS and private keys to encrypt and secure their transmission over the network.
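
A hypothetical invocation could look like the following; the exact option names are an assumption here, so consult the manual of systemd-cat-native for the authoritative list:

```
# send the processed logs to a remote systemd-journal-remote over HTTPS
tail -F /var/log/nginx/access.log |\
  log2journal -f nginx.yaml |\
  systemd-cat-native --url https://logs.example.com:19532 \
    --key /path/to/key.pem --cert /path/to/cert.pem --trust /path/to/ca.pem
```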

When dealing with sensitive logs, organizations usually follow 2 strategies:

  1. Anonymize the logs before storing them, so that the stored logs do not have any sensitive information.
  2. Store the logs in full, including sensitive information, and carefully control who and how has access to them.

Netdata can help in both cases.

If you want to anonymize the logs before storing them, use rewriting rules at the log2journal phase to remove sensitive information from them. This usually means matching the sensitive part and replacing it with XXX, CUSTOMER_ID or CREDIT_CARD_NUMBER, so that the log entries stored in journal files do not include any such sensitive information.

If, on the other hand, your organization prefers to maintain the full logs and control who has access to them and how, use Netdata Cloud to assign roles to your team members and control which roles can access the journal logs in your environment.

log2journal options


Netdata log2journal v1.43.0-341-gdac4df856

Convert logs to systemd Journal Export Format.

- JSON logs: extracts all JSON fields.
- logfmt logs: extracts all logfmt fields.
- free-form logs: uses PCRE2 patterns to extract fields.

Usage: ./log2journal [OPTIONS] PATTERN|json

Options:

--file /path/to/file.yaml or -f /path/to/file.yaml
Read yaml configuration file for instructions.

--config CONFIG_NAME or -c CONFIG_NAME
Run with the internal YAML configuration named CONFIG_NAME.
Available internal YAML configs:

nginx-combined nginx-json default

--------------------------------------------------------------------------------
INPUT PROCESSING

PATTERN
PATTERN should be a valid PCRE2 regular expression.
RE2 regular expressions (like the ones usually used in Go applications),
are usually valid PCRE2 patterns too.
Sub-expressions without named groups are evaluated, but their matches are
not added to the output.

- JSON mode
JSON mode is enabled when the pattern is set to: json
Field names are extracted from the JSON logs and are converted to the
format expected by Journal Export Format (all caps, only _ is allowed).

- logfmt mode
logfmt mode is enabled when the pattern is set to: logfmt
Field names are extracted from the logfmt logs and are converted to the
format expected by Journal Export Format (all caps, only _ is allowed).

All keys extracted from the input are transliterated to match Journal
semantics (capital A-Z, digits 0-9, underscore).

In a YAML file:
```yaml
pattern: 'PCRE2 pattern | json | logfmt'
```

--------------------------------------------------------------------------------
GLOBALS

--prefix PREFIX
Prefix all fields with PREFIX. The PREFIX is added before any other
processing, so that the extracted keys have to be matched with the PREFIX in
them. PREFIX is NOT transliterated and it is assumed to be systemd-journal
friendly.

In a YAML file:
```yaml
prefix: 'PREFIX_' # prepend all keys with this prefix.
```

--filename-key KEY
Add a field with KEY as the key and the current filename as value.
Automatically detects filenames when piped after 'tail -F',
and tail matches multiple filenames.
To inject the filename when tailing a single file, use --inject.

In a YAML file:
```yaml
filename:
  key: KEY
```

--------------------------------------------------------------------------------
RENAMING OF KEYS

--rename NEW=OLD
Rename fields. OLD has been transliterated and PREFIX has been added.
NEW is assumed to be systemd journal friendly.

Up to 512 renaming rules are allowed.

In a YAML file:
```yaml
rename:
  - new_key: KEY1
    old_key: KEY2 # transliterated with PREFIX added
  - new_key: KEY3
    old_key: KEY4 # transliterated with PREFIX added
  # add as many as required
```

--------------------------------------------------------------------------------
INJECTING NEW KEYS

--inject KEY=VALUE
Inject constant fields to the output (both matched and unmatched logs).
--inject entries are added to unmatched lines too, when their key is
not used in --inject-unmatched (--inject-unmatched overrides --inject).
VALUE can use variables like ${OTHER_KEY} to be replaced with the values
of other keys available.

Up to 512 fields can be injected.

In a YAML file:
```yaml
inject:
  - key: KEY1
    value: 'VALUE1'
  - key: KEY2
    value: '${KEY3}${KEY4}' # gets the values of KEY3 and KEY4
  # add as many as required
```

--------------------------------------------------------------------------------
REWRITING KEY VALUES

--rewrite KEY=/MATCH/REPLACE[/OPTIONS]
Apply a rewrite rule to the values of a specific key.
The first character after KEY= is the separator, which should also
be used between the MATCH, REPLACE and OPTIONS.

OPTIONS can be a comma separated list of `non-empty`, `dont-stop` and
`inject`.

When `non-empty` is given, MATCH is expected to be a variable
substitution using `${KEY1}${KEY2}`. Once the substitution is completed,
the rule matches the KEY only if the result is not empty.
When `non-empty` is not set, the MATCH string is expected to be a PCRE2
regular expression to be checked against the KEY value. This PCRE2
pattern may include named groups to extract parts of the KEY's value.

REPLACE supports variable substitution like `${variable}` against MATCH
named groups (when MATCH is a PCRE2 pattern) and `${KEY}` against the
keys defined so far.

Example:
--rewrite DATE=/^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/
${day}/${month}/${year}
The above will rewrite dates in the format YYYY-MM-DD to DD/MM/YYYY.

Only one rewrite rule is applied per key; the sequence of rewrites for a
given key stops once a rule matches it. This allows providing a sequence
of independent rewriting rules for the same key, matching the different
values the key may get, and also providing a catch-all rewrite rule at
the end, to set the key's value if no other rule matched it. A rewrite
rule can allow processing of further rewrite rules when OPTIONS includes
the keyword 'dont-stop'.

Up to 512 rewriting rules are allowed.

In a YAML file:
```yaml
rewrite:
  # the order of these rules is important - processed top to bottom
  - key: KEY1
    match: 'PCRE2 PATTERN WITH NAMED GROUPS'
    value: 'all match fields and input keys as ${VARIABLE}'
    inject: BOOLEAN # yes = inject the field, don't just rewrite it
    stop: BOOLEAN # no = continue processing, don't stop if matched
  - key: KEY2
    non_empty: '${KEY3}${KEY4}' # match only if this evaluates to non empty
    value: 'all input keys as ${VARIABLE}'
    inject: BOOLEAN # yes = inject the field, don't just rewrite it
    stop: BOOLEAN # no = continue processing, don't stop if matched
  # add as many rewrites as required
```

By default rewrite rules are applied only on fields already defined.
This allows shipping YAML files that include more rewrites than are
required for a specific input file.
Rewrite rules however allow injecting new fields when OPTIONS include
the keyword `inject` or in YAML `inject: yes` is given.

MATCH on the command line can be empty to define an unconditional rule.
Similarly, `match` and `non_empty` can be omitted in the YAML file.
--------------------------------------------------------------------------------
UNMATCHED LINES

--unmatched-key KEY
Include unmatched log entries in the output with KEY as the field name.
Use this to include unmatched entries to the output stream.
Usually it should be set to --unmatched-key=MESSAGE so that the
unmatched entry will appear as the log message in the journals.
Use --inject-unmatched to inject additional fields to unmatched lines.

In a YAML file:
```yaml
unmatched:
  key: MESSAGE # inject the error log as MESSAGE
```

--inject-unmatched LINE
Inject lines into the output for each unmatched log entry.
Usually, --inject-unmatched=PRIORITY=3 is needed to mark the unmatched
lines as errors, so that they can easily be spotted in the journals.

Up to 512 such lines can be injected.

In a YAML file:
```yaml
unmatched:
  key: MESSAGE # inject the error log as MESSAGE
  inject:
    - key: KEY1
      value: 'VALUE1'
    # add as many constants as required
```

--------------------------------------------------------------------------------
FILTERING

--include PATTERN
Include only keys matching the PCRE2 PATTERN.
Useful when parsing JSON or logfmt logs, to include only the keys given.
The keys are matched after the PREFIX has been added to them.

--exclude PATTERN
Exclude the keys matching the PCRE2 PATTERN.
Useful when parsing JSON or logfmt logs, to exclude some of the keys given.
The keys are matched after the PREFIX has been added to them.

When both include and exclude patterns are set and both match a key,
exclude wins and the key will not be added; like a pipeline, we first
include it and then exclude it.

In a YAML file:
```yaml
filter:
  include: 'PCRE2 PATTERN MATCHING KEY NAMES TO INCLUDE'
  exclude: 'PCRE2 PATTERN MATCHING KEY NAMES TO EXCLUDE'
```

--------------------------------------------------------------------------------
OTHER

-h, or --help
Display this help and exit.

--show-config
Show the configuration in YAML format before starting the job.
This is also an easy way to convert command line parameters to yaml.

The program accepts all parameters as both --option=value and --option value.

The maximum log line length accepted is 1048576 characters.

PIPELINE AND SEQUENCE OF PROCESSING

This is a simple diagram of the pipeline taking place:

+---------------------------------------------------+
|                       INPUT                       |
|            read one log line at a time            |
+---------------------------------------------------+
                     v v v v v v
+---------------------------------------------------+
|             EXTRACT FIELDS AND VALUES             |
|          JSON, logfmt, or pattern based           |
|  (apply optional PREFIX - all keys use capitals)  |
+---------------------------------------------------+
                     v v v v v v
+---------------------------------------------------+
|                   RENAME FIELDS                   |
|          change the names of the fields           |
+---------------------------------------------------+
                     v v v v v v
+---------------------------------------------------+
|                 INJECT NEW FIELDS                 |
|   constants, or other field values as variables   |
+---------------------------------------------------+
                     v v v v v v
+---------------------------------------------------+
|               REWRITE FIELD VALUES                |
|     pipeline multiple rewriting rules to alter    |
|             the values of the fields              |
+---------------------------------------------------+
                     v v v v v v
+---------------------------------------------------+
|                   FILTER FIELDS                   |
|   use include and exclude patterns on the field   |
| names, to select which fields are sent to journal |
+---------------------------------------------------+
                     v v v v v v
+---------------------------------------------------+
|                      OUTPUT                       |
|          generate Journal Export Format           |
+---------------------------------------------------+

--------------------------------------------------------------------------------
JOURNAL FIELDS RULES (enforced by systemd-journald)

- field names can be up to 64 characters
- the only allowed field characters are A-Z, 0-9 and underscore
- the first character of fields cannot be a digit
- protected journal fields start with underscore:
* they are accepted by systemd-journal-remote
* they are NOT accepted by a local systemd-journald

For best results, always include these fields:

MESSAGE=TEXT
The MESSAGE is the body of the log entry.
This field is what we usually see in our logs.

PRIORITY=NUMBER
PRIORITY sets the severity of the log entry.
0=emerg, 1=alert, 2=crit, 3=err, 4=warn, 5=notice, 6=info, 7=debug
- Emergency events (0) are usually broadcast to all terminals.
- Emergency, alert, critical, and error (0-3) are usually colored red.
- Warning (4) entries are usually colored yellow.
- Notice (5) entries are usually bold or have a brighter white color.
- Info (6) entries are the default.
- Debug (7) entries are usually grayed or dimmed.

SYSLOG_IDENTIFIER=NAME
SYSLOG_IDENTIFIER sets the name of the application.
Use something descriptive, like: SYSLOG_IDENTIFIER=nginx-logs

You can find the most common fields at 'man systemd.journal-fields'.

log2journal supports YAML configuration files, like the ones found in this directory.

systemd-cat-native options

Read the manual of systemd-cat-native.

