Performance improvements #65

ercpe · 2021-05-27T04:21:55Z

In my hunt for a bottleneck in one of our applications, I've found another few hotspots. Not that it was slow before, but when you push a few thousand messages per minute, some things just pop up.

This PR contains two small but measurable improvements. My approach was a simple timeit script, run standalone and with a profiler:

    result = timeit.timeit(stmt="""formatter.format(lr)""",
                           setup="""
from logstash_async.formatter import LogstashFormatter
from logging import LogRecord, Logger
import datetime
import logging

now = datetime.datetime.now()
formatter = LogstashFormatter()
lr = Logger('dummy-logger').makeRecord(
    'Some name', logging.ERROR, 'blubb.py', 123, 'Some message',
    args=(), exc_info=None,
    extra={'foo': 'bar', 'baz': 123, 'somedatetime': now})
""",
                           number=100000)
    print(result)

With the current 2.3.0, this script completed (on my machine) in around 2.6 seconds.

The first change was to change FORMATTER_RECORD_FIELD_SKIP_LIST to a set: this list is used in the fashion of x in y which is way faster for sets than list. This brought the execution time down to 2.2 seconds.

The other change was to exclude fields from FORMATTER_RECORD_FIELD_SKIP_LIST in _get_record_fields: The formatter pulled all fields from the message and the extra into the dict and removed the fields afterwards in _remove_excluded_fields. Since we're iterating over __dict__.items() in _get_record_fields anyway, we can skip the fields right there and save us the del later on.
With both changes, I'm down to 1.7 seconds.

I'm working on another improvement involving the datetime handling, but that will come in another PR as it involves a little bit more code.

The `FORMATTER_RECORD_FIELD_SKIP_LIST` is used heavily to remove fields from the resulting document in the fashion `x in FORMATTER_RECORD_FIELD_SKIP_LIST`. This operation is `O(n)` for lists, but `O(1)` to `O(n)` for sets.

The `_get_record_fields` method extracts and formats the items of the `LogRecord` object into the message dict. After collecting the static extra fields and adding the per-record extra fields, `_remove_excluded_fields` is called to remove any keys we don't want. `_remove_excluded_fields` is called for the two `dict`s `message` and `extra_fields` and removes items if the key is present in `FORMATTER_RECORD_FIELD_SKIP_LIST`. We can improve this by not adding those fields in the first place since we're iterating over `record.__dict__` anyways.

eht16 · 2021-05-30T08:42:09Z

logstash_async/constants.py

@@ -32,11 +32,11 @@ class Constants:
    # Usually this list does not need to be modified. Add/Remove elements to
    # exclude/include them in the Logstash event, for the full list see:
    # http://docs.python.org/library/logging.html#logrecord-attributes
-    FORMATTER_RECORD_FIELD_SKIP_LIST = [
+    FORMATTER_RECORD_FIELD_SKIP_LIST = {


Cool, I wasn't aware there is such a difference on lookup performance between lists and sets.

It would be even better if we keep the constant a list here and convert it to a set later in the Formatter class, probably best in its __init__(). Then people can still append/remove items as they wish. Even if named constants, they are meant to be customised if necessary.
A set is also mutable but with a different API and so users had to adjust their code.

I think converting the list to a set later when creating a Formatter instance, should be ok as this happens only once (or only a few times if the formatter is configured multiple times or on forking or so but still not as often as log records are to be formatted).

eht16 · 2021-05-30T08:50:04Z

logstash_async/formatter.py

@@ -133,7 +133,9 @@ def _format_timestamp(self, time_):

    # ----------------------------------------------------------------------
    def _get_record_fields(self, record):
-        return {k: self._value_repr(v) for k, v in record.__dict__.items()}
+        return {k: self._value_repr(v)


Nice!
Since we removed now all unwanted fields from the record, we could skip trying to remove them again in _remove_excluded_fields by only passing extra_fields to the method.
Or maybe we incoporate the field removing code into _get_extra_fields as it remains the only place where the filtering is necessary yet.

Yes, I thought of that. However, subclasses may depend on _remove_excluded_fields to be called.

Another optimization would be to fold the _get_record_fields, _get_extra_fields and _remove_excluded_fields into one method: It iterates over record.__dict__ and builds a dict that can be message.updated in the format method. This would eliminate some iterations, checks and eventually del calls as well.
However, this would be a larger break in API and behaviour.

Good point about API breakage.

Maybe it's worth for these changes to bump to 3.0 and announce the API changes, so people will notice.

Johann Schmitz added 2 commits May 25, 2021 06:28

Convert FORMATTER_RECORD_FIELD_SKIP_LIST to set

333785b

The `FORMATTER_RECORD_FIELD_SKIP_LIST` is used heavily to remove fields from the resulting document in the fashion `x in FORMATTER_RECORD_FIELD_SKIP_LIST`. This operation is `O(n)` for lists, but `O(1)` to `O(n)` for sets.

eht16 reviewed May 30, 2021

View reviewed changes

oschulzSam added a commit to SamhammerAG/python-logstash-async that referenced this pull request Jul 15, 2021

KIT-1010 Performance improvements eht16#65

5249d59

gkreuzer pushed a commit to SamhammerAG/python-logstash-async that referenced this pull request Jul 15, 2021

KIT-1010 Performance improvements eht16#65

8e6dcae

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance improvements #65

Performance improvements #65

ercpe commented May 27, 2021

eht16 May 30, 2021

eht16 May 30, 2021

ercpe May 30, 2021

eht16 May 31, 2021

Performance improvements #65

Are you sure you want to change the base?

Performance improvements #65

Conversation

ercpe commented May 27, 2021

eht16 May 30, 2021

Choose a reason for hiding this comment

eht16 May 30, 2021

Choose a reason for hiding this comment

ercpe May 30, 2021

Choose a reason for hiding this comment

eht16 May 31, 2021

Choose a reason for hiding this comment