Skip to content

Performance improvements #65

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions logstash_async/constants.py
Original file line number Diff line number Diff line change
Expand Up @@ -32,11 +32,11 @@ class Constants:
# Usually this list does not need to be modified. Add/Remove elements to
# exclude/include them in the Logstash event, for the full list see:
# http://docs.python.org/library/logging.html#logrecord-attributes
FORMATTER_RECORD_FIELD_SKIP_LIST = [
FORMATTER_RECORD_FIELD_SKIP_LIST = {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I wasn't aware there is such a difference on lookup performance between lists and sets.

It would be even better if we keep the constant a list here and convert it to a set later in the Formatter class, probably best in its __init__(). Then people can still append/remove items as they wish. Even if named constants, they are meant to be customised if necessary.
A set is also mutable but with a different API and so users had to adjust their code.

I think converting the list to a set later when creating a Formatter instance, should be ok as this happens only once (or only a few times if the formatter is configured multiple times or on forking or so but still not as often as log records are to be formatted).

'args', 'asctime', 'created', 'exc_info', 'exc_text', 'filename',
'funcName', 'id', 'levelname', 'levelno', 'lineno', 'module',
'msecs', 'msg', 'name', 'pathname', 'process',
'processName', 'relativeCreated', 'stack_info', 'thread', 'threadName']
'processName', 'relativeCreated', 'stack_info', 'thread', 'threadName'}
# fields to be set on the top-level of a Logstash event/message, do not modify this
# unless you know what you are doing
FORMATTER_LOGSTASH_MESSAGE_FIELD_LIST = [
Expand Down
4 changes: 3 additions & 1 deletion logstash_async/formatter.py
Original file line number Diff line number Diff line change
Expand Up @@ -133,7 +133,9 @@ def _format_timestamp(self, time_):

# ----------------------------------------------------------------------
def _get_record_fields(self, record):
return {k: self._value_repr(v) for k, v in record.__dict__.items()}
return {k: self._value_repr(v)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!
Since we removed now all unwanted fields from the record, we could skip trying to remove them again in _remove_excluded_fields by only passing extra_fields to the method.
Or maybe we incoporate the field removing code into _get_extra_fields as it remains the only place where the filtering is necessary yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I thought of that. However, subclasses may depend on _remove_excluded_fields to be called.

Another optimization would be to fold the _get_record_fields, _get_extra_fields and _remove_excluded_fields into one method: It iterates over record.__dict__ and builds a dict that can be message.updated in the format method. This would eliminate some iterations, checks and eventually del calls as well.
However, this would be a larger break in API and behaviour.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point about API breakage.

Maybe it's worth for these changes to bump to 3.0 and announce the API changes, so people will notice.

for k, v in record.__dict__.items()
if k not in constants.FORMATTER_RECORD_FIELD_SKIP_LIST}

# ----------------------------------------------------------------------
def _value_repr(self, value):
Expand Down
9 changes: 9 additions & 0 deletions tests/formatter_test.py
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,15 @@ def test_format(self):

self.assertIsNone(file_handler.exception)

def test_fields_are_excluded_in_get_record_fields(self):
formatter = LogstashFormatter()
log_record = makeLogRecord({
'filename': 'foo.py',
'dummy': 'foobar'
})
self.assertDictEqual(formatter._get_record_fields(log_record), {
'dummy': 'foobar'
})

if __name__ == "__main__":
unittest.main()