ESQL: Skip LOOKUP JOIN when join key is missing from index #125577

alex-spies · 2025-03-25T10:51:24Z

ReplaceMissingFieldWithNull will check if the indices of a data node even have a given field, and if not, will define a literal NULL in its place to avoid extracting null blocks.

When the join key for a LOOKUP JOIN is missing on the data node, we could optimize away the whole join rather than extracting null blocks and then performing lookups with them.

Or, at the very least, we should check that the current implementation is sufficiently cheap. I think it likely is not in cases where we fan out to lots of data nodes, many of which simply have no values for the join key, as in FROM * | LOOKUP JOIN lu_idx ON some_rather_rare_field | SORT whatever_field.
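As a rough illustration of the proposed optimization, here is a minimal Python sketch of a node-local rewrite. It is a toy model, not Elasticsearch code: `LookupJoin`, `EvalNulls`, `optimize_for_node`, and the lookup field names `lu_a` and `lu_b` are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class LookupJoin:
    """Toy stand-in for a LOOKUP JOIN plan node (not an Elasticsearch class)."""
    lookup_index: str
    join_key: str
    lookup_fields: Tuple[str, ...]  # fields the lookup index would add


@dataclass(frozen=True)
class EvalNulls:
    """Toy stand-in for `EVAL <fields> = NULL`."""
    fields: Tuple[str, ...]


def optimize_for_node(
    join: LookupJoin, node_fields: FrozenSet[str]
) -> Union[LookupJoin, EvalNulls]:
    """Drop the join on a data node whose indices lack the join key."""
    if join.join_key in node_fields:
        return join  # key exists locally: the lookup has to run
    # Every key value is null on this node, so a left join matches nothing
    # and all lookup fields come back null; no lookup needs to be performed.
    return EvalNulls(join.lookup_fields)


# Hypothetical field names, for illustration only.
join = LookupJoin("lu_idx", "some_rather_rare_field", ("lu_a", "lu_b"))
plan_without_key = optimize_for_node(join, frozenset({"whatever_field"}))
plan_with_key = optimize_for_node(
    join, frozenset({"whatever_field", "some_rather_rare_field"})
)
```

On a node without the key the sketch returns the null-assigning node; on a node that has the key the join is left untouched.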


elasticsearchmachine · 2025-03-25T10:51:48Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

costin · 2025-04-23T00:35:54Z

The lookup can be completely reworked into a simple projection (due to being a left join) that references the left and right fields. The right side would have had the fields already added as null, so:

FROM * | LOOKUP JOIN lu_idx ON some_rather_rare_field
// becomes
FROM * | KEEP <left-fields>, <lu_idx fields>
// which would be further optimized (if the stats for some_rather_rare_field are properly read) to
FROM * | EVAL <lu_idx fields> = NULL | KEEP <left-fields>, <lu_idx fields>

One potential complication is in returning data from the optimized data node in a format suitable for the consumer on the coordinator.

P.S.
Ideally we won't broadcast the call at all; however, that requires some extra infrastructure that we don't have yet.
Another approach would be to look at the intersection of the keys first before deciding on the execution strategy - no keys is an extreme variant of that.

alex-spies · 2025-04-23T09:56:47Z

> The lookup can be completely reworked into a simple projection

This is only true if the names on the left/right hand sides are qualified (once we have qualifiers) due to name conflict handling. (There is always at least 1 name conflict, by design, specified via the ON clause.) Right now, the projection has to be baked into the lookup join because duplicate names are forbidden in plans' outputs.
However, once we have qualifiers, you are correct that it may become more convenient to refactor our planning such that LOOKUP JOIN ... ON field is represented as a LOOKUP JOIN ... AS right ON field == right.field | KEEP ...

> FROM * | LOOKUP JOIN lu_idx ON some_rather_rare_field becomes FROM * | EVAL <lu_idx fields> = NULL | KEEP <left-fields>, <lu_idx fields>

That's what I had in mind. The last KEEP is likely not even needed due to LOOKUP JOIN ... ON field working just like EVAL in terms of shadowing.
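To make the shadowing argument concrete, here is a small Python sketch comparing the output columns of the two forms. It is a simplified model under stated assumptions: lookup fields override same-named left fields, the join key is kept once from the left side, and overriding columns are appended at the end. The function names and the field names `id`, `name`, and `region` are hypothetical.

```python
from typing import List, Sequence


def columns_after_lookup_join(
    left: Sequence[str], lookup: Sequence[str], join_key: str
) -> List[str]:
    """Output columns of the join: lookup fields shadow same-named left fields."""
    # Assumption: the join key is kept once, from the left side.
    added = [f for f in lookup if f != join_key]
    return [f for f in left if f not in added] + added


def columns_after_eval(left: Sequence[str], assigned: Sequence[str]) -> List[str]:
    """Output columns of EVAL: assigned names replace existing ones and go last."""
    return [f for f in left if f not in assigned] + list(assigned)


# Hypothetical schemas; "name" exists on both sides and is shadowed.
left = ["id", "name", "some_rather_rare_field"]
lookup = ["some_rather_rare_field", "name", "region"]

join_cols = columns_after_lookup_join(left, lookup, "some_rather_rare_field")
eval_cols = columns_after_eval(left, ["name", "region"])
```

Under these assumptions both forms yield the same column list, which is why the trailing KEEP would add nothing.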

> Ideally we won't broadcast the call at all

Not sure that's true in general, because the LOOKUP JOIN is not the only work that's being done on the data nodes. It's true if the query has a WHERE command that can only be true if a lookup field is non-null, though.

> Another approach would be to look at the intersection of the keys first before deciding on the execution strategy - no keys is an extreme variant of that.

Yep, there's some optimization potential here when we run LOOKUP JOIN ... ON field and we know that the values for field in the current shard can't have any matches at all with the lookup index. Unfortunately, the lookup index is non-local but maybe it can still be cheap enough to figure this out, or we can make it cheap enough via e.g. caching of metadata.
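One way such a check could look, sketched in Python: compare a per-shard min/max of the join key against a cached min/max of the lookup index keys. This is purely illustrative; `KeyRange` and `may_match` are hypothetical, integer keys are assumed, and nothing here reflects metadata that Elasticsearch is known to expose for this purpose.

```python
from typing import NamedTuple, Optional


class KeyRange(NamedTuple):
    """Inclusive min/max of the join key values on one side (hypothetical metadata)."""
    lo: int
    hi: int


def may_match(shard: Optional[KeyRange], lookup: Optional[KeyRange]) -> bool:
    """Return False only when the two sides provably share no key value.

    None means that side has no values for the key at all, which is the
    "no keys" extreme mentioned above.
    """
    if shard is None or lookup is None:
        return False
    # Overlapping ranges are necessary for a match but not sufficient,
    # so True only means "the lookup still has to run".
    return shard.lo <= lookup.hi and lookup.lo <= shard.hi


lookup_range = KeyRange(100, 200)  # would come from cached lookup-index metadata
no_keys = may_match(None, lookup_range)
disjoint = may_match(KeyRange(1, 50), lookup_range)
overlapping = may_match(KeyRange(150, 300), lookup_range)
```

The check is deliberately conservative: it can only rule matches out, never confirm them, so a wrong "skip" decision is not possible as long as the cached range is accurate.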

alex-spies added the :Analytics/ES|QL and >enhancement labels Mar 25, 2025

elasticsearchmachine added the Team:Analytics label Mar 25, 2025

alex-spies mentioned this issue Mar 25, 2025

[ES|QL] Assign new id to alias created by ReplaceMissingFieldWithNull when there is lookup join #125462 (Closed)

wchaparro assigned costin Apr 22, 2025

costin assigned bpintea and unassigned costin Apr 23, 2025

This was referenced Apr 24, 2025

ESQL: Have CombineProjections propagate references upwards #127264 (Merged)

ESQL: Add optimization to purge join on null merge key #127583 (Open)
