Skip to content

Collection.from_dict fails with type error if spatial extent is explicitly null #1551

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
philvarner opened this issue Apr 19, 2025 · 7 comments

Comments

@philvarner
Copy link
Collaborator

philvarner commented Apr 19, 2025

This code (being used for the pystac-client Quickstart):

from pystac_client import Client
from pystac import Collection
import json



client = Client.open("https://emc.spacebel.be")

collection_search = client.collection_search(
        q='SEALEVEL_EUR_PHY_L3_NRT_008_059',
    )
print(f"{collection_search.matched()} collections found")

for collection in collection_search.collections_as_dicts():
        print(collection.get("id"))
        print(json.dumps(collection.get("extent")))
        Collection.from_dict(collection)

This fails because the spatial extent of the collection is explicitly null. This collection advertises it is stac_version 1.0.0. Null extent is invalid wrt the spec, but the tooling should be more permissive it what it will process.

The extent is:

{
  "spatial": null,
  "temporal": {
    "interval": [
      [
        "2022-01-01T00:00:00.000Z",
        null
      ]
    ]
  }
}

Exception is:

TypeError                                 Traceback (most recent call last)
Cell In[12], line 10
      5 collection_search = client.collection_search(
      6         q='Sentinel-6',
      7     )
      8 print(f"{collection_search.matched()} collections found")
---> 10 for collection in collection_search.collections():
     11         print(collection.id)

File ~/code/stac-utils/pystac-client/pystac_client/collection_search.py:395, in CollectionSearch.collections(self)
    387 """Iterator that yields :class:`pystac.Collection` instances for each collection
    388 matching the given search parameters.
    389 
    390 Yields:
    391     Collection : each Collection matching the search criteria
    392 """
    393 for collection in self.collections_as_dicts():
    394     # already signed in collections_as_dicts
--> 395     yield Collection.from_dict(
    396         collection, root=self.client, preserve_dict=False
    397     )

File ~/code/stac-utils/pystac-client/.venv/lib/python3.12/site-packages/pystac/collection.py:654, in Collection.from_dict(cls, d, href, root, migrate, preserve_dict)
    652 description = d.pop("description")
    653 license = d.pop("license")
--> 654 extent = Extent.from_dict(d.pop("extent"))
    655 title = d.pop("title", None)
    656 stac_extensions = d.pop("stac_extensions", None)

File ~/code/stac-utils/pystac-client/.venv/lib/python3.12/site-packages/pystac/collection.py:366, in Extent.from_dict(d)
    358 @staticmethod
    359 def from_dict(d: dict[str, Any]) -> Extent:
    360     """Constructs an Extent from a dict.
    361 
    362     Returns:
    363         Extent: The Extent deserialized from the JSON dict.
    364     """
    365     return Extent(
--> 366         spatial=SpatialExtent.from_dict(d["spatial"]),
    367         temporal=TemporalExtent.from_dict(d["temporal"]),
    368         extra_fields={
    369             k: v for k, v in d.items() if k not in {"spatial", "temporal"}
    370         },
    371     )

File ~/code/stac-utils/pystac-client/.venv/lib/python3.12/site-packages/pystac/collection.py:118, in SpatialExtent.from_dict(d)
    110 @staticmethod
    111 def from_dict(d: dict[str, Any]) -> SpatialExtent:
    112     """Constructs a SpatialExtent from a dict.
    113 
    114     Returns:
    115         SpatialExtent: The SpatialExtent deserialized from the JSON dict.
    116     """
    117     return SpatialExtent(
--> 118         bboxes=d["bbox"], extra_fields={k: v for k, v in d.items() if k != "bbox"}
    119     )

TypeError: 'NoneType' object is not subscriptable

I think it would be reasonable (though not desired) to fail to deserialize invalid items, but with an explicit error as to why it could not be deserialized. In this case, it's just an accidental TypeError from trying to subscript None.

@philvarner philvarner changed the title error populating Collection with spatial extent from JSON Collection.from_dict fails with type error if spatial extent is explicitly null Apr 19, 2025
@gadomski
Copy link
Member

I think it would be reasonable (though not desired) to fail to deserialize invalid items

I'd almost go as far to say that we should deserialize with a warning and a default extent ... Postel's law, eh?

@philvarner
Copy link
Collaborator Author

philvarner commented Apr 20, 2025 via email

@gadomski
Copy link
Member

I don’t think we can set a default extent - I think it needs to be null

Wouldn't global bounds (-180, -90, 180, 90) be a reasonable "default", with a warning?

I think the larger issue is that we need two different models - a lax one
for reading and a strict one for writing.

I think that's one approach ... the other is to just have asymmetric serialization and deserilization, where deserialization is much more permissive. This relates to #1092 ... to me (other opinions are of course available) pystac gets more value out of not being tied to pydantic, but rather by hand-crafting that permissive deserialization to match real-world patterns/data.

@philvarner
Copy link
Collaborator Author

If it's going to have a value, then global extent sounds reasonable. But maybe changing this to allow null is better, if the ser/deser are asymmetric?

I think the underlying problem is that we shouldn't be using pystac for construction of stac items from other metdata, and there needs to be another package for that. Fewer and only more "advanced" users are likely to use this, so it doesn't have to be as easy to work with (easy != simple). The types and validation should be such that you can only construct a valid STAC Item. Using pydantic and tightening up the validation is one way, Builder pattern is another, Pyrsistent with evolvers is another.

@gadomski
Copy link
Member

we shouldn't be using pystac for construction of stac items from other metdata

🤔 why not? To me, pystac seems like a great place to store all of the weird heuristics that you might need to do to create valid STAC from messy real-world data.

The types and validation should be such that you can only construct a valid STAC Item.

I think that's the idea behind "permissive" deserialization ... even if you give it stuff that would be invalid STAC, pystac will make it valid for you.

@philvarner
Copy link
Collaborator Author

The main thing I had in mind with that comment is that you can construct invalid items with it. This would force the user to sort out the messiness before constructing the Item. Maybe a combination of using validate (with #1552 ) would improve this situation, and other semantic validation that can't be expressed well in schema (e.g., no antimeridian crossing, all bbox and geometry points are within 180/90 bounds).

I don't like the idea that you could provide invalid values and pystac will make it valid, for the same reason that I don't like languages like JavaScript that have such weak typing. It allows the user to be sloppy and get some answer, which may not be the one they actually intended. Maybe this is another tool, but I think pystac should be explicit and do what you tell it to.

@gadomski
Copy link
Member

gadomski commented Apr 21, 2025

I think pystac should be explicit and do what you tell it to.

I take your point ... this is some built-in messiness with pystac where we're trying to please three different user groups (data producers, "developers", and data consumers).

If we were to split things up, I'd want to target those groups specifically, rather than starting w/ technologies. Right now, my thinking goes like:

  • Data producers: maybe you just want TypedDict? E.g. rustac-py
  • Developers: pydantic all the way, since you're quite often building APIs
  • Data consumers: pystac (more-or-less in its current form), where you can accept almost anything, it comes with lots of "magic" functionality, and you can integrate with other libraries

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants