-
Notifications
You must be signed in to change notification settings - Fork 46
Issue29 cache loading by GUID #73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue29 cache loading by GUID #73
Conversation
+ Collect all hardcoded urls used in Cache class in dictionary + Add private method to load a caches GUID for latter usage when loading the cache details via `load_by_guid()` + Add unittests for `load_by_guid()` and ensure that `load()` and `load_quick()` are not called
+ Parse all available cache details from its "print-page". + The following list of properties is excluded, since they are not contained in the "print-page": - original_location - state - found - pm_only + Add optional 'id' argument to Waypoints.from_html() classmethod to specify the html of the table to parse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BIG thank you for your PR. There are some more changes to do, but you have done a lot of work. 👏
|
||
self.waypoints = Waypoint.from_html(content, "Waypoints") | ||
|
||
def _load_guid(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please drop this method and use load_quick
instead of it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't using load_quick
spoil the whole point of load_by_guid
since most of the attributes are already contained there? I introduced this method, cause I thought it might be easier when you are going to refactor the code.
Should I remove it anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal is to save a bandwidth. When I already do an request to self._urls["tiles_server"]
(either by calling load_quick()
or _load_guid()
), I see no reason, why not to parse all the data got in a response. To prevent it to be messy, I wanted you to write a simple comment inside the load_by_guid()
method.
You are right, that the loading code is somehow redundant now, but this is intentional – everyone can have a different starting info about a cache (WP or GUID) and can end up with different result, depending on his needs and loading method used.
So please either remove the method anyway / or, if you prefer, you can "wrap" the load_quick()
method to the _load_guid()
to have it more clear. Somehow like this:
def load_by_guid(self):
...
_load_guid()
...
def _load_guid(self):
"""COMMENT WHY"""
return load_quick(self)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking different starting points (WP or GUID) into account is actually a good point. So I moved GUID loading to load_quick()
and removed the redundant _load_guid()
then. Now its possible to e.g. set the guid (if already known) in Cache
ctor and call load_by_guid()
(see according test case). Sorry that I missed that point in the first place :)
msg = res["msg"] if "msg" in res else "Unknown error (probably not existing cache)" | ||
raise errors.LoadError("Cache {} cannot be loaded: {}".format(self, msg)) | ||
|
||
self.guid = res["data"][0]["g"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you just move this line to load_quick
, it will have the same effect in the end. 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comment above
:raise .PMOnlyException: If the PM only warning is shown on the page | ||
""" | ||
if not hasattr(self, "guid"): | ||
self._load_guid() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change here – please see the note for line 752 + please add a comment on this line, why using load_quick
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comment below
self._load_guid() | ||
res = self.geocaching._request(self._urls["print_page"], | ||
params={"guid": self.guid}) | ||
if res.find("p", {"class": "Warning"}) is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please be consistent with the rest of the library and use find("p", "Warning")
for finding an element by a class. The same note applies multiple times below.
raise errors.PMOnlyException() | ||
content = res.find(id="Content") | ||
|
||
self.name = content.find("h2").get_text() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please be consistent and use .text
attribute.
self.location = Point.from_string( | ||
content.find("p", {"class": "LatLong Meta"}).text) | ||
|
||
type_img = os.path.basename(content.find("h2").img.get("src")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please be consistent and use ["src"]
for HTML attribute access. Also please use find("img")
for traversing the DOM tree.
BTW, good point with the os.path.basename
! Can you please create a separate PR for cleaning up <img src="...">
parsing like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually the src attribute of img
tags seems to be always accessed via get("src")
all over the code. Do you want me to change them all (even outside load_by_guid
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! So there is an inconsistency in accessing the HTML attributes, because my previsous comment was based on this line.
I would be glad, if you could pick one way (whatever you prefer) of handling attributes and unite it all over the code – maybe in a separate PR to keep this focused.
"p", text=re.compile("Placed by:")).text.split("\r\n")[2].strip() | ||
|
||
self.hidden = content.find( | ||
"p", text=re.compile("Placed Date:")).text.split(": ")[1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make this a little bit more solid. This can easily break, if some user sets a date format containing :
.
|
||
attr_img = content.find_all("img", src=re.compile("\/attributes\/")) | ||
attributes_raw = [ | ||
_.get("src").split("/")[-1].rsplit("-", 1) for _ in attr_img |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use os.path.basepath
like you already did. 😉
self.description = content.find( | ||
"h2", text="Long Description").find_next("div").text | ||
|
||
self.hint = rot13(content.find(id="uxDecryptedHint").text) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use uxEncryptedHint
without rot13
.
@@ -831,10 +918,14 @@ def __init__(self, id=None, type=None, location=None, note=None): | |||
self._note = note | |||
|
|||
@classmethod | |||
def from_html(cls, root): | |||
"""Return a dictionary of all waypoints found in the page representation""" | |||
def from_html(cls, root, id_="ctl00_ContentBody_Waypoints"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the default value of id_
and propagate it to the method call (line 650). Also maybe rename the variables root
and id
somehow more meaningful...
+ GUID is now set by load_quick(), since the _load_guid() and load_quick() requested the same page anyway + Add property getter/setter for GUID with regex validator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last change before merging - just the comment. ;)
return self._guid | ||
|
||
@guid.setter | ||
def guid(self, guid): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for validating a guid
@@ -697,7 +714,7 @@ def load_by_guid(self): | |||
:raise .PMOnlyException: If the PM only warning is shown on the page | |||
""" | |||
if not hasattr(self, "guid"): | |||
self._load_guid() | |||
self.load_quick() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a brief comment why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure :)
+ `Cache.guid` now return `None` in case it has not yet been set + Add test that checks `load_quick()` is called in case guid is `None` + Add comment to `load_by_guid` when calling `load_quick`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. :)
Implement cache details loading by GUID according to issue #29
id_
argument toWaypoints.from_html()
allowing to specify the html-id of the waypoint table to parse