Skip to content

Issue29 cache loading by GUID #73

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

weinshec
Copy link
Collaborator

Implement cache details loading by GUID according to issue #29

  • Add method to retrieve a caches GUID
  • Add method to load all cache details from its print-page (with a few exceptions of properties that are not listed there)
  • Add id_ argument to Waypoints.from_html() allowing to specify the html-id of the waypoint table to parse

Christoph Weinsheimer added 2 commits October 1, 2016 12:11
+ Collect all hardcoded urls used in Cache class in dictionary
+ Add private method to load a caches GUID for latter usage when loading
  the cache details via `load_by_guid()`
+ Add unittests for `load_by_guid()` and ensure that `load()` and
  `load_quick()` are not called
+ Parse all available cache details from its "print-page".
+ The following list of properties is excluded, since they are not
  contained in the "print-page":
  - original_location
  - state
  - found
  - pm_only
+ Add optional 'id' argument to Waypoints.from_html() classmethod to
  specify the html of the table to parse.
@coveralls
Copy link

coveralls commented Oct 26, 2016

Coverage Status

Coverage increased (+0.4%) to 96.11% when pulling 31f68aa on weinshec:issue29_cache_loading_performance into dcb993f on tomasbedrich:master.

Copy link
Owner

@tomasbedrich tomasbedrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BIG thank you for your PR. There are some more changes to do, but you have done a lot of work. 👏


self.waypoints = Waypoint.from_html(content, "Waypoints")

def _load_guid(self):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please drop this method and use load_quick instead of it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't using load_quick spoil the whole point of load_by_guid since most of the attributes are already contained there? I introduced this method, cause I thought it might be easier when you are going to refactor the code.

Should I remove it anyway?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal is to save a bandwidth. When I already do an request to self._urls["tiles_server"] (either by calling load_quick() or _load_guid()), I see no reason, why not to parse all the data got in a response. To prevent it to be messy, I wanted you to write a simple comment inside the load_by_guid() method.

You are right, that the loading code is somehow redundant now, but this is intentional – everyone can have a different starting info about a cache (WP or GUID) and can end up with different result, depending on his needs and loading method used.

So please either remove the method anyway / or, if you prefer, you can "wrap" the load_quick() method to the _load_guid() to have it more clear. Somehow like this:

def load_by_guid(self):
    ...
    _load_guid()
    ...

def _load_guid(self):
    """COMMENT WHY"""
    return load_quick(self)

Copy link
Collaborator Author

@weinshec weinshec Dec 11, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking different starting points (WP or GUID) into account is actually a good point. So I moved GUID loading to load_quick() and removed the redundant _load_guid() then. Now its possible to e.g. set the guid (if already known) in Cache ctor and call load_by_guid() (see according test case). Sorry that I missed that point in the first place :)

msg = res["msg"] if "msg" in res else "Unknown error (probably not existing cache)"
raise errors.LoadError("Cache {} cannot be loaded: {}".format(self, msg))

self.guid = res["data"][0]["g"]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you just move this line to load_quick, it will have the same effect in the end. 😉

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment above

:raise .PMOnlyException: If the PM only warning is shown on the page
"""
if not hasattr(self, "guid"):
self._load_guid()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change here – please see the note for line 752 + please add a comment on this line, why using load_quick.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment below

self._load_guid()
res = self.geocaching._request(self._urls["print_page"],
params={"guid": self.guid})
if res.find("p", {"class": "Warning"}) is not None:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please be consistent with the rest of the library and use find("p", "Warning") for finding an element by a class. The same note applies multiple times below.

raise errors.PMOnlyException()
content = res.find(id="Content")

self.name = content.find("h2").get_text()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please be consistent and use .text attribute.

self.location = Point.from_string(
content.find("p", {"class": "LatLong Meta"}).text)

type_img = os.path.basename(content.find("h2").img.get("src"))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please be consistent and use ["src"] for HTML attribute access. Also please use find("img") for traversing the DOM tree.

BTW, good point with the os.path.basename! Can you please create a separate PR for cleaning up <img src="..."> parsing like this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the src attribute of img tags seems to be always accessed via get("src") all over the code. Do you want me to change them all (even outside load_by_guid)?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! So there is an inconsistency in accessing the HTML attributes, because my previsous comment was based on this line.

I would be glad, if you could pick one way (whatever you prefer) of handling attributes and unite it all over the code – maybe in a separate PR to keep this focused.

"p", text=re.compile("Placed by:")).text.split("\r\n")[2].strip()

self.hidden = content.find(
"p", text=re.compile("Placed Date:")).text.split(": ")[1]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make this a little bit more solid. This can easily break, if some user sets a date format containing :.


attr_img = content.find_all("img", src=re.compile("\/attributes\/"))
attributes_raw = [
_.get("src").split("/")[-1].rsplit("-", 1) for _ in attr_img
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use os.path.basepath like you already did. 😉

self.description = content.find(
"h2", text="Long Description").find_next("div").text

self.hint = rot13(content.find(id="uxDecryptedHint").text)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use uxEncryptedHint without rot13.

@@ -831,10 +918,14 @@ def __init__(self, id=None, type=None, location=None, note=None):
self._note = note

@classmethod
def from_html(cls, root):
"""Return a dictionary of all waypoints found in the page representation"""
def from_html(cls, root, id_="ctl00_ContentBody_Waypoints"):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the default value of id_ and propagate it to the method call (line 650). Also maybe rename the variables root and id somehow more meaningful...

@coveralls
Copy link

coveralls commented Nov 9, 2016

Coverage Status

Coverage increased (+0.1%) to 95.844% when pulling 185cd38 on weinshec:issue29_cache_loading_performance into dcb993f on tomasbedrich:master.

+ GUID is now set by load_quick(), since the _load_guid() and
  load_quick() requested the same page anyway
+ Add property getter/setter for GUID with regex validator
@coveralls
Copy link

coveralls commented Dec 11, 2016

Coverage Status

Coverage increased (+0.05%) to 95.763% when pulling d5edee2 on weinshec:issue29_cache_loading_performance into dcb993f on tomasbedrich:master.

Copy link
Owner

@tomasbedrich tomasbedrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last change before merging - just the comment. ;)

return self._guid

@guid.setter
def guid(self, guid):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for validating a guid

@@ -697,7 +714,7 @@ def load_by_guid(self):
:raise .PMOnlyException: If the PM only warning is shown on the page
"""
if not hasattr(self, "guid"):
self._load_guid()
self.load_quick()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a brief comment why.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure :)

+ `Cache.guid` now return `None` in case it has not yet been set
+ Add test that checks `load_quick()` is called in case guid is `None`
+ Add comment to `load_by_guid` when calling `load_quick`
@coveralls
Copy link

coveralls commented Dec 12, 2016

Coverage Status

Coverage increased (+0.1%) to 95.852% when pulling d922e3f on weinshec:issue29_cache_loading_performance into dcb993f on tomasbedrich:master.

Copy link
Owner

@tomasbedrich tomasbedrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. :)

@tomasbedrich tomasbedrich merged commit 3d3504a into tomasbedrich:master Dec 12, 2016
@weinshec weinshec deleted the issue29_cache_loading_performance branch December 13, 2016 12:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants