General Strategy:

  1. Start from a seed page (http://wordpress.org/extend/themes/).
  2. Visit every theme page linked from that seed page (e.g., http://wordpress.org/extend/themes/twentytwelve).
  3. From each theme page, download the zip file.
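Going from step 1 to step 2 means pulling theme page links out of the seed page's HTML. Below is a minimal standard-library sketch of themeListPageToListOfThemeURLs. It assumes, hypothetically, that a theme page sits exactly one path segment below /extend/themes/ on the same host; the real listing markup may differ, and fetching the HTML is left to the caller.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

SEED_URL = "http://wordpress.org/extend/themes/"


class _LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def themeListPageToListOfThemeURLs(html, base_url=SEED_URL):
    """Return the unique theme page URLs linked from a theme listing page.

    Assumption (not from the original notes): a theme page lives exactly
    one path segment below /extend/themes/ on the same host.
    """
    collector = _LinkCollector()
    collector.feed(html)
    base_host = urlparse(base_url).netloc
    urls = []
    for href in collector.hrefs:
        parsed = urlparse(urljoin(base_url, href))
        path = parsed.path.rstrip("/")
        parts = path.split("/")  # e.g. ['', 'extend', 'themes', 'twentytwelve']
        if parsed.netloc != base_host or len(parts) != 4 or parts[1:3] != ["extend", "themes"]:
            continue
        # Drop query string, fragment and trailing slash so duplicates collapse.
        url = f"{parsed.scheme}://{parsed.netloc}{path}"
        if url not in urls:
            urls.append(url)
    return urls
```

Deduplicating on a normalized URL means a theme linked several times on the listing page is only visited once.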

Create a function for each purpose:

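  • createThemeDir(): create the theme data directory that the zip files go into.
  • themeListPageToListOfThemeURLs: take the page from step (1) and produce the theme page URLs for step (2).
  • themePageToZipLink: take a theme page and return the link to its zip file.
  • downloadZipLink(url, intoDirectory): download the zip file at url into intoDirectory.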
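A possible sketch of createThemeDir and themePageToZipLink follows. The default directory name "theme-data" is a hypothetical choice, and the rule "the download link is the first <a> whose path ends in .zip" is an assumption about the theme page markup, not something the notes confirm.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _ZipLinkFinder(HTMLParser):
    """Remembers the first <a href> whose path ends in .zip."""

    def __init__(self):
        super().__init__()
        self.zip_href = None

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.zip_href is not None:
            return
        href = dict(attrs).get("href")
        # Assumption: the theme's download link is the first .zip link on the page.
        if href and urlparse(href).path.lower().endswith(".zip"):
            self.zip_href = href


def createThemeDir(path="theme-data"):
    """Create the directory the zip files go into (no error if it exists).

    "theme-data" is a hypothetical default name.
    """
    os.makedirs(path, exist_ok=True)
    return path


def themePageToZipLink(html, page_url):
    """Return the absolute zip download URL found on a theme page, or None."""
    finder = _ZipLinkFinder()
    finder.feed(html)
    if finder.zip_href is None:
        return None
    return urljoin(page_url, finder.zip_href)
```

Passing exist_ok=True makes createThemeDir safe to call on every run, which fits the goal of a re-runnable scraper.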

Make the code able to skip anything it has already done, so that a re-run does not repeat finished work.
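One way to make downloadZipLink skippable is to treat an existing file as "done" and to write into a temporary .part file that is only renamed once the download completes, so an interrupted run never leaves a half-written zip that looks finished. This is a sketch under that assumption; the (path, downloaded) return value is a hypothetical convention, and the file name is taken from the last segment of the URL path.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse


def downloadZipLink(url, intoDirectory):
    """Download url into intoDirectory unless the file is already there.

    Returns (path, downloaded); downloaded is False when the file was skipped.
    """
    filename = os.path.basename(urlparse(url).path)
    target = os.path.join(intoDirectory, filename)
    if os.path.exists(target):
        # Already fetched on an earlier run: skip it.
        return target, False
    partial = target + ".part"
    # Stream into a .part file first; a crash leaves only the .part behind,
    # which the next run simply overwrites.
    with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)
    os.replace(partial, target)  # atomic rename on the same filesystem
    return target, True
```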

Desired end goal:

  • All the zip files in the theme data directory.
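To check progress toward that goal, and to decide which theme pages still need visiting on a re-run, one could compare the theme page URLs against the zips already on disk. A sketch only: missingThemeZips is a hypothetical helper, and it assumes each zip is named <slug>.zip or <slug>.<version>.zip, where <slug> is the last segment of the theme page URL.

```python
import os


def missingThemeZips(theme_urls, intoDirectory):
    """Return the theme page URLs that have no zip file in intoDirectory yet.

    Assumption: a theme's zip is named <slug>.zip or <slug>.<version>.zip.
    Leftover .part files do not count, since they do not end in .zip.
    """
    if os.path.isdir(intoDirectory):
        zips = [name for name in os.listdir(intoDirectory) if name.endswith(".zip")]
    else:
        zips = []
    missing = []
    for url in theme_urls:
        slug = url.rstrip("/").rsplit("/", 1)[-1]
        if not any(name.startswith(slug + ".") for name in zips):
            missing.append(url)
    return missing
```

When the returned list is empty, every theme found on the seed page has a zip in the directory.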