<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
<meta name="description" content="Course homepage for CS 431/631 451/651 Data-Intensive Distributed Computing (Winter 2019) at the University of Waterloo">
<meta name="author" content="Adam Roegiest">
<title>Data-Intensive Distributed Computing</title>
<!-- Bootstrap core CSS -->
<link href="css/bootstrap.min.css" rel="stylesheet">
<!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
<link href="css/ie10-viewport-bug-workaround.css" rel="stylesheet">
<!-- Just for debugging purposes. Don't actually copy these 2 lines! -->
<!--[if lt IE 9]><script src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]-->
<script src="js/ie-emulation-modes-warning.js"></script>
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
<style>
body {
padding-top: 60px; /* 60px to make the container go all the way to the bottom of the topbar */
}
</style>
</head>
<body>
<nav class="navbar navbar-inverse navbar-fixed-top">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
</div>
<div id="navbar" class="collapse navbar-collapse">
<ul class="nav navbar-nav">
<li><a href="index.html">Overview</a></li>
<li><a href="organization.html">Organization</a></li>
<li><a href="syllabus.html">Syllabus</a></li>
<li class="active"><a href="assignments.html">Assignments</a></li>
<li><a href="software.html">Software</a></li>
</ul>
</div><!--/.nav-collapse -->
</div>
</nav>
<div class="container">
<div class="page-header">
<div style="float: right"><img width="250" src="images/waterloo_logo.png" alt="University of Waterloo logo"/></div>
<h1>Assignments <br/><small>Data-Intensive Distributed Computing (Winter 2019)</small></h1>
</div>
<p>Note that there are separate sets of assignments for CS 451/651 and CS
431/631. Make sure you work on the correct assignments!</p>
<p><a href="assignments-431.html" class="btn btn-info btn-large">CS 431/631 Assignments</a></p>
<div class="subnav">
<ul class="nav nav-pills">
<li><a href="assignment0-431.html">0</a></li>
<li><a href="assignment1-431.html">1</a></li>
<li><a href="assignment2-431.html">2</a></li>
<li><a href="assignment3-431.html">3</a></li>
<li><a href="assignment4-431.html">4</a></li>
<li><a href="assignment5-431.html">5</a></li>
<li><a href="project-431.html">Final Project</a></li>
</ul>
</div>
<h3>Assignment 0: Warmup <small>due 2:30pm January 22</small></h3>
<p>This assignment is a warmup exercise to get you familiar with
some of the basic tools you will need for the remaining
assignments. In particular, we will be making use of
Python (for programming) and Jupyter notebooks.</p>
<p>The general setup is as follows: for each assignment, you will
be provided with a "starter" notebook, which will describe what
needs to be done for that assignment. You'll complete the
assignment in the notebook, and then submit your notebook to a
private GitHub repo. Shortly after the assignment deadline, we'll
pull your repo for marking.</p>
<p>I'm assuming you already have
a <a href="http://github.com/">GitHub</a> account. If not, create one
as soon as possible. Once you've signed up for an account, go and
<a href="https://education.github.com/discount_requests/new">request
an educational account</a>. This will allow you to create private
repos for free. Please do this as soon as possible since there may be
delays in the request verification process.</p>
<p>Create a <b>private</b> repo called <code>bigdata2019w</code>. I'm
assuming that you're already familiar with GitHub, but just in
case, here is <a href="https://help.github.com/articles/create-a-repo">how
you create a repo on GitHub</a>. If you've successfully gotten an
educational account (per above), you should be able to create private
repos for free.</p>
<h4 style="padding-top: 10px">Python and Jupyter</h4>
<p>All of the programming required for the assignments will be in
Python. If you have never programmed in Python before, you
will need to gradually bring yourself up to speed.
There are many on-line resources that can help with this.
A good place to start is <a target="_blank"
href="https://www.python.org/">python.org</a>.
In particular, you can start with their <a target="_blank"
href="https://docs.python.org/3/tutorial/index.html">Python Tutorial</a>.
If you don't like that particular tutorial, there
are <a href="https://wiki.python.org/moin/BeginnersGuide/Programmers"
target="_blank">many others to choose from</a>.
There are also many Python books
to choose from, if you prefer to learn that way.
Choose a book that fits your needs. For example, some books target
people who are migrating to Python from other languages, while
others are directed at novice programmers.</p>
<p>Most Python tutorials expect you to try out examples as you
go along, i.e., they expect you to write and run Python code.
This kind of active learning is definitely the way to go.
The simplest way for you to run Python code is by using a
Jupyter notebook running on the CS Jupyter hub (see below).
This will allow you to run Python in a web browser, without
having to install any software on your machine. If you
wish, you can also install Python locally on your own machine.
Python is
<a href="https://wiki.python.org/moin/BeginnersGuide/Download"
target="_blank"> freely available for a variety of platforms</a>.
Bear in mind that all assignments for CS431/631 will be done using
notebooks, so it is not a bad idea to get used to them.</p>
<h4 style="padding-top: 10px">Jupyter Notebooks</h4>
<p>
For
this course, you will be writing and running Python code in
<a href="http://jupyter.org/" target="_blank">Jupyter
notebooks</a>. Each notebook consists of a sequence of <i>cells</i>.
A cell can hold (formatted) text, Python code, or graphics. A great thing
about notebooks is that you can open and run them in a web browser.
This means that you can
work on your own machine, using only a web browser, without having
to install any additional software.</p>
<p>A Jupyter "hub" is a place to store and use Jupyter notebooks.
For the CS431/631 assignments, you'll be using a hub operated by
the School of Computer Science. To get started,
go to <a href="https://jupyter.student.cs.uwaterloo.ca:8000"
target="_blank">jupyter.student.cs.uwaterloo.ca</a>.
Log in using your userid and password for the <em>CS student computing
environment</em> (not your WatIAM password).
Once you have logged in, you should see a list of folders and files:
these are the contents of your home directory (folder) in the
CS student computing environment.
<!-- <ins>Unfortunately, the CS Jupyter hub is not yet ready for use.
So, for Assignment 0, we will instead be using a similar hub
run by Compute Canada. To get started, go to
<a href="https://uwaterloo.syzygy.ca/"
target="_blank">https://uwaterloo.syzygy.ca/</a>, and log in
using your <emph>WatIAM</emph> password. Once you are
authenticated, click on the <samp>Start My Server</samp> button.
It will take a few minutes for your server to launch. Once it
does, you will see the contents of your home folder, which should
be empty at this point. You can now proceed with the
instructions below, just as you would on the
CS hub.</ins> --></p>
<p>
It is a good idea to create
a new folder to hold all of your work for this course, if you do
not already have one. To do this, use the <samp>New</samp> dropdown on the
top right to create a new folder, and call the folder <samp>cs431</samp>
(or whatever name you prefer). Then, open your new folder by
clicking on it.
</p>
<p>
Once you are in your <samp>cs431</samp> folder, try creating a new
Jupyter notebook.
To create a notebook, use the <samp>New</samp> dropdown to
create a new Python 3 notebook.
You should see something that looks like this:
<div style="padding-top: 20px; padding-left: 20px; padding-bottom:
20px"><img width="800px" src="images/newnotebook.png"
alt="A New Jupyter Notebook"/></div>
This represents
a notebook with a single, empty cell.
Before going any further with your notebook, try out the following
three basic things that you will need to be able to do:
<ol>
<li>First, change the name of your notebook by clicking on the
current
name ("Untitled"), and entering a new name, say, <samp>Test
Notebook</samp>.</li>
<li>Next, save your notebook using <samp>Save and Checkpoint</samp> from
the <samp>File</samp> menu. Saving a notebook saves its current state,
so that you can stop working at any time, and resume later from where you left
off.</li>
<li>Finally, stop your notebook by selecting <samp>Close and Halt</samp> from the
<samp>File</samp> menu. This should take you back to your list of files and
folders. You should see a new file called <samp>Test
Notebook.ipynb</samp>, which is your saved notebook. By clicking on
that notebook file, you can start your notebook running again from the point
at which you last saved (try it!).</li>
</ol>
Once you've tried out these basics, start your test notebook and spend
some time familiarizing yourself with the notebook interface.
Take the User Interface Tour, which you can launch from the <samp>Help</samp>
menu of a running notebook.</p>
<h4 style="padding-top: 10px">Assignment Workflow</h4>
<p> The basic workflow for each assignment will be something like this:
<ol>
<li>Download the starter notebook for the assignment, as well as any
other required files, from the assignment web page to your computer.</li>
<li>Use a web browser to log in to the CS Jupyter
hub at <a href="https://jupyter.student.cs.uwaterloo.ca:8000"
target="_blank">jupyter.student.cs.uwaterloo.ca</a>.</li>
<li>Upload the starter notebook for the assignment, as well as any
other required files, from your computer to the CS hub, into
your <samp>cs431</samp> folder.</li>
<li>Launch the starter notebook that you just uploaded, and follow
the instructions in
the notebook to complete the assignment. Be sure to save your work.</li>
<li>When you are finished with the assignment, download your notebook
(the <samp>.ipynb</samp> file) from your <samp>cs431</samp> folder on
the hub to your computer, and submit it to the course staff by
following the <a href="#submitting">submission instructions</a>.</li>
</ol>
</p>
<h4 style="padding-top: 10px">Assignment 0</h4>
For the first assignment, you will do some simple analyses on the
text of Shakespeare's plays.
For this assignment, you will need to download three files to your
local machine, and then upload them to the Jupyter hub.
They are:
<ul>
<li><a href="content/cs431/Shakespeare.txt">Shakespeare.txt</a>: this is a plain text file that contains the
complete text of Shakespeare's plays.</li>
<li><a href="content/cs431/simple_tokenize.py">simple_tokenize.py</a>:
this is a simple Python module for tokenizing
text.</li>
<li><a href="content/cs431/A0.ipynb">A0.ipynb</a>: this is the starter
notebook for A0, in which you will do your assignment work.</li>
</ul>
Files with names that end in <samp>.ipynb</samp> are Python notebook
files. When you work in a notebook and save your work, your work
is saved in the <samp>.ipynb</samp> file. You'll submit
your saved <samp>A0.ipynb</samp> file to your GitHub repository when you are done with
the assignment. That will allow us to open your notebook and review
your work.
After you have uploaded these files to the hub, open <samp>A0.ipynb</samp>
to get started on the assignment. The notebook itself describes what
we expect you to do.
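<p>As a rough illustration of the kind of text analysis involved, the sketch
below counts word frequencies in a couple of lines of text. It does not use
<samp>simple_tokenize.py</samp>, whose interface is not described on this page;
the <code>tokenize</code> function here is a hypothetical stand-in built on a
regular expression, and the sample lines are chosen only for the example. The
starter notebook remains the authoritative description of what to do.</p>

```python
import re
from collections import Counter


def tokenize(line):
    """Hypothetical stand-in tokenizer: lowercase runs of letters and apostrophes.

    This is an assumption for illustration, not the course's simple_tokenize module.
    """
    return re.findall(r"[a-z']+", line.lower())


def count_words(lines):
    """Return a Counter mapping each token to its number of occurrences."""
    counts = Counter()
    for line in lines:
        counts.update(tokenize(line))
    return counts


# A tiny in-memory sample. In a notebook you could pass an open file instead,
# e.g. open("Shakespeare.txt"), since iterating over a file object yields lines.
sample = [
    "To be, or not to be, that is the question:",
    "Whether 'tis nobler in the mind to suffer",
]
counts = count_words(sample)
print(counts.most_common(3))
```

<p>Passing any iterable of strings to <code>count_words</code> keeps the sketch
independent of where the text comes from, so the same function would work on a
short list while experimenting and on the full file later.</p>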
<a name="submitting"></a>
<h4 style="padding-top: 10px">Submitting Assignment 0</h4>
<p>
Once you are done with the assignment, submit A0 using the following steps:
<ol>
<li>Download your <samp>A0.ipynb</samp> file from the Jupyter hub to your
computer.</li>
<li>Submit your <samp>A0.ipynb</samp> file to your GitHub repository using the web interface.
If you're not already familiar with GitHub, here is <a href="https://help.github.com/articles/adding-a-file-to-a-repository/">how
you submit a new file to a repo on GitHub</a>. Make sure your <samp>A0.ipynb</samp> file is committed to the master branch. Your assignment should be viewable in the web interface.</li>
<li>Add the
user <a href="https://github.com/bigdatateach">bigdatateach</a> as a
collaborator to your repo so that we can access it. Here is <a href="https://help.github.com/articles/inviting-collaborators-to-a-personal-repository/">how
you add a collaborator to your repo</a>.</li>
<li>Finally, you need to tell us your GitHub account so we can link it
to you. Submit your information <a href="https://goo.gl/forms/BVdXDGIXxWOuhUze2">here</a>.</li>
</ol>
</p>
<p style="padding-top: 20px"><a href="#">Back to top</a></p>
<div style="padding-bottom: 100px"></div>
</div><!-- /.container -->
<!-- Placed at the end of the document so the pages load faster -->
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.12.4/jquery.min.js"></script>
<script src="js/bootstrap.min.js"></script>
<!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
<script src="js/ie10-viewport-bug-workaround.js"></script>
</body>
</html>