
Commit f14334c

Merge pull request #92 from nsidc/datatree
Add xarray.Datatree() guidance for ICESat-2 ATL06 tutorial
2 parents 990ff88 + 25e9b28 commit f14334c

File tree

3 files changed: +42109 −1822 lines changed


notebooks/ICESat-2_Cloud_Access/ATL06-direct-access.ipynb

+36 −22
@@ -17,20 +17,18 @@
 "\n",
 "## **1. Tutorial Overview**\n",
 "\n",
-"**Note: This is an updated version of the notebook that was presented to the NSIDC DAAC User Working Group in May 2022**\n",
-"\n",
 "This notebook demonstrates searching for cloud-hosted ICESat-2 data and directly accessing Land Ice Height (ATL06) granules from an Amazon Compute Cloud (EC2) instance using the `earthaccess` package. NASA data \"in the cloud\" are stored in Amazon Web Services (AWS) Simple Storage Service (S3) buckets. **Direct access** is an efficient way to work with data stored in an S3 bucket when you are working in the cloud: cloud-hosted granules can be opened and loaded into memory without downloading them first, allowing you to take advantage of the scalability and power of cloud computing.\n",
 "\n",
 "The Amazon Global cloud is divided into geographical regions. To have direct access to data stored in a region, our compute instance - a virtual computer that we create to perform processing operations in place of using our own desktop or laptop - must be in the same region as the data. This is a fundamental concept of _analysis in place_. **NASA cloud-hosted data are in the Amazon region us-west-2, so your compute instance must also be in us-west-2.** If we wanted to use direct access for data stored in another region, we would start a compute instance in that region.\n",
 "\n",
 "As an example data collection, we use ICESat-2 Land Ice Height (ATL06) over the Juneau Icefield, AK, for March 2020. ICESat-2 data granules, including ATL06, are stored in HDF5 format. We demonstrate how to open an HDF5 granule and access data variables using `xarray`. Land Ice Heights are then plotted using `hvplot`.\n",
 "\n",
-"`earthaccess` is a package developed by Luis Lopez (NSIDC developer) to allow easy search of the NASA Common Metadata Repository (CMR) and download of NASA data collections. It can be used for programmatic search and access for both _DAAC-hosted_ and _cloud-hosted_ data. It manages authenticating using Earthdata Login credentials which are then used to obtain the S3 tokens that are needed for S3 direct access. https://github.com/nsidc/earthaccess\n",
+"`earthaccess` is a community-developed Python package, originally created by Luis Lopez (NSIDC developer), that allows easy search of the NASA Common Metadata Repository (CMR) and download of NASA data collections. It can be used for programmatic search and access of both _on-premises-hosted_ and _cloud-hosted_ data. It manages authentication with Earthdata Login credentials, which are then used to obtain the S3 tokens needed for S3 direct access: https://github.com/nsidc/earthaccess\n",
 "\n",
 "\n",
 "### **Credits**\n",
 "\n",
-"The notebook was created by Andy Barrett, NSIDC, updated by Jennifer Roebuck, NSIDC, and is based on notebooks developed by Luis Lopez and Mikala Beig, NSIDC.\n",
+"The notebook was created by Andy Barrett, NSIDC, updated by Jennifer Roebuck and Amy Steiker, NSIDC, and is based on notebooks developed by Luis Lopez and Mikala Beig, NSIDC.\n",
 "\n",
 "For questions regarding the notebook, or to report problems, please create a new issue in the [NSIDC-Data-Tutorials repo](https://github.com/nsidc/NSIDC-Data-Tutorials/issues).\n",
 "\n",
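The search-and-open workflow that `earthaccess` provides, as described above, can be sketched as follows. This is a minimal sketch assuming an Earthdata Login account; the bounding box and date range are illustrative assumptions, not values taken from the notebook:

```python
# Sketch of the earthaccess direct-access workflow (assumes an Earthdata
# Login account). The spatial and temporal filters below are illustrative.
search_params = {
    "short_name": "ATL06",                         # ICESat-2 Land Ice Height
    "bounding_box": (-135.0, 58.0, -133.5, 59.5),  # lon/lat W, S, E, N (hypothetical)
    "temporal": ("2020-03-01", "2020-03-31"),
}

def open_granules(params):
    """Authenticate, search CMR, and open matching granules from S3."""
    import earthaccess
    earthaccess.login()                        # Earthdata Login -> S3 credentials
    results = earthaccess.search_data(**params)
    return earthaccess.open(results)           # file-like objects backed by S3
```

Direct S3 access only works from a compute instance in us-west-2; elsewhere, `earthaccess` typically falls back to HTTPS access.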
@@ -39,7 +37,7 @@
 "By the end of this demonstration you will be able to: \n",
 "1. use `earthaccess` to search for ICESat-2 data using spatial and temporal filters and explore search results; \n",
 "2. open data granules using direct access to the ICESat-2 S3 bucket; \n",
-"3. load a HDF5 group into an `xarray.Dataset`; \n",
+"3. load HDF5 data into an `xarray.DataTree` object;\n",
 "4. visualize the land ice heights using `hvplot`. \n",
 "\n",
 "### **Prerequisites**\n",
@@ -158,7 +156,7 @@
 "id": "d3957627",
 "metadata": {},
 "source": [
-"In this case there are 65 collections that have the keyword ICESat-2.\n",
+"Several dozen collections with the keyword ICESat-2 are returned in the Query object.\n",
 "\n",
 "The `search_datasets` method returns a Python list of `DataCollection` objects. We can view the metadata for each collection in long form by passing a `DataCollection` object to `print`, or as a summary using the `summary` method. We can also use the `pprint` function to pretty-print each object.\n",
 "\n",
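A minimal sketch of exploring the returned `DataCollection` list with the `summary` method and `pprint` named above; the `show_collections` helper and its `limit` parameter are my own illustration, not part of `earthaccess`:

```python
from pprint import pprint

def show_collections(datasets, limit=3):
    """Pretty-print a short summary of the first few DataCollection results."""
    for collection in datasets[:limit]:
        pprint(collection.summary())  # summary() is the method named in the text

# Usage (requires earthaccess and network access):
#   datasets = earthaccess.search_datasets(keyword="ICESat-2")
#   show_collections(datasets)
```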
@@ -267,9 +265,9 @@
 "source": [
 "To display the rendered metadata, including the download link, granule size and two images, we will use `display`. In the example below, all 4 results are shown. \n",
 "\n",
-"The download link is `https` and can be used download the granule to your local machine. This is similar to downloading _DAAC-hosted_ data but in this case the data are coming from the Earthdata Cloud. For NASA data in the Earthdata Cloud, there is no charge to the user for egress from AWS Cloud servers. This is not the case for other data in the cloud.\n",
+"The download link is `https` and can be used to download the granule to your local machine. This is similar to downloading data located _on-premises_, but in this case the data are coming from the Earthdata Cloud. For NASA data in the Earthdata Cloud, there is no charge to the user for egress from AWS Cloud servers. This may not be the case for other data in the cloud.\n",
 "\n",
-"Note the `[None, None, None, None]` that is displayed at the end can be ignored, it has no meaning in relation to the metadata."
+"The `[None, None, None, None]` displayed at the end can be ignored; it has no meaning in relation to the metadata."
 ]
 },
 {
@@ -291,40 +289,56 @@
 "source": [
 "## Use Direct-Access to open, load and display data stored on S3\n",
 "\n",
-"Direct-access to data from an S3 bucket is a two step process. First, the files are opened using the `open` method. The `auth` object created at the start of the notebook is used to provide Earthdata Login authentication and AWS credentials.\n",
-"\n",
-"The next step is to load the data. In this case, data are loaded into an `xarray.Dataset`. Data could be read into `numpy` arrays or a `pandas.Dataframe`. However, each granule would have to be read using a package that reads HDF5 granules such as `h5py`. `xarray` does this all _under-the-hood_ in a single line but for a single group in the HDF5 granule*.\n",
-"\n",
-"*ICESat-2 measures photon returns from 3 beam pairs numbered 1, 2 and 3 that each consist of a left and a right beam. In this case, we are interested in the left ground track (gt) of beam pair 1. "
+"Direct access to data in an S3 bucket is a two-step process. First, the files are opened using the `open` method. The `auth` object created at the start of the notebook provides Earthdata Login authentication and AWS credentials."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "11205bbb",
+"id": "e50bf87d-1c83-42c7-b645-3948b15b7675",
 "metadata": {},
 "outputs": [],
 "source": [
-"files = earthaccess.open(results)\n",
-"ds = xr.open_dataset(files[1], group='/gt1l/land_ice_segments')"
+"files = earthaccess.open(results)"
+]
+},
+{
+"cell_type": "markdown",
+"id": "cecdf984-ce9f-41c0-946b-3a0fa1ce40bc",
+"metadata": {},
+"source": [
+"The next step is to load the data. `xarray.DataTree` objects allow us to work with hierarchical data structures and file formats such as HDF5, Zarr and NetCDF4 with groups. \n",
+"\n",
+"We use `xr.open_datatree` to open the ATL06 data. We add the `phony_dims=\"sort\"` option because data variables in several groups, including `ancillary_data`, do not have assigned dimension scales; `xarray` then names the dimensions `phony_dim0`, `phony_dim1`, and so on."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "75881751",
+"id": "013136a2-80fd-4625-aca8-1a64775c9593",
 "metadata": {},
 "outputs": [],
 "source": [
-"ds"
+"dt = xr.open_datatree(files[1], phony_dims='sort')\n",
+"dt"
+]
+},
+{
+"cell_type": "markdown",
+"id": "56677586-a2cd-4ddb-8e57-457ced954331",
+"metadata": {},
+"source": [
+"We can see from the representation of the `xarray.DataTree` object `dt` that there are ten groups in the top, or \"root\", level. Clicking on Groups reveals various Metadata and Ancillary data groups, as well as groups representing each of the left and right beam pairs from the ICESat-2 ATLAS instrument*. We can also see that there are no dimensions, coordinates, or data variables in the root group. Reading the data into `numpy` arrays or a `pandas.DataFrame` would be an alternative to `xarray.DataTree`, but each granule (file) would first have to be read with a package that reads HDF5 files, such as `h5py`; `xarray` does all of this under the hood in a single line.\n",
+"\n",
+"*ICESat-2 measures photon returns from 3 beam pairs, numbered 1, 2 and 3, that each consist of a left and a right beam. In this case, we are interested in plotting the left ground track (gt) of beam pair 1. "
 ]
 },
 {
 "cell_type": "markdown",
 "id": "1282ce34",
 "metadata": {},
 "source": [
-"`hvplot` is an interactive plotting tool that is useful for exploring data."
+"`hvplot` is an interactive plotting tool that is useful for exploring data:"
 ]
 },
 {
@@ -334,7 +348,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"ds['h_li'].hvplot(kind='scatter', s=2)"
+"dt['/gt1l/land_ice_segments/h_li'].hvplot(kind='scatter', s=2)"
 ]
 },
 {
@@ -347,7 +361,7 @@
 "We have learned how to:\n",
 "1. use `earthaccess` to search for ICESat-2 data using spatial and temporal filters and explore search results;\n",
 "2. open data granules using direct access to the ICESat-2 S3 bucket;\n",
-"3. load a HDF5 group into an xarray.Dataset;\n",
+"3. load HDF5 data into an `xarray.DataTree` object;\n",
 "4. visualize the land ice heights using hvplot."
 ]
 },
@@ -394,7 +408,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.10.14"
+"version": "3.11.11"
 }
 },
 "nbformat": 4,
