Accessing gridded data

Packages used

To access an ERDDAP server directly from a Python script or Jupyter notebook, you will need the following packages:

erddapy
xarray
netCDF4
matplotlib

Note

The netCDF4 package, developed by Unidata, does not need to be imported in the Python script, but xarray relies on it to write netCDF output. Likewise, matplotlib does not need to be imported, but xarray relies on it for quick visualization.
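xarray uses netCDF4 and matplotlib behind the scenes rather than through an explicit import. A missing package may therefore only show up as an error at the export or plotting step. A minimal standard-library check can confirm up front that all four packages can be imported. The helper name `missing_packages` is hypothetical. The import names are assumptions; the netcdf4 package, for example, is imported as `netCDF4`.

```python
import importlib.util

# Import names of the packages used on this page (assumed; note that the
# netcdf4 package is imported as "netCDF4").
REQUIRED = ("erddapy", "xarray", "netCDF4", "matplotlib")


def missing_packages(names=REQUIRED):
    """Return the import names in `names` that cannot be found (hypothetical helper)."""
    # find_spec() locates a top-level module without importing it
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```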

On this page, we demonstrate how to extract and download data directly from an ERDDAP server, then process, visualize, and export the data in Python.

Tip

A basic understanding of the ERDDAP server and what it provides is highly recommended before reading the following instructions.

Import Python packages

import xarray as xr
from erddapy import ERDDAP

xarray is used for data processing and netCDF file output.
erddapy is used to access the ERDDAP server.

The documentation of both packages explains their full range of functionality in more detail. Here we focus mainly on getting the data displayed and downloaded.

Access GridDAP type data

In this demonstration, we get gridded sea surface temperature data (dataset nceiErsstv5_LonPM180, NOAA ERSST version 5) from the NOAA NMFS CoastWatch ERDDAP server.

First, set up the target ERDDAP server as a Python object using the ERDDAP class from erddapy:

# access the ERDDAP server (ERDDAP is imported from erddapy above)
e = ERDDAP(
    server="https://coastwatch.pfeg.noaa.gov/erddap/",    # base URL of the ERDDAP server
    protocol="griddap",                                   # data type: "griddap" or "tabledap"
    response="opendap",                                   # response format requested from the server
)

Note

As the comments in the code above indicate, the ERDDAP class has three important keyword arguments (kwargs):

- server: the URL of the ERDDAP server, of the form "https://.../erddap/".
- protocol: the type of data to request, either "tabledap" or "griddap".
- response: for most general use, set it to "opendap" to request the data through the OPeNDAP Data Access Protocol (DAP) and its projection constraints.

Creating the ERDDAP object only stores these settings; no request is sent to the server yet. To request a specific dataset on the server, we need its dataset_id. The fastest way to find the dataset ID is to open the dataset's data page (e.g. https://coastwatch.pfeg.noaa.gov/erddap/griddap/nceiErsstv5_LonPM180.html), where the dataset ID is shown on the second line, right after the institution.

Tip

The dataset ID can also be read directly from the URL of the data page. It is the file name without the .html extension (e.g. https://…/nceiErsstv5_LonPM180.html).
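Reading the ID off the URL is easy to automate. The sketch below uses only the standard library. It takes the file name of a data-page URL and drops the .html extension. It can also rebuild the data-page URL from a server URL and a dataset ID. Both helper names are hypothetical. The `<server>/<protocol>/<dataset_id>.html` pattern is assumed from the example URLs on this page.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def dataset_id_from_url(url):
    """Extract the dataset ID from an ERDDAP data-page URL (hypothetical helper).

    Assumes URLs of the form https://<host>/erddap/<protocol>/<dataset_id>.html
    """
    # e.g. "/erddap/griddap/nceiErsstv5_LonPM180.html"
    path = PurePosixPath(urlparse(url).path)
    # .stem is the file name without its ".html" suffix
    return path.stem


def data_page_url(server, dataset_id, protocol="griddap"):
    """Build the data-page URL of a dataset on an ERDDAP server (hypothetical helper)."""
    return f"{server.rstrip('/')}/{protocol}/{dataset_id}.html"


url = "https://coastwatch.pfeg.noaa.gov/erddap/griddap/nceiErsstv5_LonPM180.html"
print(dataset_id_from_url(url))  # nceiErsstv5_LonPM180
```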

To set the dataset_id, execute

# set the dataset ID
#  e.g. for https://coastwatch.pfeg.noaa.gov/erddap/griddap/jplAquariusSSS3MonthV5.html
#  the dataset ID would be "jplAquariusSSS3MonthV5"
e.dataset_id = "nceiErsstv5_LonPM180"

To get a quick view of the variables available for download and the default range of each dimension, execute

# griddap_initialize() fetches the default variables and constraints of the dataset
e.griddap_initialize()

# print available variables and range included in the erddap request
print(e.variables)
print('==========')
print(*e.constraints.items(),sep='\n')

['sst', 'ssta']
==========
('time>=', '2024-06-15T00:00:00Z')
('time<=', '2024-06-15T00:00:00Z')
('time_step', 1)
('depth>=', 0.0)
('depth<=', 0.0)
('depth_step', 1)
('latitude>=', -88.0)
('latitude<=', 88.0)
('latitude_step', 1)
('longitude>=', -180.0)
('longitude<=', 178.0)
('longitude_step', 1)

Note

The griddap_initialize() method fetches the default variables and constraints of the specific dataset. Once griddap_initialize() has been called, e.variables (a Python list) and e.constraints (a Python dictionary) change from None to these default values.

The printout above shows that the variables available in 'nceiErsstv5_LonPM180' are 'sst' (sea surface temperature) and 'ssta' (sea surface temperature anomaly).
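Each dimension has three constraint keys: `<dim>>=`, `<dim><=` and `<dim>_step`. ERDDAP's griddap request syntax expresses the same information as one bracket per dimension, `[(start):stride:(stop)]`, appended to each variable name. erddapy builds the real request URL itself from e.variables and e.constraints. The plain-Python sketch below only illustrates that mapping. The helper name `griddap_query` is hypothetical. It assumes the dictionary lists dimensions in the dataset's order, because griddap matches brackets to dimensions by position.

```python
def griddap_query(variables, constraints):
    """Turn an erddapy-style constraints dict into a griddap query string.

    Hypothetical helper for illustration only. Assumes three keys per
    dimension ("<dim>>=", "<dim><=", "<dim>_step") stored in the dataset's
    dimension order, because griddap matches brackets to dimensions by position.
    """
    # dimension names, in insertion order, taken from the "<dim>>=" keys
    dims = [key[:-2] for key in constraints if key.endswith(">=")]
    # one [(start):stride:(stop)] bracket per dimension; the parentheses mean
    # the bounds are coordinate values rather than array indices
    brackets = "".join(
        f"[({constraints[dim + '>=']}):{constraints[dim + '_step']}:({constraints[dim + '<=']})]"
        for dim in dims
    )
    # the same brackets follow every requested variable
    return ",".join(var + brackets for var in variables)


# Example: constraints with latitude restricted to -60..60
constraints = {
    "time>=": "2024-06-15T00:00:00Z", "time<=": "2024-06-15T00:00:00Z", "time_step": 1,
    "depth>=": 0.0, "depth<=": 0.0, "depth_step": 1,
    "latitude>=": -60, "latitude<=": 60, "latitude_step": 1,
    "longitude>=": -180.0, "longitude<=": 178.0, "longitude_step": 1,
}
print(griddap_query(["sst"], constraints))
# sst[(2024-06-15T00:00:00Z):1:(2024-06-15T00:00:00Z)][(0.0):1:(0.0)][(-60):1:(60)][(-180.0):1:(178.0)]
```

To see the complete URL that erddapy itself would request, call e.get_download_url() after griddap_initialize().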

Subset data

e.constraints holds the default range of each dimension. In this example, time defaults to a single time step (2024-06-15), and latitude and longitude default to global coverage. A regional subset can be set here if needed. For example, to keep only 60°S to 60°N:

e.constraints['latitude>='] = -60
e.constraints['latitude<='] = 60
print(*e.constraints.items(),sep='\n')

('time>=', '2024-06-15T00:00:00Z')
('time<=', '2024-06-15T00:00:00Z')
('time_step', 1)
('depth>=', 0.0)
('depth<=', 0.0)
('depth_step', 1)
('latitude>=', -60)
('latitude<=', 60)
('latitude_step', 1)
('longitude>=', -180.0)
('longitude<=', 178.0)
('longitude_step', 1)
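The plotting and processing steps below work on an xarray Dataset called `ds`. Once the constraints are set, erddapy's `to_xarray()` method can open the request. It builds the request URL from the server, dataset ID, variables and constraints, then opens it with xarray. The wrapper below is a minimal sketch. The name `download_subset` and its `variables` argument are hypothetical, and it assumes `e` is configured as above.

```python
def download_subset(e, variables=None):
    """Open the current griddap request of an erddapy ERDDAP object as an xarray Dataset.

    Hypothetical wrapper. Assumes `e` is configured as on this page: server,
    protocol="griddap", response="opendap", dataset_id set and
    griddap_initialize() already called.
    """
    if variables is not None:
        # request only some of the variables, e.g. ["sst"]
        e.variables = list(variables)
    # to_xarray() builds the request URL from e.dataset_id, e.variables and
    # e.constraints and opens it with xarray
    return e.to_xarray()


# Usage (requires network access to the ERDDAP server):
# ds = download_subset(e)           # both "sst" and "ssta"
# ds = download_subset(e, ["sst"])  # sea surface temperature only
```

With response="opendap", xarray typically reads the remote values lazily, so most of the transfer happens when the data are plotted, computed or written rather than when the dataset is opened.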

Visualize data

To quickly visualize the variables of the Dataset ds, opened from the request above (e.g. with ds = e.to_xarray()), use xarray's plot method. matplotlib does the drawing; it must be installed but does not need to be imported:

ds.sst.plot()

[Figure: map of sst produced by ds.sst.plot()]

ds.ssta.plot()

[Figure: map of ssta produced by ds.ssta.plot()]
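ds.sst has four dimensions (time, depth, latitude and longitude, matching the constraints), yet .plot() draws a 2-D map. xarray's plot() first squeezes out dimensions of length 1, here time and depth. It then picks the plot type from the number of remaining dimensions: a line plot for one, a pcolormesh map for two. The sketch below uses a made-up array with the same dimension layout, so nothing is downloaded. It shows the squeeze step and a client-side .sel() that subsets an already opened array further.

```python
import numpy as np
import xarray as xr

# Made-up array with the same dimension layout as ds.sst on this page:
# one time step, one depth level, and a latitude/longitude grid
sst = xr.DataArray(
    np.arange(12.0).reshape(1, 1, 3, 4),
    dims=("time", "depth", "latitude", "longitude"),
    coords={
        "depth": [0.0],
        "latitude": [-40.0, 0.0, 40.0],
        "longitude": [-180.0, -90.0, 0.0, 90.0],
    },
    name="sst",
)

# plot() squeezes out length-1 dimensions; the two left over give a 2-D map
print(sst.squeeze().dims)  # ('latitude', 'longitude')

# Client-side subsetting of data already opened, here 40S to the equator and
# 90W to 90E (slice bounds are coordinate values and are inclusive)
subset = sst.sel(latitude=slice(-40, 0), longitude=slice(-90, 90))
print(subset.shape)  # (1, 1, 2, 3)
```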

Preprocess data

With xarray we can also compute a quick zonal average of sst to see the latitudinal distribution of sea surface temperature:

ds.sst.mean(dim='longitude').plot()

[Figure: zonal-mean sst as a function of latitude]

Here .mean(dim='longitude') averages sst over the longitude dimension, which gives the zonal average at each latitude.
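Land grid points in an SST field are typically missing (NaN). For floating-point data, xarray's .mean() skips NaN values by default (skipna=True). Each zonal mean is therefore an average over the ocean points at that latitude only. The small synthetic example below uses made-up values rather than ERSST data and shows how the result changes with skipna=False.

```python
import numpy as np
import xarray as xr

# Made-up 2 x 3 field on a latitude/longitude grid; NaN marks a "land" point
sst = xr.DataArray(
    [[10.0, 12.0, np.nan],
     [20.0, 22.0, 24.0]],
    dims=("latitude", "longitude"),
    coords={"latitude": [-30.0, 0.0], "longitude": [-120.0, 0.0, 120.0]},
    name="sst",
)

# Zonal mean: average along longitude, NaN skipped (the default for floats)
zonal = sst.mean(dim="longitude")
print(zonal.values)  # [11. 22.]

# Without skipping, a single missing point makes that latitude NaN
print(sst.mean(dim="longitude", skipna=False).values)  # [nan 22.]
```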

Export to netCDF

To save the dataset as a netCDF file, use the .to_netcdf() method:

ds.to_netcdf('./nceiErsstv5_LonPM180.nc')
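.to_netcdf() writes every variable in the Dataset. To save only part of it, select the variables first. As a sketch, the example below writes a single variable and reads the file back to check the round trip. It uses a small made-up Dataset (named demo so it does not overwrite ds) and a temporary file instead of the downloaded data. A netCDF backend such as netCDF4 must be installed.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Made-up stand-in for the downloaded Dataset: two variables on a small grid
demo = xr.Dataset(
    {
        "sst": (("latitude", "longitude"), np.array([[10.0, 12.0], [20.0, 22.0]])),
        "ssta": (("latitude", "longitude"), np.array([[0.1, -0.2], [0.3, 0.0]])),
    },
    coords={"latitude": [-30.0, 0.0], "longitude": [-120.0, 0.0]},
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sst_only.nc")
    # demo[["sst"]] is a Dataset holding only "sst" (plus its coordinates)
    demo[["sst"]].to_netcdf(path)
    # reopen and load into memory, so the file is closed before the
    # temporary directory is deleted
    with xr.open_dataset(path) as f:
        back = f.load()

print(list(back.data_vars))       # ['sst']
print(back.sst.equals(demo.sst))  # True
```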
