While searching for code-based ways to regrid data instead of relying on standalone tools, I came across two popular Python options: SciPy and xarray.
To explore and compare their capabilities, I downloaded a single day of 2-meter temperature (T2m) data from ERA5, ECMWF's global reanalysis dataset. The original data has a grid spacing of 0.25° (roughly 25 km), and I regridded it to 0.01° (roughly 1 km) with both libraries. Below, I discuss the process and findings.
What is Regridding?
Regridding is the process of interpolating data from one spatial grid onto another, where interpolation means estimating unknown values from the known data points around them. It is often required when combining datasets from different sources, or when a finer resolution is needed for applications such as climate modeling or geographic analysis. Keep in mind that interpolating onto a finer grid does not add new information: it only estimates values between the original grid points.
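Regridding applies the same idea as one-dimensional interpolation, just along both latitude and longitude. As a minimal illustration (the temperature values are made up, not ERA5 data), NumPy's np.interp estimates values between known samples by drawing straight lines between them:

```python
import numpy as np

# Known samples: made-up temperatures (K) at positions 0, 1, 2, 3 along a line
x_known = np.array([0.0, 1.0, 2.0, 3.0])
t_known = np.array([300.0, 302.0, 301.0, 299.0])

# Estimate the temperature at positions that fall between the known samples
x_new = np.array([0.5, 1.25, 2.5])
t_new = np.interp(x_new, x_known, t_known)
# t_new -> [301.0, 301.75, 300.0]
```

Each estimate depends only on the two neighbouring samples; the 2D methods discussed below differ in how many neighbours they use and how they weight them.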
Dataset
The dataset I used contains T2m at 0.25° (about 25 km) resolution, downloaded from ERA5. The original file is in NetCDF format. I clipped it to latitude 4°N to 40°N and longitude 68°E to 98°E to cover India (the bounds used in the code below), and defined a target grid over the same bounds with a spacing of 0.01° (about 1 km).
Figure: ‘t2m’ for a single day (ERA5)
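A quick sketch of what that target grid means in numbers, assuming the bounds from the code below and the common approximation of about 111 km per degree of latitude. It uses np.linspace so both bounds are hit exactly; the variable names mirror the post's scripts, but the size and distance calculations are my own illustration:

```python
import numpy as np

res = 0.01                      # target spacing in degrees
lat_bounds = (4, 40)
lon_bounds = (68, 98)

# Number of grid points, counting both end points
n_lat = int(round((lat_bounds[1] - lat_bounds[0]) / res)) + 1
n_lon = int(round((lon_bounds[1] - lon_bounds[0]) / res)) + 1
new_lats = np.linspace(lat_bounds[0], lat_bounds[1], n_lat)
new_lons = np.linspace(lon_bounds[0], lon_bounds[1], n_lon)

# Approximation: one degree of latitude is roughly 111 km
km_per_deg = 111.0
src_km = 0.25 * km_per_deg      # ERA5 spacing, roughly 28 km
ns_km = res * km_per_deg        # north-south spacing of the target grid, about 1.1 km
# East-west spacing shrinks with cos(latitude): about 1.1 km at 4°N, 0.85 km at 40°N
ew_km = res * km_per_deg * np.cos(np.deg2rad(np.array([4.0, 40.0])))

print(n_lat, n_lon, n_lat * n_lon)  # grid dimensions and total number of target points
```

The target grid has over ten million points, which is worth remembering when comparing run times later.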
xarray & SciPy
xarray is a Python library for working with labeled multi-dimensional arrays, offering powerful tools for analysis and metadata handling, especially with NetCDF files.
SciPy is a scientific computing library for Python that provides numerical tools such as interpolation (scipy.interpolate.griddata), which works on scattered (irregular) as well as regular grid data.
Methods Used by xarray & SciPy
Both xarray and scipy.interpolate support methods such as linear, nearest and cubic. Let's see what they mean:
1. Linear interpolation (method name: ‘linear’): the value at an unknown point is a weighted average of the surrounding points, as if the data formed straight lines or flat planes between them. The result is continuous but piecewise, with visible kinks at the original grid points.
2. Nearest neighbor interpolation (method name: ‘nearest’): the unknown point simply takes the value of the closest data point. Since no averaging or smoothing takes place, the output looks blocky or stepped.
3. Cubic interpolation (method name: ‘cubic’): cubic polynomials are fitted through more of the surrounding points, producing smoother surfaces than linear interpolation. The trade-off is that it can overshoot, i.e. produce values above the local maximum or below the local minimum of the input data.
From my findings, the output quality ranks cubic > linear > nearest.
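One way to check such a ranking is to interpolate a field whose true values are known. The sketch below uses scipy.interpolate.griddata on a smooth synthetic field (sin(lon)·cos(lat) on made-up coordinates, not ERA5 data) and reports the root-mean-square error of each method. On smooth fields like this one the error typically shrinks from nearest to linear to cubic; real temperature fields with sharp gradients (coastlines, mountains) may behave differently, and cubic can overshoot there.

```python
import numpy as np
from scipy.interpolate import griddata

def field(lon, lat):
    # Smooth synthetic stand-in for a temperature field (assumption, not real data)
    return np.sin(lon) * np.cos(lat)

# Coarse source grid (spacing 0.25), flattened into (lon, lat) points
src = np.arange(0.0, 3.01, 0.25)
lon_src, lat_src = np.meshgrid(src, src)
points = np.column_stack([lon_src.ravel(), lat_src.ravel()])
values = field(lon_src, lat_src).ravel()

# Finer target grid kept inside the source domain, so nothing is extrapolated
tgt = np.linspace(0.5, 2.5, 81)
lon_tgt, lat_tgt = np.meshgrid(tgt, tgt)
truth = field(lon_tgt, lat_tgt)

errors = {}
for method in ("nearest", "linear", "cubic"):
    est = griddata(points, values, (lon_tgt, lat_tgt), method=method)
    errors[method] = float(np.sqrt(np.mean((est - truth) ** 2)))

print(errors)  # RMSE per method; smaller is better
```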
Procedure to Regrid:
Using xarray:
1. Load the NetCDF file containing the data with xarray.open_dataset().
2. Create arrays for the new latitude and longitude at the desired resolution.
3. Use Dataset.interp() to interpolate the data onto the new grid.
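The three xarray steps can be sketched without the ERA5 file. In this minimal example a small synthetic Dataset stands in for xr.open_dataset() (the variable name t2m, the linear-in-latitude values and the 0.1° target spacing are assumptions chosen to keep it quick); latitude is stored north to south, as in ERA5:

```python
import numpy as np
import xarray as xr

# Step 1 stand-in: synthetic ERA5-like Dataset (descending latitude, 0.25° spacing)
lats = np.arange(40.0, 3.99, -0.25)
lons = np.arange(68.0, 98.01, 0.25)
lat2d, lon2d = np.meshgrid(lats, lons, indexing="ij")
ds = xr.Dataset(
    {"t2m": (("latitude", "longitude"), 300.0 - 0.5 * (lat2d - 4.0),
             {"units": "K", "long_name": "2 metre temperature"})},
    coords={"latitude": lats, "longitude": lons},
)

# Step 2: new coordinate arrays (0.1° here instead of 0.01°, to keep it fast)
new_lats = np.linspace(4.0, 40.0, 361)
new_lons = np.linspace(68.0, 98.0, 301)

# Step 3: interpolate along the labeled dims; xarray sorts the descending latitude itself
regridded = ds.interp(latitude=new_lats, longitude=new_lons, method="linear")
print(regridded["t2m"].shape)
```

Because the synthetic field is linear in latitude, linear interpolation reproduces it exactly, which makes the result easy to sanity-check.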
Using scipy:
1. Use xarray.open_dataset() to load the dataset and extract NumPy arrays for the data, latitudes and longitudes.
2. Flatten the source grid into an (N, 2) array of (longitude, latitude) pairs and a matching 1D array of data values.
3. Create a 2D target grid (with numpy.meshgrid) at the desired latitude and longitude resolution.
4. Use scipy.interpolate.griddata() to interpolate the data.
5. Convert the regridded data back into an xarray Dataset for saving and plotting.
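The same five steps with SciPy, again on synthetic stand-in arrays rather than the ERA5 file (the field, the variable name t2m and the 0.1° target spacing are assumptions for illustration). Note that step 5 has to re-attach coordinates and units by hand:

```python
import numpy as np
import xarray as xr
from scipy.interpolate import griddata

# Step 1 stand-in: NumPy arrays as they would be extracted from an ERA5 Dataset
lats = np.arange(40.0, 3.99, -0.25)          # descending, like ERA5
lons = np.arange(68.0, 98.01, 0.25)
lat2d, lon2d = np.meshgrid(lats, lons, indexing="ij")
data = 300.0 - 0.5 * (lat2d - 4.0)           # synthetic (lat, lon) field in K

# Step 2: (N, 2) array of (lon, lat) pairs plus the matching 1D values
points = np.column_stack([lon2d.ravel(), lat2d.ravel()])
values = data.ravel()

# Step 3: 2D target grid (0.1° here to keep the example fast)
new_lats = np.linspace(4.0, 40.0, 361)
new_lons = np.linspace(68.0, 98.0, 301)
lon_grid, lat_grid = np.meshgrid(new_lons, new_lats)

# Step 4: interpolate; the result is a plain NumPy array shaped like lon_grid
regridded = griddata(points, values, (lon_grid, lat_grid), method="linear")

# Step 5: wrap it back into a Dataset; coordinates and units are added manually
regridded_ds = xr.Dataset(
    {"t2m": (("latitude", "longitude"), regridded, {"units": "K"})},
    coords={"latitude": new_lats, "longitude": new_lons},
)
```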
Code:
1. SciPy regridding:
import os

import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
from scipy.interpolate import griddata
# Needed only for the optional India mask below:
# import geopandas as gpd
# import regionmask

input_dir = 'era5_data/test/'
output_dir = 'era5_data/test/'
target_file = '1dec2024_2mtemp_era5.nc'
# india_shapefile_path = 'india_shp/india.shp'
os.makedirs(output_dir, exist_ok=True)

method = 'cubic'   # 'linear', 'nearest' or 'cubic'
res = 0.01         # target spacing in degrees (~1 km)
lat_bounds = (4, 40)
lon_bounds = (68, 98)

# linspace hits both bounds exactly; np.arange with a float step can overshoot the last point
new_lats = np.linspace(*lat_bounds, int(round((lat_bounds[1] - lat_bounds[0]) / res)) + 1)
new_lons = np.linspace(*lon_bounds, int(round((lon_bounds[1] - lon_bounds[0]) / res)) + 1)
lon_grid, lat_grid = np.meshgrid(new_lons, new_lats)

for f in os.listdir(input_dir):
    # Only regrid the single-day T2m file in this folder
    if f != target_file:
        continue
    file_path = os.path.join(input_dir, f)
    print(f"Processing file for regridding: {file_path}")

    ds = xr.open_dataset(file_path)
    # ERA5 latitudes run north to south, so slice from the upper bound down
    clipped_ds = ds.sel(latitude=slice(lat_bounds[1], lat_bounds[0]),
                        longitude=slice(lon_bounds[0], lon_bounds[1]))

    # Optional: blank out everything outside India before regridding
    # gdf = gpd.read_file(india_shapefile_path)
    # mask = regionmask.mask_geopandas(gdf, clipped_ds.longitude, clipped_ds.latitude)
    # masked_ds = clipped_ds.where(mask.notnull())
    masked_ds = clipped_ds

    variable_name = list(masked_ds.data_vars)[0]
    # Put latitude/longitude last so every time step is a 2D (lat, lon) slice
    da = masked_ds[variable_name].transpose(..., 'latitude', 'longitude')
    extra_dims = [d for d in da.dims if d not in ('latitude', 'longitude')]
    time_dim = extra_dims[0] if extra_dims else None   # e.g. 'valid_time' in ERA5 files

    # Source grid as an (N, 2) array of (lon, lat) pairs
    lon_src, lat_src = np.meshgrid(masked_ds.longitude.values, masked_ds.latitude.values)
    points = np.column_stack([lon_src.ravel(), lat_src.ravel()])

    def regrid(field2d):
        # Interpolate one 2D (lat, lon) slice onto the target grid
        return griddata(points, field2d.ravel(), (lon_grid, lat_grid), method=method)

    if time_dim is not None:
        regridded_data = np.stack([regrid(da.isel({time_dim: t}).values)
                                   for t in range(da.sizes[time_dim])])
        dims = [time_dim, 'latitude', 'longitude']
        coords = {time_dim: da[time_dim].values, 'latitude': new_lats, 'longitude': new_lons}
    else:
        regridded_data = regrid(da.values)
        dims = ['latitude', 'longitude']
        coords = {'latitude': new_lats, 'longitude': new_lons}

    # griddata returns a bare array, so the attributes (units etc.) are copied back by hand
    regridded_ds = xr.Dataset({variable_name: (dims, regridded_data, da.attrs)}, coords=coords)

    # output_nc_path = os.path.join(output_dir, f"regridded_{f}")
    # regridded_ds.to_netcdf(output_nc_path)
    # print(f"Regridded data saved to {output_nc_path}")

    plt.figure(figsize=(12, 8))
    field = regridded_ds[variable_name]
    (field.mean(dim=time_dim) if time_dim else field).plot(cmap='viridis')
    plt.title(f'SciPy {method}: Regridded {variable_name} Data at {res}° Resolution')
    plt.xlabel('Longitude')
    plt.ylabel('Latitude')
    plot_path = os.path.join(output_dir, f"scipy_{method}_regridded_{f.replace('.nc', '.png')}")
    plt.savefig(plot_path, dpi=600)
    plt.show()
    plt.close()
    print(f"Plot saved to {plot_path}")
2. xarray regridding:
import os

import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
# Needed only for the optional India mask below:
# import geopandas as gpd
# import regionmask

input_dir = 'era5_data/test/'
output_dir = 'era5_data/test/'
target_file = '1dec2024_2mtemp_era5.nc'
# india_shapefile_path = 'india_shp/india.shp'
os.makedirs(output_dir, exist_ok=True)

method = 'nearest'   # interpolation method passed to Dataset.interp
res = 0.01           # target spacing in degrees (~1 km)
lat_bounds = (4, 40)
lon_bounds = (68, 98)

new_lats = np.linspace(*lat_bounds, int(round((lat_bounds[1] - lat_bounds[0]) / res)) + 1)
new_lons = np.linspace(*lon_bounds, int(round((lon_bounds[1] - lon_bounds[0]) / res)) + 1)
# Target grid carried as labeled coordinates
target_grid = xr.Dataset(coords={"latitude": new_lats, "longitude": new_lons})

for f in os.listdir(input_dir):
    # Only regrid the single-day T2m file in this folder
    if f != target_file:
        continue
    file_path = os.path.join(input_dir, f)
    print(f"Processing file for regridding: {file_path}")

    ds = xr.open_dataset(file_path)
    # ERA5 latitudes run north to south, so slice from the upper bound down
    clipped_ds = ds.sel(latitude=slice(lat_bounds[1], lat_bounds[0]),
                        longitude=slice(lon_bounds[0], lon_bounds[1]))

    # Optional: blank out everything outside India before regridding
    # gdf = gpd.read_file(india_shapefile_path)
    # mask = regionmask.mask_geopandas(gdf, clipped_ds.longitude, clipped_ds.latitude)
    # masked_ds = clipped_ds.where(mask.notnull())
    masked_ds = clipped_ds

    # interp works along the labeled latitude/longitude dims and keeps variable attributes
    regridded_ds = masked_ds.interp(
        latitude=target_grid.latitude,
        longitude=target_grid.longitude,
        method=method,
    )

    # output_nc_path = os.path.join(output_dir, f"regridded_{f}")
    # regridded_ds.to_netcdf(output_nc_path)
    # print(f"Regridded data saved to {output_nc_path}")

    variable_name = list(regridded_ds.data_vars)[0]
    field = regridded_ds[variable_name]
    plt.figure(figsize=(12, 8))
    # Average over time only if the file actually has a time dimension
    (field.mean(dim='valid_time') if 'valid_time' in field.dims else field).plot(cmap='viridis')
    plt.title(f'xarray {method}: Regridded {variable_name} Data at {res}° Resolution')
    plt.xlabel('Longitude')
    plt.ylabel('Latitude')
    plot_path = os.path.join(output_dir, f"{method}_regridded_{f.replace('.nc', '.png')}")
    plt.savefig(plot_path, dpi=600)
    plt.show()
    plt.close()
    print(f"Plot saved to {plot_path}")
Result:
Below is the regridded output of the above dataset using SciPy and xarray:
1. Regridded using SciPy
2. Regridded using xarray
Comparison:
xarray is:
- Easy to use with labeled dimensions, making it beginner-friendly.
- Designed to retain metadata, preserving attributes like variable names and units.
- Built for large multidimensional datasets, and can work on data that does not fit in memory through its Dask integration.
SciPy is:
- Flexible: griddata works on scattered, irregularly spaced points, which suits non-standard use cases.
- Equipped with several interpolation methods for versatile applications.
- Slower for large datasets, and it does not retain metadata, so attributes have to be managed manually.
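To illustrate the irregular-grid point, here is a sketch that grids hypothetical scattered "stations" (random positions with a synthetic temperature, not real observations) onto a regular 0.5° grid with griddata. xarray's interp, by contrast, interpolates along a dataset's dimension coordinates, so it expects a rectilinear source grid:

```python
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(0)

# Hypothetical scattered stations: random (lon, lat) positions, synthetic temperature in K
station_lon = rng.uniform(68.0, 98.0, size=500)
station_lat = rng.uniform(4.0, 40.0, size=500)
station_t2m = 300.0 - 0.5 * (station_lat - 4.0)

# Regular 0.5° target grid
lon_grid, lat_grid = np.meshgrid(np.arange(68.0, 98.01, 0.5), np.arange(4.0, 40.01, 0.5))

# griddata accepts unstructured points given as a tuple of 1D coordinate arrays
gridded = griddata((station_lon, station_lat), station_t2m, (lon_grid, lat_grid), method="linear")

# Target cells outside the convex hull of the stations come back as NaN
print(np.isnan(gridded).sum(), "of", gridded.size, "cells lie outside the stations' hull")
```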
The major advantage of xarray over SciPy is that it retains metadata during interpolation. Here, metadata means auxiliary information that describes a dataset's structure and attributes, such as coordinate labels, variable names and units.
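A small side-by-side sketch of that difference, on a synthetic DataArray with made-up values (the name t2m and the attributes are assumptions for illustration):

```python
import numpy as np
import xarray as xr
from scipy.interpolate import griddata

lats = np.arange(10.0, 20.01, 1.0)
lons = np.arange(70.0, 80.01, 1.0)
da = xr.DataArray(
    np.random.default_rng(0).normal(300.0, 2.0, (lats.size, lons.size)),
    dims=("latitude", "longitude"),
    coords={"latitude": lats, "longitude": lons},
    attrs={"units": "K", "long_name": "2 metre temperature"},
    name="t2m",
)
new_lats = np.linspace(10.0, 20.0, 41)
new_lons = np.linspace(70.0, 80.0, 41)

# xarray: coordinates, name and attributes travel with the result
via_xarray = da.interp(latitude=new_lats, longitude=new_lons)
print(via_xarray.name, via_xarray.attrs)

# SciPy: a bare NumPy array with no coordinates, name or units attached
lon2d, lat2d = np.meshgrid(lons, lats)
lon_t, lat_t = np.meshgrid(new_lons, new_lats)
via_scipy = griddata((lon2d.ravel(), lat2d.ravel()), da.values.ravel(), (lon_t, lat_t), method="linear")
print(type(via_scipy).__name__, via_scipy.shape)
```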
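You will also notice that regridding with SciPy is slower than regridding with xarray. griddata does not know that the input is a regular grid: it first triangulates all the source points and then locates each target point within that triangulation, whereas xarray's interp works directly along the labeled latitude and longitude axes.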
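A rough way to check the speed difference on your own machine is to time both calls on the same synthetic grid (random values and a 0.05° target, both assumptions chosen so the example runs quickly). Timings depend on hardware and library versions, so no numbers are claimed here. Also note that the two "linear" methods are not identical: xarray interpolates bilinearly along the axes, while griddata interpolates within triangles, so the outputs differ slightly.

```python
import time

import numpy as np
import xarray as xr
from scipy.interpolate import griddata

# Synthetic 0.25° source grid over the post's bounds
lats = np.arange(40.0, 3.99, -0.25)
lons = np.arange(68.0, 98.01, 0.25)
da = xr.DataArray(
    np.random.default_rng(0).normal(300.0, 2.0, (lats.size, lons.size)),
    dims=("latitude", "longitude"),
    coords={"latitude": lats, "longitude": lons},
)
new_lats = np.linspace(4.0, 40.0, 721)
new_lons = np.linspace(68.0, 98.0, 601)

start = time.perf_counter()
via_xarray = da.interp(latitude=new_lats, longitude=new_lons, method="linear")
t_xarray = time.perf_counter() - start

start = time.perf_counter()
lon2d, lat2d = np.meshgrid(lons, lats)
lon_t, lat_t = np.meshgrid(new_lons, new_lats)
via_scipy = griddata((lon2d.ravel(), lat2d.ravel()), da.values.ravel(), (lon_t, lat_t), method="linear")
t_scipy = time.perf_counter() - start

print(f"xarray.interp: {t_xarray:.3f} s, scipy griddata: {t_scipy:.3f} s")
```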
Conclusion
Both SciPy and xarray are excellent tools for regridding, but the choice depends on your requirements:
- Use SciPy for irregular grids or custom workflows.
- Use xarray for simplicity and labeled data with metadata preservation.