It is unclear what your data looks like, but yes, xarray might be what you are looking for.
Once your data is well-formatted as a DataArray, you can then just do:
da.resample(time="1h")
It will return a DataArrayResample object.
Usually, when resampling, the new coordinate grid doesn't match the original grid.
From there, you need to apply one of the many methods of the DataArrayResample object to tell xarray how to fill the new grid.
For example, you may want to interpolate values using the original data as knots:
da.resample(time="1h").interpolate("linear")
But you can also backfill, pad, take the nearest values, and so on.
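For instance, here is an up-sampling sketch (the 15-minute target frequency is illustrative, assuming half-hourly data like the example built below):
>>> da.resample(time="15min").pad()       # propagate each value forward
>>> da.resample(time="15min").backfill()  # propagate each value backward
>>> da.resample(time="15min").nearest()   # pick the closest original value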
If you don't want to fill the new grid, use .asfreq() and new times will be set to NaN. You'll still be able to interpolate later using interpolate_na().
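A sketch of that deferred-filling workflow (the regridded name and the 15-minute frequency are illustrative):
>>> regridded = da.resample(time="15min").asfreq()  # unmatched new times are NaN
>>> regridded.interpolate_na(dim="time", method="linear")  # fill them whenever you like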
Your case
In your case, it seems that you are down-sampling, so the new grid coordinates match the original grid coordinates exactly.
So any of .nearest(), .asfreq(), or .interpolate() will work for you (note that .interpolate() converts int to float).
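For instance, with the integer-valued da built in the example below, you can check that dtype change:
>>> da.resample(time="1h").interpolate("linear").dtype
dtype('float64')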
However, since you are down-sampling at exact grid knots, what you are really doing is selecting a subset of your array, so you might want to use the .sel() method instead.
Example
An example of down-sampling on exact grid knots.
Create the data:
>>> import string
>>> import numpy as np
>>> import pandas as pd
>>> import xarray as xr
>>> dims = ("time", "features")
>>> sizes = (6, 3)
>>> h_step = 0.5
>>> da = xr.DataArray(
...     dims=dims,
...     data=np.arange(np.prod(sizes)).reshape(*sizes),
...     coords=dict(
...         time=pd.date_range(
...             "04/07/2020",
...             periods=sizes[0],
...             freq=pd.DateOffset(hours=h_step),
...         ),
...         features=list(string.ascii_uppercase[: sizes[1]]),
...     ),
... )
>>> da
<xarray.DataArray (time: 6, features: 3)>
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14],
       [15, 16, 17]])
Coordinates:
  * time      (time) datetime64[ns] 2020-04-07 ... 2020-04-07T02:30:00
  * features  (features) <U1 'A' 'B' 'C'
>>> da.time.values
array(['2020-04-07T00:00:00.000000000',
       '2020-04-07T00:30:00.000000000',
       '2020-04-07T01:00:00.000000000',
       '2020-04-07T01:30:00.000000000',
       '2020-04-07T02:00:00.000000000',
       '2020-04-07T02:30:00.000000000'],
      dtype='datetime64[ns]')
Down-sampling using .resample() and .nearest():
>>> da.resample(time="1h").nearest()
<xarray.DataArray (time: 3, features: 3)>
array([[ 0,  1,  2],
       [ 6,  7,  8],
       [12, 13, 14]])
Coordinates:
  * time      (time) datetime64[ns] 2020-04-07 ... 2020-04-07T02:00:00
  * features  (features) <U1 'A' 'B' 'C'
>>> da.resample(time="1h").nearest().time.values
array(['2020-04-07T00:00:00.000000000',
       '2020-04-07T01:00:00.000000000',
       '2020-04-07T02:00:00.000000000'],
      dtype='datetime64[ns]')
Down-sampling by selection:
>>> dwn_step = 2
>>> new_time = pd.date_range(
...     "04/07/2020",
...     periods=sizes[0] // dwn_step,
...     freq=pd.DateOffset(hours=h_step * dwn_step),
... )
>>> da.sel(time=new_time)
<xarray.DataArray (time: 3, features: 3)>
array([[ 0,  1,  2],
       [ 6,  7,  8],
       [12, 13, 14]])
Coordinates:
  * time      (time) datetime64[ns] 2020-04-07 ... 2020-04-07T02:00:00
  * features  (features) <U1 'A' 'B' 'C'
>>> da.sel(time=new_time).time.values
array(['2020-04-07T00:00:00.000000000',
       '2020-04-07T01:00:00.000000000',
       '2020-04-07T02:00:00.000000000'],
      dtype='datetime64[ns]')
Another option to create the new_time index is simply:
new_time = da.time[::dwn_step]
It is more straightforward, but you can't choose the first selected time (which can be either good or bad, depending on your case).
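If you want the slicing convenience but still control over the starting point, .isel() with a slice is a possible middle ground (a sketch, assuming the same da and dwn_step as above):
>>> da.isel(time=slice(1, None, dwn_step))  # every dwn_step-th time, starting at index 1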