Remapping `numpy.array` with missing values

Question

I'm dealing with some large data sets (observations as a function of time) which are not continuous in time: a lot of data is missing, in the sense that complete records are absent. To make things fun, there are many data sets, all with missing records, all at random places...

I somehow need to get the data "synchronised" in time, with missing data flagged as missing data, instead of being completely absent. I've managed to get this partially working, but I'm still having some problems.

Example:

import numpy as np

# The date range (in the format that I'm dealing with), which I define
# myself for the period in which I'm interested
dc = np.arange(2010010100, 2010010106)

# Observation dates (d1) and values (v1)
d1  = np.array([2010010100, 2010010104, 2010010105]) # date
v1  = np.array([10,         11,         12        ]) # values

# Another data set with (partially) other times
d2  = np.array([2010010100, 2010010102, 2010010104]) # date
v2  = np.array([13,         14,         15        ]) # values

# For now set -1 as fill_value
v1_filled = -1 * np.ones_like(dc)
v2_filled = -1 * np.ones_like(dc)

v1_filled[dc.searchsorted(d1)] = v1
v2_filled[dc.searchsorted(d2)] = v2

This gives me the desired result:

v1_filled = [10 -1 -1 -1 11 12]
v2_filled = [13 -1 14 -1 15 -1]

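but only if every value in d1 or d2 also occurs in dc. If a value in d1 or d2 is not in dc, the code fails (the value is silently written to the wrong position, or an IndexError is raised when the returned index equals len(dc)), because searchsorted returns an insertion point rather than the position of a match: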

If there is no suitable index, return either 0 or N (where N is the length of a).
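To see why the value for date 0 ends up at the start, it helps to look at what searchsorted actually returns. A minimal sketch, reusing dc from the example; the probe array is hypothetical, chosen to have one date below the range, one present and one above it:

```python
import numpy as np

dc = np.arange(2010010100, 2010010106)  # full time axis from the question (sorted)

# Hypothetical probe: one date below the range, one present, one above the range
probe = np.array([0, 2010010102, 2010010110])

# searchsorted gives insertion points, not match positions:
# 0 for the value below the range, 2 for the exact match, len(dc) == 6 above the range
idx = dc.searchsorted(probe)

# An index is only a real match if it is in bounds and dc at that index equals the probe
safe = np.minimum(idx, len(dc) - 1)     # clip so that index 6 cannot raise IndexError
found = dc[safe] == probe
print(idx, found)
```

Applying `found` as a mask to both the dates and the values before the assignment would discard the unmatched observations.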

So for example, if I change d2 and v2 to:

d2  = np.array([2010010100, 2010010102, 2010010104, 0]) # date
v2  = np.array([13,         14,         15,         9999]) # values

The result is

[9999   -1   14   -1   15   -1]

In this case, because the date 0 in d2 is not in dc, that value should be discarded instead of being inserted at the start (or end). Any idea how to easily achieve that?

Yes, I was afraid of that. I have a bit of a love-hate relationship with Pandas; it seems to be very useful, but I also find it a bit difficult to get started with. – Bart, Jul 29, 2016 at 20:22

bpachev · Accepted Answer · 2016-07-29 21:48:13Z

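If you do d2 = np.intersect1d(dc, d2) before calling dc.searchsorted(d2), it will remove all elements in d2 that are not in dc. The values in v2 have to be filtered in the same way; otherwise v2 no longer has the same length as d2 and the assignment fails with a shape mismatch.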
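One way to filter the dates and the values together is the `return_indices=True` option of `np.intersect1d` (available since NumPy 1.15), which also returns the positions of the common elements in both inputs. This is a sketch on the question's data, not the answerer's exact code:

```python
import numpy as np

dc = np.arange(2010010100, 2010010106)                 # full (sorted) time axis
d2 = np.array([2010010100, 2010010102, 2010010104, 0])  # date 0 is not in dc
v2 = np.array([13, 14, 15, 9999])

# common: dates present in both; i_dc: their positions in dc; i_d2: their positions in d2
common, i_dc, i_d2 = np.intersect1d(dc, d2, return_indices=True)

v2_filled = np.full_like(dc, -1)   # -1 marks missing data, as in the question
v2_filled[i_dc] = v2[i_d2]         # values whose date is not in dc are never used
print(v2_filled)                   # [13 -1 14 -1 15 -1]
```

Because the indices come straight from the intersection, searchsorted is no longer needed, and d2 does not have to be sorted.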


Bart:

I ended up using a slightly different approach (compressing the masked arrays first to remove the masked values, since not all statistics routines work well with masked arrays), but intersect1d() was indeed the missing step...
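Bart's own code is not shown, so the following is only a sketch of what such an approach could look like, assuming each series is first placed on the dc grid as a masked array; `to_masked` is a hypothetical helper name. Compressing two series independently loses their time alignment, so the sketch first drops every time step that is missing in either series:

```python
import numpy as np

dc = np.arange(2010010100, 2010010106)
d1 = np.array([2010010100, 2010010104, 2010010105]); v1 = np.array([10, 11, 12])
d2 = np.array([2010010100, 2010010102, 2010010104, 0]); v2 = np.array([13, 14, 15, 9999])

def to_masked(dc, d, v):
    """Hypothetical helper: put (d, v) on the grid dc, masking absent time steps."""
    common, i_dc, i_d = np.intersect1d(dc, d, return_indices=True)
    data = np.zeros(dc.shape, dtype=v.dtype)
    mask = np.ones(dc.shape, dtype=bool)   # everything missing by default
    data[i_dc] = v[i_d]
    mask[i_dc] = False                     # unmask the time steps that were observed
    return np.ma.masked_array(data, mask=mask)

m1 = to_masked(dc, d1, v1)
m2 = to_masked(dc, d2, v2)

# Keep only time steps where both series have data, as plain ndarrays
both = ~(np.ma.getmaskarray(m1) | np.ma.getmaskarray(m2))
x = m1.data[both]
y = m2.data[both]
print(x, y)              # [10 11] [13 15]

# For a single series, compressed() returns the unmasked values as a 1-D ndarray
print(m1.compressed())   # [10 11 12]
```

x and y can then be passed to statistics routines that do not handle masked arrays.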
