
I am having a weird problem. I have a NumPy array whose columns correspond to different dates (listed in date). I also have a separate list, truncate_date, that holds one truncation date per row. I need to replace the values in each row with NaN wherever the column's date is earlier than that row's truncation date. Example below.

import numpy as np    
date = ['01-05-2020', '02-05-2020', '03-05-2020', '04-05-2020', '05-05-2020', '06-05-2020', '07-05-2020', '08-05-2020', '09-05-2020', '10-05-2020']
a = np.random.rand(4,10)
truncate_date = ['01-05-2020', '04-05-2020', '06-05-2020', '06-05-2020']

The output a I want would look like this:

[[0.954637  0.403668  0.63196   0.143053   0.86481   0.119429   0.266624  0.672866  0.902944  0.241125]
 [nan       nan       nan       0.0207699  0.165715  0.0354149  0.944116  0.759993  0.942923  0.56149 ]
 [nan       nan       nan       nan        nan       0.65055    0.948541  0.256155  0.207642  0.600534]
 [nan       nan       nan       nan        nan       0.431788   0.387213  0.285412  0.770842  0.657336]]

Unfortunately, I have no idea how to approach this, and I am not sure whether it can be done at all.

2 Answers


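A NumPy solution: parse the strings with datetime, then use np.greater.outer to compare every truncation date with every column date and build a boolean mask.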

import numpy as np
import datetime

date = [
    "01-05-2020",
    "02-05-2020",
    "03-05-2020",
    "04-05-2020",
    "05-05-2020",
    "06-05-2020",
    "07-05-2020",
    "08-05-2020",
    "09-05-2020",
    "10-05-2020",
]
a = np.random.rand(4, 10)
truncate_date = ["01-05-2020", "04-05-2020", "06-05-2020", "06-05-2020"]


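# Parse the day-month-year strings into datetime objects
date_in_datetime_format = np.array(
    [datetime.datetime.strptime(s, "%d-%m-%Y") for s in date]
)
truncate_date_in_datetime_format = np.array(
    [datetime.datetime.strptime(s, "%d-%m-%Y") for s in truncate_date]
)
# (4, 10) boolean mask: True where the row's truncation date is later
# than the column's date
nan_indices = np.greater.outer(
    truncate_date_in_datetime_format, date_in_datetime_format
)
# Overwrite the masked entries in place
a[nan_indices] = np.nan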
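
The same mask can probably also be built with NumPy's datetime64 type and broadcasting rather than an array of datetime objects. This is only a minimal sketch, assuming the strings are day-month-year as in the question. The helper to_datetime64 and the names dates, cutoffs and mask are made up for illustration and are not part of the answer above.

```python
import numpy as np
from datetime import datetime


def to_datetime64(strings, fmt="%d-%m-%Y"):
    # Hypothetical helper (not from the answer): rewrite each string in ISO
    # order so that NumPy can parse it as datetime64[D].
    iso = [datetime.strptime(s, fmt).strftime("%Y-%m-%d") for s in strings]
    return np.array(iso, dtype="datetime64[D]")


# Same sample data as in the question: 01-05-2020 .. 10-05-2020
date = [f"{day:02d}-05-2020" for day in range(1, 11)]
truncate_date = ["01-05-2020", "04-05-2020", "06-05-2020", "06-05-2020"]
a = np.random.rand(4, 10)

dates = to_datetime64(date)             # shape (10,)
cutoffs = to_datetime64(truncate_date)  # shape (4,)

# Broadcasting (4, 1) against (10,) gives a (4, 10) boolean mask that is
# True wherever the column date is earlier than the row's truncation date.
mask = cutoffs[:, None] > dates
a[mask] = np.nan
```

With the sample dates, the rows should end up with 0, 3, 5 and 5 leading NaN values respectively, matching the output requested in the question.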

Using your syntax:

import numpy as np
import pandas as pd

date_list = ['01-05-2020', '02-05-2020', '03-05-2020', '04-05-2020', '05-05-2020', '06-05-2020', '07-05-2020', '08-05-2020', '09-05-2020', '10-05-2020']
# The strings are day-month-year, so pass the format explicitly
date_list = pd.to_datetime(date_list, format='%d-%m-%Y')
truncate_date_list = ['01-05-2020', '04-05-2020', '06-05-2020', '06-05-2020']
truncate_date_list = pd.to_datetime(truncate_date_list, format='%d-%m-%Y')
value_matrix = np.random.rand(4, 10)

def vals_if_date_not_truncated(date_list, truncate_date_list,
                               value_matrix):
    results = []
    for value_row, truncate_date in zip(value_matrix, truncate_date_list):
        row = []
        for value, date in zip(value_row, date_list):
            if truncate_date <= date:
                row.append(value)
            else:
                row.append(np.nan)  # np.NaN was removed in NumPy 2.0
        results.append(row)
    return np.array(results)

results = vals_if_date_not_truncated(date_list, truncate_date_list, value_matrix)

print(results)
[[0.6085591  0.29623597 0.48222885 0.03307028 0.87412752 0.28812138
  0.10314832 0.63060118 0.58139836 0.47499239]
 [       nan        nan        nan 0.53583195 0.06113442 0.15332923
  0.24596896 0.97465439 0.64973568 0.83442661]
 [       nan        nan        nan        nan        nan 0.64793026
  0.77396558 0.58411891 0.31994605 0.50118944]
 [       nan        nan        nan        nan        nan 0.2483622
  0.06314673 0.12511539 0.02691487 0.57909995]]

pandas makes it easy to convert strings to dates and to compare two dates.
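
One thing to watch when converting: strings such as 04-05-2020 are ambiguous, and pd.to_datetime defaults to dayfirst=False, so it reads them month-first unless told otherwise. A small sketch of the two ways to state the day-month-year order, assuming that is the order intended in the question. The variable names are illustrative only.

```python
import pandas as pd

ambiguous = "04-05-2020"

# Option 1: give the exact format, so the string is read as 4 May 2020
explicit = pd.to_datetime(ambiguous, format="%d-%m-%Y")

# Option 2: keep automatic parsing but say that the day comes first
day_first = pd.to_datetime(ambiguous, dayfirst=True)

# Parsed dates are Timestamp objects and compare with the usual operators
is_on_or_after = pd.Timestamp(2020, 5, 1) <= explicit
```

Either option keeps the comparison consistent with the calendar dates the strings are meant to represent.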

zip lets a for loop iterate over two or more sequences in parallel, yielding one item from each on every pass.
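
As a small illustration of that pairing, here is a sketch with made-up sample values (not data from the question):

```python
values = [0.25, 0.50, 0.75]
labels = ["01-05-2020", "02-05-2020", "03-05-2020"]

pairs = []
# Each pass takes the next item from both lists at the same position
for value, label in zip(values, labels):
    pairs.append((label, value))
```

zip stops at the end of the shortest input, so the sequences should have the same length, as the value rows and the date list do here.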

Hope this helps.
