1

I have below data summarized from many csv files.

+----------+-----+------+------+------+
|   Date   | Loc | Hour | Rain | Wind |
+----------+-----+------+------+------+
| 1-Sep-19 |   1 |    1 |  184 |   65 |
| 1-Sep-19 |   1 |    3 |  126 |   64 |
| 2-Sep-19 |   1 |    1 |  112 |   63 |
| 2-Sep-19 |   1 |    2 |  155 |    0 |
| 2-Sep-19 |   1 |    3 |  186 |   50 |
| 3-Sep-19 |   1 |    2 |  154 |   79 |
| 3-Sep-19 |   1 |    3 |  143 |   61 |
| 1-Sep-19 |   2 |    1 |  187 |   73 |
| 1-Sep-19 |   2 |    2 |  173 |   63 |
| 1-Sep-19 |   2 |    3 |  186 |   63 |
| 3-Sep-19 |   2 |    1 |  172 |   56 |
| 3-Sep-19 |   2 |    2 |  156 |   56 |
| 3-Sep-19 |   2 |    3 |  176 |   79 |
+----------+-----+------+------+------+

Where Random observations for Rain & Wind details of particular Day, Location & Hour is collected.

Note the missing lines in Day, Location & Hour.

I want to fill the missing data with 0 values. My desired final output is as below.

+----------+-----+------+------+------+
|   Date   | Loc | Hour | Rain | Wind |
+----------+-----+------+------+------+
| 1-Sep-19 |   1 |    1 |  184 |   65 |
| 1-Sep-19 |   1 |    2 |    0 |    0 |
| 1-Sep-19 |   1 |    3 |  126 |   64 |
| 2-Sep-19 |   1 |    1 |  112 |   63 |
| 2-Sep-19 |   1 |    2 |  155 |    0 |
| 2-Sep-19 |   1 |    3 |  186 |   50 |
| 3-Sep-19 |   1 |    1 |    0 |    0 |
| 3-Sep-19 |   1 |    2 |  154 |   79 |
| 3-Sep-19 |   1 |    3 |  143 |   61 |
| 1-Sep-19 |   2 |    1 |  187 |   73 |
| 1-Sep-19 |   2 |    2 |  173 |   63 |
| 1-Sep-19 |   2 |    3 |  186 |   63 |
| 2-Sep-19 |   2 |    1 |    0 |    0 |
| 2-Sep-19 |   2 |    2 |    0 |    0 |
| 2-Sep-19 |   2 |    3 |    0 |    0 |
| 3-Sep-19 |   2 |    1 |  172 |   56 |
| 3-Sep-19 |   2 |    2 |  156 |   56 |
| 3-Sep-19 |   2 |    3 |  176 |   79 |
+----------+-----+------+------+------+

So far i have tried the below code

    import pandas as pd
    import numpy as np

    df = pd.read_csv('data.csv')

    d1 = date(2019, 9, 1)
    d2 = date(2019, 9, 3)
    delta = d2 - d1

    # below indexes were created to make the code more dynamic & scalable easily 
    idx_date = [(d1 + timedelta(days=i)) for i in range(delta.days + 1)]
    idx_loc = np.arange(1,3)
    idx_hour = np.arange(1,4)

    cols =['Rain', 'Wind']

    df_filled = df.reindex(index=[idx_date,idx_loc,idx_hour], columns=cols, fill_value=0)

But, I get an error. how to solve this

1 Answer 1

1

First is necessary create MultiIndex so possible use reindex by all possible values:

df = df.set_index(['Date','Loc','Hour'])
mux = pd.MultiIndex.from_product(df.index.levels, names=df.index.names)
df = df.reindex(mux, fill_value=0).reset_index()
print (df)
      Date  Loc  Hour  Rain  Wind
0   1Sep19    1     1   184    65
1   1Sep19    1     2     0     0
2   1Sep19    1     3   126    64
3   1Sep19    2     1   187    73
4   1Sep19    2     2   173    63
5   1Sep19    2     3   186    63
6   2Sep19    1     1   112    63
7   2Sep19    1     2   155     0
8   2Sep19    1     3   186    50
9   2Sep19    2     1     0     0
10  2Sep19    2     2     0     0
11  2Sep19    2     3     0     0
12  3Sep19    1     1     0     0
13  3Sep19    1     2   154    79
14  3Sep19    1     3   143    61
15  3Sep19    2     1   172    56
16  3Sep19    2     2   156    56
17  3Sep19    2     3   176    79

Your solution should be changed:

df['Date'] = pd.to_datetime(df['Date'])

idx_date = pd.date_range('2019-09-01','2019-09-03')
idx_loc = np.arange(1,3)
idx_hour = np.arange(1,4)

cols =['Rain', 'Wind']
mux = pd.MultiIndex.from_product([idx_date,idx_loc,idx_hour], 
                                 names=['Date','Loc','Hour'])
df_filled = (df.set_index(['Date','Loc','Hour'])
               .reindex(index=mux, columns=cols, fill_value=0)
               .reset_index())
print (df_filled)
         Date  Loc  Hour  Rain  Wind
0  2019-09-01    1     1   184    65
1  2019-09-01    1     2     0     0
2  2019-09-01    1     3   126    64
3  2019-09-01    2     1   187    73
4  2019-09-01    2     2   173    63
5  2019-09-01    2     3   186    63
6  2019-09-02    1     1   112    63
7  2019-09-02    1     2   155     0
8  2019-09-02    1     3   186    50
9  2019-09-02    2     1     0     0
10 2019-09-02    2     2     0     0
11 2019-09-02    2     3     0     0
12 2019-09-03    1     1     0     0
13 2019-09-03    1     2   154    79
14 2019-09-03    1     3   143    61
15 2019-09-03    2     1   172    56
16 2019-09-03    2     2   156    56
17 2019-09-03    2     3   176    79
Sign up to request clarification or add additional context in comments.

1 Comment

can you help me on this code (stackoverflow.com/questions/63908535/…)?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.