
I have created a DataFrame in Python:

import pandas as pd
d = {'col1': ["day1", "7:00", "8:00","9:00", "10:00", "11:00",
               "day2", "7:00", "8:00","9:00", "10:00", "11:00",
               "day3", "7:00", "8:00","9:00", "10:00", "11:00"],
      'col2': [0, 4.1, 3, 3.5, 45.1, 16.9,
               0, 6.5, 4, 9.8, 33.9, 19.8,
               0, 6.9, 2.5, 7, 81.1, 13.8]}
df = pd.DataFrame(data=d)
print(df)
  col1  col2
0    day1   0.0
1    7:00   4.1
2    8:00   3.0
3    9:00   3.5
4   10:00  45.1
5   11:00  16.9
6    day2   0.0
7    7:00   6.5
8    8:00   4.0
9    9:00   9.8
10  10:00  33.9
11  11:00  19.8
12   day3   0.0
13   7:00   6.9
14   8:00   2.5
15   9:00   7.0
16  10:00  81.1
17  11:00  13.8

I want to replace every time value in col1 with the day label that precedes it, for example:

 col1  col2
    0    day1   0.0
    1    day1   4.1
    2    day1   3.0
    3    day1   3.5
    4    day1  45.1
    5    day1  16.9
    6    day2   0.0
    7    day2   6.5
    8    day2   4.0
    9    day2   9.8
    10   day2  33.9
    11   day2  19.8
    12   day3   0.0
    13   day3   6.9
    14   day3   2.5
    15   day3   7.0
    16   day3  81.1
    17   day3  13.8

This is just a sample data set, so I am hoping for a general answer that would also work on, say, a data set covering 1000 days.

  • This is a marked improvement in your question quality. Well done, keep it up. Commented Jan 9, 2018 at 22:19
  • @cᴏʟᴅsᴘᴇᴇᴅ Thanks! Commented Jan 9, 2018 at 22:36

2 Answers

df.col1 = df.col1.where(df.col1.str.isalnum()).ffill()
df
Out[242]: 
    col1  col2
0   day1   0.0
1   day1   4.1
2   day1   3.0
3   day1   3.5
4   day1  45.1
5   day1  16.9
6   day2   0.0
7   day2   6.5
8   day2   4.0
9   day2   9.8
10  day2  33.9
11  day2  19.8
12  day3   0.0
13  day3   6.9
14  day3   2.5
15  day3   7.0
16  day3  81.1
17  day3  13.8
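
To make the one-liner above easier to follow, here is a minimal sketch that splits it into its three steps on a shortened copy of the sample data. The intermediate names (is_day, only_days, filled) are only illustrative and not part of the original answer.

```python
import pandas as pd

# Shortened version of the sample data from the question
df = pd.DataFrame({
    "col1": ["day1", "7:00", "8:00", "day2", "7:00", "8:00"],
    "col2": [0, 4.1, 3, 0, 6.5, 4],
})

# Step 1: True for purely alphanumeric labels ("day1"), False for "7:00" (contains ":")
is_day = df["col1"].str.isalnum()

# Step 2: keep the day labels and turn every other entry into NaN
only_days = df["col1"].where(is_day)

# Step 3: forward fill, so each NaN takes the most recent day label above it
filled = only_days.ffill()

print(filled.tolist())
# ['day1', 'day1', 'day1', 'day2', 'day2', 'day2']
```

Note that isalnum() only separates the two kinds of rows here because the times contain a colon. If the time values could themselves be alphanumeric (for example "700"), a more explicit test such as str.startswith('day') is probably the safer choice.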

2 Comments

Or df.col1 = df.col1.where(df.col1.str.startswith('day')).ffill().
@cᴏʟᴅsᴘᴇᴇᴅ Yes ;-) Nice alternative!

Try discarding the timestamps and forward-filling:

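import numpy as np

# Remove timestamps: every row that is not a day label becomes NaN
discard_mask = ~df.col1.str.startswith('day')
df.loc[discard_mask, "col1"] = np.nan

# Forward fill; ffill() returns a new DataFrame, so assign it back
df = df.ffill()
print(df)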

#     col1  col2
# 0   day1   0.0
# 1   day1   4.1
# 2   day1   3.0
# 3   day1   3.5
# 4   day1  45.1
# 5   day1  16.9
# 6   day2   0.0
# 7   day2   6.5
# 8   day2   4.0
# 9   day2   9.8
# 10  day2  33.9
# 11  day2  19.8
# 12  day3   0.0
# 13  day3   6.9
# 14  day3   2.5
# 15  day3   7.0
# 16  day3  81.1
# 17  day3  13.8
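
Since the question asks for something that scales to around 1000 days, the same mask-and-forward-fill idea could be wrapped in a small helper. This is only a sketch: the function name fill_day_labels, its parameters, and the synthetic 1000-day data are assumptions made for illustration, and it assumes every day label starts with the given prefix.

```python
import pandas as pd

def fill_day_labels(df, column="col1", prefix="day"):
    """Return a copy of df in which non-day rows of `column` carry the preceding day label."""
    out = df.copy()
    is_day = out[column].str.startswith(prefix)
    # Blank out the non-day rows, then propagate each day label downward
    out[column] = out[column].where(is_day).ffill()
    return out

# Synthetic data (hypothetical): 1000 day labels, each followed by five time values
times = ["7:00", "8:00", "9:00", "10:00", "11:00"]
labels = []
for i in range(1, 1001):
    labels.append(f"day{i}")
    labels.extend(times)
big = pd.DataFrame({"col1": labels, "col2": range(len(labels))})

result = fill_day_labels(big)
print(result["col1"].nunique())  # 1000
```

Working on a copy leaves the original frame untouched, and filling only the one column avoids forward-filling unrelated NaN values elsewhere in the frame.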

3 Comments

What does ~df do?
@APorter1031 The ~ negates the boolean mask element-wise (i.e., it acts as a logical NOT operator).
Negating a boolean Series: ~[True, True, False] gives [False, False, True].
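
A quick way to check this for yourself is shown below. It is a small sketch on made-up values; the variable names mask and col are only for illustration.

```python
import pandas as pd

# ~ flips each element of a boolean Series
mask = pd.Series([True, True, False])
print((~mask).tolist())
# [False, False, True]

# Applied to the answer's mask: True marks rows that do NOT start with "day"
col = pd.Series(["day1", "7:00", "8:00"])
discard_mask = ~col.str.startswith("day")
print(discard_mask.tolist())
# [False, True, True]
```

Keep in mind that ~ only behaves like "not" on boolean arrays and Series. On a plain Python list it raises a TypeError, and on a plain bool such as True it gives an integer (-2) rather than False.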
