
I have two dataframes (df): df1 contains countries with the number of infections over time (2000+ rows), and df2 contains countries with their population (200 rows).

I have been trying to get the population number from df2 into df1 so that I can turn the infection counts into infection density (infections relative to population) over time.

In my mind, I have to iterate over the rows of df1 and, for each index, check the Country column against df2. If the result is True, I can copy the population from df2 to df1. I have tried multiple approaches (one is shown below) but am at a loss right now. Could someone give me a push in the right direction?

for index, row in df2.iterrows():
    # row[0] is the country name in df2; this only builds a True/False mask over df1
    df_test = df1['Country'].str.contains(row[0])
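For reference, the row-by-row lookup described above does not need an explicit loop: pandas can build a Country to Population lookup once and apply it to every row. A minimal sketch, assuming a column named "Country" in both frames and a "Population" column in df2 (the sample rows below are hypothetical, with population values taken from the df2 table further down):

```python
import pandas as pd

# Hypothetical sample data shaped like the question's df1 and df2
df1 = pd.DataFrame({
    "Country": ["Japan", "India", "Brazil", "India"],
    "Confirmed": [2.0, 3.0, 1.0, 5.0],
})
df2 = pd.DataFrame({
    "Country": ["India", "Brazil", "Indonesia"],
    "Population": [1.359321e09, 2.111999e08, 2.669119e08],
})

# Build a Country -> Population lookup once instead of looping over rows
population_by_country = df2.set_index("Country")["Population"]

# Series.map looks up each df1 country; countries absent from df2 become NaN
df1["Population"] = df1["Country"].map(population_by_country)
print(df1)
```

This requires each country to appear only once in df2, because the lookup Series is indexed by country name.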

Edit: updated with df1, df2 and the preferred outcome.

df1

   ObservationDate  Country/Region  Confirmed
0        -2.118978       Hong Kong        0.0
1        -2.118978           Japan        2.0
2        -2.118978           Macau        1.0
3        -2.118978  Mainland China      547.0
4        -2.118978     South Korea        1.0                  

df2

                 0             1
0             China  1.401580e+09
1             India  1.359321e+09
2  United States[c]  3.293798e+08
3         Indonesia  2.669119e+08
4            Brazil  2.111999e+08

df_preferred

   ObservationDate  Country/Region  Confirmed  Population
0        -2.118978       Hong Kong        0.0
1        -2.118978           Japan        2.0
2        -2.118978           Macau        1.0
3        -2.118978  Mainland China      547.0  1.401580e+09
4        -2.118978     South Korea        1.0  
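One way to get from the sample df1 and df2 above to df_preferred is a left merge after cleaning df2's country names. A minimal sketch, assuming df2's unnamed columns 0 and 1 hold the country and the population, and using a hypothetical alias dictionary to reconcile spellings such as "China" vs "Mainland China":

```python
import pandas as pd

df1 = pd.DataFrame({
    "ObservationDate": [-2.118978] * 5,
    "Country/Region": ["Hong Kong", "Japan", "Macau", "Mainland China", "South Korea"],
    "Confirmed": [0.0, 2.0, 1.0, 547.0, 1.0],
})
# df2 as shown: unnamed columns 0 (country) and 1 (population)
df2 = pd.DataFrame({
    0: ["China", "India", "United States[c]", "Indonesia", "Brazil"],
    1: [1.401580e09, 1.359321e09, 3.293798e08, 2.669119e08, 2.111999e08],
})

# Give df2 column names that line up with df1
pop = df2.rename(columns={0: "Country/Region", 1: "Population"})
# Strip footnote markers such as "[c]" left over from the source table
pop["Country/Region"] = pop["Country/Region"].str.replace(r"\[.*?\]", "", regex=True)
# Hypothetical alias table: rewrite df2's names into df1's spelling
aliases = {"China": "Mainland China"}
pop["Country/Region"] = pop["Country/Region"].replace(aliases)

# how="left" keeps every df1 row; countries without a match get NaN
df_preferred = df1.merge(pop, on="Country/Region", how="left")
print(df_preferred)
```

The alias dictionary has to be extended by hand for every name that is spelled differently in the two sources.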
  • You do not give enough information for me to write any code, but this looks like a use case for merge. Commented Mar 9, 2020 at 9:05
  • Can you update your question with the two data frames and the result you expect to get? Commented Mar 9, 2020 at 9:06

2 Answers


Assume that your two DataFrames, df1 and df2, look as follows:

  Country        Date  Infection
0   Aaaaa  2020-03-02         10
1   Aaaaa  2020-03-04         20
2   Bbbbb  2020-03-02         15
3   Bbbbb  2020-03-04         20
4   Ccccc  2020-03-02         12
5   Ccccc  2020-03-04         40

  Country  Population
0   Aaaaa    10000000
1   Bbbbb    35200000
2   Ccccc    48700000

Then, to merge them and save the result in another DataFrame, you can run:

df3 = df1.merge(df2, on='Country')

This gives:

  Country        Date  Infection  Population
0   Aaaaa  2020-03-02         10    10000000
1   Aaaaa  2020-03-04         20    10000000
2   Bbbbb  2020-03-02         15    35200000
3   Bbbbb  2020-03-04         20    35200000
4   Ccccc  2020-03-02         12    48700000
5   Ccccc  2020-03-04         40    48700000
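The merge above uses the default how="inner", so any df1 row whose country is missing from df2 is silently dropped. A minimal sketch that rebuilds the example frames, plus one hypothetical extra country "Ddddd" with no population entry, to show the difference with how="left":

```python
import pandas as pd

df1 = pd.DataFrame({
    "Country": ["Aaaaa", "Aaaaa", "Bbbbb", "Bbbbb", "Ccccc", "Ccccc", "Ddddd"],
    "Date": ["2020-03-02", "2020-03-04", "2020-03-02", "2020-03-04",
             "2020-03-02", "2020-03-04", "2020-03-02"],
    "Infection": [10, 20, 15, 20, 12, 40, 5],  # "Ddddd" row is hypothetical
})
df2 = pd.DataFrame({
    "Country": ["Aaaaa", "Bbbbb", "Ccccc"],
    "Population": [10000000, 35200000, 48700000],
})

# Default inner join: the "Ddddd" row disappears because df2 has no match
inner = df1.merge(df2, on="Country")
# Left join: every df1 row is kept, "Ddddd" gets NaN as its population
left = df1.merge(df2, on="Country", how="left")
print(inner)
print(left)
```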

And to compute the infection rate (infections per inhabitant), you can execute:

df3['InfectionRate'] = df3.Infection / df3.Population
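Because per-person rates are tiny numbers, they are often rescaled to cases per 100,000 inhabitants. A minimal sketch, assuming a df3 shaped like the merged result above (the rows are a subset of the example) and a hypothetical column name "Per100k":

```python
import pandas as pd

df3 = pd.DataFrame({
    "Country": ["Aaaaa", "Bbbbb", "Ccccc"],
    "Infection": [20, 20, 40],
    "Population": [10000000, 35200000, 48700000],
})

# Infections per person, as in the answer
df3["InfectionRate"] = df3["Infection"] / df3["Population"]
# Same quantity expressed as cases per 100,000 inhabitants
df3["Per100k"] = df3["InfectionRate"] * 100_000
print(df3)
```

For "Aaaaa" this gives 20 / 10,000,000 × 100,000 = 0.2 cases per 100,000 inhabitants.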

1 Comment

Amazing, this tackles most of my problems. Some countries are not merged, but that is because their names differ between the DataFrames. Thank you!
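To find which country names fail to match, merge's indicator=True option adds a "_merge" column that records where each row came from. A minimal sketch, assuming a shared "Country" column; the df1 rows are hypothetical, and the populations are taken from the question's df2:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Country": ["Mainland China", "India", "Brazil"],
    "Confirmed": [547.0, 3.0, 1.0],  # hypothetical counts
})
df2 = pd.DataFrame({
    "Country": ["China", "India", "Brazil"],
    "Population": [1.401580e09, 1.359321e09, 2.111999e08],
})

# Outer join keeps names from both sides; "_merge" is "both", "left_only" or "right_only"
check = df1.merge(df2, on="Country", how="outer", indicator=True)
# Names found in only one frame usually point to spelling differences
unmatched = check.loc[check["_merge"] != "both", ["Country", "_merge"]]
print(unmatched)
```

Here "Mainland China" (left_only) and "China" (right_only) show up together, which suggests they should be reconciled before the real merge.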

I think this will do the job:

import pandas as pd

data1 = {'Country': ['Germany', 'USA', 'Canada', 'UK'], 'Inf': [2, 5, 6, 8]}
data2 = {'Country': ['Germany', 'USA', 'Canada', 'UK'], 'popul': [80, 300, 30, 70]}
# Creating the dataframes
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
# Setting the index from the column Country
df2 = df2.set_index('Country')
df1 = df1.set_index('Country')
# Concatenating the dataframes side by side (axis=1), aligned on the Country index, without sorting
result = pd.concat([df1, df2], axis=1, sort=False)
print(result)
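Note that pd.concat along axis=1 aligns on the index with an outer join, so a country present in only one frame still appears, with NaN in the other column. DataFrame.join also aligns on the index but defaults to a left join. A minimal sketch; the "France" row and its population of 67 are hypothetical additions to illustrate the difference:

```python
import pandas as pd

df1 = pd.DataFrame({"Country": ["Germany", "USA", "Canada", "UK"],
                    "Inf": [2, 5, 6, 8]}).set_index("Country")
# Hypothetical: France has a population but no infection data; UK has no population
df2 = pd.DataFrame({"Country": ["Germany", "USA", "Canada", "France"],
                    "popul": [80, 300, 30, 67]}).set_index("Country")

# concat on axis=1: outer join on the index, keeps both UK and France
outer = pd.concat([df1, df2], axis=1, sort=False)
# join: left join on the index by default, keeps only df1's countries
left = df1.join(df2)
print(outer)
print(left)
```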
