
I have the following CSV data. In its columns, PID is the process ID and PPID is the parent process ID. I want to update my existing dataframe by adding a new column called PPIDName that holds the name of the parent process rather than its ID. How can I go about doing this?

Following is an example:

The PID of services.exe is 768, and the PPID of svchost.exe is 768 (which is services.exe). I want to add a new column so that every row shows the actual name of the parent process rather than its PPID.

"TreeDepth","PID","PPID","ImageFileName","Offset(V)","Threads","Handles","SessionId","Wow64","CreateTime","ExitTime"
1,768,632,"services.exe","0xac8190e52100",7,,0,False,"2021-04-01 05:05:01.000000 ", 
2,1164,768,"svchost.exe","0xac8191053340",3,,0,False,"2021-04-01 05:05:02.000000 ",
"TreeDepth","PID","PPID","ImageFileName","Offset(V)","Threads","Handles","SessionId","Wow64","CreateTime","ExitTime"
0,4,0,"System","0xac818d45d080",158,,,False,"2021-04-01 05:04:58.000000 ",
1,88,4,"Registry","0xac818d5ab040",4,,,False,"2021-04-01 05:04:54.000000 ",
1,404,4,"smss.exe","0xac818dea7040",2,,,False,"2021-04-01 05:04:58.000000 ",
0,556,548,"csrss.exe","0xac81900e4140",10,,0,False,"2021-04-01 05:05:00.000000 ",
0,632,548,"wininit.exe","0xac81901ee080",1,,0,False,"2021-04-01 05:05:00.000000 ",
1,768,632,"services.exe","0xac8190e52100",7,,0,False,"2021-04-01 05:05:01.000000 ",
2,1152,768,"svchost.exe","0xac8191034300",2,,0,False,"2021-04-01 05:05:02.000000 ",
2,2560,768,"svchost.exe","0xac8191485080",6,,0,False,"2021-04-01 05:05:03.000000 ",
2,1668,768,"svchost.exe","0xac8191238080",6,,0,False,"2021-04-01 05:05:03.000000 ",
2,1924,768,"svchost.exe","0xac819132b340",6,,0,False,"2021-04-01 05:05:03.000000 ",
2,908,768,"svchost.exe","0xac8190076080",1,,0,False,"2021-04-01 05:05:01.000000 ",
2,1164,768,"svchost.exe","0xac8191053340",3,,0,False,"2021-04-01 05:05:02.000000 ",
2,2956,768,"svchost.exe","0xac81915d5080",3,,0,False,"2021-04-01 05:05:04.000000 ",
2,652,768,"svchost.exe","0xac8194af2080",11,,0,False,"2021-04-05 21:59:50.000000 ",
2,1680,768,"svchost.exe","0xac819123a700",9,,0,False,"2021-04-01 05:05:03.000000 ",
2,1172,768,"svchost.exe","0xac8191055380",4,,0,False,"2021-04-01 05:05:02.000000 ",
2,2964,768,"svchost.exe","0xac819163e080",7,,0,False,"2021-04-01 05:05:04.000000 ",
2,4500,768,"svchost.exe","0xac8192760080",4,,0,False,"2021-04-01 05:48:25.000000 ",
2,2196,768,"svchost.exe","0xac8191ff0080",4,,0,False,"2021-04-02 01:20:04.000000 ",
2,2456,768,"svchost.exe","0xac8191333080",6,,0,False,"2021-04-01 05:05:03.000000 ",
2,1688,768,"svchost.exe","0xac819267c2c0",7,,0,False,"2021-04-01 05:48:24.000000 ",
2,1180,768,"svchost.exe","0xac8191058700",4,,0,False,"2021-04-01 05:05:02.000000 ",
2,2588,768,"spoolsv.exe","0xac81914db0c0",15,,0,False,"2021-04-01 05:05:03.000000 ",
2,2716,768,"svchost.exe","0xac8192615340",4,,2,False,"2021-04-01 05:48:24.000000 ",
  • What does your "existing dataframe" look like? Commented May 3, 2022 at 11:54
  • I didn't do any filtering. I just read the CSV into a dataframe, so it has the field names shown in the CSV above: dfprocs = pd.read_csv( args.path + '/PsTree.csv') Commented May 3, 2022 at 12:00
  • Could you add your expected output column for maybe the first few rows to your question? It's not clear what you're after Commented May 3, 2022 at 12:17
  • I updated my question to clarify further. Thanks. Commented May 3, 2022 at 12:19

2 Answers


I think I understand what you're after.

I've made a smaller df with only the relevant columns for my answer (so you can assume Another Col replaces all the other columns):

     PID  PPID ImageFileName  Another Col
0      4     0        System            1
1     88     4      Registry            2
2    404     4      smss.exe            3
3    556   548     csrss.exe            4
4    632   548   wininit.exe            5
                 ...

Firstly, I got all of the PIDs with their corresponding name, and removed any duplicates (if they exist):

df_PID = df[['PID', 'ImageFileName']].drop_duplicates()

     PID ImageFileName
0      4        System
1     88      Registry
2    404      smss.exe
3    556     csrss.exe
4    632   wininit.exe
5    768  services.exe
6   1152   svchost.exe
        ...

I then renamed these columns to PPID and PPIDName, to make it easier to merge onto the original df to get the desired result. That and the merge are below:

df_PID.columns = ['PPID', 'PPIDName']
df = df.merge(df_PID, on='PPID', how='left')

This gives the below output, which I think is what you want:

     PID  PPID ImageFileName  Another Col      PPIDName
0      4     0        System            1           NaN
1     88     4      Registry            2        System
2    404     4      smss.exe            3        System
3    556   548     csrss.exe            4           NaN
4    632   548   wininit.exe            5           NaN
5    768   632  services.exe            6   wininit.exe
6   1152   768   svchost.exe            7  services.exe
7   2560   768   svchost.exe            8  services.exe
8   1668   768   svchost.exe            9  services.exe
9   1924   768   svchost.exe           10  services.exe
                          ...
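
For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same steps. The five sample rows are copied from the question's CSV; the name `parents` and the choice to deduplicate on PID alone (so that a repeated PID cannot multiply rows in the merge) are my own assumptions, not something the question requires.

```python
import pandas as pd

# A few rows from the question's CSV, reduced to the relevant columns.
df = pd.DataFrame({
    "PID": [4, 88, 632, 768, 1152],
    "PPID": [0, 4, 548, 632, 768],
    "ImageFileName": ["System", "Registry", "wininit.exe",
                      "services.exe", "svchost.exe"],
})

# Lookup table with one row per PID, renamed to line up with the PPID column.
# Assumption: keeping the first name seen for a PID is acceptable.
parents = (
    df[["PID", "ImageFileName"]]
    .drop_duplicates(subset="PID")
    .rename(columns={"PID": "PPID", "ImageFileName": "PPIDName"})
)

# A left merge keeps every original row; parents that never appear
# as a PID (here 0 and 548) end up as NaN in PPIDName.
result = df.merge(parents, on="PPID", how="left")
print(result)
```

Rows whose parent is not present in the data keep a NaN in PPIDName, which matches the output shown above.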

1 Comment

Pretty good ! yep

This does the job:

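# Rows whose PID appears as some row's parent, indexed by PID
ppid_name = df.loc[df["PID"].isin(df["PPID"]), ["PID", "ImageFileName"]].set_index("PID", drop=False)
# Map each parent PID to a "PID_Name" label, e.g. 4 -> "4_System"
replace_with = (ppid_name["PID"].astype(str) + "_" + ppid_name["ImageFileName"]).to_dict()
# Overwrite PPID in place; parents with no matching PID keep their numeric ID
df["PPID"] = df["PPID"].replace(replace_with)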

Output:

   TreeDepth  PID      PPID  ImageFileName       Offset(V)  Threads  Handles  SessionId  Wow64                  CreateTime  ExitTime
0          0    4         0         System  0xac818d45d080      158      nan        nan  False  2021-04-01 05:04:58.000000       nan
1          1   88  4_System       Registry  0xac818d5ab040        4      nan        nan  False  2021-04-01 05:04:54.000000       nan
2          1  404  4_System       smss.exe  0xac818dea7040        2      nan        nan  False  2021-04-01 05:04:58.000000       nan
3          0  556       548      csrss.exe  0xac81900e4140       10      nan        0.0  False  2021-04-01 05:05:00.000000       nan
4          0  632       548    wininit.exe  0xac81901ee080        1      nan        0.0  False  2021-04-01 05:05:00.000000       nan
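
If you would rather keep PPID numeric and put the name in a separate column, a `Series.map` lookup is one option. This is only a sketch under a few assumptions: the sample rows are taken from the question, `pid_to_name` is a name I made up, and duplicate PIDs (if any) are resolved by keeping the first one.

```python
import pandas as pd

# Sample rows from the question's CSV, reduced to the relevant columns.
df = pd.DataFrame({
    "PID": [4, 88, 404, 556, 632, 768, 1152],
    "PPID": [0, 4, 4, 548, 548, 632, 768],
    "ImageFileName": ["System", "Registry", "smss.exe", "csrss.exe",
                      "wininit.exe", "services.exe", "svchost.exe"],
})

# Series indexed by PID whose values are process names.
# Dropping duplicate PIDs first keeps the index unique, which map() needs.
pid_to_name = df.drop_duplicates(subset="PID").set_index("PID")["ImageFileName"]

# Look up each row's PPID in that Series; unknown parents become NaN.
df["PPIDName"] = df["PPID"].map(pid_to_name)
print(df)
```

Unlike `replace`, this leaves the PPID column untouched, so it stays an integer column that can still be joined or filtered on.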

6 Comments

That is not what I'm looking for. For example, services.exe has the PID 768 and svchost.exe has the PPID 768. I want to add a new column so that every row shows the actual name of the parent process rather than its PPID: 1,768,632,"services.exe","0xac8190e52100",7,,0,False,"2021-04-01 05:05:01.000000 ", 2,1164,768,"svchost.exe","0xac8191053340",3,,0,False,"2021-04-01 05:05:02.000000 ",
@universepp Update the question itself so everyone can understand it more clearly.
@universepp From what I understand, you want the parent's name alongside the PPID for each row?
Yes, that is what I'm trying to do.
@universepp I have updated the answer. It should work now! And thanks for such an awesome question. It helped me in learning a few new things!
