I've seen a few related posts about the numpy module, etc., but I need to use the csv module, which should be enough for this. A lot has been written here on using the csv module, but I didn't quite find the answer I was looking for. Thanks so much in advance.
Essentially I have the following function/pseudocode:
    import csv

    def copy(inname, outname):
        infile = open(inname, "r")
        outfile = open(outname, "w")
        copying = False  # not copying yet
        # if the first string up to the first whitespace in the "name" column of a row
        # equals the first string up to the first whitespace in the "name" column of
        # the row directly below it AND the value in the "ID" column of the first row
        # does NOT equal the value in the "ID" column of the second row, copy these two
        # rows in full to a new table.
For example, if inname looks like this:
ID,NAME,YEAR,SPORTS_ALMANAC,NOTES
(first thousand rows)
1001,New York Mets,1900,ESPN
1002,New York Yankees,1920,Guiness
1003,Boston Red Sox,1918,ESPN
1004,Washington Nationals,2010
(final large amount of rows until last row)
1231231231235,Detroit Tigers,1990,ESPN
Then I want my output to look like:
ID,NAME,YEAR,SPORTS_ALMANAC,NOTES
1001,New York Mets,1900,ESPN
1002,New York Yankees,1920,Guiness
This is because "New" is the string up to the first whitespace in the NAME column of both rows, and the IDs are different. To be clear, I need the code to be as general as possible: a regular expression on "New" is not what I need, since the common first string could be any string. It also doesn't matter what comes after the first whitespace (i.e. "Washington Nationals" and "Washington DC" should still give me a hit, as should the New York examples above).
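To make the rule concrete, here is a minimal sketch of how it might look with the csv module. It assumes Python 3, that ID is the first column and NAME the second, and that "first string" means everything before the first whitespace. The helper names first_word and adjacent_matches are made up for illustration, and the default column positions are assumptions taken from the sample above.

```python
import csv


def first_word(name):
    """Return the text before the first whitespace, or "" for a blank name."""
    parts = name.split(None, 1)
    return parts[0] if parts else ""


def adjacent_matches(rows, id_col=0, name_col=1):
    """Yield rows that pair with a neighbour: same first word, different ID."""
    prev = None
    prev_written = False  # True if prev was already yielded as part of a pair
    for row in rows:
        if len(row) <= max(id_col, name_col):
            # Too short to compare (e.g. a blank line); it breaks adjacency.
            prev = None
            continue
        if (prev is not None
                and first_word(prev[name_col])
                and first_word(prev[name_col]) == first_word(row[name_col])
                and prev[id_col] != row[id_col]):
            if not prev_written:
                yield prev
            yield row
            prev_written = True
        else:
            prev_written = False
        prev = row


def copy(inname, outname, id_col=0, name_col=1):
    """Copy the header plus every matching adjacent pair to outname."""
    with open(inname, newline="") as infile, \
            open(outname, "w", newline="") as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        header = next(reader, None)
        if header is None:
            return  # empty input file, nothing to copy
        writer.writerow(header)
        writer.writerows(adjacent_matches(reader, id_col, name_col))
```

Only two rows are held in memory at a time, so this should cope with a very large input. A row that matches both the row above it and the row below it is written once, not twice.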
I'm confused because in R there is a way to do inname$name to easily look up values in a specific column. I tried writing my script in R first, but it got confusing, so I want to stick with Python.
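For what it's worth, the closest thing in the csv module to R's inname$name is probably csv.DictReader, which uses the header row as keys so each row can be indexed by column name. A small sketch, assuming Python 3 and using an in-memory io.StringIO as a stand-in for the real file:

```python
import csv
import io

# Stand-in for open(inname, newline=""); the data is copied from the sample above.
sample = io.StringIO(
    "ID,NAME,YEAR,SPORTS_ALMANAC,NOTES\n"
    "1001,New York Mets,1900,ESPN\n"
    "1002,New York Yankees,1920,Guiness\n"
)

reader = csv.DictReader(sample)  # the first line supplies the field names
names = [row["NAME"] for row in reader]
print(names)  # ['New York Mets', 'New York Yankees']
```

Rows that are shorter than the header (such as those with no NOTES value) get None for the missing fields by default.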
New York Yankees rows with different IDs and you want them all to have the same ID?