0

https://drive.google.com/open?id=1aQkJYojDNMjNjJYlggxbkTq-KmzALDDb

I have this file (citations.dmp) and im tryin' to insert the data separed by | into a mysql database using the following code:

import MySQLdb

file = open('citations.dmp', 'r').readlines()
list = []
for x in file:
    a = str(x.replace('\t', ''))
    a = str(a).split('|')
    a.pop(len(a) - 1)
    list.append(a)

db = MySQLdb.connect(
    host='127.0.0.1',
    user='root',
    passwd='',
    db='tururu'

)

c = db.cursor()


print('Inserting...')

query = """ INSERT INTO `citations` (`cit_id`,`cit_key`,`pubmed_id`,`medline_id`,`url`,`text`,`taxid_list`)
            VALUES (%s,%s,%s,%s,%s,%s,%s)           
        """

c.executemany(query, list)
db.commit()

db.close()

the table has the following format:

CREATE TABLE `citations` (
                  `cit_id` VARCHAR(200) NULL,
                  `cit_key` VARCHAR(200) NULL,
                  `pubmed_id` VARCHAR(200) NULL,
                  `medline_id` VARCHAR(200) NULL,
                  `url` LONGTEXT NULL,
                  `text` LONGTEXT NULL,
                  `taxid_list` LONGTEXT NULL);

For some reason i recieve the following error everytime:

Traceback (most recent call last):
  File "C:/Users/lucas/PycharmProjects/bruno/tst.py", line 27, in <module>
    c.executemany(query, list)
  File "C:\ProgramData\Anaconda3\lib\site-packages\MySQLdb\cursors.py", line 281, in executemany
    self._get_db().encoding)
  File "C:\ProgramData\Anaconda3\lib\site-packages\MySQLdb\cursors.py", line 306, in _do_execute_many
    v = values % escape(arg, conn)

TypeError: not all arguments converted during string formatting

Can you help me? i'm in the last 3 days trying to fix it.

3
  • The google docs link you provided is not public, can you fix this? Wild guess is that when you read the file you end up with the wrong number of columns, but I would need to debug to be sure. Commented Nov 27, 2017 at 19:23
  • Link is working now, sorry. Thank you for help. Commented Nov 27, 2017 at 20:17
  • try to print your list and see if all the values are correct !!! Commented Nov 28, 2017 at 8:14

1 Answer 1

0

You have a problem on this line in your import file:

  5384    |       Associate Editor IJSEM (2001) (Trichlorobacter thiogenes)       |       0       |       0       |                     |       Associate Editor, IJSEM \"Validation List no. 78 (footnote ||).\" Int. J. Syst. Evol. Micr      obiol. (2001) 51:1-2. (Note: type strain information) |       115783  |

it splits into 9 fields. Your format string for the insert query expects only 7. You need to add some validation in the file parsing to make sure the sub-lists that you push into the big import list (list variable in your code) always have exactly 7 elements, and deal with the exceptions accordingly - if the line has too few columns, either ignore it or populate with defaults, if it has too many, decide which ones are correct.

I believe in this particular case the problem is caused by the incorrect assumption that the delimiter | can never appear in the string, as it does in footnote ||. You can solve it by adding some regex logic to filter out such abnormalities.

Sign up to request clarification or add additional context in comments.

1 Comment

Thank you very much for your time and support

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.