I am quite new to Python and was given the task of writing code that reads in a PDF (generated as output by a scientific instrument) and transforms it into a CSV.

The first page of the PDF contains the following table:

I wrote the code (please see below). It extracts the data well, but the data is read in a very unusual order: it reads in the first row with two entries first, and the columns in the order 1, 3, 4, 5, 6, 7, 8, 2, 9.

Does anyone have a suggestion for how I could adjust the code to make it read the data as a proper table? Or do I need to make the table in the PDF have lines around each cell for this to work?

`# importing required modules`

`# getting a specific page from the pdf file`

Many thanks in advance! :)

This is a common problem with PDFs and tabular text.

[Three images: left, the PDF's rendered layout; middle, the raw extraction order; right, what we expect (from clipboard output that uses the screen layout).]
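One way to get from the raw extraction order back to the screen layout is to ignore the order in which the text is stored and rebuild the table from word positions instead. The sketch below is only an illustration under stated assumptions, not the code from the question: it assumes each word is available as a dict with `text`, `x0` (left edge) and `top` (vertical position), which is the shape that position-aware extractors such as pdfplumber's `extract_words()` return. The sample coordinates, the function names and the `y_tol` tolerance are all hypothetical.

```python
import csv
import io


def words_to_rows(words, y_tol=3.0):
    """Group word boxes into visual rows by 'top', each sorted left to right by 'x0'."""
    rows = []
    for w in sorted(words, key=lambda w: (w["top"], w["x0"])):
        # A word joins the current row if it sits within y_tol of that row's first word.
        if rows and abs(w["top"] - rows[-1][0]["top"]) <= y_tol:
            rows[-1].append(w)
        else:
            rows.append([w])
    return [sorted(r, key=lambda w: w["x0"]) for r in rows]


def rows_to_table(rows):
    """Place each word under the nearest column; column positions come from the fullest row."""
    anchors = [w["x0"] for w in max(rows, key=len)]
    table = []
    for r in rows:
        cells = [""] * len(anchors)
        for w in r:
            nearest = min(range(len(anchors)), key=lambda i: abs(anchors[i] - w["x0"]))
            cells[nearest] = (cells[nearest] + " " + w["text"]).strip()
        table.append(cells)
    return table


def table_to_csv(table):
    """Serialise the table to CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(table)
    return buf.getvalue()


# Hypothetical word boxes, deliberately listed out of visual order.
SAMPLE_WORDS = [
    {"text": "A", "x0": 10, "top": 100},
    {"text": "C", "x0": 110, "top": 100.5},
    {"text": "B", "x0": 60, "top": 101},
    {"text": "1", "x0": 10, "top": 120},
    {"text": "3", "x0": 110, "top": 120},
]

table = rows_to_table(words_to_rows(SAMPLE_WORDS))
csv_text = table_to_csv(table)
print(csv_text)
```

Matching on `x0` assumes the columns are left-aligned. If the instrument right-aligns its numbers, matching on the right edge of each word is probably the better choice. Rows with missing entries, like the two-entry first row in the question, keep their cells under the correct columns because each word is assigned to the nearest column rather than to the next free one.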