Topic: Extracting number text from a pdf table

L3NTON

  • Newbie
  • *
  • Karma: +0/-0
  • Posts: 8
Extracting number text from a pdf table
« on: May 07, 2018, 07:02:42 AM »

I am trying to extract numbers from a table to save me typing them out.

I have tried "Export" > "Excel Workbook" > "Page Region", but it doesn't work correctly. It produces a table, but some numbers come out as letters and some as an image.

I have tried running OCR and then "PDF Content" > "Select Text", but that isn't perfect either, as it's a bit random what gets highlighted.

Any suggestions? The file I am trying to get numbers from is attached. Cheers

Steve

  • Administrator
  • Sr. Member
  • *****
  • Karma: +7/-0
  • Posts: 367
    • RevuHelp
Re: Extracting number text from a pdf table
« Reply #1 on: May 07, 2018, 08:44:11 AM »

I don't see any "pretty" way to do this.  Best I could do was:

1) Run OCR on the document
2) Crop the document down to one table
3) Right click and select "Select All Text" (if you didn't crop, it will select text in both tables and the title block.  Erase content might have worked too, to "delete" the text you don't want).
4) Paste into Excel

It looks like the data came in OK, though I didn't check it too closely for accuracy.  The table formatting is all gone, however.  Not sure which would be quicker: typing it all into Excel manually or rebuilding the table structure for the pasted data.
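If rebuilding the table by hand turns out to be the slow part, a short script might do the regrouping. The following is only a rough sketch, assuming the pasted text comes out as whitespace-separated values in row order and that every row has the same number of columns. The names `rebuild_table` and `to_csv` are made up for this illustration and have nothing to do with Revu itself.

```python
import csv
import io


def rebuild_table(pasted_text, n_columns):
    """Group whitespace-separated values into rows of n_columns each.

    Assumes the values were pasted in row order (left to right, top to
    bottom). A short final row is kept as-is so nothing is silently lost.
    """
    if n_columns < 1:
        raise ValueError("n_columns must be at least 1")
    tokens = pasted_text.split()
    return [tokens[i:i + n_columns] for i in range(0, len(tokens), n_columns)]


def to_csv(rows):
    """Render rows as CSV text that Excel can open directly."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = "1.0\n2.5\n3\n4\n5\n6"  # made-up values, one per line
    print(to_csv(rebuild_table(sample, 3)))
```

If the OCR text comes out in column order instead, or some cells are empty, the grouping would need adjusting, so it is worth spot-checking a few rows against the PDF.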

I've attached the Excel spreadsheet I got in doing these steps on the left table.
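Since the pasted values were not checked closely, and the earlier export turned some digits into letters, a quick pass that flags anything not looking like a number could narrow down the manual checking. This is a minimal sketch, assuming the table has already been read into a list of rows of strings. The function name `flag_non_numeric` and the number pattern are assumptions made for illustration, not part of any Revu or Excel feature.

```python
import re

# Assumed number format: optional sign, digits, optional groups separated
# by "." or "," (e.g. "12", "-3.5", "1,234.5"). Adjust to suit the table.
NUMBER = re.compile(r"[-+]?\d+(?:[.,]\d+)*")


def flag_non_numeric(rows):
    """Return (row_index, column_index, value) for cells that are not numbers."""
    flagged = []
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            if not NUMBER.fullmatch(cell.strip()):
                flagged.append((r, c, cell))
    return flagged


if __name__ == "__main__":
    # Made-up rows with typical OCR slips: letter O for zero, letter l for one.
    rows = [["12", "1O.5"], ["l2", "3.4"]]
    for r, c, value in flag_non_numeric(rows):
        print(f"row {r}, column {c}: check {value!r}")
```

This only catches cells containing letters or stray symbols. A digit misread as another digit would still pass, so the flagged cells are a starting point rather than a full accuracy check.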
Steve Jones
RevuHelp Forum Admin
steve@revuhelp.com
www.revuhelp.com
www.facebook.com/RevuHelp

L3NTON

Re: Extracting number text from a pdf table
« Reply #2 on: May 07, 2018, 04:51:08 PM »

Steve

Thank you for giving this a go. I think you're on the right lines with this; I'll have a tinker and see what results I can manage.

Cheers