We’d like to remind Forumites to please avoid political debate on the Forum.

This is to keep it a safe and useful space for MoneySaving discussions. Threads that are – or become – political in nature may be removed in line with the Forum’s rules. Thank you for your understanding.

📨 Have you signed up to the Forum's new Email Digest yet? Get a selection of trending threads sent straight to your inbox daily, weekly or monthly!

PDF to Excel

Robm1955
Robm1955 Posts: 553 Forumite
Part of the Furniture 100 Posts Photogenic
edited 17 May 2019 at 10:13AM in Techie Stuff
I know I have asked about Transpose before, but I'm stuggling with this one. I want to turn a pdf into an Excel spreadsheet, with the coloured text below, as 5 colums, but can't find the quickest way to do it. They seem to have changed the layout of the table/pdf this time.


FRANKENSTEINS PORTER 5.4 BLACK DB
Full favoured porter. Strawberries, liquorice, citrus and cofee
all present. Smooth to the touch with a strong mouth feel.
ABSTRACT JUNGLE (Langley Mill)
BLUE BOAR 4.5 RUBY MB1
Dark with a hoppy aftertaste.
STURDY IPA 5.6 PALE MB1
A classic style IPA using pale malt and lots of big hitting
American hops. Resinous pine and citrusfruits.

HARE OF THE HILL on hand pump 4.6 GOLDEN DB
Strawberries and cream. Hazy sunshine with a whisper of
[FONT=&quot]raspberry. A hoppy golden ale. [/FONT]
There are 14 pages, all set out like this. There is a heading for the brewery as well, but that doesn't appear on every line, so I can add that later. TIA.

Comments

  • John_Gray
    John_Gray Posts: 5,845 Forumite
    Part of the Furniture 1,000 Posts Name Dropper Photogenic
    You might make it a bit more intelligible if you surrounded the required text with the tags [ code] and [ /code] (without the blank space after each left-square-bracket).

    As it stands I have no idea where the five columns should be.
  • Cornucopia
    Cornucopia Posts: 16,513 Forumite
    Part of the Furniture 10,000 Posts Name Dropper Photogenic
    edited 17 May 2019 at 4:15PM
    I imagine the OP wants these 5 data items isolated into columns:-

    1/ FRANKENSTEINS PORTER
    2/ 5.4
    3/ BLACK
    4/ DB
    5/ Full favoured porter. Strawberries, liquorice, citrus and coffee all present. Smooth to the touch with a strong mouth feel.

    Unless there are usable separators between the column items that originate from the PDF, then I think it will be tricky to do it fully automatically.

    It might be possible to key onto the number (assuming it is consistent) to split columns 1, 2 & 3. Then maybe a list of the possible codes for item 4?

    edit: Using Regular Expression matching in Google Sheets is probably the easiest way.

    This formula matches everything up to a number that looks like X.X...
    =REGEXEXTRACT(B1,"(^.*)[0-9][.][0-9]")
    
  • Robm1955
    Robm1955 Posts: 553 Forumite
    Part of the Furniture 100 Posts Photogenic
    John_Gray wrote: »
    You might make it a bit more intelligible if you surrounded the required text with the tags [ code] and [ /code] (without the blank space after each left-square-bracket).

    As it stands I have no idea where the five columns should be.
    Sorry if I wasn't clear. Thanks Cornucopia, that's what I want.
  • Cornucopia
    Cornucopia Posts: 16,513 Forumite
    Part of the Furniture 10,000 Posts Name Dropper Photogenic
    If your source data comes in continuous lines then I think the idea above may be your best chance of splitting it without too much effort.

    As above:
    =REGEXEXTRACT(B1,"(^.*)[0-9][.][0-9]")
    
    will isolate the text in front of any number that is formatted as X.Y.

    If that text is placed in C1, then the formula below will isolate the number itself:
    =REGEXEXTRACT(B1,"^" & C1 & "([0-9][.][0-9])")
    

    This formula isolates everything after the number:-
    =REGEXEXTRACT(B1,"^" & C1 & "[0-9][.][0-9](.*$)")
    

    The next step depends on how many options there are for the code that is "DB" in the first item.
Meet your Ambassadors

🚀 Getting Started

Hi new member!

Our Getting Started Guide will help you get the most out of the Forum

Categories

  • All Categories
  • 351.7K Banking & Borrowing
  • 253.4K Reduce Debt & Boost Income
  • 454K Spending & Discounts
  • 244.7K Work, Benefits & Business
  • 600.1K Mortgages, Homes & Bills
  • 177.3K Life & Family
  • 258.4K Travel & Transport
  • 1.5M Hobbies & Leisure
  • 16.2K Discuss & Feedback
  • 37.6K Read-Only Boards

Is this how you want to be seen?

We see you are using a default avatar. It takes only a few seconds to pick a picture.