Convert PDF or Word to Excel, repeating information

Julie Waddle 0 Reputation points
2025-03-04T19:53:37.23+00:00

How can I use Power Automate to convert a pdf or word file with repeating information into Excel? I want to load components of the document into a database without having to copy / paste everything or worry about formatting. From here on, I'll use 'PDF' as default file type, but it could be either.

I have listed the progress I've made at the bottom, after the example

Example:
PDF has the following:

  • Project #1 Clean House (3/2/2025-3/3/2025)
  • Description: Sweep, mop, scrub bathroom, launder clothes & sheets
  • Status: Complete
  • Project #2 Work on Yard (3/4/2025-3/10/2025)
  • Description: Pick up sticks, rake yard, cut down dead trees, reseed
  • Status: In Progress
  • Project #3 Start Garden (3/4/2025-3/31/2025)
  • Description: Get materials, plant, water
  • Status: Not started

Excel file should have some specified data headers, then pull the text between the specifier (eg Description: or Status:) and the next paragraph break:

Number Title StartDate EndDate Description Status
1 Clean House 3/2/2025 3/3/2025 Sweep, mop, scrub bathroom, launder clothes & sheets Complete
2 Work on Yard 3/4/2025 3/10/2025 Pick up sticks, rake yard, cut down dead trees, reseed In Progress
3 Start Garden 3/4/2025 3/31/2025 Get materials, plant, water Not started

I have trained an AI well enough that it can extract the first line and have used Power Automate to insert it into Excel, but I don't know how to get the AI to iterate the extraction for each Project Number. I might also have sub-projects, like a. Pick up sticks, b. Rake yard, c. Reseed, that I'd want to assign to someone. I'd expect the Number, Title, StartDate, and EndDate to repeat for each subproject line in Excel (example at end) (if the subs have different dates or other fields, I know how to add them in, I just need to know the repeating)

Additionally, my model will not select part of a string that isn't separated by spaces. Any suggestions on how to train it to do so? Example: My model gives #1 instead of 1, (3/2/2025 instead of 3/2/2025 without the parenthesis.

Example with subproject:

Number Title StartDate EndDate Description Status Sub Assign
2 Work on Yard 3/4/2025 3/10/2025 Pick up sticks, rake yard, cut down dead trees, reseed In Progress Sticks Son
2 Work on Yard 3/4/2025 3/10/2025 Pick up sticks, rake yard, cut down dead trees, reseed In Progress Rake Dad
2 Work on Yard 3/4/2025 3/10/2025 Pick up sticks, rake yard, cut down dead trees, reseed In Progress Cut Trees Mom
Microsoft 365 Developer Program
Microsoft 365 Developer Program
Microsoft 365: Formerly Office 365, is a line of subscription services offered by Microsoft which adds to and includes the Microsoft Office product line.Developer Program: A Microsoft program designed to help developers build cross-platform productivity experiences by giving them access to a developer subscription, sample data packs, documentation, training, experts, recommendations, and community events.
63 questions
0 comments No comments
{count} votes

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.