Convert PDF or Word to Excel, repeating information
How can I use Power Automate to convert a pdf or word file with repeating information into Excel? I want to load components of the document into a database without having to copy / paste everything or worry about formatting. From here on, I'll use 'PDF' as default file type, but it could be either.
I have listed the progress I've made at the bottom, after the example
Example:
PDF has the following:
- Project #1 Clean House (3/2/2025-3/3/2025)
- Description: Sweep, mop, scrub bathroom, launder clothes & sheets
- Status: Complete
- Project #2 Work on Yard (3/4/2025-3/10/2025)
- Description: Pick up sticks, rake yard, cut down dead trees, reseed
- Status: In Progress
- Project #3 Start Garden (3/4/2025-3/31/2025)
- Description: Get materials, plant, water
- Status: Not started
Excel file should have some specified data headers, then pull the text between the specifier (eg Description: or Status:) and the next paragraph break:
Number | Title | StartDate | EndDate | Description | Status |
---|---|---|---|---|---|
1 | Clean House | 3/2/2025 | 3/3/2025 | Sweep, mop, scrub bathroom, launder clothes & sheets | Complete |
2 | Work on Yard | 3/4/2025 | 3/10/2025 | Pick up sticks, rake yard, cut down dead trees, reseed | In Progress |
3 | Start Garden | 3/4/2025 | 3/31/2025 | Get materials, plant, water | Not started |
I have trained an AI well enough that it can extract the first line and have used Power Automate to insert it into Excel, but I don't know how to get the AI to iterate the extraction for each Project Number. I might also have sub-projects, like a. Pick up sticks, b. Rake yard, c. Reseed, that I'd want to assign to someone. I'd expect the Number, Title, StartDate, and EndDate to repeat for each subproject line in Excel (example at end) (if the subs have different dates or other fields, I know how to add them in, I just need to know the repeating)
Additionally, my model will not select part of a string that isn't separated by spaces. Any suggestions on how to train it to do so? Example: My model gives #1 instead of 1, (3/2/2025 instead of 3/2/2025 without the parenthesis.
Example with subproject:
Number | Title | StartDate | EndDate | Description | Status | Sub | Assign |
---|---|---|---|---|---|---|---|
2 | Work on Yard | 3/4/2025 | 3/10/2025 | Pick up sticks, rake yard, cut down dead trees, reseed | In Progress | Sticks | Son |
2 | Work on Yard | 3/4/2025 | 3/10/2025 | Pick up sticks, rake yard, cut down dead trees, reseed | In Progress | Rake | Dad |
2 | Work on Yard | 3/4/2025 | 3/10/2025 | Pick up sticks, rake yard, cut down dead trees, reseed | In Progress | Cut Trees | Mom |