Variable with Word's Content.Text differs from the same text written to a file with Set-Content; regex treats the contents differently

Evgeny Fishgalov
2024-11-21T12:02:02.6566667+00:00

```powershell
$documentText = @"
Frau
Anna Mustermanowa
Hauptstr. 1
48996 Ministadt
per beA
per Mail: anna.mustermanowa@example.com
AKTEN.NR:     SACHBEARBEITER/SEKRETARIAT    STÄDTL,
2904/24/SB    Sonja Bearbeinenko    +49 211 123190.00    21.11.2024
    Telefax:     +49 211 123190.00
    E-Mail: anwalt@ra.example.com
Superman ./. Mustermanowa
Worum es da so geht
Sehr geehrte Frau Mustermanowa,
"@

$Mandant = [regex]::match($documentText, '[^\r\n].*(?=\.\/\.)').Value
$Gegner = [regex]::match($documentText, '(?<=\.\/\.\s)[^\r\n]*').Value
$Az = [regex]::match($documentText, '\d{4}/\d{2}').Value

Write-Output "$Mandant"
Write-Output "./."
Write-Output "$Gegner"
Write-Output "$Az"
```

outputs:

```
Superman
./.
Mustermanowa
2904/24
```

whereas the following script, which reads the text directly from the active Word document,


```powershell
# Attach to the already running Word instance and read the active document's text
$wordApp = [Runtime.Interopservices.Marshal]::GetActiveObject('Word.Application')
$doc = $wordApp.ActiveDocument
$documentText = $doc.Content.Text

# Dump the text; debug.txt is the source of the here-string in the first example
Set-Content -Path "debug.txt" -Value $documentText -Encoding UTF8

$Mandant = [regex]::match($documentText, '[^\r\n].*(?=\.\/\.)').Value
$Gegner = [regex]::match($documentText, '(?<=\.\/\.\s)[^\r\n]*').Value
$Az = [regex]::match($documentText, '\d{4}/\d{2}').Value

Write-Output "$Mandant"
Write-Output "./."
Write-Output "$Gegner"
Write-Output "$Az"

[System.Runtime.Interopservices.Marshal]::ReleaseComObject($wordApp) | Out-Null
```

outputs:

```
Superman -Mail: anwalt@ra.example.com0.0049 211 123190.00       21.11.2024
./.
Mustermanowa
2904/24
```
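A likely explanation, sketched without Word: the Word object model returns paragraph marks in `Content.Text` as a bare carriage return (char 13), not as CR LF. In .NET regular expressions `.` matches every character except LF, so `.*` in the `$Mandant` pattern can run across all CR-separated paragraphs up to `./.`. When such a string is written to the console, every CR moves the cursor back to the start of the line and the paragraphs overwrite one another, which is consistent with the garbled first line above. The sample string is a shortened, hypothetical stand-in for the letter, assuming bare CR separators.

```powershell
# Hypothetical, shortened stand-in for $doc.Content.Text: paragraphs end with a bare CR (char 13).
$wordLike = "Frau`rAnna Mustermanowa`rSuperman ./. Mustermanowa`rWorum es da so geht`r"

# Same pattern as in the question.
$Mandant = [regex]::Match($wordLike, '[^\r\n].*(?=\.\/\.)').Value

# '.' does not stop at CR, so the match starts at "Frau" and spans three paragraphs.
$Mandant.Contains("`r")           # True
$Mandant.Replace("`r", '<CR>')    # makes the hidden CRs visible

# With LF separators, '.' stops at each line end and only "Superman " is matched.
$fileLike = $wordLike.Replace("`r", "`n")
[regex]::Match($fileLike, '[^\r\n].*(?=\.\/\.)').Value
```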

The here-string in the first example was generated from the second one with `Set-Content -Path "debug.txt" -Value $documentText -Encoding UTF8`, i.e. it is the content of debug.txt.
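Set-Content writes the string as it is (plus a trailing newline), so debug.txt most likely still contains the same control characters as the variable; the CR LF line structure of the here-string probably appeared only when the file was opened in an editor and pasted into the script. A quick way to check what the variable really holds is to count its control characters by code point. A minimal sketch; `$text` is a hypothetical stand-in (in the real script it would be `$doc.Content.Text`). Codes one might expect from Word are 13 (paragraph mark), 7 (end-of-cell marker, if the letterhead is a table), 11 (manual line break) and 9 (tab).

```powershell
# Hypothetical stand-in; in the real script use: $text = $doc.Content.Text
$text = "AKTEN.NR:`tSB`r`aFrau`vAnna`rSuperman ./. Mustermanowa`r"

# List every control character (code point below 32) with its number of occurrences.
$text.ToCharArray() |
    Where-Object { [int]$_ -lt 32 } |
    Group-Object { [int]$_ } |
    Sort-Object { [int]$_.Name } |
    Select-Object @{ Name = 'Code'; Expression = { [int]$_.Name } }, Count
```

For this stand-in the list shows codes 7, 9, 11 and 13 (the last one three times) and no 10 (LF), which is the pattern that makes `.*` behave differently from the pasted here-string.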

How do I get Content.Text into a variable with the same special characters and line-break structure that I get by writing it to a text file with Set-Content?
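One possible approach is to rewrite Word's control characters into ordinary CR LF line breaks right after reading `Content.Text`, so the variable has the same line structure as the pasted here-string. A minimal sketch; `ConvertFrom-WordText` is a hypothetical helper name, and the mapping (CR+BEL end-of-cell marker and VT manual line break become line breaks, leftover BEL is dropped, bare CR becomes CR LF) is an assumption about what this letter contains.

```powershell
function ConvertFrom-WordText {
    # Hypothetical helper: maps Word's Range.Text control characters to CR LF line breaks.
    param([string]$Text)

    $Text = $Text -replace '\r\a', "`r"       # end-of-cell marker (CR + BEL, char 7) -> paragraph mark
    $Text = $Text -replace '\a', ''           # any remaining BEL characters
    $Text = $Text -replace '\v', "`r"         # manual line break (Shift+Enter, char 11)
    $Text = $Text -replace '\r(?!\n)', "`r`n" # bare CR -> CR LF
    return $Text
}

# In the second sample this would be: $documentText = ConvertFrom-WordText -Text $doc.Content.Text
$normalized = ConvertFrom-WordText -Text "AKTEN.NR:`tSB`r`aFrau`vAnna`rSuperman ./. Mustermanowa`r"
[regex]::Match($normalized, '[^\r\n].*(?=\.\/\.)').Value
```

After normalization, the `$Mandant` pattern from the question matches only `Superman ` again, because `.*` now stops at the LF of every line.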

Basically, I want the regexes in the second code sample to behave the same way as in the first one.
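Alternatively, the patterns themselves can be written so that they never cross a line, whatever separator Word uses: instead of `.`, which in .NET excludes only LF, use a character class that excludes CR, LF, BEL, VT and FF. A minimal sketch, assuming these five characters are the only line-like separators in the document; `$notSep` and the sample string are hypothetical.

```powershell
# Characters that can end a "line" in Word's Range.Text (assumption):
# CR (13), LF (10), BEL (7, end-of-cell), VT (11, manual line break), FF (12, page break).
$notSep = '[^\r\n\a\v\f]'

$MandantPattern = "$notSep+(?=\./\.)"     # text directly before ./. on the same line
$GegnerPattern  = "(?<=\./\.\s)$notSep*"  # text after "./. " up to the end of the line

# Hypothetical stand-in for $doc.Content.Text with bare CR separators.
$text = "Frau`rAnna Mustermanowa`rSuperman ./. Mustermanowa`rWorum es da so geht`r"

$Mandant = [regex]::Match($text, $MandantPattern).Value
$Gegner  = [regex]::Match($text, $GegnerPattern).Value
$Mandant, $Gegner
```

The `$Az` pattern `\d{4}/\d{2}` does not depend on line ends and can stay as it is.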
