I have a text file containing the list of packages (name, id, current version, new version, source) produced by winget upgrade. I removed the first two lines and the last line of the output.
Content of the text file:
Brave Brave.Brave 111.1.49.120 111.1.49.128 winget
Git Git.Git 2.39.2 2.40.0 winget
Notepad++ (64-bit x64) Notepad++.Notepad++ 8.5 8.5.1 winget
Spotify Spotify.Spotify 1.2.7.1277.g2b3ce637 1.2.8.907.g36fbeacc winget
Teams Machine-Wide Installer Microsoft.Teams 1.5.0.30767 1.6.00.4472 winget
PDFsam Basic PDFsam.PDFsam 5.0.3.0 5.1.1.0 winget
I am trying to use Python 3 to extract all package IDs, because the output of winget upgrade is plain text only.
What I have tried so far:
import re
with open(r"C:\Users\Username\Desktop\winget_upgrade.txt", "r") as f:
    for line in f:
        match = re.search(r"\b([a-zA-Z]+[a-zA-Z0-9!@#$%^&*()+\-.]*\.[a-zA-Z]+[a-zA-Z0-9!@#$%^&*()+\-.]*\+*)\b", line)
        if match:
            print(match.group(1))
The output is:
Brave.Brave
Git.Git
Notepad++.Notepad
Spotify.Spotify
Microsoft.Teams
PDFsam.PDFsam
The problem is that the Notepad++ ID is missing its two trailing + characters. How can I change my regex so that it prints:
Notepad++.Notepad++
instead of Notepad++.Notepad?
I think I need to change something in the + part of the pattern: ()+\-.]*\+*)
But I am not sure what.
Can you help me?
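The problem is caused by the trailing \b. A word boundary only exists between a word character (letter, digit or underscore) and a non-word character, or at the start or end of the string. Both + and the space that follows it are non-word characters, so there is no boundary after Notepad++, and the regex engine backtracks until it finds one between "d" and the first "+".
Use the lookahead (?=\s) instead, which only requires that whitespace follows the match: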
import re
lines = [
    'Brave Brave.Brave 111.1.49.120 111.1.49.128 winget',
    'Git Git.Git 2.39.2 2.40.0 winget',
    'Notepad++ (64-bit x64) Notepad++.Notepad++ 8.5 8.5.1 winget',
    'Spotify Spotify.Spotify 1.2.7.1277.g2b3ce637 1.2.8.907.g36fbeacc winget',
    'Teams Machine-Wide Installer Microsoft.Teams 1.5.0.30767 1.6.00.4472 winget',
    'PDFsam Basic PDFsam.PDFsam 5.0.3.0 5.1.1.0 winget',
]
for line in lines:
    # (?=\s) checks that whitespace follows the id without consuming it
    match = re.search(r"\b([a-zA-Z]+[a-zA-Z0-9!@#$%^&*()+\-.]*\.[a-zA-Z]+[a-zA-Z0-9!@#$%^&*()+\-.]*\+*)(?=\s)", line)
    if match:
        print(match.group(1))
Output:
Brave.Brave
Git.Git
Notepad++.Notepad++
Spotify.Spotify
Microsoft.Teams
PDFsam.PDFsam
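A minimal check of the word-boundary explanation. To keep it short, it uses a simplified character class [\w+.] instead of the full pattern from the question; that simplification is an assumption of this sketch, but the trailing \b versus (?=\s) behaviour is the same.

```python
import re

# \b matches only where a word character (\w) meets a non-word character,
# or at the start/end of the string. "+" and " " are both non-word
# characters, so there is no boundary between them.
text = "Notepad++.Notepad++ 8.5"

# Trailing \b: the greedy [\w+.]+ first grabs "Notepad++.Notepad++", finds
# no boundary before the space, and backtracks to the boundary between
# "d" and the first trailing "+".
with_boundary = re.search(r"\b[\w+.]+\b", text).group()

# Lookahead (?=\s): only requires whitespace right after the match.
with_lookahead = re.search(r"\b[\w+.]+(?=\s)", text).group()

print(with_boundary)   # Notepad++.Notepad
print(with_lookahead)  # Notepad++.Notepad++
```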
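If a regex is not strictly required, a different technique (not a fix to the regex above) is to split each line from the right, since the id is always followed by exactly three more columns. This sketch assumes that the id, both versions and the source never contain a space; if winget ever prints a version value containing a space, it would return the wrong token. The helper name package_id is made up for illustration.

```python
# Sample lines copied from the question.
lines = [
    'Brave Brave.Brave 111.1.49.120 111.1.49.128 winget',
    'Git Git.Git 2.39.2 2.40.0 winget',
    'Notepad++ (64-bit x64) Notepad++.Notepad++ 8.5 8.5.1 winget',
    'Spotify Spotify.Spotify 1.2.7.1277.g2b3ce637 1.2.8.907.g36fbeacc winget',
    'Teams Machine-Wide Installer Microsoft.Teams 1.5.0.30767 1.6.00.4472 winget',
    'PDFsam Basic PDFsam.PDFsam 5.0.3.0 5.1.1.0 winget',
]


def package_id(line: str) -> str:
    """Hypothetical helper: return the id column of one winget upgrade line.

    rsplit(maxsplit=4) splits on runs of whitespace from the right, so the
    (possibly multi-word) name stays together in the first element.
    Raises ValueError if the line has fewer than five whitespace-separated parts.
    """
    _name, pkg_id, _current, _available, _source = line.rsplit(maxsplit=4)
    return pkg_id


ids = [package_id(line) for line in lines]
for pkg in ids:
    print(pkg)
```

When reading from the file instead, the same helper can be called on each line from f; rsplit() with no separator also discards the trailing newline.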