I want to extract hyperlinks from a .pptx file. I know how to do it for Word documents, but does anyone know how to do it for PowerPoint?
For example, a slide contains the text below, and I want to get the URL https://stackoverflow.com/ :
Hello, stackoverflow
I wrote the following Python code to get the text:
from pptx import Presentation
from pptx.opc.constants import RELATIONSHIP_TYPE as RT

ppt = Presentation('data/ppt.pptx')
for i, sld in enumerate(ppt.slides, start=1):
    print(f'-- {i} --')
    for shp in sld.shapes:
        if shp.has_text_frame:
            print(shp.text)
But I only want to print the text and its URL when the text has a hyperlink.
In python-pptx, a hyperlink can appear on a Run, which I believe is what you're after. This means a given shape can contain zero or more hyperlinks. A hyperlink can also be set on a shape as a whole, so that clicking the shape follows the link. In that case, the text of the URL does not appear.
from pptx import Presentation

prs = Presentation('data/ppt.pptx')
for slide in prs.slides:
    for shape in slide.shapes:
        if not shape.has_text_frame:
            continue
        for paragraph in shape.text_frame.paragraphs:
            for run in paragraph.runs:
                address = run.hyperlink.address
                if address is None:
                    continue
                print(address)
The relevant sections of the documentation are here:
https://python-pptx.readthedocs.io/en/latest/api/text.html#run-objects
and here:
https://python-pptx.readthedocs.io/en/latest/api/action.html#hyperlink-objects