
LangChain UnstructuredURLLoader shows "libmagic is unavailable"


I am trying to use UnstructuredURLLoader, but I get a "libmagic is unavailable" warning and neither URL is loaded.

I have:

Code:

from langchain.document_loaders import UnstructuredURLLoader

# Fetch and parse both article pages.
loader = UnstructuredURLLoader(
    urls=[
        "https://www.moneycontrol.com/news/business/banks/hdfc-bank-re-appoints-sanmoy-chakrabarti-as-chief-risk-officer-11259771.html",
        "https://www.moneycontrol.com/news/business/markets/market-corrects-post-rbi-ups-inflation-forecast-icrr-bet-on-these-top-10-rate-sensitive-stocks-ideas-11142611.html"
    ]
)
data = loader.load()
len(data)  # number of documents loaded

Error:

libmagic is unavailable but assists in filetype detection on file-like objects. Please consider installing libmagic for better results.
Error fetching or processing https://www.moneycontrol.com/news/business/banks/hdfc-bank-re-appoints-sanmoy-chakrabarti-as-chief-risk-officer-11259771.html, exception: Invalid file. The FileType.UNK file type is not supported in partition.
libmagic is unavailable but assists in filetype detection on file-like objects. Please consider installing libmagic for better results.
Error fetching or processing https://www.moneycontrol.com/news/business/markets/market-corrects-post-rbi-ups-inflation-forecast-icrr-bet-on-these-top-10-rate-sensitive-stocks-ideas-11142611.html, exception: Invalid file. The FileType.UNK file type is not supported in partition.

Solution

  • Resolution: Add the folder that contains libmagic.dll inside the virtual environment to the system environment variables (typically the Path variable on Windows).

    In my case: D:\ds_projects\code-basic-LLM-finance-domain\.venv\Lib\site-packages\magic\libmagic

    For others, it will likely be: your_path\.venv\Lib\site-packages\magic\libmagic
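
If editing the system variables is not convenient, the same idea can probably be tried from inside the script, before the loader is imported. This is only a minimal sketch, not a confirmed fix: `libmagic_dir` and `prepend_to_path` are hypothetical helper names, the folder layout is assumed to match the resolution above, and whether a process-local PATH change is enough depends on how the installed magic package locates the DLL.

```python
import os
import sys
from pathlib import Path


def libmagic_dir(venv_root: str) -> Path:
    """Build the folder from the resolution above: <venv>\\Lib\\site-packages\\magic\\libmagic.

    Assumption: the magic package ships libmagic.dll in this location.
    """
    return Path(venv_root) / "Lib" / "site-packages" / "magic" / "libmagic"


def prepend_to_path(folder: Path) -> bool:
    """Prepend folder to PATH for the current process only.

    Returns True if PATH was changed, False if the folder does not
    exist or is already listed.
    """
    if not folder.is_dir():
        return False
    entries = [e for e in os.environ.get("PATH", "").split(os.pathsep) if e]
    if str(folder) in entries:
        return False
    os.environ["PATH"] = os.pathsep.join([str(folder)] + entries)
    return True


if __name__ == "__main__":
    # sys.prefix is the root of the active virtual environment.
    changed = prepend_to_path(libmagic_dir(sys.prefix))
    print("PATH updated" if changed else "PATH left unchanged")
```

Run this before `from langchain.document_loaders import UnstructuredURLLoader`. The change lasts only for the current process, so the system-variable approach above remains the persistent option.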