
UnstructuredURLLoader not able to see libmagic


I tried to use UnstructuredURLLoader as follows:

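from langchain.document_loaders import UnstructuredURLLoader

# Example input: the URL that appears in the error message further down
urls = ["https://wellfound.com/company/chorus-one"]

loader = UnstructuredURLLoader(urls=urls)
data = loader.load()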

but for some pages it reports:

libmagic is unavailable but assists in filetype detection on file-like objects. Please consider installing libmagic for better results.
Error fetching or processing https://wellfound.com/company/chorus-one, exception: Invalid file. The FileType.UNK file type is not supported in partition.

even though libmagic appears to be installed in my conda env:

%pip list | grep libmagic
libmagic                      1.0

However, I do not have python-libmagic. When I try to install it:

pip install python-libmagic

I keep getting this error:

Collecting python-libmagic
  Using cached python_libmagic-0.4.0-py3-none-any.whl
Collecting cffi==1.7.0 (from python-libmagic)
  Using cached cffi-1.7.0.tar.gz (400 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: pycparser in /opt/conda/envs/cho_env/lib/python3.10/site-packages (from cffi==1.7.0->python-libmagic) (2.21)
Building wheels for collected packages: cffi
  Building wheel for cffi (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [254 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-cpython-310
      creating build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/ffiplatform.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/cffi_opcode.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/verifier.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/commontypes.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/vengine_gen.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/setuptools_ext.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/vengine_cpy.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/recompiler.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/cparser.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/lock.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/backend_ctypes.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/__init__.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/model.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/api.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/_cffi_include.h -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/parse_c_type.h -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/_embedding.h -> build/lib.linux-x86_64-cpython-310/cffi
      running build_ext
      building '_cffi_backend' extension
      creating build/temp.linux-x86_64-cpython-310
      creating build/temp.linux-x86_64-cpython-310/c
      gcc -pthread -B /opt/conda/envs/cho_env/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/envs/cho_env/include -fPIC -O2 -isystem /opt/conda/envs/cho_env/include -fPIC -DUSE__THREAD -I/usr/include/ffi -I/usr/include/libffi -I/opt/conda/envs/cho_env/include/python3.10 -c c/_cffi_backend.c -o build/temp.linux-x86_64-cpython-310/c/_cffi_backend.o
      In file included from c/_cffi_backend.c:274:
      c/minibuffer.h: In function ‘mb_ass_slice’:
      c/minibuffer.h:66:5: warning: ‘PyObject_AsReadBuffer’ is deprecated [-Wdeprecated-declarations]
         66 |     if (PyObject_AsReadBuffer(other, &buffer, &buffer_len) < 0)
            |     ^~
      In file included from /opt/conda/envs/cho_env/include/python3.10/genobject.h:12,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:110,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/abstract.h:343:17: note: declared here
        343 | PyAPI_FUNC(int) PyObject_AsReadBuffer(PyObject *obj,
            |                 ^~~~~~~~~~~~~~~~~~~~~
      In file included from c/_cffi_backend.c:277:
      c/file_emulator.h: In function ‘PyFile_AsFile’:
      c/file_emulator.h:54:14: warning: assignment discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
         54 |         mode = PyText_AsUTF8(ob_mode);
            |              ^
      In file included from c/_cffi_backend.c:281:
      c/wchar_helper.h: In function ‘_my_PyUnicode_AsSingleWideChar’:
      c/wchar_helper.h:83:5: warning: ‘PyUnicode_AsUnicode’ is deprecated [-Wdeprecated-declarations]
         83 |     Py_UNICODE *u = PyUnicode_AS_UNICODE(unicode);
            |     ^~~~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:580:45: note: declared here
        580 | Py_DEPRECATED(3.3) PyAPI_FUNC(Py_UNICODE *) PyUnicode_AsUnicode(
            |                                             ^~~~~~~~~~~~~~~~~~~
      In file included from c/_cffi_backend.c:281:
      c/wchar_helper.h:84:5: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
         84 |     if (PyUnicode_GET_SIZE(unicode) == 1) {
            |     ^~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:446:26: note: declared here
        446 | static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
            |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~
      In file included from c/_cffi_backend.c:281:
      c/wchar_helper.h:84:5: warning: ‘PyUnicode_AsUnicode’ is deprecated [-Wdeprecated-declarations]
         84 |     if (PyUnicode_GET_SIZE(unicode) == 1) {
            |     ^~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:580:45: note: declared here
        580 | Py_DEPRECATED(3.3) PyAPI_FUNC(Py_UNICODE *) PyUnicode_AsUnicode(
            |                                             ^~~~~~~~~~~~~~~~~~~
      In file included from c/_cffi_backend.c:281:
      c/wchar_helper.h:84:5: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
         84 |     if (PyUnicode_GET_SIZE(unicode) == 1) {
            |     ^~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:446:26: note: declared here
        446 | static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
            |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~
      In file included from c/_cffi_backend.c:281:
      c/wchar_helper.h: In function ‘_my_PyUnicode_SizeAsWideChar’:
      c/wchar_helper.h:99:5: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
         99 |     Py_ssize_t length = PyUnicode_GET_SIZE(unicode);
            |     ^~~~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:446:26: note: declared here
        446 | static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
            |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~
      In file included from c/_cffi_backend.c:281:
      c/wchar_helper.h:99:5: warning: ‘PyUnicode_AsUnicode’ is deprecated [-Wdeprecated-declarations]
         99 |     Py_ssize_t length = PyUnicode_GET_SIZE(unicode);
            |     ^~~~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:580:45: note: declared here
        580 | Py_DEPRECATED(3.3) PyAPI_FUNC(Py_UNICODE *) PyUnicode_AsUnicode(
            |                                             ^~~~~~~~~~~~~~~~~~~
      In file included from c/_cffi_backend.c:281:
      c/wchar_helper.h:99:5: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
         99 |     Py_ssize_t length = PyUnicode_GET_SIZE(unicode);
            |     ^~~~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:446:26: note: declared here
        446 | static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
            |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~
      In file included from c/_cffi_backend.c:281:
      c/wchar_helper.h: In function ‘_my_PyUnicode_AsWideChar’:
      c/wchar_helper.h:118:5: warning: ‘PyUnicode_AsUnicode’ is deprecated [-Wdeprecated-declarations]
        118 |     Py_UNICODE *u = PyUnicode_AS_UNICODE(unicode);
            |     ^~~~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:580:45: note: declared here
        580 | Py_DEPRECATED(3.3) PyAPI_FUNC(Py_UNICODE *) PyUnicode_AsUnicode(
            |                                             ^~~~~~~~~~~~~~~~~~~
      c/_cffi_backend.c: In function ‘ctypedescr_dealloc’:
      c/_cffi_backend.c:352:23: error: lvalue required as left operand of assignment
        352 |         Py_REFCNT(ct) = 43;
            |                       ^
      c/_cffi_backend.c:355:23: error: lvalue required as left operand of assignment
        355 |         Py_REFCNT(ct) = 0;
            |                       ^
      c/_cffi_backend.c: In function ‘cast_to_integer_or_char’:
      c/_cffi_backend.c:3331:26: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
       3331 |                          PyUnicode_GET_SIZE(ob), ct->ct_name);
            |                          ^~~~~~~~~~~~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:446:26: note: declared here
        446 | static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
            |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~
      c/_cffi_backend.c:3331:26: warning: ‘PyUnicode_AsUnicode’ is deprecated [-Wdeprecated-declarations]
       3331 |                          PyUnicode_GET_SIZE(ob), ct->ct_name);
            |                          ^~~~~~~~~~~~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:580:45: note: declared here
        580 | Py_DEPRECATED(3.3) PyAPI_FUNC(Py_UNICODE *) PyUnicode_AsUnicode(
            |                                             ^~~~~~~~~~~~~~~~~~~
      c/_cffi_backend.c:3331:26: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
       3331 |                          PyUnicode_GET_SIZE(ob), ct->ct_name);
            |                          ^~~~~~~~~~~~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:446:26: note: declared here
        446 | static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
            |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~
      c/_cffi_backend.c: In function ‘b_complete_struct_or_union’:
      c/_cffi_backend.c:4251:17: warning: ‘PyUnicode_GetSize’ is deprecated [-Wdeprecated-declarations]
       4251 |                 do_align = PyText_GetSize(fname) > 0;
            |                 ^~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:177:43: note: declared here
        177 | Py_DEPRECATED(3.3) PyAPI_FUNC(Py_ssize_t) PyUnicode_GetSize(
            |                                           ^~~~~~~~~~~~~~~~~
      c/_cffi_backend.c:4283:13: warning: ‘PyUnicode_GetSize’ is deprecated [-Wdeprecated-declarations]
       4283 |             if (PyText_GetSize(fname) == 0 &&
            |             ^~
      In file included from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:177:43: note: declared here
        177 | Py_DEPRECATED(3.3) PyAPI_FUNC(Py_ssize_t) PyUnicode_GetSize(
            |                                           ^~~~~~~~~~~~~~~~~
      c/_cffi_backend.c:4353:17: warning: ‘PyUnicode_GetSize’ is deprecated [-Wdeprecated-declarations]
       4353 |                 if (PyText_GetSize(fname) > 0) {
            |                 ^~
      In file included from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:177:43: note: declared here
        177 | Py_DEPRECATED(3.3) PyAPI_FUNC(Py_ssize_t) PyUnicode_GetSize(
            |                                           ^~~~~~~~~~~~~~~~~
      c/_cffi_backend.c: In function ‘prepare_callback_info_tuple’:
      c/_cffi_backend.c:5214:5: warning: ‘PyEval_InitThreads’ is deprecated [-Wdeprecated-declarations]
       5214 |     PyEval_InitThreads();
            |     ^~~~~~~~~~~~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/Python.h:130,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/ceval.h:122:37: note: declared here
        122 | Py_DEPRECATED(3.9) PyAPI_FUNC(void) PyEval_InitThreads(void);
            |                                     ^~~~~~~~~~~~~~~~~~
      c/_cffi_backend.c: In function ‘b_callback’:
      c/_cffi_backend.c:5255:5: warning: ‘ffi_prep_closure’ is deprecated: use ffi_prep_closure_loc instead [-Wdeprecated-declarations]
       5255 |     if (ffi_prep_closure(closure, &cif_descr->cif,
            |     ^~
      In file included from c/_cffi_backend.c:15:
      /opt/conda/envs/cho_env/include/ffi.h:347:1: note: declared here
        347 | ffi_prep_closure (ffi_closure*,
            | ^~~~~~~~~~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      c/ffi_obj.c: In function ‘_ffi_type’:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:744:29: warning: initialization discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
        744 | #define _PyUnicode_AsString PyUnicode_AsUTF8
            |                             ^~~~~~~~~~~~~~~~
      c/_cffi_backend.c:72:25: note: in expansion of macro ‘_PyUnicode_AsString’
         72 | # define PyText_AS_UTF8 _PyUnicode_AsString
            |                         ^~~~~~~~~~~~~~~~~~~
      c/ffi_obj.c:191:32: note: in expansion of macro ‘PyText_AS_UTF8’
        191 |             char *input_text = PyText_AS_UTF8(arg);
            |                                ^~~~~~~~~~~~~~
      c/lib_obj.c: In function ‘lib_build_cpython_func’:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:744:29: warning: initialization discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
        744 | #define _PyUnicode_AsString PyUnicode_AsUTF8
            |                             ^~~~~~~~~~~~~~~~
      c/_cffi_backend.c:72:25: note: in expansion of macro ‘_PyUnicode_AsString’
         72 | # define PyText_AS_UTF8 _PyUnicode_AsString
            |                         ^~~~~~~~~~~~~~~~~~~
      c/lib_obj.c:129:21: note: in expansion of macro ‘PyText_AS_UTF8’
        129 |     char *libname = PyText_AS_UTF8(lib->l_libname);
            |                     ^~~~~~~~~~~~~~
      c/lib_obj.c: In function ‘lib_build_and_cache_attr’:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:744:29: warning: initialization discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
        744 | #define _PyUnicode_AsString PyUnicode_AsUTF8
            |                             ^~~~~~~~~~~~~~~~
      c/_cffi_backend.c:71:24: note: in expansion of macro ‘_PyUnicode_AsString’
         71 | # define PyText_AsUTF8 _PyUnicode_AsString   /* PyUnicode_AsUTF8 in Py3.3 */
            |                        ^~~~~~~~~~~~~~~~~~~
      c/lib_obj.c:208:15: note: in expansion of macro ‘PyText_AsUTF8’
        208 |     char *s = PyText_AsUTF8(name);
            |               ^~~~~~~~~~~~~
      In file included from c/cffi1_module.c:16,
                       from c/_cffi_backend.c:6636:
      c/lib_obj.c: In function ‘lib_getattr’:
      c/lib_obj.c:506:7: warning: assignment discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
        506 |     p = PyText_AsUTF8(name);
            |       ^
      In file included from c/cffi1_module.c:19,
                       from c/_cffi_backend.c:6636:
      c/call_python.c: In function ‘_get_interpstate_dict’:
      c/call_python.c:20:30: error: dereferencing pointer to incomplete type ‘PyInterpreterState’ {aka ‘struct _is’}
         20 |     builtins = tstate->interp->builtins;
            |                              ^~
      c/call_python.c: In function ‘_ffi_def_extern_decorator’:
      c/call_python.c:73:11: warning: assignment discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
         73 |         s = PyText_AsUTF8(name);
            |           ^
      error: command '/usr/bin/gcc' failed with exit code 1
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for cffi
  Running setup.py clean for cffi
Failed to build cffi
ERROR: Could not build wheels for cffi, which is required to install pyproject.toml-based projects

How can I fix or bypass this?

Solution

  • I had the same issue. Root cause: python-magic is only a Python wrapper around the libmagic C library and does not include the required binaries for Windows, macOS, and Linux. However, the python-magic-bin fork does include them.

    Note that python-libmagic (which you have tried) did not work for me either. Use python-magic-bin instead.

    Try the following solution (found on this GitHub issue page), which worked for me:

    # uninstall what you initially tried, to avoid conflicts
    pip uninstall python-libmagic
    pip uninstall python-magic 
    
    # install the working one
    pip install python-magic-bin
    

    If you are using conda (instead of pip), you can run conda install -c conda-forge libmagic instead, as per this GitHub issue page.
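
To confirm that the installation took effect before rerunning the loader, a quick check along these lines may help. It is a minimal sketch, assuming the `magic` module in your environment is the one provided by python-magic or python-magic-bin (whose `from_buffer` function is used here). `find_libmagic` and `libmagic_status` are hypothetical helper names made up for this example, not part of any library.

```python
import ctypes.util
import importlib


def find_libmagic():
    """Return the name of the libmagic shared library, or None if not found."""
    # find_library searches the standard system locations; a library that lives
    # only inside a conda env may not be reported here even if it can be loaded.
    return ctypes.util.find_library("magic")


def libmagic_status():
    """Return (ok, detail): whether `magic` imports and can sniff a byte buffer."""
    try:
        magic = importlib.import_module("magic")
    except (ImportError, OSError) as exc:
        # Raised when the wrapper is missing, or when the wrapper is present
        # but cannot locate the libmagic shared library.
        return False, f"import failed: {exc}"
    try:
        # Assumption: `magic` is python-magic (or python-magic-bin), which
        # exposes from_buffer(buffer, mime=...).
        mime = magic.from_buffer(b"<html><body>hello</body></html>", mime=True)
    except Exception as exc:  # e.g. a different package that also installs `magic`
        return False, f"detection failed: {exc}"
    return True, str(mime)


if __name__ == "__main__":
    print("shared library:", find_libmagic())
    ok, detail = libmagic_status()
    print("libmagic usable:", ok, "-", detail)
```

If the second line reports that libmagic is not usable, the Python wrapper still cannot find the C library, and the "libmagic is unavailable" warning is likely to persist.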