python pandas textblob

Detect language in a pandas column in Python


I would like to detect the language of each row in a pandas column in Python. After detecting it, I want to write the language code to a new column in the DataFrame. Below is my code and what I tried, but I got an error. Please help.

Thank you.

  import pandas as pd
  from textblob import TextBlob

  data = {'text': ["It is a good option", "Better to have this way", "es un portal informático para geeks", "は、ギーク向けのコンピューターサイエンスポータルです"]}
  # Create DataFrame
  df = pd.DataFrame(data)

  # Get the language
  for i in df['text']:
      # Language detection
      df['lang'] = TextBlob(i)
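Independently of which detector is used, the loop above assigns to the whole `lang` column on every pass, so even a detector that returns one code per string would have each iteration replace the previous result. Below is a pandas-only sketch of that behaviour and of the collect-then-assign alternative. The column names `overwritten` and `per_row` are illustrative, and `len` is only a hypothetical stand-in for a real per-row detector call:

```python
import pandas as pd

df = pd.DataFrame({'text': ["first", "second", "third"]})

# Pattern from the question: each iteration assigns to the entire column,
# and a single value is broadcast to every row, so only the last one survives
for value in df['text']:
    df['overwritten'] = value

# Alternative: compute one result per row, then assign the whole list once
results = []
for value in df['text']:
    results.append(len(value))  # stand-in for a per-row detector call
df['per_row'] = results

print(df)
```

In practice `Series.apply`, as used in the answer below, does the same collect-then-assign work in a single call.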



Solution

  • You can use the langdetect library for language detection in Python.

    pip install langdetect

  Then, in Python:

    import pandas as pd
    from langdetect import detect

    data = {'text': ["It is a good option", "Better to have this way", "es un portal informático para geeks", "は、ギーク向けのコンピューターサイエンスポータルです"]}

    df = pd.DataFrame(data)

    # Apply detect() to each row; it returns a language code string such as 'en' or 'es'
    df['lang'] = df['text'].apply(detect)