I have a simple Telegram bot built with the python-telegram-bot package, and I use pandas to import a CSV file:
df = pd.read_csv('data.csv')
key | text |
---|---|
00 | text1 \n text2 \n text3 |
. | some \n text |
. | other \n text |
99 | text4 \n text5 \n text6 |
Then I search the dataframe using the user's message:
answer = df[df['text'].str.contains(update.message.text, case=False)]
and the bot sends a reply message to the user:
await update.message.reply_text(answer)
but the output shows the literal "\n" sequence instead of a line break:
text1 \n text2 \n text3
and I want to show the text:
text1
text2
text3
I'm struggling with this problem. Before switching to a dataframe I used TinyDB and everything worked fine. How can I resolve this?
Thanks
I tried changing the dtype of the column to string, exporting the CSV to a list, and changing the encoding of the file, without success.
I tried to reproduce your case, but it worked for me.
import pandas as pd
# The data is defined inline here (JSON-like records), so "\n" below is a real newline character
data = [
{"key": "01", "text": "text1 \n text2 \n text3"},
{"key": "01", "text": "some \n text"},
{"key": "02", "text": "other \n text"},
{"key": "99", "text": "text4 \n text5 \n text6"},
]
df = pd.DataFrame(data)
# ...
answer = df[df["text"].str.contains("some", case=False)]
# answer = df[df["text"].str.contains(update.message.text, case=False)]
if not answer.empty:
    print(answer.values[0][1].encode())  # check raw text
    # out: b'some \n text'
    print(type(answer.values[0][1]))  # check type
    # out: <class 'str'>
    await update.message.reply_text(answer.values[0][1])
With CSV data (data.csv):
key,text
01,text1 \n text2 \n text3
01,some \n text
02,other \n text
99,text4 \n text5 \n text6
101,this is emoji \n ✅ \n \U0001F600
# ...
df = pd.read_csv("data.csv")
answer = df[df["text"].str.contains("some", case=False)]
if not answer.empty:
    print(answer.values[0][1].encode())  # check raw text
    # out: b'some \\n text'
    print(type(answer.values[0][1]))  # check type
    # out: <class 'str'>
    print(answer.values[0][1])  # print text
    # out: some \n text
    print(answer.values[0][1].replace("\\n", "\n"))  # replace text
    # out: some
    #  text
# ...
Output:
b'some \\n text'
<class 'str'>
some \n text
some
text
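When the data comes from a CSV file, the \n in each cell is read as two literal characters (a backslash followed by n, which Python displays as \\n) rather than as a newline. Replacing \\n with \n gives the result we need: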
print(answer.values[0][1].replace("\\n", "\n")) # replace text
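The difference between the two cases can be checked without the bot at all. The following is a minimal sketch using only the standard library csv module on an in-memory string (the `raw` content is a hypothetical stand-in for data.csv); it assumes pandas treats the backslash the same way by default, which matches the b'some \\n text' output shown above.

```python
import csv
import io

# Hypothetical in-memory stand-in for data.csv.
# "\\n" in this Python literal is a backslash followed by "n",
# exactly what a text editor would show as \n inside the file.
raw = "key,text\n01,some \\n text\n"

rows = list(csv.DictReader(io.StringIO(raw)))
text = rows[0]["text"]

# The parsed field holds the two characters backslash + n, not a newline.
has_literal_backslash_n = "\\n" in text
has_real_newline = "\n" in text

# Converting the two-character sequence into a real newline.
fixed = text.replace("\\n", "\n")
print(repr(text))
print(repr(fixed))
```

CSV has no notion of backslash escapes unless an escape character is configured, so the conversion has to happen after loading.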
For Unicode escapes such as \U0001F600, the escape sequence has to be decoded into the actual character, which the unicode_escape codec can do. A simple way is to use a regular expression so that only the matched escape sequences are decoded:
import re

def replace_unicode_escape(text):
    def replace(match):
        # Decode only the matched \UXXXXXXXX sequence
        return match.group(0).encode().decode("unicode_escape")
    text = re.sub(r"\\n", "\n", text)
    return re.sub(r"\\U[0-9a-fA-F]{8}", replace, text)
# ...
if not answer.empty:
    # ...
    print(replace_unicode_escape(answer.values[0][1]))  # replace text
    # out: this is emoji
    #  ✅
    #  😀
# ...
Output:
b'this is emoji \\n \xe2\x9c\x85 \\n \\U0001F600'
<class 'str'>
this is emoji \n ✅ \n \U0001F600
this is emoji
✅
😀
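The reason for decoding only the matched sequences is that applying unicode_escape to the whole string would corrupt characters that are already non-ASCII, such as ✅, because the codec reads the UTF-8 bytes as Latin-1. If the CSV may also contain \t or four-digit \uXXXX escapes, the same idea can be extended. The sketch below is one possible variant, not part of the original answer: the name `unescape` and the set of supported escapes are assumptions, and it presumes every such backslash sequence in the data is meant as an escape.

```python
import re

# One pattern for the escapes handled here: \UXXXXXXXX, \uXXXX, \n and \t.
_ESCAPE = re.compile(r"\\(?:U([0-9a-fA-F]{8})|u([0-9a-fA-F]{4})|([nt]))")
_SIMPLE = {"n": "\n", "t": "\t"}


def unescape(text):
    """Turn literal escape sequences into the characters they name.

    Hypothetical helper: characters that are already non-ASCII are left alone.
    """
    def _sub(match):
        hex_digits = match.group(1) or match.group(2)
        if hex_digits:
            # chr() raises ValueError for code points above 0x10FFFF.
            return chr(int(hex_digits, 16))
        return _SIMPLE[match.group(3)]

    return _ESCAPE.sub(_sub, text)


print(unescape("this is emoji \\n ✅ \\n \\U0001F600"))
```

In the bot, the returned string would be passed to update.message.reply_text in place of the raw cell value.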