python facebook unicode encoding character-encoding

Detect the encoding of a Hindi response received from the Facebook API in Python


I'm trying to access a post on a Facebook Page whose content is in Hindi. The raw response that I get from the Facebook API is shown below.

In this response the message is actually in Hindi. How do I detect the encoding of the message and print it in Hindi?

  {
     "id": "182929845081087_579535732087161",
     "from": {
        "id": "182929845081087",
        "category": "Non-profit organization",
        "name": "Brahma Kumaris"
     },
     "message": "\u092e\u0941\u0930\u0932\u0940 \u0938\u093e\u0930:-     \u092e\u0940\u0920\u0947 \u092c\u091a\u094d\u091a\u0947-\u0924\u0941\u092e\u094d\u0939\u0947\u0902 \u0905\u0928\u094d\u0924 \u0924\u0915 \u092f\u0939 \u092e\u0940\u0920\u0940 \u0928\u0949\u0932\u0947\u091c \u0938\u0941\u0928\u0924\u0947 \u0930\u0939\u0928\u093e \u0939\u0948 \u091c\u092c \u0924\u0915 \u091c\u0940\u0928\u093e \u0939\u0948-\u092a\u0922\u093c\u0928\u093e \u0914\u0930 \u092f\u094b\u0917 \u0938\u0940\u0916\u0928\u093e \u0939\u0948      \n \u092a\u094d\u0930\u0936\u094d\u0928:- \u092c\u093e\u092a \u0915\u0947 \u0938\u093e\u0925-\u0938\u093e\u0925 \u0924\u0941\u092e \u092c\u091a\u094d\u091a\u0947 \u0915\u093f\u0938 \u0938\u0947\u0935\u093e \u0915\u0947 \u0928\u093f\u092e\u093f\u0924\u094d\u0924 \u092c\u0928\u0947 \u0939\u0941\u090f \u0939\u094b?  \n \u0909\u0924\u094d\u0924\u0930:- \u091c\u0948\u0938\u0947 \u092c\u093e\u092a \u0938\u093e\u0930\u0947 \u0935\u093f\u0936\u094d\u0935 \u0915\u094b \u0932\u093f\u092c\u0930\u0947\u091f \u0915\u0930\u0924\u0947 \u0939\u0948\u0902, \u0938\u092c \u092a\u0930 \u092c\u094d\u0932\u093f\u0938 \u0915\u0930\u0924\u0947 \u0939\u0948\u0902, \u092a\u0940\u0938 \u092e\u0947\u0915\u0930 \u092c\u0928 \u092a\u0940\u0938 \u0938\u094d\u0925\u093e\u092a\u0928 \u0915\u0930\u0924\u0947 \u0939\u0948\u0902 \u0910\u0938\u0947 \u0924\u0941\u092e \u092c\u091a\u094d\u091a\u0947 \u092d\u0940 \u092c\u093e\u092a \u0915\u0947 \u0938\u093e\u0925 \u0907\u0938 \u0938\u0947\u0935\u093e \u0915\u0947 \u0928\u093f\u092e\u093f\u0924\u094d\u0924 \u0939\u094b\u0964 \u0924\u0941\u092e \u0939\u094b \u0938\u0948\u0932\u0935\u0947\u0936\u0928 \u0906\u0930\u094d\u092e\u0940\u0964 \u0924\u0941\u092e\u094d\u0939\u0947\u0902 \u092d\u093e\u0930\u0924 \u0915\u0947 \u0921\u0942\u092c\u0947 \u0939\u0941\u090f \u092c\u0947\u095c\u0947 \u0915\u094b \u0938\u0948\u0932\u0935\u0947\u091c \u0915\u0930\u0928\u093e \u0939\u0948\u0964 21 \u091c\u0928\u094d\u092e\u094b\u0902 \u0915\u0947 
\u0932\u093f\u090f \u0938\u092c\u0915\u094b \u0938\u092e\u094d\u092a\u0924\u094d\u0924\u093f\u0935\u093e\u0928 \u092c\u0928\u093e\u0928\u093e \u0939\u0948\u0964 \u0910\u0938\u0940 \u0938\u0947\u0935\u093e \u0924\u0941\u092e \u092c\u091a\u094d\u091a\u094b\u0902 \u0915\u0947 \u0938\u093f\u0935\u093e\u090f \u0914\u0930 \u0915\u094b\u0908 \u0915\u0930 \u0928\u0939\u0940\u0902 \u0938\u0915\u0924\u093e\u0964 \n \u0927\u093e\u0930\u0923\u093e \u0915\u0947 \u0932\u093f\u090f \u092e\u0941\u0916\u094d\u092f \u0938\u093e\u0930:-  \n 1) \u0935\u093f\u0915\u0930\u094d\u092e\u093e\u091c\u0940\u0924 \u092c\u0928\u0928\u0947 \u0915\u0947 \u0932\u093f\u090f \u091a\u0932\u0924\u0947 \u092b\u093f\u0930\u0924\u0947 \u092c\u093e\u092a \u0915\u094b \u092f\u093e\u0926 \u0915\u0930\u0928\u0947 \u0915\u093e \u0905\u092d\u094d\u092f\u093e\u0938 \u0915\u0930\u0928\u093e \u0939\u0948\u0964 \u092f\u093e\u0926 \u0915\u093e \u091a\u093e\u0930\u094d\u091f \u091c\u0930\u0942\u0930 \u0930\u0916\u0928\u093e \u0939\u0948\u0964  \n 2) \u0905\u092a\u0928\u0940 \u0939\u0930 \u091a\u0932\u0928 \u0938\u0947 \u092e\u093e\u0924-\u092a\u093f\u0924\u093e \u0914\u0930 \u091f\u0940\u091a\u0930 \u0915\u093e \u0936\u094b \u0915\u0930\u0928\u093e \u0939\u0948\u0964 \u0935\u093f\u0928\u093e\u0936 \u0915\u093e\u0932 \u092e\u0947\u0902 \u092a\u094d\u0930\u0940\u0924 \u092c\u0941\u0926\u094d\u0927\u093f \u092c\u0928\u0915\u0930 \u0930\u0939\u0928\u093e \u0939\u0948\u0964 \u0930\u0942\u0939\u093e\u0928\u0940 \u0938\u0947\u0935\u093e \u0915\u0930\u0928\u0940 \u0939\u0948\u0964  \n \u0935\u0930\u0926\u093e\u0928:- \u0935\u093e\u092f\u0926\u094b\u0902 \u0915\u0940 \u0938\u094d\u092e\u0943\u0924\u093f \u0926\u094d\u0935\u093e\u0930\u093e \u095e\u093e\u092f\u0926\u093e \u0909\u0920\u093e\u0928\u0947 \u0935\u093e\u0932\u0947 \u0938\u0926\u093e \u092c\u093e\u092a \u0915\u0940 \u092c\u094d\u0932\u0948\u0938\u093f\u0902\u0917 \u0915\u0947 \u092a\u093e\u0924\u094d\u0930 \u092d\u0935  \n  \u091c\u094b \u092d\u0940 
\u0935\u093e\u092f\u0926\u0947 \u092e\u0928 \u0938\u0947, \u092c\u094b\u0932 \u0938\u0947 \u0905\u0925\u0935\u093e \u0932\u093f\u0916\u0915\u0930 \u0915\u0930\u0924\u0947 \u0939\u094b, \u0909\u0928\u094d\u0939\u0947\u0902 \u0938\u094d\u092e\u0943\u0924\u093f \u092e\u0947\u0902 \u0930\u0916\u094b \u0924\u094b \u0935\u093e\u092f\u0926\u0947 \u0915\u093e \u092a\u0942\u0930\u093e \u092b\u093e\u092f\u0926\u093e \u0909\u0920\u093e \u0938\u0915\u0924\u0947 \u0939\u094b\u0964 \u091a\u0947\u0915 \u0915\u0930\u094b \u0915\u093f \u0915\u093f\u0924\u0928\u0947 \u092c\u093e\u0930 \u0935\u093e\u092f\u0926\u093e \u0915\u093f\u092f\u093e \u0939\u0948 \u0914\u0930 \u0915\u093f\u0924\u0928\u093e \u0928\u093f\u092d\u093e\u092f\u093e \u0939\u0948! \u0935\u093e\u092f\u0926\u093e \u0914\u0930 \u095e\u093e\u092f\u0926\u093e - \u0907\u0928 \u0926\u094b\u0928\u094b\u0902 \u0915\u093e \u092c\u0948\u0932\u0947\u0928\u094d\u0938 \u0930\u0939\u0947 \u0924\u094b \u0935\u0930\u0926\u093e\u0924\u093e \u092c\u093e\u092a \u0926\u094d\u0935\u093e\u0930\u093e \u092c\u094d\u0932\u0948\u0938\u093f\u0902\u0917 \u092e\u093f\u0932\u0924\u0940 \u0930\u0939\u0947\u0917\u0940\u0964 \u091c\u0948\u0938\u0947 \u0938\u0902\u0915\u0932\u094d\u092a \u0936\u094d\u0930\u0947\u0937\u094d\u0920 \u0915\u0930\u0924\u0947 \u0939\u094b \u0910\u0938\u0947 \u0915\u0930\u094d\u092e \u092d\u0940 \u0936\u094d\u0930\u0947\u0937\u094d\u0920 \u0939\u094b\u0902 \u0924\u094b \u0938\u092b\u0932\u0924\u093e \u092e\u0942\u0930\u094d\u0924 \u092c\u0928 \u091c\u093e\u092f\u0947\u0902\u0917\u0947\u0964  \n \u0938\u094d\u0932\u094b\u0917\u0928:- \u0938\u094d\u0935\u092f\u0902 \u0915\u094b \u0910\u0938\u093e \u0926\u093f\u0935\u094d\u092f \u0906\u0907\u0928\u093e \u092c\u0928\u093e\u0913 \u091c\u093f\u0938\u092e\u0947\u0902 \u092c\u093e\u092a \u0939\u0940 \u0926\u093f\u0916\u093e\u0908 \u0926\u0947 \u0924\u092c \u0915\u0939\u0947\u0902\u0917\u0947 \u0938\u091a\u094d\u091a\u0940 \u0938\u0947\u0935\u093e\u0964",
     "actions": [
        {
           "name": "Comment",
           "link": "http://www.facebook.com/182929845081087/posts/579535732087161"
        },
        {
           "name": "Like",
           "link": "http://www.facebook.com/182929845081087/posts/579535732087161"
        }
     ],
     "privacy": {
        "description": "Public",
        "value": "EVERYONE",
        "friends": "",
        "networks": "",
        "allow": "",
        "deny": ""
     },
     "type": "status",
     "status_type": "mobile_status_update",
     "application": {
        "name": "UpdateYou",
        "id": "351985104836764"
     },
     "created_time": "2013-05-30T03:00:08+0000",
     "updated_time": "2013-05-30T15:08:42+0000",
     "shares": {
        "count": 2
     },
     "likes": {
        "data": [
           {
              "name": "Bhumika Mahant",
              "id": "100002238635044"
           },
           {
              "name": "Kumar DrVinay",
              "id": "100002736938311"
           },
           {
              "name": "Namrata Trehan Pathria",
              "id": "100000281688593"
           },
           {
              "name": "Devesh Sharma",
              "id": "100001192346711"
           }
        ],
        "count": 37
     },
     "comments": {
        "data": [
           {
              "id": "579535732087161_6364194",
              "from": {
                 "name": "Namrata Trehan Pathria",
                 "id": "100000281688593"
              },
              "message": "Om shanti meet he baba",
              "can_remove": true,
              "created_time": "2013-05-30T15:08:42+0000",
              "like_count": 0,
              "user_likes": false
           },
           {
              "id": "579535732087161_6363607",
              "from": {
                 "name": "Cetan Patil",
                 "id": "100003155153074"
              },
              "message": "om shanti",
              "can_remove": true,
              "created_time": "2013-05-30T11:06:27+0000",
              "like_count": 0,
              "user_likes": false
           },
           {
              "id": "579535732087161_6363549",
              "from": {
                 "name": "Maya Ramchandani",
                 "id": "100003705148351"
              },
              "message": "Omshanti",
              "can_remove": true,
              "created_time": "2013-05-30T10:38:39+0000",
              "like_count": 0,
              "user_likes": false
           },
           {
              "id": "579535732087161_6363525",
              "from": {
                 "name": "Subhash Bambal",
                 "id": "100002808519452"
              },
              "message": "Om Shanti",
              "can_remove": true,
              "created_time": "2013-05-30T10:29:05+0000",
              "like_count": 0,
              "user_likes": false
           },
           {
              "id": "579535732087161_6363354",
              "from": {
                 "name": "Poonam Dhanuka",
                 "id": "100004088191006"
              },
              "message": "om shanti baba",
              "can_remove": true,
              "created_time": "2013-05-30T09:12:35+0000",
              "like_count": 0,
              "user_likes": false
           },
           {
              "id": "579535732087161_6363232",
              "from": {
                 "name": "Hemprakash Pant",
                 "id": "100004354350224"
              },
              "message": "Om Shanti Baba",
              "can_remove": true,
              "created_time": "2013-05-30T07:45:38+0000",
              "like_count": 0,
              "user_likes": false
           },
           {
              "id": "579535732087161_6362963",
              "from": {
                 "name": "Barun Sharma",
                 "id": "100005696734282"
              },
              "message": "om shanti....My baba beloved baba sweet baba.....",
              "can_remove": true,
              "created_time": "2013-05-30T05:33:26+0000",
              "like_count": 0,
              "user_likes": false
           },
           {
              "id": "579535732087161_6362770",
              "from": {
                 "name": "Arya Singh",
                 "id": "100001924554892"
              },
              "message": "om shanti...................",
              "can_remove": true,
              "created_time": "2013-05-30T04:41:45+0000",
              "like_count": 0,
              "user_likes": false
           },
           {
              "id": "579535732087161_6362737",
              "from": {
                 "name": "Khushi Dhurve",
                 "id": "100001700564503"
              },
              "message": "Om shanti...gm...mere pyare baapdada...awm...love lots...\u003C3:-):-*:-*",
              "can_remove": true,
              "created_time": "2013-05-30T04:23:53+0000",
              "like_count": 0,
              "user_likes": false
           },
           {
              "id": "579535732087161_6362675",
              "from": {
                 "name": "NNibedita Behera",
                 "id": "100002645048155"
              },
              "message": "Om shanti baba",
              "can_remove": true,
              "created_time": "2013-05-30T03:59:49+0000",
              "like_count": 0,
              "user_likes": false
           },
           {
              "id": "579535732087161_6362654",
              "from": {
                 "name": "Sonali Supe",
                 "id": "100002108817901"
              },
              "message": "OM SHANTI MERE PYARE BABA......................",
              "can_remove": true,
              "created_time": "2013-05-30T03:52:05+0000",
              "like_count": 0,
              "user_likes": false
           },
           {
              "id": "579535732087161_6362647",
              "from": {
                 "name": "Tejsingh Gurjar",
                 "id": "100004301563182"
              },
              "message": "om shanti",
              "can_remove": true,
              "created_time": "2013-05-30T03:47:46+0000",
              "like_count": 0,
              "user_likes": false
           },
           {
              "id": "579535732087161_6362620",
              "from": {
                 "name": "Swati Sonar",
                 "id": "100002927228747"
              },
              "message": "om shanti",
              "can_remove": true,
              "created_time": "2013-05-30T03:34:13+0000",
              "like_count": 0,
              "user_likes": false
           },
           {
              "id": "579535732087161_6362605",
              "from": {
                 "name": "Megha Gulati",
                 "id": "100004777265970"
              },
              "message": "gd mrng baba.om shanti",
              "can_remove": true,
              "created_time": "2013-05-30T03:28:23+0000",
              "like_count": 0,
              "user_likes": false
           },
           {
              "id": "579535732087161_6362579",
              "from": {
                 "name": "Jay Rathod",
                 "id": "100005154643627"
              },
              "message": "OM SHANTI",
              "can_remove": true,
              "created_time": "2013-05-30T03:14:18+0000",
              "like_count": 0,
              "user_likes": false
           }
        ],
        "paging": {
           "cursors": {
              "after": "MQ==",
              "before": "MTU="
           }
        }
     }
  },

Solution

  • Detecting an encoding with complete certainty is impossible. There are libraries that guess, and they work very well, but you can't trust them completely. In web environments the encoding normally arrives in the response headers (the charset parameter of Content-Type, not Content-Encoding, which describes compression); have you checked there?

    Once you know the encoding (by guessing or by reading the charset in the header), parse the JSON into a dictionary and, for the fields that hold encoded byte strings (message, for example), call message.decode(encoding), where encoding is the name you found.

    That returns a decoded unicode string that you can work with.

    Now, it seems to me that the response you're getting already carries Unicode. I think so because message contains \u092e, which is the escape for DEVANAGARI LETTER MA. A \uXXXX sequence in JSON names a Unicode code point directly, so there is no byte encoding to guess for those characters; a JSON parser turns them into the real characters.

    So the string is probably already Unicode once parsed, and you can process it however you want in Python. For instance, message.encode('utf-8') gives you the text encoded as UTF-8.

    Hope this sheds some light!
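
The steps in this answer can be sketched as follows. This is a minimal illustration in Python 3, not the Facebook SDK's own API: the helper charset_from_content_type, the header value, and the shortened raw_body (the first two words of the message field from the question) are assumptions made up for the example.

```python
import json
import unicodedata
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset parameter of a Content-Type header value.

    Hypothetical helper for this sketch; falls back to `default`
    when the header carries no charset.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


# Assumed stand-ins for a real HTTP response: a header value and a body
# that holds only the first two words of the "message" field.
content_type = "application/json; charset=UTF-8"
raw_body = rb'{"message": "\u092e\u0941\u0930\u0932\u0940 \u0938\u093e\u0930"}'

encoding = charset_from_content_type(content_type)
text = raw_body.decode(encoding)      # bytes -> str, using the declared charset
post = json.loads(text)               # the JSON parser resolves the \uXXXX escapes
message = post["message"]

print(message)                        # shows Devanagari if the terminal can render it
print(unicodedata.name(message[0]))   # name of the first code point, U+092E
utf8_bytes = message.encode("utf-8")  # bytes suitable for writing to a file or socket
```

Because the body contains only ASCII characters plus \uXXXX escapes, any ASCII-compatible charset decodes it identically; the header lookup matters only when a server sends raw non-ASCII bytes. Whether the printed text appears as Hindi depends on the terminal's encoding and fonts, not on the decoding step.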