pythonpowershellsharepointxmltodictxml-to-json

Conversion of Atom or OData XML file to OData Json file using Python


I have been trying to convert the PowerShell script to Python code to download the list files from Sharepoint. Most of the coding part is completed and well executed as of now. However, when I download the file from Sharepoint to local drive with .json extension, the file content is not as expected.

The Sharepoint list content type is => content-type: application/atom+xml;type=feed;charset=utf-8, which is in xml format. Since I could not save the content in .json format, I have downloaded the file as .xml and converted it to .json using xmltodict python package which is good so far.

Here is my actual query : How can we download the xml content with .json or convert the xml file to json file without the attribute types, tags and namespaces, etc.? We need the file to be downloaded in below PowerShell script generated output format without any tags and just with the key-value pairs.

I'm just sharing the sample file content instead of copying entire stuff as it involves some sensitive data.

This is the Sharepoint web url content which is in atom xml format/Odata xml.

<?xml version="1.0" encoding="utf-8"?><feed xml:base="https://myorg.sharepoint.com/sites/pwaeng/_api/" xmlns="http://www.w3.org/2005/Atom" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"

<d:Created m:type="Edm.DateTime">2018-05-09T21:21:03Z</d:Created><d:AuthorId m:type="Edm.Int32">1344</d:AuthorId><d:EditorId m:type="Edm.Int32">1344</d:EditorId><d:OData__UIVersionString>1.0</d:OData__UIVersionString><d:Attachments m:type="Edm.Boolean">false</d:Attachments><d:GUID m:type="Edm.Guid">9ef38bd1-a098-4610-98a4-dbf7488a5a27</d:GUID></m:properties></content></entry></feed>

This is Python converted json data

{"feed": {"@xml:base": "https://myorg.sharepoint.com/sites/pwaeng/_api/", "@xmlns": "http://www.w3.org/2005/Atom", "@xmlns:d": "http://schemas.microsoft.com/ado/2007/08/dataservices",

"d:Created": {"@m:type": "Edm.DateTime", "#text": "2018-05-09T21:21:03Z"}, "d:AuthorId": {"@m:type": "Edm.Int32", "#text": "1344"}, "d:EditorId": {"@m:type": "Edm.Int32", "#text": "1344"}, "d:OData__UIVersionString": "1.0", "d:Attachments": {"@m:type": "Edm.Boolean", "#text": "false"}, "d:GUID": {"@m:type": "Edm.Guid", "#text": "9ef38bd1-a098-4610-98a4-dbf7488a5a27"}}}}}}

PowerShell downloaded Json file

{"odata.metadata":"https://myorg.sharepoint.com/sites/pwaeng/_api/$metadata#SP.ListData.Program_x0020_RisksListItems","value":[{"odata.type":"SP.Data.Program_x0020_RisksListItem","odata.id":"a878d166-c19d-4c16-82b4-e150e7e49626","odata.etag":""2"","odata.editLink":"Web/Lists

"Created":"2018-05-09T21:21:03Z","AuthorId":1344,"EditorId":1344,"OData__UIVersionString":"1.0","Attachments":false,"GUID":"9ef38bd1-a098-4610-98a4-dbf7488a5a27"}]}

Below is some part of Python code. I have tried the most of the options but no luck in getting the desired output.

     listURL = webAbsoluteURL + 
    "/_api/web/lists/GetByTitle('" + List + "')/items"
   

   #print(listURL)
   count = 0
   #print(type(str(count)))
   fileName = "file_" + ListFolder.strip() + "_" + str(count) + "_" + date
   #print(fileName)
   xml_output = Filepath + "/" + fileName + ".xml"  ##USe backslash in Windows
   json_output = Filepath + "/" + fileName + ".json"
   #print(output)
   #print(userName, Password)
   url = listURL
   #ctx = ClientContext(url).with_credentials(UserCredential(userName, Password))
   #web = ctx.web.get().execute_query()
   #print("Web title: {0}".format(web.properties['Title']))
   ctx_auth = AuthenticationContext(webAbsoluteURL)
   token = ctx_auth.acquire_token_for_user(userName, Password)
   #ctx = ClientContext(webAbsoluteURL, ctx_auth)
   #print(token)
   options = RequestOptions(webAbsoluteURL)
   ctx_auth.authenticate_request(options)
   #options.headers = {
   #'accept' : 'text/html,application/xhtml+xml,application/xml',
   #'content-type': 'application/atom+xml;type=feed;charset=utf-8',
   #'X-RequestForceAuthentication' : 'true'
   #}
   response = requests.get(url, headers=options.headers, allow_redirects=True, timeout=60000)
   #print(req.status_code)
   #headers = {
   #'accept' : 'application/json;odata=verbose',
   #'content-type' : 'application/json;odata=verbose',
   #'X-RequestForceAuthentication' : 'true'
   #}
   #response = requests.get(url, allow_redirects=True, headers=headers, timeout=60000)
   #print(response.status_code)
   with open(xml_output, 'wb') as file_save:
      file_save.write(response.content)
   with open(xml_output, 'r', encoding = "UTF-8") as xml_file:
      data_dict = xmltodict.parse(xml_file.read()) # , attr_prefix='')
      xml_file.close()
      #json_data = json.dumps(data_dict, separators=(',', ':'))
      #json_data = json.dumps(data_dict, indent=2)
      json_data = json.dumps(data_dict)
   #with open(json_output, 'w') as json_file:
   #   json.dump(data_dict, json_file)
   #   json_file.close()
   with open(json_output, 'wb') as json_file:
      json_file.write(json_data.encode("UTF-8"))
      json_file.close()

Solution

  • Found the solution, Instead of going with xml to json parser(xmltodict.parse) etc., the simple solution is add this "?&$format=json" to the end of the web URL.

    XML_DATA_URL = https://myorg.sharepoint.com/sites/pwaeng/_api/projectdata/Tasks

    JSON_FORMAT_URL = https://myorg.sharepoint.com/sites/pwaeng/_api/projectdata/Tasks?&$format=json

    However, this is not working with below type of URL.

    https://myorg.sharepoint.com/sites/pwaeng/_api/web/lists/GetByTitle('Program Risks')/items

    https://myorg.sharepoint.com/sites/pwaeng/_api/web/lists/GetByTitle('Program Risks')/items?&$format=json

    If anyone has any suggestions, Please add your comment here..