How do I parse an XML file stored on my Google Drive when it comes back as HTML?
I save on my Google Drive a copy of the XML returned by this source: http://api.allocine.fr/rest/v3/movie?media=mp4-lc&partner=YW5kcm9pZC12Mg&profile=large&version=2&code=265621

I can parse the source, but I can't parse the copy with XmlService, because what I get back looks like HTML!! I get parsing errors like:
The element type "meta" must be terminated by the matching end-tag "</meta>"
or
Element type "a.length" must be followed by either attribute specifications, ">" or "/>"

I shared the file at https://drive.google.com/file/d/16kJ5Nko-waVb8s2T12LaTEKaFY01603n/view?usp=sharing so you can access it and test my script. I know I could use CacheService instead, and that works, but I would like to try this way to have more control over the buffering.
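Errors about `meta` end-tags or `a.length` are what XmlService reports when it is handed an HTML page (inline scripts and unclosed `<meta>` tags are not well-formed XML). A cheap check before calling `XmlService.parse` makes that situation obvious. This is a minimal sketch; `looksLikeHtml` is a hypothetical helper, not an Apps Script API, and it only inspects how the text starts.

```javascript
/**
 * Hypothetical helper (not part of Apps Script): returns true when the
 * text starts like an HTML page (<!DOCTYPE html ...> or <html ...>)
 * rather than like an XML document (<?xml ...?> or a root element).
 */
function looksLikeHtml(text) {
  // In JavaScript regexes, \s also matches a leading byte-order mark (U+FEFF),
  // so leading whitespace and a BOM are skipped before the first "<".
  return /^\s*<(!doctype\s+html|html[\s>])/i.test(String(text));
}
```

Inside an Apps Script function you could call it right before `XmlService.parse(data)` and throw a clearer error (for example, including the URL you fetched) when it returns true.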
function xmlParsingXmlStoreOnGoogleDrive(){
// This is the original XML, which parses fine
var fetched=UrlFetchApp.fetch("http://api.allocine.fr/rest/v3/movie?media=mp4-lc&partner=YW5kcm9pZC12Mg&profile=large&version=2&code=265621")
var blob=fetched.getBlob();
var getAs=blob.getAs("text/xml")
var data=getAs.getDataAsString("UTF-8")
Logger.log(data.substring(0,350)); // log only the first 350 characters so as not to flood the log; this shows the expected XML:
/*
<?xml version="1.0" encoding="utf-8"?>
<!-- Copyright © 2019 AlloCiné -->
<movie code="265621" xmlns="http://www.allocine.net/v6/ns/">
<movieType code="4002">Long-métrage</movieType>
<originalTitle>Mise à jour sur Google play</originalTitle>
<title>Mise à jour sur Google play</title>
<keywords>Portrait of a Lady on Fire </keywords>
*/
var xmlDocument=XmlService.parse(data);
var root=xmlDocument.getRootElement();
var keywords=root.getChild("keywords",root.getNamespace()).getText();
Logger.log(keywords); // Logs the expected result: "Portrait of a Lady on Fire "
// And this is my copy of the original XML, which I can't parse
var fetched=UrlFetchApp.fetch("https://drive.google.com/file/d/1K3-9dHy-h0UoOOY5jYfiSoYPezSi55h1/view?usp=sharing")
var blob=fetched.getBlob();
var getAs=blob.getAs("text/xml")
var data=getAs.getDataAsString("UTF-8")
Logger.log(data.substring(0,350)); // log only the first 350 characters so as not to flood the log; this shows unexpected HTML!:
/*
<!DOCTYPE html><html><head><meta name="google" content="notranslate"><meta http-equiv="X-UA-Compatible" content="IE=edge;">
<style>@font-face{font-family:'Roboto';font-style:italic;font-weight:400;src:local('Roboto Italic'),local('Roboto-Italic'),
url(//fonts.gstatic.com/s/roboto/v18/KFOkCnqEu92Fr1Mu51xIIzc.ttf)format('truetype');}@font-face{font-fam......
*/
var xmlDocument=XmlService.parse(data); // ABORTS WITH THE ERROR: Element type "a.length" must be followed by either attribute specifications, ">" or "/>"
var root=xmlDocument.getRootElement();
var keywords=root.getChild("keywords",root.getNamespace()).getText();
Logger.log(keywords);
}
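In the function above, the second `UrlFetchApp.fetch` call requests a `.../view?usp=sharing` address. That is the Google Drive viewer page, so the response is the viewer's HTML rather than the file's bytes, which is consistent with the `meta` and `a.length` errors. A minimal sketch, assuming the account running the script can read the file: take the file ID out of the sharing URL and read the content through `DriveApp`, the same route the second script below uses successfully. `extractDriveFileId` and `readXmlFromDrive` are hypothetical names.

```javascript
/**
 * Hypothetical helper: pulls the file ID out of a Drive URL such as
 *   https://drive.google.com/file/d/<ID>/view?usp=sharing
 *   https://drive.google.com/uc?id=<ID>&export=download
 * Returns null when no ID is found.
 */
function extractDriveFileId(url) {
  // "/d/<ID>" form used by sharing and viewer links
  var m = /\/d\/([-\w]+)/.exec(url);
  if (m) return m[1];
  // "id=<ID>" query parameter used by uc?id=... links
  m = /[?&]id=([-\w]+)/.exec(url);
  return m ? m[1] : null;
}

/**
 * Apps Script only (uses DriveApp and XmlService): reads the file's raw
 * content instead of the viewer page, then parses it as XML.
 */
function readXmlFromDrive(sharingUrl) {
  var fileId = extractDriveFileId(sharingUrl);
  if (!fileId) throw new Error("No Drive file ID found in " + sharingUrl);
  var text = DriveApp.getFileById(fileId).getBlob().getDataAsString("UTF-8");
  return XmlService.parse(text);
}
```

Usage in Apps Script would look like `var root = readXmlFromDrive(sharingUrl).getRootElement();` followed by the same `root.getChild("keywords", root.getNamespace())` call as above.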
I read in this similar question, "Parse XML file (which is stored on GoogleDrive) with Google app script",
that "Unfortunately we can't directly get xml files in the google drive"! Is that right, and would it simply mean that I cannot make my script work?
Wonderful! You are right, and both of your suggestions work. I just made a mistake elsewhere in my code, and since then solution 1 no longer works. That is why I'm giving a new script to test it. This is only for my own learning, because my project is safe thanks to you :)
function storeXmlOnGoogleDriveThenParsIt(url){
url=url||"http://api.allocine.fr/rest/v3/movie?media=mp4-lc&partner=YW5kcm9pZC12Mg&profile=large&version=2&code=265621"; // to test
// On my Google Drive I make a copy of the content returned by the URL (to spare the server too many requests).
var bufferedXml=DriveApp.getRootFolder().searchFolders('title = "BufferFiles"').next().createFile("xmlBuffered.xml", UrlFetchApp.fetch(url).getContentText(),MimeType.PLAIN_TEXT);
var fileId=bufferedXml.getId(); // ID of the new buffered file (File.getId() avoids extracting it from getUrl() with a regex)
// Now I want to parse the buffered XML file
// [ Your second way to get the data works perfectly! THANK YOU A LOT!!!
var data = DriveApp.getFileById(fileId).getBlob().getDataAsString();
var xmlDocument=XmlService.parse(data);
var root=xmlDocument.getRootElement();
var mynamespace=root.getNamespace();
var keywords=root.getChild("keywords",root.getNamespace()).getText();
Logger.log("keywords:"+keywords); // and parsing succeeds ]
// [ The first way to get the data used to work, BUT DAMN, it now aborts! It broke after I modified the line of code that creates the XML file, and I can't get back to the right code
var downloadUrlBufferedXml="https://drive.google.com/uc?id="+fileId+"&export=download";
var data = UrlFetchApp.fetch(downloadUrlBufferedXml).getContentText(); // this used to work, but now data is HTML text again! :(
Logger.log("data: "+data.substring(0,350)); // this shows that data is HTML, not XML! :(
var xmlDocument=XmlService.parse(data); // So I get an error like: The element type "meta" must be terminated by the matching end-tag "</meta>" ]
var root=xmlDocument.getRootElement();
var mynamespace=root.getNamespace();
var keywords=root.getChild("keywords",root.getNamespace()).getText();
Logger.log("keywords:"+keywords)
}
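A plausible explanation for the `export=download` failure (not confirmed in the post): the earlier copy had been shared with a link, while the file freshly created by `createFile` is private, and `UrlFetchApp` does not send your Google credentials by default, so Drive answers with an HTML page instead of the file. A minimal sketch, assuming the project's OAuth scopes include Drive access (using `DriveApp` in the same project normally adds them): download through the Drive API v3 `files/{fileId}?alt=media` endpoint with the script's own token. `buildDriveDownloadRequest` and `fetchBufferedXml` are hypothetical names.

```javascript
/**
 * Hypothetical helper: builds the URL and UrlFetchApp params for downloading
 * a Drive file's content via the Drive API v3 ("alt=media") with a bearer token.
 */
function buildDriveDownloadRequest(fileId, oauthToken) {
  return {
    url: "https://www.googleapis.com/drive/v3/files/" +
         encodeURIComponent(fileId) + "?alt=media",
    params: {
      method: "get",
      headers: { Authorization: "Bearer " + oauthToken },
      muteHttpExceptions: true // inspect the status code instead of throwing
    }
  };
}

/**
 * Apps Script only (uses ScriptApp, UrlFetchApp and XmlService): downloads
 * the buffered file with the script's credentials and parses it as XML.
 */
function fetchBufferedXml(fileId) {
  var req = buildDriveDownloadRequest(fileId, ScriptApp.getOAuthToken());
  var response = UrlFetchApp.fetch(req.url, req.params);
  if (response.getResponseCode() !== 200) {
    throw new Error("Drive returned HTTP " + response.getResponseCode() +
                    ": " + response.getContentText().substring(0, 200));
  }
  return XmlService.parse(response.getContentText("UTF-8"));
}
```

If you don't need an HTTP round trip at all, the `DriveApp.getFileById(fileId).getBlob().getDataAsString()` route already used in the script above avoids the problem entirely.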