php facebook-php-sdk facebook-instant-articles

Extracting Caption from Image using Facebook PHP SDK Transformer


I am having trouble extracting attribute text from an image tag using the Facebook Instant Articles SDK Transformer.

I cannot figure out the rules.json needed to extract the text from the alt attribute and turn it into a caption.

//MARKUP
<img src="https://upload.wikimedia.org/wikipedia/commons/8/84/Example.svg" alt="Foto By: Bla Bla"/>

//RULES.JSON
{
   "class": "ImageRule",
   "selector" : "img",
   "properties" : 
   {
      "image.url" : 
      {
         "type" : "string",
         "selector" : "img",
         "attribute": "src"
      },
      "image.caption" : 
      {
         "type" : "string",
         "selector" : "img",
         "attribute" : "alt"
      }
   }
}

The expected result is Facebook Instant Articles compliant markup like:
<figure>
    <img src="https://upload.wikimedia.org/wikipedia/commons/8/84/Example.svg"/>
    <figcaption>Foto By: Bla Bla</figcaption>
</figure>

What I get instead is: Uncaught Error: Call to a member function hasChildNodes() on string in /Facebook/InstantArticles/Transformer/Transformer.php on line 305.

The image is processed and the caption is extracted with the correct value, but the transformer then recursively calls its transform function with the extracted "alt" string, which fails because it expects an HTML node as input, not a string.

Facebook's documentation on the matter is extremely vague, so if anyone has experience with Facebook Instant Articles, please chime in.

Shitty docs can be found here:
https://developers.facebook.com/docs/instant-articles/sdk/transformer/
https://developers.facebook.com/docs/instant-articles/sdk/transformer-rules


Solution

  • Main committer of the SDK here.

    You can check the setup we have in SimpleTransformerTest.php, which covers exactly your need. You can also use any of the tests to play around with the Transformer.

    What you are doing wrong is the type of the image.caption property: it should be of type element, not string.
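
    To illustrate why a string value leads to that error, here is a minimal sketch that uses only PHP's built-in DOM extension, not the SDK itself. The variable names $asString and $asElement are made up for the illustration; they stand in for what a "string" property and an "element" property would roughly hand back.

    ```php
    <?php
    // Plain PHP DOM only; this does not call the Instant Articles SDK.
    $doc = new DOMDocument();
    $doc->loadHTML('<img src="https://upload.wikimedia.org/wikipedia/commons/8/84/Example.svg" alt="Foto By: Bla Bla"/>');

    $img = $doc->getElementsByTagName('img')->item(0);

    // Hypothetical stand-in for a "string" property with "attribute": "alt".
    $asString = $img->getAttribute('alt');

    // Hypothetical stand-in for an "element" property: the DOM node itself.
    $asElement = $img;

    var_dump(is_string($asString));          // the alt text is a plain string
    var_dump($asElement instanceof DOMNode); // the element is a DOM node

    // A DOM node can be asked about its children...
    var_dump($asElement->hasChildNodes());

    // ...but a string cannot, which is the failure reported in the question.
    try {
        $asString->hasChildNodes();
    } catch (Error $e) {
        echo $e->getMessage(), PHP_EOL;
    }
    ```

    On PHP 7 or later the last call throws an Error rather than returning, which is why handing a string to code that expects a node stops the transformation.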

    Your rules.json should look like:

        {
            "class": "CaptionRule",
            "selector" : "//img[@alt]",
            "properties" : {
                "caption.default": {
                    "type": "string",
                    "selector": "img",
                    "attribute": "alt"
                }
            }
        },
        {
           "class": "ImageRule",
           "selector" : "figure",
           "properties" : 
           {
              "image.url" : 
              {
                 "type" : "string",
                 "selector" : "img",
                 "attribute": "src"
              },
              "image.caption" : 
              {
                 "type" : "element",
                 "selector" : "img"
              }
           }
        }
    

    Note that I'm using a different strategy: instead of going straight to the <img> element in the ImageRule, I select the <figure> tag, so the transformer can stay intact. Also note that the rules in rules.json are applied bottom up.
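
    For reference, a sketch of how the two rules might sit inside a complete rules file. It assumes the usual top-level "rules" array of the SDK's rule files, and it assumes the source markup wraps the <img> in a <figure>, which the "figure" selector implies. A real file would normally carry further rules around these two.

    ```json
    {
        "rules": [
            {
                "class": "CaptionRule",
                "selector": "//img[@alt]",
                "properties": {
                    "caption.default": {
                        "type": "string",
                        "selector": "img",
                        "attribute": "alt"
                    }
                }
            },
            {
                "class": "ImageRule",
                "selector": "figure",
                "properties": {
                    "image.url": {
                        "type": "string",
                        "selector": "img",
                        "attribute": "src"
                    },
                    "image.caption": {
                        "type": "element",
                        "selector": "img"
                    }
                }
            }
        ]
    }
    ```

    With this layout the ImageRule hands the <img> node on as the caption element, and the CaptionRule then reads its alt attribute as the caption text.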

    Let me know if this covers your need.