Do we change the DOMDocument instance that gets passed in, and is this an issue? #174

Closed
Zegnat opened this issue May 26, 2018 · 2 comments

Zegnat commented May 26, 2018

See microformats/mf2py#104. For backwards compatibility parsing, the Python parser changes the DOM on the fly. I believe the PHP parser does a similar thing. It turns out that – in the case of the Python parser – the same DOM object can’t be parsed successfully a second time. The microformats in the base document have been “damaged”.

How can we best test if this is the case with our parser too? Maybe also add a test case where we check that a second parse gives the same result?
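
One way the second-parse check could look, as a minimal sketch. `checkReparse()` and `fakeParse()` are hypothetical names and are not part of php-mf2; `fakeParse()` is only a stand-in that mutates the document in a way similar to what is described above, so the sketch runs without the library installed.

```php
<?php
// Sketch only: checkReparse() and fakeParse() are hypothetical helpers,
// not part of php-mf2.

/**
 * Parse the same DOMDocument twice with $parse and report whether the
 * document's serialization and the parse result stayed the same.
 */
function checkReparse(callable $parse, string $html): array
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);

    $before = $doc->saveHTML();
    $first  = $parse($doc);
    $after  = $doc->saveHTML();
    $second = $parse($doc);

    return [
        'documentUnchanged' => $before === $after,
        'sameResult'        => $first == $second,
    ];
}

// Stand-in parser that mimics the problem: it counts "hentry" roots that
// lack "h-entry", then upgrades them in place on the caller's document.
function fakeParse(DOMDocument $doc): array
{
    $found = 0;
    foreach ($doc->getElementsByTagName('div') as $div) {
        $classes = preg_split('/\s+/', trim($div->getAttribute('class')));
        if (in_array('hentry', $classes, true) && !in_array('h-entry', $classes, true)) {
            $div->setAttribute('class', implode(' ', $classes) . ' h-entry');
            $found++;
        }
    }
    return ['classicRoots' => $found];
}

$report = checkReparse('fakeParse', '<div class="hentry">Hello</div>');
var_dump($report);
```

With the real parser, the same helper could presumably be called with `'Mf2\parse'` as the callable; a regression test would then assert that both flags are true.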

Needs investigating. Thanks @kartikprabhu for bringing this up!

(This is basically a todo for myself, therefore also assigning myself.)

Zegnat self-assigned this May 26, 2018

gRegorLove commented

Confirmed: if you pass a DOMDocument to php-mf2, it is modified during parsing.

Input HTML:

<div class="hentry">
    <div class="entry-content">
        <p class="entry-summary">This is a summary</p> 
        <p>This is <a href="/tags/mytag" rel="tag">mytag</a> inside content. </p>
    </div>
</div>
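
Test script:

// $html holds the input HTML above.
$doc = new DOMDocument();
$doc->loadHTML($html);
echo $doc->saveHTML();      // serialized document before parsing
$parse = Mf2\parse($doc);
echo $doc->saveHTML();      // serialized document after parsing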

Output of the two saveHTML() calls, first before and then after parsing (doctype, html, and body elements trimmed):

<div class="hentry">
    <div class="entry-content">
        <p class="entry-summary">This is a summary</p> 
        <p>This is <a href="/tags/mytag" rel="tag">mytag</a> inside content. </p>
    </div>
</div>

<div class="hentry h-entry">
    <div class="entry-content e-content">
        <p class="entry-summary p-summary">This is a summary</p> 
        <p>This is <a href="/tags/mytag" rel="tag">mytag</a> inside content. </p>
    </div>
<data class="category p-category" value="mytag"></data></div>

gRegorLove commented

Appears to be a simple fix: $doc = clone $input; at this line. Only tested locally with the above HTML.
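
A minimal sketch of why the clone helps, assuming only PHP's DOM extension. `upgradeClassic()` and `parseOnCopy()` are hypothetical stand-ins for illustration, not php-mf2 functions; the point is just that cloning a DOMDocument copies its node tree, so changes to the copy do not reach the caller's instance.

```php
<?php
// Sketch only: upgradeClassic() and parseOnCopy() are hypothetical
// stand-ins, not php-mf2 functions.

// Mutates the document it is given, similar to the backcompat step above.
function upgradeClassic(DOMDocument $doc): void
{
    foreach ($doc->getElementsByTagName('div') as $div) {
        if ($div->getAttribute('class') === 'hentry') {
            $div->setAttribute('class', 'hentry h-entry');
        }
    }
}

// Works on a clone, so the caller's document is left alone.
function parseOnCopy(DOMDocument $input): string
{
    $doc = clone $input; // cloning a DOMDocument copies its whole node tree
    upgradeClassic($doc);
    return $doc->saveHTML();
}

$input = new DOMDocument();
$input->loadHTML('<div class="hentry">Hello</div>');

$before = $input->saveHTML();
$parsed = parseOnCopy($input);
$after  = $input->saveHTML();

var_dump($before === $after);                          // is the caller's document untouched?
var_dump(strpos($parsed, 'hentry h-entry') !== false); // was the copy upgraded?
```

The trade-off is an extra copy of the document per parse, which seems acceptable for typical page sizes.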

Zegnat added a commit to Zegnat/php-mf2 that referenced this issue May 27, 2018
A DOMDocument instance being passed to the parser should not have
changed after parsing. This could potentially trip up further use of
the same DOMDocument instance.

See microformats#174.
Zegnat added a commit to Zegnat/php-mf2 that referenced this issue May 27, 2018