Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce instances when p-name is implied #6

Closed
tantek opened this issue Mar 15, 2017 · 24 comments
Closed

reduce instances when p-name is implied #6

tantek opened this issue Mar 15, 2017 · 24 comments

Comments

@tantek
Copy link
Member

tantek commented Mar 15, 2017

This is a split-off of part of http://microformats.org/wiki/microformats2-parsing-issues#implied_properties_when_an_explicit_class_is_provided that was left unresolved without consensus (since that issue was resolved with just dealing with u-url and nothing more).

Experience has shown there are a number of instances where implied p-name produces something unhelpful. Typically this happens with otherwise large microformats which for some reason omitted the name (e.g. an h-feed, or h-entry that has no author supplied name etc.).

The use-case for which implied p-name was designed was for small microformats, e.g. just a hyperlink with h-* class on it, or maybe just a simple set of nested elements (without siblings). It has some similar use-cases too. But by the point you have elements with multiple explicit properties specified inside, rarely does the current implied p-name rule produce anything useful.

One thing that would help is links to specific examples where excessive or otherwise "useless" p-names are being implied, where no p-name would actually be preferable (from a consuming code point of view).

@tantek
Copy link
Member Author

tantek commented Mar 15, 2017

From the end of the wiki discussion, one straw proposal was:

"any explicit p-* property on an element stops implied p-name"

(this sounds a bit ambiguous and could be reworded, but I think the general intent / principle is workable)

@aaronpk
Copy link
Member

aaronpk commented Mar 15, 2017

I'm not clear on what that proposal actually means. Does "stop" mean no implied name is generated at all? Does it mean that property with p-* is excluded from the implied name?

@tantek
Copy link
Member Author

tantek commented Mar 15, 2017

Either possibility could be pursued. Let's start documenting examples of existing failures / misbehaviors so we can look at specifics.

@aaronpk
Copy link
Member

aaronpk commented Mar 15, 2017

Here is an example that came up today: https://huffduffer.com/tags/indieweb

This boils down to a structure like below:

<ol class="h-feed">
  <li class="h-entry">
    <h3 class="p-name">Episode Name</h3>
    <div class="e-content">episode description...</div>
    <audio class="u-audio" src="..."></audio>
    <p class="p-author h-card">...</p>
  </li>
</ol>

The result of the current implied p-name rule is all the contents of the individual h-entrys end up being in the implied p-name of the h-feed.

@mblaney
Copy link

mblaney commented Mar 16, 2017

Would it make sense for implied p-name to skip content from h-* children? It would mean in the above case that the implied p-name for the above h-feed is empty.

@tantek
Copy link
Member Author

tantek commented Jan 5, 2018

Restating for clarity, and adding children as another way to stop implying p-name

"p-name MUST NOT be implied if there are any explicit p-* properties or any nested microformats"

Need consensus positive feedback from parser developers and one proof of implementation to proceed (and no objections obviously). Yes you can thumbs-up to indicate positive feedback 😃

@gRegorLove
Copy link
Member

Here's a Bridgy example: https://brid-gy.appspot.com/comment/twitter/miklb/948601132397588481/949306079623766016

{
    "items": [
        {
            "type": [
                "h-entry"
            ],
            "properties": {
                "author": [
                    {
                        "type": [
                            "h-card"
                        ],
                        "properties": {
                            "uid": [
                                "tag:twitter.com,2013:deaaqua"
                            ],
                            "numeric-id": [
                                "26367081"
                            ],
                            "name": [
                                "Amanda Rocksalot"
                            ],
                            "nickname": [
                                "deaaqua"
                            ],
                            "url": [
                                "https://www.instagram.com/deaaqua/",
                                "https://twitter.com/deaaqua"
                            ],
                            "photo": [
                                "https://pbs.twimg.com/profile_images/1811120958/Buttercups_and_Books_by_pinkparis1233.jpg"
                            ]
                        },
                        "value": "Amanda Rocksalot"
                    }
                ],
                "category": [
                    {
                        "type": [
                            "h-card"
                        ],
                        "properties": {
                            "uid": [
                                "tag:twitter.com,2013:GatoIshwary"
                            ],
                            "name": [
                                "Gato Ishwary"
                            ],
                            "url": [
                                "https://twitter.com/GatoIshwary"
                            ]
                        },
                        "value": "https://twitter.com/GatoIshwary"
                    },
                    {
                        "type": [
                            "h-card"
                        ],
                        "properties": {
                            "uid": [
                                "tag:twitter.com,2013:miklb"
                            ],
                            "name": [
                                "Michael Bishop"
                            ],
                            "url": [
                                "https://twitter.com/miklb",
                                "https://miklb.com/"
                            ]
                        },
                        "value": "https://twitter.com/miklb"
                    }
                ],
                "uid": [
                    "tag:twitter.com,2013:949306079623766016"
                ],
                "url": [
                    "https://twitter.com/deaaqua/status/949306079623766016"
                ],
                "video": [
                    "https://video.twimg.com/tweet_video/DSyc1j3VQAswKgR.mp4"
                ],
                "in-reply-to": [
                    "https://twitter.com/GatoIshwary/status/949250988946395137",
                    "https://miklb.com/2018/01/3012/"
                ],
                "published": [
                    "2018-01-05T15:46:21+00:00"
                ],
                "name": [
                    "tag:twitter.com,2013:949306079623766016\n  \n  2018-01-05T15:46:21+00:00\n      Amanda Rocksalot\n\n    deaaqua\n    \n\n  https://twitter.com/deaaqua/status/949306079623766016\n  \n  \n  \n   \nYour browser does not support the video tag. Click here to view directly. \n     Gato Ishwary\n    \n    \n  \n\n  \n     Michael Bishop"
                ]
            }
        }
    ],
    "rels": {},
    "debug": {
        "package": "https://packagist.org/packages/mf2/mf2",
        "version": "v0.3.2",
        "note": [
            "This output was generated from the php-mf2 library available at https://github.com/indieweb/php-mf2",
            "Please file any issues with the parser at https://github.com/indieweb/php-mf2/issues"
        ]
    }
}
@aaronpk
Copy link
Member

aaronpk commented Jan 12, 2018

I encountered a similar case today working on aaronpk/XRay#52 where a photo property seemed to come out of nowhere.

In this example, I would not expect the photo property to be implied:

<div class="h-entry"><p class="e-content p-name">Hello World <img src="example.jpg"></p></div>

From the authoring perspective, I would expect the act of defining the e-content class should prevent the img tag inside from turning into an implied property on the parent h-entry.

@Zegnat
Copy link
Member

Zegnat commented Jan 12, 2018

The implied photo in @aaronpk’s example is done through the rule:

  • else if .h-x>:only-child:not[.h-*]>img[src]:only-of-type:not[.h-*], then use that img’s src for photo

This rule looks pretty complex. It seems to exist mostly in case of some wrapper element being inserted between the root h-x and the data we want to use for extracting implied values. That’s what I assume to be the original use case because the in-between element has to be the only element within the root h-x.

Either the existence of explicit properties should stop the implied parsing, or the rules should incorporate some form of limitation in case of the wrapper element. Adding :not[.p-*]:not[.u-*]:not[.dt-*]:not[.e-*] to the :only-child in-between element selector would be an option here, but I personally don’t like making these implied values rules even bigger and more convoluted. They are already a source of many questions.

@Zegnat
Copy link
Member

Zegnat commented Jan 14, 2018

Just a thought I just had in chat when I realised e-content wasn’t going to stop a name property to be implied per @tantek’s proposed new limitation:

"p-name MUST NOT be implied if there are any explicit p-* properties or any nested microformats"

Is there any reason why we limit this to p- properties only and not all explicit properties? I still would not be able to remove the redundant p-name from my blog posts. Here is my HTML:

    <article class="h-entry">
      <main class="e-content p-name">
        <p>Post</p>
      </main>
      <footer>
        <p>Published on <time class="dt-published" datetime="2017-11-07T13:27:49+01:00">7<sup>th</sup> November 2017 13:27:49</time>.</p>
                <p class="editorlink">[<a rel="edit" href="#">edit</a>] [<a href="#" class="u-url u-uid">permalink</a>]</p>
      </footer>
    </article>

This h-entry includes an e- property, a dt- property, and two u- properties. So without p-name there are no other p- properties to stop the name from being implied.

@Zegnat
Copy link
Member

Zegnat commented Jan 14, 2018

Tantek’s home page is another example of h-entry markup that will need to keep explicitly setting p-name because it would get implied wrong otherwise:

<li class="h-entry hentry as-note">
<p class="p-name entry-title e-content entry-content article">Post</p>
<span class="info footer"><a href="#" class="dt-published published dt-updated updated u-url u-uid"><time class="value" datetime="05:11-0800">05:11</time> on <time class="value">2018-01-14</time></a></span>
</li>

It (like mine) has several other properties defined for a post but no additional p- properties.

@Zegnat
Copy link
Member

Zegnat commented Jan 28, 2018

Today I spotted another live example of a post where the parsed name will not be fixed by the proposed change. Hat tip to @gRegorLove, who linked to Upon 2020 first. See chat. The post he linked to includes a p-in-reply-to and will indeed be fixed by the proposed change. But…

@jernst’s website uses the following HTML for note posts (edited to remove non-mf2 classes and emty unrelated attributes for easier reading):

<li id="" class="h-entry">
  <a href="" title="Posts by jernst ( @jernst )" class="">
    <img alt='' src='' srcset='' class='' height='48' width='48' />
  </a>
  <h4>
    <a href="" title="Posts by jernst ( @jernst )">jernst</a>
    <span class="">
      <abbr class="dt-published" title="2018-01-24T16:56:18Z">
        08:56 <em>on</em> January 24, 2018
      </abbr>
      <span class="">
        <a href="" class="u-url" title="Permalink">Permalink</a>
        <a rel="nofollow" class="" href="">Log in to leave a Comment</a>
      </span>
      <span class="">&nbsp;</span>
    </span>
  </h4>
  <div id="" class="e-content">
    <p>
      My &#8220;invited guest post&#8221; (is that a thing?) on opportunities
      of the #IndieWeb for small businesses went live on GoDaddy&#8217;s site:
      <a href="" rel="nofollow">https://www.godaddy.com/garage/indieweb-facebook-opportunities/</a>
    </p>
  </div></li>

This has a u-url, dt-published, and e-content. Even under the current proposal, as there is no other p- property, the name will be implied and include meta data. (As shown by Loqi).

Also note that if the reply post linked by @gRegorLove had used u-in-reply-to it wouldn’t be fixed either. This is a great example of why I do not think we should limit it to p- properties. That post would not be expressing any difference in intent between u-in-reply-to and p-in-reply-to but the parsers will assign it an implied name in one case and not for the other. I think this will lead to extra confusion.

@kevinmarks
Copy link
Member

Currently all mf2 items have a name property, because of implied name. If we change this, some code using the parsers could fail if it assumes a name is present.
Not sure if this a key issue.

@Zegnat
Copy link
Member

Zegnat commented Feb 16, 2018

Documenting from yesterday’s chat, because nobody could remember this and I can’t find it elsewhere:

So I think the (as of now) latest proposed spec change would be:

  • p-name MUST NOT be implied if there are any explicit p-* or e-* properties, or any nested microformats.

Currently all mf2 items have a name property, because of implied name. If we change this, some code using the parsers could fail if it assumes a name is present.

Here too I will just document an answer given in chat, this time by @aaronpk:

may just mean it has to be released as a major version number [of the parsers]

Assuming something like semver is being used, any major version bump should signify possible API changes to the user. I too don’t think that would be an issue.

There might be an issue if someone is using parsers-as-a-service, e.g. always getting their mf2 parser output from php.microformats.io. But I don’t think anyone ever advertised their online parsers as a service?

@aaronpk
Copy link
Member

aaronpk commented Feb 16, 2018

I (and likely others) use xray.p3k.io as a service, so I will have to consider what to do in that case. It doesn't return the Microformats JSON, it converts it to its jf2 format first. I may just return an empty string for name if there is no mf2 name property.

@Zegnat
Copy link
Member

Zegnat commented Feb 16, 2018

It doesn’t return the Microformats JSON, it converts it to its jf2 format first.

It might be worth opening an issue on jf2 to see if they want to keep an explicit name property. Their “author” syntax says it “consists of a name” and more, but that’s not marked as a MUST. For a jf2feed on the other hand name is a SHOULD.

The real question is, do you see any reasons for postponing this change because of your use of a mf2 parser as a service? I think not?

@aaronpk
Copy link
Member

aaronpk commented Feb 16, 2018

Nope, wasn't intending to hold things up, just wanted to put that there for the record.

I agree with the current proposal of having p-* and e-* and h-* stop the implied name.

@kartikprabhu
Copy link
Member

Is there a consensus on this issue? If yes, I can look into adding this to mf2py.

@Zegnat
Copy link
Member

Zegnat commented Feb 17, 2018

At @kartikprabhu’s request here are some super simplified examples of HTML where unwanted metadata (a footer of an article) would be swept up in the name even though I only want content or like-of to show. Before and afters have been included. Output taken from the PHP parser.

Case with e-content and no p-name

HTML

<article class="h-entry">
  <div class="e-content">
    <p>Wanted content.</p>
  </div>
  <footer>
    <p>Footer to be ignored.</p>
  </footer>
</article>

Current output

{
    "items": [
        {
            "type": [
                "h-entry"
            ],
            "properties": {
                "content": [
                    {
                        "html": "\r\n    <p>Wanted content.</p>\r\n  ",
                        "value": "Wanted content."
                    }
                ],
                "name": [
                    "Wanted content. \r\n   \r\n  Footer to be ignored."
                ]
            }
        }
    ],
    "rels": {}
}

Expected output

{
    "items": [
        {
            "type": [
                "h-entry"
            ],
            "properties": {
                "content": [
                    {
                        "html": "\r\n    <p>Wanted content.</p>\r\n  ",
                        "value": "Wanted content."
                    }
                ]
            }
        }
    ],
    "rels": {}
}

Case with p-content and no p-name

HTML

<article class="h-entry">
  <div class="p-content">
    <p>Wanted content.</p>
  </div>
  <footer>
    <p>Footer to be ignored.</p>
  </footer>
</article>

Current output

{
    "items": [
        {
            "type": [
                "h-entry"
            ],
            "properties": {
                "content": [
                    "Wanted content."
                ],
                "name": [
                    "Wanted content. \r\n   \r\n  Footer to be ignored."
                ]
            }
        }
    ],
    "rels": {}
}

Expected output

{
    "items": [
        {
            "type": [
                "h-entry"
            ],
            "properties": {
                "content": [
                    "Wanted content."
                ]
            }
        }
    ],
    "rels": {}
}

Case with nested microformat and no p-name

HTML

<article class="h-entry">
  <div class="u-like-of h-cite">
    <p>I really like <a class="p-name u-url" href="http://microformats.org/">Microformats</a></p>
  </div>
  <footer>
    <p>Footer to be ignored.</p>
  </footer>
</article>

Current output

{
    "items": [
        {
            "type": [
                "h-entry"
            ],
            "properties": {
                "like-of": [
                    {
                        "type": [
                            "h-cite"
                        ],
                        "properties": {
                            "name": [
                                "Microformats"
                            ],
                            "url": [
                                "http://microformats.org/"
                            ]
                        },
                        "value": "http://microformats.org/"
                    }
                ],
                "name": [
                    "I really like Microformats \r\n   \r\n  Footer to be ignored."
                ]
            }
        }
    ],
    "rels": {}
}

Expected output

{
    "items": [
        {
            "type": [
                "h-entry"
            ],
            "properties": {
                "like-of": [
                    {
                        "type": [
                            "h-cite"
                        ],
                        "properties": {
                            "name": [
                                "Microformats"
                            ],
                            "url": [
                                "http://microformats.org/"
                            ]
                        },
                        "value": "http://microformats.org/"
                    }
                ]
            }
        }
    ],
    "rels": {}
}
@kartikprabhu
Copy link
Member

Implemented in mf2py here https://github.com/kartikprabhu/mf2py/tree/implied-name-fix Will push to the main version if this gets consensus and makes it to the spec.

@Zegnat
Copy link
Member

Zegnat commented Feb 18, 2018

With an implementation for mf2py by @kartikprabhu and an intent to implement in mf2php by @gRegorLove, is that enough to update the spec? @tantek?

@kartikprabhu
Copy link
Member

it seems the output from mf2py depends on which internal HTML parser is being used ( see: kartikprabhu/mf2py#58 (comment) ), but this should count as an internal bug for mf2py to be fixed.

So I am +1 for this being included in the spec.

@kartikprabhu
Copy link
Member

implied-name-stopping has been implemented in an experimental version of mf2py.

so +1

@Zegnat
Copy link
Member

Zegnat commented Mar 4, 2018

This has been added to the spec, at revision 20:27, 4 March 2018. Resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
7 participants