Plugin Directory

import-wodpress-1x

Opened 14 years ago

Last modified 12 years ago

#1162 reopened defect

[PATCH] get_tag() regex bug fix

Reported by: snarfed Owned by: caugb
Priority: normal Severity: normal
Plugin: import-wodpress-1x Keywords: wordpress-importer import wxr
Cc: briancolinger, ryan, nbachiyski

Description

hi all! i don't see a wordpress-importer component, or a way for normal users to make new components, so i picked the closest one i could find.

the tag regex in WP_Import::get_tag() has a bug that makes it overly loose: it can match an element whose name merely starts with the requested tag name, which results in incorrectly imported data. for example, this snippet of a comment in a WXR file:

<wp:comment_author_IP>1.2.3.4</wp:comment_author_IP>
<wp:comment_author_email>a@…</wp:comment_author_email>
<wp:comment_author>ryan</wp:comment_author>

results in this imported data:

mysql> select comment_author_IP, comment_author_email, comment_author from wp_comments where comment_post_id=22;
+-------------------+----------------------+--------------------+
| comment_author_IP | comment_author_email | comment_author     |
+-------------------+----------------------+--------------------+
| 1.2.3.4           | a@…                  | 1.2.3.4 a@… ryan   |
+-------------------+----------------------+--------------------+

comment_author should be just 'ryan', but it's actually '1.2.3.4 a@… ryan'.

this happens because of the opening-tag part of the regex at wordpress_importer.php:72:

"|<$tag.*?>(.*?)</$tag>|is"

the .*? in the initial <$tag.*?> can match the rest of a longer tag name, and with the s modifier it can also run across other opening and closing tags and their contents. in the example above, if you call get_tag() for 'wp:comment_author', the regex actually matches everything from <wp:comment_author_IP> through </wp:comment_author>: the first .*? matches '_IP', and then the inner (.*?) captures everything up to the real </wp:comment_author> closing tag, including the IP and email elements in between.
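The failure can be reproduced outside WordPress. The following is a minimal Python sketch, not the importer's PHP code: `original_match` is a hypothetical port of the pattern above, and it assumes Python's `re.IGNORECASE | re.DOTALL` behave like PHP's `i` and `s` modifiers for this pattern. It also escapes the tag name, which the original interpolates as-is; that makes no difference for names like `wp:comment_author`.

```python
import re

# The <wp:comment> fields from the example above, in the same order
# (the email address stays elided, as in the ticket).
SNIPPET = (
    "<wp:comment_author_IP>1.2.3.4</wp:comment_author_IP>\n"
    "<wp:comment_author_email>a@…</wp:comment_author_email>\n"
    "<wp:comment_author>ryan</wp:comment_author>\n"
)


def original_match(text, tag):
    """Hypothetical Python port of the original "|<$tag.*?>(.*?)</$tag>|is".

    re.IGNORECASE | re.DOTALL stand in for PHP's 'i' and 's' modifiers.
    """
    t = re.escape(tag)
    return re.search(rf"<{t}.*?>(.*?)</{t}>", text, re.IGNORECASE | re.DOTALL)


m = original_match(SNIPPET, "wp:comment_author")

# The match starts at the _IP element: "<wp:comment_author" is a prefix of
# "<wp:comment_author_IP", and the lazy ".*?" then absorbs "_IP".
print(m.group(0).startswith("<wp:comment_author_IP>"))  # True

# Group 1 runs from the IP value, across the email element, up to "ryan".
print(repr(m.group(1)))
```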

the patch fixes this by changing the regex to:

"|<$tag( +.*)?>(.*?)</$tag>|is"

which still handles tag attributes, if any, but requires the requested tag name to be followed directly by either '>' or a space, so <wp:comment_author_IP> no longer matches when you ask for wp:comment_author.
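A quick way to check the patched pattern is the same Python re-creation. This is a sketch under assumptions: `patched_get_tag` is hypothetical, and since `( +.*)` is a new capturing group, it reads the element content from group 2 (the patch itself isn't reproduced here, so how the PHP code indexes the match is not shown).

```python
import re

SNIPPET = (
    "<wp:comment_author_IP>1.2.3.4</wp:comment_author_IP>\n"
    "<wp:comment_author_email>a@…</wp:comment_author_email>\n"
    "<wp:comment_author>ryan</wp:comment_author>\n"
)


def patched_get_tag(text, tag):
    """Hypothetical Python port of the patched "|<$tag( +.*)?>(.*?)</$tag>|is"."""
    t = re.escape(tag)
    m = re.search(rf"<{t}( +.*)?>(.*?)</{t}>", text, re.IGNORECASE | re.DOTALL)
    # "( +.*)" is capturing, so the element content is now group 2.
    return m.group(2) if m else None


print(patched_get_tag(SNIPPET, "wp:comment_author"))     # ryan
print(patched_get_tag(SNIPPET, "wp:comment_author_IP"))  # 1.2.3.4

# Attributes after one or more spaces are still accepted.
print(patched_get_tag('<wp:comment_author lang="en">ryan</wp:comment_author>',
                      "wp:comment_author"))  # ryan

# Caveat: the greedy ".*" can run past the first element's '>' when the
# same tag appears twice and the first occurrence has attributes.
print(patched_get_tag('<x a="1">one</x><x>two</x>', "x"))  # two
```

The last line shows a limitation worth knowing: because the `.*` inside `( +.*)` is greedy and the `s` modifier lets it cross element boundaries, a repeated tag with attributes yields the later element's content. That is harmless when each tag occurs once in the text being searched; a lazy `( +.*?)?` would return `one` for that input.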

along with the patch, i've attached example WXR files that demonstrate this.

the patch is against svn r265279.

Attachments (3)

wordpress_importer_get_tag_fix.patch (694 bytes) - added by snarfed 14 years ago.
bad.xml (771 bytes) - added by snarfed 14 years ago.
importing this reproduces the bug
ok.xml (771 bytes) - added by snarfed 14 years ago.
this is almost identical to bad.xml, but the wp:comment_author element appears first, so it doesn't reproduce the bug
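Why element order matters for the original pattern can be shown with just the three comment fields from the description. This sketch assumes ok.xml differs from bad.xml only in that order (the attachments themselves aren't reproduced here); `original_get_tag` is a hypothetical Python port of the original pattern that returns capture group 1.

```python
import re


def original_get_tag(text, tag):
    """Hypothetical Python port of the original "|<$tag.*?>(.*?)</$tag>|is"."""
    t = re.escape(tag)
    m = re.search(rf"<{t}.*?>(.*?)</{t}>", text, re.IGNORECASE | re.DOTALL)
    return m.group(1) if m else None


ip = "<wp:comment_author_IP>1.2.3.4</wp:comment_author_IP>\n"
email = "<wp:comment_author_email>a@…</wp:comment_author_email>\n"
author = "<wp:comment_author>ryan</wp:comment_author>\n"

bad_order = ip + email + author  # author last, as in the description's example
ok_order = author + ip + email   # author first, as described for ok.xml

# With the author element first, the earliest "<wp:comment_author" is the
# right one, so the loose pattern happens to return the correct value.
print(original_get_tag(ok_order, "wp:comment_author"))             # ryan
print(original_get_tag(bad_order, "wp:comment_author") == "ryan")  # False
```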


Change History (5)

@snarfed
14 years ago

importing this reproduces the bug

@snarfed
14 years ago

this is almost identical to bad.xml, but the wp:comment_author element appears first, so it doesn't reproduce the bug

#1 @garyc40
12 years ago

  • Cc changed from briancolinger,ryan,nbachiyski to briancolinger, ryan, nbachiyski
  • Resolution set to fixed
  • Status changed from new to closed

In [586518].

#2 @snarfed
12 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

reopening. that fix is unrelated. garyc40, i'm guessing you meant a different bug.
