Opened 14 years ago
Last modified 12 years ago
#1162 reopened defect
[PATCH] get_tag() regex bug fix
Reported by: | | Owned by: | |
---|---|---|---|
Priority: | normal | Severity: | normal |
Plugin: | import-wodpress-1x | Keywords: | wordpress-importer import wxr |
Cc: | briancolinger, ryan, nbachiyski |
Description
hi all! i don't see a wordpress-importer component, or a way for normal users to make new components, so i picked the closest one i could find.
the tag regex in WP_Import::get_tag() has a bug that makes it overly loose, which can result in incorrect imported data. for example, this snippet of a comment in a WXR file:
<wp:comment_author_IP>1.2.3.4</wp:comment_author_IP>
<wp:comment_author_email>a@…</wp:comment_author_email>
<wp:comment_author>ryan</wp:comment_author>
results in this imported data:
mysql> select comment_author_IP, comment_author_email, comment_author from wp_comments where comment_post_id=22;
+-------------------+----------------------+--------------------+
| comment_author_IP | comment_author_email | comment_author     |
+-------------------+----------------------+--------------------+
| 1.2.3.4           | a@…                  | 1.2.3.4 a@… ryan   |
+-------------------+----------------------+--------------------+
comment_author should be just 'ryan', but it's actually '1.2.3.4 a@… ryan'.
this happens because in the first part of the tag regex on wordpress_importer.php:72:
"|<$tag.*?>(.*?)</$tag>|is"
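the .*? in the initial <$tag.*?> is meant to skip any attributes, but it also matches the rest of a longer tag name, so any element whose name merely starts with $tag counts as the opening tag. in the example above, if you call get_tag('comment_author'), the regex actually matches everything from <wp:comment_author_IP> through </wp:comment_author>. the first .*? matches '_IP', and then the inner (.*?) runs past </wp:comment_author_IP> and the whole comment_author_email element, since neither contains the literal closing tag, and stops only at </wp:comment_author>.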
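to see the over-match outside the importer, here's a minimal python sketch of the unpatched pattern. it's a port for illustration only, not the plugin's php code: the name get_tag_original is made up, and passing the tag with its wp: prefix is an assumption so the pattern lines up with the sample. python's re treats these constructs the same way, with re.I | re.S standing in for the "is" flags.

```python
import re

# sample from this ticket; the elided email is kept exactly as shown
SNIPPET = (
    "<wp:comment_author_IP>1.2.3.4</wp:comment_author_IP>\n"
    "<wp:comment_author_email>a@…</wp:comment_author_email>\n"
    "<wp:comment_author>ryan</wp:comment_author>\n"
)


def get_tag_original(text, tag):
    """Hypothetical stand-in for the lookup, using the unpatched pattern."""
    match = re.search(r"<%s.*?>(.*?)</%s>" % (tag, tag), text, re.I | re.S)
    return match.group(1) if match else ""


value = get_tag_original(SNIPPET, "wp:comment_author")
print(repr(value))
```

the captured text starts at 1.2.3.4 and ends at ryan, with the intervening tags still embedded in it, because the match began at <wp:comment_author_IP>.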
the patch fixes this by changing the regex to:
"|<$tag( +.*)?>(.*?)</$tag>|is"
which still handles tag attributes, if any, but requires that the opening tag is actually the requested tag string.
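for comparison, here's a minimal python sketch of the patched pattern against the same sample. again it's a port for illustration, not the patch itself: get_tag_patched, the wp: prefix on the tag argument, and the wp:status input with a lang attribute are all made up for the example.

```python
import re

# sample from this ticket; the elided email is kept exactly as shown
SNIPPET = (
    "<wp:comment_author_IP>1.2.3.4</wp:comment_author_IP>\n"
    "<wp:comment_author_email>a@…</wp:comment_author_email>\n"
    "<wp:comment_author>ryan</wp:comment_author>\n"
)


def get_tag_patched(text, tag):
    """Hypothetical stand-in for the lookup, using the patched pattern."""
    match = re.search(r"<%s( +.*)?>(.*?)</%s>" % (tag, tag), text, re.I | re.S)
    # the optional attribute group is group 1, so the contents are in group 2
    return match.group(2) if match else ""


# the name must be followed by '>' or by spaces, so '_IP' no longer matches
print(get_tag_patched(SNIPPET, "wp:comment_author"))

# made-up element with an attribute, to show attributes are still accepted
print(get_tag_patched('<wp:status lang="en">publish</wp:status>', "wp:status"))
```

one thing the sketch makes visible: adding a capturing group for the attributes shifts the tag contents from group 1 to group 2, so any code reading the match has to use the new index.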
along with the patch, i've attached example WXR files that demonstrate this.
the patch is against svn r265279.
Attachments (3)
Change History (5)
14 years ago
this is almost identical to bad.xml, but the wp:comment_author element appears first, so it doesn't reproduce the bug
importing this reproduces the bug