Emoji causes failed slug generation #1702

Closed
cleverdevil opened this issue Apr 14, 2017 · 9 comments

Comments

@cleverdevil
Contributor

While trying to do this:

Perform a "repost" from Quill of this Twitter permalink.

I encountered this error:

The repost works, but the generated slug doesn't actually work, so you can't link to the post on my site. This is the slug that is generated:

https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this

As you can see, the link returns a 404. However, you can scroll down and find the post itself in my timeline.

Also of note: once you find the post in my timeline, it displays the improper encoding of special characters described in this related issue:

Aaron Parecki on Twitter: "Micropub PR published today! 🎉 https://t.co/B4se7jOPPr This is the last step before REC status! We'd love your impl reports and feedback!"

Some other notes:

🕷  cat version.known
version = "0.9.5"
build = 2017041101
@cleverdevil
Contributor Author

Some more context, here. I am using MySQL as the backing storage for my Known instance. For the above example, here is what is stored in the entities table:

_id is f489a455d3570e046c63751a5755d141
uuid is https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this

Contents:

{"access":"PUBLIC","owner":"http:\/\/cleverdevil.io\/profile\/cleverdevil","body":"https:\/\/twitter.com\/aaronpk\/status\/852613547922042880","repostof":"https:\/\/twitter.com\/aaronpk\/status\/852613547922042880","description":false,"tags":false,"pageTitle":"Aaron Parecki on Twitter: \"Micropub PR published today! \ud83c\udf89 https:\/\/t.co\/B4se7jOPPr This is the last step before REC status! We'd love your impl reports and feedback!\"","slug":"aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this","created":1492118357,"updated":1492118357,"publish_status":"published","_id":"f489a455d3570e046c63751a5755d141","uuid":"https:\/\/cleverdevil.io\/2017\/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this","entity_subtype":"IdnoPlugins\\Like\\Like"}

As far as I can tell, everything looks good. The entity is there in MySQL, and the JSON is valid and parses. But I get a 404 when I try to visit the permalink.
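
One detail in the stored JSON is worth isolating: the `\ud83c\udf89` escape in `pageTitle` is a UTF-16 surrogate pair for a single code point, the 🎉 emoji. A minimal Python sketch, using a shortened copy of the title, decodes it and measures its UTF-8 size. The variable names are illustrative and none of this is Known's code.

```python
import json

# Shortened copy of the stored pageTitle, with the same surrogate-pair escape.
raw = '{"pageTitle": "Micropub PR published today! \\ud83c\\udf89"}'

title = json.loads(raw)["pageTitle"]
emoji = title[-1]  # the JSON parser joins the surrogate pair into one character

code_point = f"U+{ord(emoji):04X}"        # code point of the decoded character
utf8_length = len(emoji.encode("utf-8"))  # bytes needed to store it as UTF-8

print(code_point, utf8_length)
```

The emoji decodes to one character that needs 4 bytes in UTF-8, which matters for any storage layer that allows at most 3 bytes per character.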

@cleverdevil
Contributor Author

That said, there seems to be nothing in the metadata table for this entity.

@cleverdevil
Contributor Author

Okay, I just manually inserted the following rows into my MySQL database, and now the post shows up:

insert into metadata(_id, collection, entity, name, value) values(
  'f489a455d3570e046c63751a5755d141',
  'entities',
  'https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this',
  'access',
  'PUBLIC'
);
insert into metadata(_id, collection, entity, name, value) values(
  'f489a455d3570e046c63751a5755d141',
  'entities',
  'https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this',
  'body',
  'https://twitter.com/aaronpk/status/852613547922042880'
);
insert into metadata(_id, collection, entity, name, value) values(
  'f489a455d3570e046c63751a5755d141',
  'entities',
  'https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this',
  'created',
  '2017-04-13 09:04:17'
);
insert into metadata(_id, collection, entity, name, value) values(
  'f489a455d3570e046c63751a5755d141',
  'entities',
  'https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this',
  'description',
  '0'
);
insert into metadata(_id, collection, entity, name, value) values(
  'f489a455d3570e046c63751a5755d141',
  'entities',
  'https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this',
  'entity_subtype',
  'IdnoPlugins\\Like\\Like'
);
insert into metadata(_id, collection, entity, name, value) values(
  'f489a455d3570e046c63751a5755d141',
  'entities',
  'https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this',
  'likeof',
  'https://twitter.com/aaronpk/status/852613547922042880'
);
insert into metadata(_id, collection, entity, name, value) values(
  'f489a455d3570e046c63751a5755d141',
  'entities',
  'https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this',
  'owner',
  'http://cleverdevil.io/profile/cleverdevil'
);
insert into metadata(_id, collection, entity, name, value) values(
  'f489a455d3570e046c63751a5755d141',
  'entities',
  'https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this',
  'pageTitle',
  'Aaron Parecki on Twitter: "Micropub PR published today..."'
);
insert into metadata(_id, collection, entity, name, value) values(
  'f489a455d3570e046c63751a5755d141',
  'entities',
  'https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this',
  'publish_status',
  'published'
);
insert into metadata(_id, collection, entity, name, value) values(
  'f489a455d3570e046c63751a5755d141',
  'entities',
  'https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this',
  'slug',
  'aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this'
);
insert into metadata(_id, collection, entity, name, value) values(
  'f489a455d3570e046c63751a5755d141',
  'entities',
  'https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this',
  'tags',
  '0'
);
insert into metadata(_id, collection, entity, name, value) values(
  'f489a455d3570e046c63751a5755d141',
  'entities',
  'https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this',
  'updated',
  '1492118357'
);
insert into metadata(_id, collection, entity, name, value) values(
  'f489a455d3570e046c63751a5755d141',
  'entities',
  'https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this',
  'uuid',
  'https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this'
);
insert into metadata(_id, collection, entity, name, value) values(
  'f489a455d3570e046c63751a5755d141',
  'entities',
  'https://cleverdevil.io/2017/aaron-parecki-on-twitter-micropub-pr-published-today-b4se7joppr-this',
  '_id',
  'f489a455d3570e046c63751a5755d141'
);
@cleverdevil
Contributor Author

So, confirmed, the source of this bug is the metadata rows not getting inserted, even though the entity itself shows up in the entities table.
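
A check like the following could confirm the same thing without reading the tables by hand. It is only a sketch: SQLite stands in for MySQL so that it runs anywhere, the `metadata` columns are inferred from the INSERT statements above, and `missing_metadata` is a hypothetical helper, not part of Known.

```python
import json
import sqlite3


def missing_metadata(conn, entity_uuid, contents_json):
    """Return the entity's top-level keys that have no row in metadata."""
    expected = set(json.loads(contents_json))
    rows = conn.execute(
        "SELECT name FROM metadata WHERE entity = ?", (entity_uuid,)
    ).fetchall()
    present = {name for (name,) in rows}
    return sorted(expected - present)


# Stand-in schema, assumed from the column list in the INSERT statements.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata (_id TEXT, collection TEXT, entity TEXT, name TEXT, value TEXT)"
)

uuid = "https://example.com/2017/some-post"  # placeholder UUID
contents = json.dumps({"access": "PUBLIC", "slug": "some-post", "uuid": uuid})

# No metadata rows yet, so every key is reported as missing.
before = missing_metadata(conn, uuid, contents)

conn.execute(
    "INSERT INTO metadata (_id, collection, entity, name, value) VALUES (?, ?, ?, ?, ?)",
    ("abc123", "entities", uuid, "access", "PUBLIC"),
)
after = missing_metadata(conn, uuid, contents)
print(before, after)
```

An entity whose result is non-empty is in the state described here: present in the entities table but without the metadata rows used to look it up.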

@mapkyca
Member

mapkyca commented May 3, 2017

Interesting, that implies it's an encoding issue on MySQL. The entity is stored in the main entities table, but the metadata table is used for searching. That would explain why I've been unable to replicate it on my localhost (which runs MongoDB).

@mapkyca
Member

mapkyca commented May 3, 2017

Interestingly, on my localhost, after switching to MySQL, sharing and posting content with an emoji works fine.

Could you confirm your exact steps to replicate? Also, from my IRC readback I notice you've got a unit test; that would be handy to have.

@mapkyca
Member

mapkyca commented May 3, 2017

... wondering if this is a local MySQL version/encoding issue...

@cleverdevil
Contributor Author

Okay, I was able to fix this issue entirely with changes to MySQL, and no changes to Known's code, by following the advice from this post about MySQL and utf8mb4.

TL;DR:

  1. Modify the Known MySQL database to use the right character set and collation:

    ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;

  2. Modify each table in Known (specifically the metadata and entities tables):

    ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

  3. In order for step 2 to actually work, you have to resize columns from VARCHAR(255) down to VARCHAR(191).

  4. Modify each column that may store emoji to also use the proper character set:

    ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

  5. Change MySQL server settings:

[client]
default-character-set = utf8mb4

[mysql]
default-character-set = utf8mb4

[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
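
The 191 in step 3 follows from arithmetic: InnoDB's older row formats cap an index key prefix at 767 bytes, and utf8mb4 reserves up to 4 bytes per character. The sketch below works that out and assembles the statements from steps 1, 2, and 4 for a list of tables. `conversion_statements`, the `known` database name, and the example column are placeholders, and the output should be reviewed before it is run against a real database.

```python
INNODB_INDEX_PREFIX_BYTES = 767  # index key prefix limit in older InnoDB row formats
CHARSET = "utf8mb4"
COLLATION = "utf8mb4_unicode_ci"


def max_indexed_chars(bytes_per_char):
    """Longest VARCHAR that still fits in a 767-byte index prefix."""
    return INNODB_INDEX_PREFIX_BYTES // bytes_per_char


def conversion_statements(database, tables, columns=()):
    """Build the ALTER statements from steps 1, 2, and 4 as plain strings.

    `columns` is an iterable of (table, column) pairs that may hold emoji.
    """
    width = max_indexed_chars(4)
    stmts = [
        f"ALTER DATABASE `{database}` CHARACTER SET = {CHARSET} COLLATE = {COLLATION};"
    ]
    for table in tables:
        stmts.append(
            f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET {CHARSET} COLLATE {COLLATION};"
        )
    for table, column in columns:
        stmts.append(
            f"ALTER TABLE `{table}` CHANGE `{column}` `{column}` "
            f"VARCHAR({width}) CHARACTER SET {CHARSET} COLLATE {COLLATION};"
        )
    return stmts


# 3-byte utf8 allows 255 indexed characters; 4-byte utf8mb4 allows 191.
print(max_indexed_chars(3), max_indexed_chars(4))

statements = conversion_statements(
    "known", ["entities", "metadata"], [("metadata", "name")]
)
for stmt in statements:
    print(stmt)
```

The column statement rewrites the whole column definition, so it only suits columns that are already VARCHAR, and attributes such as NOT NULL or a default would have to be added back by hand.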

I believe that the installation instructions, and any automated installation steps that run against MySQL, should create the database and tables with the proper character set in the first place. They should also detect any character set misconfiguration and inform the user.
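
As one possible shape for such a check, the sketch below takes the rows returned by `SHOW VARIABLES LIKE 'character_set_%'` as a plain dict and reports any setting that is not utf8mb4. The function name and the choice of which variables to inspect are assumptions; Known does not ship this.

```python
REQUIRED = "utf8mb4"

# Server variables that affect how text is sent to and stored by MySQL.
CHECKED_VARIABLES = (
    "character_set_client",
    "character_set_connection",
    "character_set_database",
    "character_set_results",
    "character_set_server",
)


def charset_problems(variables):
    """Return one warning per checked variable that is not set to utf8mb4."""
    problems = []
    for name in CHECKED_VARIABLES:
        value = variables.get(name)
        if value != REQUIRED:
            problems.append(f"{name} is {value!r}, expected {REQUIRED!r}")
    return problems


# Example input: made-up values for a server with mixed settings.
reported = {
    "character_set_client": "utf8mb4",
    "character_set_connection": "utf8mb4",
    "character_set_database": "utf8",
    "character_set_results": "utf8mb4",
    "character_set_server": "latin1",
}
warnings = charset_problems(reported)
for warning in warnings:
    print(warning)
```

An installer could run a check of this kind after connecting and show the warnings before creating any tables.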

@mapkyca
Member

mapkyca commented May 4, 2017

Going to close this as it's not a Known-specific bug.

In the latest master branch I've added a stub troubleshooting section in the docs... I'd welcome a pull request to this with your solution!

@mapkyca mapkyca closed this as completed May 4, 2017