Stripping MS Word Garbage from HTML form
I recently noticed that my MySQL database was filling up with strange junk code, like StartFragment and <w:WordDocument>, along with other miscellaneous XML garbage. I quickly discovered that the problem stemmed from users copying and pasting text from Microsoft Word into the HTML form editor. This wreaked havoc in the database: the tags were frequently broken, and formatting would then suffer elsewhere. I couldn't find any decent solutions online, so I created my own, albeit imperfect, one.
Step 1: Identify the problematic code in the string and assign these code fragments to variables.
$bad_fragment = "<w:";
$bad_fragment2 = "StartFragment";
Step 2: Check to see if the form submitted contains either bad code fragment.
$contains_junk_code = strpos($submitted_form, $bad_fragment);
$contains_junk_code2 = strpos($submitted_form, $bad_fragment2);
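Steps 1 and 2 can also be folded into a single helper. The following is a minimal sketch, not the code from my app: the function name contains_word_markup and its parameters are made up for illustration. It is mostly here to show one thing to watch for: strpos() returns the position of the match, and that position is 0 when the string starts with the fragment, so the result should be compared strictly against false.

```php
<?php
// Hypothetical helper (not from the original post): returns true when any
// of the given marker fragments occurs anywhere in $html.
function contains_word_markup(string $html, array $fragments): bool
{
    foreach ($fragments as $fragment) {
        // strpos() returns the match offset, which is 0 when the string
        // begins with the fragment, so compare strictly against false.
        if (strpos($html, $fragment) !== false) {
            return true;
        }
    }
    return false;
}

$fragments = ['<w:', 'StartFragment'];

var_dump(contains_word_markup('<w:WordDocument>junk</w:WordDocument>', $fragments)); // bool(true)
var_dump(contains_word_markup('<p>Plain <b>HTML</b></p>', $fragments));              // bool(false)
```

Keeping the fragments in an array means that a new marker can be added in one place if other Word debris turns up later.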
Step 3: If either bad fragment was found, go ahead and strip out ALL formatting.
Of course, you could just run strip_tags() on every single submission, but I want to keep HTML formatting if it's not problematic. Only when the bad MS Word XML fragments are found do I strip the tags and alert the user. Note that strpos() returns false when nothing is found and 0 when the match sits at the very start of the string, so the check has to compare against false with !==.
if ($contains_junk_code !== false || $contains_junk_code2 !== false) {
    $submitted_form = strip_tags($submitted_form);
    echo "Your submission has been saved as text only. This occurs when copying and pasting from MS Word. To add formatting, edit the text by clicking below.";
}
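Put together, the three steps might look roughly like the sketch below. The function name sanitize_submission and its return shape (the cleaned text plus a flag saying whether formatting was stripped) are assumptions for illustration, not something taken from my app. The sample input is also invented, although Word pastes typically do wrap the content in StartFragment comments.

```php
<?php
// Hypothetical wrapper (names are illustrative, not from the original post).
// Returns [cleaned text, whether formatting was stripped].
function sanitize_submission(string $submitted_form): array
{
    $bad_fragments = ['<w:', 'StartFragment'];

    foreach ($bad_fragments as $bad_fragment) {
        if (strpos($submitted_form, $bad_fragment) !== false) {
            // Word markup detected: drop every tag and keep the text only.
            // strip_tags() also removes HTML comments.
            return [strip_tags($submitted_form), true];
        }
    }

    // No Word markers found: leave the HTML formatting untouched.
    return [$submitted_form, false];
}

list($clean, $was_stripped) = sanitize_submission('<p><!--StartFragment-->Hello <b>world</b></p>');

echo $clean, "\n"; // Hello world
if ($was_stripped) {
    echo "Your submission has been saved as text only.\n";
}
```

Returning the flag instead of echoing inside the function leaves it up to the calling page to decide how and where to show the alert.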
While this solution is not perfect, it was the simplest and most straightforward approach I could think of for my web app. Hopefully it will help you or inspire you to create a better fix.