Entering Clean Text (or: avoiding unwanted characters)
Reader Comments at the bottom of this page
A frequently asked question is: Can illegal characters in a field harm my solution? The answer is twofold.
One: There is no such thing as an illegal character in FileMaker. FileMaker can store any utf-8 character (roughly 65,000 different chars) without any negative effects inside FileMaker. Characters of any code whatsoever in a text field can not corrupt a FileMaker file.
Two: There are unwanted characters, those with a character code in the range between 0 and 31 inclusive (decimal), with some exceptions. The Filter function can't be used because you want to only disallow a small number of characters, while the allowed rest is huge and thus can't be put in a filter definition. The issue with those characters becomes apparent when they are transferred to the outside of FileMaker, may it be export, HTML or XML output, printing, SQL, etc. You may encounter problems you can't explain but which have its origin in these invisible characters.
A FileMaker native solution in FileMaker Pro Advanced 10 and up
We create a Custom Function "cleanText" that is defined as follows. FileMaker Pro 10 provides a function to directly generate any bytecode.
Function Name and Parameters:
cleanText ( text )
Function Definition:
Substitute ( text ;
[ char ( 0 ) ; "" ] ; // This requires FileMaker Pro 10
[ char ( 1 ) ; "" ] ;
[ char ( 2 ) ; "" ] ;
[ char ( 3 ) ; "" ] ;
[ char ( 4 ) ; "" ] ;
[ char ( 5 ) ; "" ] ;
[ char ( 6 ) ; "" ] ;
[ char ( 7 ) ; "" ] ;
[ char ( 8 ) ; "" ] ;
// char(9) = Tab*
// char(10) = LineFeed*
[ char ( 11 ) ; "" ] ;
[ char ( 12 ) ; "" ] ;
// char(13) = Return*
[ char ( 14 ) ; "" ] ;
[ char ( 15 ) ; "" ] ;
[ char ( 16 ) ; "" ] ;
[ char ( 17 ) ; "" ] ;
[ char ( 18 ) ; "" ] ;
[ char ( 19 ) ; "" ] ;
[ char ( 20 ) ; "" ] ;
[ char ( 21 ) ; "" ] ;
[ char ( 22 ) ; "" ] ;
[ char ( 23 ) ; "" ] ;
[ char ( 24 ) ; "" ] ;
[ char ( 25 ) ; "" ] ;
[ char ( 26 ) ; "" ] ;
[ char ( 27 ) ; "" ] ;
[ char ( 28 ) ; "" ] ;
// char ( 29 ) = Group Separator, separates repetitions
of Repeating Fields on export and import
[ char ( 30 ) ; "" ] ;
[ char ( 31 ) ; "" ]
)
You may simply copy and paste this Custom Function from this web site with FileMaker Pro 10 Advanced. This does NOT work for earlier FileMaker versions described further down below.
Once you have saved this Custom Function you may remove those "unwanted" characters (* see character table below) in FileMaker Pro as
myTextField = cleanText ( Self )
For FileMaker Pro versions below 10
For those who don't like to implement external tools (plug-ins calling Perl), AppleScript, or VBScript, a self made calculation or Custom Function comes handy. Nevertheless one needs some computer experience besides just using FileMaker.
What do we need to solve the problem. On a Macintosh you need either AppleScript, HexEdit or equivalent.
- a global field g_NULL
- a Custom Function "cleanText" with one parameter "text"
Function Name and Parameters: cleanText ( text ) Function Definition: Substitute ( text ; [ g_NULL ; "" ] ; // 0 This can't be defined as text !!! [ "" ; "" ] ; // 1 [ "" ; "" ] ; // 2 [ "" ; "" ] ; // 3 [ "" ; "" ] ; // 4 [ "" ; "" ] ; // 5 [ "" ; "" ] ; // 6 [ "" ; "" ] ; // 7 [ "" ; "" ] ; // 8 [ "" ; "" ] ; // 11 [ "" ; "" ] ; // 12 [ "" ; "" ] ; // 14 [ "" ; "" ] ; // 15 [ "" ; "" ] ; // 16 [ "" ; "" ] ; // 17 [ "" ; "" ] ; // 18 [ "" ; "" ] ; // 19 [ "" ; "" ] ; // 20 [ "" ; "" ] ; // 21 [ "" ; "" ] ; // 22 [ "" ; "" ] ; // 23 [ "" ; "" ] ; // 24 [ "" ; "" ] ; // 25 [ "" ; "" ] ; // 26 [ "" ; "" ] ; // 27 [ "" ; "" ] ; // 28 [ "" ; "" ] ; // 29 [ "" ; "" ] ; // 30 [ "" ; "" ] // 31 )
Now comes the tedious part
In this function definition the characters to be eliminated have to be entered within the first quotes. The character code is added as a comment, so you know where is what, since these characters usually don't show any sign of existence except for a small space between the quotes.
One way to enter these invisible codes is to use Ctrl-A, Ctrl-B, etc, but not all codes can be entered that way, depending on platform and locale keyboard.
On Windows click between the left pair of quotes, hold down the Alt key and type the sequence of the decimal equivalent, like Alt+'0001' which results in a unicode 0x0001 being entered. (No testing has been done there.)
On a Mac this is not as easy. One reader contributed an AppleScript to generate the control characters. See Reader Comments at the bottom of this page.
You may use some other software to create these character codes, like HexEdit for Mac (a free download). Create a file that contains the codes by entering the hexadecimal equivalent into the left column:
Now select and copy one character at a time from the right column and paste it between the left pair of quotes in your function definition. Do not enter a NULL code in any way!
Don't be surprised if some characters appear as line breaks.
Problematic Characters
WARNING! Do NOT try to enter a literal string containing a NULL character (a character with a code 0) into a FileMaker calculation definition - nowhere! (You could do this via Paste or direct entry on a Windows keyboard.) When closing the definition dialog, FileMaker will hang and the only way to exit is a force quit. (You can do this without any problems in a text field though. But that's NOT why we're here!)
So how should we substitute a NULL character if we aren't allowed to enter one? We use a field containing a NULL character, hence the global field. We put this field somewhere on a layout and paste the NULL character from HexEdit into it. Then we remove the field from the layout. In the Custom Function we use this field instead of the first quotes.
Please note: The characters 09 (tab), 0A (line feed), and 0D (return) will not be modified and remain as entered.
WARNING! Most characters with codes between 0 and 31 do not survive a transfer of the script or a single script step from one file to another, they appear as "?". A bug report has been filed and the problem has been resolved in FileMaker 10.
Apply the Custom Function to an input field
First we select the tab Auto-Enter from the field options, mark Calculate value or click the button Specify... on the right. In FileMaker Pro 9 you can enter
in FileMaker Pro 9 and up: cleanText ( Self ) in earlier versions: cleanText ( <current field name> )
and click OK. Now you have to uncheck "Do not replace existing value of field (if any)" to make sure the function is called every time the field is being modified.
A ready made solution for you
If you want to avoid the hassle of defining all the definitions you can download a file that can easily be integrated in your solutions. The file contains a Custom Function that has to be transferred into every other file that need the character filter. Additionally a File Reference (now External Data Source) has to be created in the Relationship graph, but no relations are necessary.
The file contains global fields with all the character codes that need to be removed. Instead of defining the codes in the Custom Function, it references the globals.
Since the contents of globals might get lost (by cloning for example) a separate text file with a tab-delimited list of the questionable character codes is supplied which can be imported if necessary. Code 1D can not be imported (see technical details here below), that's why it is entered by a Set Field step.
Download the files (cross platform) (file not available) for your personal use. Information and files provided "AS IS" without any guarantee or liability.
Technical Details how FileMaker treats characters 0..31
As mentioned above, a NULL character causes FileMaker 9 or below to hang and need a force quit when entered into a calculation as string literal ("") - with all its negative consequences. But what about all the other characters in the range 1..31? Notified by a user about difficulties transferring one of these characters as part of a calculation, I tested them all.
Since these problems do not apply for field contents, the tests were extended on how those characters behave on import and export. The results are interesting, to say the least. The table below shows all results in a compacted form.
| Numeric | ASCII | Win2 | Mac2 | Copy | FileMaker related | ||||
|---|---|---|---|---|---|---|---|---|---|
| Dec | Hex | char | Meaning1 | Ctrl- | Ctrl- | Calc3 | Exp. | Imp. | Comment |
| 0 | 00 | NUL | Null | @ | 00 | 00 | DO NOT USE literally in calculations! | ||
| 1 | 01 | SOH | A | "?" | 01 | 01 | |||
| 2 | 02 | STX | B | B | 02 | 02 | |||
| 3 | 03 | ETX | End of text | C | C | "?" | 03 | 03 | |
| 4 | 04 | EOT | D | "?" | 04 | 04 | |||
| 5 | 05 | ENQ | E | E | "?" | 05 | 05 | ||
| 6 | 06 | ACK | F | "?" | 06 | 06 | |||
| 7 | 07 | BEL | Bell | G | "?" | 07 | 07 | Does not ring any bell today. | |
| 8 | 08 | BS | Backspace | H | H | 08 | 08 | Deletes character | |
| 9 | 09 | HT | Horizontal tab | I | Tab | 09 | 09 | ||
| 10 | 0A | LF | Line feed | J | J | CR | 0D | 0A | Does not export as line feed! Must use XSLT instead. Import terminates Record. |
| 11 | 0B | VT | Vertical tab | K | K | "?" | 0B | 0B | Return characters within Text export as 0B |
| 12 | 0C | FF | Form feed | L | "?" | 0C | 0C | ||
| 13 | 0D | CR | Carriage return | M | CR | 0D | Import terminates Record. | ||
| 14 | 0E | SO | N | N | "?" | 0E | 0E | ||
| 15 | 0F | SI | O | O | "?" | 0F | 0F | ||
| 16 | 10 | DLE | P | P | "?" | 10 | 10 | ||
| 17 | 11 | DC1 | Q | Q | "?" | 11 | 11 | ||
| 18 | 12 | DC2 | R | R | "?" | 12 | 12 | ||
| 19 | 13 | DC3 | S | S | "?" | 13 | 13 | ||
| 20 | 14 | DC4 | T | T | "?" | 14 | 14 | ||
| 21 | 15 | NAK | U | U | "?" | 15 | 15 | ||
| 22 | 16 | SYN | V | V | "?" | 16 | 16 | ||
| 23 | 17 | ETB | W | W | "?" | 17 | 17 | ||
| 24 | 18 | CAN | Cancel | X | X | "?" | 18 | 18 | |
| 25 | 19 | EM | Y | Y | "?" | 19 | 19 | ||
| 26 | 1A | SUB | Z | Z | "?" | 1A | 1A | ||
| 27 | 1B | ESC | Escape | [ | "?" | 1B | 1B | ||
| 28 | 1C | FS | \ | "?" | 1C | 1C | |||
| 29 | 1D | GS | Group separator | ] | "?" | 1D4 | Separates repetitions on export and import. | ||
| 30 | 1E | RS | ^ | "?" | 1E | 1E | |||
| 31 | 1F | US | _ | "?" | 1F | 1F | |||
1 irrelevant entries omitted
2 Entering of control characters could not be fully tested on a German keyboard. The Mac characters listed were confirmed to work here.
3 Control characters listed as either "?" or are left blank, are lost when copied as script step or calculation. They have to be re-entered. A Line feed (0A) is silently converted to a Carriage return (0D) - which I consider a bug.
4 On export and import the character 1D is special. When exporting a repeating field the repetitions are separated by 1D on export to a text field. A field may also contain a character 1D which is simply exported as well. On import 1D separates the text for a field into repetitions if the target field is a repeating field. If the target field has no or not enough repetitions, everything following 1D will be ignored after all repetitions are exhausted. Note: character 1D is normally not visible and requires a special editor to see them.
Keeping Formatting Out On Data Entry
If you just want to make sure your fields will always contain Text Only, regardless of what sources you have copied from to paste in a field, there is a simple method to achieve that:
Define an auto-enter calculation,
in FileMaker Pro 9: TextFormatRemove ( Self ) in FileMaker Pro 8, 8.5: TextFormatRemove ( <current field name> ) in FileMaker Pro 7: Evaluate ( Quote ( <current field name> ) )
and uncheck "Do not replace existing value of field (if any)" to make sure the function is called every time the field is being modified.
Needless to say that this additional functionality could be added to the Custom Function described earlier.
Any comments are welcome
If you have any comments, critics, or improvements to share, please don't hesitate to send an email to Winfried Huslik.
Reader Comments
Thank you, Winfried, for FMMenulet, FMDiff, and the excellent articles on your site. The "under-the-hood" details you provide are an oasis in the desert of technical knowledge that professional FileMaker developers currently inhabit.
Regards, Tom
Nice write-up, Winfried
I submitted a bug for FM7 a couple of years ago regarding crashes with null characters ...
Here's an AppleScript that could help with "the tedious part" on Mac; run it then paste in the FM dialog (it only builds part):
set the_construct to ""
repeat with i from 1 to 31
set the_construct to the_construct & "[ \"" & (ASCII character i) & "\" ; \"\" ] ; "
end repeat
set the clipboard to the_construct
I haven't tested it with the actual substitution.
Steve
Q: I'm looking at your web page and file and one thing confuses me. Why does your file have 32 records? One would be plenty if I understand correctly - or am I missing something?
Crispin
A: Thanks for the hint, you're right! This is the outcome of another bug in FileMaker, let's call it Lazy I/O. If you look at the original script, it does a Delete All Records after the import. In fact we don't need any records at all since the table contains just globals, but the number of records doesn't matter either. (Strange to be on record 1 of 0 while having found 32!) I've now added a Flush Cache to Disk to give FileMaker a break before it deletes the records. You may download the new version now.
Q: I have just discovered that the set field step in your import script does not survive a FMP cut and paste! It might be worth warning people.
Crispin
A: Thanks for this important note!
Q: Have you any thoughts on how to test the custom function when it is in place?
A: You may use the same method as described to enter the codes. Set up a calculation for the length of the field content and check this after pasting one or more codes in the field. You could also import the provided file with the special codes into the field (make sure to check auto-enter options on).
Q: You say that these unwanted characters will not cause corruption ... can you say exactly what IS the problem with these unwanted characters? I'm left wondering if they are something that can have an impact on performance, stability, etc. ... or if they are merely unwanted because they do nothing for us (in which case I would want to remove them, but wouldn't worry about them quite as much).
Ray
A: The main issue with those characters is when these are transferred to the outside of FileMaker, may it be export, HTML or XML output, printing, SQL, etc. You may later encounter problems you can't explain but which have its cause in these invisible characters.
Q: Thanks, Winfried. This information and associated file will be very useful to all developers.
I would caution about the use of the Self function until it is fixed.
Jason
A: Thanks for the heads up. The Self function issue only strikes on related files. In this case it is always applied locally to a table in the same file and should not be an issue.
Examples are provided "AS IS" without warranties of any kind. Use at your own risk.
Copyright © 2005 - 2012 Huslik Verlag GmbH. All Rights Reserved. FMDiff and FMChecker are trademarks of Huslik Verlag GmbH, Augsburg, Germany. FileMaker is a trademark of FileMaker Inc., Santa Clara, CA, USA. Other trademarks mentioned are property of their respective owners. This web site has not been authorized, sponsored, or otherwise approved by FileMaker, Inc.
