Diacritics

From UA Libraries Digital Services Planning and Documentation

Revision as of 12:01, 3 December 2009

As metadata spreadsheets change hands, and are often created from diverse sources to begin with, issues arise regarding diacritics. These characters often do not translate cleanly from one encoding to another, which produces poor results in the resulting MODS metadata files.
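
A minimal Python sketch of how this kind of damage arises, assuming the common case in which UTF-8 bytes are read as Windows-1252 (the encoding that "ANSI" usually means on Western-language Windows systems). The sample string is illustrative and is not taken from the collection.

```python
# Illustrative only: the sample text is hypothetical.
original = "Révolution française"

# Store the text as UTF-8, then misread those bytes as Windows-1252.
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)  # RÃ©volution franÃ§aise

# Reversing the same misreading recovers the original text.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)  # True
```

Each accented letter becomes two characters because UTF-8 stores it as two bytes, and Windows-1252 treats every byte as its own character.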


Based on our experience with French-language diacritics, the following process allowed MODS to be created without the numerous encoding problems that we initially encountered with the collection u0002_0000006 (French Revolutionary Metadata).


1. Export the metadata from Excel as a tab-delimited text file (do not use the Unicode option). If the Excel file is in the legacy .xls format, convert it to .xlsx before exporting.


2. Open the text file in Encodinator, which will help identify problems. You can also open the file in Notepad++ and choose "Encode in UTF-8", which will likewise help show where the problems are.


3. Search for and replace every problem found in the Excel (.xlsx) file. Use Excel's built-in “character map” and choose Unicode as the encoding.

This Excel file (http://intranet.lib.ua.edu/wiki/digcoll/images/e/e6/French_encoding_map.xlsx) may be useful for getting started with searching and replacing diacritics in French-language metadata.


4. From Excel, export the metadata as a Unicode tab-delimited file.


5. Open the Unicode export in Notepad++ and choose "Convert to ANSI".


6. Load the ANSI text file into Encodinator and make sure all is well. If there are still problems, go back to step 3.


7. If all is well, open the ANSI text file in Notepad++ and choose "Convert to UTF-8".

This UTF-8 file is now the file from which to create MODS with Archivist Utility. It is also the text version of the metadata that will go into long-term storage.
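
The checks in steps 2 and 6 can be roughly approximated with a script. The following is a sketch only, assuming Python 3.7 or later is available and that "ANSI" means Windows-1252. It does not reproduce how Encodinator works, and the names `looks_misencoded` and `find_suspect_cells` are hypothetical. The heuristic flags a cell when its text can be turned back into Windows-1252 bytes that then decode as valid, different UTF-8 text, which is the usual signature of UTF-8 misread as ANSI.

```python
import csv
import io


def looks_misencoded(value):
    """Heuristic: True if value looks like UTF-8 that was read as cp1252."""
    if "\ufffd" in value:  # replacement character left by a failed decode
        return True
    if value.isascii():
        return False
    try:
        repaired = value.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Correctly encoded accents such as "Âge" end up here.
        return False
    return repaired != value


def find_suspect_cells(text):
    """Return (row, column, value) for suspect cells, numbered from 1.

    `text` is the already-decoded content of a tab-delimited export.
    """
    hits = []
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for row_no, row in enumerate(reader, start=1):
        for col_no, value in enumerate(row, start=1):
            if looks_misencoded(value):
                hits.append((row_no, col_no, value))
    return hits


# Hypothetical sample rows, not taken from the collection.
sample = (
    "title\tcreator\n"
    "La RÃ©volution\tDupont\n"
    "LibertÃ©\tMartin\n"
    "Âge d'or\tBernard\n"
)
for hit in find_suspect_cells(sample):
    print(hit)
# (2, 1, 'La RÃ©volution')
# (3, 1, 'LibertÃ©')
```

The row and column numbers point back to the cell to correct in the Excel file in step 3. A heuristic like this can miss other kinds of damage, so it supplements a visual check rather than replacing it.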