Middle East

What is Unicode?

While we have tested most of what follows, and we have no reason to think that any of it will cause problems, we assume no responsibility for any negative effects that might be caused to any software or hardware.

Unicode:

What is Unicode? The explanation at Unicode.org is a good place to begin (see links, below). Put simply, Unicode is a character encoding standard that assigns a unique code (called a code point) to every character and symbol in every writing system. Thus, no matter what Unicode font is being used, software will always display precisely the symbol being called for (assuming that the font has that glyph). In older font technologies, a code that designated one glyph in one font might mean something else (or nothing at all) in another font. This caused problems when a user changed a document's font, or did not have the font: an incorrect glyph (or nothing at all) appeared in place of the intended character or symbol. Not all fonts are Unicode fonts, and not all Unicode fonts contain all possible characters (as this would make them too large for convenient use).
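The idea of one code per character can be seen directly in a short script. The following is a minimal sketch using Python's standard unicodedata module; the sample characters are our own choice of letters and symbols commonly needed for transliterating Arabic, and the helper name describe is only for illustration.

```python
import unicodedata

# Sample characters (chosen for illustration), written as escapes so the
# exact code points are unambiguous: A, a-macron, h-dot-below, ayn, hamza.
samples = ["A", "\u0101", "\u1e25", "\u02bf", "\u02be"]


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in samples:
    # The code point is the same no matter which font later displays it.
    print(ch, describe(ch))
```

Whether each character actually appears correctly on screen still depends on the font in use containing a glyph for it.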

Users of Windows 7 and later or Mac OS X can use Unicode without a great deal of trouble, though not all software supports it equally (if at all). Microsoft Word supports Unicode on both platforms, but the degree of compatibility varies from one version to the next. There are several methods for entering Unicode in a document, some more complex than others. Older operating systems support Unicode to lesser degrees.

Rather than attempt to cover every operating system and software version here, we suggest using a search engine to find out what your system and software can and cannot handle. Chances are that someone else has had the same questions, and the answers they received will help you determine what you need to do. Please see http://mamluk.uchicago.edu/unicode.html for more information.

Unicode Fonts

To use Unicode, you will need to install (or find already installed) Unicode fonts on your computer. This is neither difficult nor costly. Most new fonts are OpenType format, which works on both Mac and Windows. For each of the following fonts, see the links below.

  • The MEDOC font is a modification of Jaghbub, which is Knut Vikør's legal modification of Times. Vikør's site has a wealth of information about Unicode, transliteration and related topics. The only difference between MEDOC and Jaghbub is the shape of the ʿayn and hamzah characters. All weights (regular, italic, bold and bold italic) are included.
  • Linux Libertine is another good, free Unicode font with all of the necessary glyphs and good small caps (though some letters with diacritics are not included in the small caps). It comes with a second font, Linux Biolinum, a sans-serif that is meant to work with Libertine for titles and headings. The font has not been updated in several years but is still a good choice.
  • Source Sans Pro is a free sans-serif with all the necessary diacritics. It comes in 6 weights, from extra-light to black, each with both regular and italic, and seems to have true small caps. It is an open source typeface from Adobe, and is available for free from Google Fonts.
    To download, visit fonts.google.com/specimen/Source+Sans+Pro/, click "Select this font" in the upper right, then click to expand the little box that appears in the lower right. Click the download icon (an arrow with a line below it) to get a .zip file with all the font's styles and weights, unzip it, then install the weights you want.

NOTE: If you are using Google Docs, you can choose Source Sans Pro from the fonts menu (you may need to find it in "additional fonts") and use it to type transliterated Arabic with diacritics. If you have a keyboard layout installed (such as Alt-Latin), Docs will recognize the keystrokes and use the proper diacritics. If not, you can go to the Insert menu, select Special Characters, and choose Latin in the middle drop-down of the window that pops up. (To avoid having to hunt for the characters each time, quickly go through and pick all the ones you are likely to need. You can then switch the first drop-down from Categories to Recent Characters and see only the ones you need.) The symbols used for ayn and hamza do not seem to be available in the special character picker.

  • Gentium is a free Unicode font with the characters necessary for transliterating Arabic, Turkish, Persian, and other languages and scripts. Gentium Plus has the most characters and OpenType support, but it lacks bold and bold italic weights. Gentium Basic and Gentium Book Basic have regular, bold, italic and bold italic weights, but lack many diacritics and special characters and do not have OpenType features.
  • Charis SIL is a free, quality serif font. It has all the characters needed for transliteration, and unlike Gentium it has bold, italic, and bold-italic weights.
  • Junicode is a high-quality font with a wide variety of glyphs. Regular, bold, italic, and bold italic weights are included, along with various other forms including decent small caps.
  • FreeSerif, FreeMono and FreeSans are enormous, no-nonsense fonts that have thousands of glyphs, ranging from Latin and Greek to Cyrillic, Hebrew, Thaana, Devanagari, Telugu, Ethiopic, Hiragana, and so on. Aesthetically, FreeSerif is very similar to Times.
  • Microsoft includes some Unicode fonts in most versions of its software, but not all included fonts are Unicode. Arial Unicode MS is present in most versions of Windows. Keep in mind that the degree to which Microsoft supports Unicode depends on both the operating system and the software. For example, some Macintosh versions of Microsoft's software are less able to use Unicode, even when the operating system is compliant. Likewise, older versions of Windows are less compliant. In Windows XP, 7, 8, and 10, Unicode should be fully usable, depending on the software. Macintosh's OS X is Unicode compliant, but not all software for OS X is.

Keyboard Layouts for entering Unicode characters and diacritics

Because operating systems change frequently, it is impossible to summarize all the ways Unicode does and does not work in OS X, Windows or Linux. When this page was first written (2012) most operating systems and software were able to use Unicode. Now, several years later, compliance has mostly improved. To determine the limitations of your own system, you will need to experiment and search the Web for information.

The most common problem users have with Unicode is figuring out how to type special characters that do not appear on a keyboard. For scholars of the Middle East, this usually has to do with transliteration of the Arabic alphabet as Roman characters.

A variety of keyboard layouts exist to facilitate this. A keyboard layout is a small piece of software that tells the computer what to do when certain keys, or combinations of keys, are pressed. In the basic US keyboard layout, holding the shift key and typing a letter results in a capital letter. The problem is that standard keyboard layouts do not have ways to type less commonly used characters.
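Conceptually, a keyboard layout is little more than a lookup table from key combinations to characters. The following is a minimal sketch of that idea in Python. The table LAYOUT and the function press are hypothetical names, and the Alt combinations shown are invented for illustration; they are not the actual assignments of any real keyboard layout.

```python
from typing import Optional

# Hypothetical layout table: (modifier, key) -> character produced.
# These assignments are illustrative only, not those of a real layout.
LAYOUT = {
    ("shift", "a"): "A",        # the standard behavior described above
    ("alt", "a"): "\u0101",     # a with macron
    ("alt", "h"): "\u1e25",     # h with dot below
    ("alt", "["): "\u02bf",     # left half ring, used for ayn
    ("alt", "]"): "\u02be",     # right half ring, used for hamza
}


def press(key: str, modifier: Optional[str] = None) -> str:
    """Return the character produced by pressing a key, with an optional modifier."""
    if modifier is None:
        return key
    # Combinations the layout does not define fall back to the plain key.
    return LAYOUT.get((modifier, key), key)


print(press("a"), press("a", "shift"), press("a", "alt"))
```

A standard layout simply has no entries for the less common characters, which is why a specialized layout is needed to type them.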

Many specialized keyboard layouts exist to solve this problem. For example, Kino created the excellent and extremely simple Alt-Latin and LatinTL keyboards specifically for this purpose. The Alt-Latin keyboard comes in versions for either Macintosh or Windows operating systems, and is one of the simplest, most transparent methods we have found for entering special characters, diacritics, and symbols not found on the standard keyboard. These keyboard layouts are made available for download here with Kino's permission, as his site no longer exists. Installation is simple and usage even simpler.

Using the Alt-Latin keyboard does not require making any changes to the physical keyboard connected to your computer; it is simply a piece of software that tells the computer what to do when you press a key or a combination of keys. As long as you are typing in the Latin alphabet (i.e. in English or most other European languages) you will not have to change from Alt-Latin to another layout. As can be seen in the diagrams below, Alt-Latin's key layout is identical to the standard US QWERTY keyboard. If your ordinary keyboard is different from the US standard layout, it may take some effort to get used to these. Spend some time with a search engine and you may find keyboard layouts that match your usual one.

Contact

Marlis Saleh, Middle East Bibliographer

Middle East Department

Regenstein Library, 5th Floor, Rm 560

(773) 702-8425