A Unicode character has two bytes, as opposed to one byte for each ASCII character. This allows many more possible Unicode characters (256²?), in order to accommodate the many characters of the many languages of the world.
That's all I know.
Xander, 'Conversations with Dead People'
Got a question about technology? Ask it here. Discussion of hardware, software, TiVos, multi-region DVDs, Windows, Macs, LINUX, hand-helds, iPods, anything tech related. Better than any helpdesk!
A Unicode character has two bytes, as opposed to one byte for each ASCII character. This allows many more possible Unicode characters (256²?), in order to accommodate the many characters of the many languages of the world.
That's all I know.
Very basically, since computers fundamentally deal with (binary) numbers, computers have had to assign a number to each of the characters on the keyboard so that they can process them.
Back in the Dark Ages of computing, different computer manufacturers used different sets of numbers to encode letters. In the US, they finally put an end to this madness by specifying a standard encoding for all the letters and symbols, called ASCII.
But when computers around the world started connecting to each other on the Internet, another problem became apparent: computers in different countries each used their own encoding for their own alphabets, even if their alphabets used the same symbols.
Unicode is an attempt to provide a single encoding for all the alphabet systems in the world. It's a really big, complicated subject.
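To make that concrete, here is a tiny Python 3 sketch (Python is just a convenient illustration here, not something anyone in the thread is using): Unicode gives every character, from any alphabet, exactly one agreed-upon number, called a code point.

    # Ask Python for the Unicode number ("code point") behind a few characters.
    for ch in ["A", "é", "Ω", "я", "あ"]:
        print(ch, hex(ord(ch)))
    # A   0x41   (the same value ASCII always used)
    # é   0xe9
    # Ω   0x3a9
    # я   0x44f
    # あ  0x3042

The same numbers come out on any machine, in any country, which is the whole point.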
A Unicode character has two bytes, as opposed to one byte for each ASCII character. This allows many more possible Unicode characters (256²?), in order to accommodate the many characters of the many languages of the world.
Unicode characters can be as long as 21 bits. [link]
Unicode characters can be as long as 21 bits.
Oh.
Now I know I knew less than I thought I knew.
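For anyone who wants to check that figure, a quick Python 3 aside (not part of the original thread): the largest code point Unicode defines is U+10FFFF, which takes 21 bits to write in binary, though real encodings round up to whole bytes.

    # The top of the Unicode range, and how many bits it needs.
    print(hex(0x10FFFF))              # 0x10ffff
    print((0x10FFFF).bit_length())    # 21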
Unicode is an attempt to provide a single encoding for all the alphabet systems in the world
How does/would/will that affect typesetting? We're switching to a paperless system of editing (which might be the straw that breaks my back and gets me to quit), and my boss keeps saying "At the seminar, they said Unicode was important -- do we *have* Unicode? Or do we need to tell the authors *they* need Unicode?"
She has no idea what it is, and seems to think I should.
Basically, we're going to give ourselves carpal tunnel syndrome and eyestrain by editing everything on the computer, in Word. Then the Word files get dumped into Quark (or InDesign, because we might as well change EVERYTHING ALL AT ONCE AND OH NO THAT WON'T CAUSE ANY SNAGS NOT ONE BIT) for layout.
How does Unicode come into play there? Or does it even?
The very, very short answer to your question, Steph.
Whatever tool you're using to edit your documents ought to give you the option to save your documents using a Unicode encoding. If you're using XML, they're probably already Unicode to begin with.
There are several different ways of storing Unicode files; the most typical ones are UTF-8 and UTF-16. It probably doesn't matter which one you pick, as long as each stage of your workflow is aware of which encoding you're using.
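As a small illustration (Python 3 again, purely for demonstration, not part of the thread): the very same text turns into different bytes depending on which UTF you choose, which is why every stage of the workflow has to agree on the encoding.

    s = "naïve café"
    print(len(s.encode("utf-8")))    # 12 bytes: the accented letters take two bytes each
    print(len(s.encode("utf-16")))   # 22 bytes: two per character, plus a 2-byte byte-order mark
    # Decoding with the encoding you actually used gets the original text back intact.
    print(s.encode("utf-8").decode("utf-8") == s)   # True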
Whatever tool you're using to edit your documents ought to give you the option to save your documents using a Unicode encoding. If you're using XML, they're probably already Unicode to begin with.
Um. So a Word document has the option of being saved as Unicode?
So a Word document has the option of being saved as Unicode?
If you're saving it as a .doc file, then no. But your tools that read .doc files ought to be able to make sure that all the characters are encoded properly.
If you're saving it as a .txt file, then Unicode would be one of the options for text files.
If you're saving it as a WordML (XML) file, then you get Unicode for free.
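If it helps to see what "saving a text file as Unicode" boils down to, here is a minimal Python 3 sketch; the filename is made up, and this is only an illustration of the idea, not of what Word does internally.

    # Write plain text declared as UTF-8, then read it back declaring the same encoding.
    with open("chapter1.txt", "w", encoding="utf-8") as f:
        f.write("Straße, naïve, “curly quotes”\n")
    with open("chapter1.txt", encoding="utf-8") as f:
        print(f.read())
    # Reading the file back while declaring the wrong encoding is how garbled characters creep in.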
Unicode is one of a million ways of assigning a numeric value to each letter.
ASCII was one of the first, but it is limited to 128 values and thus has no room for anything but the basic Roman alphabet, numbers, punctuation, and a number of extra values used to make printing terminals go bing.
A little while later it became common to use 256 values to represent letters, allowing accented characters and a few things like dashes to be included. Sadly, Apple and Microsoft invented two different sets of values for the same characters. Also, many different countries had different sets. For example, the set for Mac in Hungary isn't the same as the set in the United States.
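Here is what that mismatch looks like in practice, as a Python 3 sketch using the standard names for those old 8-bit sets (nothing here is from the original posts):

    # The same accented letter has a different byte value in each vendor's old 8-bit set.
    print("é".encode("cp1252"))      # b'\xe9' in the Windows Western European code page
    print("é".encode("mac_roman"))   # b'\x8e' in the classic Mac OS Roman set
    # And a raw 0xE9 byte read back as Mac Roman comes out as 'È', not 'é'.
    print(bytes([0xE9]).decode("mac_roman"))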
Unicode was invented to be a universal, comprehensive assignment of values to letters. It was initially 65,536 values, but has since expanded to a much larger set. It can express the letters of every language you can imagine, including Klingon. It's the default for most text files on the Mac.
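To picture that expansion past 65,536 (one more hedged Python 3 aside): characters outside the original 16-bit range still get a single code point; they just need more than two bytes when stored in UTF-16.

    ch = "\U0001D11E"                   # MUSICAL SYMBOL G CLEF, code point U+1D11E
    print(hex(ord(ch)))                 # 0x1d11e, well past 65,535
    print(len(ch.encode("utf-16-be")))  # 4 bytes: a "surrogate pair" (D834 DD1E)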
There are about a million more details to how Unicode works that might affect typesetting, but I'm afraid describing them would make your head hurt. Is there something specific you need to know or do?
World alphabets covered by the Unicode standard: [link]