Length is important! How SMS are segmented and billed

In previous posts like “Unicode in your SMS: Emojis and Non-English Characters” we discussed how the encoding used to send your messages also changes the way they are billed. Today, we are going to dig deeper (way deeper) into the subject to clarify why it works like this, and what to expect when sending messages with different characters and encodings.

Note that this is a technical post, so feel free to skip the parts you are not interested in. Still, we highly recommend understanding it fully, so you can make better, more effective use of SMS technology when engaging your customers.

Why do your SMS messages need to be segmented?

There is always a limit somewhere in real life. SMS messages are sent through the mobile network in “packages” of information, and each package has a specific maximum number of bytes allowed.

When your messages exceed a specific number of characters, they need to use more than 1 package to be transmitted, and that’s why they are “segmented” and sent in parts.

Later on, the receiving device puts those parts back in the right order and displays the full message to the user.

Where does this limit come from?

To make a long story short: SMS messages are transmitted using MAP (Mobile Application Part). MAP is an application-layer protocol that is part of the SS7 protocol suite. MAP is also used to talk to many other GSM network elements, like the HLR and the VLR.

In turn, SS7 is a set of protocols created in 1975 and used in the worldwide PSTN (Public Switched Telephone Network) ever since. It has been at the core of telephone networks across the globe since it replaced the analog R2 signaling protocol.

What is the maximum number of bytes per SMS Segment?

SMS messages are sent with the MAP MO-ForwardSM and MT-ForwardSM operations, whose payload length is limited to 140 bytes. Each byte is 8 bits long, so 140 bytes * 8 bits = 1120 bits.

Heads up! Bytes and Characters are NOT the same thing

So does that mean that you can send 140 characters per SMS? Well, not exactly.

There is one more thing to know before answering a “simple” question like “how many characters can I send in one SMS?”.

Bytes are used to encode the characters of your message. And it would be wonderful if we could encode any character in a byte. But as we stated earlier, in real life limits exist.

A byte is composed of 8 bits, and each bit can represent one of 2 values: either a “0” (zero) or a “1”. So with 8 bits, you can represent up to 2^8 different values. To be exact, 256 different values.

Unfortunately, the world has many different languages with many different characters (including Emojis!), so 256 different values are not enough to represent all the characters in use.

That’s why we have Character encoding schemes that use 1 or more bytes in a specific way in order to represent different characters.

So in the end, there is a limit of 140 bytes imposed by the transport protocol on one hand, but then there is also a limit on the number of characters imposed by how we use those bytes to encode characters.
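To make the difference between characters and bytes concrete, here is a minimal Python sketch. It relies on the fact that UTF-16 (big-endian) produces the same bytes as UCS2 for characters in the Basic Multilingual Plane; the helper name `ucs2_byte_length` is ours, made up for illustration.

```python
def ucs2_byte_length(text: str) -> int:
    """Bytes needed to carry `text` using 16-bit code units.

    UTF-16-BE matches UCS2 for Basic Multilingual Plane characters.
    Characters outside that plane (most emojis) need two 16-bit units.
    """
    return len(text.encode("utf-16-be"))


for sample in ["hello", "日本語", "🙂"]:
    print(f"{sample!r}: {len(sample)} characters, {ucs2_byte_length(sample)} bytes")
```

Note how a single emoji counts as 1 character in Python but takes 4 bytes on the wire, which is why a character count alone is not enough to know how much of the 140 bytes a message uses.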

User Data Header: The character-eater ninja hidden in your long messages

As stated earlier, a message may need to be segmented. If it doesn’t, well, there’s nothing more to say 🙂

But when it does need to be segmented, the UDH (User Data Header) has to be included in every segment that you send, and it will silently eat part of the space available for your characters. The UDH is used for a variety of things, and one of them is signaling concatenated messages.

Big messages (the ones that exceed the maximum number of characters available for 1 segment, according to the encoding used) need the UDH so mobile devices and intermediate SMSCs will be able to figure out how to correctly order the parts and compose the final message.

For each UDH added to the message, the total available space for characters is reduced. The UDH needed to send concatenated messages takes up 6 bytes (48 bits) of that space.
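As an illustration of where those 6 bytes go, here is a sketch of the concatenation UDH, assuming the common variant with an 8-bit reference number (a length byte, an element identifier, an element length, then the reference, the total number of parts, and the part number). The function names `build_concat_udh` and `reassemble` are hypothetical helpers, not part of any library.

```python
def build_concat_udh(reference: int, total_parts: int, part_number: int) -> bytes:
    """Return the 6-byte UDH for one part of a concatenated SMS.

    Assumes the 8-bit reference number variant of the header.
    """
    if not 0 <= reference <= 0xFF:
        raise ValueError("reference must fit in one byte")
    if not 1 <= part_number <= total_parts <= 0xFF:
        raise ValueError("part numbers run from 1 to total_parts (max 255)")
    return bytes([
        0x05,         # UDH length: 5 more header bytes follow
        0x00,         # element id: concatenated message, 8-bit reference
        0x03,         # element data length: 3 bytes follow
        reference,    # same value in every part of one message
        total_parts,  # how many parts the message has
        part_number,  # position of this part, starting at 1
    ])


def reassemble(parts: list[tuple[bytes, str]]) -> str:
    """Join (udh, text) pairs, ordered by the part number in each UDH."""
    ordered = sorted(parts, key=lambda part: part[0][5])
    return "".join(text for _udh, text in ordered)


first = build_concat_udh(reference=0x2A, total_parts=2, part_number=1)
second = build_concat_udh(reference=0x2A, total_parts=2, part_number=2)
print(first.hex())                                           # 0500032a0201
print(reassemble([(second, "world"), (first, "hello ")]))    # hello world
```

The receiving side uses the reference to group parts that belong together and the part number to put them back in order, even if they arrive shuffled.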

Maximum length for UCS2 encoded messages (Unicode supported)

For UCS2 encoding we need 16 bits per character. So the math becomes 1120 bits / 16 bits = 70 characters per message.

When concatenating messages, the UDH takes up 48 bits, leaving 1120 bits – 48 bits = 1072 bits for 16-bit characters. That means there are 1072 bits / 16 bits = 67 characters per concatenated UCS2 segment.

Maximum length for an 8bit encoded GSM 03.38 message

When encoding GSM 03.38 characters using 8 bits (called octets), we have the 140 bytes payload available for the message, so that means 1120 bits / 8 bits = 140 characters.

When sending a concatenated message, and taking into account the UDH length, we have 1120 bits – 48 bits = 1072 bits, giving a maximum of 1072 bits / 8 bits per character = 134 characters per GSM 03.38 message encoded in 8 bit.

Maximum length for a 7bit encoded GSM 03.38 message

When encoding GSM 03.38 messages in 7 bits (called septets), we can calculate the total of characters by doing 1120 bits / 7 bits, for a total of 160 characters per 7bit encoded GSM 03.38 message.

Concatenated messages will be limited by the inclusion of the UDH part in the message, giving a total of 1120 – 49 = 1071 bits, so in that case we calculate the number of characters available as 1071 bits / 7 bits = 153 characters available for 7bit encoded GSM 03.38 concatenated messages.

NOTE: We use 1071 bits instead of 1072 because when a UDH is present and the data is encoded with the 7-bit alphabet, the user data must start on a 7-bit (septet) boundary after the UDH. So 1 bit of padding needs to be added, making the total UDH length 49 bits. Thus 1120 bits – 49 bits = 1071 bits available for characters.
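The arithmetic for the three encodings can be folded into one small function. This is only a sketch of the calculations above: the name `chars_per_segment` is ours, and rounding the header up to a whole number of characters is how we model the septet padding (for 8-bit and 16-bit characters the 48-bit header is already aligned, so the rounding changes nothing).

```python
PAYLOAD_BITS = 140 * 8    # 1120 bits available per segment
CONCAT_UDH_BITS = 6 * 8   # 48 bits used by the concatenation UDH


def chars_per_segment(bits_per_char: int, concatenated: bool) -> int:
    """Maximum characters in one segment for a given character width."""
    header_bits = 0
    if concatenated:
        # Round the header up to a whole number of characters.
        # For 7-bit characters this turns 48 bits into 49 (1 padding bit).
        header_chars = -(-CONCAT_UDH_BITS // bits_per_char)  # ceiling division
        header_bits = header_chars * bits_per_char
    return (PAYLOAD_BITS - header_bits) // bits_per_char


for name, bits in [("UCS2", 16), ("GSM 03.38 8bit", 8), ("GSM 03.38 7bit", 7)]:
    single = chars_per_segment(bits, concatenated=False)
    multi = chars_per_segment(bits, concatenated=True)
    print(f"{name}: {single} single, {multi} concatenated")
```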

Careful: 7bit encoded GSM 03.38 and Escaped Characters

Note that some GSM 03.38 characters take up the space of 2 characters when encoded. This is because they must be preceded by the special escape character, which further reduces the number of characters available per message and may push the message into more segments. Take these characters into account when calculating how many segments you will need to send a given message.
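Here is a rough sketch of how escaped characters can be counted. It assumes the text contains only GSM 03.38 characters (it does not check that), and it treats the characters of the GSM 03.38 extension table as the ones that need the escape; `septet_length` is a name we made up for this example.

```python
# Characters from the GSM 03.38 extension table: each one is sent as the
# escape character followed by a second septet, so it costs 2 instead of 1.
GSM_EXTENSION_CHARS = set("^{}\\[~]|€\f")


def septet_length(text: str) -> int:
    """Septets used by `text`, assuming it only has GSM 03.38 characters."""
    return sum(2 if ch in GSM_EXTENSION_CHARS else 1 for ch in text)


print(septet_length("hello"))         # 5
print(septet_length("{"))             # 2
print(septet_length("price: 10€"))    # 11 (10 characters, one escaped)
```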

Let’s see this in action!

Take a look at the following screenshots from an Android device and how it counts the number of characters left (and number of segments) per message as you type.

GSM 03.38 7bit encoded, 1 segment

Note how the number of characters available is 160.

GSM 03.38 7bit encoded, 1 segment but with an escaped character

When we type the “{” character, because it requires an escape character, the number of characters available goes from 160 to 158 instead of 159.

GSM 03.38 7bit encoded, multiple segments taking into account UDH bytes

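In this case the number of segments goes from 1 to 2 because there are more than 160 characters. Also, the number of characters available drops to 145 instead of counting down from 160 again: two concatenated segments hold 2 * 153 = 306 characters in total, so a counter of 145 is what you would expect after typing 161 characters.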

UCS2 encoding, 1 segment

Note how the number of characters goes to 70 once we type a character that is outside of the GSM 03.38 table (it requires UCS2 to be sent correctly and Android picks it up immediately).

UCS2 encoding, multiple segments

If we continue typing enough characters, the number of segments goes up to 2, and the number of available characters goes to 64.

To sum up

If you want to know the maximum number of characters available for a message, you first have to know whether you are going to use UCS2, 7bit GSM 03.38, or 8bit GSM 03.38.

Then, if the number of characters exceeds the maximum number of characters per segment for that encoding (70, 160, and 140 respectively), you have to take into account the space needed for the UDH that signals concatenation. The maximum number of characters per segment for each encoding is:

  • UCS2: 70 non-concatenated, 67 concatenated.
  • 7bit GSM 03.38: 160 non-concatenated, 153 concatenated.
  • 8bit GSM 03.38: 140 non-concatenated, 134 concatenated.

And for 7bit encoding of GSM 03.38 characters, count 1 extra character for every character that needs to be escaped.
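Putting the summary together, a segment estimator could look like the sketch below. The encoding labels (`"ucs2"`, `"gsm7"`, `"gsm8"`) and function names are our own, and `units` means the encoded length: 16-bit units for UCS2, or septets including escapes for 7bit GSM 03.38. It also ignores one edge case, namely that an escaped character cannot be split across two segments, so treat the result as an estimate.

```python
import math

# (single-segment limit, per-segment limit when concatenated)
LIMITS = {
    "ucs2": (70, 67),
    "gsm7": (160, 153),
    "gsm8": (140, 134),
}


def count_segments(units: int, encoding: str) -> int:
    """Estimate how many segments a message of `units` length needs."""
    single, concatenated = LIMITS[encoding]
    if units <= single:
        return 1
    return math.ceil(units / concatenated)


def remaining_units(units: int, encoding: str) -> int:
    """Space left before one more segment is needed."""
    single, concatenated = LIMITS[encoding]
    segments = count_segments(units, encoding)
    capacity = single if segments == 1 else segments * concatenated
    return capacity - units


print(count_segments(160, "gsm7"), remaining_units(160, "gsm7"))  # 1 0
print(count_segments(161, "gsm7"), remaining_units(161, "gsm7"))  # 2 145
print(count_segments(71, "ucs2"))                                 # 2
```

For 161 plain GSM 03.38 characters the sketch reports 2 segments with 145 characters left, which matches the counter described for the Android screenshot above.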

Questions?

Don’t despair! This is a complex subject after all. Feel free to get in touch with us if you have any questions. We’ll be glad to help.

— The PortaText Team.