So, like, what is a byte?

A friend of mine in the filmmaking business, who is exceedingly bright but has never worked with SQL Server before, was reading through the first five posts of this Database Fundamentals series, and asked a great question:

“I guess I’m not understanding what a byte is. I think I’m circling the drain in understanding it, but not floating down.”

She has a way with words.

I answered her immediately, but it reminded me that I did get a little carried away with data types, assuming that everyone reading that post would understand what a byte is.

In the innards of the computer is the CPU, or Central Processing Unit (there might be more than one in a server). The CPU is best described as a hot mess of on-off switches.

several assorted power switches mounted on white wall

Just as it is in your house, a switch only has two states. This is what “binary” means. When the CPU clock ticks over, billions of times per second, if a switch is open, it’s a 0 (electricity cannot pass through it). If the switch is closed, it’s a 1 (electricity can flow to complete the circuit).

(Source: https://diytechpro.com/electric-circuit-simple-concept/)

The CPU (and memory, and storage system, and network) understand binary, and the software that sits on top of it uses binary as well.

We end up with a series of 1s and 0s that, when arranged in different combinations, represent information in some form or another. Each of these is a binary digit, or bit.

Through a series of decisions in the old days of computing, when we stick eight of these bits of data together, they form a byte.

Now comes the mathematical part of today’s post.

If we have 8 digits that can store two values each, we get a total of 2 x 2 x 2 x 2 x 2 x 2 x 2 x 2 combinations. This is more easily typed as 2^8, or 256. In other words, a byte can store a maximum of 256 values.

Here’s a short list of bytes to give you an example (I have not listed every one of the 256 possibilities). We write the bits in groups of four to make them easier to read.

Binary	Decimal	ASCII	Binary	Decimal	ASCII
0010 0000	32	<space>	1000 0001	129	Å
0010 0001	33	!	1000 0010	130	Ç
0010 0010	34	“	1000 0011	131	É
0010 0011	35	#	1000 0100	132	Ñ
0010 0100	36	$	1000 0101	133	Ö
0010 0101	37	%	1000 0110	134	Ü
0010 0110	38	&	1000 0111	135	á
0010 0111	39	‘	1000 1000	136	à
0010 1000	40	(	1000 1001	137	â
0010 1001	41	)	1000 1010	138	ä
0010 1010	42	*	1000 1011	139	ã
0010 1011	43	+	1000 1100	140	å
0010 1100	44	,	1000 1101	141	ç
0010 1101	45	–	1000 1110	142	é
0010 1110	46	.	1000 1111	143	è
0010 1111	47	/	1001 0000	144	ê
0011 0000	48	0	1001 0001	145	ë
0011 0001	49	1	1001 0010	146	í
0011 0010	50	2	1001 0011	147	ì
0011 0011	51	3	1001 0100	148	î
0011 0100	52	4	1001 0101	149	ï
0011 0101	53	5	1001 0110	150	ñ
0011 0110	54	6	1001 0111	151	ó
0011 0111	55	7	1001 1000	152	ò
0011 1000	56	8	1001 1001	153	ô
0011 1001	57	9	1001 1010	154	ö
0011 1010	58	:	1001 1011	155	õ
0011 1011	59	;	1001 1100	156	ú
0011 1100	60	<	1001 1101	157	ù
0011 1101	61	=	1001 1110	158	û
0011 1110	62	>	1001 1111	159	ü
0011 1111	63	?	1010 0000	160	†
0100 0000	64	@	1010 0001	161	°
0100 0001	65	A	1010 0010	162	¢
0100 0010	66	B	1010 0011	163	£
0100 0011	67	C	1010 0100	164	§
0100 0100	68	D	1010 0101	165	•
0100 0101	69	E	1010 0110	166	¶
0100 0110	70	F	1010 0111	167	ß
0100 0111	71	G	1010 1000	168	®
0100 1000	72	H	1010 1001	169	©
0100 1001	73	I	1010 1010	170	™
0100 1010	74	J	1010 1011	171	´
0100 1011	75	K	1010 1100	172	¨
0100 1100	76	L	1010 1101	173	≠
0100 1101	77	M	1010 1110	174	Æ
0100 1110	78	N	1010 1111	175	Ø
0100 1111	79	O	1011 0000	176	∞
0101 0000	80	P	1011 0001	177	±
0101 0001	81	Q	1011 0010	178	≤
0101 0010	82	R	1011 0011	179	≥
0101 0011	83	S	1011 0100	180	¥
0101 0100	84	T	1011 0101	181	µ
0101 0101	85	U	1011 0110	182	∂
0101 0110	86	V	1011 0111	183	∑
0101 0111	87	W	1011 1000	184	∏
0101 1000	88	X	1011 1001	185	π
0101 1001	89	Y	1011 1010	186	∫
0101 1010	90	Z	1011 1011	187	ª
0101 1011	91	[	1011 1100	188	º
0101 1100	92	\	1011 1101	189	Ω
0101 1101	93	]	1011 1110	190	æ
0101 1110	94	^	1011 1111	191	ø
0101 1111	95	_	1100 0000	192	¿
0110 0000	96	`	1100 0001	193	¡
0110 0001	97	a	1100 0010	194	¬
0110 0010	98	b	1100 0011	195	√
0110 0011	99	c	1100 0100	196	ƒ
0110 0100	100	d	1100 0101	197	≈
0110 0101	101	e	1100 0110	198	∆
0110 0110	102	f	1100 0111	199	«
0110 0111	103	g	1100 1000	200	»
0110 1000	104	h	1100 1001	201	…
0110 1001	105	i	1100 1010	202
0110 1010	106	j	1100 1011	203	À
0110 1011	107	k	1100 1100	204	Ã
0110 1100	108	l	1100 1101	205	Õ
0110 1101	109	m	1100 1110	206	Œ
0110 1110	110	n	1100 1111	207	œ
0110 1111	111	o	1101 0000	208	–
0111 0000	112	p	1101 0001	209	—
0111 0001	113	q	1101 0010	210	“
0111 0010	114	r	1101 0011	211	”
0111 0011	115	s	1101 0100	212	‘
0111 0100	116	t	1101 0101	213	’
0111 0101	117	u	1101 0110	214	÷
0111 0110	118	v	1101 0111	215	◊
0111 0111	119	w	1101 1000	216	ÿ
0111 1000	120	x	1101 1001	217	Ÿ
0111 1001	121	y	1101 1010	218	⁄
0111 1010	122	z	1101 1011	219	€
0111 1011	123	{	1101 1100	220	‹
0111 1100	124	\|	1101 1101	221	›
0111 1101	125	}	1101 1110	222	ﬁ
0111 1110	126	~	1101 1111	223	ﬂ
0111 1111	127		1110 0000	224	‡
1000 0000	128	Ä	1110 0001	225	·

There are values missing from the above table, for characters that cannot be displayed correctly in a web browser. For a complete table showing all 256 characters, visit PC Guide.com.

How does this affect Unicode values? If you remember in our post about CHAR, NCHAR, VARCHAR and NVARCHAR data types, we discovered that the Unicode versions (those types starting with N) will use two bytes in memory and on disk to store a single character, compared to the non-Unicode (sometimes called ASCII or plain text) data types, which use only one byte per character.

The high-level reason for this is that some alphabets have more than 256 characters, so the code page (the full set of characters in upper- and lower-case where applicable, plus all the numbers, punctuation marks, and so forth) won’t fit in the 256 possibilities available in a single byte.

When we stick two bytes together however, we suddenly have as many as 2^16 values that we can store, for a total of 65,536 possibilities. This is mostly good enough if you’re not storing Japanese in SQL Server.

There are exceptions to this, where some kanji takes up four bytes per character. This is known as UTF-32 (Unicode Transformation Format, 32 bits per character). The good news is, SQL Server does support multi-byte characters wider than standard (UTF-16) Unicode, as long as we pick the correct collation.

I hope this answers any burning questions you may have had about bits and bytes.

Feel free to reach out to me on Twitter at @bornsql.

BONUS! Check out this great video on what a CPU looks like at a massive scale (link from Leon Adato):