Blog

Why You Need A Test Environment

In 2017, there’s no excuse not to have at least a testing environment, and preferably a development environment as well, in order to create and deploy code changes (software, database, you name it) outside of a production environment. If you’re making changes in a production environment without testing them first, you’re doing it wrong.

Some companies I’ve worked with have different forms of testing environments, including QA (Quality Assurance), IAT (Internal Acceptance Testing), and UAT (User Acceptance Testing). What they are called doesn’t matter, so long as they exist.

In a typical IT deployment, whether using Waterfall, Agile, or other development methodologies of the month, it pays to have a basic development → test → production deployment path.

Development

This environment can be a general playground, building things, writing code, installing weird and wonderful third-party plugins, getting the job done as far as producing new things is concerned.

In the development environment (which might be a shared machine, or a bunch of sticker-encrusted laptops used by developers), there should be something called Source Control.

Entire books have been written about source control, but the premise is that the development environment is a cesspit of trouble, so any code that is written and compiled (without breaking the build) should be checked into source control as soon as possible.

This serves three purposes:

1) Other developers can access the new code without having to plug in a USB drive and risk malware infection on Steve’s laptop (sorry, Steve).

2) If Steve’s laptop is stolen, or he spills coffee on the keyboard, or it gets infected by malware because he doesn’t keep the operating system up to date, the source code is checked in safely somewhere else, and the team doesn’t lose what could amount to days of productivity.

3) The third purpose is for something called Continuous Integration, which is a fancy way of saying that as soon as code is checked into source control, an automated testing process is able to compile the code, deploy it somewhere, and run a battery of tests against it without human intervention. CI is much faster than manual testing.

It’s imperative to keep in mind that all software contains bugs. We may not have a Steve in our company, but developers always make assumptions—they have to; it’s part of the job—so little things can escalate into big things if the software is not used as intended.

Test

The test environment should be set up exactly like production, down to hardware and operating system configuration. The only differences should be application configuration that might point to a test vs. a production database server, for instance.

Why is it so important to make sure that test and production look and behave the same?

In the case of SQL Server for example, a simple thing like data distribution in a table, along with the amount of RAM or CPU cores, will cause a specific query plan to be selected. If the same query is deployed to production with a different underlying hardware configuration or data distribution, a worse query plan might be selected by the query optimizer, and the developers are going to have to work nights and weekends to fix it.

Yeah, that’d be great

Whose fault is it, though? In this particular case, it’s the fault of whoever set up the test environment, because it doesn’t mirror production.
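
One quick way to spot hardware drift between the two environments is to run the same query on each server and compare the output. This is a minimal sketch, assuming SQL Server 2012 or later and a login with permission to view server state. It only covers CPU, memory, and version, not data distribution.

```sql
-- Run on both the test and production servers, then compare the results.
SELECT
    SERVERPROPERTY('ProductVersion') AS [ProductVersion],
    SERVERPROPERTY('Edition') AS [Edition],
    [cpu_count],
    [physical_memory_kb] / 1024 AS [PhysicalMemoryMB]
FROM [sys].[dm_os_sys_info];
```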

When I mentioned the different types of test environments above, this is what they are for:

  • Internal Acceptance Testing – the developers and possibly the product owners have tested the code and it works like the functional specification says it should.
  • User Acceptance Testing – the target audience (users) get their hands on the code and do the things that you never expected them to, and often will go off script or off spec. I worked with a guy called Gordon who I knew would break something. Make your code Gordon-proof.
  • Quality Assurance – the code is tested by a dedicated QA team who test the code against a functional specification as well as a set of guidelines created by the company to ensure software meets certain standards, but don’t make assumptions about it (developers always make assumptions). Think about Microsoft products. Different teams write Word and Excel, but they have to work in similar ways and have a high level of confidence that data won’t go missing or get corrupted.

Usually, there’s a bit of a cycle in the process here, where if something fails in test, it has to be fixed in development and redeployed, until it passes muster.

Production

Once code has been signed off by the end users and QA, it can be deployed into production. This can be done in many ways, and unfortunately one of those ways is that Steve has a USB drive containing files that he transferred from his coffee-and-malware-infested laptop.

Deployment to production should be the exact same process as deployment to test. Why? Because if it is not, then assumptions are being made and things can go wrong.

Remember source control? The best deployments are automated deployments. They are scripted, which makes them repeatable and therefore predictable. An automated process should be able to grab code from source control, compile it, and deploy it, the same way every time.
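
What "repeatable" means for a database change can be sketched in T-SQL. The example below is an assumption for illustration only: the [dbo].[Stores] table comes from the Database Fundamentals posts further down, and the [Region] column is hypothetical. The script checks before it changes anything, so it can be run against test and production any number of times with the same outcome.

```sql
-- Hypothetical change: add a nullable [Region] column, but only if the table
-- exists and the column is missing.
-- COL_LENGTH returns NULL when the column does not exist.
IF OBJECT_ID(N'dbo.Stores', N'U') IS NOT NULL
   AND COL_LENGTH(N'dbo.Stores', N'Region') IS NULL
BEGIN
    ALTER TABLE [dbo].[Stores]
    ADD [Region] NVARCHAR(50) NULL;
END;
```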

Deployment

Most companies will have a maintenance window during which deployments can take place. These are usually outside of working hours and on a weekend, but in my experience, weekends are a terrible time to deploy new code, so I prefer it to happen on a Wednesday night. That way, Steve is most likely in the office on Thursday morning to fix his bad code.

Source control should be the single version of the truth. If a developer is deploying from a thumb drive, stop doing that right now. There are free and cheap solutions available that can automate deployments for you that require only a small amount of effort to achieve repeatable success.

Summary

Source Control: Get it now. If you want to use an online provider (which I recommend), check out GitHub (mostly free), Visual Studio Team Services (free for up to five developers), or Bitbucket (mostly free). There are probably hundreds more.

Continuous Integration: Check it out. Visual Studio Team Services has it built in, even with the free version. Remember, the whole point of this is automated builds and repeatable results.

If you would like to discuss this with me further, you can find me on Twitter at @bornsql.

How to write a DELETE query

My First DELETE Statement

Here are the links to the previous posts in this series:

  1. My First SELECT Statement
  2. My First INSERT Statement
  3. My First UPDATE Statement

This week is a much shorter post, where we will learn how to remove data from a table using a DELETE statement.

We will also refresh our memories about the importance of the WHERE clause.

DELETE with a WHERE clause

As with the INSERT statement, a DELETE statement affects the entire row in a table.

To delete information, we identify which row(s) to get rid of using the DELETE keyword, the table name, and a WHERE condition:

DELETE FROM [dbo].[Stores]
WHERE [StoreID] = 2;
GO

In this example, we use the Primary Key StoreID as our row identifier, because it is guaranteed to be unique.

Pro tip: Write the WHERE clause before you write the rest of the DELETE statement. It is incredibly easy to accidentally delete every single row in a table.

If that happens, and it’s a production environment, we will have to open up the Run Book to figure out how to restore from a backup.

Remember, the purpose of a WHERE clause is to restrict the data modification using a filter or limit.
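
One way to build a safety net around a DELETE is to preview the rows with a SELECT and then run the DELETE inside an explicit transaction. This is only a sketch of the habit, not the one correct approach, and it assumes we expect exactly one row to be removed.

```sql
-- Step 1: preview what the WHERE clause matches.
SELECT [StoreID], [StoreName]
FROM [dbo].[Stores]
WHERE [StoreID] = 2;

-- Step 2: delete inside a transaction and check how many rows were affected.
BEGIN TRANSACTION;

DELETE FROM [dbo].[Stores]
WHERE [StoreID] = 2;

-- @@ROWCOUNT holds the number of rows affected by the previous statement.
IF @@ROWCOUNT = 1
    COMMIT TRANSACTION;   -- exactly one row, as expected
ELSE
    ROLLBACK TRANSACTION; -- anything else: undo the delete
```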

DELETE without a WHERE clause

This is an exercise for the reader, with the complete understanding that unless we have a full backup of our demo database, we should never run a DELETE on a table without a WHERE clause.

To run a DELETE without a WHERE clause, simply delete the WHERE condition from the above statement, keeping in mind that the entire table will be emptied out, with no way to go back.
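
Taking that full backup first might look like the sketch below. The database name [DemoDB] and the file path are placeholders, not names from this series, so substitute your own.

```sql
-- Hypothetical database name and path; change both before running.
-- COPY_ONLY leaves the regular backup chain untouched.
-- INIT overwrites any existing backup sets in the target file.
BACKUP DATABASE [DemoDB]
TO DISK = N'C:\Backups\DemoDB_before_delete.bak'
WITH COPY_ONLY, INIT;
```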

The Final Results

If you’re following along in SQL Server Management Studio, we can run the DELETE statement like so:

DELETE FROM [dbo].[Stores]
WHERE [StoreID] = 2;
GO

Once we have executed the statement (using F5 on the keyboard, or clicking the Execute button in the toolbar), we see a message in the Messages pane:

(1 row(s) affected)

Using our familiar SELECT, we will see the new result set, containing only one row, because the West Edmonton Mall row has been erased from existence:

Congratulations! We have deleted a row from our table, and a SELECT query proved that it was deleted.

Next time, we will find out why a test environment is so important (which was hinted at by the warnings about our WHERE clause). Stay tuned.

Look me up on Twitter if you want to ask any questions about this series, at @bornsql.

How to write an UPDATE query

My First UPDATE Statement

Last week we covered how to put information into a table using an INSERT statement.

This week we will learn how to make changes to data that is already in a table using an UPDATE statement. We are also going to learn all about why the WHERE clause is so important.

UPDATE

Updating information stored in a database can be as simple as identifying which column to update and specifying the new data to write in that column:

UPDATE [dbo].[Stores]
SET [StoreName] = N'West Edmonton Mall';
GO

The problem with this particular UPDATE statement, though, is that it will update every row, when we may only want to change one record. That’s why we need to add WHERE. The purpose of a WHERE clause is to restrict the data modification using a filter or limit.

The WHERE clause

In ANSI SQL (remember, ANSI is the standards organisation that defines how SQL works with different database platforms), a WHERE clause goes somewhere near the end of a query, to filter or limit the data being affected.

The WHERE keyword can be used with SELECT, UPDATE and DELETE statements, and with an INSERT that takes its rows from a SELECT. The reason we’re only learning about it today is that it is a critical part of the UPDATE and DELETE statements.

Limiting the damage

An UPDATE statement without a WHERE condition, or clause, will update the entire table.

Consider these two examples, and imagine what is happening in the table:

-- Without a WHERE clause
UPDATE [dbo].[Stores]
SET [StoreName] = N'West Edmonton Mall';
GO

-- With a WHERE clause
UPDATE [dbo].[Stores]
SET [StoreName] = N'West Edmonton Mall'
WHERE [StoreID] = 2;
GO

In the first example, every single row in our table will now have ‘West Edmonton Mall’ for the store name. In the second example, only the row (or rows) that match the condition in the WHERE clause will be updated.

Notice how the WHERE condition relates to a value in the [StoreID] column, which is this table’s Primary Key. When updating individual rows in a table, it’s always better to use a unique value to guarantee that we only update a single row.

We could use a WHERE condition on a different column, or combinations of columns, using AND and OR logic statements. WHERE clauses can be extremely complex.
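
As a small illustration, here is a SELECT that combines AND and OR. The filter values are made up for the example. Parentheses are worth adding because AND is evaluated before OR, which can produce surprising matches.

```sql
-- Stores whose name starts with 'West' and that are either managed by
-- Wesley Wensleydale or have no store code.
SELECT [StoreID], [StoreName], [ManagerName]
FROM [dbo].[Stores]
WHERE [StoreName] LIKE N'West%'
  AND (
        [ManagerName] = N'Wesley Wensleydale'
        OR [StoreCode] IS NULL
      );
```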

Note: the UPDATE statement will update whatever we tell it to. Even though the original value of the column was ‘West Edmonton Mall’, the database engine will happily update that value again and again, to the same string, if we ask it to.

SET

Astute readers will recognise a new keyword in the UPDATE statement: SET.

The first part of an UPDATE specifies the table we wish to modify.

Then, the SET keyword specifies the column we want to change. We can add more than one column to our SET clause, separated by commas, if we have more than one column in the same row (or rows) that we wish to update.

For instance, let’s assume the West Edmonton Mall store has a new manager. We can modify the [ManagerName] and [ManagerEmail] columns at the same time.

UPDATE [dbo].[Stores]
SET [ManagerName] = N'Wesley Wensleydale',
[ManagerEmail] = N'wesley@example.com'
WHERE [StoreID] = 2;
GO

This operation, where more than one thing is being changed at the same time, is called a set-based operation. In other words, a set of data is being updated at once, instead of writing a separate UPDATE statement for every single column (or every single row).

Set-based operations can run on the entire table, or on a filtered set of data specified by the WHERE clause.

Ambiguity Verboten

Unlike the INSERT statement, where column names were implied, an UPDATE statement has to explicitly list the column (or columns) that we are updating. In our examples above, we had [StoreName], [ManagerName] and [ManagerEmail]. This is because there can be absolutely no ambiguity when modifying data.

This is also why a WHERE clause is so important. I have personally run an UPDATE or DELETE without adding a WHERE clause, and it happens a lot in this field. Make sure to add a WHERE keyword before writing the rest of the statement.

Some tools that plug into SQL Server Management Studio will detect if we have missed a WHERE clause on an UPDATE or DELETE, but we can’t always rely on plugins. For example, we might have to use a different computer one day, or write our SQL code in a text editor like Notepad, and only good habits can avoid disaster.
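
One such habit, sketched here as a suggestion rather than a rule, is to run the UPDATE inside a transaction with an OUTPUT clause, so we can see the old and new values before deciding to keep the change.

```sql
BEGIN TRANSACTION;

UPDATE [dbo].[Stores]
SET [ManagerName] = N'Wesley Wensleydale'
-- deleted.* holds the value before the update, inserted.* the value after.
OUTPUT deleted.[ManagerName] AS [OldManagerName],
       inserted.[ManagerName] AS [NewManagerName]
WHERE [StoreID] = 2;

-- If the output shows only the rows we expected, replace ROLLBACK with COMMIT.
ROLLBACK TRANSACTION;
```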

The Final Results

If you’re following along in SQL Server Management Studio, we can run the three-column UPDATE statement like so:

UPDATE [dbo].[Stores]
SET [StoreName] = N'West Edmonton Mall',
[ManagerName] = N'Wesley Wensleydale',
[ManagerEmail] = N'wesley@example.com'
WHERE [StoreID] = 2;
GO

Once we have executed the statement (using F5 on the keyboard, or clicking the Execute button in the toolbar), we see a message in the Messages pane:

(1 row(s) affected)

Using our recently-learned skill to SELECT the data from this table, we will see the new result set, containing the updated row:

Congratulations! We have modified a row of data in our table, and a SELECT query proved that it was updated.

Next time, we will be removing data from a table using the DELETE command. Stay tuned.

Look me up on Twitter if you want to ask any questions about this series, at @bornsql.

How to write an INSERT query

My First INSERT Statement

Last week we covered how to get information out of a table, using a SELECT query.

This week, we will discover some of the myriad ways to put data into a table.

The good news is the concept is straightforward: we have a list of columns in a table, and each column has a datatype. We will insert a row into the table, according to the column order and datatype.

In reality, inserting data into a table is fraught with complexity.

Using our Stores table from before, the simplest way to write an INSERT statement is as follows:

INSERT INTO [dbo].[Stores]
VALUES (NULL,
N'West Edmonton Mall',
N'8882-170 Street, Edmonton, AB, T5T 4M2',
N'Stephanie West', N'stephanie@example.com');

Note: Remember that our store’s name and address, and manager’s name and email address, are all stored as Unicode (NVARCHAR), so the string has to be prefixed with an N. This guarantees that whatever is between the quotation marks is Unicode already and won’t be converted behind the scenes and potentially cause problems.

Ambiguity with column order

Our first problem is that it’s not clear which columns are being inserted into, nor their order.

Without a list of columns in the INSERT statement, the database engine will insert in the order provided, into whatever columns are on the table, and it might even do implicit conversions on data types behind the scenes.

It is therefore good practice to include the column names when writing an INSERT statement, like so:

INSERT INTO [dbo].[Stores] (
[StoreCode],
[StoreName],
[Address],
[ManagerName],
[ManagerEmail]
)
VALUES (
NULL,
N'West Edmonton Mall',
N'8882-170 Street, Edmonton, AB, T5T 4M2',
N'Stephanie West', N'stephanie@example.com'
);

Now we are sure that the column order is correct, and even if the table somehow has a different column order (for instance, the ManagerEmail and ManagerName are swapped around), this statement will succeed.

Where’s the StoreID column?

The astute reader has noticed there is no reference to the StoreID column, which happens to be the Primary Key for this table.

This is one of the ways a lot of accidental DBAs (and even experienced DBAs) get tripped up. If we think back to the table definition, we used an IDENTITY value.

To refresh our memories, an IDENTITY value is an auto-incrementing integer value, generated by the database engine, in order to ensure that the Primary Key is unique.

Note: It is because the IDENTITY is set that this column is being excluded from the INSERT statement above. Primary Keys which don’t have an IDENTITY set must be included in INSERT statements, provided that the value(s) being inserted will be unique.
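
To make this concrete, here is a guess at what such a table definition could look like. The original definition is in an earlier post, so the table name [StoresDemo], the data types, and the lengths below are all assumptions. SCOPE_IDENTITY() returns the value the database engine generated for the row we just inserted.

```sql
-- Assumed definition for illustration; not the exact table from the series.
CREATE TABLE [dbo].[StoresDemo] (
    [StoreID] INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    [StoreCode] NVARCHAR(10) NULL,
    [StoreName] NVARCHAR(100) NOT NULL,
    [Address] NVARCHAR(200) NOT NULL,
    [ManagerName] NVARCHAR(100) NOT NULL,
    [ManagerEmail] NVARCHAR(255) NOT NULL
);

-- [StoreID] is left out; the database engine fills it in.
INSERT INTO [dbo].[StoresDemo] (
    [StoreCode], [StoreName], [Address], [ManagerName], [ManagerEmail]
)
VALUES (
    NULL,
    N'West Edmonton Mall',
    N'8882-170 Street, Edmonton, AB, T5T 4M2',
    N'Stephanie West', N'stephanie@example.com'
);

-- Returns the IDENTITY value generated by the INSERT above.
SELECT SCOPE_IDENTITY() AS [NewStoreID];
```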

NULL Values

We spoke about null values in the beginning of the series, and in this example we can see how to insert a NULL value into a table, provided the column definition allows nulls in the first place. We happen to know that the StoreCode is a nullable column.

Default Values

It is possible to exclude columns that have default values on them. If we think about it, an IDENTITY column is a column that has a default value assigned to it, which just happens to be an auto-incrementing integer.

We might want to have columns that have a default value of the current date and time. This convention is used when auditing database events by adding a CreatedDate column (for example) that defaults to the current date and time, using DATETIME2(7), when a row is inserted.

Another example might be to use a default value of 0 in a bit column and update that value later on.

In these cases, columns with default values can be excluded from the INSERT statement, because the database engine will automatically put the default value into that column.

That being said, there’s nothing stopping us from using a different value for a column that has a default value. If we have a default value on a column, like our DATETIME2(7) example, we could override that default value as long as we include the column and a new value in the INSERT statement.
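
Both behaviours can be seen with a throwaway temporary table. The table name, column names, and values below are invented for this sketch.

```sql
-- Temporary table with two defaulted columns (names are hypothetical).
CREATE TABLE #StoreAudit (
    [StoreID] INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    [StoreName] NVARCHAR(100) NOT NULL,
    [IsActive] BIT NOT NULL DEFAULT (0),
    [CreatedDate] DATETIME2(7) NOT NULL DEFAULT (SYSDATETIME())
);

-- Defaults apply: [IsActive] becomes 0, [CreatedDate] becomes the current time.
INSERT INTO #StoreAudit ([StoreName])
VALUES (N'West Edmonton Mall');

-- Default overridden: we supply our own [CreatedDate].
INSERT INTO #StoreAudit ([StoreName], [CreatedDate])
VALUES (N'Second Store', '2017-01-01T00:00:00');

SELECT [StoreID], [StoreName], [IsActive], [CreatedDate]
FROM #StoreAudit;

DROP TABLE #StoreAudit;
```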

Adding a column to the table

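What happens if, during the course of regular business, a column is added to the Stores table? The INSERT statement without a column list will fail, because the number of values no longer matches the number of columns in the table. The INSERT statement with a column list will keep working, as long as the new column allows nulls or has a default value.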

The final results

If you’re following along in SQL Server Management Studio, we can run the INSERT statement like so:

INSERT INTO [dbo].[Stores] (
[StoreCode],
[StoreName],
[Address],
[ManagerName],
[ManagerEmail]
)
VALUES (
NULL,
N'West Edmonton Mall',
N'8882-170 Street, Edmonton, AB, T5T 4M2',
N'Stephanie West', N'stephanie@example.com'
);

Once we have executed the statement (using F5 on the keyboard, or clicking the Execute button in the toolbar), we see a message in the Messages pane:

(1 row(s) affected)

Using our recently-learned skill to SELECT the data from this table, we will see the new result set, containing the new row:

Congratulations! We have put a new row of data into a table, and a SELECT query proved that it was inserted.

Notice that the NULL value has a different background colour to the rest of the data. This is a way to distinguish actual null values from string columns that might just have the word “NULL” stored there.

Next time, we will be updating data in a table using the UPDATE command. Stay tuned.

Look me up on Twitter if you want to ask any questions about this series, at @bornsql.

How to write a SELECT query

My First SELECT Statement

Microsoft SQL Server makes it really easy for us to query tables. In SQL Server Management Studio (SSMS) for instance, we can right-click on any table we have access to and select the top 1000 rows from that table.

Don’t do this

Please don’t query tables this way in a production environment. It’s a bad way to do it, and you should feel bad.

Writing a SELECT statement should be done manually, the way Ada Lovelace and Grace Hopper intended.

Let’s assume we want to get a list of the stores in our database that we created in the First Look at Normalization post.

The table had the following columns: StoreID, StoreCode, StoreName, Address, ManagerName, and ManagerEmail.

To get a list of all rows and all columns in a table, we write the following statement:

SELECT
[StoreID],
[StoreCode],
[StoreName],
[Address],
[ManagerName],
[ManagerEmail]
FROM
[dbo].[Stores];

Remember from previous posts that the square brackets are a matter of style, and we could just as easily exclude them.

I leave them in because humans are terrible at naming things, and the square brackets make the code less likely to fail.
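
For example, a column named after a reserved word, or with a space in it, only works when it is delimited. The temporary table below is invented purely to show this.

```sql
-- Without the square brackets, [Order] and [Store Name] would be syntax errors.
CREATE TABLE #BadNames (
    [Order] INT NOT NULL,
    [Store Name] NVARCHAR(50) NOT NULL
);

INSERT INTO #BadNames ([Order], [Store Name])
VALUES (1, N'West Edmonton Mall');

SELECT [Order], [Store Name]
FROM #BadNames;

DROP TABLE #BadNames;
```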

You’ll notice that there is a semi-colon at the end of the statement. We could have placed the entire statement on one line, like so:

SELECT [StoreID], [StoreCode], [StoreName], [Address], [ManagerName], [ManagerEmail] FROM [dbo].[Stores];

This is more difficult to read. SQL Server doesn’t really care about white space, including carriage returns, so feel free to format your code nicely so that it’s easy for you to read.

If you’re typing this in Management Studio, you can now press the Execute button in the menu, or the F5 key on the keyboard, to run the command.

Tip: in Management Studio, we can select just the text (T-SQL code) we want to run, using the mouse, and then press F5. This will guarantee that no other part of the script will run as well. It’s a useful way to run portions of code in a longer script.

Once we run the SELECT statement, we see a result set.

Congratulations! We have asked SQL Server for data, and it has responded with the data we asked for.

Next time, we will be adding data to a table using the INSERT command. Stay tuned.

Look me up on Twitter if you want to ask any questions about this series, at @bornsql.

Querying a Database

When we want to retrieve information from a database, we query the structure with language appropriate to the database.

Remember right at the start of this series we saw that a database could be a phone book or a recipe book. So how do we find the phone number of Randolph West? By looking up the surnames starting with W, and going through all the Wests in alphabetical order until we get to that entry.

The same goes for finding the recipe for lemon meringue in a recipe book. Start at the index at the back, look through the desserts, and then search for meringue.

In an RDBMS (relational database management system), the language for querying data is called Structured Query Language, or SQL. We can pronounce it like “sequel”, or sound out each letter.

SQL Server is commonly pronounced “Sequel Server”. MySQL is pronounced “My-S-Q-L”, sounding out the letters (some people pronounce it “My-Sequel”). It all depends on who’s saying it. PostgreSQL is just pronounced “Postgres”, because seriously.

These, along with Oracle, are the major players in the RDBMS industry.

ANSI SQL

When it comes to putting information into a database server, and getting information out, we can write queries that look very similar across platforms. This is thanks to a standards body called ANSI (American National Standards Institute), which (with proposals from each vendor) has suggested ANSI SQL syntax that all vendors should use.

For the sake of this series, we will assume that the vendors do follow the standard exactly, but in reality it isn’t that simple.

Putting data in: INSERT

Adding new data to a database is performed using an INSERT operation.

Changing data: UPDATE

Modifying existing data is done with an UPDATE operation.

Getting rid of data: DELETE

Removing rows from a table is performed with a DELETE operation.

Getting data out: SELECT

The vast majority of operations in a database have to do with data retrieval. To get data out, we SELECT it.

CRUD

In technical circles, between software developers and database developers, we might refer to these four operations using the mnemonic CRUD, which stands for Create, Read, Update, Delete.
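
The mapping from CRUD to SQL statements can be sketched with a small temporary table (the table and its values are invented for the example):

```sql
CREATE TABLE #Recipes (
    [RecipeID] INT NOT NULL PRIMARY KEY,
    [RecipeName] NVARCHAR(100) NOT NULL
);

-- Create
INSERT INTO #Recipes ([RecipeID], [RecipeName])
VALUES (1, N'Lemon Meringue');

-- Read
SELECT [RecipeID], [RecipeName]
FROM #Recipes;

-- Update
UPDATE #Recipes
SET [RecipeName] = N'Lemon Meringue Pie'
WHERE [RecipeID] = 1;

-- Delete
DELETE FROM #Recipes
WHERE [RecipeID] = 1;

DROP TABLE #Recipes;
```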

When a specific database or application is described as CRUD, it could mean that the database is just being used as a data store (a virtual box of random stuff) and may not have proper relationships between the tables, nor be normalized.

There’s nothing necessarily evil about denormalized data, because the application code may handle that intelligence. Be wary though. Using an RDBMS to store non-relational data might cause headaches.

Next time we will write our first SELECT statement to query a database. Stay tuned!

Look me up on Twitter if you want to ask any questions about this series, at @bornsql.

SQL Server Management Studio v17.0

Version numbers are confusing. SQL Server Management Studio (SSMS), the client user interface by which most DBAs access SQL Server, was decoupled from the server product for SQL Server 2016.

For the last 18 months or so, we have been receiving semi-regular updates to SSMS (which we can download from Microsoft, for free), which is amazing and awesome.

SQL Server 2017 was recently announced, as I mentioned last week, and the internal server version is going to be some variation of 14.0. The compatibility mode is 140. This follows logically from previous versions.
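
To see these numbers on an instance we are connected to, a query along these lines should work on any recent version of SQL Server:

```sql
-- Internal server version, for example 13.0.x for SQL Server 2016.
SELECT SERVERPROPERTY('ProductVersion') AS [ProductVersion];

-- Compatibility level of each database on the instance.
SELECT [name], [compatibility_level]
FROM [sys].[databases];
```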

Management Studio, however, has taken a different tack with version numbers. The latest version, which was released at the end of April, is 17.0.

So if you’re confused, you’re not alone.

To make up for some of that confusion, here’s a neat trick in SSMS 17. Let’s say I want to present a session at a SQLSaturday. In the past, I had to modify the fonts and sizes for myriad settings in Management Studio. Paul Randal has a seminal post about this very thing.

With version 17, we can enable a new feature called Presenter Mode, which automatically sets the fonts and sizes of the SSMS interface to improve visibility when sharing your screen in a conference call or using a projector.

In the Quick Launch textbox on the top right of the screen (press Ctrl+Q), type the word PresentOn.

Our standard SSMS interface changes from this:

to this:

To switch it back to the default view, it’s not PresentOff as we might expect. Instead, we must type RestoreDefaultFonts in the Quick Launch textbox.

Note: the results grid will not take on the new settings until SSMS is restarted. Keep this in mind when switching Presenter Mode on and off.

If you have any more SQL Server Management Studio 17 tips to share, find me on Twitter at @bornsql.

SQL Server 2017 Announced

By now you will have heard that the next version of SQL Server has been announced. There’s no release date yet, but Brent Ozar seems to think it’ll be before June.

There are many new features, but the biggest deal is that SQL Server 2017 runs on both Windows Server and Linux. Yup, SQL Server 2017 is certified to run on Windows Server, Red Hat Enterprise Linux, Ubuntu, and SuSE. (You can even run it on macOS in a Docker container.)

There are some big improvements to the Query Optimizer as well, starting with Adaptive Query Processing. It’s like a smart go-faster button, making incremental improvements to your queries the more often they run.

Although it is still in pre-release, Microsoft states that SQL Server 2017 is production-ready, which means that it should work as expected.

Watch the announcement, and download the latest technical preview to try it for yourself.

Something something Production something something you idiot

You’re reading this series of posts because you want to learn about databases and how to use them.

What you should not be doing is learning about databases and how to use them, while working inside a production environment.

Also called “prod”, because we’re lazy, this server (or servers) is not meant for testing things out. We should always make sure that when practising new bits of code, we do it in a development environment (“dev”). At least if we make a mistake (and mistakes happen a lot in this field), it won’t cause the main payroll system to crash.

The best way to set up a development server is to create a virtual machine where you can install anything you like, and if something goes wrong, you can delete and rebuild the virtual machine without causing monetary and/or data loss.

I’m not kidding. Go set up a virtual machine. If you don’t know how, then ask me and I’ll explain it (there’s a future blog post for setting up a virtual machine).

Get off production.

Upcoming SQLSaturdays in Alberta, Canada

Taking a short break from the Database Fundamentals series of the last few weeks, I’d like to mention some upcoming PASS community events in the province of Alberta.

I will be presenting at SQLSaturday #594 in Edmonton on 22 April 2017 (this coming Saturday). My topic is Migrating to Azure SQL Database: Tips, Tricks and Lessons Learned.

Next weekend, I will be hosting SQLSaturday #607 in Calgary on 29 April 2017. This is the first ever SQLSaturday in the city of Calgary, and we even have a special message from our celebrity mayor, Naheed Nenshi.

If you live in or around these two cities, please come and say hi. You can also reach out to me on Twitter at @bornsql or @sqlsatcalgary.

The Database Fundamentals series will continue next week.