Tuesday, May 11, 2010

SQL Meme: Tagged: 5 things SQL Server should drop

I have been tagged by Denis Gobo (Blog/Twitter) in a SQL meme regarding the top 5 things I would drop from SQL Server, Denis’s post. I am sure some of you could spit out a list a mile long, but I am going to focus on my 5 biggest pet peeves, well ones that have not been listed yet :^).

sp_msforeachdb and sp_msforeachtable

First off these two stored procedures are undocumented, so they can be deprecated or the functionality may change. In my opinion, these two stored procedures are useless. If you look underneath the hood, these stored procedures both use basic cursors…. Sure they make fancy work of the “?”, but you can to with your own cursor. Remove these from your code and roll your own cursors.

PRINT

Print is a bit antiquated when it comes to newer versions of SQL Server. I am not saying drop print altogether, but drop the 8000 varchar/4000 nvarchar print limitation. I have seen this byte (sorry couldn’t resist) people over and over. It does not make sense to allow a developer to store 2 GB worth of data in a variable and then only print 8000 characters… Sure we can roll our own print procedure, or use XML, but why should we work around the issue. Allow PRINT to “print” up to the maximum variable size.

ORDER BY Constant in Windowing Function

If you try to order by a constant value using a windowing function, such as Row_Number(), you will get a message stating constants are not allowed; however, there is a workaround. The workaround is to use a subquery (with a constant) in the order by clause. The behavior should be removed because it seemingly gives developers the idea that the data will be order in the order of table. Before anyone says anything, a table does not have a predefined order. So what we have here is a number sequence that is not guaranteed each time it is run. In my book, inconsistent behavior = Remove the functionality and make the user order by an actual column.

**** UPDATE ****

I have been asked to provide a sample of what I am talking about here. Essentially create a very simple table with a few columns as such:

DECLARE @t TABLE(
Id INT IDENTITY(1,1) PRIMARY KEY,
SomeChar CHAR(1)
);

INSERT INTO @t (SomeChar) VALUES ('a') ;
INSERT INTO @t (SomeChar) VALUES ('b') ;
INSERT INTO @t (SomeChar) VALUES ('c') ;

Now try each of the following queries against the table and you will see one windowing function says constants cannot be used, but clearly then can with a little ingenuity. The problem here is the illusion that the data will be sequenced in the order of the table, without actually sorting it; however, what is really occurring is the optimizer is generating a sequence whose order may vary from execution to execution. A column should be supplied; otherwise, unexpected results may occur.

SELECT Id,SomeChar,ROW_NUMBER() OVER(ORDER BY 'a') AS seq
FROM @t
/*Windowed functions do not support constants as ORDER BY clause expressions.*/

SELECT Id,SomeChar,ROW_NUMBER() OVER(ORDER BY (SELECT 'a')) AS seq
FROM @t
/*
Id          SomeChar seq
----------- -------- --------------------
1           a        1
2           b        2
3           c        3
*/

Edit Top X Rows

This is a SSMS feature that I just do not find useful at all, plus this feature gives non database professionals the ability to modify/inserted/delete data, with no understanding of what is occurring in the background. In my opinion, those who use this feature are asking for trouble. I believe all insert/update/delete transactions should be done through CRUD (Create/Read/Update/Delete) stored procedures or TSQL batch operations. If you do not know how to do CRUD through TSQL, you do not need to be doing it at all.

SELECT *

I may take a little heat from this one, but SELECT * should be removed from SQL Server. SQL Server intellisense should auto expand “*” into the column list. SELECT * is a prime candidate for performance problems and wasted network traffic. SELECT * affects the optimizer’s ability to use indexes, increases network bytes, and breaks code when column ordinal position is changed or columns are added or removed. Sure we all use SELECT * for quick ad-hoc queries, but believe me.. it also exists in production code. In my opinion, the benefits of expanding the “*” outweigh the cons because it makes developers/DBAs realize how many columns they are selecting, which may tip them off that they should restrict the number of columns being selected. Also expanding the “*” prevents insert statements from breaking when columns are added or removed.

These are the items I would remove from SQL Server given the chance. I am sure I can come up with a lot more, but I will let others take a stab at this.

Until next time, happy coding.

15 comments:

Denis Gobo said...

Never heard of the poor man's debugger....it is the PRINT statement :-)


Although I like print..I would also add a pet pieve, they have to convert to varchar automatically

so this should work

declare @f float =1
print 'value = ' + @f

but we get this error
Msg 8114, Level 16, State 5, Line 2
Error converting data type varchar to float.

Now I need to do this
declare @f float =1
print 'value = ' + str(@f)

Kent Waldrop said...

Thanks for adding the example, Adam. That is what I thought you were talking about. I have seen a number of ways of faking the ORDER BY clause in this kind of situation. I tend to use something like

SELECT Id,SomeChar,ROW_NUMBER() OVER(order by @@spid) AS seq
FROM @t

but this is no better and perhaps in some ways worse because you are using @@spid.

The point that I would make is that if the reason this kind of hack is done is because we are forced to put in the ORDER BY clause in the OVER() pseudo function, why not just allow the empty OVER() and remove the requirements for the ORDER BY clause. Is what you are saying that you think the work arounds need to also be forbidden?

You might be right.

Kent Waldrop

Adam Haines said...

@Denis,

LOL, I still use print as my debugger of choice; however, it is quite an annoyance when dealing with large dynamic SQL variables. I am just saying the old functionality should support the new functionality...

Adam Haines said...

@Kent,

Yes, IMO the functionality should be removed...simply because it is unreliable. There is no guarantee that the multiple executions will return the same sequence , so this makes it pointless in my book.

A lot of things will affect the sequence including paralleism, nolock hints, index scans etc... IMO it is like using TOP without an order by.

Brian Tkatch said...

"a table does not have a predefined order"

What about a CLUSTERED INDEX? Doesn't that affect the way it is stored and read?

"SELECT * should be removed from SQL Server."

Won't help, they'll just copy the entire COLUMN list.

I asked a developer why he used SELECT *. He said it would be more convenient in TopLink. There's a point there. Get the tools to make it easier to get a COLUMN list, then people will have something as easy as *.

Also, can't remove * because of EXISTS. Sure, a COLUMN could be used. But that expresses the wrong idea. A number could be used, but a lot of people find that confusing. * means "i don't care", similar to COUNT(*).

Finally, it is not upon MS to force people how to write code. Suggest it, recommend it, advertise it, but don't force it. It's bad enough they deprecated classic join syntax.

Adam Haines said...

@Brian,

A clustered index does sort the data logically (not physically) and maintains this logical ordering, but does not have to follow it. For example, a non clustered index is used to satisfy the query. In this scenario, the rows will be returned in nonclustered index order, which is not likely to be the same order as the clustered index. Also, if the nolock hint is used, an allocation order scan may be used, which means the optimizer will not be scanning according to the doubly-linked list, but the order of IAM pages. Other scenarios also play a part in changing the returned order, such as parallelism... There is just no guarantee that the partitioned streams would ever reassembled in clustered index order. There are other scenarios out there, but you get the idea. If you used a constant in a regular ORDER BY clause, would you expect the order to be consist each and every time?

I still think that SQL Server should auto expand the * wild card. IMO this would make developers realize what they are doing and really evaluate their actions. Sure they may leave all the columns, but at least they will have a better understanding of the impact. I have seen developers use SELECT * , without really knowing how many columns are in table, because it is easier to type. At least they can see all the columns and hopefully it clutters the screen enough to make them remove columns. Another benefit is their code is less likely to break.

I also don't agree that Microsoft would be forcing programming habits down people's throats, as in my eyes expanding the "*" is a simple reminder of what we should be doing.

Brian Tkatch said...

@Adam

I didn't realize the ordering was logical not physical. Though, now i am left wondering what a CLUSTERED INDEX helps anyway.

Autoexpanding * is a nice idea. I would probably use it every time so i don't have to look for/type in the COLUMN names. Nonetheless, i feel too strongly about forcing "benefits" in people.

I wonder if there is another way to implement this. Such as make the * show up with an expandable plus-sign next to it showing a collapsible section which when expanded would be the entire COLUMN list. Or, issue a warning stating * isn't a good idea. Or something.

Adam Haines said...

@Brian,

Clustered indexes do have other value. Tables with clustered indexes are typically smaller than HEAPS.MSFT has a great whitepaper on this: http://msdn.microsoft.com/en-us/library/cc917672.aspx They also make better covering indexes (for wide queries) because all the data exists in the clustered index. The key thing to remember is that no sort is guaranteed, unless a non-constant ORDER BY is used.

I think the plus sign is another great idea. I think something subtle like a plus sign or an autoexpand feature is a giant step in the right direction.

Brian Tkatch said...

@Adam

"The key thing to remember is that no sort is guaranteed, unless a non-constant ORDER BY is used."

I never meant to challenge that. Just how the data was stored.

Thanx for the link. Good stuff.

Brad Schulz said...

Great stuff, as always, Adam.

I have to (kind of) disagree about the ORDER BY in the ranking window function, though.

As Kent mentioned, we are forced to specify an ORDER BY in ranking functions... just because "rank" implies an "order" of some kind.

But there are many times when we just need a ROW_NUMBER() applied and we really couldn't care less about HOW the sequencing is done, and so we specify a "don't care" ORDER BY expression... whether it be ORDER BY (SELECT 0) or ORDER BY @@SPID (I liked that one, Kent).

I don't think this should be eliminated. The only thing wrong with it is that it is non-deterministic... that's all. So is applying ROW_NUMBER() OVER (ORDER BY CustomerID) in an Order table. Customers repeat themselves in the order table, so, unless we specify a tie-breaker (by adding SalesOrderID to the ORDER BY list), we can't be guaranteed about the sequencing of rows within a CustomerID's group. But that shouldn't be illegal.

Also, by specifying an ORDER BY (SELECT 0) or ORDER BY @@SPID, the engine won't introduce a Sort operator into the plan. If we are forced to specify a column, then that will bring about unnecessary sorting if we really didn't care about it in the first place.

Maybe to make things clear, we should just use ORDER BY (SELECT 'I really don''t care!').

8^)

If we had to eliminate anything that has to do with ORDER BY, it should be the deprecation of the use of ordinal numbers in the ORDER BY clause of a main query that indicate which columns in the SELECT to sort by... that's asking for trouble.

--Brad

Adam Haines said...

@Brad,

Thanks for the feedback! I totally agree with the column ordinal positioning ORDER BY! That is very annoying and difficult to read.

As for the ranking function, I think an "unordered" or "random" sequence is something that should be handled by the application layer. In most cases, a sequence that labels rows is really only used for grid controls or pagination. At the end of the day, it is the database's job to crunch and slice and dice data into a dataset, but it is up to the application to present the data in a meaningful format.

Brian Tkatch said...

@Brad

Thanx for the explanation of the constant in ROW_NUMBER(). Good stuff. I forgot about that.

@Brad, @Adam

Ordinal numbers in ORDER BY are used by UNION.

Personally, i like the use of ordinal numbers:

1) When the COLUMN is an expression, and not just a COLUMN in a TABLE, the ORDER BY would have to repeat that. Sure, an alias could be used, but why is that different from an ordinal?

Even if names are clearer, that's like forcing me to use a constant. And, i can name the COLUMN "column_1" or "c3" anyway. What's the point?

2) Especially in ad-hoc queries, ORDERing BY ordinal is very convenient. Just the other day i was running a query and ORDERing BY different COLUMNs. If i had to type in the full name each time, it would have added time to figure it out, and mixed me up. The COLUMN names we use here are the standardized type, so they are hard enough to remember. (Table code, name, and datatype code.) I use ordinals frequently.

I actually hope they extend ordinal usage, and include it in GROUP BY. But with a keyword, or special characters. I want to be able to use: SELECT Customer, COUNT(*) FROM Order GROUP BY {1};

As usual, my feelings are that removing a feature because people use it lazily is forcing "good practice" down their throats. I am vehemently opposed to such things. Coder may be stupid, but they aren't children. They should be allowed to shoot themselves in the foot.

Joshua Smith said...

Thanks for your help in describing this. Nice review! Also turn your attention that it needs to search for mobile software development companies if you need mobile software developer.

Joshua Smith said...

Thank you for posting this great info. You have nice chance to check casino affiliate programs. The best gaming programs such as rialto affiliate and great poker rooms such as red bet affiliates.

Mingle Swipe said...

Thank you for sharing with detailed information here..
I am getting dating messages from different sites..
Why am I getting Spam Emails for Dating Sites Can I able to block it..