` Printed Icetips Article

Icetips Article



SQL Related Articles: Better SQL Part 6
2002-07-24 -- Dan Pressnell
 
So this guy walks into a fast food restaurant, stands in line, and when he gets to the counter, he asks the girl, "Are you the server?"
 
"Yes, I'll be serving you," she says. "What can I get for you?"
 
"Good. I would a french fry."
 
"One order of fries. Anything else?" she asks.
 
"No. Not a bag of fries. One french fry," he says.
 
"Just one french fry?"
 
"Yes, please."
 
So the girl makes a plan, and executes it. She goes over to the french fries, gets one, and returns to the counter, then gives it to the man. She completes the transaction by making a record of the purchase. Then she begins to serve the next person in line.
 
The man goes to a table and sits down and eats his french fry. Then he gets up and stands in line again. When he gets to the counter, he orders one more french fry.
 
This happens several times.
 
Finally, one one of his many round trips to this server, the girl asks, "Why are you getting just one french fry at a time? You can get a whole bag at one time, you know."
 
"This is the way I have always done things. It's always worked well in the past," the man answers.
 
"But doing it this way will take a long time, it's keeping me busier than I have to be, and it's making the other customers have to wait longer," the girl says.
 
The man says, "I'm used to waiting a long time."
 
The girl points at the other customers and asks, "Do you see anybody else doing it this way?"
 
The man says, "No, but none of these other people are Clarion programmers."
 
(WAIT! If you think that's funny, here comes the REAL puch line!)
 
So the girl leans over the counter and asks, "What's the job market like for Clarion programmers?"
 
Isn't that a riot? "What's the job market like..." It cracks me up!
 

Do you know what happens at the server when you send a query? Do you care?
 
If you don't know or care, now is the time to start knowing and caring. Understanding this is very important, and a lack of understanding on the part of Clarion programmers is, I think, a major reason the business world is turning away from Clarion.
 
Here is an excerpt from the ODBC help file that is in the Microsoft Data Access Kit.
 
------ begin quote -----
To process an SQL statement, a DBMS performs the following five steps:
 
1. The DBMS first parses the SQL statement. It breaks the statement up into individual words, called tokens, makes sure that the statement has a valid verb and valid clauses, and so on. Syntax errors and misspellings can be detected in this step.
 
2. The DBMS validates the statement. It checks the statement against the system catalog. Do all the tables named in the statement exist in the database? Do all of the columns exist and are the column names unambiguous? Does the user have the required privileges to execute the statement? Certain semantic errors can be detected in this step.
 
3. The DBMS generates an access plan for the statement. The access plan is a binary representation of the steps that are required to carry out the statement; it is the DBMS equivalent of executable code.
 
4. The DBMS optimizes the access plan. It explores various ways to carry out the access plan. Can an index be used to speed a search? Should the DBMS first apply a search condition to Table A and then join it to Table B, or should it begin with the join and use the search condition afterward? Can a sequential search through a table be avoided or reduced to a subset of the table? After exploring the alternatives, the DBMS chooses one of them.
 
5. The DBMS executes the statement by running the access plan.
----- end quote -----
 
Pretty impressive, isn't it? And all that happens BEFORE the server returns you the first byte of data!
 
Generally this happens with every query you send to the server. There are some exceptions, but all your SQL programming should be done with this in mind and with the assumption that this complete process will take place with every SQL statement you send, simple or complex, whether it returns much data, little data, or no data. Your goal is to reduce as much as you reasonably can the number of times you cause this CPU-intensive process to happen.
 
The server is trying to be your friend and is offering to do much work for you. You need to be friendly to it, too.
 
That's next.
 

When you read SQL books and tutorials, everything seems so simple and easy. Just send off ONE query, and presto, your business data is all laid out in front of you.
 
But we know it's not that way in the real world. Sometimes that one SQL statement can turn into a monster. That's assuming your goal can be achieved in one statement to start with.
 
Let's calculate some baseball batting averages. In the sample at http://www.icetips.com/downloadfile.php?FileID=59 , there is an AtBats table in the MS Access database.
 
In the AtBats table, there is a column named "Result". That column is to hold an integer which represents a players "at bat."
0 = Out
1 = Single
2 = Double
3 = Triple
4 = Home Run
For non-US people, I'll describe a very simplified version of what's involved in calculating a players batting average. In baseball, it's more complex, but for the purpose of illustration, I'll eliminate some things.
 
A player's batting average represents the percentage of times the player has made a "hit." In this case, a hit is a Single, Double, Triple, or Home Run. You take all his hits and divide by all the times he was "at bat", and you get his average, which ranges from 0.000 to 1.000.
 
The AtBats table has all the information we need to calculate any player's batting average.
 
Becaue of the multiple values that can represent a hit, the SQL can become quite complex in one statement, if it can be done. I suspect it can be done, but for this discussion, let's stipulate that it CANNOT be done. In other words, I am setting out the condition that to do a report or browse for all the players' batting averages, we MUST use more than one SELECT.
 
The concept here is exactly the same for a report or a browse. I'll illustrate the traditional way of doing it for a report.
 
Let's say you are using the ABC template chain. You use a report procedure to report on all the players' batting averages (or some defined subset of players or averages). You make the Players table your primary table for the report.
 
Then you go to the ThisReport.TakeRecord embed, and just before the PRINT(RPT:DETAIL) line, you embed something like this:
        do GetHitsFromServerForCurrentPlayerAndCalculate
That is, you call your routine that queries the AtBats table, getting the AtBats records for the player currently printing, do your math with the results, and then print the player's statistics.
 
The next time your code encounters the ThisReport.Takerecord embed, it does the same thing for the next player.
 
This is the way we always do things, right? Yes, this is the TRADITIONAL way. But we go one step further, and in that routine where we get the data, we actually use SUM in our SQL, and we think that is way cool!
 
But it's not. It's getting one french fry at a time!
 
If your report prints the batting averages for 1,000 players, how many queries do you have to hit the server with? Let's think about it. First, we know that Clarion's template code will get all the players with one query, because it's the primary table for the report procedure. So that's query number 1. Now we know that we are going to his the AtBats table with one query per player, so that's another 1,000 queries. To print this silly report, we will have to hit the server 1,001 times! It might take 15-20 minutes to generate your report (I'm not counting printing, just generating the pages).
 
This kind of thing works well with a desktop system like TPS files. But an SQL system is far from that! Keep in mind what the server has to do with every query you send it. Now add to that the fact that the server must service other users' requests, and you've got a big problem.
 
If you have any such reports, try an experiment. Ask your DBA to unlock the server room, and start whatever monitor he has that will show you server utilization. Then run your report and watch. You'll see the CPU-usage hover near 100%, although it seems nothing is happening on your workstation. That will go on for a long time. Keep in mind that the server still has to service other users while this nonsense is going on!
 
Remember how back at the beginning of this series I wrote this:
 
"You have to think first of entire result sets--that is, all customers matching a certain criteria, all accounts past a certain age, etc. You will end up processing each eventually on a row-by-row basis, but you first have to concentrate on getting the data back all at once and quicky. [sic]"
 
Sometimes a bare statement of theory fails to show what it really means. In this case, we need to get ALL the AtBat records matching ALL the players we have just queried, and do it with ONE sql statement.
 
So what will we accomplish by that? Won't we still get just as much data from the AtBats table? Yes. But the difference is we have done it in one round trip to the server. We will have reduced the number of queries from 1,001 to 2, and we will have gotten the whole bag of french fries on our first trip. That 15-20 minute report will now be generated in 15-20 SECONDS.
 
The code that illustrates the solution I'm talking about is in the ViewBattingAverages procedure in the demo app.



Printed November 23, 2024, 5:56 am
This article has been viewed/printed 35327 times.
Google search has resulted in 75 hits on this article since January 25, 2004.