Loading...
Advanced SAS
Advanced SAS Programming
Corporate
Improving Query Performance
Online
Training
VirtualNuggets
https://advancedsasonlinetraining.blogspot.com/2015/08/improving-query-performance.html
There are several ways to improve query
performance, including the following:
·
using
indexes and composite indexes
·
using
the keyword ALL in set operations when you know that there are no duplicate
rows, or when it does not matter if you
have duplicate rows in the result table
·
omitting
the ORDER BY clause when you create tables and views
·
using
in-line views instead of temporary tables (or vice versa)
·
using
joins instead of subqueries
·
using
WHERE expressions to limit the size of result tables that are created with
joins
·
using
either PROC SQL options, SAS system options, or both to replace a PUT function
in a query with a logically equivalent expression
·
replacing
references to the DATE, TIME, DATETIME, and TODAY functions in a query with
their equivalent constant values before the query executes
·
disabling
the remerging of data when summary functions are used in a query
Using
Indexes to Improve Performance
Indexes are created with the CREATE
INDEX statement in PROC SQL or with the MODIFY and INDEX CREATE statements in
the DATASETS procedure. Indexes are stored in specialized members of a SAS
library and have a SAS member type of INDEX. The values that are stored in an
index are automatically updated if you make a change to the underlying data.
Indexes can improve the performance of
certain classes of retrievals. For example, if an indexed column is compared to
a constant value in a WHERE expression, then the index will likely improve the
query's performance. Indexing the column that is specified in a correlated
reference to an outer table also improves a subquery's (and hence, query's)
performance. Composite indexes can improve the performance of queries that
compare the columns that are named in the composite index with constant values
that are linked using the AND operator. For example, if you have a compound
index in the columns CITY and STATE, and the WHERE expression is specified as
WHERE CITY='xxx' AND STATE='yy', then the index can be used to select that
subset of rows more efficiently. Indexes can also benefit queries that have a
WHERE clause in this form:
... where var1 in (select item1 from
table1) ...
The values of VAR1 from the outer query
are found in the inner query by using the index. An index can improve the
processing of a table join, if the columns that participate in the join are
indexed in one of the tables. This optimization can be done for equijoin
queries only--that is, when the WHERE expression specifies that
table1.X=table2.Y.
Using
the Keyword ALL in Set Operations
Set operators such as UNION, OUTER
UNION, EXCEPT, and INTERSECT can be used to combine queries. Specifying the
optional ALL keyword prevents the final process that eliminates duplicate rows
from the result table. You should use the ALL form when you know that there are
no duplicate rows or when it does not matter whether the duplicate rows remain
in the result table.
Omitting
the ORDER BY Clause When Creating Tables and Views
If you specify the ORDER BY clause when
a table or view is created, then the data is always displayed in that order
unless you specify another ORDER BY clause in a query that references that
table or view. As with any sorting procedure, using ORDER BY when retrieving
data has certain performance costs, especially on large tables. If the order of
your output is not important for your results, then your queries will typically
run faster without an ORDER BY clause.
Using
In-Line Views versus Temporary Tables
It is often helpful when you are
exploring a problem to break a query down into several steps and create
temporary tables to hold the intermediate results. After you have worked
through the problem, combining the queries into one query by using in-line
views can be more efficient. However, under certain circumstances it is more
efficient to use temporary tables. You should try both methods to determine
which is more efficient for your case.
Comparing
Subqueries with Joins
Many subqueries can also be expressed as
joins. Generally, a join is processed at least as efficiently as the subquery.
PROC SQL stores the result values for each unique set of correlation columns
temporarily, thereby eliminating the need to calculate the subquery more than
once.
Using
WHERE Expressions with Joins
When joining tables, you should specify
a WHERE expression. Joins without WHERE expressions are often time-consuming to
evaluate because of the multiplier effect of the Cartesian product. For
example, joining two tables of 1,000 rows each without specifying a WHERE
expression or an ON clause, produces a result table with one million rows.
PROC SQL executes and obtains the
correct results in unbalanced WHERE expressions (or ON join expressions) in an
equijoin, as shown here, but handles them inefficiently:
where table1.columnA-table2.columnB=0
It is more efficient to rewrite this
clause to balance the expression so that columns from each table are on
alternate sides of the equals condition:
where table1.columnA=table2.columnB
PROC SQL sequentially processes joins
that do not have an equijoin condition evaluating each row against the WHERE
expression: that is, joins without an equijoin condition are not evaluated
using sort-merge or index-lookup techniques. Evaluating left and right outer
joins is generally comparable to, or only slightly slower than, a standard inner
join. A full outer join usually requires two passes over both tables in the
join, although PROC SQL tries to store as much data as possible in buffers.
Thus for small tables, an outer join might be processed with only one physical
read of the data.
Advanced SAS,
Advanced SAS Programming,
Corporate,
Improving Query Performance,
Online,
Training,
VirtualNuggets
VirtualNuggets
4973440053164035657
Post a Comment
Home
item
Blog Archive
Popular Posts
-
String functions can be very useful for the processing of complex data sets and for sub setting data sets according to values contained ...
-
SAS dictionary tables contain the data at the back the scenes in SAS a.k.a. the metadata. The dictionary tables are only directly availa...
-
Creating dynamic, data driven programs is a very powerful tool. Often we can use the metadata or the project data itself to help write o...
-
1. Which of these is false? Ranges in the VALUE statement can specify… 1. a single value, such as 24 or ‘S’ 2. a range ...
-
There are several ways to improve query performance, including the following: · using indexes and composite indexes · ...
-
· Demonstrate advanced data set processing techniques such as updating master data sets, transposing data, combining/merging data...
-
SAS macro variables are part of the macro facility, which is a tool for extending and customizing SAS software. These Macro variables fa...