Showing posts with label SQL. Show all posts
Showing posts with label SQL. Show all posts

Friday, 12 October 2018

Purging SQL Table without impacting the users


This happens very rarely but sometime I have been asked to conditional purge the SQL table without impacting the Users (So we can not use the truncate which is quick and harsh  :) ), These kind of request is pain when client says this table is business critical and used by different application 24/7 so can not take offline and delete the data. So I came up with SQL script or Stored Procedure which will purge the data without impacting the users, though this Proc can run slow or fast based on the Batch Size you have selected, Smaller (not Smallest) the batch size lower the deadlock risk. You can select the Batch and Total deleted row count based on your need. 


Hope, This proc will be useful for you and let me know if required any improvement. I have written this for SQL Server, you can rewrite for any other DB as well after small tweaking

-



Like the below page to get update  
Facebook Page      Facebook Group      Twitter Feed      Google+ Feed      Telegram Group     


Sunday, 14 August 2016

NULL behaviour in Oracle


NULL, as known in SQL, a special marker and keyword indicating that something has no value. A value of NULL is different from an empty or zero value. No two null values are equal. Comparisons between two null values, or between a NULL and any other value, return unknown because the value of each NULL is unknown. (ref - https://technet.microsoft.com/en-us/library/ms191504(v=sql.105).aspx)

For understanding it, I am using a practical approach where we play around NULL value, sorry No Value, with Oracle database.



** NULL value doesn't count in aggregate functions.
NULL value doesn't count by aggregate functions such as MAX, MIN, SUM, COUNT in SQL.

Example:


** Inserted null string converted to Null.

Example:


** Where can Null be compared?
Example:


Oracle SQL Reference says "In a DECODE function, Oracle considers two nulls to be equivalent. If expr is null, then Oracle returns the result of the first search that is also null."
But this therory varies with DB to DB internal architecture and the way they handled the NULL.Another place where Null can be compared is in range partition definition, where MAXVALUE is greater than Null

**  Unique constraints
Example:


You are able to insert another Null without getting ORA-1 (unique constraint violated).
--
Example:


So if all columns are null, the unique constraint will not be violated. If one or more columns have non-null values, the constraint takes effect.


** Unknown OR True returns True, Unknown AND False returns False
Example:



 
Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Tuesday, 7 June 2016

Tuning SQL Statement #4



Using Functions in QueriesWhen using functions in SQL statements it is best to keep the functions away from any columns involving index matching unless function-based indexes are available and can be created.

Do’s   
SELECT account_name, trans_date, amount
FROM transaction
WHERE account_name LIKE 'CAPITAL%';    


Don’ts
SELECT account_name, trans_date, amount
FROM transaction
WHERE SUBSTR(account_name,1,7) = 'CAPITAL';




Sequences
A sequence is often used to create unique integer identifiers as primary keys for tables. A sequence is a distinct database object and is accessed as sequence.NEXTVAL and sequence.CURRVAL.
Use sequences for unique integer identifiers; they are centralized, more controllable, more easily maintained, and perform better than other methods of counting.

Do’s  
INSERT INTO supplier (supplier_id, name, ticker)
      VALUES (supplier_seq.NEXTVAL,'A new supplier', 'TICK');    


 Don’ts
INSERT INTO supplier (supplier_id, name, ticker)
      VALUES ((SELECT MAX(supplier_id)+1
      FROM supplier), 'A new supplier', 'TICK');


    

Equi, Anti, and Range
Using an equals sign (equi) is the fastest comparison condition if a unique index exists. Any type of anti comparison such as != or NOT is looking for what is not in a table and thus must read the entire table; sometimes full index scans can be used.

Do’s   
SELECT account_name, trans_date, amount
FROM transaction
WHERE amount > 0;    SELECT account_name, trans_date, amount
FROM transaction
WHERE amount != 0;
 

Don’ts
SELECT account_name, trans_date, amount
FROM transaction
WHERE amount > 0;    SELECT account_name, trans_date, amount
FROM transaction
WHERE amount  NOT = 0;



LIKE Pattern MatchingIn general, since LIKE will match patterns which are in no way related to indexes, LIKE will usually read an entire table.






Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Monday, 23 May 2016

Tuning SQL Statement #3


Tuning SQL Statement #1.......
Tuning SQL Statement #2.......

Consider whether a UNION ALL will suffice in place of a UNION.
The UNION statement performs the equivalent of a SELECT DISTINCT on the final result set. In other words, the UNION clause forces all rows returned by each portion of the UNION to be sorted and merged and duplicate to be filtered before the first row is returned.  A UNION ALL simply returns all rows including duplicates and does not have to perform any sort, merge or filter.  If your tables are mutually exclusive (include no duplicate records), or you don't care if duplicates are returned, the UNION ALL is much more efficient as it does not perform the SELECT DISTINCT function, which saves a lot of unnecessary resources from being used.

Do’s    
SELECT acct_num, balance_amt
FROM debit_transactions
WHERE tran_date = '1-JAN-06' 
UNION ALL
SELECT acct_num, balance_amt
FROM credit_transactions
WHERE tran_date = '1-JAN-06' ;  

Don’ts
SELECT acct_num, balance_amt
FROM debit_transactions
WHERE tran_date = '1-JAN-06'
UNION
SELECT acct_num, balance_amt
FROM credit_transactions
WHERE tran_date = '1-JAN-06' ;



http://www.datagenx.net/2016/04/tuning-select-statement-2.html
The DECODE Function
Using the DECODE function as a replacement for multiple query set operators is good for performance but should only be used in extreme cases such as the UNION clause joined SQL statements as shown.

Do’s  
SELECT stock_id||' '||
      DECODE(SIGN(qtyonhand)
            ,-1,'Out of Stock',0,'Out of Stock'
            ,1,DECODE(SIGN(qtyonhand-min)
                   ,-1,'Under Stocked',0,'Stocked'
                   ,1,DECODE(sign(qtyonhand-max)
                        ,-1,'Stocked',0,'Stocked'
                          ,1,'Over Stocked'
                   )
            )
      ) FROM stock;  
    

Don’ts
SELECT stock_id||' Out of Stock' FROM stock WHERE
   qtyonhand <=0
UNION
SELECT stock_id||' Under Stocked' FROM stock
      WHERE qtyonhand BETWEEN 1 AND min-1
UNION
SELECT stock_id||' Stocked' FROM stock
WHERE qtyonhand BETWEEN min AND max
UNION
SELECT stock_id||' Over Stocked' FROM stock
      WHERE qtyonhand > max;


    

Datatype Conversions
Try to avoid using any type of data conversion function in any part of an SQL statement which could potentially match an index, especially if you are trying to assist performance by matching appropriate indexes.

Do’s  
SELECT ...
FROM customer
WHERE cust_id = TO_NUMBER('11');    SELECT ...
FROM customer
WHERE cust_id = '11';

Don’ts
SELECT ...
FROM customer
WHERE cust_type = 1;    SELECT ...
FROM emp
WHERE TO_NUMBER (cust_type) = 1;






Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Wednesday, 27 April 2016

Tuning SQL Statement #2


Tuning SQL Statement #1.......
Tuning SQL Statement #3.......


•    The UNION statement performs the equivalent of a SELECT DISTINCT on the final result set. In other words, the UNION clause forces all rows returned by each portion of the UNION to be sorted and merged and duplicate to be filtered before the first row is returned.  A UNION ALL simply returns all rows including duplicates and does not have to perform any sort, merge or filter.  If your tables are mutually exclusive (include no duplicate records), or you don't care if duplicates are returned, the UNION ALL is much more efficient as it does not perform the SELECT DISTINCT function, which saves a lot of unnecessary resources from being used.

Do's-
SELECT acct_num, balance_amt FROM debit_transactions  WHERE tran_date = '1-JAN-06' 
UNION ALL
SELECT acct_num, balance_amt  FROM credit_transactions  WHERE tran_date = '1-JAN-06' ;  

Don'ts-
SELECT acct_num, balance_amt  FROM debit_transactions  WHERE tran_date = '1-JAN-06'
UNION
SELECT acct_num, balance_amt  FROM credit_transactions  WHERE tran_date = '1-JAN-06' ;



•    Try to avoid using any type of data conversion function in any part of an SQL statement which could potentially match an index, especially if you are trying to assist performance by matching appropriate indexes

Do's-
SELECT ...  FROM customer  

WHERE cust_id = TO_NUMBER('11');  

SELECT ...  FROM customer
WHERE cust_type = 1;    


Don'ts-
SELECT ...  FROM customer
WHERE cust_id = '11';

SELECT ...  FROM emp
WHERE TO_NUMBER (cust_type) = 1;



•    When using functions in SQL statements it is best to keep the functions away from any columns involving index matching unless function-based indexes are available and can be created.

Do's-
SELECT account_name, trans_date, amount  FROM transaction
WHERE account_name LIKE 'CAPITAL%';  

Don'ts-
SELECT account_name, trans_date, amount  FROM transaction
WHERE SUBSTR(account_name,1,7) = 'CAPITAL';



•    If your SELECT statement contains a HAVING clause, write your query so that the WHERE clause does most of the work (removing undesired rows) instead of the HAVING clause do the work of removing undesired rows. Using the WHERE clause appropriately can eliminate unnecessary rows before they get to the GROUP BY and HAVING clause, saving some unnecessary work, and boosting performance

Do's-
SELECT country, AVG (population)  FROM survey
WHERE country != 'INDIA'
AND country != 'CHINA';
GROUP BY country;  

Don'ts-
SELECT country, AVG (population)  FROM survey
GROUP BY country
HAVING country != ‘INDIA'
AND country != 'CHINA’;



•    The ORDER BY clause is always executed after the WHERE clause. Try not to override the WHERE clause with the ORDER BY clause because the Optimizer may choose a less efficient method of execution based on the ORDER BY clause. Try to put your where columns in order by clause if possible.

Do's-
SELECT * FROM ordersline
WHERE order_id < 10
ORDER BY order_id, seq#;  

Don'ts-
SELECT * FROM ordersline
WHERE order_id < 10
ORDER BY seq#;







Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Tuesday, 26 April 2016

Tuning SQL Statement #1



•    Try to use explicit columns and try to read columns in index orders if possible, even to the point of reading indexes and not tables.
•    Use a standard approach to table aliases.  If two identical SQL statements vary because an identical table has two different aliases, then the SQL is different and will not be shared.
•    Use table aliases and prefix all column names by their aliases when more than one table is involved in a query.  This reduces parse time AND prevents future syntax errors if someone adds a column to one of the tables with the same name as a column in another table.
•    Use bind variables.  The values of bind variables do not need to be the same for two statements to be considered identical.  Bind variables are not substituted until a statement has been successfully parsed.

i.e -
Do's     - Select FirstNme, MidName, LastName from atul.employee;
Don'ts  - Select * from atul.employee;



•    Try to match comparison condition column sequences with existing index column sequences, although it is not strictly necessary.
•    Always try to use unique, single-column indexes wherever possible. A single-column unique index is much more likely to produce exact hits. An exact hit is the fastest access method.
•    Try to avoid WHERE clauses that are NEGATIVE. Negative key words are in the WHERE clause, such as "IS NULL", "<>", "!=", "!>", "!<", "NOT", "NOT EXISTS", "NOT IN", "NOT LIKE"

i.e -
Do's     - Select FirstNme from atul.employee where DeptCd Like 'IN%';
Don'ts  - Select FirstNme from atul.employee where SubSTR(DeptCd, 1,2) = 'IN';

Do's     - Select Material from atul.store where quantity > 0;
Don'ts  - Select Material from atul.store where quantity <> 0;

Do's     - Select Material from atul.store where quantity > 3;
Don'ts  - Select Material from atul.store where quantity + 3 > 6;



Tuming SELECT Statement Continues...............




Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Friday, 20 November 2015

5 Tips For Better DataStage Design #5



#1. Use the Data Set Management utility, which is available in the Tools menu of the DataStage Designer or the DataStage Manager, to examine the schema, look at row counts, and delete a Parallel Data Set. You can also view the data itself.

#2. Use Sort stages instead of Remove duplicate stages. Sort stage has got more grouping options and sort indicator options.

#3. for quick checking if DS job is running on Server or not, from UNIX
ps -ef | grep 'DSD.RUN'



#4. Make use of Order By clause when a DB stage is being used in join. The intention is to make use of Database power for sorting instead of Data Stage resources. Keep the join partitioning as Auto. Indicate don’t sort option between DB stage and join stage using sort stage when using order by clause.

#5. There are two types of variables - string and encrypted. If you create an encrypted environment variable it will appears as the string "*******" in the Administrator tool.





Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://groups.google.com/forum/#!forum/datagenx

Wednesday, 9 September 2015

DataStage Terminology


 DataStage Term                     Description
DataStage Administrator Client program that is used to manage the project of DataStage.
DataStage server DataStage server engine component that links to DataStage client programs.
DataStage Director Execution and monitoring of the program that you want to use DataStage job.
DataStage Designer Graphical design tool that developers use to design and development of the DataStage job.
DataStage Manager Program used to manage the contents of the repository and see DataStage. Please refer to the "DataStage Manager Guide."
Stage Component that represents the data source for DataStage job, and process steps.
Source I pointed to the source of data extraction. STEPS, such as, for example.
Target Refers to the destination of the data. For example, point to the file to be loaded to AML (output by DataStage).
Category Category names used to categorize the jobs in DataStage.
Container Process of calling a common process.
Job Program that defines how to do data extraction, transformation, integration, and loading data into the target database.
Job templates And job processing is performed similar role model.
Job parameters Variables that are included in the job design. (For example,. Is the file name and password, for example)
Job sequence Control is a job for a start and run other jobs.
Scratch disk Disk space to store the data set, such as virtual record.
Table definition Definition that describes the required data includes information about the associated data tables and columns. Also referred to as metadata.
Partitioning For high-speed processing of large amounts of data, the mechanism of DataStage to perform split of the data.
Parallel engine Engine running on multiple nodes to control jobs DataStage.
Parallel jobs Available to parallel processing, the DataStage job type.
Project Jobs and a collection of components required to develop and run. The entry level to DataStage from the client. DataStage project must be licensed.
Metadata Data about the data. For example, a table definition that describes the columns that are building data.
Link Each stage of the job combined to form a data flow or reference lookup.
Routine Functions that are used in common.
Column definition Define the columns to be included in the data table. Contains the names and data types of the columns that make up the column.
Environmental parameters Variables that are included in the job design. For example, the file name and password.
DB2 stage Stage to be able to read and write to DB2 database.
FileSet stage Collection of files used to store the data.
Lookup Stage Performing processing such as a file reference to be used in DataStage or text file or table.
LookupFileSet Stage The process of storing a lookup table.
Sequential Stage Want to manipulate text files.
Custom Stage The process can not be implemented in stages, which is provided as standard DataStage, the process of implementing the language C.
Copy Stage The process of copying the data set.
Stage Generator The process of generating a dummy data set.
DataSet stage Data file to be used in the parallel engine.
Funnel Stage The process of copying the data from one set of multiple data sets.
Filter Stage Processing to extract records from the input data set.
Merge Stage Processing the join more than one input record.
LinkCorrector Stage Process of collecting the data that was previously divided.
RemoveDuplicate Stage The process of removing duplicate entries from the data set.
ChangeCapture Stage The process of comparing the two data sets, the difference between the two records.
RowImport Stage Process of importing a column from a string or binary column.
RowExport Stage Processing to export a column of another type string or binary column.
Transformer Stage Processing to edit item, or type conversion.
Modify Stage Process of conversion to the specified data type, the conversion of the value to the specified value NULL.
XML input stage Reading and processing of XML file, the extraction of the required element from the XML data.
Sort Stage Process to sort data in ascending or descending order.
Join Stage Processing the join more than one input record.
RowGenerator Stage Process of adding a line to an existing dataset
ColumnGenerator Stage Process of adding a column to an existing dataset
Aggregator stage The process of aggregating the data.
Pivot Stage The process of converting multiple columns into multiple rows of records.
Peek Stage Treatment to destroy the records.
Stream link Link that represents the flow of data.
Reference Links Input link that represents the reference data.
Reject link Link to output the data that do not meet the criteria you specify.
Integer type Data type that represents the integer value.
Decimal type Data type represents a number that contains the value of the decimal point.
NULL value Specific value indicating the unknown value. 0, not the same as a blank or empty sequence.
DataSet Collection of data.
SQLStatement Statement to manipulate the data in the table.
TWS Stands for Tivoli Workload Scheduler. The name of the product you want to create a job net.
Hash One way of dividing the data specified in the partitioning function of DataStage. Partitioning is performed by using a hash value.
Modulus One way of dividing the data specified in the partitioning function of DataStage. I do partitioning by dividing the number of partitions in the specified value.
Same One way of dividing the data specified in the partitioning function of DataStage. Processing as input partition without subdivision has been output by the previous stage.
Job Properties Property sheet to make settings for the job.





Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://groups.google.com/forum/#!forum/datagenx

 

Thursday, 27 August 2015

DB2 Column Compare Tool





            Whenever, we are working in Datawarehouse / Database (Oracle / MSSQL / DB2) projects, we often need to compare to tables column wise. When such requirement comes, we might end up in confusion putting the columns list in the excel sheet and comparing each column by column.

              Here’s procedure you can create in your local DB and run it feeding it with Tables names as input, and in one run you get three outputs: Matched columns, Table1 unmatched columns, table2 unmatched columns. You might need to give schema name of Table if there are two tables in SYSCOLUMNS with the same name.




You can modify or re-write the same procedure for your respective DBs.