Showing posts with label sorting. Show all posts

Thursday, 16 June 2016

5 Tips For Better DataStage Design #14



1. Whether to use the Lookup stage depends on the volume of data. Use the Sparse lookup type when the primary input data volume is small. If the reference data volume is large, avoid the Lookup stage.

2. Sorting in the database with an ORDER BY clause is generally preferable to using the Sort stage.



3. In DataStage Administrator, tune the 'Project Tunables' for better performance.

4. The Funnel stage reduces the performance of a job. When it is needed, run it in continuous mode.

5. If a hash file is used only for lookups, enable "Preload to memory". This will improve performance.






Like the pages below to get updates
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Sunday, 24 January 2016

ps command #4 - Sorting with sort command



We can also sort the ps command output with the Unix sort command, which is easy to use. Just pipe the ps output to sort with the proper arguments and voila! You get the output the way you want it.

Let's see how this works. [ sort command arguments can differ with your Linux flavour and version. ]
 I am using CentOS 6.3.


1. Display the top CPU-consuming processes (Column 3 - %CPU)
$ ps aux | head -1; ps aux | sort -k3 -nr |grep -v 'USER'| head
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND 
atul     21210  2.0  0.1 110232  1140 pts/3    R+   00:13   0:00 ps aux
hduser    2671  0.8  4.1 960428 42436 pts/1    Sl+  Aug22   5:29 mongod
root      1447  0.2  0.3 185112  3384 ?        Sl   Aug22   1:36 /usr/sbin/vmtoolsd
atul      2478  0.2  2.1 448120 21876 ?        Sl   Aug22   1:51 /usr/lib/vmware-tools/sbin64/vmtoolsd -n vmusr --blockFd 3
rtkit     2359  0.1  0.1 168448  1204 ?        SNl  Aug22   0:44 /usr/libexec/rtkit-daemon
root         7  0.1  0.0      0     0 ?        S    Aug22   0:53 [events/0]
root      2204  0.1  4.3 147500 43872 tty1     Ss+  Aug22   0:45 /usr/bin/Xorg :0 -nr -verbose -audit 4 -auth /var/run/gdm/auth-for-gdm-wEmBs1/database -nolisten tcp vt1
root       920  0.0  0.0      0     0 ?        S    Aug22   0:00 [bluetooth]
root         9  0.0  0.0      0     0 ?        S    Aug22   0:00 [khelper]
root         8  0.0  0.0      0     0 ?        S    Aug22   0:00 [cgroup] 

For my Linux, the sort command arguments are:
-kn  ==> selects column n as the sort key, e.g. -k4 for column 4
-n   ==> compares the column as numbers
-r   ==> reverse order

sort -k3 -nr ==> sort on the third column of the output in reverse numeric order (largest to smallest)
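
To try these options without depending on live ps output, here is a minimal sketch on made-up sample data (the users, PIDs and numbers are hypothetical, as are the function names sample_ps and top_cpu). It writes the key as -k3,3 so that the key ends at column 3 instead of running to the end of the line, and it fixes the locale so the decimal point is read the same way everywhere.

```shell
# Hypothetical sample data: user, pid, %cpu, command
sample_ps() {
cat <<'EOF'
alice 101 0.5 bash
bob 102 12.0 java
carol 103 3.2 python
dave 104 0.0 sleep
EOF
}

# Sort on column 3 only (-k3,3), numerically (n), largest first (r)
top_cpu() {
    LC_ALL=C sort -k3,3nr
}

sample_ps | top_cpu
```

Without -n the column would be compared as text, so 12.0 and 3.2 would not be ordered by their numeric value.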

2. Display the top 10 memory-consuming processes (Column 4 - %MEM)
$ ps aux | head -1; ps aux | sort -k4 -nr |grep -v 'USER'| head
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND 
root      2204  0.1  4.3 147500 43872 tty1     Ss+  Aug22   0:46 /usr/bin/Xorg :0 -nr -verbose -audit 4 -auth /var/run/gdm/auth-for-gdm-wEmBs1/database -nolisten tcp vt1
hduser    2671  0.8  4.1 960428 42436 pts/1    Sl+  Aug22   5:32 mongod
atul      2458  0.0  2.3 943204 23624 ?        S    Aug22   0:16 nautilus
atul      2516  0.0  2.2 275280 22316 ?        Ss   Aug22   0:06 gnome-screensaver
atul      2478  0.2  2.1 448120 21876 ?        Sl   Aug22   1:52 /usr/lib/vmware-tools/sbin64/vmtoolsd -n vmusr --blockFd 3
atul      2507  0.0  1.6 321388 16680 ?        S    Aug22   0:01 python /usr/share/system-config-printer/applet.py
atul      2589  0.0  1.4 292556 14600 ?        Sl   Aug22   0:14 gnome-terminal
atul      2536  0.0  1.3 395832 13372 ?        S    Aug22   0:00 /usr/bin/gnote --panel-applet --oaf-activate-iid=OAFIID:GnoteApplet_Factory --oaf-ior-fd=22
atul      2502  0.0  1.3 474620 13952 ?        Sl   Aug22   0:01 gpk-update-icon
atul      2537  0.0  1.2 459964 12736 ?        S    Aug22   0:10 /usr/libexec/clock-applet --oaf-activate-iid=OAFIID:GNOME_ClockApplet_Factory --oaf-ior-fd=34
 




3. Display the processes by time (Column 4 - TIME)
$ ps vx | head -1; ps vx | sort -k4 -r| grep -v 'PID' | head
PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
 2478 ?        Sl     1:52    351   593 447526 21876  2.1 /usr/lib/vmware-tools/sbin64/vmtoolsd -n vmusr --blockFd 3
 2458 ?        S      0:16    228  1763 941440 23624  2.3 nautilus
 2589 ?        Sl     0:15     28   296 292259 14600  1.4 gnome-terminal
 2421 ?        Ssl    0:14     22    34 500541 9676  0.9 /usr/libexec/gnome-settings-daemon
 2479 ?        S      0:13     23   403 310472 11996  1.1 nm-applet --sm-disable
 2537 ?        S      0:10     37   168 459795 12736  1.2 /usr/libexec/clock-applet --oaf-activate-iid=OAFIID:GNOME_ClockApplet_Factory --oaf-ior-fd=34
 2444 ?        Ssl    0:10     25    64 445791 4872  0.4 /usr/bin/pulseaudio --start --log-target=syslog
 2445 ?        S      0:07     51   593 322206 12684  1.2 gnome-panel
 2516 ?        Ss     0:06      4   151 275128 22316  2.2 gnome-screensaver
 2522 ?        Sl     0:05      5    41 231870 1960  0.1 /usr/libexec/gvfs-afc-volume-monitor
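
Note that sort -k4 -r compares the TIME column as text. That is fine for the listing above, but it may misorder values once they differ in width (for example 10:05 against 9:59), and the result also depends on the locale. A minimal sketch of one way around this, assuming TIME is in minutes:seconds form as in the ps vx output above: convert it to seconds with awk, sort numerically on that, then drop the helper column. The function names sample_vx and by_time and the sample rows are hypothetical.

```shell
# Hypothetical sample rows in "ps vx" layout: PID TTY STAT TIME MAJFL COMMAND
sample_vx() {
cat <<'EOF'
 2478 ?        Sl     9:59    351 vmtoolsd
 2458 ?        S     10:05    228 nautilus
 2589 ?        Sl     0:15     28 gnome-terminal
EOF
}

# Prefix each row with TIME (column 4, assumed MM:SS) converted to seconds,
# sort numerically on that prefix (largest first), then strip the prefix.
by_time() {
    awk '{ split($4, t, ":"); printf "%d\t%s\n", t[1] * 60 + t[2], $0 }' |
        LC_ALL=C sort -k1,1nr |
        cut -f2-
}

sample_vx | by_time
```

On a live system this would be used as ps vx | grep -v 'PID' | by_time | head, in the same style as the one-liner above.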

 


4. Display the top 10 processes by real memory usage (Column 8 - RSS)
$ ps vx | head -1; ps vx | sort -k8 -nr| grep -v 'PID' | head 
PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
 2458 ?        S      0:16    228  1763 941440 23624  2.3 nautilus
 2516 ?        Ss     0:06      4   151 275128 22316  2.2 gnome-screensaver
 2478 ?        Sl     1:52    351   593 447526 21876  2.1 /usr/lib/vmware-tools/sbin64/vmtoolsd -n vmusr --blockFd 3
 2507 ?        S      0:01     73     2 321385 16680  1.6 python /usr/share/system-config-printer/applet.py
 2589 ?        Sl     0:15     28   296 292259 14600  1.4 gnome-terminal
 2502 ?        Sl     0:01     29   257 474362 13952  1.3 gpk-update-icon
 2536 ?        S      0:00     92  1607 394224 13372  1.3 /usr/bin/gnote --panel-applet --oaf-activate-iid=OAFIID:GnoteApplet_Factory --oaf-ior-fd=22
 2537 ?        S      0:10     37   168 459795 12744  1.2 /usr/libexec/clock-applet --oaf-activate-iid=OAFIID:GNOME_ClockApplet_Factory --oaf-ior-fd=34
 2445 ?        S      0:07     51   593 322206 12684  1.2 gnome-panel
 2438 ?        Sl     0:03     30   542 433105 12512  1.2 metacity


Like the examples above, you can create many one-liners of your own. But before using any of the commands above, check how ps and sort behave on your system.
Most Unix/Linux flavours have their own arguments for ps and sort, but the basics are the same. To sort any command output by a particular column, first understand that output and its columns, and then use the sort command.
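
The one-liners above run ps twice to get the header, and grep -v 'USER' would also drop any process line that happens to contain USER. As a sketch of one alternative (sort_by_col is a hypothetical helper name, not a standard command), the header can be read once from the same stream and only the remaining lines sorted. It assumes the first line of the input is the header and that the chosen column is numeric.

```shell
# sort_by_col N [COUNT]
# Reads command output on stdin, prints the header line unchanged,
# then the COUNT (default 10) rows with the largest value in column N.
sort_by_col() {
    IFS= read -r header          # first line is the column header
    printf '%s\n' "$header"
    LC_ALL=C sort -k"$1,$1nr" | head -n "${2:-10}"
}

# Hypothetical sample input standing in for "ps aux" output
printf '%s\n' \
    'USER PID %CPU COMMAND' \
    'alice 101 0.5 bash' \
    'bob 102 12.0 java' \
    'carol 103 3.2 python' | sort_by_col 3 2
```

With real output this would be called as ps aux | sort_by_col 3 for CPU or ps vx | sort_by_col 8 for RSS. Check the column numbers against your own ps output first.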

I hope you find this helpful. Keep learning!






Friday, 8 January 2016

5 Tips For Better DataStage Design #7



#1. If the partition type is to be changed in the next immediate stage, set ‘Propagate partition’ to ‘Clear’ in the current stage.



#2. Make sure that appropriate partitioning and sorting are used in the stages wherever possible. This improves performance. Make sure that you understand the partitioning being used; otherwise, leave it as Auto.

#3. For fixed-width files, the final delimiter should be set to 'none' in the file format property.

#4. If a processing stage requires a key (such as Remove Duplicates, Merge, Join, etc.), the stage keys, sorting keys and partitioning keys should be the same and in the same order.

#5. To improve Funnel performance, all the input links must be hash partitioned on the sort keys.




