Showing posts with label storage. Show all posts

Monday, 18 February 2019

MongoDB Index in Python - Simple Index


Like RDBMS systems, MongoDB also provides indexes so that it can process queries and return the result set more quickly. Mongo supports different types of indexes, such as Single Key, Compound, Multikey, Partial and Text indexes. We will look into these one by one.

We start with the Simple Index, or One Key Index, which uses only one key from the collection/document [equivalent to table/row in RDBMS systems]. Let's see how -


Mongo Shell Command:  db.<collectionName>.createIndex({<field>:<direction>})
pyMongo Command:      db.<collectionName>.create_index([(<field>, <direction>)])

Let's analyze the impact of index creation on query performance, first via the mongo shell, then in Python -

In the mongo shell:

Our example uses the collection 'people', which has the field 'last_name'

db.people.find({last_name:'Tucker'}).explain('executionStats')

The above command generates the execution stats for a query where last_name == 'Tucker'.


As the execution plan shows, MongoDB scanned the whole collection (50747 documents in total, to fetch 65 records), which is costly when your collection is big.
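To read those numbers in Python rather than by eye, here is a minimal sketch of a helper that pulls the headline fields out of an explain document. The helper name summarize_explain and the hand-written sample are assumptions for illustration; the field names (queryPlanner, winningPlan, executionStats, totalKeysExamined, totalDocsExamined, nReturned) are the ones explain reports, though the exact layout can differ between server versions.

```python
def summarize_explain(explain_doc):
    """Pull the headline numbers out of an explain() result document."""
    stats = explain_doc.get("executionStats", {})
    plan = explain_doc.get("queryPlanner", {}).get("winningPlan", {})

    # Walk down the winning plan and collect the stage names,
    # e.g. ["COLLSCAN"] or ["FETCH", "IXSCAN"].
    stages = []
    while plan:
        stages.append(plan.get("stage"))
        plan = plan.get("inputStage")

    return {
        "stages": stages,
        "nReturned": stats.get("nReturned"),
        "totalKeysExamined": stats.get("totalKeysExamined"),
        "totalDocsExamined": stats.get("totalDocsExamined"),
    }


# Hand-written stand-in for the explain output of the unindexed query;
# only the fields used here are included, the real document has many more.
sample = {
    "queryPlanner": {"winningPlan": {"stage": "COLLSCAN"}},
    "executionStats": {
        "nReturned": 65,
        "totalKeysExamined": 0,
        "totalDocsExamined": 50747,
    },
}
print(summarize_explain(sample))
```

With PyMongo, a document of this kind should come back from db.people.find({"last_name": "Tucker"}).explain(), which can be passed straight to the helper.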

Now, let's create a Simple Index or Single Key Index

db.people.createIndex({"last_name":1}) 
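The same index can be created from Python. This is only a sketch: the function name ensure_last_name_index is made up for this post, and it assumes you already hold a PyMongo collection object for 'people'.

```python
def ensure_last_name_index(collection):
    """Create an ascending single-key index on last_name.

    `collection` is expected to be a PyMongo Collection object, for example
    MongoClient()["test"]["people"] (the database name "test" is an assumption).
    """
    # 1 means ascending and -1 descending; PyMongo also exposes these
    # as pymongo.ASCENDING and pymongo.DESCENDING.
    return collection.create_index([("last_name", 1)])
```

create_index returns the name of the index, which MongoDB builds from the field and direction when no name is supplied (last_name_1 here).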



Now, running the same query again -

db.people.find({last_name:'Tucker'}).explain('executionStats')


This time MongoDB finds that there is an index available on the last_name field and uses it to fetch the result. It scanned only 65 index keys to fetch 65 records.
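To see why the index cuts the work down, here is a toy model in plain Python. It is not how MongoDB stores its indexes (those are B-trees on disk); a sorted list with binary search is used only as a stand-in, and the sample documents are made up.

```python
from bisect import bisect_left, bisect_right


def collection_scan(docs, last_name):
    """Check every document; return (matches, documents examined)."""
    matches = [d for d in docs if d["last_name"] == last_name]
    return matches, len(docs)


def build_index(docs):
    """Sorted keys plus the position of each document, standing in for the index."""
    pairs = sorted((d["last_name"], i) for i, d in enumerate(docs))
    keys = [k for k, _ in pairs]
    positions = [p for _, p in pairs]
    return keys, positions


def index_scan(docs, index, last_name):
    """Binary-search the sorted keys; return (matches, index keys examined)."""
    keys, positions = index
    lo = bisect_left(keys, last_name)
    hi = bisect_right(keys, last_name)
    matches = [docs[p] for p in positions[lo:hi]]
    return matches, hi - lo


names = ["Smith", "Tucker", "Jones", "Tucker", "Brown", "Tucker"]
docs = [{"last_name": name, "n": i} for i, name in enumerate(names)]

found, examined = collection_scan(docs, "Tucker")
print(len(found), examined)   # 3 6

found, examined = index_scan(docs, build_index(docs), "Tucker")
print(len(found), examined)   # 3 3
```

The scan touches every document no matter how many match, while the index lookup touches only the matching keys, which is the same pattern as 50747 documents versus 65 keys in the explain output.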

A Single Key Index can be used in the scenarios below -
   - Querying on a range of indexed key values
   - Querying on selected values of the indexed key
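As a small sketch of what these two kinds of queries look like from Python, the helpers below build the filter documents. The helper names are made up for illustration; $gte, $lt and $in are standard MongoDB query operators.

```python
def range_filter(low, high):
    """Match last_name values in the half-open range [low, high)."""
    return {"last_name": {"$gte": low, "$lt": high}}


def selected_values_filter(names):
    """Match last_name values equal to any of the given names."""
    return {"last_name": {"$in": list(names)}}


print(range_filter("T", "U"))
print(selected_values_filter(["Tucker", "Smith"]))
```

Either filter can be passed to db.people.find(...) in PyMongo, and both can use the index on last_name.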

Advantages:
  - A sort on the index key can be served straight from the index, so no separate sort operation is needed for it
  - The index key can be used in either sort order - ascending or descending

Considerations while designing a Single Key Index:
  - Do not create a Single Key Index on every field in a collection. Each extra index has to be updated on every write and takes up memory, which can slow down both read and write queries.







Monday, 14 November 2016

DataStage Partitioning #3



Best allocation of Partitions in DataStage for storage area

| Sr. No. | Scenario | Volume of Data | Best way of Partition | Allocation of Configuration File (Node) |
|---|---|---|---|---|
| 1 | DB2 EEE extraction in serial | Low | - | 1 |
| 2 | DB2 EEE extraction in parallel | High | Node number = current node (key) | 64 (depends on how many nodes are allocated) |
| 3 | Partition or repartition in the stages of DataStage | Any | Modulus (a single key, and it must be an integer); Hash (any number of keys, of different data types) | 8 (depends on how many nodes are allocated for the job) |
| 4 | Writing into DB2 | Any | DB2 | - |
| 5 | Writing into Dataset | Any | Same | 1, 2, 4, 8, 16, 32, 64 etc… (it writes based on the incoming records) |
| 6 | Writing into Sequential File | Low | - | 1 |
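To make the difference between Modulus and Hash partitioning concrete, here is a rough sketch in Python. It only illustrates the idea; the function names are made up, and CRC32 is an assumption chosen because it is deterministic, not the hash function DataStage uses internally.

```python
import zlib


def modulus_partition(key, num_partitions):
    """Modulus partitioning: a single integer key, partition = key mod N."""
    if not isinstance(key, int):
        raise TypeError("modulus partitioning needs a single integer key")
    return key % num_partitions


def hash_partition(keys, num_partitions):
    """Hash partitioning: any number of keys of any type, hashed together."""
    # CRC32 is used only because it gives the same value on every run;
    # it is an assumption for this sketch, not DataStage's own hash.
    raw = "|".join(repr(k) for k in keys).encode("utf-8")
    return zlib.crc32(raw) % num_partitions


# Integer keys are spread over 4 partitions by their remainder.
print([modulus_partition(k, 4) for k in (10, 11, 12, 13)])  # [2, 3, 0, 1]

# A composite key of mixed types always lands in the same partition.
print(hash_partition(("Tucker", 42), 8) == hash_partition(("Tucker", 42), 8))  # True
```

In both cases rows with the same key value end up in the same partition, which is what key-based stages rely on.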

 

Best allocation of Partitions in DataStage for each stage

| S. No | Stage | Best way of Partition | Important points |
|---|---|---|---|
| 1 | Join | Left and Right link: Hash or Modulus | All the input links should be sorted based on the joining key and partitioned with higher key order. |
| 2 | Lookup | Main link: Hash or Same; Reference link: Entire | Both the links need not be in sorted order. |
| 3 | Merge | Master and update link: Hash or Modulus | All the input links should be sorted based on the merging key and partitioned with higher key order. Pre-sort makes merge "lightweight" for memory. |
| 4 | Remove Duplicate, Aggregator | Hash or Modulus | If the input link is in sorted order based on the key, it will perform better. |
| 5 | Sort | Hash or Modulus | Sorting happens after partitioning. |
| 6 | Transformer, Funnel, Copy, Filter | Same | None |
| 7 | Change Capture | Left and Right link: Hash or Modulus | Both the input links should be in sorted order based on the key and partitioned with higher key order. |
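Why do Join, Merge and Change Capture want both links partitioned by Hash or Modulus on the key? A small Python sketch of the idea follows. It is a conceptual model only: the function names are made up, CRC32 stands in for whatever hash DataStage uses, and a simple nested loop replaces the sorted merge the real stages perform.

```python
import zlib
from collections import defaultdict


def partition_by_key(rows, key, num_partitions):
    """Send each row to a partition chosen from the hash of its key value."""
    parts = defaultdict(list)
    for row in rows:
        p = zlib.crc32(repr(row[key]).encode("utf-8")) % num_partitions
        parts[p].append(row)
    return parts


def partitioned_join(left, right, key, num_partitions):
    """Inner join done partition by partition, as parallel nodes would do it."""
    left_parts = partition_by_key(left, key, num_partitions)
    right_parts = partition_by_key(right, key, num_partitions)
    joined = []
    for p in range(num_partitions):
        # Rows with equal keys always land in the same partition, so each
        # partition can be joined without looking at any other partition.
        for l_row in left_parts.get(p, []):
            for r_row in right_parts.get(p, []):
                if l_row[key] == r_row[key]:
                    joined.append({**l_row, **r_row})
    return joined


left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 3, "a": "z"}]
right = [{"id": 2, "b": "p"}, {"id": 3, "b": "q"}, {"id": 4, "b": "r"}]
print(sorted(partitioned_join(left, right, "id", 4), key=lambda r: r["id"]))
```

If the two links were partitioned differently, matching rows could sit on different nodes and the join would miss them, which is why both links share the same key-based method.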





Like the pages below to get updates
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/