Friday, 9 August 2019

Setup Aliases for Docker


While working with Docker day to day, I found it hard to remember every command needed to manage containers, so I started building a list of Docker aliases.

I have kept each alias as short as possible while still hinting at the underlying command, and every alias starts with the letter 'd' so the list stays easy to maintain alongside aliases for other tools.

Create a bashrc file if it does not exist, or append the alias list below to your existing bashrc, and source it from your user profile so that all the aliases are available whenever you log in to a terminal.




########################################################
# Docker Alias
########################################################
alias dc='docker ps'
alias dcommit='docker commit'
alias dcopy='docker cp'
alias dcs='docker ps -as'
alias dhist='_(){ docker history --no-trunc --format "{{.ID}}: {{.CreatedBy}}" "$1" ;};_'
alias di='docker images'
alias dinfo='docker info'
alias dlog='docker logs --follow'
alias dlogin='_(){ docker exec -it "$1" /bin/bash ;};_'
alias dloginu='_(){ docker exec -it -u "$1" "$2" /bin/bash ;};_'
alias dlogn='_(){ docker logs -f "$(docker ps | grep "$1" | awk "{print \$1}" | head -n 1)" ;};_'
alias dlogt='docker logs --tail 100'
alias dp='docker system prune'
alias dpush='_(){ docker push "$1" ;};_'
alias drmac='docker rm `docker ps -a -q`'
alias drmc='_(){ docker rm "$@" ;};_'
alias drmcc='docker rm $(docker ps -qa --no-trunc --filter "status=created")'
alias drmdc='docker rm $(docker ps -qa --no-trunc --filter "status=exited")'
alias drmdi='docker rmi $(docker images --filter "dangling=true" -q --no-trunc)'
alias drmdn='docker network rm'
alias drmdv='docker volume rm $(docker volume ls -qf dangling=true)'
alias drmi='_(){ docker rmi "$@" ;};_'
alias drmv='docker volume rm'
alias drun='_(){ docker run -dit --name "$1" "$1" /bin/bash ;};_'
alias dstats='docker stats'
alias dstop='_(){ docker stop "$@" ;};_'
alias dtag='_(){ docker tag "$1" "$2";};_'
alias dvol='docker volume ls'
alias dvoli='docker volume inspect'
alias dclean=' \
docker ps --no-trunc -aqf "status=exited" | xargs docker rm ; \
docker images --no-trunc -aqf "dangling=true" | xargs docker rmi ; \
docker volume ls -qf "dangling=true" | xargs docker volume rm'




Like the below page to get the update  
Facebook Page      Facebook Group      Twitter Feed      Telegram Group     


Tuesday, 6 August 2019

Setup HTTP/S Proxy for Docker


When working with Docker behind a proxy firewall, Docker cannot reach the public Docker registry (Docker Hub) to pull images or install dependencies for your Docker scripts. You can resolve this by creating or altering a few Docker config files. Let's see how:

1. In command line:

Pass the environment variables below as arguments to the docker command (for example, docker run) to enable the proxy for that container:

# In Command Line:
--env HTTP_PROXY="http://<user>:<password>@<proxy_server>:<port>"
--env HTTPS_PROXY="http://<user>:<password>@<proxy_server>:<port>"
--env ALL_PROXY="http://<user>:<password>@<proxy_server>:<port>"

 

2. In Docker File:

You can add the proxy settings to your Dockerfile if you don't have access to change them at the system level. Just add the lines below to your Dockerfile.

# In Dockerfile
ENV HTTP_PROXY="http://<user>:<password>@<proxy_server>:<port>"
ENV HTTPS_PROXY="http://<user>:<password>@<proxy_server>:<port>"
ENV ALL_PROXY="http://<user>:<password>@<proxy_server>:<port>"



3. Using config.json (Client level) file:

Alternatively, you can create a user-level config file for the Docker client if it does not exist. These settings are applied to your containers by default unless overridden by other means.

# Create/Edit ~/.docker/config.json (put all of the JSON below in this file; create the file if it does not exist)
 

{
 "proxies":
 {
   "default":
   {
     "httpProxy": "http://<user>:<password>@<proxy_server>:<port>",
     "httpsProxy": "http://<user>:<password>@<proxy_server>:<port>",
     "allProxy": "http://<user>:<password>@<proxy_server>:<port>"
   }
 }
}
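The same config.json content can be generated rather than typed by hand. The following is a minimal Python sketch, not part of the original post: the helper names proxy_url and docker_client_config, the sample credentials and the host proxy.example.com are all made up for illustration. It percent-encodes the user name and password, which matters when they contain characters such as '@' or ':' that would otherwise break the proxy URL.

```python
import json
from urllib.parse import quote


def proxy_url(user, password, host, port):
    """Build http://user:password@host:port with the credentials percent-encoded."""
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


def docker_client_config(url):
    """Return the 'proxies' structure shown above, using one URL for every key."""
    return {
        "proxies": {
            "default": {
                "httpProxy": url,
                "httpsProxy": url,
                "allProxy": url,
            }
        }
    }


# Hypothetical values, for illustration only.
url = proxy_url("alice", "p@ss:word", "proxy.example.com", 3128)
print(json.dumps(docker_client_config(url), indent=2))
```

The sketch only prints the JSON; merging it into an existing ~/.docker/config.json (which may already hold other settings) is left to the reader.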



4. Using a system-level Docker config change (Docker server level):

You can also set the proxy at the Docker server (daemon) level as below:


# Altering the Docker system file (you have to be the root user for this)
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo vi /etc/systemd/system/docker.service.d/http-proxy.conf
 
# (add the content below to this file)
[Service]
Environment="HTTP_PROXY=http://<user>:<password>@<proxy_server>:<port>"
Environment="HTTPS_PROXY=http://<user>:<password>@<proxy_server>:<port>"
Environment="ALL_PROXY=http://<user>:<password>@<proxy_server>:<port>"

# now run below command to restart the docker daemon
sudo systemctl daemon-reload
sudo systemctl restart docker

# verify
sudo systemctl show --property=Environment docker







Wednesday, 31 July 2019

Frequency Distribution #2 - #UnlockStats

Starting from the point where we left off - Frequency Distribution #1 - #UnlockStats
Below is the table of 100 students grouped by height class -
Height (in) No of Students
60-62 5
63-65 18
66-68 38
69-71 31
72-74 8
total 100


Histogram:

It consists of a set of rectangles whose bases lie on a horizontal axis, each centered at the class mark, with width equal to the class interval size and height proportional to the class frequency.

The histogram shows how the data is distributed. In our example, every class has a width of 3 and the distribution is slightly left-skewed: most of the data lies in the 66-71 range, and the thinner tail stretches toward the lower heights.
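As a rough illustration of the idea, the histogram for the height table can be drawn as text. This is a minimal Python sketch; the names TABLE and text_histogram are made up for this example, and each '#' stands for one student.

```python
# Frequency table from the post: (class label, number of students).
TABLE = [("60-62", 5), ("63-65", 18), ("66-68", 38), ("69-71", 31), ("72-74", 8)]


def text_histogram(table):
    """Return one bar per class; the bar length equals the class frequency."""
    return [f"{label} | {'#' * freq} {freq}" for label, freq in table]


for line in text_histogram(TABLE):
    print(line)
```

The bars are drawn horizontally for simplicity, so the class axis runs down the page instead of across it.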









Frequency Polygon:

A frequency polygon is a line graph of the class frequencies plotted against the class marks, where class mark = ( UCL + LCL ) / 2.
It can be obtained by connecting the midpoints of the tops of the rectangles in the histogram.
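The points of the frequency polygon for the height table can be computed directly from the class limits. The sketch below is only an illustration; CLASSES and polygon_points are names chosen for this example.

```python
# (lower class limit, upper class limit, frequency) for each height class.
CLASSES = [(60, 62, 5), (63, 65, 18), (66, 68, 38), (69, 71, 31), (72, 74, 8)]


def polygon_points(classes):
    """Return (class mark, frequency) pairs, where class mark = (UCL + LCL) / 2."""
    return [((lower + upper) / 2, freq) for lower, upper, freq in classes]


print(polygon_points(CLASSES))
```

Joining these points with straight lines, in order, gives the frequency polygon.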



























Box Plots:


A box plot shows a box that contains the middle 50% of the data values. It also shows two whiskers that extend from the box to the maximum and minimum values.

Relative Frequency Distribution:

The relative frequency of a class is the frequency of the class divided by the total frequency of all the classes (the total number of data points), expressed as a percentage.
Height (in) Relative Frequency (%)
60-62 5
63-65 18
66-68 38
69-71 31
72-74 8
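The relative frequencies can be computed from the class frequencies as in this small Python sketch (the function name relative_frequencies is chosen here for illustration). Because the example has exactly 100 students, each percentage equals the raw count.

```python
# Class frequencies for 60-62, 63-65, 66-68, 69-71 and 72-74.
FREQUENCIES = [5, 18, 38, 31, 8]


def relative_frequencies(freqs):
    """Return each class frequency as a percentage of the total frequency."""
    total = sum(freqs)
    return [100 * f / total for f in freqs]


print(relative_frequencies(FREQUENCIES))
```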

Cumulative Frequency Distribution: 

The total frequency of all values less than the upper class boundary of a given class interval is called the cumulative frequency up to and including that class interval.
Height (in) No of Students Cum. Freq. Distribution
60-62 (<=62) 5 5
63-65 (<=65) 18 5 + 18 = 23
66-68 (<=68) 38 23 + 38 = 61
69-71 (<=71) 31 61 + 31 = 92
72-74 (<=74) 8 92 + 8 = 100

A line plot of cumulative frequency against the upper class boundary is called a cumulative frequency polygon, or ogive.
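The running totals and the ogive points can be sketched in Python as below. The names are illustrative, and the gap parameter is an assumption: it is the distance between one class's upper limit and the next class's lower limit (1 inch in this table), so each upper class boundary is the upper limit plus half of it.

```python
from itertools import accumulate

UPPER_LIMITS = [62, 65, 68, 71, 74]  # upper class limits from the table
FREQUENCIES = [5, 18, 38, 31, 8]     # number of students per class


def cumulative_frequencies(freqs):
    """Return the running total of the class frequencies."""
    return list(accumulate(freqs))


def ogive_points(upper_limits, freqs, gap=1):
    """Pair each upper class boundary with the cumulative frequency up to it."""
    boundaries = [limit + gap / 2 for limit in upper_limits]
    return list(zip(boundaries, cumulative_frequencies(freqs)))


print(cumulative_frequencies(FREQUENCIES))
print(ogive_points(UPPER_LIMITS, FREQUENCIES))
```

Since the total is 100 students, the cumulative frequencies are also the cumulative relative frequencies in percent.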

Cumulative Relative Frequency Distribution:

Height (in) No of Students Cum. Rel. Freq. Distribution (%)
<=62 5 5
<=65 18 23
<=68 38 61
<=71 31 92
<=74 8 100

23% of the students have a height of 65 inches or less.

Types of Frequency Curves:

a. Symmetrical or bell-shaped curves are characterized by the fact that observations equidistant from the central maximum have the same frequency.
b. Curves that have tails to the left are said to be skewed to the left.
c. Curves that have tails to the right are said to be skewed to the right.
d. Curves that have approximately equal frequencies across their values are said to be uniformly distributed.
e. In a J-shaped or reverse J-shaped frequency curve, the maximum occurs at one end or the other.
f. A U-shaped curve has maxima at both ends and a minimum in between.
g. A bimodal frequency curve has two maxima.
h. A multimodal frequency curve has more than two maxima.





Tuesday, 30 July 2019

Frequency Distribution #1 - #UnlockStats


Raw Data:

Raw data are collected data that have not been organized in any way.

Array:

An array is a list of raw numerical data in ascending or descending order of magnitude.

Frequency Distribution:

When summarizing a large amount of data, we group it into classes or categories; the number of individuals belonging to each class is called the Class Frequency.
A tabular arrangement of data by classes or categories together with their class frequencies is called a Frequency Distribution.

Example:

Below is the table of 100 students grouped by height class -

Height (in) No of Students
60-62 5
63-65 18
66-68 38
69-71 31
72-74 8
total 100

Class Intervals and Class Limits:

A symbol defining a class, such as 63-65, is called a Class Interval. It is also called a Closed Class Interval because the class has both end numbers.
The end numbers of a class are called Class Limits: in 66-68, 66 is the Lower Class Limit and 68 is the Upper Class Limit. A class that has either no upper limit or no lower limit is called an Open Class Interval, such as the category 65+ years.

Class Boundaries:

A class boundary is obtained by adding the upper class limit of a category to the lower class limit of the next category and dividing by 2.

Upper Class Boundary (n) = { UCL(n) + LCL(n+1) } / 2

For the 63-65 category, 65.5 { (65+66)/2 } is the upper class boundary and 62.5 { (62+63)/2 } is the lower class boundary.


Size/Width of a Class Interval:

The difference between the lower and upper class boundaries is called the size or width of a Class Interval.
For example -
For the 63-65 category, the width is 65.5 - 62.5 = 3

The Class Mark:

The Class Mark is the mid-point of a class interval and can be calculated as below -

Class Mark (n) = { UCL(n) + LCL(n) } / 2

For the 63-65 category, the Class Mark is (63 + 65) / 2 = 64
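The boundary, width and class mark formulas above can be checked with a few lines of Python. This is a minimal sketch: describe_class is a name made up for the example, and the gap parameter assumes that neighbouring class limits are 1 inch apart, as they are in the height table (65 and 66, for instance).

```python
def describe_class(lower, upper, gap=1):
    """Compute class boundaries, width and class mark from the class limits.

    gap is the distance between this class's upper limit and the next
    class's lower limit; half of it is added to or subtracted from the limits.
    """
    lower_boundary = lower - gap / 2
    upper_boundary = upper + gap / 2
    return {
        "lower_boundary": lower_boundary,
        "upper_boundary": upper_boundary,
        "width": upper_boundary - lower_boundary,
        "class_mark": (lower + upper) / 2,
    }


print(describe_class(63, 65))
```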


The Histogram and the Frequency Polygon are two graphical representations of a frequency distribution. We will discuss them in the next post.

Till then, Happy Learning.........







Sunday, 28 July 2019

Let's #UnlockStats - Extending #UnlockAI


I've been asked so many times how to start with Machine Learning, or which course to join to learn it. I always give the same reply: Machine Learning is a journey where you travel with a good friend (i.e. Python, R, Java...), face some obstacles (i.e. mathematical and statistical concepts...) and make new friends along the way (i.e. the many other things you need to understand this ML world: the problem and its domain, algorithms, comparisons, tiring testing). In simple words, ML is not a destination you have to reach but a journey you have to live. A little dramatic... isn't it :-)

Anyway, we are starting with the very basic stats, or statistics, that you should be aware of alongside other ML topics. Hopefully you will like it... Please comment if you have any query or request.


Use #UnlockStats to fetch all Stats posts and #UnlockAI for all AI/ML posts.







Friday, 12 July 2019

VM v/s Containers


Continuing from the last post, "Containerization - What & Why", in this post we will compare the advantages of VMs and containers: which one serves us better in which scenario?


Portability:
VMs are more portable than containers because a VM carries its own OS, so it can run on any host OS, whereas a container can only run on the OS for which it was built.

Size:
VMs are heavier than containers because of the guest OS inside the VM.

Maintenance Ease:
Containers are easier to maintain: developers only have to deal with the application code and its dependencies, while the underlying OS is maintained by the OS team. VMs are heavier and tightly coupled with the application, which makes them more difficult to manage.

Replication:
Containers can be replicated faster than VMs and consume fewer resources on the host OS.

Kickoff time:
Containers take less time to start than VMs because a container uses the host OS memory and resources, which are already up and running, whereas a VM first has to bring up its guest OS and then the application.

Security:
VMs are more secure than containers because containers share a lot with the host OS, and a missed back door into your container can be fatal. With the evolution of Kubernetes, however, containers can be made more secure.

Let me know in the comments what you want to discuss next about containerization. Till then, Happy Learning!!





Monday, 3 June 2019

Containerization - What & Why ??


Containerization is a word that describes holding something. It is taken literally from the world of freight transportation, where containers allow lots of different products/items to be put into one box and moved around the world without worrying about damage. Quite a definition :) Isn't it?
In simple words, or in IT terms, containerization is a way to give the user a sandbox environment with the required software at specific versions, which you can flush whenever you are done with it and re-instantiate when needed.

Now the question arises: what is the difference between a Virtual Machine and a container? That is what we are going to discuss next.

In the last decade, Virtual Machines (VMs) have allowed IT giants/users to take one physical machine and host different applications and their variants in VMs that share the resources of the host machine. But this comes at a small price: the bottleneck of shared resources. The number of VMs your physical machine can host depends entirely on its resources, such as storage, processing power and memory, because each VM needs these to hold its guest OS and the application with its dependencies. The guest OS itself eats a lot of host storage and memory and has to be patched in a timely manner to support your application.

The VM stack looks somewhat like this -

[Figure: VM stack]

Containerization removes the guest OS dependency and uses the host machine and its OS, which substantially reduces the size of a container as well as its resource consumption and brings many more advantages over a virtual machine. The containerization stack is as below -

[Figure: containerization stack]
In the next post, we will discuss the pros and cons of VMs and containerization.






Monday, 18 February 2019

MongoDB Index in Python - Simple Index


Like RDBMS systems, MongoDB provides indexes to improve its performance, so that it can process a query and return the result set more quickly. MongoDB supports different types of indexes such as Single Key, Compound, MultiKey, Partial and Text indexes. We will look into these one by one.

Starting with the Simple Index, or single-key index, which uses only one key from the collection/document [equivalent to a Table/Row in RDBMS systems]. Let's see how -


Mongo Shell Command:  db.<collectionName>.createIndex({<field>:<direction>})
pyMongo Command:      db.<collectionName>.create_index([(<field>, <direction>)])

Let's analyze the impact of Index creation on Query Performance, first via mongo shell, second in python - 

In MongoShell:

In our example, we use the collection 'people', which has the field 'last_name'.

db.people.find({last_name:'Tucker'}).explain('executionStats')

The above command generates the execution stats for a query where last_name == 'Tucker'.


As the execution plan shows, MongoDB scanned the whole collection (50747 documents in total to fetch 65 records) to return the result, which is costly when your collection is big.

Now, Creating a Simple Index or Single Key Index

db.people.createIndex({"last_name":1}) 



Now, querying again the same - 

db.people.find({last_name:'Tucker'}).explain('executionStats')


This time MongoDB finds that an index is available on the last_name field and uses it to fetch the result. It scanned only 65 index keys to fetch 65 records.

A single-key index can be used in the scenarios below -
   - Querying on a range of indexed key values
   - Querying on selected values of the indexed key

Advantage:
  - A sort on the indexed key can be served directly from the index, so MongoDB does not need a separate in-memory sort stage
  - The index key can be used in either sort order - ascending or descending

Consideration while Designing Single Key Index:
  - Do not create a single-key index on every field of a collection; each extra index has to be maintained on every write, which slows down writes and can hurt overall performance.
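The before/after comparison from the mongo shell can be sketched in Python as well. This is only an illustration under a few assumptions: the function names scan_summary and compare_before_after are made up here, the collection object is expected to behave like a PyMongo collection, and the explain output is assumed to contain an executionStats section (which is what the server returns at its default explain verbosity).

```python
def scan_summary(collection, query):
    """Summarize how much work the server did for `query`, from explain output."""
    stats = collection.find(query).explain()["executionStats"]
    return {
        "returned": stats["nReturned"],
        "keys_examined": stats["totalKeysExamined"],
        "docs_examined": stats["totalDocsExamined"],
    }


def compare_before_after(collection, field, value):
    """Run the same query before and after creating a single-key index."""
    query = {field: value}
    before = scan_summary(collection, query)
    collection.create_index([(field, 1)])  # 1 means ascending order
    after = scan_summary(collection, query)
    return before, after


# Usage with PyMongo (connection string and database name are hypothetical):
# from pymongo import MongoClient
# people = MongoClient("mongodb://localhost:27017")["test"]["people"]
# print(compare_before_after(people, "last_name", "Tucker"))
```

For the 'people' collection described above, the first summary would be expected to show every document examined, and the second only as many index keys as matching records.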







Friday, 1 February 2019

Let's Learn - Git - Pull Specific Folder - sparsecheckout


What if your git repository has lots of folders but you only have to work on a specific file in a particular folder? The git feature for this is called sparse checkout. Older versions of git do not support it, which forces you to download the whole repository. Sometimes the repository is too big, and downloading it is a time-consuming process.

Current git versions support sparse checkout, which allows you to check out only a particular folder from a very big repository. Let's see how we can achieve it.

Task - Need to sync a folder named 'other' from 'DataGenX' repository 

Step #1: Initialize the Repository
Create a folder where you want to sync your git repo folder and initialize git in it.



Step #2: Add the Remote Repository
Add the remote Git repository to this local git repo.



Step #3: Create and Checkout a branch [Optional Step]
Creating a branch is entirely optional, but it is advisable.



Step #4: Enable the Sparse Checkout Properties
Now enable the sparse checkout property and add the name of the folder we want to sync (in our case, 'other') to the sparse checkout property file.



Step #5: Pull the Specific Folder
This is the last step where we pull the specific folder as below -

git pull <remote> <pull_branch_name>  #not locally created


While running this command, give the proper name of the branch you want to pull the data from. In our case, it is master.
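Steps #1, #2, #4 and #5 can be strung together in a short Python script that shells out to git. Treat this as a sketch under assumptions rather than the exact commands from the original post: the function name sparse_pull, the remote name origin and the example URL are hypothetical, and it uses the classic core.sparseCheckout setting with the .git/info/sparse-checkout pattern file. The optional branch creation from step #3 is skipped.

```python
import subprocess
from pathlib import Path


def sparse_pull(workdir, remote_url, folder, branch="master", run=subprocess.run):
    """Initialize a repo in workdir and pull only `folder` from the remote branch."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)

    def git(*args):
        # check=True makes a failing git command raise instead of passing silently.
        run(["git", *args], cwd=workdir, check=True)

    git("init")                                   # Step #1
    git("remote", "add", "origin", remote_url)    # Step #2
    git("config", "core.sparseCheckout", "true")  # Step #4: enable sparse checkout

    # Step #4 (continued): list the folder to sync in the sparse-checkout file.
    pattern_file = workdir / ".git" / "info" / "sparse-checkout"
    pattern_file.parent.mkdir(parents=True, exist_ok=True)
    with pattern_file.open("a") as handle:
        handle.write(f"{folder}/\n")

    git("pull", "origin", branch)                 # Step #5


# Example call (the URL is a placeholder, not the real repository address):
# sparse_pull("datagenx-other", "https://example.com/DataGenX.git", "other")
```

The run parameter exists only so the git calls can be swapped out, for example in a dry run; by default the script really executes git.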

Step #6: List and work with synced directory









Friday, 4 January 2019

How to Query on MongoDB by _id - INTERMEDIATE II


What if we have to query a MongoDB collection on the "_id" field? Can we really query on "_id"? If so, what is the syntax? Let's try it out -

Let's first fetch a document's id -

$ db.user.find_one()

{'_id': ObjectId('5c16e863817810ed3fc5e5f9'),
 'Fname': 'atul',
 'Lname': 'Singh',
 'Grade': 12.0,
 'College': 'SGM',
 'Job': 'Student',
 'Address': 'Young St.'}



Now, we will pick the object id and use it to fetch the same document from the collection

$ db.user.find_one({'_id':'5c16e863817810ed3fc5e5f9'}) #this will return nothing

$ db.user.find_one({'_id':ObjectId('5c16e863817810ed3fc5e5f9')})
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
 in 
----> 1 db.user.find_one({'_id':ObjectId('5c16e863817810ed3fc5e5f9')})

NameError: name 'ObjectId' is not defined


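The first query returns nothing because the stored "_id" is an ObjectId, which is not the same as its string representation; the string must be converted to an ObjectId before it is passed to the find command. The second query fails with a NameError because the ObjectId class has not been imported yet.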

$ from bson.objectid import ObjectId
$ db.user.find_one({'_id':ObjectId('5c16e863817810ed3fc5e5f9')})

{'_id': ObjectId('5c16e863817810ed3fc5e5f9'),
 'Fname': 'atul',
 'Lname': 'Singh',
 'Grade': 12.0,
 'College': 'SGM',
 'Job': 'Student',
 'Address': 'Young St.'}



IPYTHON Notebook can be found HERE







