Showing posts with label Code. Show all posts

Thursday, 28 September 2017

Mathematics in Markdown


From Wikipedia -
Markdown is a lightweight markup language with plain text formatting syntax designed so that it can be converted to HTML and many other formats using a tool by the same name. Markdown is often used to format readme files, for writing messages in online discussion forums, and to create rich text using a plain text editor (file extensions: *.markdown, *.md).
The best thing about Markdown files is that you can convert them to HTML without any trouble.



I was introduced to Markdown files when I started putting my code on GitHub (https://github.com/atulsingh0). It began with a few ups and downs, but I have since fallen for it. With a little help, it is easy to write a README or a file of math equations in Markdown.

In this tutorial I focus on the mathematics part only. For writing math formulas, Markdown (in renderers that support math, such as Jupyter Notebook) uses LaTeX symbols for Greek letters, brackets, sign operators and many other symbols.
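
As a small illustration (these lines are not taken from the linked notes), here are a few LaTeX snippets of the kind this post is about. The sketch assumes a viewer that renders `$...$` as inline math and `$$...$$` as display math; a plain Markdown converter without math support will show them as raw text.

```latex
Inline Greek letters: $\alpha$, $\beta$, $\gamma$, $\Delta$

Brackets that grow with their content:
$$\left( \frac{a}{b} \right)^2$$

A sum with limits:
$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_n$$

Comparison and sign operators: $a \le b$, $a \ne b$, $a \pm b$, $a \times b$
```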

I have collected a few of them and will add more.
I hope you find it useful -  Direct Link









Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Saturday, 14 January 2017

Learning Numpy #2

Thursday, 12 January 2017

Learning Numpy #1


NumPy is a Python library for numerical computation, and it performs better than pure Python. In this notebook I share some NumPy basics, and I will share more in the next few posts. I hope you find them useful.





Click Here for Next Tutorial ~


Wednesday, 11 January 2017

My Learning Path for Machine Learning


I am a Python lover, so my path includes a lot of Python points. If you don't know the basics of this wonderful language, start from HERE; otherwise you can follow the links I am going to share.

Learning ML is not only about studying ML algorithms; it also involves basic algebra, statistics, algorithms, programming and a lot more. But there is no need to be afraid :-) we have to start somewhere.

This is my GitHub repo; you can fork it and follow me with these 2 links -

Fork
Follow - @atulsingh0
I am still updating this list, and you are welcome to update it as well.




Friday, 6 January 2017

10 minutes with pandas library

Thursday, 5 January 2017

Learning Pandas #5 - read & write data from file

Wednesday, 4 January 2017

Learning Pandas #4 - Hierarchical Indexing

Sunday, 1 January 2017

Learning Pandas #3 - Working on Summary & MissingData

Saturday, 31 December 2016

Learning Pandas - DataFrame #2

Friday, 30 December 2016

Learning Pandas - Series #1

Wednesday, 28 December 2016

Learning Graphlab - SFrame #2

In the last post, Learning Graphlab - SFrame #1, we learned the basics of SFrame, such as how to create an SFrame and how to add or delete its columns. In this post we will revise those once again and learn some advanced features of SFrame. Happy learning!

You can view the Jupyter Notebook for this post HERE




Wednesday, 30 November 2016

Learning Graphlab - SFrame #1


I hope you went through the last post (Link -> Getting Started with Graphlab). In this post we will get some hands-on practice with the SFrame datatype of Graphlab, which is similar to the DataFrame of the pandas Python library.

i. Read the CSV file
ii. Save the dataset
iii. Load the dataset
iv. Check total rows and columns
v. Check column data types and names
vi. Add a new column
vii. Delete a column
viii. Rename a column
ix. Swap column positions






Sunday, 16 October 2016

Jenkins with Windows #1


One of my team members was assigned to install and configure "Jenkins" on our server, so out of curiosity I asked, "What is this now?" I didn't get a satisfactory answer :-) so I decided to get my hands dirty with it. Here I am sharing whatever I learn.

What is Jenkins:-
According to Wikipedia, Jenkins is an open source automation server written in Java. Jenkins helps automate the non-human part of the software development process, covering now-common practices like Continuous Integration and further empowering teams to implement the technical part of Continuous Delivery.

https://jenkins.io/

What is Continuous Integration & Continuous Delivery:-
CI is a process that most developers follow to keep their code base intact. It is a common practice when you work in a group environment. An analogy is the construction of a new home. There will be multiple contractors working on the site. If the window glass has been installed and the painter then comes in to paint the house, there is a good chance that he will drop some paint on the glass or even break it. So an inspector comes every day to check whether something broke. The same process applies to building new code. A CI system gathers the code from all the different developers and makes sure it compiles and builds fine. This is good, but not complete. I will get to that once I finish talking about Jenkins.


Jenkins is the inspector in the analogy. Jenkins is nothing but a middleman between your code repo and your build server. It checks the repo for changes every few minutes. If it finds any, it gathers them and sends them to your build server. That's what Jenkins is.
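
To make the "middleman" idea concrete, here is a toy sketch of a single polling step in Python. It is not Jenkins code: `poll_once`, `get_latest_revision` and `trigger_build` are hypothetical names, and a real server would also handle scheduling, workspaces and reporting.

```python
# Toy model of "poll the repository, build when something changed".
# Nothing here is Jenkins code; all names are hypothetical.

def poll_once(last_built, get_latest_revision, trigger_build):
    """Check the repository once and trigger a build if there is a new revision.

    Returns the revision that is now considered built.
    """
    latest = get_latest_revision()
    if latest != last_built:
        trigger_build(latest)
        return latest
    return last_built


# Simulate three polls: a new commit shows up before the second poll.
revisions = iter(["rev1", "rev2", "rev2"])
builds = []

last = "rev1"
for _ in range(3):
    last = poll_once(last, lambda: next(revisions), builds.append)

print(builds)
```

Only the second poll sees a revision that differs from the last built one, so only one build is triggered.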

Basically, Continuous Integration is the practice of running your tests automatically on a non-developer machine every time someone pushes new code into the source repository.

This has the tremendous advantage of always knowing if all tests work and getting fast feedback. The fast feedback is important so you always know right after you broke the build (introduced changes that made either the compile/build cycle or the tests fail) what you did that failed and how to revert it.

If you only run your tests occasionally the problem is that a lot of code changes may have happened since the last time and it is rather hard to figure out which change introduced the problem. When it is run automatically on every push then it is always pretty obvious what and who introduced the problem.

Continuous Deployment and Continuous Delivery are built on top of Continuous Integration: after a successful test run, you instantly and automatically release the latest version of your codebase. This makes deployment a non-issue and helps you speed up your development.


Jenkins offers the following major features out of the box, and many more can be added through plugins:

Developer time is focused on work that matters — Much of the work of frequent integrations is handled by automated build and testing systems, meaning developer time isn't wasted on large-scale error-ridden integrations.

Software quality is improved — Any issues are detected and resolved almost immediately, keeping software in a state where it can be safely released at any time.

Faster Development - Integration costs are reduced both because serious integration issues are less likely and because much of the work of integration is automated.

Easy installation: Just run java -jar jenkins.war, or deploy it in a servlet container. No additional install, no database. Prefer an installer or native package? We have those as well.

Easy configuration: Jenkins can be configured entirely from its friendly web GUI with extensive on-the-fly error checks and inline help.

Rich plugin ecosystem: Jenkins integrates with virtually every SCM or build tool that exists.

Extensibility: Most parts of Jenkins can be extended and modified, and it's easy to create new Jenkins plugins. This allows you to customize Jenkins to your needs.

Distributed builds: Jenkins can distribute build/test loads to multiple computers with different operating systems. Building software for OS X, Linux, and Windows? No problem.


Check out the part 2 for Installation.



Sources:
https://jenkins.io/ https://en.wikipedia.org/wiki/Jenkins_(software) http://stackoverflow.com https://www.quora.com




Monday, 12 September 2016

Python Points #15 - Exceptions

Tuesday, 14 June 2016

Python Points #14 - Code a childhood game


Level : Intermediate

Try to code this famous childhood game played in India, known as "Raja, Mantri, Chor, Sipahi", in Python, using the game output shared below as a guide -

https://twitter.com/datagenx


A little bit about the game:
Chits are made for Raja/King (100 points), Mantri/Minister (80 points), Chor/Thief (0 points) and Sipahi/Inspector (50 points). These chits are then thrown in the middle and the 4 players pick one each. The Raja/King then exclaims 'Mera Mantri kaun?' (Who is my minister?) {in my game/code, the King is smart and asks the Mantri/Minister directly}. The Mantri/Minister responds and is then asked to identify the Chor/Thief (who stole my Queen's necklace?). If the guess is correct, the Mantri/Minister keeps the points; if it is incorrect, s/he has to surrender the points to the Chor/Thief. The player with the highest points wins in the end.
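
If you want a starting point, here is a minimal sketch of dealing the chits and scoring one round under the rules described above. It is only one possible reading of the rules, not the code behind the shared output: the function names are made up for illustration, and the sketch does not restrict whom the Mantri may accuse.

```python
import random

# Points per chit, as listed in the post.
POINTS = {"Raja": 100, "Mantri": 80, "Chor": 0, "Sipahi": 50}


def deal(players, rng=random):
    """Shuffle the four chits and give one to each player."""
    chits = list(POINTS)
    rng.shuffle(chits)
    return dict(zip(players, chits))


def score_round(roles, guess):
    """Score one round.

    roles: mapping of player name -> chit picked
    guess: the player the Mantri accuses of being the Chor
    """
    scores = {player: POINTS[chit] for player, chit in roles.items()}
    mantri = next(p for p, chit in roles.items() if chit == "Mantri")
    chor = next(p for p, chit in roles.items() if chit == "Chor")
    if guess != chor:
        # Wrong guess: the Mantri surrenders the points to the Chor.
        scores[chor] += scores[mantri]
        scores[mantri] = 0
    return scores


players = ["A", "B", "C", "D"]
roles = deal(players)  # random assignment for a real game

# A fixed assignment so the two outcomes are easy to compare.
fixed = {"A": "Raja", "B": "Mantri", "C": "Chor", "D": "Sipahi"}
right = score_round(fixed, guess="C")
wrong = score_round(fixed, guess="D")
print(right)
print(wrong)
```

A full game would repeat `deal` and `score_round` for several rounds and add up each player's points.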



http://www.datagenx.net/search/label/Python?&max-results=9







Monday, 28 March 2016

10 Scenario based Interview Questions #1 - DataStage


1. Design a job which converts a single source row into three target rows.
2. Design a job which can identify the rows that are duplicated in the input.
3. Design a job which will fetch the header and footer of the input file.
4. Design a job which will segregate unique and duplicate records into different files.
5. Design a job which removes the header from the input file.
6. Design a job which removes the footer from the input file.
7. Design a job which sends a mail if the footer is missing from the input file.
8. Design a job which extracts alternate records from the input file.
9. Design a job which extracts the Nth row from the input file.
10. Design a job which extracts data from two input files and loads the records alternately into the target.
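
These are DataStage design exercises, but it can help to pin down the record-level logic in plain Python before thinking about stages. The sketch below covers scenarios 4, 5, 6 and 8 on an in-memory list of lines; it says nothing about how the DataStage job itself should be built, and the function names are made up for illustration.

```python
from collections import Counter


def split_unique_duplicate(records):
    """Scenario 4: records seen once vs. records seen more than once."""
    counts = Counter(records)
    unique = [r for r in records if counts[r] == 1]
    duplicate = [r for r in records if counts[r] > 1]
    return unique, duplicate


def strip_header_footer(lines):
    """Scenarios 5 and 6: drop the first and the last line."""
    return lines[1:-1]


def alternate_records(lines):
    """Scenario 8: keep every other record, starting with the first."""
    return lines[::2]


rows = ["HDR", "a", "b", "a", "c", "FTR"]
body = strip_header_footer(rows)
unique, duplicate = split_unique_duplicate(body)
every_other = alternate_records(body)
print(unique, duplicate, every_other)
```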


For more ->  DataStage Scenario

Tuesday, 1 March 2016

ETL Development Standards



These development standards provide consistency in the artifacts the ETL developers create.  This consistency improves testing, operational support, maintenance, and performance.  

Code Development Guidelines


1 Commenting in Code and Objects

As a primary guideline, all code must contain a developer comment/note wherever this is possible and does not interfere with the operation of the applications.
All ETL jobs must have a proper annotation (a short description of the functionality of the job).
The fields in the target output files (.csv files) should not contain any leading or trailing spaces.
When deciding on the field and record delimiters, the "Delimiter Collision" issue needs to be considered: do not use as a delimiter any character that can occur as part of the data.
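
One way to guard against delimiter collision is to scan the data before writing the file. The following is a minimal Python sketch, not part of the standards themselves: `pick_delimiter` and its candidate list are hypothetical, and the sample rows are made up.

```python
def pick_delimiter(rows, candidates=("|", "\t", ";", "~")):
    """Return the first candidate delimiter that never occurs in the data."""
    for delim in candidates:
        if not any(delim in field for row in rows for field in row):
            return delim
    raise ValueError("every candidate delimiter occurs in the data")


rows = [["Smith, John ", "A|B"], [" O'Neil", "x;y"]]
delim = pick_delimiter(rows)

# Strip leading/trailing spaces from each field before joining.
lines = [delim.join(field.strip() for field in row) for row in rows]
print(repr(delim))
print(lines)
```

Here "|" and ";" both appear inside field values, so the tab character is the first safe candidate.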

2 ETL Naming Standards

Standardized naming conventions ease the burden on developers switching from one project to another.  Knowing the names of things and where they are located is very useful before the design and development phases begin.

The following tables identify DataStage elements and their standard naming conventions.


2.1 Job and Properties Naming Conventions

GUI Component   Entity         Convention
Designer        Parallel Job   <<Application>>_<<job_Name>>_JOB
Designer        Sequence       <<Application>>_<<job_Name>>_SEQ
Designer        Server Job     <<Application>>_<<job_Name>>_SVR
Designer        Parameter      <<Application>>_<<job_Name>>_PARM
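
A convention like this is easy to check automatically. The sketch below validates a name against the job naming table above using Python's `re` module. It assumes, beyond what the standard states, that application names contain only letters and digits and that job names contain only letters, digits and underscores; `check_job_name` and the sample names are hypothetical.

```python
import re

# Suffix per entity type, taken from the job naming table.
JOB_SUFFIXES = {
    "Parallel Job": "JOB",
    "Sequence": "SEQ",
    "Server Job": "SVR",
    "Parameter": "PARM",
}

# Assumption: <<Application>> is letters/digits, <<job_Name>> may also use "_".
NAME_PATTERN = re.compile(r"^[A-Za-z0-9]+_[A-Za-z0-9_]+_(JOB|SEQ|SVR|PARM)$")


def check_job_name(name, entity):
    """Return True if name has the expected shape and the suffix for entity."""
    match = NAME_PATTERN.match(name)
    return bool(match) and match.group(1) == JOB_SUFFIXES[entity]


print(check_job_name("SALES_LoadCustomers_JOB", "Parallel Job"))
print(check_job_name("SALES_LoadCustomers_SEQ", "Parallel Job"))
```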

2.2 Job Processing Stage Naming Conventions

GUI Component   Entity           Convention
Designer        Aggregator       AGG_<<PrimaryFunction>>
Designer        Copy             CP_<<PrimaryFunction>>
Designer        Filter           FLT_<<PrimaryFunction>>
Designer        Funnel           FNL_<<PrimaryFunction>>
Designer        Join (Inner)     JN_<<PrimaryFunction>>
Designer        FTP Enterprise   FTP_<<PrimaryFunction>>
Designer        Lookup           LKP_<<Value Name or table Name>>
Designer        Merge            MRG_<<PrimaryFunction>>
Designer        Modify           MOD_<<PrimaryFunction>>
Designer        Sort             SRT_<<PrimaryFunction>>

2.3 Links Naming Conventions

GUI Component   Entity                      Convention
Designer        Reference (Lookup)          Lnk_Ref_<<Number or Additional descriptor, if needed to form a unique object name>>
Designer        Reject (Lookup, File, DB)   Lnk_Rej_<<Number or Additional descriptor, if needed to form a unique object name>>
Designer        Input                       Lnk_In_<<Number or Additional descriptor, if needed to form a unique object name>>
Designer        Output                      Lnk_Out_<<Number or Additional descriptor, if needed to form a unique object name>>
Designer        Delete                      Lnk_Del_<<Number or Additional descriptor, if needed to form a unique object name>>
Designer        Insert                      Lnk_Ins_<<Number or Additional descriptor, if needed to form a unique object name>>
Designer        Update                      Lnk_Upd_<<Number or Additional descriptor, if needed to form a unique object name>>

2.4 Data Store Naming Conventions:

In the case of a data store, the class word refers to the type of data store (e.g. Dataset, Sequential File, Table, View, and so forth).

GUI Component   Entity      Convention
Designer        Database    DB_<<DatabaseName>>
Designer        Table       TBL_<<TableName>>
Designer        View        VIEW_<<ViewName>>
Designer        Dimension   DM_<<TableName>>
Designer        Fact        TRAN_<<TableName>>
Designer        Source      SRC_<<Table or Object Name>>
Designer        Target      TRGT_<<Table or objectName>>

2.5 File Stage Naming Conventions:

GUI Component   Entity              Convention
Designer        Sequential File     SEQ_
Designer        Complex Flat File   CFF_
Designer        Parallel dataset    DS_








Sunday, 7 February 2016

Python Points #10b - Reading Files

Wednesday, 20 January 2016

Using your python library



How do you use your own functions or routines when you code in Python?


Follow the steps below -
1. Create a folder to hold all your reusable code/function/routine files
2. Let's say it is called "routines"
3. Now suppose you have written all your functions in a file myfunc.py and saved it in the routines folder
4. Add the routines folder to the PYTHONPATH environment variable, which Python searches for modules (the shell's PATH variable does not affect imports)
for linux:
Edit your .bash_profile and add one of the following lines (typically towards the end)
export PYTHONPATH=$PYTHONPATH:'/path/to/folder/'   # if you already have this variable
export PYTHONPATH='/path/to/folder/'   # else use this line
replacing /path/to/folder/ with the actual path of the routines folder




How to use your function in your code:-
1.  import the module into your Python script or session with a command like
import myfunc

2.  to use a function "my_sqrt" from your library myfunc
x = myfunc.my_sqrt(val)

3.  you can also create an alias for your library
import myfunc as mf
x = mf.my_sqrt(val)

If you want to import one particular piece from the library, use
from myfunc import my_sqrt
x = my_sqrt(val)

This gets tedious if you have to import many names, so you can import everything with
from myfunc import *
x = my_sqrt(val)

But remember: if you import multiple libraries this way and they define functions with the same name, the later import silently replaces the earlier one.
In that case, call the functions with the library name as in step 2.
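
If you cannot or do not want to change environment variables, the same effect can be had from inside the script by extending `sys.path`. The sketch below is self-contained: it builds a throwaway "routines" folder in a temporary directory and writes a made-up `my_sqrt` into myfunc.py purely for illustration; in practice you would point `sys.path` at your real folder.

```python
import os
import sys
import tempfile

# Build a throwaway "routines" folder with a myfunc.py in it.
routines = os.path.join(tempfile.mkdtemp(), "routines")
os.makedirs(routines)
with open(os.path.join(routines, "myfunc.py"), "w") as f:
    # Hypothetical body for my_sqrt, only so the import has something to load.
    f.write("def my_sqrt(val):\n    return val ** 0.5\n")

# Runtime alternative to setting PYTHONPATH: extend the module search path.
sys.path.append(routines)

import myfunc as mf

x = mf.my_sqrt(16)
print(x)  # 4.0
```

Changes to `sys.path` last only for the running process, so PYTHONPATH is still the better choice for a library you use everywhere.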






Wednesday, 16 December 2015

Python Regular Expression quick guide





^ Matches the beginning of a line
$ Matches the end of the line
. Matches any character
\s Matches whitespace
\S Matches any non-whitespace character
* Repeats a character zero or more times
*? Repeats a character zero or more times (non-greedy)
+ Repeats a character one or more times
+? Repeats a character one or more times (non-greedy)
[aeiou] Matches a single character in the listed set
[^XYZ]  Matches a single character not in the listed set
[a-z0-9] The set of characters can include a range
( Indicates where string extraction is to start
) Indicates where string extraction is to end
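
A few of these symbols in action, using Python's built-in `re` module. The sample line is made up for illustration.

```python
import re

line = "From: alice@example.com  Sat Jan  5 09:14:16 2008"

# ^ anchors at the start, \S+ grabs non-whitespace, ( ) marks what to extract.
sender = re.findall(r"^From: (\S+)", line)

# [0-9]+ is a character range repeated one or more times.
numbers = re.findall(r"[0-9]+", line)

# Greedy vs non-greedy: .+ runs to the last colon, .+? stops at the first.
greedy = re.findall(r"^F.+:", line)
lazy = re.findall(r"^F.+?:", line)

print(sender)
print(numbers)
print(greedy)
print(lazy)
```

The greedy pattern swallows everything up to the colon inside the timestamp, while the non-greedy one stops right after "From".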



