Test Blog on Smartphone

Posted on May 21, 2020 by go2analytics • Posted in Uncategorized

MathSpace Poter V0

Posted on January 4, 2017 by go2analytics • Posted in Uncategorized

woe

lasso-choosing-lambda

Python function parameters

Posted on November 30, 2016 by go2analytics • Posted in Uncategorized

http://pythoncentral.io/fun-with-python-function-parameters/
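The linked article walks through the different kinds of function parameters. As a quick sketch (the function `describe` is a hypothetical name made up for illustration), one Python 3 signature can combine a required positional parameter, a default value, *args, a keyword-only parameter and **kwargs:

```python
def describe(name, greeting="Hello", *args, sep=", ", **kwargs):
    """Hypothetical example combining the main parameter kinds."""
    # name: required positional; greeting: optional, has a default value
    # *args collects any extra positional arguments into a tuple
    # sep is keyword-only because it comes after *args
    # **kwargs collects any extra keyword arguments into a dict
    parts = [f"{greeting} {name}"]
    parts.extend(str(a) for a in args)
    parts.extend(f"{k}={v}" for k, v in sorted(kwargs.items()))
    return sep.join(parts)


print(describe("Ann"))                              # Hello Ann
print(describe("Bob", "Hi", 1, 2, sep="; ", x=3))   # Hi Bob; 1; 2; x=3
```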

Python – List & Operations

Posted on September 30, 2016 by go2analytics • Posted in list, Python • Tagged list

List append() Method

aList = [123, 'xyz', 'zara', 'abc']
aList.append(2009)
print("Updated List : ", aList)

result:

Updated List :  [123, 'xyz', 'zara', 'abc', 2009]

List – reciprocal

Example:
FalseTrueRatio = range(1,26)
TrueFalseRatio = [1.0/x for x in FalseTrueRatio]
TrueFalseRatio

Ref: http://stackoverflow.com/questions/8244915/how-do-you-divide-each-element-in-a-list-by-an-int

result:

[1.0, 0.5, 0.3333333333333333, 0.25, 0.2, ...]
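With NumPy, dividing a scalar by an array already works element-wise, so no list comprehension is needed. A small sketch, assuming NumPy is installed (the variable names are illustrative):

```python
import numpy as np

# Same values 1..25 as above, as a NumPy array
false_true_ratio = np.arange(1, 26)

# Scalar / array divides element by element and yields floats
true_false_ratio = 1.0 / false_true_ratio
print(true_false_ratio[:5])
```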

Finding index of an item closest to the value in a list

Example:
# given pcnList as a list; x_val as a given value
min(range(len(pcnList)), key=lambda i: abs(pcnList[i]-x_val))

Ref: http://stackoverflow.com/questions/9706041/finding-index-of-an-item-closest-to-the-value-in-a-list-thats-not-entirely-sort
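Wrapped into a function with a concrete list, the one-liner above looks like this; `closest_index` is a hypothetical name and the values are made up. Because the key is the distance to the target, the list does not need to be sorted:

```python
def closest_index(values, target):
    """Return the index of the element in `values` closest to `target`.

    Works on unsorted lists; on ties, min() keeps the first such index.
    """
    return min(range(len(values)), key=lambda i: abs(values[i] - target))


pcnList = [0.9, 0.1, 0.5, 0.35]
print(closest_index(pcnList, 0.4))  # 3 (0.35 is closest to 0.4)
```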

find nearest value in numpy array
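For a NumPy array, the usual approach is to take the absolute difference from the target and call argmin(). A minimal sketch; `find_nearest` is a hypothetical helper name and the data is illustrative:

```python
import numpy as np


def find_nearest(array, value):
    """Return (index, element) of the entry of `array` nearest to `value`."""
    array = np.asarray(array)
    # argmin returns the index of the first minimum distance
    idx = int(np.abs(array - value).argmin())
    return idx, array[idx]


idx, val = find_nearest([1.0, 2.5, 4.0, 7.5], 5)
print(idx, val)  # 2 4.0
```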

Other List functions

Updating a list
Deleting list elements: del list1[2]
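A short sketch of updating and deleting list elements (the list contents are illustrative): assignment to an index replaces an element, del removes by index, and remove() deletes by value:

```python
list1 = ['physics', 'chemistry', 1997, 2000]

# Update: assign a new value to an existing index
list1[2] = 2001
print(list1)   # ['physics', 'chemistry', 2001, 2000]

# Delete by index with the del statement
del list1[2]
print(list1)   # ['physics', 'chemistry', 2000]

# Delete by value instead (removes the first match only)
list1.remove('chemistry')
print(list1)   # ['physics', 2000]
```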
Basic operations

Python Expression	Results	Description
len([1, 2, 3])	3	Length
[1, 2, 3] + [4, 5, 6]	[1, 2, 3, 4, 5, 6]	Concatenation
['Hi!'] * 4	['Hi!', 'Hi!', 'Hi!', 'Hi!']	Repetition
3 in [1, 2, 3]	True	Membership
for x in [1, 2, 3]: print(x, end=' ')	1 2 3	Iteration

Indexing, Slicing, and Matrixes

Example:

L = ['spam', 'Spam', 'SPAM!']

Python Expression	Results	Description
L[2]	'SPAM!'	Offsets start at zero
L[-2]	'Spam'	Negative: count from the right
L[1:]	['Spam', 'SPAM!']	Slicing fetches sections
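The heading also mentions matrixes, which the table does not show. A minimal sketch, assuming a matrix is simply a list of lists (the name `matrix` is illustrative): index twice, row first, then column:

```python
matrix = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]

print(matrix[1])        # [4, 5, 6]  (second row)
print(matrix[1][2])     # 6          (row 1, column 2)
print(matrix[-1][:2])   # [7, 8]     (slice of the last row)

# A column has to be collected row by row
column0 = [row[0] for row in matrix]
print(column0)          # [1, 4, 7]
```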

Built-in List Functions & Methods:

Function with Description

cmp(list1, list2)

Compares elements of both lists (Python 2 only; cmp() was removed in Python 3).

len(list)

Gives the total length of the list.

max(list)

Returns item from the list with max value.

min(list)

Returns item from the list with min value.

list(seq)

Converts a sequence (e.g. a tuple) into a list.
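In Python 3, cmp() no longer exists. A minimal sketch that emulates it with the common `(a > b) - (a < b)` idiom, next to the other built-ins listed above (the list contents are illustrative):

```python
list1 = [3, 1, 4]
list2 = [3, 1, 5]


def cmp(a, b):
    # Python 3 stand-in for the removed built-in: returns -1, 0 or 1
    return (a > b) - (a < b)


print(cmp(list1, list2))       # -1 (lists compare element by element; 4 < 5)
print(len(list1))              # 3
print(max(list1), min(list1))  # 4 1
print(list((1, 2, 3)))         # [1, 2, 3]  (tuple -> list)
print(list("abc"))             # ['a', 'b', 'c']  (any iterable works)
```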

Methods with Description

list.append(obj)

Appends object obj to list

list.count(obj)

Returns count of how many times obj occurs in list

list.extend(seq)

Appends the contents of seq to list

list.index(obj)

Returns the lowest index at which obj appears in list (raises ValueError if obj is absent)

list.insert(index, obj)

Inserts object obj into list at offset index

list.pop([index])

Removes and returns the item at index (the last item if index is omitted)

list.remove(obj)

Removes object obj from list

list.reverse()

Reverses objects of list in place

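list.sort(key=None, reverse=False)

Sorts the list in place. key is an optional function whose result is compared instead of the item; reverse=True sorts in descending order. (Python 2 also accepted a compare function as cmp=func; Python 3 removed it.)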
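A short walk-through of the methods above on one illustrative list; the comments show the list after each step:

```python
nums = [3, 1, 4, 1, 5]

nums.append(9)            # [3, 1, 4, 1, 5, 9]
nums.extend([2, 6])       # [3, 1, 4, 1, 5, 9, 2, 6]
print(nums.count(1))      # 2
print(nums.index(1))      # 1  (first occurrence only)
nums.insert(0, 8)         # [8, 3, 1, 4, 1, 5, 9, 2, 6]
print(nums.pop())         # 6  (last item by default)
print(nums.pop(0))        # 8  (item at index 0)
nums.remove(1)            # removes the first 1: [3, 4, 1, 5, 9, 2]
nums.reverse()            # [2, 9, 5, 1, 4, 3]
nums.sort(reverse=True)   # [9, 5, 4, 3, 2, 1]
print(nums)

# key= sorts by a computed value, here the string length
words = ['banana', 'fig', 'apple']
words.sort(key=len)       # ['fig', 'apple', 'banana']
```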

Python – string & substring

Posted on September 30, 2016 by go2analytics • Posted in Python, Uncategorized • Tagged string

Get a string after a specific substring

There are two ways. The easiest is probably to split on the target word:

# 1st way -- using split
my_string = "hello python world , i'm a beginner "
print(my_string.split("world", 1)[1])
my_string.split("world", 1)

# with maxsplit=1, split returns a two-element list:
# [0] is the text before "world", [1] is the text after it
['hello python ', " , i'm a beginner "]

# 2nd way -- using index
s1 = "hello python world , i'm a beginner "
s2 = "world"
print(s1[s1.index(s2) + len(s2):])

, i'm a beginner

print(s1[s1.index(s2):])

world , i'm a beginner

http://stackoverflow.com/questions/12572362/get-a-string-after-a-specific-substring
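A third option is str.partition(), which splits on the first occurrence and always returns a 3-tuple (before, separator, after). Unlike split(...)[1], which raises IndexError, or index(), which raises ValueError, it simply returns an empty string when the substring is missing. A small sketch:

```python
my_string = "hello python world , i'm a beginner "

# partition returns (before, separator, after); separator is '' if not found
before, sep, after = my_string.partition("world")
print(repr(after))                            # " , i'm a beginner "

# When the substring is missing, the 'after' part is just empty
print(repr(my_string.partition("java")[2]))   # ''
```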

Python – combine columns into data frame

Posted on September 30, 2016 by go2analytics • Posted in dataframe, Python, Uncategorized

Combine lists into data frame

import pandas as pd

lst1 = range(100)
lst2 = range(100)
lst3 = range(100)
percentile_list = pd.DataFrame({'lst1Tite': lst1,
                                'lst2Tite': lst2,
                                'lst3Tite': lst3})
# the dict keys become the column names, so there is no need to pass
# a separate columns= argument when creating a DataFrame from a dict like this

percentile_list
    lst1Tite  lst2Tite  lst3Tite
0          0         0         0
1          1         1         1
2          2         2         2
3          3         3         3
4          4         4         4
5          5         5         5
6          6         6         6
...
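If the lists are combined row-wise with zip() instead of a dict, the column names are no longer implied and must be passed explicitly. A minimal sketch reusing the same lists and column names:

```python
import pandas as pd

lst1 = range(100)
lst2 = range(100)
lst3 = range(100)

# zip pairs up the i-th elements into rows; names must be given via columns=
rows = list(zip(lst1, lst2, lst3))
percentile_list = pd.DataFrame(rows, columns=['lst1Tite', 'lst2Tite', 'lst3Tite'])
print(percentile_list.shape)   # (100, 3)
```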

Combine multiple data frames into one

# merge three data frames (myDF, myDF2, myDF3), using common columns
finalCutDF = myDF.merge(myDF2, on=['FalseTrueRatio', 'TrueFalseRatio']).merge(
                myDF3, on=['FalseTrueRatio', 'TrueFalseRatio'])
finalCutDF

# to merge data frames on a single common column
df1.merge(df2,on='name').merge(df3,on='name')
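A small sketch of the chained merge with made-up frames. merge() defaults to how='inner', so only keys present in every frame survive; pass how='outer' to keep all of them:

```python
import pandas as pd

# Hypothetical data: three small frames sharing a 'name' column
df1 = pd.DataFrame({'name': ['a', 'b', 'c'], 'x': [1, 2, 3]})
df2 = pd.DataFrame({'name': ['a', 'b', 'd'], 'y': [10, 20, 40]})
df3 = pd.DataFrame({'name': ['a', 'b', 'c'], 'z': [100, 200, 300]})

# Inner joins: 'c' is missing from df2 and 'd' from df1/df3, so both drop out
result = df1.merge(df2, on='name').merge(df3, on='name')
print(result)
```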

Python – Plotting – Legend

Posted on September 29, 2016 by go2analytics • Posted in Python • Tagged legend, plotting

For full control of what is being added to the legend, it is common to pass the appropriate handles directly to legend():

Calling legend() with the handles parameter

import matplotlib.pyplot as plt

line_up, = plt.plot([1,2,3], label='Line 2')
line_down, = plt.plot([3,2,1], label='Line 1')
# the label in the legend will be displayed as 'Line 2' & 'Line 1'
plt.legend(handles=[line_up, line_down])

Calling legend() explicitly with handles and labels

In some cases, it is not possible to set the label of the handle, so it is possible to pass through the list of labels to legend():

line_up, = plt.plot([1,2,3], label='Line 2')
line_down, = plt.plot([3,2,1], label='Line 1')
# the legend will display 'Line Up' & 'Line Down', overriding the labels set above
# call signature: legend(handles, labels)
plt.legend([line_up, line_down], ['Line Up', 'Line Down'])

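In the two examples above, each legend entry simply takes the color and line style of its plotted line; no color or marker is specified for the legend itself. To specify them explicitly: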

Creating artists specifically for adding to the legend

Not all handles can be turned into legend entries automatically, so it is often necessary to create an artist which can. Legend handles don’t have to exist on the Figure or Axes in order to be used.

import matplotlib.lines as mlines
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt

# use color as legend key
red_patch = mpatches.Patch(color='red', label='The red data')
blue_line = mlines.Line2D([], [], color='blue', marker='*', 
                           markersize=15, label='Blue stars')
plt.legend(handles=[red_patch, blue_line]) 
plt.show()

Legend Location

The location of the legend can be specified by the keyword argument loc.

The bbox_to_anchor keyword gives a great degree of control for manual legend placement.

import matplotlib.pyplot as plt


plt.subplot(211)
plt.plot([1,2,3], label="test1")
plt.plot([3,2,1], label="test2")
# Place a legend above this subplot, expanding itself to
# fully use the given bounding box.
plt.legend(bbox_to_anchor=(0., 1.02, 1., .102), loc=3,
           ncol=2, mode="expand", borderaxespad=0.)

plt.subplot(223)
plt.plot([1,2,3], label="test1")
plt.plot([3,2,1], label="test2")
# Place a legend to the right of this smaller subplot.
# or loc = 'best'
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

plt.show()
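The numeric loc codes used above are shorthand for named positions (0 is 'best', 2 is 'upper left', 3 is 'lower left'), and loc also accepts those names directly, which reads more clearly. A minimal sketch using the object-oriented API; the Agg backend is chosen only so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], label="test1")
ax.plot([3, 2, 1], label="test2")

# Equivalent to loc=2; labels are taken from the label= of each line
leg = ax.legend(loc="upper left")
print([t.get_text() for t in leg.get_texts()])  # ['test1', 'test2']
```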

Add multiple legends

Legend guide — Matplotlib 1.5.3 documentation

import matplotlib.pyplot as plt

line1, = plt.plot([1,2,3], label="Line 1", linestyle='--')
line2, = plt.plot([3,2,1], label="Line 2", linewidth=4)

# Create a legend for the first line.
first_legend = plt.legend(handles=[line1], loc=1)

# Add the legend manually to the current Axes.
# if this line is commented out, then ONLY the last legend will be displayed
plt.gca().add_artist(first_legend)

# Create another legend for the second line.
plt.legend(handles=[line2], loc=4)

plt.show()

An example of adding multiple legend in plotting ROC curve

import matplotlib.pyplot as plt

# df holds one FPR column and one TPR column per curve, plus CUTOFFID
# v_lines: list of (fpr_column, tpr_column) tuples, one per curve
# CalAUC(fpr, tpr) is a helper that is not shown in this post
def MultiROCGenerator2(df, v_lines, v_colors, v_linestyle, v_perf, v_fname):

    df = df.sort_values(by='CUTOFFID', ascending=True)

    patchList = []
    labelList = []

    plt.figure(figsize=(4, 4), dpi=64)

    plt.xlabel("FPR", fontsize=14)
    plt.ylabel("TPR", fontsize=14)
    plt.title("ROC Curve", fontsize=14)

    x = [0.0, 1.0]

    for index, measure in enumerate(v_lines):

        fpr_colName = measure[0]
        tpr_colName = measure[1]

        fpr = df[fpr_colName]
        tpr = df[tpr_colName]

        aucValue = CalAUC(fpr, tpr)
        aucLabel = 'AUC_' + v_perf[index] + ' = ' + '%.4f' % aucValue
        # create plot legend -- http://matplotlib.org/users/legend_guide.html
        # patch = mpatches.Patch(color=v_colors[index], label=aucLabel)
        # patchList.append(patch)

        # keep each line handle so the legend can pair it with its AUC label
        patch, = plt.plot(fpr, tpr, color=v_colors[index], linewidth=2, linestyle=v_linestyle)
        patchList.append(patch)
        labelList.append(aucLabel)

    plt.xlim(0.0, 1.0)
    plt.ylim(0.0, 1.0)
    # diagonal reference line; it is not in handles, so the legend leaves it out
    plt.plot(x, x, linestyle='dashed', color='red', linewidth=2, label='random')
    plt.legend(handles=patchList, labels=labelList, fontsize=10, loc='best')
    plt.tight_layout()
    # plt.show()
    plt.savefig(v_fname)



# calling the above function
mLines =  [('fpr_trx', 'recall_trx'),
           ('fpr_dlr', 'recall_dlr'),
           ('fpr_ent', 'recall_ent')]
mColors = ['blue', 'green', 'black']
mPerf   = ['transaction', 'dollarAmt', 'account'] #['label_1', 'label_2', 'label_3']

MultiROCGenerator2(df, v_lines=mLines, v_colors=mColors, v_linestyle='solid', 
                   v_perf=mPerf, v_fname="c:/temp/ROCtest2.png")
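The function above calls CalAUC, which the post does not define. A minimal sketch, assuming it computes the area under the ROC curve with the trapezoidal rule (this is our assumption; the author's version may differ):

```python
import numpy as np


def CalAUC(fpr, tpr):
    """Hypothetical AUC helper: trapezoidal area under the (fpr, tpr) curve."""
    fpr = np.asarray(fpr, dtype=float)
    tpr = np.asarray(tpr, dtype=float)
    # Integrate left to right along the FPR axis
    order = np.argsort(fpr, kind="stable")
    x, y = fpr[order], tpr[order]
    # Sum of trapezoids: segment width times mean segment height
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))


print(CalAUC([0.0, 1.0], [0.0, 1.0]))            # 0.5 (the random diagonal)
print(CalAUC([0.0, 0.0, 1.0], [0.0, 1.0, 1.0]))  # 1.0 (perfect classifier)
```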

References:

http://matplotlib.org/users/legend_guide.html

Python – changing a specific column name in DataFrame

Posted on September 29, 2016 by go2analytics • Posted in column, dataframe, Python • Tagged dataframe, sorting, tag

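Note that a DataFrame's column Index does NOT support mutable operations: assigning to a single element, such as df.columns[0] = 'new_name', raises a TypeError. The most elegant workaround I have found so far is to copy the names into a list, edit it, and assign it back: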

names = df.columns.tolist()
# given column index number, change name
names[0] = 'new_name'
# given original column name, change name
names[names.index('old_name')] = 'new_name'
df.columns = names

# sort DataFrame by column 
df = df.sort_values(by='col_name', ascending=True)
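Another option is DataFrame.rename(), which takes a dict mapping old names to new ones and leaves the other columns untouched. A small sketch with an illustrative DataFrame that also shows the sort:

```python
import pandas as pd

df = pd.DataFrame({'old_name': [3, 1, 2], 'other': ['c', 'a', 'b']})

# rename returns a new DataFrame; only the listed columns change
df = df.rename(columns={'old_name': 'new_name'})
print(df.columns.tolist())   # ['new_name', 'other']

# sort by the renamed column (ascending=True is the default)
df = df.sort_values(by='new_name')
print(df['other'].tolist())  # ['a', 'b', 'c']
```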

Other references:

https://chartio.com/resources/tutorials/how-to-rename-columns-in-the-pandas-python-library

Python – count indexes in FOR loop

Posted on September 29, 2016 by go2analytics • Posted in Python, Uncategorized • Tagged loop

If you have some given list, and want to iterate over its items and indices, you can use enumerate():

for index, item in enumerate(my_list):
    print(index, item)

If you only need the indices, you can use range():

for i in range(len(my_list)):
    print(i)
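enumerate() also accepts a start argument, which is handy for 1-based counts. A small sketch with an illustrative list:

```python
my_list = ['a', 'b', 'c']

for index, item in enumerate(my_list, start=1):
    print(index, item)   # 1 a, then 2 b, then 3 c

# enumerate is lazy; list() shows the (index, item) pairs it yields
pairs = list(enumerate(my_list, start=1))
print(pairs)             # [(1, 'a'), (2, 'b'), (3, 'c')]
```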

A Tour of Machine Learning Algorithms

Posted on September 22, 2016 by go2analytics • Posted in machine learning algorithm, SPSS Modeler • Tagged SPSS Modeling Nodes

A very good online resource summarizing machine learning algorithms.

A Tour of Machine Learning Algorithms

Also, it can be complemented by 10 commonly used ML algorithms.

Related to SPSS Modeler:

SPSS Modeler – Modeling Nodes (****)

Algorithms (new features) in SPSS Modeler 17 and BigData embracement (****)

R Plugin and GBM Package

New features in SPSS Modeler 18 (****)

Modeling Algorithms included in SPSS Modeler 18

New features in SPSS Modeler 18 and SPSS Statistics 24