Python – List & Operations

List append() Method

aList = [123, 'xyz', 'zara', 'abc']
aList.append(2009)
print("Updated List : ", aList)

result:

Updated List :  [123, 'xyz', 'zara', 'abc', 2009]
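
append() adds its argument as a single element, even when that argument is itself a list. A minimal sketch contrasting it with extend(); the list names and values here are illustrative, not from the example above:

```python
# append() adds one element; extend() adds each item of an iterable.
a_list = [123, 'xyz']
a_list.append([1, 2])      # the whole list becomes one nested element

b_list = [123, 'xyz']
b_list.extend([1, 2])      # the items are added one by one

print(a_list)  # [123, 'xyz', [1, 2]]
print(b_list)  # [123, 'xyz', 1, 2]
```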

 

List – reciprocal

Example:
FalseTrueRatio = range(1,26)
TrueFalseRatio = [1.0/x for x in FalseTrueRatio]
TrueFalseRatio

Ref: http://stackoverflow.com/questions/8244915/how-do-you-divide-each-element-in-a-list-by-an-int

result:

[1.0,  0.5,  0.3333333333333333,  0.25,  0.2, . . .]
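
The 1.0 numerator forces float division under Python 2; in Python 3 the / operator always performs true division. A small sketch with assumed sample values, which also guards against zeros (the original range starts at 1, so it never hits this case):

```python
ratios = [1, 2, 0, 4]   # sample values, assumed for illustration

# Reciprocal of each element; zeros map to None instead of raising ZeroDivisionError.
reciprocals = [1 / x if x != 0 else None for x in ratios]
print(reciprocals)  # [1.0, 0.5, None, 0.25]
```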

 

Finding index of an item closest to the value in a list

Example:
# given pcnList as a list; x_val as a given value
min(range(len(pcnList)), key=lambda i: abs(pcnList[i]-x_val))

 

Ref: http://stackoverflow.com/questions/9706041/finding-index-of-an-item-closest-to-the-value-in-a-list-thats-not-entirely-sort
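
A worked run of the one-liner above, using made-up sample data for pcnList and x_val. The list does not need to be sorted; on ties, min() returns the first index with the smallest distance.

```python
# Index of the element closest to a target value.
pcnList = [0.10, 0.52, 0.33, 0.95, 0.71]   # sample data, assumed for illustration
x_val = 0.6

closest_idx = min(range(len(pcnList)), key=lambda i: abs(pcnList[i] - x_val))
print(closest_idx, pcnList[closest_idx])   # 1 0.52
```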

See also: finding the nearest value in a NumPy array.
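
For NumPy arrays the same idea is usually written with argmin(). A minimal sketch, assuming a one-dimensional numeric array; the helper name find_nearest is hypothetical and not part of NumPy:

```python
import numpy as np

def find_nearest(values, target):
    """Return (index, value) of the element of `values` closest to `target`."""
    arr = np.asarray(values)
    idx = int(np.abs(arr - target).argmin())   # position of the smallest distance
    return idx, arr[idx]

idx, nearest = find_nearest([0.10, 0.52, 0.33, 0.95, 0.71], 0.6)
print(idx, nearest)
```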

 

Other List functions

  • updating list
  • deleting a list element, e.g. del list1[2]
  • Basic operations

 

Python Expression                        Results                         Description
len([1, 2, 3])                           3                               Length
[1, 2, 3] + [4, 5, 6]                    [1, 2, 3, 4, 5, 6]              Concatenation
['Hi!'] * 4                              ['Hi!', 'Hi!', 'Hi!', 'Hi!']    Repetition
3 in [1, 2, 3]                           True                            Membership
for x in [1, 2, 3]: print(x, end=' ')    1 2 3                           Iteration

 

  • Indexing, Slicing, and Matrixes

Example:

L = ['spam', 'Spam', 'SPAM!']
Python Expression    Results              Description
L[2]                 'SPAM!'              Offsets start at zero
L[-2]                'Spam'               Negative: count from the right
L[1:]                ['Spam', 'SPAM!']    Slicing fetches sections
  • Built-in List Functions & Methods:

Function             Description
cmp(list1, list2)    Compares elements of both lists (Python 2 only; cmp() was removed in Python 3).
len(list)            Gives the total length of the list.
max(list)            Returns the item from the list with the max value.
min(list)            Returns the item from the list with the min value.
list(seq)            Converts a tuple (or any other iterable) into a list.

Method                                Description
list.append(obj)                      Appends object obj to the list.
list.count(obj)                       Returns how many times obj occurs in the list.
list.extend(seq)                      Appends the contents of seq to the list.
list.index(obj)                       Returns the lowest index at which obj appears; raises ValueError if obj is absent.
list.insert(index, obj)               Inserts object obj into the list at offset index.
list.pop([index])                     Removes and returns the item at index (the last item if index is omitted).
list.remove(obj)                      Removes the first occurrence of obj from the list.
list.reverse()                        Reverses the objects of the list in place.
list.sort(key=None, reverse=False)    Sorts the list in place. Python 2 also accepted a compare function; Python 3 uses key instead.
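
A short run through several of these methods, as a sketch on a made-up sample list (the variable names are illustrative):

```python
# A short tour of a few list methods on a sample list.
langs = ['spam', 'Spam', 'SPAM!']

langs.insert(1, 'eggs')          # ['spam', 'eggs', 'Spam', 'SPAM!']
last = langs.pop()               # removes and returns the last item, 'SPAM!'
second = langs.pop(1)            # removes and returns the item at index 1, 'eggs'
langs.extend(['ham', 'Ham'])     # ['spam', 'Spam', 'ham', 'Ham']
langs.sort(key=str.lower)        # case-insensitive sort; equal keys keep their order
print(langs)                     # ['ham', 'Ham', 'spam', 'Spam']
print(langs.index('spam'), langs.count('Ham'))  # 2 1
```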

Python – string & substring

Get a string after a specific substring

Two ways:
The easiest way is probably just to split on your target word

# Printed output is shown in comments, quoted to make the
# leading/trailing spaces visible.

# 1st way -- using split
my_string = "hello python world , i'm a beginner "
print(my_string.split("world", 1)[1])
# prints " , i'm a beginner "

# With maxsplit=1, split() cuts only at the first occurrence of "world"
# and returns a two-element list: [text before, text after]
my_string.split("world", 1)
# ['hello python ', " , i'm a beginner "]

# 2nd way -- using index
s1 = "hello python world , i'm a beginner "
s2 = "world"
print(s1[s1.index(s2) + len(s2):])
# prints " , i'm a beginner "
print(s1[s1.index(s2):])
# prints "world , i'm a beginner "

Ref: http://stackoverflow.com/questions/12572362/get-a-string-after-a-specific-substring
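
Both ways above fail when the substring is missing: split(...)[1] raises IndexError and index() raises ValueError. A possible third way, sketched here as an alternative rather than taken from the referenced answer, is str.partition(), which never raises:

```python
# str.partition() splits at the first occurrence and returns
# (before, separator, after); the last two items are empty strings
# when the separator is not found.
my_string = "hello python world , i'm a beginner "

before, sep, after = my_string.partition("world")
print(repr(after))

# A missing substring yields an empty `after` instead of an exception.
_, sep_missing, after_missing = my_string.partition("java")
print(repr(sep_missing), repr(after_missing))
```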

 




 

Python – combine columns into data frame

Combine lists into data frame

import pandas as pd

lst1 = list(range(100))
lst2 = list(range(100))
lst3 = list(range(100))
# The dict keys become the column names, so no separate
# `columns` argument is needed.
percentile_list = pd.DataFrame({'lst1Title': lst1,
                                'lst2Title': lst2,
                                'lst3Title': lst3})

percentile_list
    lst1Title  lst2Title  lst3Title
0           0          0          0
1           1          1          1
2           2          2          2
3           3          3          3
4           4          4          4
5           5          5          5
6           6          6          6
...

 

Combine multiple data frames into one

# merge three data frames (myDF, myDF2, myDF3), using common columns
finalCutDF = myDF.merge(myDF2, on=['FalseTrueRatio', 'TrueFalseRatio']).merge(
                myDF3, on=['FalseTrueRatio', 'TrueFalseRatio'])
finalCutDF

# if merge data frames using one common column
df1.merge(df2,on='name').merge(df3,on='name')
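
A self-contained sketch of the chained merge, using three made-up frames. Note that merge() defaults to an inner join, so rows whose key is missing from any frame are dropped; the how='outer' variant is shown for comparison.

```python
import pandas as pd

# Three small frames sharing a 'name' column (sample data, assumed for illustration).
df1 = pd.DataFrame({'name': ['a', 'b', 'c'], 'x': [1, 2, 3]})
df2 = pd.DataFrame({'name': ['a', 'b', 'd'], 'y': [10, 20, 40]})
df3 = pd.DataFrame({'name': ['a', 'b', 'c'], 'z': [100, 200, 300]})

# Inner join (the default): only names present in every frame survive.
merged = df1.merge(df2, on='name').merge(df3, on='name')
print(merged)

# how='outer' keeps all names and fills the gaps with NaN.
merged_outer = df1.merge(df2, on='name', how='outer')
print(merged_outer)
```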

 

Python – Plotting – Legend

For full control of what is being added to the legend, it is common to pass the appropriate handles directly to legend():

Calling legend() with the handles parameter

import matplotlib.pyplot as plt

line_up, = plt.plot([1,2,3], label='Line 2')
line_down, = plt.plot([3,2,1], label='Line 1')
# the label in the legend will be displayed as 'Line 2' & 'Line 1'
plt.legend(handles=[line_up, line_down])

Calling legend() explicitly with handles and labels

In some cases, it is not possible to set the label of the handle, so it is possible to pass through the list of labels to legend():

line_up, = plt.plot([1,2,3], label='Line 2')
line_down, = plt.plot([3,2,1], label='Line 1')
# the label in the legend will be displayed as 'Line Up' & 'Line Down'
# function syntax --- legend(handles, labels)
plt.legend([line_up, line_down], ['Line Up', 'Line Down'])

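The legend entries in the two examples above take their color and marker from the plotted lines. To define an entry with its own color or marker, create a dedicated artist, as described in the next section.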

Creating artists specifically for adding to the legend

Not all handles can be turned into legend entries automatically, so it is often necessary to create an artist which can. Legend handles don't have to exist on the Figure or Axes in order to be used.

 

import matplotlib.lines as mlines
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt

# use color as legend key
red_patch = mpatches.Patch(color='red', label='The red data')
blue_line = mlines.Line2D([], [], color='blue', marker='*', 
                           markersize=15, label='Blue stars')
plt.legend(handles=[red_patch, blue_line]) 
plt.show()

Legend Location

The location of the legend can be specified by the keyword argument loc.

The bbox_to_anchor keyword gives a great degree of control for manual legend placement.

 

import matplotlib.pyplot as plt


plt.subplot(211)
plt.plot([1,2,3], label="test1")
plt.plot([3,2,1], label="test2")
# Place a legend above this subplot, expanding itself to
# fully use the given bounding box.
plt.legend(bbox_to_anchor=(0., 1.02, 1., .102), loc=3,
           ncol=2, mode="expand", borderaxespad=0.)

plt.subplot(223)
plt.plot([1,2,3], label="test1")
plt.plot([3,2,1], label="test2")
# Place a legend to the right of this smaller subplot.
# or loc = 'best'
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

plt.show()

 

Add multiple legends

Ref: Legend guide, Matplotlib 1.5.3 documentation

import matplotlib.pyplot as plt

line1, = plt.plot([1,2,3], label="Line 1", linestyle='--')
line2, = plt.plot([3,2,1], label="Line 2", linewidth=4)

# Create a legend for the first line.
first_legend = plt.legend(handles=[line1], loc=1)

# Add the legend manually to the current Axes.
# if this line is commented out, then ONLY the last legend will be displayed
ax = plt.gca().add_artist(first_legend)

# Create another legend for the second line.
plt.legend(handles=[line2], loc=4)

plt.show()

An example of adding multiple legend entries when plotting ROC curves

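import matplotlib.pyplot as plt

# df          -- DataFrame containing the FPR and TPR columns (and CUTOFFID)
# v_lines     -- list of (FPR column name, TPR column name) pairs, one per curve
# v_colors    -- list of line colors, one per curve
# v_linestyle -- line style shared by all curves
# v_perf      -- list of names used in the AUC legend labels, one per curve
# v_fname     -- path of the image file to save
# CalAUC(fpr, tpr) is a helper that is not defined in this snippet.
def MultiROCGenerator2(df, v_lines, v_colors, v_linestyle, v_perf, v_fname):

    df = df.sort_values(by='CUTOFFID', ascending=True)

    patchList = []
    labelList = []

    plt.figure(figsize=(4,4), dpi=64)

    plt.xlabel("FPR", fontsize=14)
    plt.ylabel("TPR", fontsize=14)
    plt.title("ROC Curve", fontsize=14)

    x = [0.0, 1.0]

    for index, measure in enumerate(v_lines):

        fpr_colName = measure[0]
        tpr_colName = measure[1]

        fpr = df[fpr_colName]
        tpr = df[tpr_colName]

        aucValue = CalAUC(fpr, tpr)
        aucLabel = 'AUC_' + v_perf[index] + ' = ' + '%.4f' % aucValue
        # create plot legend -- http://matplotlib.org/users/legend_guide.html
        # patch = mpatches.Patch(color=v_colors[index], label=aucLabel)
        # patchList.append(patch)

        patch, = plt.plot(fpr, tpr, color=v_colors[index], linewidth=2, linestyle=v_linestyle)
        patchList.append(patch)
        labelList.append(aucLabel)

    plt.xlim(0.0, 1.0)
    plt.ylim(0.0, 1.0)
    # the diagonal is the reference line of a random classifier; it is not
    # listed in the legend because only patchList is passed as handles
    plt.plot(x, x, linestyle='dashed', color='red', linewidth=2, label='random')
    plt.legend(handles=patchList, labels=labelList, fontsize=10, loc='best')
    plt.tight_layout()
    #plt.show()
    plt.savefig(v_fname)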



# calling the above function
mLines =  [('fpr_trx', 'recall_trx'),
           ('fpr_dlr', 'recall_dlr'),
           ('fpr_ent', 'recall_ent')]
mColors = ['blue', 'green', 'black']
mPerf   = ['transaction', 'dollarAmt', 'account'] #['label_1', 'label_2', 'label_3']

MultiROCGenerator2(df, v_lines=mLines, v_colors=mColors, v_linestyle='solid', 
                   v_perf=mPerf, v_fname="c:/temp/ROCtest2.png")
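
MultiROCGenerator2 calls CalAUC, which is not defined above. A minimal sketch of what such a helper might look like, assuming it computes the area under the ROC curve with the trapezoidal rule; the name cal_auc and this implementation are assumptions, not the original helper:

```python
def cal_auc(fpr, tpr):
    """Area under a curve given as (fpr, tpr) points, by the trapezoidal rule.

    Hypothetical stand-in for the CalAUC helper; accepts lists or pandas Series.
    """
    # Pair the points and sort by FPR so the trapezoids run left to right.
    points = sorted(zip(list(fpr), list(tpr)))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

print(cal_auc([0.0, 0.5, 1.0], [0.0, 0.5, 1.0]))  # diagonal (random classifier)
print(cal_auc([0.0, 0.0, 1.0], [0.0, 1.0, 1.0]))  # perfect classifier
```

The curve should include the end points (0, 0) and (1, 1); otherwise the area is computed only over the FPR range that is present.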

 

References:

http://matplotlib.org/users/legend_guide.html

Python – changing a specific column name in DataFrame

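Note that a pandas Index does NOT support mutable operations, so a single column label cannot be assigned in place (for example, df.columns[0] = 'new_name' raises a TypeError). The most elegant solution I have found so far is to copy the names to a list, edit the list, and assign it back: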

names = df.columns.tolist()
# given column index number, change name
names[0] = 'new_name'
# given original column name, change name
names[names.index('old_name')] = 'new_name'
df.columns = names

# sort DataFrame by column 
df = df.sort_values(by='col_name', ascending=True)

Other references:

https://chartio.com/resources/tutorials/how-to-rename-columns-in-the-pandas-python-library 
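
An alternative worth noting is DataFrame.rename(), which maps old names to new ones without touching the other columns. A minimal sketch on a made-up frame (the column names are placeholders):

```python
import pandas as pd

df = pd.DataFrame({'old_name': [1, 2], 'other': [3, 4]})  # sample data for illustration

# rename() returns a new DataFrame; columns not in the mapping are left as they are.
df = df.rename(columns={'old_name': 'new_name'})
print(df.columns.tolist())  # ['new_name', 'other']
```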

A Tour of Machine Learning Algorithms

A very good online resource summarizing machine learning algorithms.

A Tour of Machine Learning Algorithms 

 

Also, it can be complemented by 10 commonly used ML algorithms.

 

Related to SPSS Modeler:

SPSS Modeler – Modeling Nodes (****)

Algorithms (new features) in SPSS Modeler 17 and BigData embracement (****)

R Plugin and GBM Package

New features in SPSS Modeler 18 (****)

Modeling Algorithms included in SPSS Modeler 18

New features in SPSS Modeler 18 and SPSS Statistics 24