Python function parameters
Python – List & Operations
List append() Method
aList = [123, 'xyz', 'zara', 'abc']
aList.append(2009)
print("Updated List :", aList)
result:
Updated List : [123, 'xyz', 'zara', 'abc', 2009]
List – reciprocal
Example:
FalseTrueRatio = range(1,26)
TrueFalseRatio = [1.0/x for x in FalseTrueRatio]
TrueFalseRatio
Ref: http://stackoverflow.com/questions/8244915/how-do-you-divide-each-element-in-a-list-by-an-int
result:
Finding index of an item closest to the value in a list
Example:
# given pcnList as a list; x_val as a given value
min(range(len(pcnList)), key=lambda i: abs(pcnList[i]-x_val))
find nearest value in numpy array
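A minimal sketch of the closest-index idiom above, wrapped in a helper for reuse. The function name `closest_index` and the sample data are assumptions made for illustration; when two elements are equally close, `min` returns the first one.

```python
def closest_index(values, target):
    """Return the index of the element of `values` nearest to `target`."""
    return min(range(len(values)), key=lambda i: abs(values[i] - target))

# hypothetical sample data
pcnList = [0.10, 0.35, 0.50, 0.80]
x_val = 0.42

idx = closest_index(pcnList, x_val)
print(idx, pcnList[idx])
```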
Other List functions
- updating list
- deleting list elements, e.g. del list1[2]
- Basic operations
Python Expression | Results | Description |
---|---|---|
len([1, 2, 3]) | 3 | Length |
[1, 2, 3] + [4, 5, 6] | [1, 2, 3, 4, 5, 6] | Concatenation |
['Hi!'] * 4 | ['Hi!', 'Hi!', 'Hi!', 'Hi!'] | Repetition |
3 in [1, 2, 3] | True | Membership |
for x in [1, 2, 3]: print(x, end=' ') | 1 2 3 | Iteration |
- Indexing, Slicing, and Matrixes
Example:
L = ['spam', 'Spam', 'SPAM!']
Python Expression | Results | Description |
---|---|---|
L[2] | 'SPAM!' | Offsets start at zero |
L[-2] | 'Spam' | Negative: count from the right |
L[1:] | ['Spam', 'SPAM!'] | Slicing fetches sections |
- Built-in List Functions & Methods:
Function | Description |
---|---|
cmp(list1, list2) | Compares elements of both lists (Python 2 only; removed in Python 3) |
len(list) | Gives the total length of the list |
max(list) | Returns the item from the list with the max value |
min(list) | Returns the item from the list with the min value |
list(seq) | Converts a tuple (or any other sequence) into a list |

Method | Description |
---|---|
list.append(obj) | Appends object obj to list |
list.count(obj) | Returns count of how many times obj occurs in list |
list.extend(seq) | Appends the contents of seq to list |
list.index(obj) | Returns the lowest index in list at which obj appears |
list.insert(index, obj) | Inserts object obj into list at offset index |
list.pop([index]) | Removes and returns the item at index (the last item by default) |
list.remove(obj) | Removes the first occurrence of obj from list |
list.reverse() | Reverses objects of list in place |
list.sort([func]) | Sorts objects of list in place; Python 2 accepts a compare func, Python 3 takes key= and reverse= instead |
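A short sketch exercising several of the functions and methods listed above. It assumes Python 3, where the built-in `cmp()` and compare-function sorting no longer exist; the local `cmp` helper and the sample lists are assumptions made for illustration, and `functools.cmp_to_key` adapts an old-style compare function to the `key=` argument.

```python
from functools import cmp_to_key

def cmp(a, b):
    """Python 3 stand-in for the removed built-in cmp()."""
    return (a > b) - (a < b)

nums = [3, 1, 2]
nums.append(4)        # add a single object at the end
nums.extend([5, 6])   # add every item of another sequence
last = nums.pop()     # remove and return the last item
first = nums.pop(0)   # remove and return the item at index 0
nums.reverse()        # reverse in place

cmp_result = cmp([1, 2, 3], [1, 2, 4])   # negative: first list sorts lower

words = ['pear', 'fig', 'banana']
words.sort(key=len)   # Python 3 style: sort by a key function
# old-style compare function (longest first), adapted with cmp_to_key
by_len_desc = sorted(words, key=cmp_to_key(lambda a, b: cmp(len(b), len(a))))

print(nums, last, first, cmp_result, words, by_len_desc)
```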
Python – string & substring
Get a string after a specific substring
Two ways:
The easiest way is probably just to split on your target word
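# 1st way -- using split
my_string = "hello python world , i'm a beginner "
print(my_string.split("world", 1)[1])
# the full split result is a two-item list: [text before, text after]
my_string.split("world", 1)
# 2nd way -- using index
s1 = "hello python world , i'm a beginner "
s2 = "world"
# text after the substring
print(s1[s1.index(s2) + len(s2):])
# text from the start of the substring onwards (substring included)
print(s1[s1.index(s2):])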
http://stackoverflow.com/questions/12572362/get-a-string-after-a-specific-substring
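Both approaches above raise an exception when the substring is missing (`IndexError` for the split version, `ValueError` for `index`). As a possible third option, `str.partition` returns an empty separator in that case, so the absence can be checked without a try/except. The helper name `text_after` is an assumption made for illustration.

```python
def text_after(s, marker):
    """Return the text following the first `marker` in `s`, or None if absent."""
    before, sep, after = s.partition(marker)
    return after if sep else None

s1 = "hello python world , i'm a beginner "
print(repr(text_after(s1, "world")))
print(text_after(s1, "java"))
```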
Python – combine columns into data frame
Combine lists into data frame
import pandas as pd
lst1 = range(100)
lst2 = range(100)
lst3 = range(100)
percentile_list = pd.DataFrame({'lst1Tite' : lst1,
                                'lst2Tite' : lst2,
                                'lst3Tite' : lst3
                               })
# you don't need to specify the column names when you're creating a dataframe
# from a dict like this
percentile_list
lst1Tite lst2Tite lst3Tite
0 0 0 0
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5
6 6 6 6
...
Combine multiple data frames into one
# merge three data frames (myDF, myDF2, myDF3), using common columns
finalCutDF = myDF.merge(myDF2, on=['FalseTrueRatio', 'TrueFalseRatio']).merge(
myDF3, on=['FalseTrueRatio', 'TrueFalseRatio'])
finalCutDF
# to merge data frames on a single common column
df1.merge(df2,on='name').merge(df3,on='name')
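Note that `merge` performs an inner join by default, so a row survives the chain only if its key appears in every frame. The sketch below, using small made-up frames, folds a list of frames with `functools.reduce` and contrasts the default with `how='outer'`; the frame contents and column names are assumptions made for illustration.

```python
from functools import reduce

import pandas as pd

# hypothetical frames sharing the key column 'name'
df1 = pd.DataFrame({'name': ['a', 'b', 'c'], 'x': [1, 2, 3]})
df2 = pd.DataFrame({'name': ['a', 'b'], 'y': [10, 20]})
df3 = pd.DataFrame({'name': ['b', 'c'], 'z': [100, 200]})
frames = [df1, df2, df3]

# default inner join: only keys present in all three frames remain
inner = reduce(lambda left, right: left.merge(right, on='name'), frames)

# outer join: keep every key, filling missing values with NaN
outer = reduce(lambda left, right: left.merge(right, on='name', how='outer'), frames)

print(inner)
print(outer)
```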
Python – Plotting – Legend
For full control of what is being added to the legend, it is common to pass the appropriate handles directly to legend():
Calling legend() with the handles parameter
import matplotlib.pyplot as plt
line_up, = plt.plot([1,2,3], label='Line 2')
line_down, = plt.plot([3,2,1], label='Line 1')
# the label in the legend will be displayed as 'Line 2' & 'Line 1'
plt.legend(handles=[line_up, line_down])
Calling legend() explicitly with the handles and labels parameters
In some cases, it is not possible to set the label of the handle, so it is possible to pass through the list of labels to legend():
line_up, = plt.plot([1,2,3], label='Line 2')
line_down, = plt.plot([3,2,1], label='Line 1')
# the label in the legend will be displayed as 'Line Up' & 'Line Down'
# function syntax --- legend(handles, labels)
plt.legend([line_up, line_down], ['Line Up', 'Line Down'])
The two examples above do NOT set the color or marker of a legend entry explicitly. To do that, create an artist specifically for the legend, as described next.
Creating artists specifically for adding to the legend
Not all handles can be turned into legend entries automatically, so it is often necessary to create an artist which can. Legend handles don’t have to exist on the Figure or Axes in order to be used.
import matplotlib.lines as mlines
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
# use color as legend key
red_patch = mpatches.Patch(color='red', label='The red data')
blue_line = mlines.Line2D([], [], color='blue', marker='*',
                          markersize=15, label='Blue stars')
plt.legend(handles=[red_patch, blue_line])
plt.show()
Legend Location
The location of the legend can be specified by the keyword argument loc.
The bbox_to_anchor keyword gives a great degree of control for manual legend placement.
import matplotlib.pyplot as plt
plt.subplot(211)
plt.plot([1,2,3], label="test1")
plt.plot([3,2,1], label="test2")
# Place a legend above this subplot, expanding itself to
# fully use the given bounding box.
plt.legend(bbox_to_anchor=(0., 1.02, 1., .102), loc=3,
           ncol=2, mode="expand", borderaxespad=0.)
plt.subplot(223)
plt.plot([1,2,3], label="test1")
plt.plot([3,2,1], label="test2")
# Place a legend to the right of this smaller subplot.
# or loc = 'best'
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.show()
Add multiple legends
Legend guide — Matplotlib 1.5.3 documentation
import matplotlib.pyplot as plt
line1, = plt.plot([1,2,3], label="Line 1", linestyle='--')
line2, = plt.plot([3,2,1], label="Line 2", linewidth=4)
# Create a legend for the first line.
first_legend = plt.legend(handles=[line1], loc=1)
# Add the legend manually to the current Axes.
# if this line is commented out, then ONLY the last legend will be displayed
ax = plt.gca().add_artist(first_legend)
# Create another legend for the second line.
plt.legend(handles=[line2], loc=4)
plt.show()
An example of adding multiple legend entries when plotting ROC curves
import matplotlib.pyplot as plt

# df contains the FPR and TPR columns
# v_lines is a list of (FPR column name, TPR column name) pairs
def MultiROCGenerator2(df, v_lines, v_colors, v_linestyle, v_perf, v_fname):
    df = df.sort_values(by='CUTOFFID', ascending=True)
    patchList = []
    labelList = []
    plt.figure(figsize=(4,4), dpi=64)
    plt.xlabel("FPR", fontsize=14)
    plt.ylabel("TPR", fontsize=14)
    plt.title("ROC Curve", fontsize=14)
    x = [0.0, 1.0]
    for index, measure in enumerate(v_lines):
        fpr_colName = measure[0]
        tpr_colName = measure[1]
        fpr = df[fpr_colName]
        tpr = df[tpr_colName]
        # CalAUC is a helper defined elsewhere (not shown here)
        aucValue = CalAUC(fpr, tpr)
        aucLabel = 'AUC_' + v_perf[index] + ' = ' + '%.4f' % aucValue
        # create plot legend -- http://matplotlib.org/users/legend_guide.html
        # patch = mpatches.Patch(color=v_colors[index], label=aucLabel)
        # patchList.append(patch)
        patch, = plt.plot(fpr, tpr, color=v_colors[index], linewidth=2, linestyle=v_linestyle)
        patchList.append(patch)
        labelList.append(aucLabel)
    plt.xlim(0.0, 1.0)
    plt.ylim(0.0, 1.0)
    plt.plot(x, x, linestyle='dashed', color='red', linewidth=2, label='random')
    plt.legend(handles=patchList, labels=labelList, fontsize=10, loc='best')
    plt.tight_layout()
    #plt.show()
    plt.savefig(v_fname)

# calling the above function
mLines = [('fpr_trx', 'recall_trx'), ('fpr_dlr', 'recall_dlr'), ('fpr_ent', 'recall_ent')]
mColors = ['blue', 'green', 'black']
mPerf = ['transaction', 'dollarAmt', 'account']  #['label_1', 'label_2', 'label_3']
MultiROCGenerator2(df, v_lines=mLines, v_colors=mColors, v_linestyle='solid',
                   v_perf=mPerf, v_fname="c:/temp/ROCtest2.png")
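The ROC example calls a helper named CalAUC whose definition is not included in these notes. One plausible way to compute such a value is the trapezoidal rule over the (FPR, TPR) points; the sketch below is an assumption about what that helper might do, not the original implementation, and the name `cal_auc_sketch` is hypothetical.

```python
def cal_auc_sketch(fpr, tpr):
    """Approximate the area under an ROC curve with the trapezoidal rule.

    Hypothetical stand-in for the CalAUC helper, which is not shown above.
    `fpr` and `tpr` are equal-length sequences of rates in [0, 1].
    """
    # order the points by FPR (then TPR) so the curve runs left to right
    points = sorted(zip(fpr, tpr))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # area of one trapezoid between two consecutive points
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# a diagonal ROC curve corresponds to a random classifier
random_auc = cal_auc_sketch([0.0, 0.5, 1.0], [0.0, 0.5, 1.0])
# a curve through the top-left corner corresponds to a perfect classifier
perfect_auc = cal_auc_sketch([0.0, 0.0, 1.0], [0.0, 1.0, 1.0])
print(random_auc, perfect_auc)
```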
Python – changing a specific column name in DataFrame
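Note that a pandas Index does NOT support mutable operations, so assigning to a single entry of df.columns raises a TypeError. The most elegant solution I have found so far is to edit a list copy of the names and assign it back: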
names = df.columns.tolist()
# given column index number, change name
names[0] = 'new_name'
# alternatively, given original column name, change name
names[names.index('old_name')] = 'new_name'
df.columns = names
# sort DataFrame by column
df = df.sort_values(by='col_name', ascending=True)
Other references:
https://chartio.com/resources/tutorials/how-to-rename-columns-in-the-pandas-python-library
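pandas also provides `DataFrame.rename`, which maps old column names to new ones and by default returns a new frame instead of modifying the original. A minimal sketch, with a made-up frame and column names chosen for illustration:

```python
import pandas as pd

# hypothetical frame with a column to rename
df = pd.DataFrame({'old_name': [3, 1, 2], 'other': ['c', 'a', 'b']})

# rename by original column name
renamed = df.rename(columns={'old_name': 'new_name'})

# rename by position: look the current name up first
renamed_by_pos = df.rename(columns={df.columns[1]: 'label'})

print(list(renamed.columns))
print(list(renamed_by_pos.columns))
```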
Python – count indexes in FOR loop
If you have some given list, and want to iterate over its items and indices, you can use enumerate():
for index, item in enumerate(my_list):
    print(index, item)
If you only need the indices, you can use range():
for i in range(len(my_list)):
    print(i)
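If the count should begin at a number other than zero (for example, 1-based positions in a report), `enumerate()` accepts a `start` argument. A small sketch with a made-up list:

```python
my_list = ['a', 'b', 'c']   # hypothetical sample data

# pairs of (position, item), counting from 1 instead of 0
numbered = list(enumerate(my_list, start=1))

for position, item in enumerate(my_list, start=1):
    print(position, item)
```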
A Tour of Machine Learning Algorithms
A very good online resource summarizing machine learning algorithms.
A Tour of Machine Learning Algorithms
Also, it can be complemented by 10 commonly used ML algorithms.
Related to SPSS Modeler:
SPSS Modeler – Modeling Nodes (****)
Algorithms (new features) in SPSS Modeler 17 and Big Data embracement (****)
New features in SPSS Modeler 18 (****)
Modeling Algorithms included in SPSS Modeler 18
New features in SPSS Modeler 18 and SPSS Statistics 24