Python

Basic

Basic - Basic Operations

Python division

# python2
10 / 3 = 3
10 / 3.0 = 3.333333
10 // 3 = 3
10 // 3.0 = 3.0 # floored, but the result is still a float

# python3
10 / 3 = 3.333333
10 / 3.0 = 3.333333
10 // 3 = 3
10 // 3.0 = 3.0 # floored, but the result is still a float
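
The floor in `//` matters most with negative operands, where flooring and truncating give different answers. A minimal sketch, written so that it behaves the same on Python 2 and Python 3 (the variable names are only for illustration):

```python
import math

floor_pos = 10 // 4                 # 2
floor_neg = -10 // 4                # -3: // floors toward negative infinity
truncated = int(-10 / 4.0)          # -2: int() truncates toward zero instead
quot, rem = divmod(-10, 4)          # (-3, 2): quot * 4 + rem == -10
same_as_floor = math.floor(-10 / 4.0) == floor_neg   # True
```

`divmod` is convenient when both the quotient and the remainder are needed in one step.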

Check a variable's type
```
type()
```

ord, chr, unichr

chr(65) = 'A'
ord('a') = 97
unichr(12345) = u'\u3039'

Basic - eval, exec

An explanation

http://www.mojidong.com/python/2013/05/10/python-exec-eval/

Bit manipulation

Bit manipulation - Base conversion

Convert decimal to binary, octal, and hexadecimal

bin(); oct(); hex()
60 = 0b111100 = 0o74 = 0x3c

Convert other bases to decimal
```
int('101',2) = 5
int('17',8) = 15
```
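
A round trip makes the two directions easier to remember. This is only a small sketch; `format` is used here as one way to get zero-padded digits without the `0b` prefix:

```python
n = 60
as_bin = bin(n)              # '0b111100'
as_hex = hex(n)              # '0x3c'
as_bits = format(n, '08b')   # '00111100': zero-padded to 8 digits, no prefix
back = int(as_bin, 2)        # 60: int() accepts the '0b' prefix when base is 2
from_hex = int('3c', 16)     # 60
```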

Bit manipulation - Bitwise operations

Binary operators
- &: bitwise AND
- |: bitwise OR
- ^: bitwise XOR
- ~: bitwise NOT (inversion)
- <<: left shift
- >>: right shift
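
The operators above, applied to two small values so the bit patterns are easy to check by hand (a sketch; the names are illustrative):

```python
a, b = 0b1100, 0b1010   # 12 and 10

and_ = a & b    # 0b1000 = 8
or_ = a | b     # 0b1110 = 14
xor = a ^ b     # 0b0110 = 6
inv = ~a        # -13, since ~x == -x - 1 for Python ints
shl = a << 2    # 48, same as a * 4
shr = a >> 2    # 3, same as a // 4

# A common idiom: test whether bit k is set
k = 3
bit_k_set = ((a >> k) & 1) == 1   # True: bit 3 of 0b1100 is 1
```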

List

List - Sort

Two methods

a.sort() # sorts a in place, returns None
a = sorted(a, reverse = True) # returns a new sorted list

List - Counter

The count method

a = [1,1,1,'1']
a.count(1) = 3
a.count('1') = 1

Counting with collections.Counter

http://www.zlovezl.cn/articles/collections-in-python/

from collections import Counter
s = '''A Counter i....'''.lower()

c = Counter(s)
# get the 5 most frequent characters
print c.most_common(5)

# Result:
[(' ', 54), ('e', 32), ('s', 25), ('a', 24), ('t', 24)]
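
Counter works on any iterable of hashable items, not only on characters. A small sketch with a made-up word list:

```python
from collections import Counter

words = 'the cat and the hat and the bat'.split()
c = Counter(words)

top_two = c.most_common(2)   # [('the', 3), ('and', 2)]
missing = c['dog']           # 0: missing keys count as zero, no KeyError
c.update(['cat', 'cat'])     # add more observations; c['cat'] is now 3
```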

List - Join, Split

http://wangwei007.blog.51cto.com/68019/1100587

li = ['my','name','is','bob']
'_'.join(li) = 'my_name_is_bob'
'a  b'.split(' ') = ['a','','b'] # two spaces in the middle
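
The empty string in that result comes from treating every single space as a separator. Calling `split()` with no argument avoids it, as this short sketch shows:

```python
s = 'a  b'                   # two spaces between a and b
by_space = s.split(' ')      # ['a', '', 'b']: every single space is a separator
by_whitespace = s.split()    # ['a', 'b']: runs of whitespace collapse
joined = '_'.join(str(n) for n in [1, 2, 3])  # '1_2_3': join needs strings
```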

List - Extended Slices

Official explanation

https://docs.python.org/2/whatsnew/2.3.html#extended-slices

L = range(11) # L = [0,1,2,3,4,5,6,7,8,9,10]
L[::2] # = [0,2,4,6,8,10]
L[0:10:2] # = [0,2,4,6,8]
L[::-1] # = [10,9,8,7,6,5,4,3,2,1,0]

Get a specific index range

# list
a = [1,2,3,4,5,6]
a[:3] = [1,2,3]
a[3:] = [4,5,6]
a[-2:] = [5,6]

Reverse

a = [1,2,3]
a[::-1] = [3,2,1] # returns a new reversed list
a.reverse() # reverses a in place

List - copy

Ref: http://www.jb51.net/article/64030.htm

Normally, = only creates an alias (a second reference to the same list)
```
# wrong 
a = [1,2,3,4,5]
b = a
b.append(6)
print a # [1,2,3,4,5,6]
```
In most cases, either of the following copies a list object (a shallow copy)
```
# one possible ans
b = a[:]
b.append(6)
print a # [1,2,3,4,5]

# another possible ans
b = list(a)
```
When a list contains other lists, use copy.deepcopy(), because a shallow copy still shares the inner lists
```
import copy
a = [[1,2,3], [2,3,4]]
b = copy.deepcopy(a)
```
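
To see why the shallow copy is not enough for nested lists, mutate an inner list and compare the two copies. A minimal sketch:

```python
import copy

a = [[1, 2, 3], [2, 3, 4]]
shallow = a[:]              # new outer list, same inner lists
deep = copy.deepcopy(a)     # new outer list and new inner lists

a[0].append(99)             # mutate an inner list through a

print(shallow[0])           # [1, 2, 3, 99]: shares the inner list with a
print(deep[0])              # [1, 2, 3]: unaffected
```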

List - Traverse

Basic:

>>> nums = [6, 7, 8, 9, 10]
>>> for i in nums:
...     print i,
6 7 8 9 10

with index using enumerate:

>>> nums = [6, 7, 8, 9, 10]
>>> for idx, n in enumerate(nums):
...     print idx, n
0 6
1 7
2 8
3 9
4 10

List - Special Usage

range usage

range(10) == range(0,10) == range(0,10,1) == [0,1,2,3,4,5,6,7,8,9]
range(10,0,-1) == [10,9,8,7,6,5,4,3,2,1] == range(1,11)[::-1]

Get the last element
```
a = [1,2,3,4]
a[-1] = 4
```

Get the index of an element

# a.index(obj[, start_search_index])
>>> a = [1,2,3,4,3,2,1]
>>> a.index(3)      # 2
>>> a.index(3, 2)   # 2
>>> a.index(3, 3)   # 4
>>> a.index(5)      # raises ValueError; the program exits if it is not caught
>>> a.index(max(a)) # Get max obj index

List - Bisect

Official Intro: https://docs.python.org/2/library/bisect.html
- A handy module that inserts a value into a sorted list at the correct position, keeping it sorted
- Note: the list must already be sorted in ascending order

Insert:

import bisect as bi

# insert to the left of any existing equal values
bi.insort_left(l, val) # bi.insort_left(l, val, lo=0, hi=len(l))
l.insert(bi.bisect_left(l, val), val) # same as above

# These two are equivalent: insert to the right of any existing equal values
bi.insort_right(l, val)
bi.insort(l, val)

Find the insertion position

import bisect as bi

# return the index where val would be inserted (left of equal values)
bi.bisect_left(l, val) # bi.bisect_left(l, val, lo=0, hi=len(l))

# These two are equivalent: return the index where val would be inserted (right of equal values)
bi.bisect_right(l, val)
bi.bisect(l, val)
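
The left and right variants only differ when the value is already in the list. A short sketch with duplicates; `contains` is a hypothetical helper built on `bisect_left`, not part of the module:

```python
import bisect

scores = [10, 20, 20, 30]                # must already be sorted ascending

left = bisect.bisect_left(scores, 20)    # 1: before the existing 20s
right = bisect.bisect_right(scores, 20)  # 3: after the existing 20s
bisect.insort(scores, 25)                # scores is now [10, 20, 20, 25, 30]

def contains(sorted_list, val):
    """Binary-search membership test (hypothetical helper)."""
    i = bisect.bisect_left(sorted_list, val)
    return i < len(sorted_list) and sorted_list[i] == val
```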

Generator

Generator - Basic Usage

Using ()

 >>> L = [x * x for x in range(10)]
 >>> L
 [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
 >>> g = (x * x for x in range(10))
 >>> g
 <generator object <genexpr> at 0x104feab40>

 >>> for n in g:
 ...     print n,
 ...
 0 1 4 9 16 25 36 49 64 81

A yield example

 # An example
 def rev_str(my_str):  
     length = len(my_str)  
     for i in range(length - 1,-1,-1):  
         yield my_str[i]
 for char in rev_str("hello"):  
     print(char), # o l l e h
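
Two properties of generators are worth seeing in code: values are produced lazily, so a generator can be unbounded, and a generator can be consumed only once. A sketch, where `squares` is a made-up example function:

```python
from itertools import islice

def squares():
    """Yield 0, 1, 4, 9, ... without end (hypothetical example generator)."""
    n = 0
    while True:
        yield n * n
        n += 1

first_five = list(islice(squares(), 5))   # [0, 1, 4, 9, 16]

g = (x * x for x in range(3))
once = list(g)     # [0, 1, 4]
again = list(g)    # []: a generator is exhausted after one pass
```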

Ref
- http://www.jb51.net/article/63929.htm
- http://www.cnblogs.com/hump/p/6287462.html

String

String - Encode, Decode

Encode

http://www.runoob.com/python/att-string-decode.html

# Python 2 only: the 'base64' str codec is not available in Python 3
>>> s = "this is string example....wow!!!"
>>> print "Encoded String: " + s.encode('base64', 'strict')

Encoded String: dGhpcyBpcyBzdHJpbmcgZXhhbXBsZS4uLi53b3chISE=

Decode

http://www.runoob.com/python/att-string-encode.html

# str.decode(encoding='UTF-8', errors='strict')
>>> s = "this is string example....wow!!!"
>>> s = s.encode('base64', 'strict')
>>> print "Encoded String: " + s
>>> print "Decoded String: " + s.decode('base64', 'strict')

Encoded String: dGhpcyBpcyBzdHJpbmcgZXhhbXBsZS4uLi53b3chISE=
Decoded String: this is string example....wow!!!

Queue

Queue - Basic

Official Intro: https://docs.python.org/2/tutorial/datastructures.html?highlight=queue

>>> from collections import deque
>>> queue = deque(["Eric", "John", "Michael"])
>>> queue.append("Terry")           # Terry arrives
>>> queue.append("Graham")          # Graham arrives
>>> queue.popleft()                 # 'Eric'
>>> queue.popleft()                 # 'John'
>>> queue                           # deque(['Michael', 'Terry', 'Graham'])
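
The reason to use deque rather than a plain list for a queue is that `popleft()` is O(1), while `list.pop(0)` has to shift every remaining element. A sketch of two further features, a bounded queue and operations at both ends:

```python
from collections import deque

recent = deque(maxlen=3)       # bounded: holds at most 3 items
for n in range(5):
    recent.append(n)           # once full, the oldest item falls off the left
# recent now holds 2, 3, 4

d = deque([1, 2, 3])
d.appendleft(0)                # O(1) insert at the front
last = d.pop()                 # 3: pop() takes from the right
first = d.popleft()            # 0: popleft() takes from the left
```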

Dict

Dict - Traverse

Traverse:
```
for k,v in d.items():
    print k,v
```

Dict - Sort

Using operator

# Sort by keys
import operator
x = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0}
sorted_x = sorted(x.items(), key=operator.itemgetter(0),reverse=True)
for k,v in sorted_x: # sorted() returns a list of (key, value) tuples, so they unpack directly
   ...

# Sort by values:
import operator
x = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0}
sorted_x = sorted(x.items(), key=operator.itemgetter(1),reverse=True)

Using lambda

# Sort by keys:
sorted(d.items(), key=lambda x: x[0],reverse=True)

# Sort by values:
sorted(d.items(), key=lambda x: x[1],reverse=True)
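
A concrete run of both approaches on a small made-up dict, so the shape of the result is visible:

```python
import operator

d = {'b': 1, 'a': 2, 'c': 3}

by_key = sorted(d.items(), key=operator.itemgetter(0))
# [('a', 2), ('b', 1), ('c', 3)]
by_value_desc = sorted(d.items(), key=lambda kv: kv[1], reverse=True)
# [('c', 3), ('a', 2), ('b', 1)]
top_key = max(d, key=d.get)   # 'c': the key with the largest value
```

When only the single largest or smallest entry is needed, `max` or `min` with a `key` avoids sorting the whole dict.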

Set

Set - Basic Usage

Basic

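x.add('d') # add one item
x.update([10,37,42]) # add multiple items to x
x.remove('H') # remove an item; raises KeyError if it is missing
x in s # test whether x is a member of s
x not in s # test whether x is not a member of s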

Operators

x & y # intersection
x | y # union
x - y # difference
x ^ y # symmetric difference (items in x or y, but not in both)

Others

# test whether every element of s is in t
s.issubset(t)
s <= t

# test whether every element of t is in s
s.issuperset(t)
s >= t

# remove all elements from set s
s.clear()
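
The set operators on two small sets, as a sketch with results that can be checked by eye:

```python
x = set('abc')          # {'a', 'b', 'c'}
y = set('bcd')          # {'b', 'c', 'd'}

inter = x & y           # {'b', 'c'}
union = x | y           # {'a', 'b', 'c', 'd'}
diff = x - y            # {'a'}
sym = x ^ y             # {'a', 'd'}

x.discard('z')          # no error if 'z' is missing; x.remove('z') would raise KeyError
is_sub = inter <= x     # True: every element of inter is in x
```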

File

File - Traverse a directory (3 methods)

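using os.listdir: lists only the files and directories directly under the given folder; to go deeper you have to write the recursion yourself

import os
sep = os.sep # get the system path separator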
root = "." + sep + "Desktop" + sep + "mywiki" + sep
def traverse_dir(root):
    for f in os.listdir(root):
        full_path = os.path.join(root, f) # join path
        if os.path.isfile(full_path):
            print full_path # file
        if os.path.isdir(full_path):
            traverse_dir(full_path) # dir, call traverse_dir again

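using os.walk: recursively traverses all subdirectories for you

# os.walk(top, topdown=True, onerror=None)
# topdown=True : a directory is yielded before its subdirectories (pre-order)
# topdown=False: a directory is yielded after its subdirectories (post-order)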

import os 
def traverse_dir(rootDir): 
    for root, dirs, files in os.walk(rootDir): 
        print root, dirs, files
        for d in dirs: 
            print os.path.join(root, d)      
        for f in files: 
            print os.path.join(root, f)
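
The effect of `topdown` is easiest to see on a tiny directory tree. This sketch builds a scratch tree with `tempfile` (the names `a` and `b` are arbitrary) and records the visiting order both ways:

```python
import os
import shutil
import tempfile

root = tempfile.mkdtemp()                   # scratch directory for the demo
os.makedirs(os.path.join(root, 'a', 'b'))   # creates root/a/b

# Collect each visited directory, relative to root
top_down = [os.path.relpath(p, root) for p, _, _ in os.walk(root, topdown=True)]
bottom_up = [os.path.relpath(p, root) for p, _, _ in os.walk(root, topdown=False)]

shutil.rmtree(root)                         # clean up the scratch directory

print(top_down)    # parents first: '.', then 'a', then 'a/b'
print(bottom_up)   # children first: 'a/b', then 'a', then '.'
```

Bottom-up order is the one to use when deleting or renaming directories while walking, since each directory is visited only after its contents.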

using os.path.walk (Python 2 only, removed in Python 3): takes a callback and recursively traverses all subdirectories

# os.path.walk(top, func, arg)
# func: callback, called as func(arg, dirname, names) for each directory
# arg:  passed through unchanged to func (here an empty tuple)

import os
# call back func
def find_file(arg, dirname, files):
    for file in files:
        file_path = os.path.join(dirname, file)
        if os.path.isfile(file_path):
            print "find file:%s" % file_path

os.path.walk("./Desktop/mywiki", find_file, ())

File - Read csv

First Row is column name

import csv

with open(path, 'r') as f:
    reader = csv.DictReader(f) # uses the first row as the field names
    # fields: No., Time, Source, Destination, Protocol, Length, Host, Info, Request URI Query Parameter
    for row in reader:
        print row["Info"]

First row is data, so the column names must be specified

with open('names.csv', 'r') as csvfile:
    fieldnames = ['first_name', 'last_name']
    reader = csv.DictReader(csvfile, fieldnames=fieldnames)
    for row in reader:
        print row['first_name'], row['last_name']

Without column names, each row is a list

with open('some.csv', 'rb') as f:
    reader = csv.reader(f)
    for row in reader:
        print row

Numpy

Numpy - array to list, list to array

Usage

import numpy as np
l = [2, 3, 4, 5]
arr = np.array(l) # convert list to array
l = arr.tolist() # convert array to list

Numpy - get indices when sorting an array

get indices

>>> import numpy as np
>>> arr = np.array([1, 10, 2, 4, 8]) # convert list to array

>>> arr.argsort() # ascending
array([0, 2, 3, 4, 1])

>>> arr.argsort()[::-1] # descending
array([1, 4, 3, 2, 0])

get max / min n indices

>>> import numpy as np
>>> top_n = 3
>>> arr = np.array([1, 3, 2, 4, 5]) # convert list to array

>>> top_n_indices = arr.argsort()[-top_n:][::-1] # get max n indices
array([4, 3, 1])

>>> min_n_indices = arr.argsort()[:top_n] # get min n indices
array([0, 2, 1])

Special

Special - with

open one file

with open("x.txt") as f:
    data = f.read()
    # do something with data

open multiple files

with open("x.txt") as f1, open('xxx.txt') as f2:
    # do something with f1, f2
    pass

Special - Levenshtein Distance

Installation:

Ref: https://github.com/ztane/python-Levenshtein/

Ref: http://www.cnblogs.com/kaituorensheng/archive/2013/05/18/3085653.html

Usage:

import Levenshtein

# Calc Levenshtein distance (edit distance)
# deletion, insertion, and substitution each cost 1
Levenshtein.distance(str1, str2)

# Calc Levenshtein ratio, r = (sum - ldist) / sum
# sum = len(str1) + len(str2)
# but here deletion and insertion cost 1, substitution costs 2
Levenshtein.ratio(str1, str2)

# Calc Hamming Distance
# both strings must have the same length
Levenshtein.hamming(str1, str2)

# Calc Jaro similarity (higher means more similar)
Levenshtein.jaro(s1, s2)

# Calc Jaro–Winkler similarity (higher means more similar)
Levenshtein.jaro_winkler(s1, s2)
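
To make the cost model concrete, here is a pure-Python sketch of the edit distance and the Hamming distance. It is not the library's implementation, only the textbook dynamic-programming version with the same unit costs; the function names are illustrative:

```python
def edit_distance(s1, s2):
    """Levenshtein distance via dynamic programming, keeping one row at a time."""
    prev = list(range(len(s2) + 1))           # distances from '' to s2[:j]
    for i, c1 in enumerate(s1, 1):
        cur = [i]                             # distance from s1[:i] to ''
        for j, c2 in enumerate(s2, 1):
            cur.append(min(prev[j] + 1,                  # delete c1
                           cur[j - 1] + 1,               # insert c2
                           prev[j - 1] + (c1 != c2)))    # substitute, or free if equal
        prev = cur
    return prev[-1]

def hamming_distance(s1, s2):
    """Number of positions at which two equal-length strings differ."""
    if len(s1) != len(s2):
        raise ValueError("strings must have the same length")
    return sum(a != b for a, b in zip(s1, s2))

print(edit_distance('kitten', 'sitting'))       # 3
print(hamming_distance('karolin', 'kathrin'))   # 3
```

The pure-Python version runs in O(len(s1) * len(s2)) time and is much slower than the C extension, so it is mainly useful for checking understanding or when the package cannot be installed.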