Py libraries
Python For Data Science Cheat Sheet
Numpy
linalg
a = [1, 2, 3]
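As a minimal sketch of what `np.linalg` offers, the snippet below takes the list `a` from above, computes its Euclidean norm, and solves a small linear system. The matrix `A` and vector `b` are made-up values chosen only for illustration.

```python
import numpy as np

a = [1, 2, 3]
v = np.array(a, dtype=float)

# Euclidean (L2) norm of the vector: sqrt(1 + 4 + 9)
norm = np.linalg.norm(v)

# Illustrative 2x2 system A @ x = b (values are assumptions, not from the notes)
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = np.linalg.solve(A, b)   # solves the system without forming the inverse
det = np.linalg.det(A)      # determinant of A

print(norm, x, det)
```

`np.linalg.solve` is usually preferable to multiplying by `np.linalg.inv(A)`, since it avoids computing the inverse explicitly.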
statistics
np.sum(a, axis)
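The `axis` argument decides which dimension is collapsed. A small sketch, assuming a 2x3 array made up for illustration:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

total = np.sum(a)             # no axis: sum of every element
col_sums = np.sum(a, axis=0)  # collapse rows, one sum per column
row_sums = np.sum(a, axis=1)  # collapse columns, one sum per row

print(total, col_sums, row_sums)
```

Other reductions such as `np.mean`, `np.std`, and `np.max` accept `axis` in the same way.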
Pandas
Panel data and data analysis
Scipy
scipy.optimize
scipy.linalg
scipy.stats
Scikit-Learn
- Classification
  - Supervised learning
  - Identify the category to which a given object belongs.
  - Implemented algorithms: SVM, nearest neighbors, logistic regression, random forests, decision trees, and multi-layer perceptron (MLP) neural networks.
- Regression
  - Predict a continuous-valued attribute associated with a given object.
  - Implemented algorithms: support vector regression (SVR), Lasso regression, Bayesian regression, random forest regression, etc.
- Clustering
  - Unsupervised learning
  - Group objects that share similar characteristics.
  - Implemented algorithms: K-means clustering, spectral clustering, hierarchical clustering, and DBSCAN clustering.
- Dimensionality reduction
  - Useful when the computational cost is excessive or the features are too sparse.
  - Techniques such as principal component analysis (PCA), non-negative matrix factorization (NMF), or feature selection reduce the number of random variables.
- Model selection
  - Compare and choose the parameters and models that work best for a given problem.
  - The main purpose is to run the model under different parameter settings and then pick the parameters that give the best results, improving final model accuracy.
  - Implemented modules: grid search, cross-validation, and various metrics for evaluating prediction error.
- Data preprocessing
  - Covers feature extraction and normalization. Feature extraction converts text or image data into numeric variables.
  - Feature selection, by contrast, improves learning by removing constant, highly correlated, or otherwise statistically insignificant features.
  - Normalization converts input data into new variables with zero mean and unit variance.
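Several of the pieces listed above can be combined in one pipeline. The following is only a sketch under stated assumptions: the iris dataset, the step names, the choice of `SVC`, and the candidate values for `C` are all picked for illustration. It chains normalization, PCA, and a classifier, then uses grid search with cross-validation to pick a parameter.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Small built-in dataset, used here only as an example
X, y = load_iris(return_X_y=True)

pipeline = Pipeline([
    ("scale", StandardScaler()),      # preprocessing: zero mean, unit variance
    ("reduce", PCA(n_components=2)),  # dimensionality reduction
    ("classify", SVC()),              # classification
])

# Model selection: try each value of C with 5-fold cross-validation
param_grid = {"classify__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_  # mean cross-validated accuracy of the best setting
print(best_params, round(best_score, 3))
```

Putting the scaler and PCA inside the pipeline means they are refit on each training fold, so the cross-validation score is not contaminated by the held-out data.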
matplotlib
import numpy as np
uuid
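uuid1()
Based on the timestamp
Generates a UUID from the host ID, a sequence number, and the current time, which guarantees global uniqueness.
However, because the generated UUID contains the host's network address, it may compromise privacy.
The function takes two parameters. If node is not specified, getnode() is called automatically to obtain the host's hardware address.
If clock_seq is not specified, a randomly generated 14-bit sequence number is used instead.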
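uuid2()
Uses the same algorithm as uuid1(), except that the first 4 bytes of the timestamp are replaced with the POSIX UID. It is rarely used in practice, and Python's uuid module does not provide a uuid2() function.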
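uuid3()
Based on the MD5 hash of a name
Generates a UUID by computing the MD5 hash of a namespace and a name.
Different names in the same namespace, and the same name in different namespaces, yield different UUIDs,
while the same name in the same namespace always produces the same UUID.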
uuid4()
Based on random numbers
Generates a UUID from random numbers, so there is a very small probability of a duplicate.
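uuid5()
Based on the SHA-1 hash of a name
Generates a UUID by computing the SHA-1 hash of a namespace and a name. Apart from the hash function, the algorithm is the same as uuid.uuid3().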
When uniqueness is required, prefer uuid3() or uuid5().
In a globally distributed environment, prefer uuid1().
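A short sketch of the four generators available in Python's `uuid` module. The name `"example.com"` and the use of `uuid.NAMESPACE_DNS` are assumptions chosen for illustration.

```python
import uuid

# Time-based: embeds the host's hardware address and a timestamp
u1 = uuid.uuid1()

# Name-based: the same namespace and name always give the same UUID
u3 = uuid.uuid3(uuid.NAMESPACE_DNS, "example.com")  # MD5
u5 = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")  # SHA-1

# Random: a fresh value on every call
u4 = uuid.uuid4()

for u in (u1, u3, u4, u5):
    print(u.version, u)
```

Calling `uuid3()` or `uuid5()` again with the same arguments reproduces the same value, which is what makes them suitable for deriving stable identifiers from names.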
base64
s1 = b"hehehe"
Use urlsafe_b64encode() and urlsafe_b64decode() when the encoded value will appear in a URL. They replace + and / with - and _.
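A minimal sketch of both variants. `s1` is the value from above; the two-byte value `raw` is an assumption, chosen because its standard encoding happens to contain both `+` and `/`.

```python
import base64

s1 = b"hehehe"
encoded = base64.b64encode(s1)       # bytes in, bytes out
decoded = base64.b64decode(encoded)  # round-trips back to s1

# Bytes whose standard encoding contains '+' and '/'
raw = b"\xfb\xff"
std = base64.b64encode(raw)
safe = base64.urlsafe_b64encode(raw)  # '+' -> '-', '/' -> '_'
restored = base64.urlsafe_b64decode(safe)

print(encoded, std, safe)
```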
hashlib
```Python
import hashlib

s1 = b"hehehe"
m = hashlib.md5()  # or: m = hashlib.sha1()
m.update(s1)
res = m.hexdigest()
print(res)
# update() can be called repeatedly to feed large data in chunks.
```
hmac
import hmac
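A sketch of computing and checking a keyed digest with `hmac`. The key, the message, and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
import hmac

key = b"secret-key"   # hypothetical shared secret
message = b"hehehe"

# Keyed hash of the message; digestmod must be given explicitly
mac = hmac.new(key, message, digestmod=hashlib.sha256)
tag = mac.hexdigest()

# The receiver recomputes the tag and compares in constant time
expected = hmac.new(key, message, digestmod=hashlib.sha256).hexdigest()
ok = hmac.compare_digest(tag, expected)

print(tag, ok)
```

`hmac.compare_digest()` is used instead of `==` so that the comparison time does not leak how many leading characters matched.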
time
time.time()  # timestamp of the current time, in seconds since the epoch
Format
Directive | Meaning |
---|---|
%a | Locale's abbreviated weekday name |
%A | Locale's full weekday name |
%b | Locale's abbreviated month name |
%B | Locale's full month name |
%c | Locale's date and time representation |
%d | Day of the month (01-31) |
%H | Hour, 24-hour clock (00-23) |
%I | Hour, 12-hour clock (01-12) |
%j | Day of the year (001-366) |
%m | Month (01-12) |
%M | Minute (00-59) |
%p | Locale's equivalent of AM or PM |
%S | Second (00-59) |
%U | Week number of the year, with Sunday as the first day of the week |
%w | Weekday as a number (0-6, where 0 is Sunday) |
%W | Week number of the year, with Monday as the first day of the week |
%x | Locale's date representation |
%X | Locale's time representation |
%y | Year without century (00-99) |
%Y | Year with century |
%Z | Time zone name (empty string if no time zone exists) |
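The directives above are used with `time.strftime()` and `time.strptime()`. A small sketch, using the epoch and a made-up date string so the result does not depend on when it runs:

```python
import time

# A fixed struct_time: the Unix epoch in UTC
epoch = time.gmtime(0)

stamp = time.strftime("%Y-%m-%d %H:%M:%S", epoch)  # numeric directives only
day_of_year = time.strftime("%j", epoch)

# Parsing goes the other way: text plus format -> struct_time
parsed = time.strptime("2024-03-15 08:30:00", "%Y-%m-%d %H:%M:%S")

print(stamp, day_of_year, parsed.tm_year, parsed.tm_yday)
```

Directives such as `%a`, `%B`, and `%c` depend on the current locale, so their output can differ between machines.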
datetime
t1 = datetime.datetime.now()  # get the current time
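Beyond reading the current time, `datetime` objects support arithmetic and formatting. A sketch with a fixed, made-up timestamp in place of `now()` so the values are reproducible:

```python
import datetime

# Fixed value used for illustration instead of datetime.datetime.now()
t1 = datetime.datetime(2024, 3, 15, 8, 30, 0)

# Arithmetic with timedelta
t2 = t1 + datetime.timedelta(days=1, hours=2)
delta = t2 - t1

# Formatting and parsing use the same directives as the time module
text = t2.strftime("%Y-%m-%d %H:%M:%S")
back = datetime.datetime.strptime(text, "%Y-%m-%d %H:%M:%S")

print(text, delta.total_seconds())
```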
All articles in this blog are licensed under CC BY-NC-SA 4.0 unless stating additionally.