How to convert a text document to Python / how to convert a text document to FASTA

Reposted

charlesc 2024-05-12 18:30:38


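How do you read data from a database into a DataFrame?

Use sql.read_sql_query(sql_str, conn) and sql.read_sql_table(table_name, conn) from the pandas.io.sql module.

The first runs an SQL statement; the second loads a whole table straight into a DataFrame.

pandas also provides read_sql() as a single interface for this job. The example below illustrates it.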

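We will read data from a SQLite database, so first import the relevant modules.

1.  read_sql takes two arguments: an SQL statement (which you may need to learn separately) and con, the database connection. It returns a DataFrame directly.
2.  Print the result to confirm that the data was read successfully.
3.  The index_col parameter specifies which column is used as the index.
4.  To use several index columns, pass a list as index_col.
5.  Writing to the database is just as simple: drop the existing "weather_2012" table, then save df to the "weather_2012" table.
6.  MySQL works the same way: create a MySQL connection and use it in place of the con above.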
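
The original code for the steps above did not survive, so here is a minimal sketch of the same workflow. It assumes an in-memory SQLite database and a small made-up "weather_2012" table; the column names id, date_time and temp are hypothetical, not taken from the original data.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database so the sketch is self-contained.
con = sqlite3.connect(":memory:")

# Hypothetical sample data standing in for the article's weather table.
weather = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "date_time": ["2012-01-01 00:00", "2012-01-01 01:00", "2012-01-01 02:00"],
        "temp": [-1.8, -1.8, -1.5],
    }
)

# Write the DataFrame; if_exists="replace" drops an existing table first.
weather.to_sql("weather_2012", con, index=False, if_exists="replace")

# read_sql takes an SQL statement and a connection and returns a DataFrame.
df = pd.read_sql("SELECT * FROM weather_2012", con)
print(df)

# index_col picks the column (or list of columns) used as the index.
by_id = pd.read_sql("SELECT * FROM weather_2012", con, index_col="id")
by_id_and_time = pd.read_sql(
    "SELECT * FROM weather_2012", con, index_col=["id", "date_time"]
)
print(by_id_and_time)

con.close()
```

Passing a list to index_col produces a MultiIndex, which is why the last frame has two index levels.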

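Supplement:

(1) DataFrame: the query result can be converted to a DataFrame

 import pandas as pd
 import pymysql
 conn = pymysql.connect(host='127.0.0.1', port=3306, user='root', passwd='123456', db='db1')
 cursor = conn.cursor()
 # cursor.execute("DROP TABLE IF EXISTS test")  # statements like this must go through the cursor
 
 sql = "select * from user"
 
 df = pd.read_sql(sql, conn)
 
 aa = pd.DataFrame(df)  # read_sql already returns a DataFrame, so this call is redundant
 
 print(aa)

(2) Storing
 # pd.io.sql.write_frame(df, "user_copy", conn)  # no longer usable: write_frame has been removed
 pd.io.sql.to_sql(piece, "user_copy", conn, flavor='mysql', if_exists='replace')  # flavor='mysql' was mandatory on the pandas version used here; the argument was removed in pandas 0.23, where MySQL writes go through a SQLAlchemy engine instead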

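 #!/usr/bin/env python
 # -*- coding:utf-8 -*-
 import pandas as pd
 import pymysql
 conn = pymysql.connect(host='127.0.0.1', port=3306, user='root', passwd='123456', db='db1')
 cursor = conn.cursor()
 # cursor.execute("DROP TABLE IF EXISTS user_copy")  # statements like this must go through the cursor
 
 sql = "select * from user"
 df = pd.read_sql(sql, conn, chunksize=2)  # with chunksize, read_sql returns an iterator of DataFrames
 for piece in df:
     aa = pd.DataFrame(piece)
     # pd.io.sql.write_frame(df, "user_copy", conn)  # no longer usable: write_frame has been removed
     # flavor='mysql' was mandatory on the pandas version used here.
     # Note: 'replace' recreates the table for every chunk, so only the last chunk survives.
     pd.io.sql.to_sql(piece, "user_copy", conn, flavor='mysql', if_exists='replace')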
 
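 (3) Add a column based on a condition
 piece['xb'] = list(map(lambda x: '男' if x == '123' else '女', piece['pwd']))  # '男' = male, '女' = female
 (4) If the data contains Chinese characters, specify the character set when connecting: charset="utf8"
 (5) Final code (read the data iteratively and add a new column derived from an existing one)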
  
 #!/usr/bin/env python
 # -*- coding:utf-8 -*-
 import pandas as pd
 import pymysql
 conn = pymysql.connect(host='127.0.0.1', port=3306, user='root', passwd='123456', db='db1',charset="utf8")
 cursor = conn.cursor()
 # cursor.execute("DROP TABLE IF EXISTS user_copy")  # statements like this must go through the cursor
 
 sql = "select * from user"
 df = pd.read_sql(sql, conn, chunksize=2)
 for piece in df:
     # pd.io.sql.write_frame(df, "user_copy", conn)  # no longer usable: write_frame has been removed
 
     piece['xb'] = list(map(lambda x: '男' if x == '123' else '女', piece['pwd']))
     print(piece)
 
     # 'append' adds each chunk to the table; flavor='mysql' was mandatory on the pandas version used here
     pd.io.sql.to_sql(piece, "user_copy", conn, flavor='mysql', if_exists='append')
  
 # Connecting through SQLAlchemy
 # create_engine also accepts pool options, for example:
 # create_engine("mysql+pymysql://root:123456@127.0.0.1:3306/jd?charset=utf8", max_overflow=5)
 
 import pandas as pd
 from sqlalchemy import create_engine
 engine = create_engine("mysql+pymysql://root:123456@127.0.0.1:3306/db1?charset=utf8")
 sql = "select * from user"
 df = pd.read_sql(sql, engine, chunksize=2)
 for piece in df:
     print(piece)
     pd.io.sql.to_sql(piece, "user_copy", engine, if_exists='append')  # the flavor argument is not needed with an engine
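
The chunked read, derived column and append steps above can be tried without a MySQL server. The following is a minimal sketch that assumes an in-memory SQLite database and a made-up "user" table; the English labels "male"/"female" replace the original '男'/'女' only for readability.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")

# Hypothetical stand-in for the article's `user` table.
users = pd.DataFrame({"name": ["a", "b", "c"], "pwd": ["123", "456", "123"]})
users.to_sql("user", con, index=False)

# chunksize makes read_sql return an iterator of DataFrames (2 rows each here).
processed = []
for piece in pd.read_sql("SELECT * FROM user", con, chunksize=2):
    # Derive the new column from `pwd`, mirroring the article's rule.
    piece["xb"] = piece["pwd"].map(lambda x: "male" if x == "123" else "female")
    processed.append(piece)

# Write after the read cursor is exhausted, to avoid interleaving reads and
# writes on the single SQLite connection. 'append' keeps earlier chunks.
for piece in processed:
    piece.to_sql("user_copy", con, index=False, if_exists="append")

result = pd.read_sql("SELECT * FROM user_copy", con)
print(result)
con.close()
```

With if_exists="replace" in the write loop, user_copy would end up holding only the final chunk, which is why the final version of the article's script switches to 'append'.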
 
  
  
  
 
 
 
 

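Selecting data in pandas: iloc and loc behave differently. iloc selects by integer position, whereas loc selects by index label. Note in the output below that a loc slice includes its end label, while an iloc slice excludes its end position.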


 >>> import pandas as pd
 >>> import os
 >>> os.chdir("D:\\")
 >>> d = pd.read_csv("GWAS_water.qassoc", delimiter=r"\s+")
 >>> d.loc[1:3]
    CHR SNP   BP  NMISS    BETA      SE       R2      T       P
 1    1   .  447     44  0.1800  0.1783  0.02369  1.009  0.3185
 2    1   .  449     44  0.2785  0.2473  0.02931  1.126  0.2665
 3    1   .  452     44  0.1800  0.1783  0.02369  1.009  0.3185
  
 >>> d.loc[0:3]
    CHR SNP   BP  NMISS    BETA      SE       R2      T       P
 0    1   .  410     44  0.2157  0.1772  0.03406  1.217  0.2304
 1    1   .  447     44  0.1800  0.1783  0.02369  1.009  0.3185
 2    1   .  449     44  0.2785  0.2473  0.02931  1.126  0.2665
 3    1   .  452     44  0.1800  0.1783  0.02369  1.009  0.3185
  
 >>> d.iloc[0:3]
    CHR SNP   BP  NMISS    BETA      SE       R2      T       P
 0    1   .  410     44  0.2157  0.1772  0.03406  1.217  0.2304
 1    1   .  447     44  0.1800  0.1783  0.02369  1.009  0.3185
 2    1   .  449     44  0.2785  0.2473  0.02931  1.126  0.2665
  
  
 >>> d.iloc[1:3,2]
 1    447
 2    449
 Name: BP, dtype: int64
  
 >>> d.iloc[0:3,2]
 0    410
 1    447
 2    449
 Name: BP, dtype: int64
  
 >>> d.head()
    CHR SNP   BP  NMISS    BETA      SE       R2       T       P
 0    1   .  410     44  0.2157  0.1772  0.03406  1.2170  0.2304
 1    1   .  447     44  0.1800  0.1783  0.02369  1.0090  0.3185
 2    1   .  449     44  0.2785  0.2473  0.02931  1.1260  0.2665
 3    1   .  452     44  0.1800  0.1783  0.02369  1.0090  0.3185
 4    1   .  462     44  0.2548  0.2744  0.02012  0.9286  0.3584
  
 >>> d.tail(3)
         CHR SNP        BP  NMISS    BETA      SE       R2       T      P
 418704   12   .  19345588     44 -0.2207  0.2558  0.01743 -0.8631  0.393
 418705   12   .  19345598     44 -0.2207  0.2558  0.01743 -0.8631  0.393
 418706   12   .  19345611     44 -0.2207  0.2558  0.01743 -0.8631  0.393
  
 >>> d.describe()
                  CHR            BP     NMISS          BETA            SE  \
 count  418707.000000  4.187070e+05  418707.0  4.186820e+05  418682.00000
 mean        5.805738  1.442822e+07      44.0 -4.271777e-03       0.21433
 std         3.392930  8.933882e+06       0.0  2.330019e-01       0.05190
 min         1.000000  4.100000e+02      44.0 -1.610000e+00       0.10130
 25%         3.000000  7.345860e+06      44.0 -1.638000e-01       0.17320
 50%         5.000000  1.371612e+07      44.0 -1.826000e-16       0.20670
 75%         9.000000  2.051322e+07      44.0  1.391000e-01       0.25010
 max        12.000000  4.238896e+07      44.0  1.467000e+00       0.67580
  
                   R2             T             P
 count  418682.000000  4.186820e+05  4.186820e+05
 mean        0.026268 -1.910774e-02  4.772397e-01
 std         0.035903  1.095115e+00  2.944290e-01
 min         0.000000 -5.582000e+00  2.034000e-08
 25%         0.002969 -7.955000e-01  2.179000e-01
 50%         0.012930 -8.468000e-16  4.624000e-01
 75%         0.035910  6.712000e-01  7.254000e-01
 max         0.531200  6.898000e+00  1.000000e+00
  
 >>> d.sort_values(by="P").iloc[0:15]
         CHR SNP        BP  NMISS    BETA      SE      R2      T             P
 42870     1   .  32316680     44  1.1870  0.1721  0.5312  6.898  2.034000e-08
 29301     1   .  22184568     44  1.1870  0.1721  0.5312  6.898  2.034000e-08
 29302     1   .  22184590     44  1.1870  0.1721  0.5312  6.898  2.034000e-08
 29306     1   .  22184654     44  1.1870  0.1721  0.5312  6.898  2.034000e-08
 29305     1   .  22184628     44  1.1870  0.1721  0.5312  6.898  2.034000e-08
 29304     1   .  22184624     44  1.1870  0.1721  0.5312  6.898  2.034000e-08
 112212    3   .  14365699     44  1.4670  0.2255  0.5018  6.504  7.490000e-08
 29254     1   .  22167448     44  1.0780  0.1723  0.4822  6.254  1.713000e-07
 69291     2   .   9480651     44  1.1140  0.1829  0.4690  6.091  2.939000e-07
 29299     1   .  22180991     44  0.8527  0.1458  0.4488  5.848  6.574000e-07
 101391    3   .   6959715     44  0.6782  0.1166  0.4462  5.817  7.285000e-07
 29333     1   .  22198267     44  0.9252  0.1616  0.4383  5.724  9.888000e-07
 195513    5   .  20178388     44  1.0350  0.1817  0.4359  5.697  1.082000e-06
 29295     1   .  22180901     44  0.7469  0.1320  0.4324  5.657  1.236000e-06
 29300     1   .  22181119     44  0.7469  0.1320  0.4324  5.657  1.236000e-06
 >>> sort_D = d.sort_values(by="P").iloc[0:5]
 >>> m_D = d.dropna()           # drop rows that contain NA
  
 >>> sort_C = d.sort_values(["P","CHR", "BP"])
 >>> sort_C.to_csv(file_name, sep='\t', encoding='utf-8')   # file_name: path of the output file
  
  
 >>> d.sort_values(by="CHR", ascending=True)   # by= must name an existing column
  
  
 >>> sort_D.to_csv("result.txt", sep= " ")
 >>> sort_D.to_csv("result_no_index.txt", sep= " ", index=False)
 >>>
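
The loc/iloc difference shown in the session above can be reproduced on a tiny frame. This is a minimal sketch with made-up values (only the column names BP and P are borrowed from the session):

```python
import pandas as pd

# Small hypothetical frame; the default index is 0..4.
d = pd.DataFrame(
    {
        "BP": [410, 447, 449, 452, 462],
        "P": [0.3185, 0.2304, 0.2665, 0.3185, 0.3584],
    }
)

by_label = d.loc[1:3]      # labels 1, 2 and 3: the end label is included
by_position = d.iloc[1:3]  # positions 1 and 2: the end position is excluded
print(by_label)
print(by_position)

# After sorting, labels and positions no longer coincide.
s = d.sort_values(by="P")
print(s.iloc[0])  # first row after sorting (smallest P)
print(s.loc[0])   # the row labelled 0, wherever it ended up
```

This is also why the session uses d.sort_values(by="P").iloc[0:15] to take the 15 smallest P values: loc[0:15] on the sorted frame would slice by label instead.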




 Reference


 for m, i in enumerate(list(range(1,10))):    
     for n, j in enumerate(list(range(m+1,10))):    
         print(i * j)










 Installation:


      pip install pandas
 Import:


     import pandas as pd

     from pandas import Series,DataFrame


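 #Series


 Data types: Series, DataFrame


 Series: similar to a one-dimensional NumPy array


 Initialization:
  Method 1:


     data = [1,2,3,4,5]    # usually a sequence
     series_data = Series(data)  # with no index argument, the index defaults to 0, 1, 2, ...
  Method 2:


     indexes = ['name','shuxue','yuwen','huaxue','yingyu']
     series_data =Series(['lizhen',1,2,3,4],index=indexes)  # use the given labels as the index; the index and the values must have the same length

  Method 3:


     data = {'huaxue': 3, 'name': 'lizhen', 'shuxue': 1, 'yingyu': 4, 'yuwen': 2}
     series_from_dict = Series(data)

  View the index: series_data.index

  Modify a value by index label: series_data['shuxue'] = 3

  View all values: series_data.values

  Name the index: series_data.index.name = 'type'

  Look up a value by index label: series_data['yuwen']

  Get the values for several labels: series_data[['yingyu','yuwen']]

  Export to another format (dict, clipboard, csv, json, string, sql):

     series_from_dict.to_dict()

  Adding two Series:

     Values are aligned by index label before adding; a label that appears in only one Series yields NaN

     This is only meaningful for numeric values

  Check whether an index label exists:

     index_name in series_data   # returns True or False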
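
The label alignment described above is easiest to see in a short run. This is a minimal sketch; the labels reuse the subject names from the text and the numbers are made up:

```python
from pandas import Series

a = Series([1, 2, 3], index=["shuxue", "yuwen", "huaxue"])
b = Series([10, 20, 30], index=["shuxue", "yuwen", "yingyu"])

a["shuxue"] = 5        # modify a value by its index label
total = a + b          # values are aligned by label before adding
print(total)

# `in` tests the index labels, not the values.
print("yingyu" in a)
print("yuwen" in a)
```

Labels present in only one of the two Series ("huaxue" and "yingyu") come out as NaN in the sum.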

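 #DataFrame: similar to a table or spreadsheet

     Initialize it with a dict of equal-length lists or NumPy arrays; an index is added automatically. Older pandas versions sort the columns, while newer ones keep the dict's insertion order


   Method 1:


    data = {'state': ['Ohio','Ohio','Ohio'],
     'year': [2000,2001,2002],
     'pop': [1.5,1.7,3.6]
     }

     frame = DataFrame(data)


  Method 2:


     data = {'state': ['Ohio','Ohio','Ohio'],
     'year': [2000,2001,2002],
     'pop': [1.5,1.7,3.6]
     }

     frame = DataFrame(data,columns=['year','state','pop','debt'],index=['one','two','three'])  

     # columns are displayed in the order given by columns

     # a requested column that is not in the data is filled with NaN

  Method 3:


     data = {'Nevada': {2001:2.4,2002:2.9},
     'Ohio':{2000:1.5,2001:1.7,2002:2.4},
     }
     frame = DataFrame(data)

     # outer keys become column names and inner keys become index labels; where an inner key is missing, that column is filled with NaN
  Name the index: frame.index.name = 'self_index_name'
  Name the columns: frame.columns.name = 'self_columns_name'
  View all values: frame.values
  View all column names: frame.columns
  View one column: frame[column_name] or frame.column_name
  View the first n rows: frame.head(n)
  View the last n rows: frame.tail(n)
  View the rows with given index labels: frame.loc[[index_name1[,index_name2]]] (written frame.ix[...] in older pandas; .ix was removed in pandas 1.0)
  Set the values of a column: frame['column_name'] = 'new_value'
     Note: a single value is broadcast to every row
           a list of values must have the same length as the frame has rows
           the value may be a Series; it is aligned on the frame's index, and labels with no match get NaN
  Delete an unneeded column: del frame['column_name']
  Note: Index objects are immutable, so an individual index label cannot be modified in place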
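
The column assignment rules above (broadcasting a scalar, aligning a Series on the index) can be checked with a short sketch. It reuses the state/year/pop data from the text; the debt values and the extra label "four" are made up for illustration:

```python
import pandas as pd
from pandas import DataFrame, Series

data = {
    "state": ["Ohio", "Ohio", "Ohio"],
    "year": [2000, 2001, 2002],
    "pop": [1.5, 1.7, 3.6],
}
frame = DataFrame(
    data, columns=["year", "state", "pop", "debt"], index=["one", "two", "three"]
)

frame["debt"] = 16.5  # a single value is broadcast to every row
# A Series is aligned on the index: "two" matches, "four" is ignored,
# and rows without a matching label ("one", "three") become NaN.
frame["debt"] = Series([-1.2, -1.7], index=["two", "four"])

rows = frame.loc[["one", "two"]]  # label-based row selection
del frame["pop"]                  # remove a column in place
print(frame)
```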








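 When working with pandas DataFrames you often need string operations, such as checking whether a column contains certain keywords or whether its strings are shorter than 3 characters. Knowing the built-in methods of the str accessor makes this much easier.


 Let us go through the methods that the str accessor of the Series class provides.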



 1. cat(): concatenate strings
         Examples:
         >>> Series(['a', 'b', 'c']).str.cat(['A', 'B', 'C'], sep=',')
         0 a,A
         1 b,B
         2 c,C
         dtype: object
         >>> Series(['a', 'b', 'c']).str.cat(sep=',')
         'a,b,c'
         >>> Series(['a', 'b']).str.cat([['x', 'y'], ['1', '2']], sep=',')
         0    a,x,1
         1    b,y,2
         dtype: object


 2. split(): split strings (pass the maximum number of splits as the keyword argument n; recent pandas versions no longer accept it positionally)
         >>> import numpy, pandas
         >>> s = pandas.Series(['a_b_c', 'c_d_e', numpy.nan, 'f_g_h'])
         >>> s.str.split('_')
         0    [a, b, c]
         1    [c, d, e]
         2          NaN
         3    [f, g, h]
         dtype: object
         >>> s.str.split('_', n=-1)
         0    [a, b, c]
         1    [c, d, e]
         2          NaN
         3    [f, g, h]
         dtype: object
         >>> s.str.split('_', n=0)
         0    [a, b, c]
         1    [c, d, e]
         2          NaN
         3    [f, g, h]
         dtype: object
         >>> s.str.split('_', n=1)
         0    [a, b_c]
         1    [c, d_e]
         2         NaN
         3    [f, g_h]
         dtype: object
         >>> s.str.split('_', n=2)
         0    [a, b, c]
         1    [c, d, e]
         2          NaN
         3    [f, g, h]
         dtype: object
         >>> s.str.split('_', n=3)
         0    [a, b, c]
         1    [c, d, e]
         2          NaN
         3    [f, g, h]
         dtype: object


 3. get(): get the character at the given position


         >>> s.str.get(0)
         0      a
         1      c
         2    NaN
         3      f
         dtype: object
         >>> s.str.get(1)
         0      _
         1      _
         2    NaN
         3      _
         dtype: object
         >>> s.str.get(2)
         0      b
         1      d
         2    NaN
         3      g
         dtype: object


 4. join(): join the characters of each string with the given separator (rarely used)


         >>> s.str.join("!")
         0    a!_!b!_!c
         1    c!_!d!_!e
         2          NaN
         3    f!_!g!_!h
         dtype: object
         >>> s.str.join("?")
         0    a?_?b?_?c
         1    c?_?d?_?e
         2          NaN
         3    f?_?g?_?h
         dtype: object
         >>> s.str.join(".")
         0    a._.b._.c
         1    c._.d._.e
         2          NaN
         3    f._.g._.h
         dtype: object


 5. contains(): test whether each string contains the given pattern


         >>> s.str.contains('d')
         0    False
         1     True
         2      NaN
         3    False
         dtype: object


 6. replace(): replace


         >>> s.str.replace("_", ".")
         0    a.b.c
         1    c.d.e
         2      NaN
         3    f.g.h
         dtype: object


 7. repeat(): repeat


         >>> s.str.repeat(3)
         0    a_b_ca_b_ca_b_c
         1    c_d_ec_d_ec_d_e
         2                NaN
         3    f_g_hf_g_hf_g_h
         dtype: object


 8. pad(): pad on the left or right


 >>> s.str.pad(10, fillchar="?")
 0    ?????a_b_c
 1    ?????c_d_e
 2           NaN
 3    ?????f_g_h
 dtype: object
 >>>
 >>> s.str.pad(10, side="right", fillchar="?")
 0    a_b_c?????
 1    c_d_e?????
 2           NaN
 3    f_g_h?????
 dtype: object


 9. center(): pad on both sides so the string is centered; see the example
 >>> s.str.center(10, fillchar="?")
 0    ??a_b_c???
 1    ??c_d_e???
 2           NaN
 3    ??f_g_h???
 dtype: object


 10. ljust(): left-justify, padding on the right; see the example


 >>> s.str.ljust(10, fillchar="?")
 0    a_b_c?????
 1    c_d_e?????
 2           NaN
 3    f_g_h?????
 dtype: object


 11. rjust(): right-justify, padding on the left; see the example


 >>> s.str.rjust(10, fillchar="?")
 0    ?????a_b_c
 1    ?????c_d_e
 2           NaN
 3    ?????f_g_h
 dtype: object


 12. zfill(): pad on the left with zeros


 >>> s.str.zfill(10)
 0    00000a_b_c
 1    00000c_d_e
 2           NaN
 3    00000f_g_h
 dtype: object


 13. wrap(): insert a line break at the given width


 >>> s.str.wrap(3)
 0    a_b\n_c
 1    c_d\n_e
 2        NaN
 3    f_g\n_h
 dtype: object


 14. slice(): slice each string between the given start and stop positions
 >>> s.str.slice(1,3)
 0     _b
 1     _d
 2    NaN
 3     _g
 dtype: object


 15. slice_replace(): replace the characters at the given positions with the given string
 >>> s.str.slice_replace(1, 3, "?")
 0    a?_c
 1    c?_e
 2     NaN
 3    f?_h
 dtype: object
 >>> s.str.slice_replace(1, 3, "??")
 0    a??_c
 1    c??_e
 2      NaN
 3    f??_h
 dtype: object


 16. count(): count the occurrences of the given pattern
 >>> s.str.count("a")
 0     1
 1     0
 2   NaN
 3     0
 dtype: float64


 17. startswith(): test whether each string starts with the given string
 >>> s.str.startswith("a");
 0     True
 1    False
 2      NaN
 3    False
 dtype: object


 18. endswith(): test whether each string ends with the given string
 >>> s.str.endswith("e");
 0    False
 1     True
 2      NaN
 3    False
 dtype: object


 19. findall(): find all substrings matching the regular expression, returned as a list
 >>> s.str.findall("[a-z]");
 0    [a, b, c]
 1    [c, d, e]
 2          NaN
 3    [f, g, h]
 dtype: object


 20. match(): test whether the start of each string matches the given string or regular expression
 >>> s
 0    a_b_c
 1    c_d_e
 2      NaN
 3    f_g_h
 dtype: object
 >>> s.str.match("[d-z]");
 0    False
 1    False
 2      NaN
 3     True
 dtype: object


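 21. extract(): extract the matching text; wrap the part you want in parentheses (a capture group). expand=False returns a Series, as shown here
 >>> s.str.extract("([d-z])", expand=False)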
 0    NaN
 1      d
 2    NaN
 3      f
 dtype: object


 22. len(): compute the length of each string
 >>> s.str.len()
 0     5
 1     5
 2   NaN
 3     5
 dtype: float64 


 23. strip(): remove leading and trailing whitespace
 >>> idx = pandas.Series([' jack', 'jill ', ' jesse ', 'frank'])
 >>> idx.str.strip()
 0     jack
 1     jill
 2    jesse
 3    frank
 dtype: object


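 24. rstrip(): remove trailing whitespace

 25. lstrip(): remove leading whitespace

 26. partition(): split each string into a DataFrame; the split always yields exactly three parts: the text before the separator, the separator, and the text after it

 27. rpartition(): the same, splitting at the last separator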
 >>> s.str.partition('_')
  0    1    2
 0    a    _  b_c
 1    c    _  d_e
 2  NaN  NaN  NaN
 3    f    _  g_h
 >>> s.str.rpartition('_')
  0    1    2
 0  a_b    _    c
 1  c_d    _    e
 2  NaN  NaN  NaN
 3  f_g    _    h


 28. lower(): convert to lowercase
 29. upper(): convert to uppercase
 30. find(): find the position of the given string, searching from the left
 >>> s.str.find('d')
 0    -1
 1     2
 2   NaN
 3    -1
 dtype: float64


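 31. rfind(): find the position of the given string, searching from the right

 32. index(): find the position of the given string; note that it raises an error if the string is not present!

 33. rindex(): find the position of the given string, searching from the right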

 >>> s.str.index('_')
 0     1
 1     1
 2   NaN
 3     1
 dtype: float64
 34. capitalize(): capitalize the first character
 >>> s.str.capitalize()
 0    A_b_c
 1    C_d_e
 2      NaN
 3    F_g_h
 dtype: object
 35. swapcase(): swap upper and lower case
 >>> s.str.swapcase()
 0    A_B_C
 1    C_D_E
 2      NaN
 3    F_G_H
 dtype: object
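 36. normalize(): return the Unicode normal form of each string; it is rarely needed in data analysis, so we will not cover it

 37. isalnum(): whether the string consists only of letters and digits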

 >>> s.str.isalnum()
 0    False
 1    False
 2      NaN
 3    False
 dtype: object

 38. isalpha(): whether the string consists only of letters

 >>> s.str.isalpha()
 0    False
 1    False
 2      NaN
 3    False
 dtype: object

 39. isdigit(): whether the string consists only of digits

 >>> s.str.isdigit()
 0    False
 1    False
 2      NaN
 3    False
 dtype: object

 40. isspace(): whether the string consists only of whitespace

 >>> s.str.isspace()
 0    False
 1    False
 2      NaN
 3    False
 dtype: object

 41. islower(): whether all letters are lowercase

 42. isupper(): whether all letters are uppercase

 >>> s.str.islower()
 0    True
 1    True
 2     NaN
 3    True
 dtype: object
 >>> s.str.isupper()
 0    False
 1    False
 2      NaN
 3    False
 dtype: object

 43. istitle(): whether the string is title-cased (each word starts with an uppercase letter followed by lowercase letters)

 >>> s.str.istitle()
 0    False
 1    False
 2      NaN
 3    False
 dtype: object
 44. isnumeric(): whether all characters are numeric
 45. isdecimal(): whether all characters are decimal digits
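
Several of the methods listed above are typically combined when filtering or cleaning a column. The following is a minimal sketch on the same sample Series; the choice of methods is illustrative only:

```python
import numpy as np
import pandas as pd

s = pd.Series(["a_b_c", "c_d_e", np.nan, "f_g_h"])

parts = s.str.split("_", n=1)                    # split at most once
first = s.str.extract("([d-z])", expand=False)   # first captured group, as a Series
has_d = s.str.contains("d", na=False)            # treat missing values as False
padded = s.str.zfill(10)                         # left-pad with zeros to width 10

# A boolean result can be used directly as a row filter.
print(s[has_d])
```

Passing na=False to contains() avoids NaN in the mask, which would otherwise make it unusable for boolean indexing.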



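 Selecting columns is a common pandas operation, but the syntax has a few pitfalls, summarized here:




 import pandas as pd
 data1 = pd.DataFrame(...) # initialize any DataFrame with 3 columns
 data1.columns=['a', 'b', 'c']
  
 1.
 data1['b']
 # selects the 2nd column (column b)
  
 2.
 data1.b
 # same as 1: selects the 2nd column (column b)

 # b is the column name; attribute access only works when the name is a valid identifier without spaces. If the column name contains a space, only method 1 can be used
  
 3.
 data1[data1.columns[1:]]

 # selects all data in the 2nd and 3rd columns of data1
  
 Aside 1.
 data1[5:10]

 # selects rows, not columns: the rows at positions 5 to 9, i.e. the 6th through 10th rows
  
 Aside 2.
 data1[2]
 # invalid: raises "KeyError: 2", because 2 is looked up as a column name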
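
The selection forms above can be verified on a concrete frame. This is a minimal sketch that assumes a 12-row, 3-column frame of consecutive integers; iloc is added as the positional alternative to the failing data1[2]:

```python
import numpy as np
import pandas as pd

data1 = pd.DataFrame(np.arange(36).reshape(12, 3), columns=["a", "b", "c"])

col_b = data1["b"]                    # column by name
same_b = data1.b                      # attribute access, same column
last_two = data1[data1.columns[1:]]   # 2nd and 3rd columns
rows = data1[5:10]                    # a slice selects rows at positions 5..9
third_col = data1.iloc[:, 2]          # column by position

try:
    data1[2]                          # 2 is treated as a column name
except KeyError as exc:
    print("KeyError:", exc)
```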



Export MySQL data, generate an Excel file with pandas, and send it by email


 #!/usr/bin/env python
 # -*- coding: utf-8 -*-
 import datetime
 import smtplib
 from email.mime.application import MIMEApplication
 from email.mime.multipart import MIMEMultipart
 from email.mime.text import MIMEText
  
 import MySQLdb
 import MySQLdb.cursors
 import pandas as pd
  
  
 # Run a query and return its rows
 def retsql(sql):
     # Arguments: host IP, user name, password, database name (optional).
     # DictCursor returns each row as a dict.
     db_user = MySQLdb.connect('IP', 'username', 'password', 'database name',
                               cursorclass=MySQLdb.cursors.DictCursor)
     cursor = db_user.cursor()
     # Set the character set to utf-8; otherwise the results may come back
     # garbled even when the database encoding is already utf-8.
     cursor.execute("SET NAMES utf8;")
     cursor.execute(sql)
     ret = cursor.fetchall()
     db_user.close()
  
     return ret
  
 # Write the rows to an xlsx file and return its path
 def retxls(ret, dt):
     file_name = datetime.datetime.now().strftime("/path/to/store/%Y-%m-%d-%H:%M") + dt + ".sql.xlsx"
     dret = pd.DataFrame.from_records(list(ret))
     # Author's note: openpyxl may fail while writing the file; pinning
     # openpyxl 1.8.6 worked at the time, other versions seemed to conflict
     # with pandas.
     dret.to_excel(file_name, sheet_name="Sheet1", engine="openpyxl")
  
     print("Ok!!! the file in", file_name)
     return file_name
  
 # Send the mail
 # Arguments: subject, body text, recipient list, attachment path
 def sendm(sub, cttstr, to_list, file):
     msg = MIMEMultipart()
     with open(file, 'rb') as f:
         att = MIMEApplication(f.read())  # application/octet-stream, base64-encoded
     att["Content-Disposition"] = 'attachment; filename="sql_query_result.xlsx"'
  
     msg['from'] = 'sender address'
     msg['subject'] = sub
     ctt = MIMEText(cttstr, 'plain', 'utf-8')
  
     msg.attach(att)
     msg.attach(ctt)
     try:
         server = smtplib.SMTP("mail.example.com", 25)
         # server.set_debuglevel(1)  # enable this when debugging problems
         server.starttls()   # only if the server has SSL/TLS enabled
         server.login("mailbox user name", "password")
         server.sendmail(msg['from'], to_list, msg.as_string())
         server.quit()
         print('ok!!!')
     except Exception as e:
         print(str(e))
  
  
 # The SQL statement to run
 sql = """SQL statement"""
  
  
  
 # Recipients of the mail
 to_list = ['test1@example.com',
  'test2@example.com']
  
  
  
 # Run the SQL and store the rows in ret
 ret = retsql(sql)
  
 # Store the path of the generated file in retfile
 retfile = retxls(ret, "1")
  
  
 # Send the mail, using the SQL statement as both subject and body
 sendm(sql, sql, to_list, retfile)





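Installing and using ipython, notebook and matplotlib for Python


 #!/usr/bin/python
 # -*- coding: UTF-8 -*-


 The following installs and configures these step by step:
 python 3.5.2, ipython 5.1.0, jupyter notebook, matplotlib


 1. Install Python 3.5


 See the official documentation for details. When running the installer, tick the option that configures the environment variables. https://www.python.org/downloads/windows/


 2. Upgrade pip


python -m pip install --upgrade pip


 3. Install ipython with pip

 pip install ipython




 4. Install notebook with pip


pip install notebook




 5. Install the plotting library matplotlib

pip install matplotlib

 pip install matplotlib --upgrade


 6. Examples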


 import numpy as np

 import matplotlib.pyplot as plt

 N = 5
 menMeans = (20, 35, 30, 35, 27)
 menStd =   (2, 3, 4, 1, 2)
 ind = np.arange(N)  # the x locations for the groups
 width = 0.35       # the width of the bars
 fig, ax = plt.subplots()
 rects1 = ax.bar(ind, menMeans, width, color='r', yerr=menStd)
 womenMeans = (25, 32, 34, 20, 25)
 womenStd =   (3, 5, 2, 3, 3)
 rects2 = ax.bar(ind+width, womenMeans, width, color='y', yerr=womenStd)
 # add the axis label, title, tick positions, tick labels and legend
 ax.set_ylabel('Scores')
 ax.set_title('Scores by group and gender')
 ax.set_xticks(ind+width)
 ax.set_xticklabels( ('G1', 'G2', 'G3', 'G4', 'G5') )
 ax.legend( (rects1[0], rects2[0]), ('Men', 'Women') )
 def autolabel(rects):
     # attach some text labels
     for rect in rects:
         height = rect.get_height()
         ax.text(rect.get_x()+rect.get_width()/2., 1.05*height, '%d'%int(height),
                 ha='center', va='bottom')
 autolabel(rects1)
 autolabel(rects2)
 plt.show()




 import numpy as np
 import matplotlib.pyplot as plt
 x = np.arange(9)
 y = np.sin(x)
 plt.plot(x,y)
 plt.show()



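 import matplotlib.pyplot as plt
   
 plt.bar(x = 0,height = 1)
 plt.show()




 We first import matplotlib.pyplot, then call its bar function, and finally call show to display the figure.

 The two arguments to bar are:

 x (named left in older matplotlib releases; current releases call it x and no longer accept left): the x position of the bar. Current releases center the bar on this value by default; with align="edge" it is the bar's left edge, so a value of 1 puts the left edge at 1.0

 height: the height of the bar, i.e. its value on the Y axis

 Both x and height accept either a single value (one bar) or a tuple (several bars). For example:


 import matplotlib.pyplot as plt
   
 plt.bar(x = (0,1),height = (1,0.5))
 plt.show()




 Here x = (0,1) means there are two bars in total, the first positioned at 0 and the second at 1. The height argument works the same way.
 You may find these two bars too "fat". The width argument of bar sets their width.


 import matplotlib.pyplot as plt
   
 plt.bar(x = (0,1),height = (1,0.5),width = 0.35)
 plt.show()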




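 Now a new requirement: the x and y axes need labels, say gender on the x axis and head count on the y axis. This is also simple:


 import matplotlib.pyplot as plt
   
 plt.xlabel(u'性别')

 plt.ylabel(u'人数')

 plt.bar(x = (0,1),height = (1,0.5),width = 0.35)

 plt.show()




 Note that on Python 2 the Chinese strings need the u prefix (it is not needed on Python 3; I am using 2.7), because matplotlib only supports unicode. Next, let us label each bar on the x axis, for example the first as "男" (male) and the second as "女" (female).


 import matplotlib.pyplot as plt
   
 plt.xlabel(u'性别')

 plt.ylabel(u'人数')
   
 plt.xticks((0,1),(u'男',u'女'))
   
 plt.bar(x = (0,1),height = (1,0.5),width = 0.35)
   
 plt.show()


 plt.xticks is used much like the position and height arguments above: with n bars, pass two n-element tuples, the first giving the label positions and the second the label text. On older matplotlib versions, where bars were aligned by their left edge, the labels then sit at the edge of each bar rather than under its middle. You could shift the positions by half the bar width, (0, 1) => (0 + 0.35/2, 1 + 0.35/2), but that is tedious. Passing align="center" to bar centers each bar on its x value, so the labels line up; current matplotlib does this by default.


 import matplotlib.pyplot as plt
   
 plt.xlabel(u'性别')

 plt.ylabel(u'人数')
   
 plt.xticks((0,1),(u'男',u'女'))
   
 plt.bar(x = (0,1),height = (1,0.5),width = 0.35,align="center")
   
 plt.show()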


 Next, we can give the chart a title. A legend is of course needed as well:


 import matplotlib.pyplot as plt
   
 plt.xlabel(u'性别')

 plt.ylabel(u'人数')
   
   
 plt.title(u"性别比例分析")

 plt.xticks((0,1),(u'男',u'女'))

 rect = plt.bar(x = (0,1),height = (1,0.5),width = 0.35,align="center")
   
 plt.legend((rect,),(u"图例",))
   
 plt.show()


 Note that the arguments to legend must be tuples, even when there is only one entry; otherwise the legend is not displayed correctly.

 Next, we can label each bar with its Y value. For this we use a general-purpose helper:


 def autolabel(rects):
     for rect in rects:
         height = rect.get_height()
         plt.text(rect.get_x()+rect.get_width()/2., 1.03*height, '%s' % float(height))


 The arguments to plt.text are the x coordinate, the y coordinate, and the text to display. The complete code is:


 import matplotlib.pyplot as plt
   
 def autolabel(rects):
     for rect in rects:
         height = rect.get_height()

         plt.text(rect.get_x()+rect.get_width()/2., 1.03*height, '%s' % float(height))
   
 plt.xlabel(u'性别')

 plt.ylabel(u'人数')
   
   
 plt.title(u"性别比例分析")

 plt.xticks((0,1),(u'男',u'女'))

 rect = plt.bar(x = (0,1),height = (1,0.5),width = 0.35,align="center")
   
 plt.legend((rect,),(u"图例",))

 autolabel(rect)
   
 plt.show()





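 Every component of a chart drawn by matplotlib corresponds to an object. Its properties can be set by calling the object's set_*() methods or the pyplot function setp().


 Because matplotlib is in fact an object-oriented plotting library, the properties of these objects can also be read directly.


 Configuration file


 Drawing a figure involves configuring many object properties, such as colors, fonts and line styles. When plotting we do not set each of them one by one; most simply use matplotlib's defaults.


 matplotlib stores these defaults in a configuration file named "matplotlibrc"; editing this file changes the default chart style. The file is read by rc_params(), which returns a configuration dict. When the matplotlib module is loaded it calls rc_params() and stores the resulting dict in the rcParams variable, and matplotlib uses the settings in rcParams when drawing. You can modify this dict directly, and the changes apply to plot elements created afterwards.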
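
As a minimal sketch of changing a default through rcParams: the example below assumes the non-interactive "Agg" backend so it can run without a display, and picks "lines.linewidth" as an arbitrary setting to change.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

original_width = plt.rcParams["lines.linewidth"]
print("default line width:", original_width)

# Changing rcParams affects elements created from now on.
plt.rcParams["lines.linewidth"] = 3.0

fig, ax = plt.subplots()
(line,) = ax.plot([0, 1, 2], [0, 1, 4])
print("new line width:", line.get_linewidth())

# Restore matplotlib's built-in defaults afterwards.
matplotlib.rcdefaults()
```

Lines drawn before the rcParams change keep their old width; only objects created afterwards pick up the new default.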


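 Plotting multiple subplots (quick plotting)


 The commonly used matplotlib classes nest as Figure -> Axes -> (Line2D, Text, etc.). A Figure object can contain several subplots; matplotlib represents each plotting region with an Axes object, which can be thought of as a subplot.


 subplot() quickly creates a figure containing several subplots. It is called as follows:


 subplot(numRows, numCols, plotNum)

 subplot divides the whole plotting area into numRows rows * numCols columns of equal sub-regions and numbers them from left to right, top to bottom; the top-left sub-region is number 1.


 If numRows, numCols and plotNum are all less than 10, they can be abbreviated to a single integer; for example, subplot(323) is the same as subplot(3,2,3).


 subplot creates an axes object in the region given by plotNum. In the matplotlib versions this article was written for, if the new axes overlaps a previously created one, the earlier axes is deleted.




 subplot() returns the Axes object it creates. We can store these objects in variables, use sca() to make each one the current Axes in turn, and call plot() to draw in it.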
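
A minimal sketch of the subplot()/sca() pattern just described, assuming the non-interactive "Agg" backend; in an interactive session you would finish with plt.show().

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 3, 100)

ax_top = plt.subplot(2, 1, 1)   # 2 rows x 1 column, region 1 (top)
ax_bottom = plt.subplot(212)    # shorthand for subplot(2, 1, 2)

plt.sca(ax_top)                 # make the top Axes current
plt.plot(x, np.sin(x))

plt.sca(ax_bottom)              # switch to the bottom Axes
plt.plot(x, np.cos(x))

fig = plt.gcf()
print(len(fig.axes))
```

Calling the methods on the Axes objects directly (ax_top.plot(...)) achieves the same result without changing the current Axes.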


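 Plotting multiple figures (quick plotting)


 To draw several figures at the same time, pass figure() an integer giving the number of the Figure object. If the Figure with that number already exists, no new object is created; it simply becomes the current Figure.


 import numpy as np

 import matplotlib.pyplot as plt
  
 plt.figure(1) # create figure 1

 plt.figure(2) # create figure 2

 ax1 = plt.subplot(211) # create subplot 1 in figure 2

 ax2 = plt.subplot(212) # create subplot 2 in figure 2
  
 x = np.linspace(0, 3, 100)

 for i in range(5):

     plt.figure(1)  # select figure 1

     plt.plot(x, np.exp(i*x/3))

     plt.sca(ax1)   # select subplot 1 of figure 2

     plt.plot(x, np.sin(i*x))

     plt.sca(ax2)  # select subplot 2 of figure 2

     plt.plot(x, np.cos(i*x))
  
 plt.show()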










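 Displaying Chinese in charts


 The fonts used in matplotlib's default configuration cannot display Chinese correctly. There are several ways to make charts display Chinese properly.


 Specify the font directly in the program.
 Modify the rcParams configuration dict at the start of the program.
 Edit the configuration file.
 A simpler approach is to write Chinese strings as unicode literals, for example u'测试中文显示', save the source file as UTF-8, and add the line "# coding=utf-8" at the top.


 The Chinese display problem in matplotlib image output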
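
The second option, modifying rcParams at the start of the program, might look like the sketch below. It assumes a font named "SimHei" is installed; that font name is an assumption, so substitute any CJK-capable font available on your system. The "Agg" backend is used only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Assumption: "SimHei" is installed; matplotlib falls back to the
# remaining fonts in the list if it is not.
plt.rcParams["font.sans-serif"] = ["SimHei"] + list(plt.rcParams["font.sans-serif"])
# Use the ASCII hyphen for minus signs, which CJK fonts can render.
plt.rcParams["axes.unicode_minus"] = False

fig, ax = plt.subplots()
ax.set_title("测试中文显示")
ax.plot([-1, 0, 1], [1, 0, 1])
```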


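 Object-oriented plotting


 The matplotlib API consists of three layers. The Artist layer handles all the high-level constructs, such as the drawing and layout of figures, text and lines. Normally we only deal with Artists and do not need to care about the low-level drawing details.


 The standard procedure for creating a chart directly with Artists is:


 Create a Figure object

 Use the Figure object to create one or more Axes or Subplot objects

 Call methods of the Axes and similar objects to create the various simple Artists


 import matplotlib.pyplot as plt


 X1 = range(0, 50)
 Y1 = [num**2 for num in X1] # y = x^2
 X2 = [0, 1]
 Y2 = [0, 1] # y = x


  
 Fig = plt.figure(figsize=(8,4)) # Create a `figure' instance 

 Ax = Fig.add_subplot(111) # Create an `axes' instance in the figure 

 Ax.plot(X1, Y1, X2, Y2) # Create Line2D instances in the axes

  
 Fig.show() 

 Fig.savefig("test.pdf")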



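 matplotlib also provides a module named pylab, which bundles many commonly used NumPy and pyplot functions so that users can compute and plot quickly; it is well suited to the interactive IPython environment. Load the pylab module as follows:


 >>> import pylab as pl

 1 Install numpy and matplotlib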


 >>> import numpy

 >>> numpy.__version__


 >>> import matplotlib

 >>> matplotlib.__version__


 2 Two common plot types: line and scatter plots (the plot() command) and histograms (the hist() command)


 2.1 Line and scatter plots


 2.1.1 Line plots (a line connecting a set of x and y values)


 import numpy as np

 import pylab as pl
  
 x = [1, 2, 3, 4, 5]
 y = [1, 4, 9, 16, 25]
  
 pl.plot(x, y)

 pl.show()



 2.1.2 Scatter plots


 Simply change pl.plot(x, y) to pl.plot(x, y, 'o'), which draws blue circle markers


  


 2.2 Making things look pretty


 2.2.1 Changing the line color


 Red: change pl.plot(x, y, 'o') to pl.plot(x, y, 'or')


 2.2.2 Changing the line style


 Dashed line: plot(x,y, '--')


 2.2.3 Changing the marker style


 Blue star markers: plot(x,y, 'b*')


 2.2.4 Plot and axis titles and limits


 import numpy as np

 import pylab as pl
  
 x = [1, 2, 3, 4, 5]# Make an array of x values

 y = [1, 4, 9, 16, 25]# Make an array of y values for each x value

 pl.plot(x, y)# use pylab to plot x and y
  
 pl.title('Plot of y vs. x')# give plot a title

 pl.xlabel('x axis')# make axis labels

 pl.ylabel('y axis')
  
 pl.xlim(0.0, 7.0)# set axis limits

 pl.ylim(0.0, 30.)
  
 pl.show()# show the plot on the screen




 2.2.5 Plotting more than one plot on the same set of axes


 This is straightforward: just draw the plots one after another:


 import numpy as np

 import pylab as pl
  
 x1 = [1, 2, 3, 4, 5]# Make x, y arrays for each graph
 y1 = [1, 4, 9, 16, 25]
 x2 = [1, 2, 4, 6, 8]
 y2 = [2, 4, 8, 12, 16]
  
 pl.plot(x1, y1, 'r')# use pylab to plot x and y
 pl.plot(x2, y2, 'g')
  
 pl.title('Plot of y vs. x')# give plot a title

 pl.xlabel('x axis')# make axis labels

 pl.ylabel('y axis')
  
  
 pl.xlim(0.0, 9.0)# set axis limits

 pl.ylim(0.0, 30.)
  
  
 pl.show()# show the plot on the screen








 2.2.6 Figure legends


 pl.legend((plot1, plot2), ('label1', 'label2'), loc='best', numpoints=1)


 The loc argument (the third positional argument in older matplotlib) sets where the legend is placed: 'best', 'upper right', 'upper left', 'center', 'lower left', 'lower right'.


 If a label was already given when plotting in the current figure, e.g. plt.plot(x,z,label="cos(x2)"), simply call plt.legend().


 import numpy as np
 import pylab as pl
  
 x1 = [1, 2, 3, 4, 5]# Make x, y arrays for each graph
 y1 = [1, 4, 9, 16, 25]
 x2 = [1, 2, 4, 6, 8]
 y2 = [2, 4, 8, 12, 16]
  
 plot1, = pl.plot(x1, y1, 'r')# use pylab to plot x and y : Give your plots names
 plot2, = pl.plot(x2, y2, 'go')# plot() returns a list of lines; the trailing comma unpacks the single line
  
 pl.title('Plot of y vs. x')# give plot a title
 pl.xlabel('x axis')# make axis labels
 pl.ylabel('y axis')
  
  
 pl.xlim(0.0, 9.0)# set axis limits
 pl.ylim(0.0, 30.)
  
  
 pl.legend([plot1, plot2], ('red line', 'green circles'), loc='best', numpoints=1)     # make legend

 pl.show()# show the plot on the screen






 2.3 Histograms


 import numpy as np
 import pylab as pl
  
 # make an array of random numbers with a gaussian distribution with
 # mean = 5.0
 # rms = 3.0
 # number of points = 1000

 data = np.random.normal(5.0, 3.0, 1000)
  
 # make a histogram of the data array

 pl.hist(data)
  
 # make plot labels

 pl.xlabel('data')

 pl.show()

 To drop the black outlines, use pl.hist(data, histtype='stepfilled') instead




 2.3.1 Setting the width of the histogram bins manually


 Add these two lines:


 bins = np.arange(-5., 16., 1.) # floating-point version of range

 pl.hist(data, bins, histtype='stepfilled')




 3 Plotting more than one axis per canvas


 To draw several figures at once, pass figure() an integer giving the figure number. If a figure with that number already exists, no new object is created; it simply becomes the current figure.


 fig1 = pl.figure(1)
 pl.subplot(211)

 subplot(211) divides the plotting area into 2 rows * 1 column, two regions in total, and creates an axes object in region 1 (the upper one). pl.subplot(212) creates an axes object in region 2 (the lower one).




 import numpy as np
 import pylab as pl
  
 # Use numpy to load the data contained in the file
 # 'fakedata.txt' into a 2-D array called data
 data = np.loadtxt('fakedata.txt')
  
 # plot the first column as x, and second column as y
 pl.plot(data[:,0], data[:,1], 'ro')
 pl.xlabel('x')
 pl.ylabel('y')
 pl.xlim(0.0, 10.)
 pl.show()






 4.2 Writing data to a text file


 There are many ways to write files; only one workable way of writing a text file is shown here. See the official documentation for more.


 import numpy as np
 # Let's make 2 arrays (x, y) which we will write to a file
 # x is an array containing numbers 0 to 9, with intervals of 1

 x = np.arange(0.0, 10., 1.)

 # y is an array containing the values in x, squared

 y = x*x
 print('x = ', x)
 print('y = ', y)
  
 # Now open a file to write the data to
 # 'w' means open for 'writing'
 file = open('testdata.txt', 'w')
 # loop over each line you want to write to file
 for i in range(len(x)):
     # make a string for each line you want to write
     # '\t' means 'tab'
     # '\n' means 'newline'
     # 'str()' means you are converting the quantity in brackets to a string type
     txt = str(x[i]) + '\t' + str(y[i]) + ' \n'
     # write the txt to the file
     file.write(txt)
 # Close your file
 file.close()
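
As an alternative to the manual loop above, NumPy can write and read such a two-column file directly. This is a minimal sketch using np.savetxt and np.loadtxt; the temporary directory is an assumption made so the example does not touch the working directory.

```python
import os
import tempfile

import numpy as np

x = np.arange(0.0, 10.0, 1.0)
y = x * x

# Write x and y as two tab-separated columns, one row per value.
path = os.path.join(tempfile.mkdtemp(), "testdata.txt")
np.savetxt(path, np.column_stack((x, y)), delimiter="\t")

# loadtxt splits on any whitespace by default, so tabs are handled.
loaded = np.loadtxt(path)
print(loaded.shape)
```

The loaded array has the same layout that the plotting example expects: data[:,0] is x and data[:,1] is y.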



 Example 1


 import matplotlib.pyplot as plt; plt.rcdefaults()
 import numpy as np
 import matplotlib.pyplot as plt
  
  
 # Example data
 people = ('Tom', 'Dick', 'Harry', 'Slim', 'Jim')
 y_pos = np.arange(len(people))
 performance = 3 + 10 * np.random.rand(len(people))
 error = np.random.rand(len(people))
  
 #barh(bottom, width, height=0.8, left=0, **kwargs)
 plt.barh(y_pos, performance, xerr=error, height=0.8,align='center',alpha=0.4)
 plt.yticks(y_pos, people)
 plt.xlabel('Performance')
 plt.title('How fast do you want to go today?')
  
 plt.show()




 Example 2


 import numpy as np
 import matplotlib.pyplot as plt
 import pylab
 from matplotlib.ticker import MaxNLocator
  
 grade = 2
 day = '2014-06-22'  # Today in this year
  
 numTests = 5
 testNames = ['swap','memory', '/project', '/backup', '/root']
 testMeta = ['', '', '', '','']
 scores = [98,79, 39, 92,17]
 lastweek_scores = ['97%','35%','86%','21%','70%']
 #rankings = np.round(np.random.uniform(0, 1, numTests)*100, 0)
 rankings = 3 + 10 * np.random.rand(numTests)
  
  
 fig, ax1 = plt.subplots(figsize=(9, 7))
 plt.subplots_adjust(left=0.115, right=0.88)
 fig.canvas.manager.set_window_title('Usage Chart')  # canvas.set_window_title was removed in Matplotlib 3.6
 pos = np.arange(numTests)+0.5    # Center bars on the Y-axis ticks
 rects = ax1.barh(pos, scores, align='center', height=0.5, color='m')
  
 ax1.axis([0, 100, 0, 5])
 pylab.yticks(pos, testNames)
 ax1.set_title('Server 18.32 Usage Chart')
 plt.text(50, -0.5, 'date: ' + day,
          horizontalalignment='center', size='small')
  
 # Set the right-hand Y-axis ticks and labels and set X-axis tick marks at the
 # deciles
 ax2 = ax1.twinx()
 ax2.plot([100, 100], [0, 5], 'white', alpha=0.1)
 ax2.xaxis.set_major_locator(MaxNLocator(11))
 xticks = pylab.setp(ax2, xticklabels=['0', '10', '20', '30', '40', '50', '60',
                                       '70', '80', '90', '100'])
 ax2.xaxis.grid(True, linestyle='--', which='major', color='grey',
 alpha=0.25)
 #Plot a solid vertical gridline to highlight the median position
 plt.plot([50, 50], [0, 5], 'grey', alpha=0.25)
  
 # Build up the score labels for the right Y-axis by first appending a carriage
 # return to each string and then tacking on the appropriate meta information
 # (i.e., 'laps' vs 'seconds'). We want the labels centered on the ticks, so if
 # there is no meta info (like for pushups) then don't add the carriage return to
 # the string
  
  
 def withnew(i, scr):
     if testMeta[i] != '':
         return '%s\n' % scr
     else:
         return scr
  
 scoreLabels = [withnew(i, scr) for i, scr in enumerate(lastweek_scores)]
 scoreLabels = [i+j for i, j in zip(scoreLabels, testMeta)]
 # set the tick locations
 ax2.set_yticks(pos)
 # set the tick labels
 ax2.set_yticklabels(scoreLabels)
 # make sure that the limits are set equally on both yaxis so the ticks line up
 ax2.set_ylim(ax1.get_ylim())
  
  
 ax2.set_ylabel("Last Week's data",color='sienna')
 #Make list of numerical suffixes corresponding to position in a list
 #            0     1     2     3     4     5     6     7     8     9
 suffixes = ['%', '%', '%', '%', '%', '%', '%', '%', '%', '%']
 ax2.set_xlabel('Percentile Ranking Across ' + suffixes[grade]
               + ' Grade '  + 's')
  
 # Lastly, write in the ranking inside each bar to aid in interpretation
 for rect in rects:
     # Rectangle widths are already integer-valued but are floating
     # type, so it helps to remove the trailing decimal point and 0 by
     # converting width to int type
     width = int(rect.get_width())
  
     # Figure out what the last digit (width modulo 10) so we can add
     # the appropriate numerical suffix (e.g., 1st, 2nd, 3rd, etc)
     lastDigit = width % 10
     # Note that 11, 12, and 13 are special cases
     if (width == 11) or (width == 12) or (width == 13):
         suffix = 'th'
     else:
         suffix = suffixes[lastDigit]
  
     rankStr = str(width) + suffix
     if (width < 5):        # The bars aren't wide enough to print the ranking inside
         xloc = width + 1  # Shift the text to the right side of the right edge
         clr = 'black'      # Black against white background
         align = 'left'
     else:
         xloc = 0.98*width  # Shift the text to the left side of the right edge
         clr = 'white'      # White on magenta
         align = 'right'
  
     # Center the text vertically in the bar
     yloc = rect.get_y()+rect.get_height()/2.0
     ax1.text(xloc, yloc, rankStr, horizontalalignment=align,
             verticalalignment='center', color=clr, weight='bold')
  
 plt.show()




 Using Python with matplotlib to chart SVN commit counts

 Install the required dependencies

yum install -y  numpy matplotlib



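  matplotlib.pyplot is a collection of command-style functions that make matplotlib work like MATLAB. Each pyplot function makes some change to a figure: for example, creating a new figure, creating a plotting area in a figure, drawing lines in a plotting area, or adding labels to the plot. matplotlib.pyplot is stateful: it keeps track of the current figure and plotting area, and each plotting function acts on that current state.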



 import matplotlib.pyplot as plt

plt.plot([1,2,3,4])

plt.ylabel('some numbers')

 plt.show()



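  In the resulting figure the X axis runs from 0 to 3 and the Y axis from 1 to 4. This is because when plot() is given a single list or array, matplotlib treats it as a sequence of Y values (the Y vector) and generates the X values (the X vector) automatically. Python counts from 0, so the X vector has the same length as the Y vector (4 here) but starts at 0, giving X values [0,1,2,3].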
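
  The automatically generated X values can be checked by inspecting the line that plot() returns. This is a small sketch; the Agg backend is an assumption made so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# Only Y values are passed; matplotlib fills in the X values itself.
line, = plt.plot([1, 2, 3, 4])

x_values = [float(v) for v in line.get_xdata()]
y_values = [float(v) for v in line.get_ydata()]
print(x_values)
print(y_values)
```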



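 plt.plot() also accepts several sequences (tuples or lists). Each pair of sequences is one X,Y vector pair and forms one curve, so a single figure can contain several curves.


  To tell apart several curves in one figure, each X,Y pair can be followed by a format string describing how that curve is drawn. The default is 'b-', a solid blue line. To draw the curve with red circles instead: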


 import matplotlib.pyplot as plt

plt.plot([1,2,3,4],[1,4,9,16],'ro')

plt.axis([0,6,0,20])




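  The axis() function takes an argument of the form [xmin,xmax,ymin,ymax], which sets the ranges of the X and Y axes.


  matplotlib accepts not only sequences (lists and tuples) as arguments but also numpy arrays. In fact, all sequences are converted to numpy arrays internally.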


 import numpy as np
 import matplotlib.pyplot as plt
 t=np.arange(0.,5.,0.2)
 plt.plot(t,t,'r--',t,t**2,'bs',t,t**3,'g^')



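 Controlling line properties


  Lines have many properties that can be set: line width, dash style, antialiasing, and so on. There are several ways to set them:


  1. Using keyword arguments: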


plt.plot(x,y,linewidth=2.0)

  2. Using the setter methods of a Line2D instance. plot() returns a list of lines, e.g. line1,line2=plot(x1,y1,x2,y2). After taking the lines returned by plot(), use their setter methods to set the properties.


line,=plt.plot(x,y,'-')

line.set_antialiased(False)  # turn off antialiasing


  3. Using the setp() command:


 lines=plt.plot(x1,y1,x2,y2)

plt.setp(lines,color='r',linewidth=2.0)

 plt.setp(lines,'color','r','linewidth',2.0)  # equivalent MATLAB-style string/value pairs
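
 The three approaches can be tried side by side and the results read back with the matching getters. This is a minimal sketch: the sample data `x` and the names `line_a`, `line_b` and `widths` are made up for illustration, and the Agg backend is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.0, 1.0, 20)  # hypothetical sample data

# 1. Keyword argument passed directly to plot()
line_a, = plt.plot(x, x, linewidth=2.0)

# 2. Setter method on the returned Line2D instance
line_b, = plt.plot(x, x**2, '-')
line_b.set_antialiased(False)

# 3. setp() applied to a list of lines at once
lines = plt.plot(x, x**3, x, x**4)
plt.setp(lines, color='r', linewidth=2.0)

widths = [ln.get_linewidth() for ln in lines]
print(line_a.get_linewidth(), line_b.get_antialiased(), widths)
```

 setp() is the most convenient of the three when the same property has to be applied to several lines.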



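 Working with multiple figures and axes


  Both MATLAB and pyplot have the concept of a current figure and a current axes. All plotting commands act on the current axes.


 The function gca() returns the current axes, and gcf() returns the current figure.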


 import numpy as np
 import matplotlib.pyplot as plt


 def f(t):
     return np.exp(-t) * np.cos(2*np.pi*t)


 t1 = np.arange(0.0, 5.0, 0.1)

 t2 = np.arange(0.0, 5.0, 0.02)


plt.figure(1)

plt.subplot(211)

 plt.plot(t1, f(t1), 'bo', t2, f(t2), 'k')


plt.subplot(212)

 plt.plot(t2, np.cos(2*np.pi*t2), 'r--')




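  The figure() command is optional, because figure(1) is created by default, and so is subplot(111).

 The subplot() command takes numrows,numcols,fignum, where fignum ranges from 1 to numrows*numcols. If numrows*numcols is less than 10, the commas in the subplot() call are optional, so subplot(2,1,1) is identical to subplot(211).


  If you want to place an axes manually rather than on a rectangular grid, use the axes() command with axes([left,bottom,width,height]), where each value lies between 0 and 1 (a fraction of the figure size).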
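
  A manually placed axes is often used as an inset. The sketch below puts a small axes in the upper right of a larger one; the positions, the plotted curves and the names `main_ax` and `inset_ax` are illustrative assumptions, and the Agg backend is assumed so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

t = np.arange(0.0, 5.0, 0.02)

fig = plt.figure()
# [left, bottom, width, height], all as fractions of the figure size
main_ax = plt.axes([0.1, 0.1, 0.8, 0.8])
inset_ax = plt.axes([0.6, 0.6, 0.25, 0.25])

main_ax.plot(t, np.exp(-t), 'k')
inset_ax.plot(t, np.cos(2 * np.pi * t), 'r--')

# Read the inset position back as (left, bottom, width, height).
bounds = inset_ax.get_position().bounds
print(bounds)
```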


  You can create several figures with several figure() calls, and each figure can hold several axes and subplots:


 import matplotlib.pyplot as plt
 plt.figure(1)                # the first figure
 plt.subplot(211)             # the first subplot in the first figure
 plt.plot([1,2,3])
 plt.subplot(212)             # the second subplot in the first figure
 plt.plot([4,5,6])




plt.figure(2)                # a second figure

 plt.plot([4,5,6])            # creates a subplot(111) by default


 plt.figure(1)                # figure 1 current; subplot(212) still current

 plt.subplot(211)             # make subplot(211) in figure1 current

 plt.title('Easy as 1,2,3')   # subplot 211 title

  You can clear the current figure with clf() and the current axes with cla().


  If you create many figures, you need to call close() explicitly to release the memory a figure uses; merely closing the window shown on screen does not free it.
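
  The difference between clearing and closing can be seen by listing the open figure numbers with plt.get_fignums(). This is a sketch under the assumption of a non-interactive Agg backend; the variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt

plt.close('all')  # start from a clean state

for n in (1, 2, 3):
    plt.figure(n)
    plt.plot([1, 2, 3])
open_before = plt.get_fignums()

# clf() empties figure 1 but the figure itself stays open.
plt.figure(1)
plt.clf()
after_clf = plt.get_fignums()

# close() actually releases a figure.
plt.close(2)
after_close_one = plt.get_fignums()

plt.close('all')
after_close_all = plt.get_fignums()

print(open_before, after_clf, after_close_one, after_close_all)
```

  In a loop that produces many plots, calling plt.close() on each figure once it has been saved keeps memory use flat.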


 Working with text


  The text() command adds text at an arbitrary position; xlabel(), ylabel() and title() add text to the X axis, the Y axis and the title.


 import numpy as np
 import matplotlib.pyplot as plt


 mu, sigma = 100, 15
 x = mu + sigma * np.random.randn(10000)


 # the histogram of the data
 # density=True replaces the normed argument, which was removed in newer Matplotlib releases
 n, bins, patches = plt.hist(x, 50, density=True, facecolor='g', alpha=0.75)




 plt.xlabel('Smarts')
 plt.ylabel('Probability')
 plt.title('Histogram of IQ')
 plt.text(60, .025, r'$\mu=100,\ \sigma=15$')
 plt.axis([40, 160, 0, 0.03])
 plt.grid(True)


  Every text() command returns a matplotlib.text.Text instance. As with lines, you can customize the text properties by passing keyword arguments or by using setp().


 t=plt.xlabel('my data',fontsize=14,color='red')
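
  Using mathematical expressions in text


  matplotlib accepts TeX expressions in any text.


  A TeX expression is enclosed in a pair of dollar signs. For example, σ_i = 15 is written with the following TeX expression: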


 plt.title(r'$\sigma_i=15$')



Plotting the standard normal curve with Python's matplotlib




 import math
 import pylab as pl
 import numpy as np
 def gd(x,m,s):
     left=1/(math.sqrt(2*math.pi)*s)
     right=math.exp(-math.pow(x-m,2)/(2*math.pow(s,2)))
     return left*right
 def showfigure():
     x=np.arange(-4,5,0.1)
     y=[]
     for i in x:
         y.append(gd(i,0,1))
     pl.plot(x,y) 
     pl.xlim(-4.0,5.0)
     pl.ylim(-0.2,0.5)
 #
     ax = pl.gca()
     ax.spines['right'].set_color('none')
     ax.spines['top'].set_color('none')
     ax.xaxis.set_ticks_position('bottom')
     ax.spines['bottom'].set_position(('data',0))
     ax.yaxis.set_ticks_position('left')
     ax.spines['left'].set_position(('data',0))
     #add param
     label_f1 = r"$\mu=0,\ \sigma=1$"  # raw string so the backslashes reach the TeX parser unchanged
     pl.text(2.5,0.3,label_f1,fontsize=15,verticalalignment="top",
             horizontalalignment="left")
     label_f2 = r"$f(x)=\frac{1}{\sqrt{2\pi}\sigma}exp(-\frac{(x-\mu)^2}{2\sigma^2})$"
     pl.text(1.5,0.4,label_f2,fontsize=15,verticalalignment="top"
             ,horizontalalignment="left")
     pl.show()

 showfigure()
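
 The per-point loop over gd() can also be written as one vectorized numpy expression. The following is a sketch rather than a replacement for the code above: the name `gaussian_pdf` and the quick area check are additions for illustration only.

```python
import numpy as np


def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Normal density evaluated element-wise on a scalar or an array."""
    x = np.asarray(x, dtype=float)
    coeff = 1.0 / (np.sqrt(2.0 * np.pi) * sigma)
    return coeff * np.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


x = np.arange(-4.0, 5.0, 0.1)
y = gaussian_pdf(x)  # no explicit Python loop needed

# Rough sanity check: the area under the curve should be close to 1.
step = x[1] - x[0]
area = float(np.sum(y) * step)
print(round(area, 3))
```

 The resulting x and y arrays can be passed to pl.plot(x, y) exactly as in showfigure().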
















Data visualization in Python with matplotlib


 # -*- coding:UTF-8 -*-

 import numpy as np
 import matplotlib as mpl
 import matplotlib.pyplot as plt
 from matplotlib.ticker import MultipleLocator

 # Python 3 strings are Unicode, so the Python 2 reload(sys)/setdefaultencoding hack is not needed

 xmajorLocator = MultipleLocator(10 * 1)   # place major x ticks at multiples of 10
 ymajorLocator = MultipleLocator(0.1 * 1)  # place major y ticks at multiples of 0.1

 # Use a font with Chinese glyphs (SimHei must be installed on the system)
 mpl.rcParams['font.sans-serif'] = ['SimHei']

 # Load the data from a file
 #data = np.loadtxt('test44.txt', delimiter=None, dtype=float )
 #data = [[1,2],[3,4],[5,6]]
 data = [[1,5,10,20,30,40,50,60,70,80,90,100],[0.0201,0.0262,0.0324,0.0295,0.0221,0.0258,0.0254,0.0299,0.0275,0.0299,0.0291,0.0328],
 [0.0193,0.0254,0.0234,0.0684,0.0693,0.0803,0.1008,0.098,0.0947,0.0934,0.1971,0.2123],[0.0209,0.1176,0.2143,0.2295,0.4176,0.5258,0.6471,0.6484,0.8193,0.829,0.832,0.943]]
  
 data = np.array(data)
  
 # Slice out the rows of the array

 x = data[0]   # time
 y = data[1]   # Y values of category 1
 y2 = data[2]  # Y values of category 2
 y3 = data[3]  # Y values of category 3
  
 plt.figure(num=1, figsize=(8, 6))
  
 ax = plt.subplot(111)
 ax.xaxis.set_major_locator(xmajorLocator)
 ax.yaxis.set_major_locator(ymajorLocator)
 ax.xaxis.grid(True, which='major')  # draw the x-axis grid at the major ticks
 ax.yaxis.grid(True, which='major')  # draw the y-axis grid at the major ticks

 plt.xlabel('时间/t',fontsize='xx-large')  # label reads "time/t"; valid font sizes: xx-small, x-small, small, medium, large, x-large, xx-large, smaller, larger
 plt.ylabel('y-label',fontsize='xx-large')
 plt.title('Title',fontsize='xx-large')
 plt.xlim(0, 110)
 plt.ylim(0, 1)
  
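 # The Chinese labels read "category 1/2/3"; they are kept to exercise the SimHei font set above
 line1, = ax.plot(x, y, 'g.-',label="类别一",)

 line2, = ax.plot(x,y2,'b*-',label="类别二",)

 line3, = ax.plot(x,y3,'rD-',label="类别三",)

 ax.legend((line1, line2,line3),('类别一','类别二','类别三'),loc=5) # loc takes an integer location code such as 1 to 6, each a different position (5 is 'right')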
 plt.show()


Plotting the cubic curve y = x**3 with Python matplotlib


 import matplotlib.pyplot as plt
 import numpy as np
 x = np.linspace(-100,100,100)
 y = x**3
 plt.figure(num=3,figsize=(8,5))   # num: figure number; figsize: (width, height) in inches
 l1, = plt.plot(x,y,'p')  # the trailing comma unpacks the single Line2D so it can be passed to plt.legend(handles=...)
 plt.xlim((-100,100))
 plt.ylim((-100000,100000))
 plt.xlabel('X')   # x-axis label
 plt.ylabel('Y')
 ax = plt.gca()
 ax.spines['right'].set_color('none')
 ax.spines['top'].set_color('none')    # hide the right and top borders
 ax.xaxis.set_ticks_position('bottom')    # put the x-axis ticks on the bottom spine
 ax.yaxis.set_ticks_position('left')
 ax.spines['bottom'].set_position(('data',0))  # move the x axis to y = 0
 ax.spines['left'].set_position(('data',0))
 # legend
 # labels must be a list with one entry per handle; a bare string would be split into characters
 plt.legend(handles=[l1],labels=['y=x**3'],loc='best')  # loc = location
 plt.show()


Plotting a cubic function with Python matplotlib


 >>> from matplotlib import pyplot as pl
 >>> import numpy as np
 >>> from scipy import interpolate
  
 >>> x = np.linspace(-10, 5, 100)
 >>> y = -2*x**3 + 5*x**2 + 9
 >>> pl.figure(figsize = (8, 4))     

 >>> pl.plot(x, y, color="blue", linewidth = 1.5)
 []
 >>> pl.show()


 pl.figure sets the size of the figure


 pl.plot    draws the curve and sets the line color and line width


 pl.show displays the figure



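Generating 20 random DNA FASTA files with Python

 This generates 20 random files. Because the file names are not hash-based, they may collide.


 Each file holds 30-50 sequences, and each sequence is 70-120 bases long.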


 import os
 import random
 import string
  
 print (dir(string))
  
 letter = string.ascii_letters
  
 os.chdir("D:\\")
  
 bases = {1:"A", 2:"T", 3:"C", 4:"G"}
  
  
 ## Test random module , get random DNA base
  
 Nth = random.randint(1,4)
  
 print (bases[Nth])
  
 ## Create random DNA sequences
  
 for i in range(20):
     Number_of_Seq = random.randint(30,50)
     filename = letter[i]
     with open("Sequences"+filename + \
               str(Number_of_Seq)+ ".fasta", "w") as file_output:
         for j in range(Number_of_Seq):
             each_Seq=""
             Rand_len = random.randint(70,120)
             for k in range(Rand_len):
                 Nth = random.randint(1,4)
                 each_Seq += bases[Nth]
  
             file_output.write(">seq_"+str(Number_of_Seq)+ \
                               "_"+str(Rand_len)+"\n")
             file_output.write(each_Seq+"\n") 
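
 The same idea can be written more compactly with random.choices. This is a sketch under several assumptions that are not in the original: it writes into a temporary directory instead of D:\, numbers the files and sequences so that names and headers are unique, and uses a seeded random.Random so that runs are repeatable. The helper names `random_dna` and `write_random_fasta` are hypothetical.

```python
import os
import random
import tempfile


def random_dna(length, rng):
    """Return a random DNA string of the given length."""
    return "".join(rng.choices("ATCG", k=length))


def write_random_fasta(path, n_seqs, rng, min_len=70, max_len=120):
    """Write n_seqs random sequences to path in FASTA format."""
    with open(path, "w") as handle:
        for index in range(1, n_seqs + 1):
            seq = random_dna(rng.randint(min_len, max_len), rng)
            # Header: running index plus sequence length, so each ID is unique
            handle.write(f">seq_{index}_{len(seq)}\n{seq}\n")


rng = random.Random(0)        # seeded for repeatable output (assumption)
out_dir = tempfile.mkdtemp()  # hypothetical output directory
paths = []
for file_index in range(1, 21):
    path = os.path.join(out_dir, f"Sequences_{file_index:02d}.fasta")
    write_random_fasta(path, rng.randint(30, 50), rng)
    paths.append(path)

print(len(paths), "files written to", out_dir)
```

 Numbering the files avoids the name collisions mentioned above without needing hashed names.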
 
 
 

import matplotlib.pyplot as plt

img=plt.imread('ch03/stinkbug.png')

import pylab

plt.imshow(img)

pylab.show()


import numpy as np

import matplotlib.pyplot as plt

import pylab

img=plt.imread('ch03/stinkbug.png')

plt.figure(figsize=(4, 4))

plt.imshow(img)

pylab.show()
