Reading a text file:

# -*- coding: utf-8 -*-

if __name__ == '__main__':
    test_file = "./test.txt"
    file_obj = open(test_file, 'r')
    lines = file_obj.readlines()   # read the whole file into a list of lines
    print type(lines)
    for line in lines:
        print line                 # each line keeps its trailing '\n'
    file_obj.close()
    print "==============="
    for i, line in enumerate(lines):   # enumerate yields (index, line) pairs
        print i, line

Output:

C:\Anaconda2\python.exe F:/python01/lect004_shujucaiji/lect004_1.py
<type 'list'>
数据流图(Data Flow Diagram):简称DFD,它从数据传递和加工角度,以图形方式来表达系统的逻辑功能、数据在系统...

在软件工程的课程设计中,感觉DFD图与编程关系不大,

请问诸位高手,DFD图到底是给谁看的(系统分析员OR程序员OR用户)?

还有,事务型DFD图与变换型DFD图有什么区别?
===============
0 数据流图(Data Flow Diagram):简称DFD,它从数据传递和加工角度,以图形方式来表达系统的逻辑功能、数据在系统...

1 在软件工程的课程设计中,感觉DFD图与编程关系不大,

2 请问诸位高手,DFD图到底是给谁看的(系统分析员OR程序员OR用户)?

3 还有,事务型DFD图与变换型DFD图有什么区别?

Process finished with exit code 0
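Note that readlines() pulls the entire file into memory at once. For large files you can iterate over the file object itself, which streams one line at a time. A minimal sketch (the file name sample.txt is made up for the demo):

```python
# Create a small demo file, then stream it back line by line.
with open("sample.txt", "w") as f:
    f.write("line one\nline two\n")

lines = []
with open("sample.txt", "r") as f:
    for line in f:                       # the file object yields one line per iteration
        lines.append(line.rstrip("\n"))  # strip the trailing newline
print(lines)
```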

Writing to a file:

# -*- coding: utf-8 -*-

if __name__ == '__main__':
    test_file = "./test.txt"
    test_obj = open(test_file, 'w')
    test_obj.write("刘海静 肖肖")
    test_obj.close()
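Mode 'w' truncates an existing file each time it is opened. To add to a file instead, open it in append mode 'a' — a quick sketch (notes.txt is a made-up name):

```python
with open("notes.txt", "w") as f:   # 'w' starts from an empty file
    f.write("first\n")
with open("notes.txt", "a") as f:   # 'a' keeps existing content and appends
    f.write("second\n")
with open("notes.txt", "r") as f:
    content = f.read()
print(content)
```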

Writing line by line:

# -*- coding: utf-8 -*-

if __name__ == '__main__':
    file_name = "./test3.txt"
    file_obj = open(file_name, 'w')
    lines = ["这是第%i行\n" % n for n in range(100)]
    file_obj.writelines(lines)
    file_obj.close()

The with statement (it handles exceptions and closes the file automatically, so it is the recommended way):

if __name__ == '__main__':
    test_file = "./test3.txt"
    with open(test_file, "r") as f_obj:
        print f_obj.read()

Output:

这是第87行
这是第88行
这是第89行
这是第90行
这是第91行
这是第92行
这是第93行
这是第94行
这是第95行
这是第96行
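The exception safety mentioned above can be checked directly: even if the body of the with block raises, the file is still closed on the way out. A small sketch (demo.txt and the ValueError are artificial):

```python
holder = {}
try:
    with open("demo.txt", "w") as f:
        holder["f"] = f
        raise ValueError("simulated error inside the block")
except ValueError:
    pass
# the with statement closed the file despite the exception
print(holder["f"].closed)
```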

*************

Series: similar to a one-dimensional array object (values plus an index of labels).

DataFrame: a tabular data structure in which each column can have a different data type; it is two-dimensional, and higher-dimensional data can be represented through a hierarchical index.
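A quick illustration of the two structures (the values are made up; requires pandas):

```python
import pandas as pd

# Series: one-dimensional values with an index of labels
s = pd.Series([-0.1247, -0.0707], index=["1880", "1881"])

# DataFrame: a 2-D table; columns may hold different dtypes
df = pd.DataFrame({"year": [1880, 1881], "anomaly": [-0.1247, -0.0707]})
print(s["1880"])
print(df.shape)
```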

***************************

json

import json

if __name__ == '__main__':
    file_name = "./global_temperature.json"
    with open(file_name, 'r') as f_obj:
        load = json.load(f_obj)  # parse the JSON file into a Python dict
        print load
    print type(load)

Output:

C:\Anaconda2\python.exe F:/python01/lect004_shujucaiji/lect004_json.py
{u'data': {u'1948': u'-0.0471', u'1949': u'-0.0550', u'1942': u'0.1549', u'1943': u'0.1598', u'1940': u'0.0927', u'1941': u'0.1974', u'1946': u'-0.0013', u'1947': u'-0.0455', u'1944': u'0.2948', u'1945': u'0.1754', u'2015': u'0.8990', u'2014': u'0.7402', u'2011': u'0.5759', u'2010': u'0.7008', u'2013': u'0.6687', u'2012': u'0.6219', u'1955': u'-0.1305', u'1954': u'-0.1118', u'1957': u'0.0538', u'1956': u'-0.1945', u'1951': u'-0.0095', u'1950': u'-0.1579', u'1953': u'0.0997', u'1952': u'0.0288', u'1959': u'0.0640', u'1958': u'0.1145', u'1920': u'-0.2152', u'1921': u'-0.1517', u'1922': u'-0.2318', u'1923': u'-0.2161', u'1924': u'-0.2510', u'1925': u'-0.1464', u'1926': u'-0.0618', u'1927': u'-0.1506', u'1928': u'-0.1749', u'1929': u'-0.2982', u'1933': u'-0.2481', u'1932': u'-0.1214', u'1931': u'-0.0714', u'1930': u'-0.1016', u'1937': u'-0.0204', u'1936': u'-0.1173', u'1935': u'-0.1445', u'1934': u'-0.1075', u'1939': u'-0.0157', u'1938': u'-0.0318', u'1908': u'-0.4441', u'1909': u'-0.4332', u'1906': u'-0.2208', u'1907': u'-0.3767', u'1904': u'-0.4240', u'1905': u'-0.2967', u'1902': u'-0.2535', u'1903': u'-0.3442', u'1900': u'-0.0704', u'1901': u'-0.1471', u'1986': u'0.2308', u'1987': u'0.3710', u'1984': u'0.1510', u'1985': u'0.1357', u'1982': u'0.1836', u'1983': u'0.3429', u'1980': u'0.2651', u'1981': u'0.3024', u'1988': u'0.3770', u'1989': u'0.2982', u'1919': u'-0.2082', u'1918': u'-0.2118', u'1911': u'-0.4367', u'1910': u'-0.3862', u'1913': u'-0.3205', u'1912': u'-0.3318', u'1915': u'-0.0747', u'1914': u'-0.1444', u'1917': u'-0.3193', u'1916': u'-0.2979', u'1991': u'0.4079', u'1990': u'0.4350', u'1993': u'0.2857', u'1992': u'0.2583', u'1995': u'0.4593', u'1994': u'0.3420', u'1997': u'0.5185', u'1996': u'0.3225', u'1999': u'0.4427', u'1998': u'0.6335', u'1898': u'-0.2578', u'1899': u'-0.1172', u'1894': u'-0.2828', u'1895': u'-0.2279', u'1896': u'-0.0971', u'1897': u'-0.1232', u'1890': u'-0.3233', u'1891': u'-0.2552', u'1892': u'-0.3079', u'1893': u'-0.3221', u'1968': 
u'-0.0282', u'1969': u'0.0937', u'1964': u'-0.1461', u'1965': u'-0.0752', u'1966': u'-0.0204', u'1967': u'-0.0112', u'1960': u'0.0252', u'1961': u'0.0818', u'1962': u'0.0924', u'1963': u'0.1100', u'1889': u'-0.1032', u'1888': u'-0.1541', u'1887': u'-0.2559', u'1886': u'-0.2101', u'1885': u'-0.2220', u'1884': u'-0.2099', u'1883': u'-0.1481', u'1882': u'-0.0710', u'1881': u'-0.0707', u'1880': u'-0.1247', u'1979': u'0.2288', u'1978': u'0.1139', u'1977': u'0.1996', u'1976': u'-0.0769', u'1975': u'0.0060', u'1974': u'-0.0698', u'1973': u'0.1654', u'1972': u'0.0280', u'1971': u'-0.0775', u'1970': u'0.0383', u'2002': u'0.6018', u'2003': u'0.6145', u'2000': u'0.4255', u'2001': u'0.5455', u'2006': u'0.6139', u'2007': u'0.6113', u'2004': u'0.5806', u'2005': u'0.6583', u'2008': u'0.5415', u'2009': u'0.6354'}, u'description': {u'units': u'Degrees Celsius', u'base_period': u'1901-2000', u'title': u'Global Land and Ocean Temperature Anomalies, January-December'}}
<type 'dict'>
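json.load reads from a file object; its string-based counterparts json.dumps/json.loads round-trip between Python objects and JSON text. A sketch using a made-up fragment of the temperature file:

```python
import json

record = {"description": {"units": "Degrees Celsius"},
          "data": {"1880": "-0.1247"}}
text = json.dumps(record)      # Python dict -> JSON string
parsed = json.loads(text)      # JSON string -> Python dict
print(type(parsed))
print(parsed["data"]["1880"])
```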
import json
import pandas as pd
if __name__ == '__main__':
    file_name = "./global_temperature.json"
    with open(file_name, 'r') as f_obj:
        load = json.load(f_obj)
        print load
    print type(load)
    print load.keys()
    print load['data']
    print load['description']
    print "=================="
    print type(load['data'])
    # build two Series and align them side by side into one DataFrame
    data = pd.Series(load['data'], name='data')
    description = pd.Series(load['description'], name='description')
    concat = pd.concat([data, description], axis=1)
    print concat
    concat.to_csv('./tt.csv', index=None)

Output:

C:\Anaconda2\python.exe F:/python01/lect004_shujucaiji/lect004_json.py
{u'data': {u'1948': u'-0.0471', ... (same dict as in the previous output) ...}, u'description': {u'units': u'Degrees Celsius', u'base_period': u'1901-2000', u'title': u'Global Land and Ocean Temperature Anomalies, January-December'}}
<type 'dict'>
[u'data', u'description']
{u'1948': u'-0.0471', u'1949': u'-0.0550', ... (the full 'data' dict, as above) ...}
{u'units': u'Degrees Celsius', u'base_period': u'1901-2000', u'title': u'Global Land and Ocean Temperature Anomalies, January-December'}
==================
<type 'dict'>
                data                                        description
1880         -0.1247                                                NaN
1881         -0.0707                                                NaN
1882         -0.0710                                                NaN
1883         -0.1481                                                NaN
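The NaN column in the result is expected: pd.concat with axis=1 aligns the two Series on the union of their indexes, and the year labels of 'data' never overlap with the keys of 'description'. A minimal reproduction with made-up values:

```python
import pandas as pd

data = pd.Series({"1880": "-0.1247"}, name="data")
desc = pd.Series({"units": "Degrees Celsius"}, name="description")
merged = pd.concat([data, desc], axis=1)  # outer join on the union of indexes
# a row present in only one Series gets NaN in the other column
print(merged)
```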

Crawler basics. A crawler is usually split into three modules:

URL manager module

Page downloader module

Page parser module
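The three modules cooperate in a loop: the URL manager hands out unvisited URLs, the downloader fetches each page, and the parser extracts data plus new URLs that are fed back to the manager. A schematic sketch with a stubbed "web" instead of real downloads (all names and pages here are invented):

```python
# Stubbed web: URL -> (page text, links found on that page)
PAGES = {
    "http://example.com/a": ("page a", ["http://example.com/b"]),
    "http://example.com/b": ("page b", []),
}

def crawl(seed):
    to_visit = [seed]      # URL manager: queue of URLs still to fetch
    visited = set()        # URL manager: de-duplication
    results = []
    while to_visit:
        url = to_visit.pop(0)
        if url in visited:
            continue
        visited.add(url)
        text, links = PAGES[url]   # downloader + parser, stubbed out
        results.append(text)
        to_visit.extend(links)     # new URLs go back to the manager
    return results

print(crawl("http://example.com/a"))
```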

***Page download code:

# -*- coding: utf-8 -*-

import urllib2
import cookielib


if __name__ == '__main__':
    test_url = "http://www.baidu.com/"
    response = urllib2.urlopen(test_url)
    print response.getcode()   # HTTP status code, e.g. 200
    print response.read()
    print "-----------------"
    # A Request object lets you attach headers (and POST data) before opening
    request = urllib2.Request(test_url)
    # request.add_header()
    # request.add_data()
    response2 = urllib2.urlopen(request)
    print response2.read()
    print "======================="
    # Cookie-aware downloading: the CookieJar stores cookies across requests
    cookie = cookielib.CookieJar()
    handler = urllib2.HTTPCookieProcessor(cookie)
    opener = urllib2.build_opener(handler)
    response3 = opener.open('http://www.baidu.com')
    print response3.read()

***Page parsing module

*Regular expressions: fuzzy matching on the raw page string

*html.parser

*BeautifulSoup: structured parsing of the page

*lxml
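For comparison, a regex treats the page as a flat string, while the parsers below understand tag structure. A tiny regex example on a made-up anchor tag:

```python
import re

html = '<a class="mnav" href="http://news.baidu.com" name="tj_trnews">News</a>'
# fuzzy string match: capture the value of the href attribute
m = re.search(r'href="([^"]+)"', html)
print(m.group(1))
```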

# -*- coding: utf-8 -*-

import urllib2
from bs4 import BeautifulSoup

if __name__ == '__main__':
    html = urllib2.urlopen("http://www.baidu.com")
    soup = BeautifulSoup(html, 'html.parser')
    print soup.title

Output:

C:\Anaconda2\python.exe F:/python01/lect004_shujucaiji/lect004_beatifulsoup.py
<title>百度一下,你就知道</title>

Process finished with exit code 0
# -*- coding: utf-8 -*-

import urllib2
from bs4 import BeautifulSoup

if __name__ == '__main__':
    html = urllib2.urlopen("http://www.baidu.com")
    soup = BeautifulSoup(html, 'html.parser')
    print soup.title
    # print soup.a
    # get the collection of all <a> tags on the page
    a_list = soup.find_all("a")
    for link in a_list:
        # print link.name, link["href"], link.get_text()
        print link
    print "------------------------"
    find = soup.find('a', id='jgwab')
    print find.name, find["href"], find.get_text()
    # 'class' is a Python keyword, so BeautifulSoup uses 'class_' instead
    find2 = soup.find('a', class_='mnav')
    print find2.name, find2["href"], find2.get_text()
    print "================================="
    # exception handling for failed downloads
    try:
        html = urllib2.urlopen("http://www.dfdf.dfdfd")
    except Exception as e:
        print e
    print "end"

Output:

C:\Anaconda2\python.exe F:/python01/lect004_shujucaiji/lect004_beatifulsoup.py
<title>百度一下,你就知道</title>
<a href="/" id="result_logo" onmousedown="return c({'fm':'tab','tab':'logo'})"><img alt="到百度首页" src="//www.baidu.com/img/baidu_jgylogo3.gif" title="到百度首页"/></a>
<a href="javascript:;" name="ime_hw">手写</a>
<a href="javascript:;" name="ime_py">拼音</a>
<a href="javascript:;" name="ime_cl">关闭</a>
<a class="toindex" href="/">百度首页</a>
<a class="pf" href="javascript:;" name="tj_settingicon">设置<i class="c-icon c-icon-triangle-down"></i></a>
<a class="lb" href="https://passport.baidu.com/v2/?login&tpl=mn&u=http%3A%2F%2Fwww.baidu.com%2F" name="tj_login" onclick="return false;">登录</a>
<a class="mnav" href="http://news.baidu.com" name="tj_trnews">新闻</a>
<a class="mnav" href="http://www.hao123.com" name="tj_trhao123">hao123</a>
<a class="mnav" href="http://map.baidu.com" name="tj_trmap">地图</a>
<a class="mnav" href="http://v.baidu.com" name="tj_trvideo">视频</a>
<a class="mnav" href="http://tieba.baidu.com" name="tj_trtieba">贴吧</a>
<a class="mnav" href="http://xueshu.baidu.com" name="tj_trxueshu">学术</a>
<a class="lb" href="https://passport.baidu.com/v2/?login&tpl=mn&u=http%3A%2F%2Fwww.baidu.com%2F" name="tj_login" onclick="return false;">登录</a>
<a class="pf" href="http://www.baidu.com/gaoji/preferences.html" name="tj_settingicon">设置</a>
<a class="bri" href="http://www.baidu.com/more/" name="tj_briicon" style="display: block;">更多产品</a>
<a href="http://news.baidu.com/ns?cl=2&rn=20&tn=news&word=" onmousedown="return c({'fm':'tab','tab':'news'})" wdfield="word">新闻</a>
<a href="http://tieba.baidu.com/f?kw=&fr=wwwt" onmousedown="return c({'fm':'tab','tab':'tieba'})" wdfield="kw">贴吧</a>
<a href="http://zhidao.baidu.com/q?ct=17&pn=0&tn=ikaslist&rn=10&word=&fr=wwwt" onmousedown="return c({'fm':'tab','tab':'zhidao'})" wdfield="word">知道</a>
<a href="http://music.baidu.com/search?fr=ps&ie=utf-8&key=" onmousedown="return c({'fm':'tab','tab':'music'})" wdfield="key">音乐</a>
<a href="http://image.baidu.com/search/index?tn=baiduimage&ps=1&ct=201326592&lm=-1&cl=2&nc=1&ie=utf-8&word=" onmousedown="return c({'fm':'tab','tab':'pic'})" wdfield="word">图片</a>
<a href="http://v.baidu.com/v?ct=301989888&rn=20&pn=0&db=0&s=25&ie=utf-8&word=" onmousedown="return c({'fm':'tab','tab':'video'})" wdfield="word">视频</a>
<a href="http://map.baidu.com/m?word=&fr=ps01000" onmousedown="return c({'fm':'tab','tab':'map'})" wdfield="word">地图</a>
<a href="http://wenku.baidu.com/search?word=&lm=0&od=0&ie=utf-8" onmousedown="return c({'fm':'tab','tab':'wenku'})" wdfield="word">文库</a>
<a href="//www.baidu.com/more/" onmousedown="return c({'fm':'tab','tab':'more'})">更多»</a>
<a href="//www.baidu.com/cache/sethelp/help.html" id="setf" onmousedown="return ns_c({'fm':'behs','tab':'favorites','pos':0})" target="_blank">把百度设为主页</a>
<a href="http://home.baidu.com" onmousedown="return ns_c({'fm':'behs','tab':'tj_about'})">关于百度</a>
<a href="http://ir.baidu.com" onmousedown="return ns_c({'fm':'behs','tab':'tj_about_en'})">About  Baidu</a>
<a href="http://e.baidu.com/?refer=888" onmousedown="return ns_c({'fm':'behs','tab':'tj_tuiguang'})">百度推广</a>
<a href="http://www.baidu.com/duty/" onmousedown="return ns_c({'fm':'behs','tab':'tj_duty'})">使用百度前必读</a>
<a class="cp-feedback" href="http://jianyi.baidu.com/" onmousedown="return ns_c({'fm':'behs','tab':'tj_homefb'})">意见反馈</a>
<a href="http://www.beian.gov.cn/portal/registerSystemInfo?recordcode=11000002000001" id="jgwab" target="_blank">京公网安备11000002000001号</a>
------------------------
a http://www.beian.gov.cn/portal/registerSystemInfo?recordcode=11000002000001 京公网安备11000002000001号
a http://news.baidu.com 新闻
=================================

Creating a crawler project with Scrapy

Install Scrapy:

pip install scrapy

Create a directory named spider.

Run the project-creation command:

scrapy startproject spider01


Generate a spider for the page to crawl:

scrapy genspider demo_spider https://www.aqistudy.cn/historydata/index.php


Run the spider:

scrapy crawl demo_spider
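For reference, genspider drops a skeleton module under spider01/spiders/ roughly like the following (the exact template depends on the Scrapy version); the crawl logic goes into parse():

```python
import scrapy


class DemoSpiderSpider(scrapy.Spider):
    name = 'demo_spider'
    allowed_domains = ['www.aqistudy.cn']
    start_urls = ['https://www.aqistudy.cn/historydata/index.php']

    def parse(self, response):
        # extract data and follow-up requests from the response here
        pass
```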
