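Sometimes you need to log in to a website and then scrape some useful information from it, which is exhausting to do by hand. Getting the login itself to work is usually quick. The problem is that every other page requested after logging in is refused. The follow-up request does not send the cookie back, so the server treats the two requests as two separate sessions. Let's look at the code.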
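Under the hood, the login response carries one or more Set-Cookie headers. The client has to send their name=value parts back in a single Cookie header on every later request. In this example, that header is what ties the second request to the first.

Below is a minimal sketch of that conversion. It uses a hypothetical helper, cookieHeaderFrom(), that is not part of the original code. It deliberately ignores cookie attributes such as path, domain and expires, which a real client would honor.

```php
<?php
// Hypothetical helper: turn raw Set-Cookie header lines into the value of a
// Cookie request header. Attributes (path, expires, HttpOnly, ...) are dropped;
// expiry, domain and path matching are NOT implemented in this sketch.
function cookieHeaderFrom(array $setCookieLines): string
{
    $pairs = [];
    foreach ($setCookieLines as $line) {
        // Strip an optional "Set-Cookie:" prefix (case-insensitive)
        $value = preg_replace('/^\s*Set-Cookie:\s*/i', '', $line);
        // The first ";"-separated segment is the name=value pair
        $pair = trim(explode(';', $value, 2)[0]);
        if ($pair === '' || strpos($pair, '=') === false) {
            continue;
        }
        $name = trim(strstr($pair, '=', true));
        // A later cookie with the same name replaces an earlier one
        $pairs[$name] = $pair;
    }
    return implode('; ', $pairs);
}

echo cookieHeaderFrom([
    "Set-Cookie: cname=admin\r\n",
    "Set-Cookie: PHPSESSID=abc123; path=/; HttpOnly\r\n",
]), "\n";
```

Keying the pairs by cookie name means that if the server resets a cookie, only the newest value is sent back.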


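<?php // test.php

// Send one HTTP request over a raw socket and return the full response
// (headers + body) together with the cookies the server set.
function getWebContent($host, $page = "/", $paramstr = "", $cookies = '', $method = "POST", $port = 80) {
    $fp = fsockopen($host, $port, $errno, $errstr, 30);
    if (!$fp) {
        return false;
    }
    $method = strtoupper($method) == "POST" ? "POST" : "GET";
    if ($method == "GET" && $paramstr) {
        $page .= "?" . $paramstr;
    }
    // Request line: exactly one space between method, path and version
    $out = "$method $page HTTP/1.1\r\n";
    $out .= "Accept: */*\r\n";
    // Host must name the server actually contacted (with the port if not 80)
    $out .= "Host: " . $host . ($port != 80 ? ":" . $port : "") . "\r\n";
    if ($method == "POST") {
        $out .= "Content-Length: " . strlen($paramstr) . "\r\n";
        $out .= "Content-Type: application/x-www-form-urlencoded\r\n";
    }
    if ($cookies) {
        $out .= "Cookie: " . $cookies . "\r\n";
    }
    // "close" makes the server drop the connection after responding, so the
    // feof() loop below ends instead of waiting for a keep-alive timeout
    $out .= "Connection: close\r\n\r\n";
    if ($method == "POST") {
        $out .= $paramstr; // the body must be exactly Content-Length bytes
    }
    fwrite($fp, $out);

    $cookie = "";
    $content = "";
    $inHeaders = true;
    while (!feof($fp)) {
        $str = fgets($fp);
        if ($str === false) {
            break;
        }
        if ($inHeaders) {
            if ($str === "\r\n") {
                $inHeaders = false; // an empty line ends the headers
            } elseif (preg_match('/^Set-Cookie:\s*([^;\r\n]*)/i', $str, $matches)) {
                // keep only "name=value"; attributes such as path are not sent back
                $cookie .= ($cookie ? "; " : "") . trim($matches[1]);
            }
        }
        // collect only; the caller decides what to print
        $content .= $str;
    }
    fclose($fp);
    return array('content' => $content, 'cookie' => $cookie);
}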


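$params = "name=admin&pwd=admin";

// Step 1: log in; on success the response carries "Set-Cookie: cname=admin"
$rs = getWebContent("127.0.0.1", "/test/login.php", $params, "", "POST", 8080);
if ($rs === false) {
    exit("Could not connect to 127.0.0.1:8080\n");
}
echo $rs['content'];

// Step 2: request the protected page. Passing in the cookie from the previous
// response is the key; otherwise the server treats this as a separate session.
$rs = getWebContent("127.0.0.1", "/test/index.php", "", $rs['cookie'], "GET", 8080);
echo $rs['content'];

?>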
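getWebContent returns the raw response, with the status line and headers included. Because the request uses HTTP/1.1, a server is also allowed to send the body with "Transfer-Encoding: chunked". In that case the body is interleaved with hexadecimal chunk sizes.

Below is a minimal sketch for separating the headers from the body and decoding a chunked body. The function names splitResponse and decodeChunked are hypothetical. The sketch assumes the whole response is already in memory, ignores chunk extensions and drops trailers.

```php
<?php
// Hypothetical helper: split a raw HTTP response into its header block and body.
function splitResponse(string $raw): array
{
    $pos = strpos($raw, "\r\n\r\n");
    if ($pos === false) {
        return [$raw, '']; // no empty line: treat everything as headers
    }
    return [substr($raw, 0, $pos), substr($raw, $pos + 4)];
}

// Hypothetical helper: decode a body sent with Transfer-Encoding: chunked.
// Each chunk is "<hex size>\r\n<data>\r\n"; a size of 0 ends the body.
// Chunk extensions (";name=value" after the size) are ignored, trailers dropped.
function decodeChunked(string $body): string
{
    $out = '';
    $offset = 0;
    $len = strlen($body);
    while ($offset < $len) {
        $lineEnd = strpos($body, "\r\n", $offset);
        if ($lineEnd === false) {
            break; // malformed: size line without CRLF
        }
        $sizeField = substr($body, $offset, $lineEnd - $offset);
        $size = hexdec(trim(explode(';', $sizeField, 2)[0]));
        if ($size === 0) {
            break; // last chunk
        }
        $out .= substr($body, $lineEnd + 2, $size);
        $offset = $lineEnd + 2 + $size + 2; // skip the data and its trailing CRLF
    }
    return $out;
}

$raw = "HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n"
     . "7\r\nsuccess\r\n0\r\n\r\n";
list($headers, $body) = splitResponse($raw);
if (stripos($headers, "Transfer-Encoding: chunked") !== false) {
    $body = decodeChunked($body);
}
echo $body, "\n"; // prints "success"
```

A production client would also handle Content-Length bodies and gzip encoding. That is one reason the cURL route below is usually more practical.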


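<?php // login.php

$name = isset($_REQUEST['name']) ? $_REQUEST['name'] : '';
$pwd = isset($_REQUEST['pwd']) ? $_REQUEST['pwd'] : '';
if ($name == "admin" && $pwd == "admin") {
    // Adds "Set-Cookie: cname=admin" to the response headers
    setcookie("cname", $name);
    echo "success";
} else {
    echo "failed";
}

?>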


<?php // index.php

// Only visitors that send the cname cookie back see the list
if (isset($_COOKIE['cname']) && $_COOKIE['cname']) {
    echo "<ul><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li></ul>";
} else {
    echo "please login first!";
}

?>


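Save the three files above. Put login.php and index.php in a directory named test under the web server's document root, and put test.php in any directory. Then run php test.php from the command line to see the result. The scripts assume the web server is listening on 127.0.0.1 (localhost), port 8080.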
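When the output is not what you expect, it can help to look at the exact request text before it is written to the socket. The sketch below is a standalone, hypothetical buildRequest() function, not part of the original code. It assembles a request in the HTTP/1.1 format:

1. A request line with single spaces.
2. A Host header that includes the port when it is not 80.
3. The remaining headers.
4. An empty line.
5. For POST, a body whose length matches Content-Length exactly.

```php
<?php
// Hypothetical helper: build raw HTTP/1.1 request text so it can be printed
// and inspected before anything is sent over a socket.
function buildRequest(string $method, string $host, int $port, string $page,
                      string $params = '', string $cookies = ''): string
{
    $method = strtoupper($method) === 'POST' ? 'POST' : 'GET';
    if ($method === 'GET' && $params !== '') {
        $page .= '?' . $params;
    }
    $hostHeader = $port === 80 ? $host : "$host:$port";
    $lines = ["$method $page HTTP/1.1", "Host: $hostHeader", "Accept: */*"];
    if ($method === 'POST') {
        $lines[] = 'Content-Type: application/x-www-form-urlencoded';
        $lines[] = 'Content-Length: ' . strlen($params);
    }
    if ($cookies !== '') {
        $lines[] = "Cookie: $cookies";
    }
    $lines[] = 'Connection: close';
    // Headers end with an empty line; a POST body follows with no trailing CRLF
    return implode("\r\n", $lines) . "\r\n\r\n" . ($method === 'POST' ? $params : '');
}

// Print the login request with the CRLF line endings made visible
echo str_replace("\r\n", "\\r\\n\n",
    buildRequest('POST', '127.0.0.1', 8080, '/test/login.php', 'name=admin&pwd=admin'));
```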


An even simpler approach is to use cURL. The following code can replace test.php:

<?php

$post_data = array(
    "name" => "admin",
    "pwd"  => "admin",
);
$cookie_jar = tempnam('./', 'cookie'); // create a temporary cookie file

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://localhost:8080/test/login.php");
// Return the response as a string instead of printing it directly
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// This is a POST request
curl_setopt($ch, CURLOPT_POST, 1);
// Attach the POST fields
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_data);
// Save the cookies returned by the server to the $cookie_jar file
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_jar);
echo curl_exec($ch);
curl_close($ch);
// libcurl writes the cookie jar when the handle is destroyed; since PHP 8.0
// curl_close() no longer does that, so release the handle explicitly
unset($ch);

$ch2 = curl_init();
curl_setopt($ch2, CURLOPT_URL, "http://localhost:8080/test/index.php");
curl_setopt($ch2, CURLOPT_HEADER, false);
curl_setopt($ch2, CURLOPT_RETURNTRANSFER, 1);
// Send the cookies saved by the login request
curl_setopt($ch2, CURLOPT_COOKIEFILE, $cookie_jar);
echo curl_exec($ch2);
curl_close($ch2);
unlink($cookie_jar); // delete the temporary cookie file

?>
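If the cookie does not need to outlive the script, cURL can keep it in memory instead of a temporary file. Setting CURLOPT_COOKIEFILE to an empty string enables libcurl's cookie engine without reading any file. Cookies received on that handle are then sent automatically on later requests made with the same handle.

Below is a sketch under those assumptions. The function name fetchAfterLogin is hypothetical, and the URLs are the test pages from above. It also passes the fields through http_build_query, which sends them as application/x-www-form-urlencoded. Passing an array to CURLOPT_POSTFIELDS sends multipart/form-data instead.

```php
<?php
// Hypothetical helper: log in and fetch a protected page with ONE cURL handle.
// CURLOPT_COOKIEFILE = "" turns on libcurl's in-memory cookie engine, so the
// cookie set by the login response is sent with the second request.
function fetchAfterLogin(string $loginUrl, array $fields, string $pageUrl)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_COOKIEFILE, "");

    // Request 1: POST the login form, urlencoded like a browser form
    curl_setopt($ch, CURLOPT_URL, $loginUrl);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields));
    if (curl_exec($ch) === false) {
        return false;
    }

    // Request 2: switch back to GET; the stored cookie is sent automatically
    curl_setopt($ch, CURLOPT_URL, $pageUrl);
    curl_setopt($ch, CURLOPT_HTTPGET, true);
    return curl_exec($ch); // the handle is freed when $ch goes out of scope
}

// Usage, assuming login.php and index.php are served on localhost:8080:
// echo fetchAfterLogin("http://localhost:8080/test/login.php",
//                      ["name" => "admin", "pwd" => "admin"],
//                      "http://localhost:8080/test/index.php");
```

Reusing one handle also keeps the TCP connection open between the two requests when the server allows it. No file has to be created or cleaned up.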


The pain of learning is temporary; the pain of never having learned lasts a lifetime.