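I need to crawl a page whose requests must carry cookie data, so the first step is to obtain the cookies the site sets when the page is visited. The method below collects them: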
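// Apache HttpClient 4.x imports
import java.io.IOException;
import java.util.StringJoiner;
import org.apache.http.client.CookieStore;
import org.apache.http.client.config.CookieSpecs;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.protocol.HttpClientContext;
import org.apache.http.cookie.Cookie;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public static String getCookies(String url) throws IOException {
    // Global request configuration: use the standard cookie policy
    RequestConfig globalConfig = RequestConfig.custom().setCookieSpec(CookieSpecs.STANDARD).build();
    // Local cookie store that collects the cookies set by the server
    CookieStore cookieStore = new BasicCookieStore();
    // HttpClient context bound to the cookie store
    HttpClientContext context = HttpClientContext.create();
    context.setCookieStore(cookieStore);
    // try-with-resources closes the client and the response even if the request fails
    try (CloseableHttpClient httpClient = HttpClients.custom().setDefaultRequestConfig(globalConfig)
            .setDefaultCookieStore(cookieStore).build();
         // GET request whose only purpose is to receive the required cookies, such as _xsrf
         CloseableHttpResponse res = httpClient.execute(new HttpGet(url), context)) {
        // Nothing to read from the body; the cookies are already in cookieStore
    }
    // Join all cookies into a single "name=value; name=value" string
    StringJoiner cookie = new StringJoiner("; ");
    for (Cookie c : cookieStore.getCookies()) {
        cookie.add(c.getName() + "=" + c.getValue());
        System.out.println(c.getName() + ": " + c.getValue());
    }
    // An empty cookie store yields an empty string
    return cookie.toString();
}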
Once the cookies are obtained, pass the cookie string along with a subsequent POST or GET request to retrieve the corresponding JSON data or HTML page.
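The follow-up request could look roughly like the sketch below. To keep it runnable without extra dependencies, it uses the JDK's built-in `java.net.http` client (Java 11+) rather than Apache HttpClient. The class name `CookieRequestSketch`, the methods `buildGet`, `buildPost`, and `fetch`, the JSON content type, and any URL passed in are hypothetical names chosen for illustration, not part of the original code.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper class: attaches a previously collected cookie string to new requests.
class CookieRequestSketch {

    // Builds a GET request; the Cookie header is added only when a cookie string is available.
    static HttpRequest buildGet(String url, String cookieHeader) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(url)).GET();
        if (cookieHeader != null && !cookieHeader.isEmpty()) {
            builder.header("Cookie", cookieHeader);
        }
        return builder.build();
    }

    // Builds a POST request with a JSON body (assumption: the target endpoint accepts JSON).
    static HttpRequest buildPost(String url, String cookieHeader, String jsonBody) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody));
        if (cookieHeader != null && !cookieHeader.isEmpty()) {
            builder.header("Cookie", cookieHeader);
        }
        return builder.build();
    }

    // Sends the request and returns the response body (JSON or HTML) as a string.
    static String fetch(HttpRequest request) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

A call such as `CookieRequestSketch.fetch(CookieRequestSketch.buildGet(url, getCookies(url)))` would then return the page or JSON body. When staying with Apache HttpClient instead, the same idea is to call `setHeader("Cookie", cookies)` on the `HttpGet` or `HttpPost` before executing it.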