Goal
Query more than a million rows, process the data, and export it to an Excel file.
Pain points:
Slow queries; out-of-memory errors.
Steps
1. Paged queries (avoids running out of memory)
2. Asynchronous queries joined by blocking (speeds up slow reads)
3. SQL tuning (limit, index tuning via explain, selecting only the needed columns)
4. Proactive memory release: clear each list as soon as its data has been consumed so it can be garbage-collected.
Overview
In this article the data is read 100k rows at a time. Each 100k batch is assembled from concurrent asynchronous paged queries of 10k rows each, whose limit offsets range over 0 to 100k. Once the blocking join has collected the 100k rows, the batch's maximum orderId becomes the lower bound for the next query, so the offsets for rows 100k to 200k start again from 0.
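The batching scheme above can be sketched in plain Java. The ReadPage record and pagesForBatch helper below are illustrative names, not part of the original code; they only show how the offsets restart for every 100k batch while the orderId lower bound advances:

```java
import java.util.ArrayList;
import java.util.List;

public class KeysetPagingSketch {
    // One read page: WHERE orderId > minOrderId ORDER BY orderId LIMIT offset, limit
    record ReadPage(long minOrderId, int offset, int limit) {}

    // Build the ten 10k read pages that make up one 100k write batch
    static List<ReadPage> pagesForBatch(long minOrderId, int pageSizeWrite, int pageSize) {
        List<ReadPage> pages = new ArrayList<>();
        for (int j = 1; j <= pageSizeWrite / pageSize; j++) {
            // the offset never exceeds pageSizeWrite - pageSize, no matter which batch this is
            pages.add(new ReadPage(minOrderId, (j - 1) * pageSize, pageSize));
        }
        return pages;
    }

    public static void main(String[] args) {
        // Second batch (rows 100k..200k): offsets still run 0..90000,
        // only the orderId lower bound has moved
        for (ReadPage p : pagesForBatch(100_000L, 100_000, 10_000)) {
            System.out.println(p);
        }
    }
}
```

Because every batch carries the previous batch's max orderId forward, no query ever pays for an offset larger than 90k, regardless of how deep into the data the export is.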
Notes:
1. Watch out for dirty reads caused by concurrency when the blocking join combines the asynchronous query results.
2. In MySQL, comparing limit 0,5000 with limit 10000,5000, the larger the offset the slower the query. Another indexed column can be used to bound the result set so that the offset always restarts from 0.
3. MyBatis pagination plugins run a count query before the page query itself; two queries per page is unfriendly for large exports.
Code walkthrough
1. Create a distributed lock to limit runs (prevents users from triggering the export repeatedly)
Note: redisLoader is just a wrapper around RedisTemplate that pins the Redis configuration to use.
Long start = System.currentTimeMillis() / 1000;
// distributed lock, expires after 60s
if (!redisLoader.setIfAbsent("saas:exportData:exportCountOrderInfo", start + "", 60 * 1000)) {
    logger.info("Export is already running and must not be executed twice");
    return;
}
2. Delete the lock once the business logic completes
if (String.valueOf(start).equals(redisLoader.get("saas:exportData:exportCountOrderInfo"))) {
    // delete the lock
    redisLoader.delete("saas:exportData:exportCountOrderInfo");
}
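Note that the get-then-delete above is two Redis round trips, so in principle the lock could expire and be re-acquired by another run between the check and the delete. A compare-and-delete keeps both steps atomic. The sketch below is illustrative, not from the original code: an in-JVM map stands in for Redis, with the equivalent Redis Lua script shown in a comment.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class CompareAndDeleteSketch {
    // Against Redis the same idea is one Lua script (runnable via Spring's DefaultRedisScript):
    //   if redis.call('get', KEYS[1]) == ARGV[1] then return redis.call('del', KEYS[1]) else return 0 end
    static boolean unlock(ConcurrentMap<String, String> store, String key, String token) {
        // remove(key, value) deletes only while the stored value still equals our token,
        // so a lock that expired and was re-acquired by another run is left untouched
        return store.remove(key, token);
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> redisStandIn = new ConcurrentHashMap<>();
        redisStandIn.put("saas:exportData:exportCountOrderInfo", "1700000000");
        // wrong token: the delete is refused
        System.out.println(unlock(redisStandIn, "saas:exportData:exportCountOrderInfo", "42"));
        // our own token: the lock is released
        System.out.println(unlock(redisStandIn, "saas:exportData:exportCountOrderInfo", "1700000000"));
    }
}
```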
3. Blocking async + paged reads
// switch to a new sheet every million rows
WriteSheet writeSheet = sheets.get(i / 10);
long start1 = System.currentTimeMillis() / 1000;
// futures the main thread will block on until every worker finishes
List<CompletableFuture<List<OrderInfoCountPrintVo>>> futureList = new ArrayList<>();
logger.info(System.currentTimeMillis() + ": blocking async fetch started");
// pageSizeWrite / pageSize is the ratio of the write batch size to the read page size
for (int j = 1; j <= pageSizeWrite / pageSize; j++) {
    // more concurrent queries, each smaller, to cut per-query time and smooth the load
    int k = i * pageSizeWrite / pageSize + j;
    if (k <= pagesFact) {
        // objects mutated inside a worker must be copied to a new object inside that thread
        //logger.warn("read page: {}, page size: {}", k, pageSize);
        CompletableFuture<List<OrderInfoCountPrintVo>> job1 = asyncService.getExportDataList(orderInfoStatisticsSearch, j, pageSize);
        futureList.add(job1);
    }
}
List<OrderInfoCountPrintVo> list = new ArrayList<>();
for (CompletableFuture<List<OrderInfoCountPrintVo>> future : futureList) {
    List<OrderInfoCountPrintVo> page = future.get();
    if (page.size() > 0) {
        list.addAll(page);
        page.clear();
    }
}
logger.info(System.currentTimeMillis() + ": blocking async fetch finished");
logger.warn("read time: {}s", System.currentTimeMillis() / 1000 - start1);
if (list.size() > 0) {
    // sort in place by orderId (stream().sorted().collect() would discard its result)
    list.sort(Comparator.comparing(OrderInfoCountPrintVo::getOrderId));
    long start3 = System.currentTimeMillis() / 1000;
    excelWriter.write(list, writeSheet);
    logger.warn("listSize: {}, write time: {}s", list.size(), System.currentTimeMillis() / 1000 - start3);
    // record the batch's max orderId as the next query's lower bound so the offset restarts from 0
    orderInfoStatisticsSearch.setOrderId(list.get(list.size() - 1).getOrderId());
}
// help the GC: clear the list once it has been written out
list.clear();
logger.warn("pageNo: {}, pageSizeWrite: {}, pagesWrite: {}, elapsed: {}s", i, pageSizeWrite, pagesWrite, System.currentTimeMillis() / 1000 - start1);
4. The concurrent query task
@Async("ExportDataBatch")
public CompletableFuture<List<OrderInfoCountPrintVo>> getExportDataList(OrderInfoStatisticsSearch orderInfoStatisticsSearch, int j, int pageSize) {
    List<OrderInfoCountPrintVo> list = new ArrayList<>();
    // orderInfoStatisticsSearch is shared between workers and not thread-safe;
    // data is created inside this thread, so it is thread-confined and avoids
    // dirty/phantom reads between concurrent workers
    OrderInfoStatisticsSearch data = new OrderInfoStatisticsSearch();
    BeanUtils.copyProperties(orderInfoStatisticsSearch, data);
    data.setOffset((j - 1) * pageSize);
    data.setLimit(pageSize);
    try {
        List<OrderInfoCountPrintVo> orderInfoExportDataList = orderInfoService.listExportOrderInfo(data);
        if (orderInfoExportDataList.size() > 0) {
            list.addAll(orderInfoExportDataList);
            orderInfoExportDataList.clear();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    // return after the try/catch: a return inside finally would swallow exceptions
    return CompletableFuture.completedFuture(list);
}
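@Async("ExportDataBatch") assumes an executor bean with that name exists in the Spring context; its configuration is not shown in the original. The fan-out-then-block-join pattern itself can be reproduced with plain java.util.concurrent. The fetchAll helper below is an illustrative stand-in that fakes the page queries:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class FanOutJoinSketch {
    // Submit one task per page, then block until every future completes,
    // collecting the pages in submission order (as the article's loop does)
    static List<Integer> fetchAll(int pages, int perPage) {
        // bounded pool standing in for the "ExportDataBatch" executor;
        // size it to the database connection budget, not to the page count
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<CompletableFuture<List<Integer>>> futures = new ArrayList<>();
            for (int j = 1; j <= pages; j++) {
                final int page = j;
                // each task stands in for one 10k-row page query
                futures.add(CompletableFuture.supplyAsync(
                        () -> IntStream.range((page - 1) * perPage, page * perPage)
                                .boxed().collect(Collectors.toList()),
                        pool));
            }
            List<Integer> all = new ArrayList<>();
            for (CompletableFuture<List<Integer>> f : futures) {
                all.addAll(f.join()); // block here until this page arrives
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchAll(10, 3).size()); // 30
    }
}
```

Joining the futures in submission order keeps the rows of each batch in page order before the final in-memory sort.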
5. The paging SQL
select a.orderId from (
    <include refid="exportData"/>
) a
order by a.orderId
<if test="limit != null">
    limit #{offset},#{limit}
</if>
Full code listing
/**
 * Export the orders behind the order statistics report
 */
@ResponseBody
@RequestMapping(value = "/exportCountOrderInfo", method = RequestMethod.GET)
public void exportCountOrderInfo(OrderInfoStatisticsSearch orderInfoStatisticsSearch, HttpServletResponse response) throws IOException {
    OutputStream outputStream = response.getOutputStream();
    // ExcelWriter excelWriter = new ExcelWriter(outputStream, ExcelTypeEnum.XLSX);
    ExcelWriter excelWriter = EasyExcel.write(outputStream, OrderPrintVo.class).build();
    response.setHeader("Set-Cookie", "fileDownload=true; path=/");
    int pageNo = 1;
    int pageSizeWrite = 100000;
    int pageSize = 10000;
    // enrich the search object with the non-paging query conditions
    orderInfoStatisticsSearch = orderInfoService.orderInfoStatisticsSearchChange(orderInfoStatisticsSearch);
    //long totalCount = orderInfoService.listCountOrderInfo(orderInfoStatisticsSearch).getTotal();
    Long start = System.currentTimeMillis() / 1000;
    // distributed lock, expires after 60s
    if (!redisLoader.setIfAbsent("saas:exportData:exportCountOrderInfo", start + "", 60 * 1000)) {
        logger.info("Export is already running and must not be executed twice");
        return;
    }
    try {
        logger.info(orderInfoStatisticsSearch.getTenantCode() + ",orderInfo/exportCountOrderInfo,startTime:{},endTime:{},count:{}",
                orderInfoStatisticsSearch.getStartTime(), orderInfoStatisticsSearch.getEndTime(), orderInfoStatisticsSearch.getCount());
        response.setContentType("application/vnd.ms-excel");
        response.setCharacterEncoding("utf-8");
        // URLEncoder.encode keeps a Chinese filename from being garbled; unrelated to EasyExcel
        String fileName = URLEncoder.encode("订单列表", "UTF-8");
        response.setHeader("Content-disposition", "attachment;filename=" + fileName + ".xlsx");
        orderInfoStatisticsSearch.setOffset(pageNo - 1);
        orderInfoStatisticsSearch.setLimit(pageSize);
        // avoid MyBatis pagination plugins: they run a count query before the page query, doubling the reads
        // total row count
        Long nums = orderInfoMapper.listExportOrderInfoCount(orderInfoStatisticsSearch);
        // use floats so the division keeps its fractional part before Math.ceil;
        // integer division would truncate it to zero decimal places
        float numberA = nums;
        float numberB = pageSize;
        // number of read pages
        Integer pagesFact = (int) Math.ceil(numberA / numberB);
        // number of write batches
        float numberC = pageSizeWrite;
        int pagesWrite = (int) Math.ceil(numberA / numberC);
        logger.info("computing the write batches took " + (System.currentTimeMillis() / 1000 - start) + "s, batches: {}, total rows: {}", pagesWrite, nums);
        logger.info("search criteria: {}", JSONObject.toJSONString(orderInfoStatisticsSearch));
        List<WriteSheet> sheets = new ArrayList<>();
        for (int i = 0; i < pagesWrite; i++) {
            if (i % 10 == 0) {
                // one sheet per million rows (ten 100k write batches)
                WriteSheet writeSheet = EasyExcel.writerSheet("订单列表_" + (i + 1)).build();
                sheets.add(writeSheet);
            }
        }
        for (int i = 0; i < pagesWrite; i++) {
            // switch to a new sheet every million rows
            WriteSheet writeSheet = sheets.get(i / 10);
            long start1 = System.currentTimeMillis() / 1000;
            // futures the main thread will block on until every worker finishes
            List<CompletableFuture<List<OrderInfoCountPrintVo>>> futureList = new ArrayList<>();
            logger.info(System.currentTimeMillis() + ": blocking async fetch started");
            // pageSizeWrite / pageSize is the ratio of the write batch size to the read page size
            for (int j = 1; j <= pageSizeWrite / pageSize; j++) {
                // more concurrent queries, each smaller, to cut per-query time and smooth the load
                int k = i * pageSizeWrite / pageSize + j;
                if (k <= pagesFact) {
                    // objects mutated inside a worker must be copied to a new object inside that thread
                    //logger.warn("read page: {}, page size: {}", k, pageSize);
                    CompletableFuture<List<OrderInfoCountPrintVo>> job1 = asyncService.getExportDataList(orderInfoStatisticsSearch, j, pageSize);
                    futureList.add(job1);
                }
            }
            List<OrderInfoCountPrintVo> list = new ArrayList<>();
            for (CompletableFuture<List<OrderInfoCountPrintVo>> future : futureList) {
                List<OrderInfoCountPrintVo> page = future.get();
                if (page.size() > 0) {
                    list.addAll(page);
                    page.clear();
                }
            }
            logger.info(System.currentTimeMillis() + ": blocking async fetch finished");
            logger.warn("read time: {}s", System.currentTimeMillis() / 1000 - start1);
            if (list.size() > 0) {
                // sort in place by orderId (stream().sorted().collect() would discard its result)
                list.sort(Comparator.comparing(OrderInfoCountPrintVo::getOrderId));
                long start3 = System.currentTimeMillis() / 1000;
                excelWriter.write(list, writeSheet);
                logger.warn("listSize: {}, write time: {}s", list.size(), System.currentTimeMillis() / 1000 - start3);
                // record the batch's max orderId as the next query's lower bound so the offset restarts from 0
                orderInfoStatisticsSearch.setOrderId(list.get(list.size() - 1).getOrderId());
            }
            // help the GC: clear the list once it has been written out
            list.clear();
            logger.warn("pageNo: {}, pageSizeWrite: {}, pagesWrite: {}, elapsed: {}s", i, pageSizeWrite, pagesWrite, System.currentTimeMillis() / 1000 - start1);
        }
        long end = System.currentTimeMillis() / 1000;
        logger.info("order report export took " + (end - start) + "s");
        if (String.valueOf(start).equals(redisLoader.get("saas:exportData:exportCountOrderInfo"))) {
            // delete the lock
            redisLoader.delete("saas:exportData:exportCountOrderInfo");
        }
    } catch (Exception e) {
        logger.error("order export failed", e);
    } finally {
        // finish() flushes the remaining Excel data before the stream is closed
        excelWriter.finish();
        outputStream.flush();
        outputStream.close();
    }
}
Further discussion
1. What if the exported data grows to over 100 million rows and tens of GB?
A: Have each server in the cluster generate part of the file, write the parts to a shared file server, verify the output is complete, package the parts into a single zip, and save the download URL for the user. (Mind the read pressure this puts on the database.)
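The final packaging step can be sketched with the JDK's java.util.zip. The zipParts helper and the file names below are illustrative, not from the original system:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipPartsSketch {
    // Bundle the per-node export parts into one archive for download
    static void zipParts(List<Path> parts, Path zipFile) throws IOException {
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zipFile))) {
            for (Path part : parts) {
                zos.putNextEntry(new ZipEntry(part.getFileName().toString()));
                Files.copy(part, zos); // stream the part into the archive entry
                zos.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // fake two export parts in a temp directory
        Path dir = Files.createTempDirectory("export");
        Path p1 = Files.writeString(dir.resolve("orders_part1.xlsx"), "part1");
        Path p2 = Files.writeString(dir.resolve("orders_part2.xlsx"), "part2");
        Path zip = dir.resolve("orders.zip");
        zipParts(List.of(p1, p2), zip);
        System.out.println(Files.size(zip) > 0);
    }
}
```

Streaming each part directly into the ZipOutputStream keeps memory flat no matter how large the individual parts are.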