Chunking - 51CTO Blog

Persistent Connections, Pipelining and Chunking

The Content-Length header is an interesting HTTP response header. This header tells HTTP client applications the size of the response. However, in HTTP 1.1, this header is optional. HTT

http

persistent

connections

Reposted

dingzongliang

2012-06-04 13:55:42

430 reads

semantic chunk: semantic chunking tools

Network transformer. This is a technical tutorial on how to set up and add semantic search via transformers as an Elasticsearch index. We go through all steps needed and will introduce the utility class ElasticTran

semantic chunk

networking

python

java

linux

Reposted

mob64ca140ee96c

2024-08-06 20:47:03

41 reads

late chunking source code analysis: https://github.com/jina-ai/late-chunking

import bisect import logging from typing import Dict, List, Optional, Tuple, Union from llama_index.core.node_parser import Seman

ci

chunking

List

Original

AI算法专家李智华

11 months ago

133 reads

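docx text chunking: chunking in Java

Preface: I recently read a very good English-language blog post. Combining it with papers and materials I have studied, plus my own understanding, I put together some notes to share on the text segmentation task. The goal of text segmentation is to divide a text into meaningful blocks, and different goals call for different granularities, such as words, sentences, or topics. The segmentation discussed here works at the topic level, which is also called topic segmentation: identifying topic transitions in order to divide a long text into blocks that each cover a different topic

artificial intelligence

natural language processing

model prediction

lua

evaluation metrics

Reposted

技术领航博主

9 months ago

64 reads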

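RAG chunking strategies: mainstream methods (recursive, jina-seg) + cutting-edge recommendations (Meta-chunking, Late chunking)

As a major technical breakthrough in artificial intelligence, large AI models are becoming a key force driving innovation and transformation across industries. Seize the large-model opportunity and master large-model knowledge and skills; materials shared for free!

artificial intelligence

large models

LLM

ai

language models

Original

沈页dd

10 months ago

259 reads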

Experimental evaluation of late chunking in RAG

Code: import os import json import torch import numpy as np import spacy from spacy.tokens import Doc from spacy.langu

ci

class methods

similarity computation

Original

AI算法专家李智华

11 months ago

56 reads

RAG chunking strategies: mainstream methods (recursive, jina-seg) + cutting-edge recommendations (Meta-chunking, Late chunking, SLM-SFT)

chunking

data

vectorization

Original

汀丶人工智能

10 months ago

156 reads

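Experimental evaluation of late chunking in RAG (continued)

The previous post used the Jina AI v2 model. Next, let's look at how late chunking actually performs with v3. To keep things quick, I call the official API directly! # import requests # url = 'https://api.jina.ai/v1/embeddings' headers = { 'Content-

List

similarity

json

Original

AI算法专家李智华

11 months ago

53 reads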

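Chunking: document chunking in LLM-based RAG systems

[Intro] "Withering cannot be avoided, like birth, aging, sickness, and death; desolation cannot be accepted, like a lifetime of mediocrity." That was my feeling after going back to the countryside on Saturday to pull weeds. Something gained, something felt, and the same holds for engineering. Splitting large documents into smaller

chunking

recursion

initialization

Featured original

wireless_com

2024-08-12 14:50:49

390 reads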

[Webpack 2] Chunking common modules from multiple apps with the Webpack CommonsChunkPlugin

If you have a multi-page application (as opposed to a single page app), you’re likely sharing modules between these pages. By chunking these common mo

bundle

css

sed

Webpack

Reposted

mob604756e75222

2016-06-23 01:59:00

80 reads

2 comments

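Experimental evaluation of late chunking in RAG (continued, part 2)

To address the long-text issue from the earlier RAG tests, I added a long-text test (same code as before): context_test_documents = [ # Document 1:

quantum computing

deep learning

similarity

Original

bonelee

11 months ago

0 reads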

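Optimizing RAG from scratch: 7 chunking methods to make your system smarter

When building a Retrieval-Augmented Generation (RAG) system, handling external knowledge efficiently is the key to strong question answering. Chunking is a RAG tech

artificial intelligence

language models

ai

agi

LLM

Original

datian1234

10 months ago

130 reads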

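Why does your AI give irrelevant answers? Demystifying the core challenges of RAG document parsing and knowledge chunking

1 Open-source document parsing and splitting: use third-party tools to parse and split files, extract the file contents, and split the document into small chunks. Common formats such as PDF, Word, Markdown, JSON, and HTML all have good modules for extracting their contents. Advantages: support for a rich set of document types; multiple options for each document type; seamless integration with open-source frameworks. However, the results are sometimes very poor, with the extracted content differing greatly from the original file. 2 PDF format diversity

Java

Original

公众号JavaEdge

2024-04-11 22:12:51

185 reads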

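Why does your AI give irrelevant answers? Demystifying the core challenges of RAG document parsing and knowledge chunking

1 Open-source document parsing and splitting: use third-party tools to parse and split files, extract the file contents, and split the document into small chunks. Common formats such as PDF, Word, Markdown, JSON, and HTML all have good modules for extracting their contents. 1.1 Advantages: support for a rich set of document types; multiple options for each document type; seamless integration with open-source frameworks. However, the results are sometimes very poor, with the extracted content differing greatly from the original file. 2 PDF format diversity

Java

Original

公众号JavaEdge

2024-06-24 10:30:16

61 reads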

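To be honest, RAG is not as simple as you think: Late Chunking vs Contextual Retrieval for solving the context problem

RAG is a technique that combines retrieval from an external knowledge base with a generative model, but lately Agent and MCP have dominated the buzz, along with the popularity of the DS-R1 model

#artificial intelligence

#large AI models

#getting started with large models

#learning large models

#large models

Original

bugyinyin

1 month ago

33 reads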

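A handy tool for LLM applications: a one-article roundup of RAG text chunking methods; complete beginners only need to bookmark this one!!

The article covers text chunking in RAG systems in detail, including its definition, why it is needed, several categories of strategies, and the scenarios each suits. It proposes a token-level evaluation framework and, through

artificial intelligence

learning large models

large AI models

getting started with large models

RAG

Original

datian1234

1 month ago

83 reads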

Could not initialize English chunker / Could not load file from classpath: '/en-token.bin'

The specific error: java.lang.RuntimeException: Could not initialize English chunker at org.languagetool.chunking.EnglishChunker.<init>(EnglishChunker.java:72)

LanguageTool

java

sed

Original

柳鲲鹏泰山

2021-10-08 14:19:25

175 reads

What does "token" mean in NLP (natural language processing)?

Chapter 2. A quick review of traditional NLP: Corpora, Tokens and Types; Unigrams, Bigrams, Trigrams, N-grams; Lemmas and Stems; Categorizing Sentences and Documents; Categorizing Words: POS Tagging; Categorizing Spans: Chunking

nlp

python

Parse

chunking

Word

Reposted

云端筑梦工匠

2024-09-17 13:17:59

178 reads

A collection of deep learning tools

SENNA toolkit: part-of-speech (POS) tags, chunking (CHK), named entity recognition (NER), semantic role labeling (SRL) and syntactic parsing (PSG)

deep learning

Original

ibright

2013-02-14 13:58:56

708 reads

Building a Better Vocabulary: Lecture 2 The Spelling-Meaning Connection

We discussed five core principles of effective vocabulary learning: starting with clear definitions, putting words into context, making connections between known concepts and new words, exploring the morphology and etymology of words, and chunking words b.

sed

ide

ico

ios ide

Original

天人合一peng

2021-08-18 10:41:44

116 reads


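Unlocking the value of unstructured data: a complete guide to common processing methods in the unstructured library

[Large AI models] Agentic chunking: a complete implementation guide to making AI retrieval more precise; worth bookmarking!!

Bidirectional LSTM-CRF Models for Sequence Tagging

springbatch flow parallel springbatch step

Sending SMTP mail directly with Telnet

cdc-file-transfer: Google's open-source Windows-to-Linux sync tool

open nlp: what kind of software is OpenNLP?

What methods are there for splitting text into chunks in a RAG system?

Apache Open NLP: the Apache OpenNLP platform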