Dictionary ordering in Java: implementing a trie (dictionary tree) in Java

Reposted

mob64ca14092155 2023-09-01 10:28:10


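Contents

Trie

Trie structure
Adding, searching, and deleting in a trie

Add

Search
Delete

Related problems

Word Break

Method 1: trie + BFS
Method 2: trie + DFS
Method 3: dynamic programming

Word Break II: trie + DFS
Add and Search Words: DFS
Word Search II: trie + DFS
Concatenated Words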

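Trie

Trie structure

A trie is also called a prefix tree or dictionary tree.

A trie is a tree structure in which strings with a common prefix share the nodes for that prefix. This saves storage space, shortens lookups, and avoids unnecessary string comparisons.

The figure below shows the prefix tree built from a set of strings.

[Figure: trie structure]

From the figure we can also derive the basic fields of a trie node: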

class TrieNode{
    public int path;                            // number of words that pass through this node (used by delete)
    public int end;                             // number of words that end at this node
    public HashMap<Character, TrieNode> next;   // child nodes, keyed by the next character

    public TrieNode(){
        path = 0;
        end = 0;
        next = new HashMap<>();
    }
}
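
To make the prefix sharing concrete, here is a minimal sketch that inserts three words and counts the nodes that were actually created. The names PrefixNode, PrefixSharingDemo, insert, and countNodes are hypothetical and exist only for this illustration; the node deliberately omits the path and end counters.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal node for this illustration only (no counters).
class PrefixNode {
    Map<Character, PrefixNode> next = new HashMap<>();
}

class PrefixSharingDemo {
    // Inserts word below root, creating only the nodes that do not exist yet.
    static void insert(PrefixNode root, String word) {
        PrefixNode cur = root;
        for (char c : word.toCharArray()) {
            cur = cur.next.computeIfAbsent(c, k -> new PrefixNode());
        }
    }

    // Counts every node below the given node (the node itself is not counted).
    static int countNodes(PrefixNode node) {
        int total = 0;
        for (PrefixNode child : node.next.values()) {
            total += 1 + countNodes(child);
        }
        return total;
    }

    public static void main(String[] args) {
        PrefixNode root = new PrefixNode();
        for (String w : new String[] {"cat", "cats", "car"}) {
            insert(root, w);
        }
        // The three words have 10 characters in total, but "ca" and "cat" are shared.
        System.out.println(countNodes(root)); // prints 5
    }
}
```

The five nodes are c, a, t, s, and r: the longer the shared prefixes, the larger the saving compared with storing each word separately.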

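Adding, searching, and deleting in a trie

Add

Start at the root and look for the child node of the current character. If it exists, move on to it; if not, create a new node and insert it. Repeat until the last character has been inserted.
Along the way, increment path on every node visited, and increment end on the node of the last character.
Implementation: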

class Trie{
    private TrieNode root;
    public Trie(){
        root = new TrieNode();
    }
    public void insert(String word) {
        if(word == null || word.equals("")) return;
        TrieNode cur = root;
        for (int i = 0; i < word.length(); i++) {
            char c =word.charAt(i);
            if(!cur.next.containsKey(c)) {  // no child for c yet: create one; otherwise just descend
                cur.next.put(c,new TrieNode());
            }
            cur = cur.next.get(c);
            cur.path++;
        }
        cur.end++;
    }
}
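
Besides supporting deletion, the two counters answer counting queries directly. The sketch below is one way to use them, assuming the same insert logic as above; CountingNode, CountingTrie, walk, countWordsWithPrefix, and countWord are hypothetical names that do not appear in the original code.

```java
import java.util.HashMap;
import java.util.Map;

class CountingNode {
    int path; // number of inserted words that pass through this node
    int end;  // number of inserted words that end at this node
    Map<Character, CountingNode> next = new HashMap<>();
}

class CountingTrie {
    private final CountingNode root = new CountingNode();

    void insert(String word) {
        if (word == null || word.isEmpty()) return;
        CountingNode cur = root;
        for (char c : word.toCharArray()) {
            cur = cur.next.computeIfAbsent(c, k -> new CountingNode());
            cur.path++;
        }
        cur.end++;
    }

    // Hypothetical helper: follows s from the root and returns the node reached, or null.
    private CountingNode walk(String s) {
        CountingNode cur = root;
        for (char c : s.toCharArray()) {
            cur = cur.next.get(c);
            if (cur == null) return null;
        }
        return cur;
    }

    // How many inserted words start with prefix (0 for the empty prefix, since root.path is never incremented).
    int countWordsWithPrefix(String prefix) {
        CountingNode n = walk(prefix);
        return n == null ? 0 : n.path;
    }

    // How many times word itself was inserted.
    int countWord(String word) {
        CountingNode n = walk(word);
        return n == null ? 0 : n.end;
    }
}
```

For example, after inserting "apple", "app", and "apple" again, countWordsWithPrefix("app") returns 3 and countWord("apple") returns 2.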

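Search

Start at the root and follow the nodes character by character. If a character has no matching node, or the node for the last character has end equal to 0, the word is not in the trie.
If the node for the last character is reached and its end is non-zero, the word is found.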

public boolean search(String word) {
        if(word == null || word.equals("")) return false;
        TrieNode cur = root;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if(!cur.next.containsKey(c)) return false;
            cur = cur.next.get(c);
        }
        return cur.end != 0;
    }

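Delete

First make sure the word is actually stored; otherwise decrementing the counters would corrupt the trie. Then start at the root, follow the nodes character by character, and decrement path on each node. If path drops to 0, remove that node (and with it the whole subtree below it) and stop.
This works because path counts how many words pass through a node: once it is 0, no word uses the node any more, so the node is useless.

public void delete(String word) {
        if(word == null || word.equals("")) return;
        if(!search(word)) return;  // word is not stored: nothing to delete, leave the counters untouched
        TrieNode cur = root;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if(--cur.next.get(c).path == 0) {
                cur.next.remove(c);  // no word passes through this child any more: drop the whole subtree
                return;
            }
            cur = cur.next.get(c);
        }
        cur.end--;
    }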
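
The following self-contained sketch exercises deletion end to end. It assumes that delete should be a no-op for strings that were never inserted, which is why it checks search first: without that check, deleting a mere prefix such as "ca" would decrement the counters of nodes that stored words still rely on. DelNode and DeletableTrie are hypothetical names used only here.

```java
import java.util.HashMap;
import java.util.Map;

class DelNode {
    int path; // words passing through this node
    int end;  // words ending at this node
    Map<Character, DelNode> next = new HashMap<>();
}

class DeletableTrie {
    final DelNode root = new DelNode();

    void insert(String word) {
        if (word == null || word.isEmpty()) return;
        DelNode cur = root;
        for (char c : word.toCharArray()) {
            cur = cur.next.computeIfAbsent(c, k -> new DelNode());
            cur.path++;
        }
        cur.end++;
    }

    boolean search(String word) {
        if (word == null || word.isEmpty()) return false;
        DelNode cur = root;
        for (char c : word.toCharArray()) {
            cur = cur.next.get(c);
            if (cur == null) return false;
        }
        return cur.end != 0;
    }

    void delete(String word) {
        if (!search(word)) return; // never inserted: leave the counters untouched
        DelNode cur = root;
        for (char c : word.toCharArray()) {
            DelNode child = cur.next.get(c);
            if (--child.path == 0) {
                cur.next.remove(c); // nothing else uses this subtree
                return;
            }
            cur = child;
        }
        cur.end--;
    }
}
```

With "cat" and "cats" stored, delete("cat") leaves "cats" searchable, and a later delete("cats") removes the last remaining branch, so root.next becomes empty.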

Practice problem: LeetCode 208. Implement Trie (Prefix Tree)
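
LeetCode 208 asks for insert, search, and startsWith. The operations above cover the first two; a prefix query only needs the final node to exist, whether or not a word ends there. Below is a minimal sketch of all three, using a boolean end flag instead of counters because this problem has no delete. The class name Trie208 and the helper find are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class Trie208 {
    private static class Node {
        boolean isEnd; // true if some inserted word ends at this node
        Map<Character, Node> next = new HashMap<>();
    }

    private final Node root = new Node();

    public void insert(String word) {
        Node cur = root;
        for (char c : word.toCharArray()) {
            cur = cur.next.computeIfAbsent(c, k -> new Node());
        }
        cur.isEnd = true;
    }

    // A word is present only if its last node is marked as an end.
    public boolean search(String word) {
        Node n = find(word);
        return n != null && n.isEnd;
    }

    // A prefix is present as soon as the path exists at all.
    public boolean startsWith(String prefix) {
        return find(prefix) != null;
    }

    // Hypothetical helper: follows s from the root and returns the node reached, or null.
    private Node find(String s) {
        Node cur = root;
        for (char c : s.toCharArray()) {
            cur = cur.next.get(c);
            if (cur == null) return null;
        }
        return cur;
    }
}
```

After insert("apple"), search("app") is false while startsWith("app") is true; after insert("app"), both are true.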

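Word Break

Problem: given a string s and a list of strings wordDict used as a dictionary, determine whether s can be built by concatenating words from the dictionary.

Note: not every word in the dictionary has to be used, and a word may be used more than once.
Input: s = "leetcode", wordDict = ["leet", "code"]
Output: true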

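Method 1: trie + BFS

Insert the dictionary into a trie.
Find every dictionary word that matches s starting at s[i], beginning with i = 0. If a word matches s[0…k], start matching again from s[k + 1].
If some word finally matches the last segment s[?..n-1], s can be built from dictionary words.
For example:

Take s = catsrat, with cat, cats, s, and rat in the dictionary. Start at s[0] = c.
The word cat is found, so 3 is added to the queue; continuing, the word cats is found, so 4 is added to the queue.
Next, start at s[3] = s: the word s is found, which would add 4 to the queue, but 4 is already there, so it is skipped.
Then start at s[4] = r: the word rat is found and ends at index 7, which is the length of s, so a split exists.

The code in this section uses the TrieNode variant with a boolean isEnd flag and the Trie.getRoot() method; both are listed in full after the Word Break II solution.

BFS code: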

import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

class Solution {
    public boolean wordBreak(String s, List<String> wordDict) {
        Trie trie = new Trie();
        // insert the dictionary words into the trie
        for(String word : wordDict) {
            trie.insert(word);
        }
        // BFS over start indices: each index in the queue is a position where a new word may begin
        char[] arr = s.toCharArray();
        int n = s.length();
        Queue<Integer> queue = new LinkedList<>();
        boolean[] vis = new boolean[n + 1];
        queue.add(0);
        vis[0] = true;

        while(!queue.isEmpty()) {
            TrieNode cur = trie.getRoot(); // every search starts again from the root
            int index = queue.poll();
            for(;index < n; index++) {
                cur = cur.next.get(arr[index]);
                if(cur == null) break;  // no dictionary word continues with this character: this start index is a dead end
                if(cur.isEnd && !vis[index + 1]) {
                    queue.add(index + 1);
                    vis[index + 1] = true;
                }
            }
            // the loop ran to the end of s and the last character closes a word
            if(index == n && cur.isEnd) return true;
        }
        return false;
    }
}

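Method 2: trie + DFS

The DFS search proceeds as shown in the figure below:

[Figure: DFS search over the trie]

Take s = catsrat. Start at s[0] = c.
The word cat is found, so continue from s[3] = s.
The word s is found, so continue from s[4] = r; the word rat is found, which is the last word, so the search ends.
If the word s could not be found in the second step, the search would backtrack to the previous step, find the word cats, then continue from s[4] and find the word rat.

DFS code: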

public class Solution {
    boolean ret = false;
    public boolean wordBreak(String s, List<String> wordDict) {
        Trie trie = new Trie();
        // insert the dictionary words into the trie
        for(String word : wordDict) {
            trie.insert(word);
        }
        boolean[] visited = new boolean[s.length() + 1];
        visited[0] = true;
        dfs(s,trie,0, visited);
        return ret;

    }
    
    void dfs(String s, Trie trie, int index, boolean[] visited) {
        if(!ret) {
            if(index == s.length()) {
                ret = true;
                return;
            }
            TrieNode cur = trie.getRoot();
            for(int i = index; i < s.length(); i++) {
                char c = s.charAt(i);
                cur = cur.next.get(c);
                if(cur == null) break;
                if(cur.isEnd && !visited[i + 1]) {
                    visited[i + 1] = true;
                    dfs(s,trie,i + 1,visited);
                }
            }
        }
    }
}

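Method 3: dynamic programming

This problem can also be solved with dynamic programming. See 【Java数据结构与算法】动态规划技巧与相关题目题解思路 (Java Data Structures and Algorithms: dynamic programming techniques and approaches to related problems).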
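
The referenced article is not reproduced here, so the following is only a sketch of one common dynamic programming formulation and not necessarily the one that article uses. It assumes dp[i] means "the first i characters of s can be split into dictionary words" and swaps the trie for a HashSet lookup. The class name WordBreakDp is hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class WordBreakDp {
    static boolean wordBreak(String s, List<String> wordDict) {
        Set<String> dict = new HashSet<>(wordDict);
        int n = s.length();
        // dp[i] is true when s[0..i-1] can be split into dictionary words
        boolean[] dp = new boolean[n + 1];
        dp[0] = true; // the empty prefix needs no words
        for (int end = 1; end <= n; end++) {
            for (int start = 0; start < end; start++) {
                // s[0..start-1] splits, and s[start..end-1] is itself a word
                if (dp[start] && dict.contains(s.substring(start, end))) {
                    dp[end] = true;
                    break;
                }
            }
        }
        return dp[n];
    }
}
```

The vis array in the BFS and DFS versions plays the same role as dp here: both record which positions of s are reachable by whole words.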

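Word Break II: trie + DFS

Problem: similar to the previous one, but this time we have to return the actual words matched and how they are combined.

Given a non-empty string s and a dictionary wordDict containing a list of non-empty words, add spaces to s to build sentences
in which every word is in the dictionary. Return all such possible sentences.

Approach: trie + DFS

Starting from the root of the trie, each time a word is found, continue by looking for a word that starts at the next letter, until the last word is found.
The words found at each recursion level are joined into the sentence, and backtracking undoes them again to try other splits.
The benefit is that a sentence is only recorded once the recursion has reached the end of s and shown that this path works.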

class Solution {
    List<String> ans = new ArrayList<>();
    public List<String> wordBreak(String s, List<String> wordDict) {
        Trie trie = new Trie();
        // insert the dictionary words into the trie
        for(String word : wordDict) {
            trie.insert(word);
        }
        dfs(s,0,trie,new StringBuilder());
        return ans;
    }

    public void dfs(String s, int index, Trie trie, StringBuilder sb) {
        if(index == s.length()) {
            // sb ends with a trailing space: copy the sentence without it and leave sb unchanged
            ans.add(sb.substring(0, sb.length() - 1));
            return;
        }

        TrieNode cur = trie.getRoot();
        int len = sb.length(); // remember the state on entry so it can be restored
        for (int i = index; i < s.length(); i++) {
            char c = s.charAt(i);
            cur = cur.next.get(c);
            if(cur == null) break;
            sb.append(c);
            // found a word: go on and look for the next one
            if(cur.isEnd) {
                sb.append(" ");
                dfs(s,i + 1,trie, sb); // if no next word is found, this word cannot be used here; keep extending with i++
                sb.deleteCharAt(sb.length() - 1);
            }
        }
        // every word starting at s[index] has been tried: backtrack to the state on entry
        sb.setLength(len);
    }
}
// Trie node
class TrieNode{
    public boolean isEnd;
    public HashMap<Character, TrieNode> next;

    public TrieNode(){
        isEnd = false;
        next = new HashMap<>();
    }
}
// Trie
class Trie{
    private TrieNode root;
    public Trie(){
        root = new TrieNode();
    }

    public TrieNode getRoot(){
        return root;
    }

    public void insert(String word) {
        if(word == null || word.equals("")) return;
        TrieNode cur = root;
        for (int i = 0; i < word.length(); i++) {
            char c =word.charAt(i);
            if(!cur.next.containsKey(c)) { // no child for c yet: create one; otherwise just descend
                cur.next.put(c,new TrieNode());
            }
            cur = cur.next.get(c);
        }
        cur.isEnd = true;
    }
}

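Add and Search Words: DFS

⭐⭐
Problem: essentially the same as designing a prefix tree. Adding a word is identical; searching additionally has to handle the '.' wildcard.

Implement the dictionary class WordDictionary:
WordDictionary() initializes the dictionary object.
void addWord(word) adds word to the data structure so that it can be matched later.
bool search(word) returns true if some string in the data structure matches word, and false otherwise.
word may contain '.' characters; each . can stand for any single letter.

Search with DFS:
If the current character is ., try every child node and keep descending to see whether any branch matches the rest of the word.
If the current character is not ., take the child node for that character and keep descending.
Search code: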

public boolean search(String word) {
        return dfs(word,0,root);
    }
    private boolean dfs(String word, int index, TrieNode curNode) {
        if(index == word.length()) return curNode.isEnd;
        char c = word.charAt(index);
        if(c == '.') {
            for(int j = 0; j < 26; j++) {
                TrieNode next = curNode.next.get((char)(j + 'a'));
                if(next != null && dfs(word, index + 1, next)){
                    return true;
                }
            }
            return false;
        }else {
            TrieNode next = curNode.next.get(c);
            if(next != null && dfs(word, index + 1, next)) {
                return true;
            }else {
                return false;
            }
        }
    }
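
For reference, here is a minimal self-contained sketch of the whole WordDictionary class, combining the insert logic from earlier with the wildcard search. Two details are assumptions rather than the original code: the nested Node class, and iterating over the existing children for '.' instead of probing all 26 lowercase letters.

```java
import java.util.HashMap;
import java.util.Map;

class WordDictionary {
    private static class Node {
        boolean isEnd; // true if an added word ends at this node
        Map<Character, Node> next = new HashMap<>();
    }

    private final Node root = new Node();

    public WordDictionary() {}

    public void addWord(String word) {
        Node cur = root;
        for (char c : word.toCharArray()) {
            cur = cur.next.computeIfAbsent(c, k -> new Node());
        }
        cur.isEnd = true;
    }

    public boolean search(String word) {
        return dfs(word, 0, root);
    }

    private boolean dfs(String word, int index, Node cur) {
        if (index == word.length()) return cur.isEnd;
        char c = word.charAt(index);
        if (c == '.') {
            // wildcard: any existing child may match this position
            for (Node child : cur.next.values()) {
                if (dfs(word, index + 1, child)) return true;
            }
            return false;
        }
        Node child = cur.next.get(c);
        return child != null && dfs(word, index + 1, child);
    }
}
```

With "bad", "dad", and "mad" added, search("pad") is false, while search(".ad") and search("b..") are both true.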

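Word Search II: trie + DFS

⭐⭐⭐
Problem summary: find words in a 2D character grid by connecting adjacent cells, where each word found must exist in the dictionary.

Approach: add a String val field to the trie node; when a word is inserted, set isEnd to true on its last node and store the word itself there as well.

Insert the dictionary into the trie.
Starting from a cell of board, keep extending the path as long as the trie has a matching node.
Mark each cell that has been visited.
When the trie contains the word, add it to the result list, then set isEnd to false for that word so it is not added twice.
If the trie has no next node for the current cell, no word can continue along this path, so return immediately.
After all four directions of a cell have been explored, backtrack and remember to mark the cell as unvisited.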

class Solution {
    List<String> ret = new ArrayList<>();
    public List<String> findWords(char[][] board, String[] words) {
        Trie trie = new Trie();
        for(String word : words) {
            trie.inseart(word);
        }
        int row = board.length, col = board[0].length;
        TrieNode root = trie.getRoot();
       
        boolean[][] visited = new boolean[row][col]; // each dfs call restores every flag it set before it returns
        for(int i = 0; i < board.length; i++) {
            for(int j = 0; j < board[0].length; j++) {
                dfs(i, j, board,row, col, root, visited);
            }
        }
        return ret;
    }

    public void dfs(int x, int y, char[][] board, int row, int col, TrieNode curNode, boolean[][] visited) {
        if(x < 0 || x >= row || y >= col || y < 0 || visited[x][y]) return;
        TrieNode next = curNode.next.get(board[x][y]);
        if(next == null) return;
        visited[x][y] = true;
        if(next.isEnd) {
            ret.add(next.val);
            next.isEnd = false; // unmark the word in the trie so it is not reported twice
        }
        dfs(x - 1, y, board, row, col, next, visited);
        dfs(x + 1, y, board, row, col, next, visited);
        dfs(x, y - 1, board, row, col, next, visited);
        dfs(x, y + 1, board, row, col, next, visited);
        visited[x][y] = false; // mark the cell as unvisited again
    }
}

class Trie{
    private TrieNode root;
    public Trie(){
        root = new TrieNode();
    }
    public TrieNode getRoot(){
        return root;
    }
    public void inseart(String word) {
        TrieNode cur =  root;
        for(int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if(!cur.next.containsKey(c)) {
                cur.next.put(c,new TrieNode());
            }
            cur = cur.next.get(c);
        }
        cur.isEnd = true;
        cur.val = word;
    }
}

class TrieNode{
    String val; // the word that ends at this node (set only on end nodes)
    boolean isEnd;
    HashMap<Character,TrieNode> next;

    public TrieNode(){
        isEnd = false;
        val = null;
        next = new HashMap<>();
    }
}

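Concatenated Words

⭐⭐⭐
Problem: given an array of strings words with no duplicates, find and return all concatenated words in words.

A concatenated word is a string made up entirely of at least two shorter words from the given array.

Approach:
A long word that is itself a concatenation does not need to be inserted into the trie: anything it could help build can also be built from the shorter words that make it up.
Insert shorter words first. Then, if a new word can be built at all, it must be built from words shorter than itself.

Steps:

Sort the string array by length.
If a word can be built from words already in the trie, add it to the result list; otherwise insert it into the trie.
How do we decide whether a word can be built from words in the trie? The same way as in the first problem above, Word Break.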
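
The steps above can be sketched as follows. This is a self-contained variant under stated assumptions, not the original solution: it adds a per-word failed array so that a suffix already known not to split is not explored again, and it sorts a copy so the caller's array is left untouched. ConcatNode, ConcatenatedWords, find, canSplit, and failed are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ConcatNode {
    boolean isEnd;
    Map<Character, ConcatNode> next = new HashMap<>();
}

class ConcatenatedWords {
    private final ConcatNode root = new ConcatNode();

    List<String> find(String[] words) {
        String[] sorted = words.clone();
        Arrays.sort(sorted, Comparator.comparingInt(String::length)); // shorter words first
        List<String> result = new ArrayList<>();
        for (String w : sorted) {
            if (w.isEmpty()) continue;
            if (canSplit(w, 0, new boolean[w.length()])) {
                result.add(w); // built from shorter words: report it, do not store it
            } else {
                insert(w);     // not a concatenation: it may be a building block later
            }
        }
        return result;
    }

    // failed[i] == true means the suffix starting at i is already known not to split.
    private boolean canSplit(String w, int start, boolean[] failed) {
        if (start == w.length()) return true;
        if (failed[start]) return false;
        ConcatNode cur = root;
        for (int i = start; i < w.length(); i++) {
            cur = cur.next.get(w.charAt(i));
            if (cur == null) break;
            if (cur.isEnd && canSplit(w, i + 1, failed)) return true;
        }
        failed[start] = true;
        return false;
    }

    private void insert(String w) {
        ConcatNode cur = root;
        for (char c : w.toCharArray()) {
            cur = cur.next.computeIfAbsent(c, k -> new ConcatNode());
        }
        cur.isEnd = true;
    }
}
```

Because the input has no duplicates and a word is checked before it is inserted, a word can never match itself as a single piece, so any successful split uses at least two shorter words.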

class Solution {
   public List<String> findAllConcatenatedWordsInADict(String[] words) {
       Arrays.sort(words,(o1,o2) -> o1.length() - o2.length());
       Trie trie = new Trie();
       List<String> ret = new ArrayList<>();
       TrieNode root = trie.getRoot();
       for(String word : words) {
          if(word.length() > 0 && wordBreak(word,root, 0)){
              ret.add(word);
          }else {
              trie.inseart(word);
          }
       }
       return ret;
   }

   //DFS
   public boolean wordBreak(String word, TrieNode root, int index) {
       if(index == word.length()) return true;
       TrieNode cur = root;
       for(int i = index; i < word.length(); i++) {
           cur = cur.next.get(word.charAt(i));
           if(cur == null) break;
           if(cur.isEnd && wordBreak(word, root, i + 1)){
               return true;
           }
       }
       return false;
   }

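   /**
    * words contains no duplicates, so a new word is never already in the trie: it either splits into
    * two or more stored words or it does not split at all. It cannot match itself and be misreported.
    */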
    //BFS
   public boolean wordBreak2(String word, TrieNode root) {
      Queue<Integer> queue = new LinkedList<>();
      queue.add(0);
      boolean[] visited = new boolean[word.length() + 1];
      visited[0] = true;
      while(!queue.isEmpty()) {
          TrieNode cur = root; // every search starts again from the root
          int index = queue.poll();
          for(; index < word.length(); index++) {
              char c = word.charAt(index);
              cur = cur.next.get(c);
              if(cur == null) break;
              if(cur.isEnd && !visited[index + 1]) {
                  queue.add(index + 1);
                  visited[index + 1] = true;
              }
          }
           if(index == word.length() && cur.isEnd) {
               return true;
           }
      }
      return false;
   }
}

class TrieNode{
   boolean isEnd;
   HashMap<Character,TrieNode> next;

   public TrieNode(){
       isEnd = false;
       next = new HashMap<>();
   }
}

class Trie{
   private TrieNode root;

   public Trie(){
       root = new TrieNode();
   }

   public TrieNode getRoot(){
       return root;
   }

   public void inseart(String word) {
       if(word == null || word.equals("")) return;
       TrieNode cur = root;
       for(int i = 0; i < word.length(); i++) {
           char c = word.charAt(i);
           if(!cur.next.containsKey(c)) {
               cur.next.put(c, new TrieNode());
           }
           cur = cur.next.get(c);
       }
       cur.isEnd = true;
   }
}
