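First, if you are comfortable with Python, you can also take a look at "Python实现SLR(1)语法分析器,编译原理yyds!" (an SLR(1) parser implemented in Python) on __FF_Y's blog. It was written by a senior schoolmate of mine who has been a great help to me, so this post is a bit of a passing of the torch.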
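Some people may think this is harder to do in Java, but in my view it is fine, provided you have a reasonable grasp of Java's features and of object-oriented thinking.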
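Let me start with the overall approach, focusing on the data structures. I built things in this order: first the lexer, then the grammar, then the collection of item sets, the FIRST and FOLLOW sets, and the SLR(1) parse table, and finally the parser itself.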
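Because I have been thoroughly "contaminated" by object-oriented thinking, I created one class for every terminal and every nonterminal in the grammar, which makes building and modifying the grammar much easier.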
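In the lexer, regular-expression matching makes it easy to pick out each terminal (tutorial: "Java 正则表达式 | 菜鸟教程"; pay attention to the lookingAt method) and to create an object for it that is stored in a List. Some terminals are prefixes of others, such as ">" and ">=" (">=" has to be tried before ">", otherwise ">=" is read as a greater-than sign followed by "="), so all we need is to maintain a matching order. We also have to catch characters such as "@" that belong to no terminal, and strings such as "1a" that are neither a keyword nor a valid variable name, and report an error for them.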
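To see why the order matters, here is a minimal, self-contained sketch of ordered matching with lookingAt. The class LexOrderDemo and its tokenize method are made up for this illustration and are not part of the project; the pattern list is cut down to a handful of entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the project
final class LexOrderDemo {
    // Patterns are tried top to bottom, so ">=" must come before ">"
    private static final Pattern[] ORDER = {
            Pattern.compile("\\s+"),          // whitespace (skipped)
            Pattern.compile(">="),
            Pattern.compile(">"),
            Pattern.compile("="),
            Pattern.compile("\\d+"),
            Pattern.compile("[A-Za-z]\\w*"),
    };

    static List<String> tokenize(String code) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < code.length()) {
            boolean matched = false;
            for (Pattern p : ORDER) {
                Matcher m = p.matcher(code).region(pos, code.length());
                // lookingAt() only succeeds if the match starts exactly at pos
                if (m.lookingAt()) {
                    if (!m.group().trim().isEmpty()) tokens.add(m.group());
                    pos = m.end();
                    matched = true;
                    break;
                }
            }
            // Nothing matched: a character such as "@" that is in no terminal
            if (!matched) {
                throw new IllegalArgumentException("illegal character: " + code.charAt(pos));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a >= 5"));
        System.out.println(tokenize("a>b"));
    }
}
```

If ">" were listed before ">=", the first call would produce ">" and "=" as two separate tokens instead of a single ">=".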
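When building the grammar, the left-hand side of every production is a nonterminal and the right-hand side is a sequence of grammar symbols, so the terminal class (Word) and the nonterminal class (NTChar) can both extend a grammar-symbol class (Char), and NTChar holds a List of Char that serves as the right-hand side of a production. Since Sets will later make computing the FIRST and FOLLOW sets simple, a Set must never contain two A's at once (A being some grammar symbol). Two objects of class A therefore have to count as the same element, which means overriding equals and hashCode in Char. The grammar class (Grammar) only needs to maintain the productions (a List of NTChar); the set of grammar symbols can be obtained by walking the left-hand and right-hand sides of every production.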
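The effect of class-based equality on a Set can be sketched as follows. Sym, SymA, and SymB are hypothetical stand-ins for Char and two of its subclasses, and the sketch uses a strict getClass() comparison, a slight variation on the isInstance check in the real Char class.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class, not part of the project
final class SymbolSetDemo {
    // Stand-in for Char: identity is decided by the class alone
    abstract static class Sym {
        @Override
        public boolean equals(Object o) {
            // Equal when both objects have exactly the same class
            return o != null && getClass() == o.getClass();
        }

        @Override
        public int hashCode() {
            // Same class, same hash, so a HashSet finds the earlier object
            return getClass().hashCode();
        }
    }

    static final class SymA extends Sym { }

    static final class SymB extends Sym { }

    // Number of distinct symbols after putting everything into a Set
    static int distinct(Sym... symbols) {
        Set<Sym> set = new HashSet<>();
        for (Sym s : symbols) set.add(s);
        return set.size();
    }

    public static void main(String[] args) {
        // Two SymA objects collapse into one element
        System.out.println(distinct(new SymA(), new SymA(), new SymB()));
    }
}
```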
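Last is the data structure of the item class (Item). An item is just a production with a "·" in it, so Item holds an NTChar and an int that represent the production and the position of the "·": (A→aB,0) means A→·aB, (A→aB,1) means A→a·B, (A→aB,2) means A→aB·, and so on. Because item sets are needed later, Item also overrides equals and hashCode so that a Set can recognize identical items.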
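The (production, dot position) encoding can be tried out with a simplified sketch. DotItem is a hypothetical class that uses plain strings as grammar symbols instead of Char objects; the "→" and "·" characters are written as Unicode escapes so the file stays ASCII-only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical demo class, not part of the project
final class DotItemDemo {
    // Simplified item: a production plus the position of the dot
    static final class DotItem {
        final String left;
        final List<String> right;
        final int dot; // ranges from 0 to right.size()

        DotItem(String left, List<String> right, int dot) {
            this.left = left;
            this.right = right;
            this.dot = dot;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof DotItem)) return false;
            DotItem x = (DotItem) o;
            return left.equals(x.left) && right.equals(x.right) && dot == x.dot;
        }

        @Override
        public int hashCode() {
            return Objects.hash(left, right, dot);
        }

        @Override
        public String toString() {
            // \u2192 is the arrow, \u00B7 is the dot
            StringBuilder sb = new StringBuilder(left).append("\u2192");
            for (int i = 0; i < right.size(); i++) {
                if (i == dot) sb.append("\u00B7");
                sb.append(right.get(i));
            }
            if (dot == right.size()) sb.append("\u00B7");
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        List<String> aB = Arrays.asList("a", "B");
        // Prints the three items of the production A -> aB
        for (int dot = 0; dot <= aB.size(); dot++) {
            System.out.println(new DotItem("A", aB, dot));
        }
        // Two equal items collapse into one element of a Set
        Set<DotItem> set = new HashSet<>();
        set.add(new DotItem("A", aB, 1));
        set.add(new DotItem("A", Arrays.asList("a", "B"), 1));
        System.out.println(set.size());
    }
}
```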
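By the way, for the parse table I use a Map[]: each state of the ACTION table and of the GOTO table has its own Map, whose key is a grammar symbol and whose value is the table entry. If Map[k].get(Char) returns null, that means error.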
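As a rough illustration of the Map[] layout, the sketch below builds a toy ACTION table with two states. TableLookupDemo, its string keys, and its entries are assumptions made for the example; the real table uses Word and NTChar keys and the A_i and G_i entry classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; symbols and entries are plain strings here
final class TableLookupDemo {
    // Builds a toy ACTION table with two states, one Map per state
    static Map<String, String>[] buildAction() {
        @SuppressWarnings("unchecked")
        Map<String, String>[] action = new HashMap[2];
        for (int i = 0; i < action.length; i++) action[i] = new HashMap<>();
        action[0].put("a", "s1");   // in state 0, shift on "a" and go to state 1
        action[1].put("#", "acc");  // in state 1, accept on the end marker
        return action;
    }

    // A missing key (null) stands for an error entry
    static String lookup(Map<String, String>[] action, int state, String symbol) {
        String entry = action[state].get(symbol);
        return entry == null ? "error" : entry;
    }

    public static void main(String[] args) {
        Map<String, String>[] action = buildAction();
        System.out.println(lookup(action, 0, "a"));
        System.out.println(lookup(action, 1, "#"));
        System.out.println(lookup(action, 0, "#"));
    }
}
```

Only the cells that hold an entry take up space, and the error case needs no special value.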
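To sum up, we only need to maintain the matching order and the productions, and create the grammar-symbol classes they use; the analyzer then does the rest automatically, and all intermediate results are generated dynamically. That is how the pieces are decoupled.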
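Grammar:
<program>→<function list>
<function list>→<function>|<function list><function>
<function>→<type><function name><parameter><compound statement>
<type>→<float>|<int>|<string>|<bool>
<function name>→<main>|<ID>
<parameter>→()|(<type><ID>)
<compound statement>→{}|{<statement list>}
<statement list>→<statement>|<statement list><statement>
<statement>→<declaration>|<assignment>|<conditional>|<loop>|<return statement>
<declaration>→<type><ID><;>
<assignment>→<variable assignment>|<type><variable assignment>
<variable assignment>→<ID>=<expression><;>
<expression>→(<expression>)|<value>|<expression><arithmetic operator><expression>
<value>→<ID>|<constant>
<constant>→<float_value>|<int_value>|<string_value>|<bool_value>
<arithmetic operator>→+|-|*|/
<conditional>→<S1>|<S2>
<S1>→<if>(<condition>)<compound statement>
<S2>→<S1><else><compound statement>
<condition>→<value><comparison operator><value>
<comparison operator>→>|<|>=|<=|==|!=
<loop>→<while>(<condition>)<compound statement>
<return statement>→<return><value><;>
Nonterminals:
S:<program>
O:<function list>
F:<function>
T:<type>
N:<function name>
X:<parameter>
A:<compound statement>
L:<statement list>
Y:<statement>
D:<declaration>
Z:<assignment>
I:<conditional>
W:<loop>
R:<return statement>
B:<variable assignment>
E:<expression>
V:<value>
G:<arithmetic operator>
C:<constant>
J:<S1>
K:<S2>
P:<condition>
Q:<comparison operator>
Productions: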
S→O
O→F
O→OF
F→TNXA
T→<float>
T→<int>
T→<string>
T→<bool>
N→<main>
N→<ID>
X→()
X→(T<ID>)
A→{}
A→{L}
L→Y
L→LY
Y→D
Y→Z
Y→I
Y→W
Y→R
D→T<ID><;>
Z→B
Z→TB
B→<ID>=E<;>
E→(E)
E→V
E→EGE
V→<ID>
V→C
C→<float_value>
C→<int_value>
C→<string_value>
C→<bool_value>
G→+
G→-
G→*
G→/
I→J
I→K
J→<if>(P)A
K→J<else>A
P→VQV
Q→>
Q→<
Q→>=
Q→<=
Q→==
Q→!=
W→<while>(P)A
R→<return>V<;>
Char class:
import lombok.Getter;
//Grammar symbol
public abstract class Char {
//Whether this symbol is a terminal
@Getter
protected boolean isTerminal;
public Char() {
setIsTerminal();
}
/**
* Sets isTerminal
*
* @author 李电楠
*/
protected abstract void setIsTerminal();
/**
* Override:
* two grammar symbols are equal when they belong to the same class
*
* @param o grammar symbol
* @return boolean
* @author 李电楠
*/
@Override
public boolean equals(Object o) {
if (this == o) return true;
return getClass().isInstance(o);
}
/**
* Override:
* gives different objects of the same grammar-symbol class the same hash code
*
* @return int
* @author 李电楠
*/
@Override
public int hashCode() {
return getClass().hashCode();
}
}
Word class:
import java.util.regex.Pattern;
import lombok.Getter;
import lombok.ToString;
//Terminal
@ToString
public abstract class Word extends Char {
@Getter
@ToString.Exclude
protected Pattern pattern;
//Token type code
@Getter
protected int type;
//Attribute value
@Getter
protected Object value;
@Override
protected void setIsTerminal() {
isTerminal = true;
}
/**
* Compiles the regular expression, assigns it to pattern,
* and returns this
*
* @return wordanalyzer.Word
* @author 李电楠
*/
public abstract Word makePattern();
/**
* Sets the attribute value
*
* @param value the matched string
* @author 李电楠
*/
public void setValue(String value) {
this.value = '"' + value + '"';
}
}
Patterns class:
//Regular expressions
public class Patterns {
public static final String OTHER = ".";
public static final String OTHER_ID = "\\d+[A-Za-z_]+";
public static final String ID = "[A-Za-z]\\w*";
public static final String FLOAT_VALUE = "\\d+\\.\\d+";
public static final String INT_VALUE = "\\d+";
//[^"]* instead of .* so that two strings on one line are not merged into one
public static final String STRING_VALUE = "\"[^\"\\r\\n]*\"";
//\b keeps a keyword from matching the prefix of an identifier such as "trueValue"
public static final String BOOL_VALUE = "(?:false|true)\\b";
//Keywords (\b for the same reason, e.g. "intx" is an ID, not int followed by x)
public static final String FLOAT = "float\\b";
public static final String INT = "int\\b";
public static final String STRING = "string\\b";
public static final String BOOL = "bool\\b";
public static final String IF = "if\\b";
public static final String ELSE = "else\\b";
public static final String WHILE = "while\\b";
public static final String CONTINUE = "continue\\b";
public static final String BREAK = "break\\b";
public static final String MAIN = "main\\b";
public static final String RETURN = "return\\b";
//Delimiters
public static final String SPACE = "\\s+";
public static final String XKH_L = "\\(";
public static final String XKH_R = "\\)";
public static final String ZKH_L = "\\[";
public static final String ZKH_R = "\\]";
public static final String DKH_L = "\\{";
public static final String DKH_R = "\\}";
public static final String SEMICOLON = ";";
//Operators
public static final String ADD = "\\+";
public static final String SUBTRACT = "-";
public static final String MULTIPLY = "\\*";
public static final String DIVIDE = "/";
//Comments
public static final String ANNOTATION = "//.*";
//Assignment
public static final String ASSIGN = "=";
//Comparison
public static final String GREATER_EQUAL = ">=";
public static final String LESS_EQUAL = "<=";
public static final String GREATER = ">";
public static final String LESS = "<";
public static final String EQUAL = "==";
public static final String NOTEQUAL = "!=";
//#
public static final String END = "#";
}
Types class:
//Token type codes
public class Types {
public static final int OTHER = -1;
public static final int ID = 0;
public static final int FLOAT_VALUE = 1;
public static final int INT_VALUE = 2;
public static final int STRING_VALUE = 3;
public static final int BOOL_VALUE = 4;
//Keywords
public static final int FLOAT = 5;
public static final int INT = 6;
public static final int STRING = 7;
public static final int BOOL = 8;
public static final int IF = 9;
public static final int ELSE = 10;
public static final int WHILE = 11;
public static final int CONTINUE = 12;
public static final int BREAK = 13;
public static final int MAIN = 14;
public static final int RETURN = 15;
//Delimiters
public static final int SPACE = 16;
public static final int XKH_L = 17;
public static final int XKH_R = 18;
public static final int ZKH_L = 19;
public static final int ZKH_R = 20;
public static final int DKH_L = 21;
public static final int DKH_R = 22;
public static final int SEMICOLON = 23;
//Operators
public static final int ADD = 24;
public static final int SUBTRACT = 25;
public static final int MULTIPLY = 26;
public static final int DIVIDE = 27;
//Comments
public static final int ANNOTATION = 28;
//Assignment
public static final int ASSIGN = 29;
//Comparison
public static final int GREATER_EQUAL = 30;
public static final int LESS_EQUAL = 31;
public static final int GREATER = 32;
public static final int LESS = 33;
public static final int EQUAL = 34;
public static final int NOTEQUAL = 35;
//#
public static final int END = 36;
}
Example: the ADD class:
import java.util.regex.Pattern;
public class ADD extends Word {
public ADD() {
type = Types.ADD;
}
@Override
public Word makePattern() {
pattern = Pattern.compile(Patterns.ADD);
return this;
}
}
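AnalyzeOrder class:
import java.util.ArrayList;
import lombok.Getter;
//Regex matching order
public class AnalyzeOrder {
//Terminals, listed in the order in which they are tried
@Getter
private static final ArrayList<Word> words = new ArrayList<>();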
static {
words.add(new ANNOTATION().makePattern());
words.add(new OTHER_ID().makePattern());
words.add(new FLOAT_VALUE().makePattern());
words.add(new INT_VALUE().makePattern());
words.add(new STRING_VALUE().makePattern());
words.add(new BOOL_VALUE().makePattern());
words.add(new FLOAT().makePattern());
words.add(new INT().makePattern());
words.add(new STRING().makePattern());
words.add(new BOOL().makePattern());
words.add(new IF().makePattern());
words.add(new ELSE().makePattern());
words.add(new WHILE().makePattern());
words.add(new CONTINUE().makePattern());
words.add(new BREAK().makePattern());
words.add(new MAIN().makePattern());
words.add(new RETURN().makePattern());
words.add(new SPACE().makePattern());
words.add(new XKH_L().makePattern());
words.add(new XKH_R().makePattern());
words.add(new ZKH_L().makePattern());
words.add(new ZKH_R().makePattern());
words.add(new DKH_L().makePattern());
words.add(new DKH_R().makePattern());
words.add(new SEMICOLON().makePattern());
words.add(new ADD().makePattern());
words.add(new SUBTRACT().makePattern());
words.add(new MULTIPLY().makePattern());
words.add(new DIVIDE().makePattern());
words.add(new GREATER_EQUAL().makePattern());
words.add(new LESS_EQUAL().makePattern());
words.add(new GREATER().makePattern());
words.add(new LESS().makePattern());
words.add(new EQUAL().makePattern());
words.add(new NOTEQUAL().makePattern());
words.add(new ASSIGN().makePattern());
words.add(new ID().makePattern());
words.add(new OTHER().makePattern());
}
}
WordAnalyzer class:
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
//Lexer
public class WordAnalyzer {
//Terminals, listed in the order in which they are tried
private final ArrayList<Word> words;
//Analysis result
private final ArrayList<Word> result = new ArrayList<>();
public WordAnalyzer() {
words = AnalyzeOrder.getWords();
}
/**
* Lexical analysis
*
* @param code source program
* @return java.util.ArrayList<wordanalyzer.Word>
* @author 李电楠
*/
public ArrayList<Word> analyze(String code) throws Exception {
int pos = 0;
while (pos < code.length()) {
for (Word i : words) {
Pattern pattern = i.getPattern();
//Regex matching
Matcher matcher = pattern.matcher(code);
//Region to match in
matcher.region(pos, code.length());
//lookingAt() only succeeds if the match starts at pos, the start of the region
if (matcher.lookingAt()) {
//Set the new starting point
pos = matcher.end();
switch (i.getType()) {
//Any other character: throw an exception
case Types.OTHER:
throw new WordAnalyzeException("Illegal symbol: " + matcher.group());
//Whitespace or a comment: skip it
case Types.SPACE:
case Types.ANNOTATION:
break;
default:
addWordToResult(i, matcher.group());
}
break;
}
}
}
//Append # to result
result.add(new END());
return result;
}
/**
* Adds a terminal to result
*
* @param origWord the Word object used for matching
* @param value attribute value
* @author 李电楠
*/
private void addWordToResult(Word origWord, String value) throws Exception {
//Class.newInstance() is deprecated since Java 9, so go through the constructor
Word newWord = origWord.getClass().getDeclaredConstructor().newInstance();
newWord.setValue(value);
result.add(newWord);
}
}
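NTChar class:
import java.util.ArrayList;
import lombok.Getter;
import lombok.ToString;
//Nonterminal (can be seen as the left-hand side of a production)
@ToString
public abstract class NTChar extends Char {
//Right-hand side of the production
@Getter
private final ArrayList<Char> prodRight = new ArrayList<>();
@Override
protected void setIsTerminal() {
isTerminal = false;
}
/**
* Appends c to the right-hand side of the production
*
* @param c grammar symbol
* @return grammar.NTChar
* @author 李电楠
*/
public NTChar a(Char c) {
prodRight.add(c);
return this;
}
}
Example: the A class: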
public class A extends NTChar {
}
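Grammar class:
import java.util.ArrayList;
import java.util.LinkedHashSet;
import lombok.Getter;
//Grammar
public class Grammar {
//Productions
@Getter
private static final ArrayList<NTChar> G = new ArrayList<>();
//Set of grammar symbols
@Getter
private static final LinkedHashSet<Char> X = new LinkedHashSet<>();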
static {
//S→O
G.add(new S().a(new O()));
//O→F
G.add(new O().a(new F()));
//O→OF
G.add(new O().a(new O()).a(new F()));
//F→TNXA
G.add(new F().a(new T()).a(new N()).a(new X()).a(new A()));
//T→<float>
G.add(new T().a(new FLOAT()));
//T→<int>
G.add(new T().a(new INT()));
//T→<string>
G.add(new T().a(new STRING()));
//T→<bool>
G.add(new T().a(new BOOL()));
//N→<main>
G.add(new N().a(new MAIN()));
//N→<ID>
G.add(new N().a(new ID()));
//X→()
G.add(new X().a(new XKH_L()).a(new XKH_R()));
//X→(T<ID>)
G.add(new X().a(new XKH_L()).a(new T()).a(new ID()).a(new XKH_R()));
//A→{}
G.add(new A().a(new DKH_L()).a(new DKH_R()));
//A→{L}
G.add(new A().a(new DKH_L()).a(new L()).a(new DKH_R()));
//L→Y
G.add(new L().a(new Y()));
//L→LY
G.add(new L().a(new L()).a(new Y()));
//Y→D
G.add(new Y().a(new D()));
//Y→Z
G.add(new Y().a(new Z()));
//Y→I
G.add(new Y().a(new I()));
//Y→W
G.add(new Y().a(new W()));
//Y→R
G.add(new Y().a(new R()));
//D→T<ID><;>
G.add(new D().a(new T()).a(new ID()).a(new SEMICOLON()));
//Z→B
G.add(new Z().a(new B()));
//Z→TB
G.add(new Z().a(new T()).a(new B()));
//B→<ID>=E<;>
G.add(new B().a(new ID()).a(new ASSIGN()).a(new E()).a(new SEMICOLON()));
//E→(E)
G.add(new E().a(new XKH_L()).a(new E()).a(new XKH_R()));
//E→V
G.add(new E().a(new V()));
//E→EGE
G.add(new E().a(new E()).a(new G()).a(new E()));
//V→<ID>
G.add(new V().a(new ID()));
//V→C
G.add(new V().a(new C()));
//C→<float_value>
G.add(new C().a(new FLOAT_VALUE()));
//C→<int_value>
G.add(new C().a(new INT_VALUE()));
//C→<string_value>
G.add(new C().a(new STRING_VALUE()));
//C→<bool_value>
G.add(new C().a(new BOOL_VALUE()));
//G→+
G.add(new G().a(new ADD()));
//G→-
G.add(new G().a(new SUBTRACT()));
//G→*
G.add(new G().a(new MULTIPLY()));
//G→/
G.add(new G().a(new DIVIDE()));
//I→J
G.add(new I().a(new J()));
//I→K
G.add(new I().a(new K()));
//J→<if>(P)A
G.add(new J().a(new IF()).a(new XKH_L()).a(new P()).a(new XKH_R()).a(new A()));
//K→J<else>A
G.add(new K().a(new J()).a(new ELSE()).a(new A()));
//P→VQV
G.add(new P().a(new V()).a(new Q()).a(new V()));
//Q→>
G.add(new Q().a(new GREATER()));
//Q→<
G.add(new Q().a(new LESS()));
//Q→>=
G.add(new Q().a(new GREATER_EQUAL()));
//Q→<=
G.add(new Q().a(new LESS_EQUAL()));
//Q→==
G.add(new Q().a(new EQUAL()));
//Q→!=
G.add(new Q().a(new NOTEQUAL()));
//W→<while>(P)A
G.add(new W().a(new WHILE()).a(new XKH_L()).a(new P()).a(new XKH_R()).a(new A()));
//R→<return>V<;>
G.add(new R().a(new RETURN()).a(new V()).a(new SEMICOLON()));
//Build the set of grammar symbols dynamically
for (NTChar i : G) {
X.add(i);
X.addAll(i.getProdRight());
}
}
}
Item class:
import lombok.AllArgsConstructor;
import lombok.Getter;
import lombok.ToString;
//Item
@ToString
@AllArgsConstructor
public class Item {
//Nonterminal on the left-hand side of the item
@Getter
private final NTChar leftChar;
//Position of '·' in the right-hand side
//Range: 0 ~ leftChar.getProdRight().size()
//0 means at the very front, e.g. S→·O
//leftChar.getProdRight().size() means at the very end, e.g. S→O·
@Getter
private final int pointPos;
/**
* Override:
* checks whether two items are the same
*
* @param o item
* @return boolean
* @author 李电楠
*/
@Override
public boolean equals(Object o) {
if (!getClass().isInstance(o)) return false;
Item x = (Item) o;
return leftChar.equals(x.leftChar) && leftChar.getProdRight().equals(x.leftChar.getProdRight()) && pointPos == x.pointPos;
}
/**
* Gives identical items the same hash code
*
* @return int
* @author 李电楠
*/
@Override
public int hashCode() {
return leftChar.hashCode() + leftChar.getProdRight().hashCode() + pointPos;
}
}
ParseTable class:
import java.util.HashMap;
import lombok.AllArgsConstructor;
import lombok.Data;
//SLR(1) parse table
@Data
public class ParseTable {
//ACTION table
private HashMap<Word, A_i>[] ACTION;
//GOTO table
private HashMap<NTChar, G_i>[] GOTO;
//Entry of the ACTION table, e.g. s1
@Data
@AllArgsConstructor
static class A_i {
//e.g. "s"
private String l;
//e.g. 1
private int r;
}
//Entry of the GOTO table, e.g. 1
@Data
@AllArgsConstructor
static class G_i {
//e.g. 1
private int r;
}
}
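Parser class:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Stack;
import java.util.concurrent.atomic.AtomicInteger;
//SLR(1) parser
public class Parser {
//Productions
private final ArrayList<NTChar> G;
//Set of grammar symbols
private final LinkedHashSet<Char> X;
//Collection of item sets
private final LinkedHashSet<LinkedHashSet<Item>> C = new LinkedHashSet<>();
//FIRST sets
private final HashMap<Char, HashSet<Word>> FIRST = new HashMap<>();
//FOLLOW sets
private final HashMap<NTChar, HashSet<Word>> FOLLOW = new HashMap<>();
//SLR(1) parse table
private final ParseTable table = new ParseTable();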
public Parser() {
G = Grammar.getG();
X = Grammar.getX();
doMakeTable();
}
/**
* Computes the closure of the given item set I
* CLOSURE(I)=I∪{B→·γ|A→α·Bβ∈CLOSURE(I), B→γ∈P}
*
* @param I item set
* @return java.util.LinkedHashSet<parser.Item>
* @author 李电楠
*/
private LinkedHashSet<Item> CLOSURE(LinkedHashSet<Item> I) {
LinkedHashSet<Item> J = new LinkedHashSet<>();
int size;
do {
size = I.size();
//Each item A→α·Bβ in I
for (Item A : I) {
ArrayList<Char> prod = A.getLeftChar().getProdRight();
int pointPos = A.getPointPos();
//A is not a reduce item
if (pointPos < prod.size()) {
Char B = prod.get(pointPos);
//B is a nonterminal
if (!B.isTerminal()) {
//Each production B→γ of G
for (NTChar p : G) {
if (B.equals(p)) {
//Add B→·γ to J
J.add(new Item(p, 0));
}
}
}
}
}
//Merge J into I
I.addAll(J);
J.clear();
} while (size != I.size());
return I;
}
/**
* Computes the closure of the successor item set of I for grammar symbol X
* GOTO(I, X)=CLOSURE({A→αX·β|A→α·Xβ∈I})
*
* @param I item set
* @param X grammar symbol
* @return java.util.LinkedHashSet<parser.Item>
* @author 李电楠
*/
private LinkedHashSet<Item> GOTO(LinkedHashSet<Item> I, Char X) {
//Initialize J to the empty set
LinkedHashSet<Item> J = new LinkedHashSet<>();
//Each item A→α·Xβ in I
for (Item A : I) {
ArrayList<Char> prod = A.getLeftChar().getProdRight();
int pointPos = A.getPointPos();
if (pointPos < prod.size() && prod.get(pointPos).equals(X)) {
//Add the item A→αX·β to J
J.add(new Item(A.getLeftChar(), pointPos + 1));
}
}
return CLOSURE(J);
}
/**
* Computes the collection of item sets
* C={I0}∪{I|∃J∈C, X∈VN∪VT, I=GOTO(J, X)}
*
* @author 李电楠
*/
private void doMakeC() {
//C={CLOSURE({[S'→·S]})}
LinkedHashSet<Item> I0 = new LinkedHashSet<>();
I0.add(new Item(new S_().a(new S()), 0));
C.add(CLOSURE(I0));
LinkedHashSet<LinkedHashSet<Item>> J = new LinkedHashSet<>();
int size;
do {
size = C.size();
//Each item set I in C
for (LinkedHashSet<Item> I : C) {
//Each grammar symbol x
for (Char x : X) {
LinkedHashSet<Item> g = GOTO(I, x);
if (!g.isEmpty()) {
//Collect GOTO(I, x) in J
J.add(g);
}
}
}
//Merge J into C
C.addAll(J);
J.clear();
} while (size != C.size());
}
/**
* Computes the FIRST sets
* The grammar has no ε-productions, so ε is not handled here
*
* @author 李电楠
*/
private void doMakeFIRST() {
for (Char x : X) {
HashSet<Word> Fx = new HashSet<>();
//If x is a terminal, FIRST(x)={x}
if (x.isTerminal()) Fx.add((Word) x);
FIRST.put(x, Fx);
}
int startSize;
AtomicInteger endSize = new AtomicInteger(0);
do {
startSize = endSize.get();
//Each production X→Y1…Yk (k≥1) of G
for (NTChar X : G) {
//Merge FIRST(Y1) into FIRST(X)
FIRST.get(X).addAll(FIRST.get(X.getProdRight().get(0)));
}
//Repeat until the total size of all FIRST sets stops growing
endSize.set(0);
FIRST.forEach((x, tChar) -> endSize.addAndGet(tChar.size()));
} while (startSize != endSize.get());
}
/**
* Computes the FOLLOW sets
* The grammar has no ε-productions, so ε is not handled here
*
* @author 李电楠
*/
private void doMakeFOLLOW() {
for (Char x : X) {
if (!x.isTerminal()) FOLLOW.put((NTChar) x, new HashSet<>());
}
//Add # to FOLLOW(S)
FOLLOW.get(new S()).add(new END());
int startSize;
AtomicInteger endSize = new AtomicInteger(1);
do {
startSize = endSize.get();
for (NTChar A : G) {
ArrayList<Char> right = A.getProdRight();
for (int i = 0; i < right.size() - 1; i++) {
//There is a production A→αBβ
if (!right.get(i).isTerminal()) {
//Merge FIRST(β) into FOLLOW(B)
NTChar B = (NTChar) right.get(i);
FOLLOW.get(B).addAll(FIRST.get(right.get(i + 1)));
}
}
//There is a production A→αB
if (!right.get(right.size() - 1).isTerminal()) {
//Merge FOLLOW(A) into FOLLOW(B)
NTChar B = (NTChar) right.get(right.size() - 1);
FOLLOW.get(B).addAll(FOLLOW.get(A));
}
}
//Repeat until the total size of all FOLLOW sets stops growing
endSize.set(0);
FOLLOW.forEach((x, tChar) -> endSize.addAndGet(tChar.size()));
} while (startSize != endSize.get());
}
/**
* Builds the SLR(1) parse table
*
* @author 李电楠
*/
private void doMakeTable() {
//Build the canonical LR(0) collection C={I0, I1, …, In} of G'
doMakeC();
doMakeFIRST();
doMakeFOLLOW();
ArrayList<LinkedHashSet<Item>> I = new ArrayList<>(C);
table.setACTION(new HashMap[I.size()]);
table.setGOTO(new HashMap[I.size()]);
for (int i = 0; i < I.size(); i++) {
table.getACTION()[i] = new HashMap<>();
table.getGOTO()[i] = new HashMap<>();
LinkedHashSet<Item> Ii = I.get(i);
for (Item item : Ii) {
NTChar A = item.getLeftChar();
if (item.getPointPos() < A.getProdRight().size()) {
//A→α·aβ∈Ii
if (A.getProdRight().get(item.getPointPos()).isTerminal()) {
//GOTO(Ii, a)=Ij
Word a = (Word) A.getProdRight().get(item.getPointPos());
int j = I.indexOf(GOTO(Ii, a));
//ACTION[i, a]=sj
table.getACTION()[i].put(a, new ParseTable.A_i("s", j));
}
//A→α·Bβ∈Ii
else {
//GOTO(Ii, B)=Ij
NTChar B = (NTChar) A.getProdRight().get(item.getPointPos());
int j = I.indexOf(GOTO(Ii, B));
//GOTO[i, B]=j
table.getGOTO()[i].put(B, new ParseTable.G_i(j));
}
}
//A→α·∈Ii and A≠S'
else if (!(A instanceof S_)) {
for (int j = 0; j < G.size(); j++) {
//G[j] is the production A→α
if (A.equals(G.get(j)) && A.getProdRight().equals(G.get(j).getProdRight())) {
//∀a∈FOLLOW(A)
for (Word a : FOLLOW.get(A)) {
//ACTION[i, a]=rj
table.getACTION()[i].put(a, new ParseTable.A_i("r", j));
}
break;
}
}
}
//S'→S·
else {
//ACTION[i, #]=acc
table.getACTION()[i].put(new END(), new ParseTable.A_i("acc", -1));
}
}
}
}
/**
* SLR(1) parsing
*
* @param wordList list of terminals
* @return boolean
* @author 李电楠
*/
public boolean parse(ArrayList<Word> wordList) {
Stack<Integer> state = new Stack<>();
Stack<Char> charStack = new Stack<>();
state.push(0);
Word a;
for (int w = 0; w < wordList.size(); ) {
a = wordList.get(w);
//s is the state on top of the stack
int s = state.peek();
//i=ACTION[s,a]
ParseTable.A_i i = table.getACTION()[s].get(a);
//i does not exist, i.e. error
if (i == null) {
break;
}
//i=st
else if (i.getL().equals("s")) {
//Push t onto the stack
state.push(i.getR());
charStack.push(a);
//Advance a to the next input symbol
w++;
}
//i=rt
else if (i.getL().equals("r")) {
//Reduce by A→β
NTChar A = G.get(i.getR());
//Pop |β| symbols off the stack
for (int j = 0; j < A.getProdRight().size(); j++) {
state.pop();
charStack.pop();
}
//Push GOTO[top of stack, A] onto the stack
charStack.push(A);
state.push(table.getGOTO()[state.peek()].get(A).getR());
//Print the production A→β
System.out.print(A.getClass().getSimpleName() + "->");
for (Char c : A.getProdRight()) {
System.out.print(c.getClass().getSimpleName() + " ");
}
System.out.println();
}
//i=acc
else if (i.getL().equals("acc")) {
return true;
}
//Print the grammar-symbol stack
for (Char aChar : charStack) {
System.out.print(aChar.getClass().getSimpleName() + " ");
}
System.out.println();
}
return false;
}
public void printInfo() {
//Print the grammar
System.out.println("Grammar:");
for (int i = 0; i < G.size(); i++) {
System.out.print("P" + i + " " + G.get(i).getClass().getSimpleName() + "->");
for (Char aChar : G.get(i).getProdRight()) {
System.out.print(aChar.getClass().getSimpleName() + " ");
}
System.out.println();
}
System.out.println("--------------------------------------------------------------------------------");
//Print the collection of item sets
System.out.println("Collection of item sets:");
int t = 0;
for (LinkedHashSet<Item> items : C) {
System.out.println("I" + t + ":");
for (Item item : items) {
int pos = 0;
System.out.print(item.getLeftChar().getClass().getSimpleName() + "->");
for (Char aChar : item.getLeftChar().getProdRight()) {
if (pos == item.getPointPos()) System.out.print("·");
System.out.print(aChar.getClass().getSimpleName() + " ");
pos++;
}
if (pos == item.getPointPos()) System.out.print("·");
System.out.println();
}
t++;
}
System.out.println("--------------------------------------------------------------------------------");
//Print the FIRST sets
System.out.println("FIRST sets:");
FIRST.forEach((aChar, chars) -> {
System.out.println(aChar.getClass().getSimpleName() + ":");
chars.forEach(aChar1 -> System.out.print(aChar1.getClass().getSimpleName() + " "));
System.out.println();
});
System.out.println("--------------------------------------------------------------------------------");
//Print the FOLLOW sets
System.out.println("FOLLOW sets:");
FOLLOW.forEach((aChar, chars) -> {
System.out.println(aChar.getClass().getSimpleName() + ":");
chars.forEach(aChar1 -> System.out.print(aChar1.getClass().getSimpleName() + " "));
System.out.println();
});
System.out.println("--------------------------------------------------------------------------------");
//Print the ACTION table
System.out.println("ACTION table:");
for (int i = 0; i < table.getACTION().length; i++) {
System.out.print("s" + i + " ");
table.getACTION()[i].forEach((x, tChar) -> System.out.print(x.getClass().getSimpleName() + " " + tChar.getL() + tChar.getR() + "|||"));
System.out.println();
}
System.out.println("--------------------------------------------------------------------------------");
//Print the GOTO table
System.out.println("GOTO table:");
for (int i = 0; i < table.getGOTO().length; i++) {
System.out.print("s" + i + " ");
table.getGOTO()[i].forEach((x, tChar) -> System.out.print(x.getClass().getSimpleName() + " " + tChar.getR() + "|||"));
System.out.println();
}
System.out.println("--------------------------------------------------------------------------------");
}
}
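One thing to keep in mind about doMakeTable: it fills the cells with HashMap.put, so when two items ask for different entries in the same cell, the later one silently replaces the earlier one. The production E→EGE makes the grammar ambiguous, so such cells are to be expected here. Below is a minimal sketch of how conflicting entries could be collected instead of overwritten. ConflictCheckDemo, its string entries, and the state number in "s7" are made up for the illustration, and unlike the code above it keeps the first entry rather than the last.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class: one row of an ACTION table with string entries
final class ConflictCheckDemo {
    private final Map<String, String> row = new HashMap<>();
    private final List<String> conflicts = new ArrayList<>();

    // Keeps the first entry of a cell and records any later, different one
    void put(String terminal, String entry) {
        String old = row.putIfAbsent(terminal, entry);
        if (old != null && !old.equals(entry)) {
            conflicts.add(terminal + ": " + old + " / " + entry);
        }
    }

    List<String> getConflicts() {
        return conflicts;
    }

    public static void main(String[] args) {
        ConflictCheckDemo demo = new ConflictCheckDemo();
        demo.put("+", "r27");  // reduce by E -> E G E (made-up cell contents)
        demo.put("+", "s7");   // shift "+" in the same cell: a shift/reduce conflict
        demo.put(";", "r27");  // a different cell, no conflict
        System.out.println(demo.getConflicts());
    }
}
```

Printing the collected conflicts next to the ACTION table makes it visible which cells were decided by the order in which the items happened to be visited.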
code.txt (in src/main/resources/):
int main() { //main
int a = 2;
float b = a * 2.5;
string c = "5555";
bool d = false;
if (a >= 5) {
d = true;
} else {
c = 3.14;
}
while (b < 10) {
c = "bro";
d = false;
b = b + 1;
}
return 0;
}
Main class:
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
public class Main {
public static void main(String[] args) {
//The name is resolved relative to the package of Main
URL url = Main.class.getResource("code.txt");
if (url == null) {
System.err.println("code.txt was not found on the classpath");
return;
}
WordAnalyzer wordAnalyzer = new WordAnalyzer();
Parser parser = new Parser();
parser.printInfo();
try {
//Read the whole source program from the file (assumed to be UTF-8)
String code = new String(Files.readAllBytes(Paths.get(url.toURI())), StandardCharsets.UTF_8);
System.out.println("Source program:");
System.out.println(code);
System.out.println("--------------------------------------------------------------------------------");
//Lexical analysis
ArrayList<Word> wordList = wordAnalyzer.analyze(code);
System.out.println("Lexical analysis:");
for (Word word : wordList) {
System.out.println("<" + word.getClass().getSimpleName() + "> " + word);
}
System.out.println("--------------------------------------------------------------------------------");
//SLR(1) parsing
System.out.println("SLR(1) parsing:");
System.out.println(parser.parse(wordList) ? "Syntax correct" : "Syntax error");
System.out.println("--------------------------------------------------------------------------------");
} catch (Exception e) {
e.printStackTrace();
}
}
}
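The code is heavily and thoroughly commented. If anything is unclear, feel free to ask, since I usually don't have time to read the questions anyway.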