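A set cannot hold duplicate elements. In the example below, "php" is added twice but appears only once in the output.
So how does the deduplication actually work?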
import java.util.HashSet;

public class Test {
    public static void main(String[] args) {
        HashSet<String> set = new HashSet<>();
        set.add("hello");
        set.add("html");
        set.add("php");
        set.add("php"); // duplicate, not stored a second time
        System.out.println(set); // [php, html, hello]
    }
}
Looking at the source, HashSet.add simply calls put on the HashMap that backs the set:
public boolean add(E e) {
    return map.put(e, PRESENT) == null;
}
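Because `add` returns `map.put(e, PRESENT) == null`, its boolean result tells the caller whether the element was actually new. A minimal sketch of that behavior follows; the class name `AddReturnDemo` and the helper `addResults` are invented for illustration and are not part of the JDK.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AddReturnDemo {
    // Adds each value to a fresh HashSet and records what add() returned.
    static List<Boolean> addResults(String... values) {
        Set<String> set = new HashSet<>();
        List<Boolean> results = new ArrayList<>();
        for (String v : values) {
            results.add(set.add(v)); // false means the element was already present
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(addResults("hello", "html", "php", "php")); // [true, true, true, false]
    }
}
```

The last `false` corresponds to `put` returning the previous value (the shared `PRESENT` placeholder) instead of `null`.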
Tracing further, put forwards to putVal:
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
One level deeper, putVal decides whether a key is already present by comparing the hash first and then checking == or equals.
/**
 * Implements Map.put and related methods
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
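For a class you define yourself, a set cannot deduplicate by content out of the box, because the hashCode and equals methods inherited from Object are based on object identity. Override both methods to define what counts as a duplicate when adding to the set. Why override equals when hashCode is already overridden? Because two objects with the same hash value are not necessarily equal, so equals is needed to confirm the match.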
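The claim that equal hash codes do not imply equal objects can be checked directly with two strings whose hash codes collide under the documented `String.hashCode` formula. This is only a small sketch; the class name `CollisionDemo` and the helper `sizeAfterAdding` are made up for the example.

```java
import java.util.HashSet;
import java.util.Set;

public class CollisionDemo {
    // Adds two strings to a fresh HashSet and reports how many were kept.
    static int sizeAfterAdding(String a, String b) {
        Set<String> set = new HashSet<>();
        set.add(a);
        set.add(b);
        return set.size();
    }

    public static void main(String[] args) {
        // "Aa" and "BB" both hash to 2112, yet they are different strings.
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        System.out.println("Aa".equals("BB"));                  // false
        // Same hash but equals() is false, so both elements are kept.
        System.out.println(sizeAfterAdding("Aa", "BB"));        // 2
        // Same hash and equals() is true, so only one is kept.
        System.out.println(sizeAfterAdding("php", "php"));      // 1
    }
}
```

Both strings land in the same bucket, and it is the `equals` check in `putVal` that keeps them apart.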
In the example below only one wangwu is added, because Student overrides hashCode and equals. Without the overrides, both wangwu objects would be added.
import java.util.HashSet;

public class Test {
    public static void main(String[] args) {
        HashSet<Student> set = new HashSet<>();
        set.add(new Student("bob", 23));
        set.add(new Student("lisi", 21));
        set.add(new Student("wangwu", 20));
        set.add(new Student("zhaoliu", 23));
        set.add(new Student("wangwu", 20)); // equal to the earlier wangwu, not added
        System.out.println(set); // [lisi, wangwu, zhaoliu, bob]
    }
}
class Student {
    private String name;
    private int age;

    Student(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public int hashCode() {
        return name.hashCode() + age * 23;
    }

    @Override
    public boolean equals(Object obj) {
        // equals should return false, not throw, for an object of another type
        if (!(obj instanceof Student)) {
            return false;
        }
        Student stu = (Student) obj;
        return this.name.equals(stu.name) && this.age == stu.age;
    }

    @Override
    public String toString() {
        return name;
    }
}
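As an alternative way to write the two methods, the standard `java.util.Objects` helpers can build the hash and the null-safe comparison. The sketch below is one possible variant rather than the only correct form; `ObjectsHashDemo`, `StudentV2`, and `distinctCount` are hypothetical names chosen so they do not clash with the `Student` class above.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class ObjectsHashDemo {
    // Hypothetical value class: two instances are equal when name and age match.
    static final class StudentV2 {
        private final String name;
        private final int age;

        StudentV2(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, age); // combines both fields
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (!(obj instanceof StudentV2)) return false; // other types are simply not equal
            StudentV2 other = (StudentV2) obj;
            return age == other.age && Objects.equals(name, other.name);
        }
    }

    // Adds five students, two of which are equal, and returns the set size.
    static int distinctCount() {
        Set<StudentV2> set = new HashSet<>();
        set.add(new StudentV2("bob", 23));
        set.add(new StudentV2("lisi", 21));
        set.add(new StudentV2("wangwu", 20));
        set.add(new StudentV2("zhaoliu", 23));
        set.add(new StudentV2("wangwu", 20));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount());                               // 4
        System.out.println(new StudentV2("wangwu", 20).equals("wangwu")); // false
    }
}
```

Making the fields `final` is a deliberate choice here: if a field used in `hashCode` changes after the object is stored, the set may no longer find it.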
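How HashMap keeps keys unique
A HashMap cannot contain duplicate keys. Putting an existing key overwrites its value. In the example below the key "ss" is put twice, and the output shows only the last key-value pair.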
import java.util.HashMap;

public class Test {
    public static void main(String[] args) {
        HashMap<String, String> map = new HashMap<>();
        map.put("ss", "dd");
        map.put("ss", "good"); // same key, value is overwritten
        System.out.println(map); // {ss=good}
    }
}
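The overwrite is also visible in the return value of `put`, which matches the `return oldValue` branch of `putVal`. The sketch below additionally calls `putIfAbsent`, which in the JDK's `HashMap` goes through the same `putVal` with `onlyIfAbsent` set to true. The class name `PutReturnDemo` and the helper `putResults` are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PutReturnDemo {
    // Collects the values returned by three successive writes to the same key.
    static List<String> putResults() {
        Map<String, String> map = new HashMap<>();
        List<String> returned = new ArrayList<>();
        returned.add(map.put("ss", "dd"));            // null: the key was new
        returned.add(map.put("ss", "good"));          // "dd": the old value, now replaced
        returned.add(map.putIfAbsent("ss", "other")); // "good": key exists, value left unchanged
        return returned;
    }

    public static void main(String[] args) {
        System.out.println(putResults()); // [null, dd, good]
    }
}
```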
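HashMap keeps keys unique in the same way HashSet does. If the key is a class you define yourself, you again need to override the hashCode and equals methods inherited from Object to define when two keys count as the same.
For example: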
import java.util.HashMap;

public class Test {
    public static void main(String[] args) {
        // Keys are kept unique the same way as in HashSet: int hashCode() and boolean equals(Object)
        // Any Object can be used as a key
        HashMap<Student, String> map = new HashMap<>();
        map.put(new Student("lisi", 20), "beijing");
        map.put(new Student("zhangsan", 20), "beijing");
        map.put(new Student("xiaobai", 20), "beijing");
        map.put(new Student("liuqian", 27), "beijing");
        map.put(new Student("xiaobai", 20), "shenzhen");
        // Only one xiaobai key is stored; the second put replaces its value
        // Without the hashCode and equals overrides, the result would be {liuqian27=beijing, zhangsan20=beijing, lisi20=beijing, xiaobai20=beijing, xiaobai20=shenzhen}
        // i.e. two xiaobai entries, because both keys are created with new and are different objects
        System.out.println(map); // {lisi20=beijing, zhangsan20=beijing, xiaobai20=shenzhen, liuqian27=beijing}
        // Iterate and print each entry
        map.forEach((t, u) -> System.out.println(t + ":" + u));
    }
}
class Student {
    String name;
    int age;

    public Student(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public int hashCode() {
        return name.hashCode() + age * 36;
    }

    // Keys with the same name and age are treated as the same key
    @Override
    public boolean equals(Object obj) {
        // equals should return false, not throw, for an object of another type
        if (!(obj instanceof Student))
            return false;
        Student stu = (Student) obj;
        return this.name.equals(stu.name) && this.age == stu.age;
    }

    @Override
    public String toString() {
        return name + age;
    }
}
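Lookups follow the same hashCode-then-equals path as insertion, so a freshly constructed key that is equal to a stored one retrieves its value. The sketch below contrasts a key class without overrides against one with them; `KeyLookupDemo`, `PlainKey`, `ValueKey`, `plainSize`, and `lookup` are hypothetical names used only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class KeyLookupDemo {
    // No overrides: identity-based hashCode/equals inherited from Object.
    static final class PlainKey {
        final String name;
        final int age;
        PlainKey(String name, int age) { this.name = name; this.age = age; }
    }

    // Overrides both methods: equal name and age means equal key.
    static final class ValueKey {
        final String name;
        final int age;
        ValueKey(String name, int age) { this.name = name; this.age = age; }

        @Override
        public int hashCode() { return Objects.hash(name, age); }

        @Override
        public boolean equals(Object obj) {
            if (!(obj instanceof ValueKey)) return false;
            ValueKey other = (ValueKey) obj;
            return age == other.age && Objects.equals(name, other.name);
        }
    }

    // Two separately created PlainKey objects are distinct keys.
    static int plainSize() {
        Map<PlainKey, String> map = new HashMap<>();
        map.put(new PlainKey("xiaobai", 20), "beijing");
        map.put(new PlainKey("xiaobai", 20), "shenzhen");
        return map.size();
    }

    // Two equal ValueKey objects collapse to one entry; a third equal key finds it.
    static String lookup() {
        Map<ValueKey, String> map = new HashMap<>();
        map.put(new ValueKey("xiaobai", 20), "beijing");
        map.put(new ValueKey("xiaobai", 20), "shenzhen");
        return map.size() + ":" + map.get(new ValueKey("xiaobai", 20));
    }

    public static void main(String[] args) {
        System.out.println(plainSize()); // 2
        System.out.println(lookup());    // 1:shenzhen
    }
}
```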
Deduplicating with HashSet in Scala
Using java.util.HashSet from Scala, deduplication works on the same principle.
import java.util

object Test extends App {
  private val set = new util.HashSet[Person]()
  set.add(new Person("wangwu", 23))
  set.add(new Person("wangwu", 23))
  // Because hashCode and equals are overridden, the set checks for duplicates on add and skips them
  // Without the overrides the set would contain two wangwu entries, since each new creates a different object
  println(set) // [wangwu--23]
}
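class Person(var name: String, var age: Int) {
  override def toString: String = s"$name--$age"

  // Consistent with equals: equal persons share an age (the name is not part of the hash)
  override def hashCode(): Int = 23 * age

  // Pattern match instead of asInstanceOf, so other types yield false rather than a ClassCastException
  override def equals(obj: Any): Boolean = obj match {
    case person: Person => person.name == this.name && person.age == this.age
    case _ => false
  }
}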
case class and HashSet deduplication
With a case class there is nothing to override: the compiler generates hashCode and equals automatically.
import java.util

object Test extends App {
  private val set = new util.HashSet[Person]()
  set.add(Person("wangwu", 23))
  set.add(Person("wangwu", 23))
  // A case class gets hashCode and equals generated for it, so duplicates are rejected without extra code
  println(set) // [Person(wangwu,23)]
}

case class Person(name: String, age: Int)
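Scala mutable Set and deduplication
With Scala's mutable.Set, a class that does not override hashCode and equals is likewise not deduplicated, and both objects are printed.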
import scala.collection.mutable

object Test extends App {
  private val set = mutable.Set[Person]()
  set.add(new Person("wangwu", 23))
  set.add(new Person("wangwu", 23))
  // Without hashCode and equals overrides, mutable.Set also keeps both objects
  println(set) // Set(wangwu--23, wangwu--23)
}
class Person(var name: String, var age: Int) {
  override def toString: String = {
    s"$name--$age"
  }

  // override def hashCode(): Int = {
  //   return 23 * age
  // }
  //
  // override def equals(obj: Any): Boolean = {
  //   val person: Person = obj.asInstanceOf[Person]
  //   return person.name == this.name && person.age == this.age
  // }
}
With a case class, no overrides are needed:
import scala.collection.mutable

object Test extends App {
  private val set = mutable.Set[Person]()
  set.add(Person("wangwu", 23))
  set.add(Person("wangwu", 23))
  // A case class is deduplicated even though it is a user-defined class
  println(set) // Set(Person(wangwu,23))
}

case class Person(name: String, age: Int)
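Summary
- HashMap key uniqueness and HashSet deduplication rely on the same mechanism.
- Deduplication happens because add or put calls the element's (overridden) hashCode and equals methods.
- Scala sets deduplicate the same way as Java sets.
- A Scala case class gets hashCode and equals generated automatically, which makes it convenient to use.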