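Reverse proxy
In computing, a single server can only handle so many client (user) requests. When requests flood in, one server cannot keep up, so several servers can be used to share the load of thousands upon thousands of user requests. These servers all provide the same service, and users notice no difference at all.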
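Implementing a reverse proxy
1) A load-balancing device is needed to distribute user requests, sending each one to an idle server.
2) The server returns its response to the load-balancing device.
3) The load balancer returns the server's response to the user.
The implication is that users communicate directly with the load-balancing device. It also means that when a user resolves the server's domain name, the IP address returned is actually the load balancer's, not the server's. One benefit is that when a server is added or removed, only the load balancer's server list needs to change, and the existing service is unaffected.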
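The three steps and the server list can be sketched roughly as follows. This is only an illustration: the text above talks about sending requests to an idle server, while this sketch substitutes plain round-robin rotation because it needs no load information. The class name `LoadBalancer` and the 10.0.0.x addresses are made up for the example.

```python
class LoadBalancer:
    """Toy dispatcher: clients only ever see the balancer, never the backends."""

    def __init__(self, backends):
        # The "server list" that is edited when servers come and go.
        self.backends = list(backends)
        self._next = 0

    def add(self, backend):
        """Register a new backend; clients need no change."""
        self.backends.append(backend)

    def remove(self, backend):
        """Take a backend out of rotation; clients need no change."""
        self.backends.remove(backend)

    def pick(self):
        """Return the next backend in round-robin order."""
        if not self.backends:
            raise RuntimeError("no backends available")
        backend = self.backends[self._next % len(self.backends)]
        self._next += 1
        return backend


lb = LoadBalancer(["10.0.0.1", "10.0.0.2"])  # hypothetical backend addresses
first_three = [lb.pick() for _ in range(3)]
```

Adding or removing a backend only touches `lb.backends`; the address that clients resolve and connect to stays the same.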
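The main functions of a reverse proxy are:
- Hiding the IP addresses of the server (cluster) from clients
- Security: acting as an application-layer firewall, protecting the website against web-based attacks (such as DoS/DDoS) and making it easier to screen for malware
- Providing encryption and SSL acceleration for the backend server (cluster) in one place (for example, as an SSL termination proxy)
- Load balancing: if some servers in the cluster are heavily loaded, the reverse proxy uses URL rewriting to fetch the same resource, or a backup copy, from a less loaded server
- Caching static content, as well as dynamic content that receives many requests in a short period
- Compressing some content to save bandwidth or to serve networks with poor bandwidth
- Slowing down uploads
- Providing NAT traversal and external publishing for server clusters on a private network (such as a LAN)
- Providing HTTP access authentication [2]
- Circumventing internet blocking (uncommon, because the connection between the reverse proxy and the client is not necessarily encrypted, and an unencrypted connection can still be inspected and then blocked; it is also powerless against keyword filtering on domain names, DNS cache pollution/poisoning attacks, and deep packet inspection)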
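The load-balancing item in the list above could be sketched like this. It assumes a simple "fewest active connections" policy as the measure of load, which the list does not specify; the class name and the backend names "a" and "b" are hypothetical.

```python
class LeastLoadedBalancer:
    """Toy balancer that prefers the backend with the fewest active connections."""

    def __init__(self, backends):
        # Map each backend name to its current number of active connections.
        self.active = {backend: 0 for backend in backends}

    def acquire(self):
        """Pick the least loaded backend and count one more connection on it."""
        if not self.active:
            raise RuntimeError("no backends available")
        # On a tie, min() returns the backend that was registered first.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        """Mark one connection on the backend as finished."""
        self.active[backend] -= 1


balancer = LeastLoadedBalancer(["a", "b"])  # hypothetical backend names
picked = [balancer.acquire(), balancer.acquire()]
balancer.release("a")
picked.append(balancer.acquire())
```

Real proxies typically combine several signals (health checks, response times, weights); connection counts are just the easiest one to show.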
The previous answers were accurate, but perhaps too terse. I will try to add some examples.
First of all, the word "proxy" describes someone or something acting on behalf of someone else.
In the computer realm, we are talking about one server acting on behalf of another computer.
For the purposes of accessibility, I will limit my discussion to web proxies - however, the idea of a proxy is not limited to websites.
FORWARD proxy
Most discussion of web proxies refers to the type of proxy known as a "forward proxy."
The proxy event, in this case, is that the "forward proxy" retrieves data from another web site on behalf of the original requester.
A tale of 3 computers (part I)
For an example, I will list three computers connected to the internet.
- X = your computer, or "client" computer on the internet
- Y = the proxy web site, proxy.example.org
- Z = the web site you want to visit, www.example.net
Normally, one would connect directly from X --> Z.
However, in some scenarios, it is better for Y --> Z on behalf of X, which chains as follows: X --> Y --> Z.
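From X's point of view, using a forward proxy is a client-side setting. Here is a minimal sketch with Python's standard `urllib.request`. The host proxy.example.org comes from the example above, but the port 3128 and the helper name `build_opener_via_proxy` are assumptions made for illustration.

```python
import urllib.request

# Y in the example above; the port is an assumption, not something the example states.
PROXY_URL = "http://proxy.example.org:3128"


def build_opener_via_proxy(proxy_url):
    """Return an opener that sends HTTP and HTTPS requests through the given proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


opener = build_opener_via_proxy(PROXY_URL)
# opener.open("http://www.example.net/") would then travel X --> Y --> Z.
# The call is left commented out because proxy.example.org is only a placeholder.
```

The key point is that X has to be configured to know about Y, which is exactly what differs in the reverse proxy case.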
Reasons why X would want to use a forward proxy server:
Here is a (very) partial list of uses of a forward proxy server.
- 1) X is unable to access Z directly because
- a) Someone with administrative authority over X's internet connection has decided to block all access to site Z.
- Examples:
- The Storm Worm virus is spreading by tricking people into visiting familypostcards2008.com, so the system administrator has blocked access to the site to prevent users from inadvertently infecting themselves.
- Employees at a large company have been wasting too much time on facebook.com, so management wants access blocked during business hours.
- A local elementary school disallows internet access to the playboy.com website.
- A government is unable to control the publishing of news, so it controls access to news instead, by blocking sites such as wikipedia.org. See TOR or FreeNet.
- b) The administrator of Z has blocked X.
- Examples:
- The administrator of Z has noticed hacking attempts coming from X, so the administrator has decided to block X's IP address (and/or netrange).
- Z is a forum website. X is spamming the forum. Z blocks X.
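To make case b) concrete, here is a rough sketch of how Z might block a single IP address or a whole netrange, using Python's standard `ipaddress` module. The addresses come from the reserved documentation ranges and are purely illustrative, and the function name `is_blocked` is made up.

```python
import ipaddress

# Hypothetical blocklist: one single address and one whole netrange.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.7/32"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(client_ip, blocked=BLOCKED_NETWORKS):
    """Return True if client_ip falls inside any blocked network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in blocked)
```

Since Z only ever sees the address of whoever connects to it, a request relayed by Y is checked against Y's address rather than X's, which is why a forward proxy gets around this kind of block.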
REVERSE proxy
A tale of 3 computers (part II)
For this example, I will list three computers connected to the internet.
- X = your computer, or "client" computer on the internet
- Y = the reverse proxy web site, proxy.example.com
- Z = the web site you want to visit, www.example.net
Normally, one would connect directly from X --> Z.
However, in some scenarios, it is better for the administrator of Z to restrict or disallow direct access and force visitors to go through Y first. So, as before, we have data being retrieved by Y --> Z on behalf of X, which chains as follows: X --> Y --> Z.
What is different this time compared to a "forward proxy" is that the user X does not know he is accessing Z, because the user X only sees he is communicating with Y.
The server Z is invisible to clients and only the reverse proxy Y is visible externally. A reverse proxy requires no (proxy) configuration on the client side.
The client X thinks he is only communicating with Y (X --> Y), but the reality is that Y is forwarding all communication (X --> Y --> Z again).
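The X --> Y --> Z chain can be tried out locally with nothing but the Python standard library. This is a deliberately tiny sketch, not how production proxies such as nginx or HAProxy work: it handles only successful GET requests, both servers run on 127.0.0.1 with ports chosen by the operating system, and every class and function name here is made up for the example.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Opener that ignores any proxy settings from the environment, so the demo stays local.
DIRECT = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class Backend(BaseHTTPRequestHandler):
    """Z: the real web server, which only the proxy talks to."""

    def do_GET(self):
        body = f"hello from Z, path={self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def make_proxy_handler(backend_url):
    """Build the handler class for Y, forwarding GET requests to backend_url."""

    class ReverseProxy(BaseHTTPRequestHandler):
        def do_GET(self):
            # Fetch the same path from Z, then relay status and body back to X.
            with DIRECT.open(backend_url + self.path) as upstream:
                status = upstream.status
                body = upstream.read()
            self.send_response(status)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass

    return ReverseProxy


def serve_in_background(handler):
    """Start an HTTP server on a free local port in a daemon thread."""
    server = HTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def demo():
    """Play the role of X: request a page from Y and return the response body."""
    z = serve_in_background(Backend)
    y = serve_in_background(make_proxy_handler(f"http://127.0.0.1:{z.server_port}"))
    try:
        with DIRECT.open(f"http://127.0.0.1:{y.server_port}/index.html") as response:
            return response.read().decode()
    finally:
        for server in (y, z):
            server.shutdown()
            server.server_close()
```

Note that the client code in `demo()` only knows Y's address and needs no proxy configuration; the address of Z appears only in Y's own setup.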
Reasons why Z would want to set up a reverse proxy server:
- 1) Z wants to force all traffic to its web site to pass through Y first.
- a) Z has a large web site that millions of people want to see, but a single web server cannot handle all the traffic. So Z sets up many servers and puts a reverse proxy on the internet that will send users to the server closest to them when they try to visit Z. This is part of how the Content Distribution Network (CDN) concept works.
- Examples:
- Apple Trailers uses Akamai
- jQuery.com hosts its JavaScript files using the CloudFront CDN.
- etc.
- 2) The administrator of Z is worried about retaliation for content hosted on the server and does not want to expose the main server directly to the public.
- a) Owners of spam brands such as "Canadian Pharmacy" appear to have thousands of servers, while in reality most of their websites are hosted on far fewer servers. Additionally, abuse complaints about the spam will only shut down the public servers, not the main server.
In the above scenarios, Z has the ability to choose Y.
Links to topics from the post:
Content Delivery Network
- Lists of CDNs
forward proxy software (server side)
- PHP-Proxy
- cgi-proxy
- phproxy (discontinued)
- glype
- Internet censorship wiki: List of Web Proxies
- squid (apparently, can also work as a reverse proxy)
reverse proxy software for HTTP (server side)
- apache mod_proxy (can also work as a forward proxy for HTTP)
- nginx (used on hulu.com, spam sites, etc.)
- HAProxy
- lighttpd
- perlbal (written for livejournal)
- portfusion
- pound
- varnish cache (written by a FreeBSD kernel guru)
- repose
reverse proxy software for TCP (server side)
- balance
- delegate
- pen
- portfusion
- pure load balancer (web site defunct)
- python director
see also:
- Wikipedia - Content Delivery Network
- Wikipedia - Category:Reverse_proxy
- Wikipedia - Load Balancing
- Wikipedia - Scalability