How does an HTTP proxy work?

I searched the web for information about HTTP proxies, and I read the Wikipedia article on proxy servers. But I still don't understand how an HTTP proxy works, silly me.

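Here is my assumption about how an HTTP proxy works: if I configure a specific proxy, say Proxy_A, then when I start Chrome/IE and enter a specific URL, say URL_A, does Chrome/IE send the request directly to Proxy_A, and does Proxy_A then send the request to the actual server for URL_A?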

An HTTP proxy speaks the HTTP protocol. It is made specifically for HTTP connections, but it can be (ab)used for other protocols as well, which is pretty much standard by now.

The browser (CLIENT) sends GET http://SERVER/path HTTP/1.1 to the PROXY
The PROXY then forwards the actual request to the SERVER.
The SERVER sees only the PROXY as the connecting peer and answers the PROXY just as it would answer a CLIENT.
The PROXY receives the response and forwards it back to the CLIENT.
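The only visible difference on the wire in the first step is the request target: a client talking to a proxy sends the full URL, while a client talking directly to the server sends just the path. A minimal sketch of that difference follows; `build_get_request` is a hypothetical helper written for illustration, not part of any library.

```python
from urllib.parse import urlsplit


def build_get_request(url: str, via_proxy: bool) -> bytes:
    """Return the raw bytes of a simple HTTP/1.1 GET request for `url`.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # To a proxy the client sends the whole URL; to the server only the path.
    target = url if via_proxy else path
    lines = [
        f"GET {target} HTTP/1.1",
        f"Host: {parts.netloc}",
        "Connection: close",
        "",  # blank line that ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    print(build_get_request("http://SERVER/path", via_proxy=True).decode())
    print(build_get_request("http://SERVER/path", via_proxy=False).decode())
```

In both cases the `Host` header names the SERVER; only the first line changes, which is why proxy support costs the browser so little.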

The process is transparent and almost like communicating with the server directly, so supporting an HTTP proxy adds only a tiny overhead for the browser.
There are some additional headers that can be sent to identify the client or to reveal that it is using a proxy.
Proxies sometimes change or add content within the data stream for various purposes.
Some proxies, for example, include your real IP in a special HTTP header, which can be logged server-side or read by the server's scripts.
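To illustrate the server side of this, here is a rough sketch of how a script might inspect such headers. It assumes the proxy uses the widely seen `X-Forwarded-For` and `Via` headers; other proxies may use different headers or none at all, and `describe_client` is a hypothetical helper. These headers are set by whoever sends the request, so a server cannot treat them as trustworthy.

```python
def describe_client(headers: dict, peer_ip: str) -> dict:
    """Summarise what a server can learn about a client behind a proxy.

    `headers` is a plain dict of request headers; `peer_ip` is the address
    of the TCP peer, which is the PROXY when one is in use.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    forwarded = normalized.get("x-forwarded-for", "")
    # X-Forwarded-For is a comma-separated chain; the first entry is the
    # address the first proxy claims to have received the request from.
    chain = [ip.strip() for ip in forwarded.split(",") if ip.strip()]
    return {
        "peer_ip": peer_ip,
        "claimed_client_ip": chain[0] if chain else None,
        "proxy_detected": bool(chain) or "via" in normalized,
    }


if __name__ == "__main__":
    # Example addresses from the documentation ranges, not real hosts.
    request_headers = {"X-Forwarded-For": "203.0.113.7", "Via": "1.1 some-proxy"}
    print(describe_client(request_headers, peer_ip="198.51.100.2"))
```

A proxy that adds neither header leaves the server with only `peer_ip`, which is the proxy's own address.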

CLIENT <---> PROXY <---> SERVER

Update:
On using proxies as a security/privacy feature:
As you can see in the ASCII diagram above, there is no direct communication between CLIENT and SERVER. Both parties just talk to the PROXY between them.
Nowadays the CLIENT is often a browser and the SERVER is often a web server (Apache, for example).

In such an environment users often trust the PROXY to be secure and not to leak their identity.
However, there are many ways this security model can be broken, because of the complex software frameworks running in the browser.
Flash and Java applets are perfect examples of how a proxy setup can be bypassed: both may ignore the proxy settings of their parent application (the browser).
Another example is DNS requests, which can reach the destination nameserver without going through the PROXY, depending on the PROXY and the application settings.
Yet another example is cookies or your browser fingerprint (resolution, response times, user-agent, etc.), either of which might identify you if the web server already knows you from the past (or meets you again without the proxy).

And in the end, the proxy itself needs to be trusted, as it can read all the data that goes through it, and on top of that it might even be able to break your SSL security (read up on man-in-the-middle attacks).

Where to get proxies from
Proxies can be bought as a service, found by scanning, or simply run yourself.

Public proxies
These are the most commonly used proxies, and the usual term "public" is quite misleading.
A better term would be "open proxies". If you run a proxy server without a firewall or authentication, anyone in the world can find it and abuse it.
The large majority of companies selling proxies just scan the internet for such proxies, or they use hacked Windows computers (botnets) and sell them for mostly illegal/spam activity.
In most modern countries, using an open proxy without authorization can be treated as abuse. It is very common, but it can actually lead to prison time.
It's possible to scan for proxies by searching the internet for open ports; a typical free program for this is https://nmap.org
As a word of caution: large-scale scanning will almost certainly get your internet connection banned by your ISP.

Paid proxies
Here we have four types of proxies:
1) Paid public (open) proxies
Basically, these sellers sell or resell huge lists of proxies that are regularly refreshed to remove dead ones.
The proxies are abused on a massive scale and are usually blacklisted on most sites, including Google.
Additionally, those proxies are usually very unstable and very slow.
The large majority of these proxies simply abuse misconfigured servers. It's a very competitive "market"; a Google search turns up many examples.

2) Paid hacked (botnet) proxies
These abuse computers, mostly internet-of-things devices or Windows desktops, as proxy hosts. The attackers use them on a large scale for various illegal purposes.
Sellers usually call them "residential proxies" to hide their illegal nature.
Using such a proxy is without doubt illegal, and the abused user can easily log "your" IP if you connect to it and could even hijack your connection to the destination.
Depending on the source, those IPs are not blacklisted, so the "quality" is much better than that of public proxies.

3) Paid shared proxies
These are datacenter proxies, usually legal and potentially with a fast uplink.
Because there is so much e-commerce spam going on, those IPs are massively abused and usually found in blacklists.
A typical use would be circumventing craigslist restrictions or geo-restrictions.

4) Paid private/dedicated proxies
"Private" means dedicated. If the operator is professional, it means your proxy is not shared with other people.
These are often used for more professional and legal activity, especially when the proxy IP is rented for a longer period.
A well known operator would be https://us-proxies.com

Own proxies
Running your own proxy is possible as well; there are various open-source projects available.
The most widely used proxy server is https://squid-cache.org
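To make the CLIENT <---> PROXY <---> SERVER flow concrete, here is a toy forward proxy built only on Python's standard library. It is a sketch for learning, not a substitute for Squid: it handles plain-HTTP GET only, buffers the whole response, follows redirects itself, and has no authentication or access control, so it should never be exposed to the internet. `ToyProxyHandler` and `make_toy_proxy` are names invented for this example.

```python
import http.server
import urllib.error
import urllib.request

# An opener with an empty ProxyHandler ignores system-wide proxy settings,
# so the toy proxy always contacts the origin server directly.
_direct = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class ToyProxyHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # A proxy-aware client puts the full URL in the request line,
        # so self.path looks like "http://SERVER/path".
        if not self.path.startswith("http://"):
            self.send_error(400, "expected an absolute http:// URL")
            return
        try:
            with _direct.open(self.path, timeout=10) as upstream:
                status, body = upstream.status, upstream.read()
                ctype = upstream.headers.get("Content-Type", "application/octet-stream")
        except urllib.error.HTTPError as err:
            # The origin answered with an error status; relay it as-is.
            status, body = err.code, err.read()
            ctype = err.headers.get("Content-Type", "text/plain")
        except OSError:
            self.send_error(502, "could not reach the origin server")
            return
        # Forward the origin's response back to the client.
        self.send_response(status)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_toy_proxy(host="127.0.0.1", port=0):
    """Create (but do not start) the proxy; port 0 picks a free port."""
    return http.server.ThreadingHTTPServer((host, port), ToyProxyHandler)


if __name__ == "__main__":
    server = make_toy_proxy(port=8080)  # 8080 is an arbitrary example port
    print("toy proxy listening on", server.server_address)
    server.serve_forever()
```

With this running, pointing a browser's HTTP proxy setting at 127.0.0.1:8080 and loading a plain `http://` page sends the request through `do_GET`; `https://` pages will not work, because the sketch does not implement CONNECT.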

To add to John's great answer above: when the CLIENT needs a tunnel through the PROXY, one important step is the initial CONNECT handshake between CLIENT and PROXY. From the WebSocket RFC:

CONNECT example.com:80 HTTP/1.1
Host: example.com

This is the same request that a CLIENT uses to open an SSL tunnel through a proxy.
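As a rough sketch of that handshake from the client's side: the client sends the CONNECT request, and if the proxy answers with a 2xx status, the connection becomes a raw tunnel to the target. The two helpers below are hypothetical names for illustration; the request mirrors the example above, and a real client would also have to handle proxy authentication and read the rest of the response headers.

```python
def build_connect_request(host: str, port: int) -> bytes:
    """Return the raw CONNECT request asking a proxy to open a tunnel."""
    return (
        f"CONNECT {host}:{port} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "\r\n"
    ).encode("ascii")


def tunnel_established(status_line: bytes) -> bool:
    """Check whether the proxy's status line reports a 2xx (success) code."""
    parts = status_line.split(None, 2)
    return len(parts) >= 2 and parts[1].isdigit() and parts[1].startswith(b"2")


if __name__ == "__main__":
    print(build_connect_request("example.com", 80).decode())
    print(tunnel_established(b"HTTP/1.1 200 Connection established\r\n"))
```

In practice you rarely build this by hand; in Python, for instance, `http.client.HTTPConnection.set_tunnel(host, port)` makes the standard library send the CONNECT request for you.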