Is it possible to access the compressed data before decompression in HttpClient?

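I'm working on the Google Cloud Storage .NET client library. There are three features (between .NET, my client library and the storage service) that are combining in an unpleasant way: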

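  • When downloading files (objects in Google Cloud Storage terminology), the server includes a hash of the stored data. My client code then validates that hash against the data it has downloaded.

  • A separate feature of Google Cloud Storage is that the user can set the Content-Encoding of an object, and that is included as a header in the download response when the request contains a matching Accept-Encoding. (For the moment, let's ignore the behavior when the request doesn't include that...)

  • HttpClientHandler can decompress gzip (or deflate) content automatically and transparently.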

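When all three of these are combined, we get into trouble. Here's a short but complete program that demonstrates this without using my client library (and fetching a publicly accessible file):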

using System;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        string url = "https://www.googleapis.com/download/storage/v1/b/"
            + "storage-library-test-bucket/o/gzipped-text.txt?alt=media";
        var handler = new HttpClientHandler
        {
            AutomaticDecompression = DecompressionMethods.GZip
        };
        var client = new HttpClient(handler);

        var response = await client.GetAsync(url);
        byte[] content = await response.Content.ReadAsByteArrayAsync();
        string text = Encoding.UTF8.GetString(content);
        Console.WriteLine($"Content: {text}");

        var hashHeader = response.Headers.GetValues("X-Goog-Hash").FirstOrDefault();
        Console.WriteLine($"Hash header: {hashHeader}");

        using (var md5 = MD5.Create())
        {
            var md5Hash = md5.ComputeHash(content);
            var md5HashBase64 = Convert.ToBase64String(md5Hash);
            Console.WriteLine($"MD5 of content: {md5HashBase64}");
        }
    }
}

.NET Core project file:

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>netcoreapp2.0</TargetFramework>
    <LangVersion>7.1</LangVersion>
  </PropertyGroup>
</Project>

Output:

Content: hello world
Hash header: crc32c=T1s5RQ==,md5=xhF4M6pNFRDQnvaRRNVnkA==
MD5 of content: XrY7u+Ae7tCTyyK7j1rNww==

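As you can see, the MD5 of the content isn't the same as the MD5 part of the X-Goog-Hash header. (In my client library I'm using the crc32c hash, but that shows the same behavior.)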

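This isn't a bug in HttpClientHandler; it's expected, but it's a pain when I want to validate the hash. Basically, I need to get at the content both before and after decompression, and I can't find any way of doing that.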
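The mismatch can be reproduced without the network. The sketch below is a minimal offline illustration, assuming the stored object is simply the gzip of "hello world": hashing the compressed bytes and the decompressed bytes gives two different MD5 values, and only the decompressed hash matches the "MD5 of content" line in the output above. HashBeforeAfter and its methods are hypothetical helpers, not part of any library.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helpers: hash the same payload before and after gzip decompression.
public static class HashBeforeAfter
{
    // Gzip-compress a byte array in memory (stands in for the stored object).
    public static byte[] Gzip(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            } // disposing the GZipStream writes the gzip trailer into 'output'
            return output.ToArray(); // ToArray still works on a closed MemoryStream
        }
    }

    // Decompress gzip bytes back into the original payload.
    public static byte[] Gunzip(byte[] compressed)
    {
        using (var input = new GZipStream(new MemoryStream(compressed), CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            input.CopyTo(output);
            return output.ToArray();
        }
    }

    // Base64-encoded MD5, the same representation used in X-Goog-Hash.
    public static string Md5Base64(byte[] data)
    {
        using (var md5 = MD5.Create())
        {
            return Convert.ToBase64String(md5.ComputeHash(data));
        }
    }
}
```

Calling Md5Base64 on Gzip(bytes) and on bytes directly shows the two values side by side; HttpClientHandler with AutomaticDecompression only ever exposes the second one.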

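To clarify my requirements somewhat: I know how to prevent the decompression in HttpClient and instead decompress afterwards when reading from the stream, but I need to be able to do this without changing any of the code that uses the resulting HttpResponseMessage from HttpClient. (There's a lot of code that deals with responses, and I only want to make the change in one central place.)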

I have a plan, which I've prototyped and which works as far as I've found so far, but it's a bit ugly. It involves creating a three-layer handler:

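  • An HttpClientHandler with automatic decompression disabled.
  • A new handler which replaces the content stream with a new Stream subclass that delegates to the original content stream, but hashes the data as it's read.
  • A decompression-only handler, based on the Microsoft DecompressionHandler code.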
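The second layer of that plan, a Stream subclass that delegates to the original content stream and hashes the data as it is read, could look roughly like the following. It is a minimal sketch, not the prototype described above: HashingReadStream is a hypothetical name, it uses IncrementalHash rather than CryptoStream, and it only overrides the synchronous Read (the base Stream.ReadAsync falls back to Read, which works but is not efficient).

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical read-only wrapper: passes reads through to the inner stream
// and feeds every byte it returns into an IncrementalHash.
public sealed class HashingReadStream : Stream
{
    private readonly Stream inner;
    private readonly IncrementalHash hash;
    private byte[] result;

    public HashingReadStream(Stream inner, HashAlgorithmName algorithm)
    {
        this.inner = inner ?? throw new ArgumentNullException(nameof(inner));
        hash = IncrementalHash.CreateHash(algorithm);
    }

    // Set once the inner stream has been read to the end; null before that.
    public byte[] Hash => result;

    public override int Read(byte[] buffer, int offset, int count)
    {
        int read = inner.Read(buffer, offset, count);
        if (read > 0)
        {
            hash.AppendData(buffer, offset, read);
        }
        else if (count > 0 && result == null)
        {
            result = hash.GetHashAndReset(); // end of stream: finalize exactly once
        }
        return read;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            inner.Dispose();
            hash.Dispose();
        }
        base.Dispose(disposing);
    }
}
```

Placed between the non-decompressing HttpClientHandler and a decompression layer, this hashes exactly the bytes that came off the wire; the decompression layer then reads from it instead of from the raw content stream.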

While this works, it has disadvantages:

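  • Open source licensing: checking exactly what I need to do in order to create a new file in my repo based on the MIT-licensed Microsoft code.
  • It effectively forks the MS code, which means I should probably check it regularly to see whether any bugs have been found in it.
  • The Microsoft code uses internal members of the assembly, so it doesn't port as cleanly as it might.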

If Microsoft made DecompressionHandler public, that would help a lot, but that's likely to take longer than I can wait.

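What I'm looking for is an alternative approach if possible: something I've missed that lets me get at the content before decompression. I don't want to reinvent HttpClient's response handling; for example, I don't want to have to get into that side of things. It's a very specific interception point that I'm looking for.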


I managed to get the header hash right by:

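  • Creating a custom handler that inherits from HttpClientHandler
  • Overriding SendAsync
  • Reading the response from base.SendAsync as bytes
  • Compressing it using GZipStream
  • Hashing the gzipped bytes with MD5 and converting to base64 (using your code)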

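The catch is that, as you said, "before decompression" isn't really honored here: the data is recompressed on the client rather than captured as received, so the hash only matches if the recompressed bytes happen to be identical to what the server stored.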

The idea would be to get this if statement to behave the way you want: https://github.com/dotnet/corefx/blob/master/src/System /src/System/Net/Http/WinHttpResponseParser.cs#L80-L91

Code:

using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Security.Cryptography;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    const string url = "https://www.googleapis.com/download/storage/v1/b/storage-library-test-bucket/o/gzipped-text.txt?alt=media";

    static async Task Main()
    {
        //await HashResponseContent(CreateHandler(DecompressionMethods.None));
        //await HashResponseContent(CreateHandler(DecompressionMethods.GZip));
        await HashResponseContent(new MyHandler());

        Console.ReadLine();
    }

    private static HttpClientHandler CreateHandler(DecompressionMethods decompressionMethods)
    {
        return new HttpClientHandler { AutomaticDecompression = decompressionMethods };
    }

    public static async Task HashResponseContent(HttpClientHandler handler)
    {
        //Console.WriteLine($"Using AutomaticDecompression : '{handler.AutomaticDecompression}'");
        //Console.WriteLine($"Using SupportsAutomaticDecompression : '{handler.SupportsAutomaticDecompression}'");
        //Console.WriteLine($"Using Properties : '{string.Join('\n', handler.Properties.Keys.ToArray())}'");

        var client = new HttpClient(handler);

        var response = await client.GetAsync(url);
        byte[] content = await response.Content.ReadAsByteArrayAsync();
        string text = Encoding.UTF8.GetString(content);
        Console.WriteLine($"Content: {text}");

        var hashHeader = response.Headers.GetValues("X-Goog-Hash").FirstOrDefault();
        Console.WriteLine($"Hash header: {hashHeader}");
        Console.WriteLine($"MD5 of content: {byteArrayToMd5(content)}");

        Console.WriteLine("=====================================================================");
    }

    public static string byteArrayToMd5(byte[] content)
    {
        using (var md5 = MD5.Create())
        {
            var md5Hash = md5.ComputeHash(content);
            return Convert.ToBase64String(md5Hash);
        }
    }

    // Gzip-compresses a byte array in memory
    public static byte[] Compress(byte[] contentToGzip)
    {
        using (MemoryStream resultStream = new MemoryStream())
        {
            using (MemoryStream contentStreamToGzip = new MemoryStream(contentToGzip))
            {
                using (GZipStream compressionStream = new GZipStream(resultStream, CompressionMode.Compress))
                {
                    contentStreamToGzip.CopyTo(compressionStream);
                }
            }

            // ToArray still works after the GZipStream has closed resultStream
            return resultStream.ToArray();
        }
    }
}

public class MyHandler : HttpClientHandler
{
    protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = await base.SendAsync(request, cancellationToken);
        var responseContent = await response.Content.ReadAsByteArrayAsync().ConfigureAwait(false);

        Console.WriteLine($"response to md5 : {Program.byteArrayToMd5(responseContent)}");

        // Recompress the body and hash the result
        var compressedResponse = Program.Compress(responseContent);
        var compressedResponseMd5 = Program.byteArrayToMd5(compressedResponse);

        Console.WriteLine($"recompressed response to md5 : {compressedResponseMd5}");

        return response;
    }
}

How about disabling automatic decompression, manually adding the Accept-Encoding header, and then decompressing after hash verification?

// Needs: System, System.Diagnostics, System.IO, System.IO.Compression, System.Linq,
// System.Net, System.Net.Http, System.Security.Cryptography, System.Text, System.Threading.Tasks
private static async Task Test2()
{
    var url = @"https://www.googleapis.com/download/storage/v1/b/storage-library-test-bucket/o/gzipped-text.txt?alt=media";
    var handler = new HttpClientHandler
    {
        AutomaticDecompression = DecompressionMethods.None
    };
    var client = new HttpClient(handler);
    client.DefaultRequestHeaders.Add("Accept-Encoding", "gzip");

    var response = await client.GetAsync(url);
    // Decompression is disabled, so these are the gzip bytes as sent by the server
    var raw = await response.Content.ReadAsByteArrayAsync();

    var hashHeader = response.Headers.GetValues("X-Goog-Hash").FirstOrDefault();
    Debug.WriteLine($"Hash header: {hashHeader}");

    bool match = false;
    using (var md5 = MD5.Create())
    {
        var md5Hash = md5.ComputeHash(raw);
        var md5HashBase64 = Convert.ToBase64String(md5Hash);
        // Assumes md5=... is the last entry in the header
        match = hashHeader != null && hashHeader.EndsWith(md5HashBase64);
        Debug.WriteLine($"MD5 of content: {md5HashBase64}");
    }

    if (match)
    {
        // Only decompress once the compressed bytes have been verified
        using (var memInput = new MemoryStream(raw))
        using (var gz = new GZipStream(memInput, CompressionMode.Decompress))
        using (var memOutput = new MemoryStream())
        {
            gz.CopyTo(memOutput);
            var text = Encoding.UTF8.GetString(memOutput.ToArray());
            Console.WriteLine($"Content: {text}");
        }
    }
}
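The EndsWith check in Test2 only works while md5 happens to be the last entry in the header. A slightly more defensive sketch splits the value into name=value pairs, assuming the comma-separated format shown in the question's output. GoogHash.GetValue is a hypothetical helper; it takes IEnumerable<string> so it can be fed directly from response.Headers.GetValues("X-Goog-Hash"), in case the hashes arrive as separate header values.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: pulls one named value (e.g. "md5") out of X-Goog-Hash
// values such as "crc32c=T1s5RQ==,md5=xhF4M6pNFRDQnvaRRNVnkA==".
public static class GoogHash
{
    public static string GetValue(IEnumerable<string> headerValues, string name)
    {
        foreach (var headerValue in headerValues)
        {
            foreach (var part in headerValue.Split(','))
            {
                // Split on the first '=' only: base64 values end in '=' padding.
                int equals = part.IndexOf('=');
                if (equals <= 0)
                {
                    continue;
                }
                string key = part.Substring(0, equals).Trim();
                if (string.Equals(key, name, StringComparison.OrdinalIgnoreCase))
                {
                    return part.Substring(equals + 1).Trim();
                }
            }
        }
        return null; // the requested hash was not present
    }
}
```

In Test2, the EndsWith line could then become match = GoogHash.GetValue(response.Headers.GetValues("X-Goog-Hash"), "md5") == md5HashBase64;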

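Looking at what Michael did gave me the hint I was missing. Once you have the compressed content, you can chain CryptoStream, GZipStream and StreamReader to read the response without loading it all into memory. CryptoStream hashes the compressed content as it is decompressed and read. Replace the StreamReader with a FileStream and you can write the data to a file with minimal memory use :)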

using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        string url = "https://www.googleapis.com/download/storage/v1/b/"
            + "storage-library-test-bucket/o/gzipped-text.txt?alt=media";
        var handler = new HttpClientHandler
        {
            AutomaticDecompression = DecompressionMethods.None
        };
        var client = new HttpClient(handler);
        // Automatic decompression is off, so ask for gzip explicitly
        client.DefaultRequestHeaders.Add("Accept-Encoding", "gzip");

        var response = await client.GetAsync(url);
        var hashHeader = response.Headers.GetValues("X-Goog-Hash").FirstOrDefault();
        Console.WriteLine($"Hash header: {hashHeader}");
        string text = null;
        using (var md5 = MD5.Create())
        {
            // network stream -> CryptoStream (hashes compressed bytes) -> GZipStream -> StreamReader
            using (var cryptoStream = new CryptoStream(await response.Content.ReadAsStreamAsync(), md5, CryptoStreamMode.Read))
            {
                using (var gzipStream = new GZipStream(cryptoStream, CompressionMode.Decompress))
                {
                    using (var streamReader = new StreamReader(gzipStream, Encoding.UTF8))
                    {
                        text = streamReader.ReadToEnd();
                    }
                }
                Console.WriteLine($"Content: {text}");
                // MD5 of the compressed bytes, which is what X-Goog-Hash describes
                var md5HashBase64 = Convert.ToBase64String(md5.Hash);
                Console.WriteLine($"MD5 of content: {md5HashBase64}");
            }
        }
    }
}

Output:

Hash header: crc32c=T1s5RQ==,md5=xhF4M6pNFRDQnvaRRNVnkA==
Content: hello world
MD5 of content: xhF4M6pNFRDQnvaRRNVnkA==
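The FileStream suggestion can be packaged as a helper that decompresses into any destination stream while hashing the compressed bytes. This is a minimal sketch with a hypothetical name (GzipDownload.DecompressAndHash). The drain loop after decompression is a defensive addition, not part of the answer above: it covers the case where GZipStream stops reading before the underlying stream reports end-of-data, which is the point at which a read-mode CryptoStream finalizes the hash.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;

public static class GzipDownload
{
    // Hypothetical helper: decompresses 'compressed' into 'destination' while
    // hashing the compressed bytes; returns the base64 MD5 of those bytes.
    public static string DecompressAndHash(Stream compressed, Stream destination)
    {
        using (var md5 = MD5.Create())
        using (var cryptoStream = new CryptoStream(compressed, md5, CryptoStreamMode.Read))
        {
            // leaveOpen: true so cryptoStream can still be drained below.
            using (var gzipStream = new GZipStream(cryptoStream, CompressionMode.Decompress, leaveOpen: true))
            {
                gzipStream.CopyTo(destination);
            }

            // Read any remaining bytes so CryptoStream sees end-of-stream
            // and finalizes the hash over everything that was received.
            var buffer = new byte[4096];
            while (cryptoStream.Read(buffer, 0, buffer.Length) > 0)
            {
            }

            return Convert.ToBase64String(md5.Hash);
        }
    }
}
```

With a real response, the call would be along the lines of using (var file = File.Create(path)) { var md5 = GzipDownload.DecompressAndHash(await response.Content.ReadAsStreamAsync(), file); }, after which the result can be compared with the md5 entry in X-Goog-Hash.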

Answer V2

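After reading Jon's response and the updated answer, I ended up with the following version. It's roughly the same idea, but I moved the streaming into a special HttpContent that I inject. Not exactly pretty, but the idea is there.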

using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Security.Cryptography;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        string url = "https://www.googleapis.com/download/storage/v1/b/"
            + "storage-library-test-bucket/o/gzipped-text.txt?alt=media";
        var handler = new HttpClientHandler
        {
            AutomaticDecompression = DecompressionMethods.None
        };
        var client = new HttpClient(new Intercepter(handler));
        client.DefaultRequestHeaders.Add("Accept-Encoding", "gzip");

        var response = await client.GetAsync(url);
        var hashHeader = response.Headers.GetValues("X-Goog-Hash").FirstOrDefault();
        Console.WriteLine($"Hash header: {hashHeader}");
        HttpContent content1 = response.Content;
        byte[] content = await content1.ReadAsByteArrayAsync();
        string text = Encoding.UTF8.GetString(content);
        Console.WriteLine($"Content: {text}");
        // Only valid once the content has been read to the end
        var md5Hash = ((HashingContent)content1).Hash;
        var md5HashBase64 = Convert.ToBase64String(md5Hash);
        Console.WriteLine($"MD5 of content: {md5HashBase64}");
    }

    public class Intercepter : DelegatingHandler
    {
        public Intercepter(HttpMessageHandler innerHandler) : base(innerHandler)
        {
        }

        protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
        {
            var response = await base.SendAsync(request, cancellationToken);
            var original = response.Content;
            // Only wrap gzip-encoded bodies; anything else passes through untouched
            if (original != null && original.Headers.ContentEncoding.Contains("gzip"))
            {
                var hashing = new HashingContent(await original.ReadAsStreamAsync());
                // Keep the content headers, except those describing the compressed form
                foreach (var header in original.Headers)
                {
                    if (!string.Equals(header.Key, "Content-Encoding", StringComparison.OrdinalIgnoreCase)
                        && !string.Equals(header.Key, "Content-Length", StringComparison.OrdinalIgnoreCase))
                    {
                        hashing.Headers.TryAddWithoutValidation(header.Key, header.Value);
                    }
                }
                response.Content = hashing;
            }
            return response;
        }
    }

    public sealed class HashingContent : HttpContent
    {
        private readonly StreamContent streamContent;
        private readonly MD5 mD5;
        private readonly CryptoStream cryptoStream;
        private readonly GZipStream gZipStream;

        public HashingContent(Stream content)
        {
            // compressed stream -> CryptoStream (hashes) -> GZipStream (decompresses)
            mD5 = MD5.Create();
            cryptoStream = new CryptoStream(content, mD5, CryptoStreamMode.Read);
            gZipStream = new GZipStream(cryptoStream, CompressionMode.Decompress);
            streamContent = new StreamContent(gZipStream);
        }

        protected override Task SerializeToStreamAsync(Stream stream, TransportContext context) => streamContent.CopyToAsync(stream, context);

        // The decompressed length isn't known up front
        protected override bool TryComputeLength(out long length)
        {
            length = 0;
            return false;
        }

        protected override Task<Stream> CreateContentReadStreamAsync() => streamContent.ReadAsStreamAsync();

        protected override void Dispose(bool disposing)
        {
            try
            {
                if (disposing)
                {
                    streamContent.Dispose();
                    gZipStream.Dispose();
                    cryptoStream.Dispose();
                    mD5.Dispose();
                }
            }
            finally
            {
                base.Dispose(disposing);
            }
        }

        // MD5 of the compressed bytes; throws if the hash hasn't been finalized yet
        public byte[] Hash => mD5.Hash;
    }
}