How can I reliably determine the type of a variable declared with var, at design time?

I'm developing a completion (intellisense) facility for C# in Emacs.

The idea is, if a user types a fragment, then asks for completion via a particular keystroke combination, the completion facility will use .NET reflection to determine the possible completions.

Doing this requires knowing the type of the thing being completed. If it's a string, there is a known set of possible methods and properties; if it's an Int32, there is a separate set; and so on.

Using Semantic, a code lexer/parser package available in Emacs, I can locate variable declarations and their types. Given that, it's straightforward to use reflection to get the methods and properties on the type, and then present the list of options to the user. (OK, not quite straightforward within Emacs, but using the ability to run a PowerShell process inside Emacs, it becomes much easier. I wrote a custom .NET assembly to do reflection, loaded it into PowerShell, and then elisp running within Emacs can send commands to PowerShell via comint and read the responses. As a result, Emacs can get the results of reflection quickly.)

The problem arises when the code uses var in the declaration of the thing being completed. That means the type is not explicitly specified, and completion doesn't work.

How can I reliably determine the actual type used, when the variable is declared with the var keyword? Just to be clear, I don't need to determine it at runtime. I want to determine it at "design time".

So far I have these ideas:

  1. Compile and invoke:
    • extract the declaration statement, e.g. `var foo = "a string value";`
    • concatenate a statement `foo.GetType();`
    • dynamically compile the resulting C# fragment into a new assembly
    • load the assembly into a new AppDomain, run the fragment and get the return type
    • unload and discard the assembly

    I know how to do all this. But it sounds very heavyweight for every completion request in the editor.

    I suppose I don't need a fresh new AppDomain every time. I could re-use a single AppDomain for multiple temporary assemblies, and amortize the cost of setting it up and tearing it down across multiple completion requests. That's more a tweak of the basic idea.

  2. Compile and inspect IL

    Simply compile the declaration into a module, then inspect the IL to determine the actual type the compiler inferred. How would this be possible? What would I use to examine the IL?

Any better ideas out there? Comments? Suggestions?


EDIT - thinking about this further, compile-and-invoke is not acceptable, because the invoke may have side effects. So the first option must be ruled out.

Also, I think I cannot assume the presence of .NET 4.0.


UPDATE - the correct answer, unmentioned above but gently pointed out by Eric Lippert, is to implement a full-fidelity type inference system. It's the only way to reliably determine the type of a var at design time. But it's also not easy to do. Because I harbor no illusions that I want to attempt to build such a thing, I took the shortcut of option 2 - extract the relevant declaration code, compile it, then inspect the resulting IL.

This actually works, for a fair subset of completion scenarios.

For example, suppose that in the following code fragments, the ? is the position at which the user asks for completion. This works:

var x = "hello there";
x.?

The completion realizes that x is a String, and offers the appropriate options. It does this by generating and compiling the following source code:

namespace N1 {
    static class dmriiann5he {   // randomly-generated class name
        static void M1 () {
            var x = "hello there";
        }
    }
}

...and then inspecting the IL with simple reflection.
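A minimal sketch of that compile-then-inspect step, using CodeDOM and the reflection local-variable table. The class and method names mirror the generated example above; the exact `CompilerParameters` setup is an assumption about how the in-memory compile might be configured:

```csharp
using System;
using System.CodeDom.Compiler;
using System.Reflection;
using Microsoft.CSharp;

class VarTypeProbe
{
    static void Main()
    {
        string source = @"
            namespace N1 {
                static class dmriiann5he {
                    static void M1 () {
                        var x = ""hello there"";
                    }
                }
            }";

        CSharpCodeProvider provider = new CSharpCodeProvider();
        CompilerParameters options = new CompilerParameters();
        options.GenerateInMemory = true;   // no assembly written to disk

        CompilerResults results = provider.CompileAssemblyFromSource(options, source);

        MethodInfo m1 = results.CompiledAssembly
            .GetType("N1.dmriiann5he", true)
            .GetMethod("M1", BindingFlags.Static | BindingFlags.NonPublic);

        // The IL's local-variable table records the type the compiler
        // inferred for each 'var' local in declaration order.
        foreach (LocalVariableInfo local in m1.GetMethodBody().LocalVariables)
            Console.WriteLine("local {0}: {1}", local.LocalIndex, local.LocalType);
    }
}
```

Note that the compile must not use /optimize, or an unused local like `x` may be elided from the IL entirely; the CodeDOM default (no optimization) keeps it.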

This also works:

var x = new XmlDocument();
x.?

The engine adds the appropriate using clauses to the generated source code so that it compiles correctly, and then the IL inspection is the same.

This works, too:

var x = "hello";
var y = x.ToCharArray();
var z = y.?

It just means the IL inspection has to find the type of the third local variable, instead of the first one.

And this:

var foo = "Tra la la";
var fred = new System.Collections.Generic.List<String>
{
foo,
foo.Length.ToString()
};
var z = fred.Count;
var x = z.?

...which is just one level deeper than the prior example.

But what does NOT work is completion on any local variable whose initialization depends, at any point, on an instance member or a local method argument. Like:

var foo = this.InstanceMethod();
foo.?

Nor does LINQ syntax work.

I'll have to consider how valuable those things are before I look at addressing them via what would definitely be a "limited design" (polite word for hack).

One approach to addressing the dependency on method arguments or instance members would be to replace, in the fragment of code that gets generated, compiled, and then IL-analyzed, references to those things with "synthetic" local variables of the same type.
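As a sketch of that substitution, the uncompilable fragment `var foo = this.InstanceMethod();` could be rewritten into a self-contained snippet; the synthetic name is hypothetical, and it assumes Semantic can report that `InstanceMethod()` returns string:

```csharp
namespace N1 {
    static class Probe {
        static void M1() {
            // Stand-in for 'this.InstanceMethod()', typed from Semantic's
            // report of the method's return type (assumed to be string here):
            string syntheticLocal1 = default(string);
            var foo = syntheticLocal1;
            // The IL local-variable table now records foo as System.String.
        }
    }
}
```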


ANOTHER UPDATE - completion on vars that depend on instance members now works.

What I did was interrogate the type (via Semantic), then generate synthetic stand-ins for all existing members. For a C# buffer like this:

public class CsharpCompletion
{
    private static int PrivateStaticField1 = 17;

    string InstanceMethod1(int index)
    {
        ...lots of code here...
        return result;
    }

    public void Run(int count)
    {
        var foo = "this is a string";
        var fred = new System.Collections.Generic.List<String>
        {
            foo,
            foo.Length.ToString()
        };
        var z = fred.Count;
        var mmm = count + z + CsharpCompletion.PrivateStaticField1;
        var nnn = this.InstanceMethod1(mmm);
        var fff = nnn.?

        ...more code here...

...the generated code that gets compiled, so that I can learn the type of the local var nnn from the output IL, looks like this:

namespace Nsbwhi0rdami {
    class CsharpCompletion {
        private static int PrivateStaticField1 = default(int);
        string InstanceMethod1(int index) { return default(string); }

        void M0zpstti30f4 (int count) {
            var foo = "this is a string";
            var fred = new System.Collections.Generic.List<String> { foo, foo.Length.ToString() };
            var z = fred.Count;
            var mmm = count + z + CsharpCompletion.PrivateStaticField1;
            var nnn = this.InstanceMethod1(mmm);
        }
    }
}

All the type members, instance and static, are available in the skeleton code. It compiles successfully. At that point, determining the type of the local var is straightforward via reflection.

What makes this possible:

  • the ability to run PowerShell in Emacs
  • the C# compiler is really fast. On my machine, it takes about 0.5s to compile an in-memory assembly. Not fast enough for between-keystrokes analysis, but fast enough to support the generation of completion lists on demand.

I haven't looked into LINQ yet.
That will be a much bigger problem, because the Semantic lexer/parser that Emacs has for C# doesn't "do" LINQ.


For solution "1" you have a new facility in .NET 4 to do this quickly and easily. So if you can have your program converted to .NET 4 it would be your best choice.

Intellisense systems typically represent the code using an Abstract Syntax Tree, which allows them to resolve the return type of the function being assigned to the 'var' variable in more or less the same way as the compiler will. If you use the VS Intellisense, you may notice that it won't give you the type of var until you've finished entering a valid (resolvable) assignment expression. If the expression is still ambiguous (for instance, it can't fully infer the generic arguments for the expression), the var type will not resolve. This can be a fairly complex process, as you might need to walk fairly deep into a tree in order to resolve the type. For instance:

var items = myList.OfType<Foo>().Select(foo => foo.Bar);

The return type is IEnumerable<Bar>, but resolving this required knowing:

  1. myList is of type that implements IEnumerable.
  2. There is an extension method OfType<T> that applies to IEnumerable.
  3. The resulting value is IEnumerable<Foo> and there is an extension method Select that applies to this.
  4. The lambda expression foo => foo.Bar has the parameter foo of type Foo. This is inferred by the usage of Select, which takes a Func<TIn,TOut> and since TIn is known (Foo), the type of foo can be inferred.
  5. The type Foo has a property Bar, which is of type Bar. We know that Select returns IEnumerable<TOut> and TOut can be inferred from the result of the lambda expression, so the resulting type of items must be IEnumerable<Bar>.
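The five steps above can be checked against a concrete, compilable version of the example (Foo and Bar here are stand-in types matching the steps):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Bar { }
class Foo { public Bar Bar { get { return new Bar(); } } }

class Demo
{
    static void Main()
    {
        // Step 1: a type that implements IEnumerable.
        List<object> myList = new List<object> { new Foo() };

        // Steps 2-5: OfType<Foo>() yields IEnumerable<Foo>; Select's lambda
        // parameter 'foo' is inferred as Foo; foo.Bar is of type Bar; so
        // 'items' is inferred as IEnumerable<Bar>.
        var items = myList.OfType<Foo>().Select(foo => foo.Bar);

        Console.WriteLine(
            typeof(IEnumerable<Bar>).IsAssignableFrom(items.GetType()));  // prints True
    }
}
```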

If you don't want to have to write your own parser to build the abstract syntax tree, you could look at using the parsers from either SharpDevelop or MonoDevelop, both of which are open source.

I can describe for you how we do that efficiently in the "real" C# IDE.

The first thing we do is run a pass which analyzes only the "top level" stuff in the source code. We skip all the method bodies. That allows us to quickly build up a database of information about what namespace, types and methods (and constructors, etc) are in the source code of the program. Analyzing every single line of code in every method body would take way too long if you're trying to do it between keystrokes.

When the IDE needs to work out the type of a particular expression inside a method body -- say you've typed "foo." and we need to figure out what are the members of foo -- we do the same thing; we skip as much work as we reasonably can.

We start with a pass which analyzes only the local variable declarations within that method. When we run that pass we make a mapping from a pair of "scope" and "name" to a "type determiner". The "type determiner" is an object that represents the notion of "I can work out the type of this local if I need to". Working out the type of a local can be expensive so we want to defer that work if we need to.
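A toy version of that lazily-evaluated mapping might look like the following; every name here is invented for illustration, and the real IDE's structure is surely far richer:

```csharp
using System;
using System.Collections.Generic;

// Sketch: a "type determiner" defers the potentially expensive
// "work out the type of this local" step until somebody asks,
// and then runs it at most once.
class TypeDeterminer
{
    private readonly Func<Type> resolve;
    private Type cached;

    public TypeDeterminer(Func<Type> resolve) { this.resolve = resolve; }

    public Type GetLocalType()
    {
        if (cached == null)
            cached = resolve();   // the deferred, expensive analysis
        return cached;
    }
}

// Maps (scope, name) pairs to type determiners for the locals in a method.
class LocalsDatabase
{
    private readonly Dictionary<KeyValuePair<string, string>, TypeDeterminer> map =
        new Dictionary<KeyValuePair<string, string>, TypeDeterminer>();

    public void Declare(string scope, string name, Func<Type> resolve)
    {
        map[new KeyValuePair<string, string>(scope, name)] = new TypeDeterminer(resolve);
    }

    public Type TypeOf(string scope, string name)
    {
        return map[new KeyValuePair<string, string>(scope, name)].GetLocalType();
    }
}
```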

We now have a lazily-built database that can tell us the type of every local. So, getting back to that "foo." -- we figure out which statement the relevant expression is in and then run the semantic analyzer against just that statement. For example, suppose you have the method body:

String x = "hello";
var y = x.ToCharArray();
var z = from foo in y where foo.

and now we need to work out that foo is of type char. We build a database that has all the metadata, extension methods, source code types, and so on. We build a database that has type determiners for x, y and z. We analyze the statement containing the interesting expression. We start by transforming it syntactically to

var z = y.Where(foo=>foo.

In order to work out the type of foo we must first know the type of y. So at this point we ask the type determiner "what is the type of y"? It then starts up an expression evaluator which parses x.ToCharArray() and asks "what's the type of x"? We have a type determiner for that which says "I need to look up "String" in the current context". There is no type String in the current type, so we look in the namespace. It's not there either so we look in the using directives and discover that there's a "using System" and that System has a type String. OK, so that's the type of x.

We then query System.String's metadata for the type of ToCharArray and it says that it's a System.Char[]. Super. So we have a type for y.

Now we ask "does System.Char[] have a method Where?" No. So we look in the using directives; we have already precomputed a database containing all of the metadata for extension methods that could possibly be used.

Now we say "OK, there are eighteen dozen extension methods named Where in scope, do any of them have a first formal parameter whose type is compatible with System.Char[]?" So we start a round of convertibility testing. However, the Where extension methods are generic, which means we have to do type inference.

I've written a special type inferencing engine that can handle making incomplete inferences from the first argument to an extension method. We run the type inferrer and discover that there is a Where method that takes an IEnumerable<T>, and that we can make an inference from System.Char[] to IEnumerable<System.Char>, so T is System.Char.

The signature of this method is Where<T>(this IEnumerable<T> items, Func<T, bool> predicate), and we know that T is System.Char. Also we know that the first argument inside the parentheses to the extension method is a lambda. So we start up a lambda expression type inferrer that says "the formal parameter foo is assumed to be System.Char", use this fact when analyzing the rest of the lambda.

We now have all the information we need to analyze the body of the lambda, which is "foo.". We look up the type of foo, we discover that according to the lambda binder it is System.Char, and we're done; we display type information for System.Char.

And we do everything except the "top level" analysis between keystrokes. That's the real tricky bit. Actually writing all the analysis is not hard; it's making it fast enough that you can do it at typing speed that is the real tricky bit.

Good luck!

Since you are targeting Emacs, it may be best to start with the CEDET suite. All the details that Eric Lippert describes are already covered in the code analyzer in the CEDET/Semantic tool for C++. There is also a C# parser (which probably needs a little TLC), so the only parts missing are related to tuning the necessary parts for C#.

The basic behaviors are defined in core algorithms that depend on overloadable functions defined on a per-language basis. The success of the completion engine depends on how much tuning has been done. With the C++ implementation as a guide, getting C# support to a similar level shouldn't be too bad.

Daniel's answer suggests using MonoDevelop to do parsing and analysis. This could be an alternative mechanism instead of the existing C# parser, or it could be used to augment the existing parser.

It's a hard problem to do well. Basically you need to model the language spec/compiler through most of the lexing/parsing/typechecking and build an internal model of the source code which you can then query. Eric describes it in detail for C#. You can always download the F# compiler source code (part of the F# CTP) and take a look at service.fsi to see the interface exposed out of the F# compiler that the F# language service consumes for providing intellisense, tooltips for inferred types, etc. It gives a sense of a possible 'interface' if you already had the compiler available as an API to call into.

The other route is to re-use the compilers as-is as you're describing, and then use reflection or look at the generated code. This is problematic from the point of view that you need 'full programs' to get a compilation output from a compiler, whereas when editing source code in the editor, you often only have 'partial programs' that do not yet parse, don't have all methods implemented yet, etc.

In short, I think the 'low budget' version is very hard to do well, and the 'real' version is very, very hard to do well. (Where 'hard' here measures both 'effort' and 'technical difficulty'.)

I can tell you roughly how the Delphi IDE works with the Delphi compiler to do intellisense (code insight is what Delphi calls it). It's not 100% applicable to C#, but it's an interesting approach which deserves consideration.

Most semantic analysis in Delphi is done in the parser itself. Expressions are typed as they are parsed, except for situations where this is not easy - in which case look-ahead parsing is used to work out what's intended, and then that decision is used in the parse.

The parse is largely LL(2) recursive descent, except for expressions, which are parsed using operator precedence. One of the distinct things about Delphi is that it's a single-pass language, so constructs need to be declared before being used, so no top-level pass is needed to bring that information out.

This combination of features means that the parser has roughly all the information needed for code insight for any point where it's needed. The way it works is this: the IDE informs the compiler's lexer of the position of the cursor (the point where code insight is desired) and the lexer turns this into a special token (it's called the kibitz token). Whenever the parser meets this token (which could be anywhere) it knows that this is the signal to send back all the information it has back to the editor. It does this using a longjmp because it's written in C; what it does is it notifies the ultimate caller of the kind of syntactic construct (i.e. grammatical context) the kibitz point was found in, as well as all the symbol tables necessary for that point. So for example, if the context is in an expression which is an argument to a method, then we can check the method overloads, look at the argument types, and filter the valid symbols to only those which can resolve to that argument type (this cuts down on a lot of irrelevant cruft in the drop-down). If it's in a nested scope context (e.g. after a "."), the parser will have handed back a reference to the scope, and the IDE can enumerate all the symbols found in that scope.

Other things are also done; for example, method bodies are skipped if the kibitz token does not lie in their range - this is done optimistically, and rolled back if it skipped over the token. The equivalent of extension methods - class helpers in Delphi - have a kind of versioned cache, so their lookup is reasonably fast. But Delphi's generic type inference is much weaker than C#'s.

Now, to the specific question: inferring the types of variables declared with var is equivalent to the way Pascal infers the type of constants. It comes from the type of the initialization expression. These types are built from the bottom up. If x is of type Integer, and y is of type Double, then x + y will be of type Double, because those are the rules of the language; etc. You follow these rules until you have a type for the full expression on the right hand side, and that's the type you use for the symbol on the left.
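The bottom-up rule can be sketched as a recursive walk over the expression tree; this is grossly simplified, modeling only the Integer/Double promotion rule from the example:

```csharp
using System;

abstract class Expr
{
    public abstract string InferType();   // computed from the leaves upward
}

class Literal : Expr
{
    private readonly string type;
    public Literal(string type) { this.type = type; }
    public override string InferType() { return type; }
}

class Add : Expr
{
    private readonly Expr left, right;
    public Add(Expr left, Expr right) { this.left = left; this.right = right; }

    public override string InferType()
    {
        // The language rule from the example: Integer + Double -> Double.
        return (left.InferType() == "Double" || right.InferType() == "Double")
            ? "Double" : "Integer";
    }
}

class Demo
{
    static void Main()
    {
        // x + y where x : Integer and y : Double
        Expr e = new Add(new Literal("Integer"), new Literal("Double"));
        Console.WriteLine(e.InferType());   // prints Double
    }
}
```

The type of the full right-hand-side expression, computed this way, is the type assigned to the symbol on the left.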

NRefactory will do this for you.