Why are scripting languages (e.g. Perl, Python, and Ruby) not suitable as shell languages?

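What are the differences between shell languages like Bash (bash), Z shell (zsh), and fish (fish) and the scripting languages above that make them more suitable for the shell?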

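When using the command line, the shell languages seem to be much easier. For example, using bash feels much smoother to me than using the shell profile in IPython, despite reports to the contrary. I think most people would agree with me that a large portion of medium to large scale programming is easier in Python than in Bash. I use Python as the example because it is the language I am most familiar with. The same goes for Perl and Ruby.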

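I have tried to articulate the reason, but I am unable to, aside from assuming that the different treatment of strings in the two has something to do with it.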

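The reason for this question is that I am hoping to develop a language usable in both roles. If you know of such a language, please post it as well.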

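As S.Lott explains, the question needs some clarification. I am asking about the features of shell languages versus those of scripting languages. So the comparison is not about the characteristics of various interactive (REPL) environments, such as history and command-line substitution. An alternative way to phrase the question would be: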

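Can a programming language that is suitable for the design of complex systems also express useful one-liners that access the file system or control jobs? Can a programming language usefully scale up as well as scale down?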


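I think it's a question of parsing. Shell languages assume by default that a plain $ xxx line means a command to be run. In Python and Ruby you need to do system("command") or something similar.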

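It's not that they're unsuitable, just that nobody has really done it yet; at least, I think so. There is an example attempt in Ruby, and Python has IPython or something like that.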

Shell languages have to be easy to use. You want to type one-time throwaway commands, not small programs. That is, you want to type

ls -laR /usr

not

shell.ls("/usr", long=True, all=True, recursive=True)

This (also) means that shell languages don't really care whether an argument is an option, a string, a number or something else.
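To make the cost of the second form concrete, here is a minimal sketch of what such a wrapper would have to do. The function name `ls_argv` and its keyword arguments are hypothetical, invented for illustration; no such library is implied by the answer. It only builds the argument vector that the shell derives directly from the typed line.

```python
def ls_argv(path, long=False, all=False, recursive=False):
    """Build the argv that a shell derives from a line such as `ls -laR /usr`.

    Hypothetical helper: the keyword names mirror the example above.
    Note that `all` shadows a Python builtin inside this function.
    """
    flags = ""
    if long:
        flags += "l"
    if all:
        flags += "a"
    if recursive:
        flags += "R"
    argv = ["ls"]
    if flags:
        argv.append("-" + flags)
    argv.append(path)
    return argv


print(ls_argv("/usr", long=True, all=True, recursive=True))
```

Every option needs a declared name, a type and a default before the user can type anything, which is exactly the ceremony a shell line avoids.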

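Also, programming constructs in a shell are an add-on, and not even always built in. For example, consider the combination of if and [ (the test command) in Bash or the Bourne shell (sh), seq for generating sequences, and so on.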

Finally, shells have specific needs that matter less, or differently, in programming: pipes, file redirection, process/job control, and so on.

"If you know of such a language, please post it as well."

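Tcl is one such language, mainly because it was designed primarily to be a shell interpreter for CAD programs. Here is the experience of one hardcore Python programmer* on realizing why Tcl was designed the way it is: I can't believe I'm praising Tcl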

As for me, I have written, and have been using and improving, a Tcl shell (written in Tcl, of course) as my main Linux login shell on my homebrew router: Pure Tcl readline

In general, some of the reasons I like Tcl have to do with the similarity of its syntax to that of traditional shells:

  1. At its most basic, Tcl syntax is command argument argument.... There is nothing else. This is the same as Bash, the C shell or even the DOS shell.

  2. A bareword is considered a string. This again is similar to traditional shells, allowing you to write open myfile.txt w+ instead of open "myfile.txt" "w+".

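  3. Because of the foundation laid by 1 and 2, Tcl ends up with very little extraneous syntax. You write code with less punctuation: puts Hello instead of printf("Hello");. When you write programs you don't feel the pain so much, because you spend a lot of time thinking about what to write. When you use a shell to copy a file you don't think, you just type, and having to type ( and " and , and ) and ; again and again gets annoying very quickly.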

*Note: not me; I'm a hardcore Tcl programmer.

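Who says they aren't? Take a look at Zoidberg. REPLs (Read Eval Print Loops) make crappy shells, because every command must be syntactically correct, and running a program goes from:

foo arg1 arg2 arg3

to

system "foo", "arg1", "arg2", "arg3"

And don't even get me started on trying to do redirection.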

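So you need a custom shell (rather than a REPL) that understands commands and redirection, plus the language you wish to use to bind commands together. I think zoid (the Zoidberg shell) does a pretty good job of it.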
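For comparison, here is roughly what the shell line foo arg1 arg2 arg3 > out.txt costs in a general-purpose language, sketched in Python with the standard subprocess module. The helper name `run_redirected` is made up for this sketch, and because foo is not a real program, a child Python process that echoes its arguments stands in for it.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_redirected(argv, out_path):
    """Rough equivalent of the shell line `argv... > out_path` (hypothetical helper)."""
    with open(out_path, "w") as out:
        # stdout=out plays the role of the shell's `>` redirection.
        return subprocess.run(argv, stdout=out, check=True).returncode


# Stand-in for `foo arg1 arg2 arg3`: a child process that prints its arguments.
argv = [
    sys.executable, "-c",
    "import sys; print(' '.join(sys.argv[1:]))",
    "arg1", "arg2", "arg3",
]

with tempfile.TemporaryDirectory() as tmp:
    out_file = Path(tmp) / "out.txt"
    status = run_redirected(argv, out_file)
    captured = out_file.read_text().strip()

print(status, captured)
```

The quoting, the list brackets and the explicit file handling are all things the shell's grammar gives you for free.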

I can think of a few differences; this is just a stream of thoughts, in no particular order:

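  1. Python & Co. are designed to be good at scripting. Bash & Co. are designed to be only good at scripting, with absolutely no compromise. IOW: Python is designed to be good both at scripting and non-scripting, Bash cares only about scripting.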

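  2. Bash & Co. are untyped, Python & Co. are strongly typed, which means that the number 123, the string 123 and the file 123 are quite different. They are, however, not statically typed, which means they need to have different literals for those, in order to keep them apart. Example: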
                    | Ruby             | Bash
    -----------------------------------------
    number          | 123              | 123
    string          | '123'            | 123
    regexp          | /123/            | 123
    file            | File.open('123') | 123
    file descriptor | IO.open('123')   | 123
    URI             | URI.parse('123') | 123
    command         | `123`            | 123
    
  3. Python & Co. are designed to scale up to 10000, 100000, maybe even 1000000 line programs; Bash & Co. are designed to scale down to 10 character programs.

  4. In Bash & Co., files, directories, file descriptors and processes are all first-class objects. In Python, only Python objects are first-class; if you want to manipulate files, directories etc., you have to wrap them in a Python object first.

  5. Shell programming is basically dataflow programming. Nobody realizes that, not even the people who write shells, but it turns out that shells are quite good at that, and general-purpose languages not so much. In the general-purpose programming world, dataflow seems to be mostly viewed as a concurrency model, not so much as a programming paradigm.
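Point 5 can be illustrated with a small sketch. The stage names `cat`, `grep` and `wc_l` are hypothetical Python functions named after the Unix tools they imitate, and the sample data is invented. Each stage lazily consumes the output of the previous one, which is the dataflow shape of a pipeline like cat log | grep error | wc -l.

```python
def cat(lines):
    """Source stage: emit lines one at a time, like `cat file`."""
    yield from lines


def grep(needle, lines):
    """Filter stage: pass through lines containing needle, like `grep -F needle`."""
    return (line for line in lines if needle in line)


def wc_l(lines):
    """Sink stage: count the lines that reach it, like `wc -l`."""
    return sum(1 for _ in lines)


# Invented sample data standing in for a log file.
log = ["ok start", "error disk", "ok run", "error net"]

error_count = wc_l(grep("error", cat(log)))
print(error_count)  # prints 2
```

Note how the Python version reads inside-out, while the shell pipeline reads left to right in the direction the data flows.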

I have the feeling that trying to address these points by bolting features or DSLs onto a general-purpose programming language doesn't work. At least, I have yet to see a convincing implementation of it. There is RuSH (Ruby shell), which tries to implement a shell in Ruby, there is rush, which is an internal DSL for shell programming in Ruby, there is Hotwire, which is a Python shell, but IMO none of those come even close to competing with Bash, Zsh, fish and friends.

Actually, IMHO, the best current shell is Microsoft PowerShell, which is very surprising considering that for several decades now, Microsoft has continually had the worst shells evar. I mean, COMMAND.COM? Really? (Unfortunately, they still have a crappy terminal. It's still the "command prompt" that has been around since, what? Windows 3.0?)

PowerShell was basically created by ignoring everything Microsoft has ever done (COMMAND.COM, CMD.EXE, VBScript, JScript) and instead starting from the Unix shell, then removing all backwards-compatibility cruft (like backticks for command substitution) and massaging it a bit to make it more Windows-friendly (like using the now unused backtick as an escape character instead of the backslash, which is the path component separator character in Windows). After that is when the magic happens.

They address problem 1 and 3 from above, by basically making the opposite choice compared to Python. Python cares about large programs first, scripting second. Bash cares only about scripting. PowerShell cares about scripting first, large programs second. A defining moment for me was watching a video of an interview with Jeffrey Snover (PowerShell's lead designer), when the interviewer asked him how big of a program one could write with PowerShell and Snover answered without missing a beat: "80 characters." At that moment I realized that this is finally a guy at Microsoft who "gets" shell programming (probably related to the fact that PowerShell was neither developed by Microsoft's programming language group (i.e. lambda-calculus math nerds) nor the OS group (kernel nerds) but rather the server group (i.e. sysadmins who actually use shells)), and that I should probably take a serious look at PowerShell.

Number 2 is solved by having arguments be statically typed. So, you can write just 123 and PowerShell knows whether it is a string or a number or a file, because the cmdlet (which is what shell commands are called in PowerShell) declares the types of its arguments to the shell. This has pretty deep ramifications: unlike Unix, where each command is responsible for parsing its own arguments (the shell basically passes the arguments as an array of strings), argument parsing in PowerShell is done by the shell. The cmdlets specify all their options and flags and arguments, as well as their types and names and documentation(!) to the shell, which then can perform argument parsing, tab completion, IntelliSense, inline documentation popups etc. in one centralized place. (This is not revolutionary, and the PowerShell designers acknowledge shells like the DIGITAL Command Language (DCL) and the IBM OS/400 Command Language (CL) as prior art. For anyone who has ever used an AS/400, this should sound familiar. In OS/400, you can write a shell command and if you don't know the syntax of certain arguments, you can simply leave them out and hit F4, which will bring up a menu (similar to an HTML form) with labelled fields, dropdowns, help texts etc. This is only possible because the OS knows about all the possible arguments and their types.) In the Unix shell, this information is often duplicated three times: in the argument parsing code in the command itself, in the bash-completion script for tab-completion and in the manpage.
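As a loose analogy only (this is not how PowerShell is implemented), Python's standard argparse module shows what a single declarative description of a command's arguments buys you. The command name get-items and its --depth option are hypothetical, chosen for this sketch. One declaration drives both the typed parsing and the generated help text.

```python
import argparse

# One declaration of names, types and documentation for a hypothetical command.
parser = argparse.ArgumentParser(
    prog="get-items",
    description="Hypothetical command used to illustrate declared arguments.",
)
parser.add_argument("path", help="directory to list")
parser.add_argument("--depth", type=int, default=1, help="how many levels to descend")

# The same declaration converts the bare word 3 into an int ...
args = parser.parse_args(["/usr", "--depth", "3"])
print(args.path, args.depth + 1)

# ... and produces the documentation, with no second copy of the option list.
help_text = parser.format_help()
print(help_text)
```

The difference in PowerShell, as the answer describes it, is that this declaration is visible to the shell itself rather than being private to each command.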

Number 4 is solved by the fact that PowerShell operates on strongly typed objects, which includes stuff like files, processes, folders and so on.

Number 5 is particularly interesting, because PowerShell is the only shell I know of, where the people who wrote it were actually aware of the fact that shells are essentially dataflow engines and deliberately implemented it as a dataflow engine.

Another nice thing about PowerShell is the naming conventions: all cmdlets are named Action-Object and moreover, there are also standardized names for specific actions and specific objects. (Again, this should sound familiar to OS/400 users.) For example, everything which is related to receiving some information is called Get-Foo. And everything operating on (sub-)objects is called Bar-ChildItem. So, the equivalent to ls is Get-ChildItem (although PowerShell also provides builtin aliases ls and dir; in fact, whenever it makes sense, they provide both Unix and CMD.EXE aliases as well as abbreviations (gci in this case)).

But the killer feature IMO is the strongly typed object pipelines. While PowerShell is derived from the Unix shell, there is one very important distinction: in Unix, all communication (both via pipes and redirections as well as via command arguments) is done with untyped, unstructured strings. In PowerShell, it's all strongly typed, structured objects. This is so incredibly powerful that I seriously wonder why no one else has thought of it. (Well, they have, but they never became popular.) In my shell scripts, I estimate that up to one third of the commands are only there to act as an adapter between two other commands that don't agree on a common textual format. Many of those adapters go away in PowerShell, because the cmdlets exchange structured objects instead of unstructured text. And if you look inside the commands, then they pretty much consist of three stages: parse the textual input into an internal object representation, manipulate the objects, convert them back into text. Again, the first and third stage basically go away, because the data already comes in as objects.
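The three stages described above (parse text, manipulate, format text) can be sketched in a few lines of Python. The `FileInfo` class and the sample records are invented for illustration; the point is only that the text-based consumer has to re-parse what the producer formatted, while the object-based consumer does not.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    """Hypothetical structured record, standing in for a typed pipeline object."""
    name: str
    size: int


files = [FileInfo("a.log", 2048), FileInfo("b.txt", 10), FileInfo("c.log", 512)]

# Text pipeline: the producer formats records into lines ...
text = "\n".join(f"{f.size:>8} {f.name}" for f in files)

# ... and the consumer must know that layout to parse them back.
big_from_text = []
for line in text.splitlines():
    size_field, name_field = line.split(None, 1)
    if int(size_field) > 100:
        big_from_text.append(name_field)

# Object pipeline: the consumer uses the fields directly, no adapter needed.
big_from_objects = [f.name for f in files if f.size > 100]

print(big_from_text, big_from_objects)
```

If the producer ever changes its column layout, only the text-based consumer breaks.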

However, the designers have taken great care to preserve the dynamicity and flexibility of shell scripting through what they call an Adaptive Type System.

Anyway, I don't want to turn this into a PowerShell commercial. There are plenty of things that are not so great about PowerShell, although most of those have to do either with Windows or with the specific implementation, and not so much with the concepts. (E.g. the fact that it is implemented in .NET means that the very first time you start up the shell can take up to several seconds if the .NET framework is not already in the filesystem cache due to some other application that needs it. Considering that you often use the shell for well under a second, that is completely unacceptable.)

The most important point I want to make is that if you want to look at existing work in scripting languages and shells, you shouldn't stop at Unix and the Ruby/Python/Perl/PHP family. For example, Tcl was already mentioned. Rexx would be another scripting language. Emacs Lisp would be yet another. And in the shell realm there are some of the already mentioned mainframe/midrange shells such as the OS/400 command line and DCL. Also, Plan 9's rc.

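Since both are formally programming languages, what you can do in one, you can do in the other. In practice, it is a matter of design emphasis. Shell languages are designed for interactive use, while scripting languages are not.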

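The basic difference in design is the storage of data between commands and the scoping of variables. In Bash and the like, you have to jump through hoops to store a value (for example, a command like set a='something' in csh-style shells, or a='something' with no spaces around the = in Bash itself), whereas in languages like Python you simply use an assignment statement (a = 'something'). When using the values in a shell language, you have to tell the language that you want the value of a variable, whereas in a scripting language you have to tell the language when you want the literal value of a string. This makes a difference in interactive use.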

In a scripting language, where ls was defined as a command

a = some_value


ls a*b

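(What does a mean here? Does it mean some_value * (whatever b is), or do you mean 'a' anystring 'b'? In a scripting language, the default is whatever is stored in memory for a.)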

ls 'a*b'  Now means what the Unix ls a*b means.

In a bash-like language

set a=some_value


ls a*b   means what the Unix ls a*b means.


ls $a*b  uses an explicit recall of the value of a.

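Scripting languages make it easy to store and recall values, but make it hard to give a value a transient scope. Shell languages make it possible to store and recall values, but give every command a trivially transient scope.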
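The two readings of a*b can be shown side by side with Python's standard fnmatch module. The file names and the value of `a` below are invented for this sketch. In the shell reading the bare word is a pattern and the variable must be recalled explicitly; in the scripting reading the pattern must be quoted and the variable is implicit.

```python
import fnmatch

# Invented directory listing and variable value.
names = ["ab", "axxb", "a*b", "cab"]
a = "ax"

# Shell reading of `ls a*b`: the bare word is a glob pattern; the variable a is ignored.
pattern_matches = fnmatch.filter(names, "a*b")

# Shell reading of `ls $a*b`: the value of a is substituted first, then globbed.
expanded_matches = fnmatch.filter(names, a + "*b")

print(pattern_matches)   # ['ab', 'axxb', 'a*b']
print(expanded_matches)  # ['axxb']
```

In Python the string quotes and the explicit concatenation carry the information that the shell encodes with the presence or absence of $.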

Scalability and extensibility? Common Lisp (you can even run CLISP, and possibly other implementations, as a login shell in a Unix environment).

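It's cultural. The Bourne shell is 25 years old; it was one of the first scripting languages, and it was the first good solution to the central need of Unix admins. (That is, the "glue" that ties all the other utilities together and performs typical Unix tasks without having to compile a damn C program every time.)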

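By modern standards, its syntax is atrocious, and its weird rules and punctuation-as-statements style (useful in the 1970s, when every byte counted) make it hard for non-admins to penetrate. Its flaws and shortcomings have been addressed through evolutionary improvements in its descendants (ksh, bash, zsh), without having to rethink the ideas behind it. Admins stuck with the core syntax because, weird as it is, nothing else handled the simple stuff better without getting in the way.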

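For complex stuff, Perl came along and morphed into a sort of half-admin, half-application language. But the more complex something gets, the more it is seen as an application rather than admin work, so the business people tend to look for "programmers" rather than "admins" to do it, despite the fact that the right kind of geek tends to be both. So that is where the focus went, and the evolutionary improvements to Perl's application capabilities resulted in... well, Python and Ruby. (That is an oversimplification, but Perl was one of several inspirations for both languages.)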

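The result? Specialization. Admins tend to think that modern interpreted languages are too heavyweight for the things they do every day. And overall, they are right. They don't need objects. They don't care about data structures. They need commands. They need glue. Nothing else tries to do commands better than the Bourne shell concept (except maybe Tcl, which has already been mentioned here), and Bourne is good enough.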

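Programmers, who now have to learn more and more about devops, look at the limitations of the Bourne shell and wonder how anyone could put up with it. But the tools they know, while they certainly lean towards the Unix style of I/O and file operations, are no better for the purpose. I have written things like backup scripts and one-off file renames in Ruby, because I know it better than bash, but any dedicated admin could do the same thing in bash, possibly in fewer lines and with less overhead; either way, it would work just as well.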

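It is common to ask "Why does everyone use Y, when Z is better?", but evolution in technology, like evolution in everything else, tends to stop at good enough. The "better" solution does not win out unless the difference is seen as a deal-breaking frustration. Bourne-type scripting may be frustrating to you, but for the people who use it all the time, and for the jobs it was meant for, it has always done the job.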

No.

No, scripting languages are probably not suitable for shells.


The problem is the dichotomy between macro languages and, well, everything else.

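The shell is in a category with other legacy macro languages such as nroff and m4. In these processors, everything is a string, and the processor defines a mapping from input strings to output strings.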

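Certain boundaries are crossed in both directions by all languages, but it is usually quite clear whether a system falls into the category of a macro language or, hmm, I don't know of an official term... I'll write "a real language".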

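So sure, you could type all your commands in a language like Ruby, and it might even be a close second choice to a real shell, but it will never be a macro language. It has too much syntax to respect. It requires too much quoting.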

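But the macro language has its own problems once you start programming in it, because too many compromises had to be made to get rid of all that syntax. Strings are entered without quotes. Various kinds of magic have to be reintroduced to inject the missing syntax. I did a code-golf in nroff once, just to be different. It was quite strange. The source code of big implementations in macro languages is scary.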

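As a Windows user, I have not yet felt the need for PowerShell, because I still use 4NT (now Take Command Console) from JP Software. It is a very good shell with a lot of programming abilities, so it combines the best of both worlds.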

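When you look at, for instance, IRB (the Ruby interpreter), it must surely be possible to extend it with more one-liners for everyday scripting, mass file administration and quick by-the-minute tasks.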

These answers inspired me to take over maintenance of the Perl-based shell Zoidberg. After some fixes, it is usable again!

Check out the user guide, or install Bundle::Zoidberg using your favorite CPAN client.

You're begging the question. Not everyone agrees that shell languages are superior. For one, _why doesn't:

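Not long ago, a friend asked me how to recursively search his PHP scripts for a string. He had a lot of big binary files and templates in those directories that could have really bogged down a plain grep. I couldn't think of a way to use grep alone to make this happen, so I figured using find and grep together would be my best bet.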

  find . -name "*.php" -exec grep 'search_string' {} \; -print

Here is the above file search reworked in Ruby:

  Dir['**/*.php'].each do |path|
    File.open( path ) do |f|
      f.grep( /search_string/ ) do |line|
        puts path, ':', line
      end
    end
  end

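Your first reaction may be, "Well, that's quite a bit wordier than the original." And I just have to shrug and let it be. "It's a bit easier to extend," I say. And it works across platforms.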
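Since the question is framed around Python, here is a comparable sketch of the same search using only the standard library. The function name `search_tree` and the throwaway demo files are made up for illustration; this is one possible rendering, not a translation endorsed by the quoted author.

```python
import tempfile
from pathlib import Path


def search_tree(root, needle, pattern="*.php"):
    """Yield (path, line) for every line containing needle in files matching pattern."""
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        # errors="replace" keeps stray non-text bytes from aborting the search.
        with path.open(errors="replace") as handle:
            for line in handle:
                if needle in line:
                    yield path, line.rstrip("\n")


# Demo on a temporary tree: one PHP file that matches, one other file that is skipped.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "sub" / "index.php").write_text("<?php\n// search_string here\n")
    (root / "blob.bin").write_text("search_string in a file that is skipped\n")
    hits = [
        (path.relative_to(root).as_posix(), line)
        for path, line in search_tree(root, "search_string")
    ]

print(hits)
```

Like the Ruby version, it is wordier than the find one-liner, but the matching logic is an ordinary function that can be extended or reused.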