Tools for generating mock data?

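I'm looking for recommendations for a good, free tool that generates sample data for loading into test databases. By analogy, something that produces "lorem ipsum" text for any RDBMS. The features I'm looking for include:

  • Flexibility to generate data for an existing table definition.
  • Ability to generate both small and large data sets (1 million rows or more).
  • Output in SQL script format (INSERT statements), or in a flat-file format suitable for bulk import (typically faster).
  • A command-line interface that is easy to script.
  • Extensible, open source, written in a dynamic language (these are nice-to-haves, not strong requirements).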

PS: I searched StackOverflow for a duplicate question but didn't find one. If there is one, I'd appreciate a pointer to it.


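Thanks for all the great responses! I should amend my requirements to say that I use Mac OS X as my primary development environment, not Windows (though I did say that a command-line interface is desirable, and that largely rules out Windows). Still, Windows-specific suggestions are no doubt useful to other readers of this question, so thank you nevertheless.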


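My conclusions:

  • GenerateData:
    • PHP web application interface, not command-line
    • Limited to generating 200 records (or pay $20 for a license to generate 5,000 records)
  • RedGate SQL Data Generator
    • Not free; the price is $295
    • Requires Windows, .NET, SQL Server
  • Visual Studio 2008 Database Edition
    • Requires Windows
    • Requires an expensive MSDN or ISV subscription
  • Banner Datatect
    • Not free; the price is $595
    • Requires Windows (?)
    • Does not support MySQL (?)
    • GUI, not command-line or scriptable
  • Ruby Faker gem
    • Too slow for bulk data loading when used through ActiveRecord
  • Super Smack
    • Primarily a load-testing tool, with a built-in random data generator
    • Nevertheless pretty simple to use
    • Overall a good runner-up tool
  • Databene Benerator
    • Best solution for my needs
    • XML scripts, compatible with DbUnit
    • Open-source (GPL) Java code
    • Command-line usage
    • Accesses many databases directly via JDBC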

This looks quite promising: generatedata.com. Open-source, has lots of built-in data types.

There are several others listed here: Test (Sample) Data Generators. I don't have experience with any of them, but a few on that list look like they could be pretty decent.

I know you're not looking for actual lorem ipsum text; but in case anyone else searches for an actual lorem ipsum generator and finds this thread: lipsum.com does a great job of it.

If you are looking for, or willing to use, something MySQL-specific, you could take a look at Super Smack. It is currently maintained by Tony Bourke.

Super Smack allows you to generate random data to insert into your database tables. It is customizable, allowing you to use the packaged words.dat file, or any test data of your choice.

One of the nice things about it is that it is command-line driven and highly customizable. There are some fairly decent examples of usage in the book High Performance MySQL, which is also excerpted here.

Not sure if that is along the lines of what you are looking for, but just a thought.

A Ruby script with one of the available fake data generators should do you just fine.

http://faker.rubyforge.org/ is one such gem. Unfortunately, this doesn't fulfill all your requirements.

Here is another: http://random-data.rubyforge.org/

And a tutorial for using Faker: http://www.rubyandhow.com/how-to-generate-fake-names-addresses-in-ruby/


RE: Flexibility to generate data for an existing table definition. Combine the Faker gem with one of the available ORMs. ActiveRecord would probably be easiest.
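
As a rough illustration of what "generate data for an existing table definition" can look like, here is a minimal sketch. It is not Faker or ActiveRecord: it uses only the Python standard library, reads the column list from an in-memory SQLite database, and emits INSERT statements. The `users` table, the value ranges, and all helper names are made up for the example.

```python
import random
import sqlite3
import string

# Hypothetical table used only for this illustration.
SCHEMA = "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, balance REAL)"


def random_value(declared_type, rng):
    """Return a random Python value for a declared SQLite column type."""
    t = declared_type.upper()
    if "INT" in t:
        return rng.randint(0, 100)
    if "REAL" in t or "FLOA" in t or "DOUB" in t:
        return round(rng.uniform(0, 1000), 2)
    # Fall back to a short random lowercase string for TEXT and anything else.
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(8))


def sql_literal(value):
    """Render a Python value as a SQL literal, doubling single quotes in strings."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def generate_inserts(conn, table, n, seed=0):
    """Yield n INSERT statements for `table`, based on its existing definition."""
    rng = random.Random(seed)
    # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    names = [c[1] for c in columns]
    for i in range(1, n + 1):
        values = []
        for _, _, col_type, _, _, pk in columns:
            # Use a running counter for primary keys so rows never collide.
            values.append(i if pk else random_value(col_type, rng))
        yield "INSERT INTO {} ({}) VALUES ({});".format(
            table, ", ".join(names), ", ".join(sql_literal(v) for v in values)
        )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
statements = list(generate_inserts(conn, "users", 5))
for stmt in statements:
    conn.execute(stmt)  # the generated script loads cleanly into the same table
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

In a Ruby setup the same shape would apply, with Faker supplying the values and the ORM supplying the column list.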

Not a direct answer to your question, but this can be helpful for certain kinds of data:

Fake Name Generator (http://www.fakenamegenerator.com/) can be useful, not for everything, but for user accounts and similar data. AFAIK they support bulk orders.

Not free, but Visual Studio 2008 Database Edition is a good alternative, and it provides a lot more functionality (integration with SCC, unit testing, DB refactoring, etc.).

I know you said you were looking for a free tool, but this is one case where I would suggest that spending $295 will pay you back quickly in time saved. I've been using the RedGate tool SQL Data Generator for the last year and it is, in short, an awesome tool. It allows for setting dependencies between columns and generates realistic data for business objects such as phone numbers, URLs, names, etc. I can honestly state that this tool has paid for itself time and time again.

Take a look at databene benerator, a test data generator that looks close to your requirements.

  • it can generate data for an existing table definition (or even anonymize production data)
  • it can generate large data sets (unlimited size)
  • it supports various input formats (CSV, flat files, DBUnit) and output formats (CSV, flat files, DBUnit, XML, Excel, scripts)
  • it can be used on the command line or through a maven plugin
  • it's open source and customizable

I would give it a try.

BTW, a list of similar products is available on databene benerator's web site.
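
To make the "large data set" and flat-file points above concrete, here is a minimal sketch of the general idea. It is not Benerator and does not use its XML format; it is plain standard-library Python, and the column names and sample values are placeholders. Rows are produced lazily and streamed to CSV, so memory use stays flat regardless of the row count.

```python
import csv
import io
import random


def generate_rows(n, seed=0):
    """Lazily yield n rows of (id, name, city); no list of rows is ever built."""
    rng = random.Random(seed)
    first_names = ["Ann", "Bob", "Cho", "Dee"]  # placeholder sample values
    cities = ["Lima", "Oslo", "Pune", "Reno"]   # placeholder sample values
    for i in range(1, n + 1):
        yield (i, rng.choice(first_names), rng.choice(cities))


def write_csv(out, n, seed=0):
    """Stream n generated rows to the file-like object `out`; return rows written."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "name", "city"])
    count = 0
    for row in generate_rows(n, seed):
        writer.writerow(row)
        count += 1
    return count


# Demo with an in-memory buffer; for a real file, pass
# open("users.csv", "w", newline="") instead (the file name is an example).
buffer = io.StringIO()
written = write_csv(buffer, 1000)
lines = buffer.getvalue().splitlines()
```

The resulting file can then be loaded with the database's own bulk-import facility, which is usually much faster than running one INSERT per row.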

Normally very costly, but if you are a small ISV you can get Visual Studio 2008 Database Edition very cheaply; see the Empower and BizSpark promotions. It provides a lot more functionality than just generating test data (integration with SCC, unit testing, DB refactoring, etc.).

Since I like the fact that Red Gate tools are so easy to learn, I would still look at SQL Data Generator.

I use a tool called Datatect:

  1. Generates data to flat files or any ODBC compliant database.
  2. Extensible via VBScript.
  3. Referentially aware; will populate foreign keys with values from parent table.
  4. Data is context aware; city, state and phone numbers for given zip codes, first names and titles with gender.
  5. Can create custom, complex data types.
  6. Generates over 2 billion proper names, business names, street addresses, cities, states, and zip codes.

I've used this tool to generate as many as 40,000,000 rows of data to a SQL Server database, and 8,000,000 rows of data to an Oracle database.

I am in no way affiliated with Banner Systems, just a satisfied customer.
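
For readers unfamiliar with what "referentially aware" (item 3 above) means in practice, here is a minimal sketch of the idea. It says nothing about how Datatect implements it; the `customers` and `orders` tables and their columns are hypothetical, and the code is plain Python.

```python
import random


def generate_parents(n):
    """Create n parent rows (hypothetical `customers` table) with sequential ids."""
    return [{"id": i, "name": f"customer_{i}"} for i in range(1, n + 1)]


def generate_children(parents, n, seed=0):
    """Create n child rows (hypothetical `orders` table) whose customer_id
    is always drawn from the ids that actually exist in `parents`."""
    rng = random.Random(seed)
    parent_ids = [p["id"] for p in parents]
    return [
        {"id": i, "customer_id": rng.choice(parent_ids), "total": rng.randint(1, 500)}
        for i in range(1, n + 1)
    ]


customers = generate_parents(10)
orders = generate_children(customers, 50)

# Every order references an existing customer, so a foreign-key constraint
# on orders.customer_id would be satisfied when the rows are loaded.
existing_ids = {c["id"] for c in customers}
valid = all(o["customer_id"] in existing_ids for o in orders)
```

The key point is ordering: parent rows are generated first, and child rows only ever pick from keys that already exist.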

+1 for Benerator: I tried 3 or 4 of the other tools on offer (including dbmonster) but found Benerator to be very quick, to deliver realistic data and to be flexible. I also got very quick & helpful feedback from the tool's creator when I posted on the forum.

Here is the list of such tools (both free and commercial): http://c2.com/cgi/wiki?TestDataGenerator

Try http://www.mockaroo.com

This is a tool my company made to help test our own applications. We've made it free for anyone to use. It's basically the Forgery Ruby gem with a web app wrapped around it. You can generate data in CSV, txt, or SQL formats. Hope this helps.

A tool that really should not be missing from the list is the Data Generator from Datanamic. It populates databases directly or generates insert scripts, has a large collection of pre-installed generators, and supports multiple databases.

http://www.datanamic.com/datagenerator/index.html

For OS X there is Data Creator (US $7). The download is free for testing purposes, so you can use it to evaluate the software and its features.

It requires OS X Lion or later. It can generate many different field types and has a custom export mode plus some presets (TSV, CSV, HTML table, web page with a table inside).

http://www.tensionsoftware.com/osx/datacreator/

It is also available on the App Store:

https://itunes.apple.com/us/app/data-creator/id491686136?mt=12

You can use DbSchema (www.dbschema.com). It's a database management tool that includes a Random Data Generator to populate your database.