How to quickly remove items from a list

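I am looking for a way to quickly remove items from a C# List<T>. The documentation states that the List.Remove() and List.RemoveAt() operations are both O(n).

This is severely impacting my application.

I wrote a few different remove methods and tested them all on a List<String> with 500,000 items. The test cases are shown below...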


Overview

I wrote a method that would generate a list of strings that simply contains string representations of each number ("1", "2", "3", ...). I then attempted to remove every 5th item in the list. Here is the method used to generate the list:

private List<String> GetList(int size)
{
    List<String> myList = new List<String>();
    for (int i = 0; i < size; i++)
        myList.Add(i.ToString());
    return myList;
}

Test 1: RemoveAt()

Here is the test I used to test the RemoveAt() method.

private void RemoveTest1(ref List<String> list)
{
    for (int i = 0; i < list.Count; i++)
        if (i % 5 == 0)
            list.RemoveAt(i);
}

Test 2: Remove()

Here is the test I used to test the Remove() method.

private void RemoveTest2(ref List<String> list)
{
    List<int> itemsToRemove = new List<int>();
    for (int i = 0; i < list.Count; i++)
        if (i % 5 == 0)
            list.Remove(list[i]);
}

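Test 3: Set to null, sort, then RemoveRange

In this test, I looped through the list once and set the to-be-removed items to null. Then I sorted the list (so the nulls would be at the top) and removed all the items at the top that were set to null. NOTE: This reorders my list, so I may have to put it back in the correct order.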

private void RemoveTest3(ref List<String> list)
{
    int numToRemove = 0;
    for (int i = 0; i < list.Count; i++)
    {
        if (i % 5 == 0)
        {
            list[i] = null;
            numToRemove++;
        }
    }
    list.Sort();
    list.RemoveRange(0, numToRemove);
    // Now they're out of order...
}

Test 4: Create a new list, and add all of the "good" values to the new list

In this test, I created a new list and added all of my keep-items to the new list. Then I put all of these items back into the original list.

private void RemoveTest4(ref List<String> list)
{
    List<String> newList = new List<String>();
    for (int i = 0; i < list.Count; i++)
    {
        if (i % 5 == 0)
            continue;
        else
            newList.Add(list[i]);
    }

    list.RemoveRange(0, list.Count);
    list.AddRange(newList);
}

Test 5: Set to null, then FindAll()

In this test, I set all the to-be-removed items to null, then used the FindAll() feature to find all the items that are not null.

private void RemoveTest5(ref List<String> list)
{
    for (int i = 0; i < list.Count; i++)
        if (i % 5 == 0)
            list[i] = null;
    list = list.FindAll(x => x != null);
}

Test 6: Set to null, then RemoveAll()

In this test, I set all the to-be-removed items to null, then used the RemoveAll() feature to remove all the items that are null.

private void RemoveTest6(ref List<String> list)
{
    for (int i = 0; i < list.Count; i++)
        if (i % 5 == 0)
            list[i] = null;
    list.RemoveAll(x => x == null);
}

Client Application and Output

int numItems = 500000;
Stopwatch watch = new Stopwatch();


// List 1...
watch.Start();
List<String> list1 = GetList(numItems);
watch.Stop(); Console.WriteLine(watch.Elapsed.ToString());


watch.Reset(); watch.Start();
RemoveTest1(ref list1);
watch.Stop(); Console.WriteLine(watch.Elapsed.ToString());
Console.WriteLine();


// List 2...
watch.Start();
List<String> list2 = GetList(numItems);
watch.Stop(); Console.WriteLine(watch.Elapsed.ToString());


watch.Reset(); watch.Start();
RemoveTest2(ref list2);
watch.Stop(); Console.WriteLine(watch.Elapsed.ToString());
Console.WriteLine();


// List 3...
watch.Reset(); watch.Start();
List<String> list3 = GetList(numItems);
watch.Stop(); Console.WriteLine(watch.Elapsed.ToString());


watch.Reset(); watch.Start();
RemoveTest3(ref list3);
watch.Stop(); Console.WriteLine(watch.Elapsed.ToString());
Console.WriteLine();


// List 4...
watch.Reset(); watch.Start();
List<String> list4 = GetList(numItems);
watch.Stop(); Console.WriteLine(watch.Elapsed.ToString());


watch.Reset(); watch.Start();
RemoveTest4(ref list4);
watch.Stop(); Console.WriteLine(watch.Elapsed.ToString());
Console.WriteLine();


// List 5...
watch.Reset(); watch.Start();
List<String> list5 = GetList(numItems);
watch.Stop(); Console.WriteLine(watch.Elapsed.ToString());


watch.Reset(); watch.Start();
RemoveTest5(ref list5);
watch.Stop(); Console.WriteLine(watch.Elapsed.ToString());
Console.WriteLine();


// List 6...
watch.Reset(); watch.Start();
List<String> list6 = GetList(numItems);
watch.Stop(); Console.WriteLine(watch.Elapsed.ToString());


watch.Reset(); watch.Start();
RemoveTest6(ref list6);
watch.Stop(); Console.WriteLine(watch.Elapsed.ToString());
Console.WriteLine();

Results

00:00:00.1433089   // Create list
00:00:32.8031420   // RemoveAt()


00:00:32.9612512   // Forgot to reset stopwatch :(
00:04:40.3633045   // Remove()


00:00:00.2405003   // Create list
00:00:01.1054731   // Null, Sort(), RemoveRange()


00:00:00.1796988   // Create list
00:00:00.0166984   // Add good values to new list


00:00:00.2115022   // Create list
00:00:00.0194616   // FindAll()


00:00:00.3064646   // Create list
00:00:00.0167236   // RemoveAll()

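Notes and Comments

  • The first two tests do not actually remove every 5th item from the list, because the indices shift after each removal. In fact, out of 500,000 items, only 83,334 were removed (it should have been 100,000). I am OK with that; clearly the Remove()/RemoveAt() methods are not a good idea anyway.

  • Although I tried to remove every 5th item from the list, in reality there will be no such pattern. The entries to be removed will be random.

  • Although I used a List<String> in this example, that will not always be the case. It could be a List<Anything>.

  • Not putting the items in the list to begin with is not an option.

  • The other methods (3 - 6) all performed much better, relatively speaking. However, I am a little concerned: in 3, 5, and 6 I was forced to set a value to null, and then remove all the items according to this sentinel. I don't like that approach because I can envision a scenario where one of the items in the list might be null and it would get removed unintentionally.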
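To see where the 83,334 figure in the first note comes from, here is a minimal sketch that counts how many removals the forward loop from Test 1 performs. The class and method names are hypothetical and not part of the original tests. After each RemoveAt(i) the later items shift down by one while i keeps advancing, so the loop runs off the end of the shrinking list early.

```csharp
using System.Collections.Generic;

public static class ForwardRemovalDemo
{
    // Runs the same loop shape as RemoveTest1 on a list of the given size
    // and returns how many items were actually removed.
    public static int CountForwardRemovals(int size)
    {
        var list = new List<int>(size);
        for (int i = 0; i < size; i++)
            list.Add(i);

        int removed = 0;
        for (int i = 0; i < list.Count; i++)
        {
            if (i % 5 == 0)
            {
                list.RemoveAt(i); // everything after i shifts down by one
                removed++;
            }
        }
        return removed;
    }
}
```

With size 30, this returns 5 rather than the 6 one might expect. The k-th removal happens at index 5k while the list holds size - k items, so the loop stops once 5k >= size - k; for 500,000 items that works out to 83,334 removals.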

My question is: What is the best way to quickly remove many items from a List<T>? Most of the approaches I've tried look really ugly, and potentially dangerous, to me. Is a List the wrong data structure?

Right now, I am leaning towards creating a new list and adding the good items to the new list, but it seems like there should be a better way.


I feel a HashSet, LinkedList or Dictionary will do you much better.

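List isn't an efficient data structure when it comes to removal. You would do better to use a doubly linked list (LinkedList), as removal only requires reference updates in the adjacent entries.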
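As a rough illustration of that point, the sketch below removes matching nodes from a LinkedList<T> in a single pass. The helper name RemoveWhere is hypothetical; it relies only on LinkedList<T>.First, LinkedListNode<T>.Next, and LinkedList<T>.Remove(LinkedListNode<T>), the latter being an O(1) operation.

```csharp
using System;
using System.Collections.Generic;

public static class LinkedListRemovalSketch
{
    // Removes every node whose value satisfies the predicate; returns the count removed.
    public static int RemoveWhere<T>(LinkedList<T> list, Predicate<T> match)
    {
        int removed = 0;
        LinkedListNode<T> node = list.First;
        while (node != null)
        {
            // Capture the successor first: a removed node no longer links to its neighbours.
            LinkedListNode<T> next = node.Next;
            if (match(node.Value))
            {
                list.Remove(node); // O(1): only the neighbouring links are updated
                removed++;
            }
            node = next;
        }
        return removed;
    }
}
```

Note that this only helps when you are already walking the list or holding node references; finding a node by value or position is still a linear scan.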

If you're happy creating a new list, you don't have to go through setting items to null. For example:

// This overload of Where provides the index as well as the value. Unless
// you need the index, use the simpler overload which just provides the value.
List<string> newList = oldList.Where((value, index) => index % 5 != 0)
                              .ToList();

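However, you may want to look at alternative data structures, such as LinkedList<T> or HashSet<T>. It really depends on what features you need from your data structure.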
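If you would rather stay with List<T> and avoid both the null sentinel and a second list, one option is to compact the list in place. The following is only a sketch under that assumption; the class name ListCompaction and the index-aware delegate are made up for illustration. It copies each kept item down over the removed slots and then trims the tail with a single RemoveRange call, so the whole pass is O(n) and the order is preserved.

```csharp
using System;
using System.Collections.Generic;

public static class ListCompaction
{
    // Removes every item for which shouldRemove(item, index) returns true.
    // Kept items stay in their original relative order. Returns the count removed.
    public static int RemoveWhere<T>(List<T> list, Func<T, int, bool> shouldRemove)
    {
        int write = 0; // next slot to fill with a kept item
        for (int read = 0; read < list.Count; read++)
        {
            if (shouldRemove(list[read], read))
                continue; // skip: this slot will be overwritten or trimmed

            if (write != read)
                list[write] = list[read];
            write++;
        }

        int removed = list.Count - write;
        list.RemoveRange(write, removed); // drop the leftover tail in one call
        return removed;
    }
}
```

A call such as ListCompaction.RemoveWhere(list, (value, index) => index % 5 == 0) removes every 5th item, and legitimate null entries are left alone unless the predicate selects them.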

OK, try RemoveAll, used like this:

static void Main(string[] args)
{
    Stopwatch watch = new Stopwatch();
    watch.Start();
    List<Int32> test = GetList(500000);
    watch.Stop(); Console.WriteLine(watch.Elapsed.ToString());
    watch.Reset(); watch.Start();
    test.RemoveAll(t => t % 5 == 0);
    List<String> test2 = test.ConvertAll(delegate(int i) { return i.ToString(); });
    watch.Stop(); Console.WriteLine(watch.Elapsed.ToString());

    // Number of items removed
    Console.WriteLine((500000 - test.Count).ToString());
    Console.ReadLine();
}

static private List<Int32> GetList(int size)
{
    List<Int32> test = new List<Int32>();
    for (int i = 0; i < size; i++) // use the size parameter rather than a hard-coded 500000
        test.Add(i);
    return test;
}

It only loops over the list twice and removes exactly 100,000 items.

My output for this code:

00:00:00.0099495
00:00:00.1945987
100000

Updated to try a HashSet

static void Main(string[] args)
{
    Stopwatch watch = new Stopwatch();
    do
    {
        // Test with list
        watch.Reset(); watch.Start();
        List<Int32> test = GetList(500000);
        watch.Stop(); Console.WriteLine(watch.Elapsed.ToString());
        watch.Reset(); watch.Start();
        List<String> myList = RemoveTest(test);
        watch.Stop(); Console.WriteLine(watch.Elapsed.ToString());
        Console.WriteLine((500000 - test.Count).ToString());
        Console.WriteLine();

        // Test with HashSet
        watch.Reset(); watch.Start();
        HashSet<String> test2 = GetStringList(500000);
        watch.Stop(); Console.WriteLine(watch.Elapsed.ToString());
        watch.Reset(); watch.Start();
        HashSet<String> myList2 = RemoveTest(test2);
        watch.Stop(); Console.WriteLine(watch.Elapsed.ToString());
        // Report the HashSet's own count (the original printed test.Count here)
        Console.WriteLine((500000 - test2.Count).ToString());
        Console.WriteLine();
    } while (Console.ReadKey().Key != ConsoleKey.Escape);
}


static private List<Int32> GetList(int size)
{
    List<Int32> test = new List<Int32>();
    for (int i = 0; i < size; i++) // use the size parameter rather than a hard-coded 500000
        test.Add(i);
    return test;
}

static private HashSet<String> GetStringList(int size)
{
    HashSet<String> test = new HashSet<String>();
    for (int i = 0; i < size; i++)
        test.Add(i.ToString());
    return test;
}

static private List<String> RemoveTest(List<Int32> list)
{
    list.RemoveAll(t => t % 5 == 0);
    return list.ConvertAll(delegate(int i) { return i.ToString(); });
}

static private HashSet<String> RemoveTest(HashSet<String> list)
{
    list.RemoveWhere(t => Convert.ToInt32(t) % 5 == 0);
    return list;
}

This gives me:

00:00:00.0131586
00:00:00.1454723
100000


00:00:00.3459420
00:00:00.2122574
100000

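You could always remove the items from the end of the list. List removal is O(1) when performed on the last element, since all it does is decrement the count. There is no shifting of subsequent elements involved (which is why list removal is O(n) in general).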

for (int i = list.Count - 1; i >= 0; --i)
    list.RemoveAt(i);
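The loop above empties the whole list. Applied to the question's scenario, walking backward also fixes the skipped-item problem from Tests 1 and 2, because removing index i never changes the indices below it. A minimal sketch (the helper name RemoveEveryFifth is hypothetical):

```csharp
using System.Collections.Generic;

public static class BackwardRemoval
{
    // Removes the items at original indices 0, 5, 10, ... by walking from the end.
    public static void RemoveEveryFifth<T>(List<T> list)
    {
        for (int i = list.Count - 1; i >= 0; --i)
        {
            if (i % 5 == 0)
                list.RemoveAt(i); // only items after i shift; indices below i are unaffected
        }
    }
}
```

This removes the right items, but each RemoveAt in the middle of the list still shifts everything after it, so it is correct rather than fast. Only removals at the very end are O(1).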

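I've found that when dealing with large lists, this is often faster. The speed of Remove and of finding the right item to remove in the dictionary more than makes up for the cost of creating the dictionary. There are a couple of caveats, though: the original list must have unique values, and I don't think the order is guaranteed once you are done.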

List<long> hundredThousandItemsInOrignalList;
List<long> fiftyThousandItemsToRemove;

// populate lists...

Dictionary<long, long> originalItems = hundredThousandItemsInOrignalList.ToDictionary(i => i);

foreach (long i in fiftyThousandItemsToRemove)
{
    originalItems.Remove(i);
}

List<long> newList = originalItems.Select(i => i.Key).ToList();

Or you could do this:

List<int> listA;
List<int> listB;

...

List<int> resultingList = listA.Except(listB).ToList(); // Except returns IEnumerable<int>, so ToList() is required
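Keep in mind that Except is a set operation, so duplicate values in listA are collapsed in the result. If duplicates among the kept items need to survive, a possible alternative (a sketch only; RemoveItems is a hypothetical helper) is to put the items to remove into a HashSet<T> and let List<T>.RemoveAll filter in place:

```csharp
using System.Collections.Generic;

public static class SetBasedRemoval
{
    // Removes from source every element that also appears in itemsToRemove.
    // Order and duplicates among the kept items are preserved. Returns the count removed.
    public static int RemoveItems<T>(List<T> source, IEnumerable<T> itemsToRemove)
    {
        var lookup = new HashSet<T>(itemsToRemove); // average O(1) membership checks
        return source.RemoveAll(lookup.Contains);   // single pass over the list
    }
}
```

For example, removing { 2 } from { 1, 1, 2, 3 } this way leaves { 1, 1, 3 }, whereas Except would yield { 1, 3 }.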

If the order does not matter then there is a simple O(1) List.Remove method.

public static class ListExt
{
    // O(1)
    public static void RemoveBySwap<T>(this List<T> list, int index)
    {
        list[index] = list[list.Count - 1];
        list.RemoveAt(list.Count - 1);
    }

    // O(n)
    public static void RemoveBySwap<T>(this List<T> list, T item)
    {
        int index = list.IndexOf(item);
        RemoveBySwap(list, index);
    }

    // O(n)
    public static void RemoveBySwap<T>(this List<T> list, Predicate<T> predicate)
    {
        int index = list.FindIndex(predicate);
        RemoveBySwap(list, index);
    }
}

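This solution is friendly to memory traversal, so even if you need to find the index first it will be very fast.

Notes:

  • Finding the index of an item must be O(n), since the list must be unsorted.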
  • Linked lists are slow on traversal, especially for large collections with long life spans.
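When many items have to go at once, the same swap idea can be applied in a single backward pass. The sketch below is an assumption about how one might extend the approach, not part of the answer's ListExt class; RemoveAllBySwap is a hypothetical name. Walking from the end matters: the element swapped into slot i always comes from a position that has already been checked.

```csharp
using System;
using System.Collections.Generic;

public static class SwapRemoveAll
{
    // Removes every item matching the predicate in O(n) total.
    // The order of the remaining items is NOT preserved. Returns the count removed.
    public static int RemoveAllBySwap<T>(List<T> list, Predicate<T> match)
    {
        int removed = 0;
        for (int i = list.Count - 1; i >= 0; --i)
        {
            if (!match(list[i]))
                continue;

            int last = list.Count - 1;
            list[i] = list[last]; // the last item was already checked, so it is safe to keep
            list.RemoveAt(last);  // O(1): removing the final element shifts nothing
            removed++;
        }
        return removed;
    }
}
```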

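The other answers (and the question itself) offer various ways of dealing with this "slug" (slowness bug) using the built-in .NET Framework classes.

But if you are willing to switch to a third-party library, you can get better performance simply by changing the data structure and leaving your code unchanged except for the list type.

The Loyc Core libraries include two types that work the same way as List<T> but can remove items faster:

  • DList<T> is a simple data structure that gives you a 2x speedup over List<T> when removing items from random locations.
  • AList<T> is a sophisticated data structure that gives you a large speedup over List<T> when your lists are very long (but may be slower than List<T> when the list is short).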

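Lists are faster than LinkedLists until n gets really big. The reason is that so-called cache misses occur far more often with a LinkedList than with a List. Memory lookups are quite expensive. Because a list is implemented as an array, the CPU can load a chunk of data at once, since it knows the required data is stored contiguously. A linked list, however, gives the CPU no hint about which data will be needed next, which forces the CPU to do far more memory lookups. By the way, by the term memory I mean RAM.

For further details take a look at: https://jackmott.github.io/programming/2016/08/20/when-bigo-foolsya.html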
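If you want to check the traversal claim on your own machine, a rough sketch like the following can be used. The class and method names are hypothetical, and a single Stopwatch run is not a rigorous benchmark. Also note that a freshly built LinkedList tends to have its nodes allocated close together, so the gap may be smaller here than for a long-lived list whose nodes are scattered across the heap.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class TraversalComparison
{
    // Sums all items by enumerating the collection once.
    public static long Sum(IEnumerable<int> items)
    {
        long total = 0;
        foreach (int item in items)
            total += item;
        return total;
    }

    // Builds a List<int> and a LinkedList<int> with the same contents
    // and times one full traversal of each.
    public static (TimeSpan ListTime, TimeSpan LinkedTime) Measure(int size)
    {
        var list = new List<int>(size);
        var linked = new LinkedList<int>();
        for (int i = 0; i < size; i++)
        {
            list.Add(i);
            linked.AddLast(i);
        }

        var sw = Stopwatch.StartNew();
        long listSum = Sum(list);
        TimeSpan listTime = sw.Elapsed;

        sw.Restart();
        long linkedSum = Sum(linked);
        TimeSpan linkedTime = sw.Elapsed;

        if (listSum != linkedSum)
            throw new InvalidOperationException("The two traversals should produce the same sum.");

        return (listTime, linkedTime);
    }
}
```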

If you still want to use a List as the underlying structure, you can use the following extension method, which does the heavy lifting for you.

using System.Collections.Generic;
using System.Linq;

namespace Library.Extensions
{
    public static class ListExtensions
    {
        public static IEnumerable<T> RemoveRange<T>(this List<T> list, IEnumerable<T> range)
        {
            var removed = list.Intersect(range).ToArray();
            if (!removed.Any())
            {
                return Enumerable.Empty<T>();
            }

            var remaining = list.Except(removed).ToArray();
            list.Clear();
            list.AddRange(remaining);

            return removed;
        }
    }
}

A simple stopwatch test gives a result of around 200 ms for the removal. Keep in mind that this is not a real benchmark.

using System;
using System.Diagnostics;
using System.Linq;
using Library.Extensions;

public class Program
{
    static void Main(string[] args)
    {
        var list = Enumerable
            .Range(0, 500_000)
            .Select(x => x.ToString())
            .ToList();

        var allFifthItems = list.Where((_, index) => index % 5 == 0).ToArray();

        var sw = Stopwatch.StartNew();
        list.RemoveRange(allFifthItems);
        sw.Stop();

        var message = $"{allFifthItems.Length} elements removed in {sw.Elapsed}";
        Console.WriteLine(message);
    }
}

Output:

100000 elements removed in 00:00:00.2291337