Merge two data frames based on common column values in Pandas

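How can I get a merged data frame from two data frames that share a column, so that the merged data frame contains only the rows whose values in that column are common to both?

I have df1 with 5000 rows, in this format: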

    director_name   actor_1_name     actor_2_name      actor_3_name      movie_title
0   James Cameron   CCH Pounder      Joel David Moore  Wes Studi         Avatar
1   Gore Verbinski  Johnny Depp      Orlando Bloom     Jack Davenport    Pirates of the Caribbean: At World's End
2   Sam Mendes      Christoph Waltz  Rory Kinnear      Stephanie Sigman  Spectre

and df2 with 10000 rows, as:

movieId  genres                                       movie_title
1        Adventure|Animation|Children|Comedy|Fantasy  Toy Story
2        Adventure|Children|Fantasy                   Jumanji
3        Comedy|Romance                               Grumpier Old Men
4        Comedy|Drama|Romance                         Waiting to Exhale

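The common column 'movie_title' has values that appear in both data frames, and I want to keep every row whose 'movie_title' is the same in both. All other rows should be dropped.

Any help or suggestion would be appreciated.

Note: I have already tried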

pd.merge(dfinal, df1, on='movie_title')

and the output comes out as a single row:

director_name   actor_1_name    actor_2_name    actor_3_name    movie_title movieId title   genres

I also tried how="outer", "left" and "right", but after dropping the NaN rows I did not get any rows at all, even though many common values do exist.


You can use pd.merge:

import pandas as pd
pd.merge(df1, df2, on="movie_title")

By default this is an inner merge, so only rows whose key is found in both data frames are kept. If you want to keep all rows from the left data frame and only add values from df2 where a matching key is available, you can use how="left":

pd.merge(df1, df2, on="movie_title", how="left")
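To make the difference between the two calls concrete, here is a small sketch on toy data. The rows are made up for illustration and are not the asker's actual frames; only the column names are taken from the question.

```python
import pandas as pd

# Toy stand-ins for df1 and df2 (hypothetical rows, for illustration only)
df1 = pd.DataFrame({
    "director_name": ["James Cameron", "Sam Mendes", "John Lasseter"],
    "movie_title": ["Avatar", "Spectre", "Toy Story"],
})
df2 = pd.DataFrame({
    "movieId": [1, 2],
    "movie_title": ["Toy Story", "Jumanji"],
})

# Inner merge (the default): only titles present in both frames survive
inner = pd.merge(df1, df2, on="movie_title")

# Left merge: every row of df1 is kept; movieId is NaN where df2 has no match
left = pd.merge(df1, df2, on="movie_title", how="left")

print(inner)
print(left)
```

With this data the inner merge keeps only the "Toy Story" row, while the left merge keeps all three rows of df1 and leaves movieId empty for the two titles that df2 does not contain.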

Two data frames can be merged in several ways. The most common way in Python is the merge operation in Pandas.

import pandas
dfinal = df1.merge(df2, on="movie_title", how = 'inner')

If the common column has a different name in each data frame, you can specify the left and right column names explicitly. For example, suppose 'movie_title' in df1 is called 'movie_name' in df2:

dfinal = df1.merge(df2, how='inner', left_on='movie_title', right_on='movie_name')
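As a rough sketch of how this behaves, the toy frames below use made-up rows, and 'movie_name' is the hypothetical alternative column name from the sentence above. Note that the merged result keeps both key columns, so one of them is usually dropped afterwards.

```python
import pandas as pd

df1 = pd.DataFrame({
    "director_name": ["James Cameron", "John Lasseter"],
    "movie_title": ["Avatar", "Toy Story"],
})
# Same key as in df1, but stored under a different (hypothetical) column name
df2 = pd.DataFrame({
    "movieId": [1, 2],
    "movie_name": ["Toy Story", "Jumanji"],
})

dfinal = df1.merge(df2, how="inner", left_on="movie_title", right_on="movie_name")

# Both key columns appear in the result; drop the redundant one if desired
dfinal = dfinal.drop(columns="movie_name")
print(dfinal)
```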

For more options, see the documentation of the pandas merge operation.

If you want a merged DataFrame that contains only the rows whose key values appear in both DataFrames, use an inner merge.

import pandas as pd


# "id" stands for the name of the shared column; it must be passed as a string
merged_Frame = pd.merge(df1, df2, on="id", how="inner")
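If an inner merge comes back empty even though the titles look the same in both frames, one possible cause is an invisible difference in the key strings, such as trailing whitespace. That is only an assumption here, since the question does not confirm it. The sketch below uses made-up data to show how `indicator=True` can reveal the mismatch and how stripping the key column on both sides may fix it.

```python
import pandas as pd

# Hypothetical data: the titles in df1 carry a trailing space
df1 = pd.DataFrame({"movie_title": ["Toy Story ", "Avatar "]})
df2 = pd.DataFrame({"movieId": [1, 2], "movie_title": ["Toy Story", "Jumanji"]})

# indicator=True adds a "_merge" column telling where each key was found
check = pd.merge(df1, df2, on="movie_title", how="outer", indicator=True)
print(check["_merge"].value_counts())

# Normalise the key on both sides, then merge again
df1["movie_title"] = df1["movie_title"].str.strip()
df2["movie_title"] = df2["movie_title"].str.strip()
merged = pd.merge(df1, df2, on="movie_title", how="inner")
print(merged)
```

Before the cleanup no row is marked "both" in the `_merge` column; after stripping, the inner merge finds the "Toy Story" row.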