the most popular ships in anime

Author

Emily Zou

Published

March 11, 2023

The other day, my brother and I were arguing over which ship was the most popular within anime (overall/historically)… I said it was L and Light Yagami from Death Note, and he said it was Eren and Mikasa from Attack on Titan. Without robustly defining “popular” and “within anime”, it’s hard to say which way of measuring is better than another, but since this is just for fun, I decided not to dwell on that. Really, I just wanted to prove my brother wrong…

L and Light Yagami from Death Note (2007), the horse I’m betting on

Eren Yeager and Mikasa Ackerman from Attack on Titan (2013), my brother’s bet

I decided to measure a ship’s “popularity” by how many fanfictions had been written about it (though there’s probably a lot of room for debate there). Archive of Our Own (AO3) had helpfully published a data dump of all works published on their site up to March 2021, so I used that. Feel free to check it out! Sadly, though, you won’t find any Demon Slayer, Spy x Family, or Jujutsu Kaisen.

Show me the Code

import os 
import glob
import pandas as pd

Show me the Code

df = pd.read_csv('tag.csv')
dg = df.sort_values(by='cached_count', ascending = False)
ds = dg[dg['name'] != "Redacted"]
dl = ds[ds['type'] == "Fandom"]

Their data is very, very nicely organized: it comes as a tags file and a works file (which is a hefty 1GB). Each tag has an ID and is sorted into a category (fandom, relationship, character, etc…). Each tag also has a “cached_count”, which is just how many times it shows up, and that will be very helpful later.
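To make that layout concrete, here is a minimal sketch on invented data. The column names follow the tag table shown in this post; the rows, the `works` frame, and its `word_count` values are made up for illustration, and the assumption that a work's tag IDs are joined by `+` is taken from how the processing code further down splits them.

```python
import pandas as pd

# Toy stand-in for tag.csv: same columns as the real file, invented rows.
tags = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "type": ["Fandom", "Relationship", "Character", "Fandom"],
    "name": ["Naruto", "Uchiha Sasuke/Uzumaki Naruto", "Hatake Kakashi", "Redacted"],
    "canonical": [True, True, True, False],
    "cached_count": [100, 40, 25, 3],
    "merger_id": [None, None, None, None],
})

# Toy stand-in for the works file: each work lists its tag IDs joined by '+'.
works = pd.DataFrame({"word_count": [1200, 800], "tags": ["1+2", "1+3+2"]})

# Pick out one category of tag (skipping redacted names), most-used first.
fandoms = (
    tags[(tags["type"] == "Fandom") & (tags["name"] != "Redacted")]
    .sort_values("cached_count", ascending=False)
)

# Turn the '+'-joined string into a list of tag IDs (kept as strings).
works["taglist"] = works["tags"].map(lambda s: str(s).split("+"))

print(fandoms["name"].tolist())
print(works["taglist"].tolist())
```

Because the IDs come out of the split as strings, any list of tag IDs they are compared against has to be converted to strings too.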

I first wanted to take a look at which ships were most popular overall, and quickly realized what the problem would be with finding the most popular ships in anime.

Dataframe with the most popular ships across the board

Show me the Code

dr = ds[ds['type'] == "Relationship"]
dr.head(10)

	id	type	name	canonical	cached_count	merger_id
173025	264659	Relationship	Derek Hale/Stiles Stilinski	True	122223	NaN
4700	5672	Relationship	Castiel/Dean Winchester	True	111991	NaN
8900	11006	Relationship	Sherlock Holmes/John Watson	True	87435	NaN
76021	110293	Relationship	James "Bucky" Barnes/Steve Rogers	True	77276	NaN
85	99	Relationship	Draco Malfoy/Harry Potter	True	74244	NaN
5973	7265	Relationship	Steve Rogers/Tony Stark	True	64923	NaN
378701	607596	Relationship	Stucky	False	54045	110293.0
256699	450395	Relationship	Harry Styles/Louis Tomlinson	True	48225	NaN
352836	575567	Relationship	Aziraphale/Crowley (Good Omens)	True	39319	NaN
180531	276512	Relationship	Keith/Lance (Voltron)	True	37464	NaN

First, I don’t read that much fanfiction myself, and second, I haven’t watched enough anime to write down my own list of ships that would probably show up on this list, especially when it’s thousands and thousands of entries long… So how do you pick out the entries that are about anime in the first place?

Show me the Code

reltags = dr['id'].tolist()
renames = dr['name'].tolist()

Show me the Code

titles = pd.read_csv('titles.csv')
tags = dl[dl['name'].isin (titles['title'])]
tag = tags [['id', 'name', 'cached_count']]
animetags = tag['id'].tolist()
animetags = [str(x) for x in animetags]

I filtered the ‘tags’ CSV down to fandoms, and without a good/efficient computational way to determine whether an entry was an anime or not, I manually collected a list of anime titles. This actually went a lot faster than you’d think, as there aren’t many titles that get a significant amount of fanfiction written about them. Of course, this was based on my own knowledge, but I think I did a pretty decent job…
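One caveat with matching a hand-made title list is that `isin` only does exact string matches, so a title has to be spelled exactly like the AO3 fandom tag. A small sketch on invented rows (the `fandoms` and `titles` frames here are stand-ins, not the real files) shows how to list the titles that matched nothing:

```python
import pandas as pd

# Invented fandom tags, named the way AO3 tends to name them.
fandoms = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Naruto", "Shingeki no Kyojin | Attack on Titan", "Sherlock (TV)"],
})

# Invented hand-collected title list.
titles = pd.DataFrame({"title": ["Naruto", "Attack on Titan"]})

# Fandom tags whose name appears verbatim in the title list.
matched = fandoms[fandoms["name"].isin(titles["title"])]

# Titles that matched no fandom tag, worth re-checking by hand.
unmatched = titles[~titles["title"].isin(fandoms["name"])]

print(matched["name"].tolist())
print(unmatched["title"].tolist())
```

Here "Attack on Titan" is reported as unmatched because the tag carries the longer bilingual name, which is the kind of miss a quick check like this can surface.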

Anime Titles only

Show me the Code

tags.head()

	id	type	name	canonical	cached_count	merger_id
494603	758208	Fandom	Haikyuu!!	True	130918	NaN
11296	13999	Fandom	Naruto	True	105108	NaN
466212	721553	Fandom	Shingeki no Kyojin \| Attack on Titan	True	60008	NaN
358410	582724	Fandom	Miraculous Ladybug	True	55895	NaN
10317	12845	Fandom	Hetalia: Axis Powers	True	43092	NaN

Show me the Code

def checker (s): 
    return any (x in animetags for x in s)

def multichecker (s): 
    # True when the value contains a comma, i.e. holds more than one entry
    return ',' in s

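def nonan (d): 
    # NaN is the only value that is not equal to itself, so v == v drops NaNs
    return {k:v for k, v in d.items() if v == v}

def topfive (d): 
    # despite the name, this keeps tags that appear at least 5 times
    return {k:v for k, v in d.items() if v >= 5.0}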

Show me the Code

tagdict = pd.Series(tag.name.values, index = tag.id).to_dict()
tdict = {str(k):v for k,v in tagdict.items()}

def replace_titles (s): 
    return tdict[s]

Now, I could get information on every entry that had any anime tagged
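The core counting idea can be shown on a toy example before it runs on the real files. This is a simplified sketch, not the exact loop used in this post: the tag IDs are invented, and `first_anime` is a hypothetical helper that keeps the first matching anime tag in each work.

```python
import pandas as pd

# Invented works: tag IDs joined by '+'.
works = pd.DataFrame({"tags": ["10+501+502", "10+501", "20+601", "999+502"]})
anime_ids = {"10", "20"}  # hypothetical fandom tag IDs, stored as strings

works["taglist"] = works["tags"].map(lambda s: str(s).split("+"))

def first_anime(tag_list):
    """Return the first tag in the work that is an anime fandom, or None."""
    hits = [t for t in tag_list if t in anime_ids]
    return hits[0] if hits else None

# Keep only works that carry at least one anime tag.
works["anime"] = works["taglist"].map(first_anime)
anime_works = works.dropna(subset=["anime"])

# One row per (work, tag), then count how often each tag appears per anime.
counts = (
    anime_works.explode("taglist")
    .groupby(["anime", "taglist"])
    .size()
    .unstack(fill_value=0)
)
print(counts)
```

Each row of `counts` is one anime and each column is a tag that appeared alongside it, including the anime's own fandom tag.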

Show me the Code

os.chdir(r'/Users/emilyzou/Desktop/final/chunks')
filelist = glob.glob('*.csv')
for file in filelist: 
    da = pd.read_csv(file)
    da ['taglist'] = da ['tags'].map(lambda s: str(s).split('+'))  
    da['checkisin'] = da['taglist'].apply(checker)
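    animed = da[da['checkisin']].copy()  # copy so the assignments below don't write into a view of da
    animed ['anime'] = animed ['taglist'].apply(lambda x: list(set(x) & set(animetags))) 
    animed ['anime'] = animed ['anime'].apply(lambda x: x[0])  # if several anime match, only one is kept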
    anidf = animed [['creation date', 'language', 'word_count', 'taglist', 'anime']].reset_index()
    anidf['tag_length'] = anidf['taglist'].apply(len)
    explode = anidf.explode('taglist')
    exploded = explode.groupby(['anime', 'taglist']).size().unstack(fill_value = None).reset_index()
    removeplease = ['anime', 'taglist']
    list1 = [x for x in list(exploded.columns) if x not in removeplease]
    exploded ['anime'] =  exploded['anime'].astype(pd.StringDtype())
    exploded ['multicheck'] = exploded['anime'].apply(multichecker)
    data = exploded[exploded['multicheck'] == False]
    rows = data.set_index(['anime']).to_dict('index')  # one {tag_id: count} mapping per anime
    dic = {k:nonan(v) for k, v in rows.items()}
    dic5 = {k:topfive(v) for k, v in dic.items()}
    dm = df 
    tagged = list(data.columns)
    dm['id'] =  dm['id'].astype(pd.StringDtype())
    othertags = dm[dm['id'].isin (tagged)]
    tag2dict = pd.Series(othertags.name.values, index = othertags.id).to_dict()
    t2dict = {str(k):v for k,v in tag2dict.items()}
    def replace_tags (s):
        if s in list(t2dict.keys()): 
            return t2dict[s]
        else:
            return None
    def dictreplace_tags (dict): 
        return {replace_tags(k): v for k, v in dict.items()}
    dictt = {replace_titles(k): dictreplace_tags(v) for k, v in dic5.items()}
    pd.DataFrame.from_dict(dictt,  orient = 'index').to_csv('results{}'.format(file))

The works data, like I wrote before, is really, really big, so I had split it into equal chunks of 10,000 lines… which created 105 separate CSV files. The for loop above runs on every one of those files and, for each anime, spits out every other tag that got tagged alongside it.
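The post does not show how the split itself was done. One way to do a split like this, as a sketch, is pandas' `chunksize` option. The function name `split_csv`, the output file names, and the demo file are all hypothetical and not taken from the original workflow.

```python
import os
import tempfile

import pandas as pd

def split_csv(path, out_dir, rows_per_chunk=10_000):
    """Write `path` out as numbered CSV files of at most `rows_per_chunk` rows each."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    # With chunksize set, read_csv yields DataFrames one piece at a time
    # instead of loading the whole file into memory.
    for i, chunk in enumerate(pd.read_csv(path, chunksize=rows_per_chunk)):
        out_path = os.path.join(out_dir, f"chunk_{i:03d}.csv")
        chunk.to_csv(out_path, index=False)
        written.append(out_path)
    return written

# Small demonstration on a throwaway file (25 rows, 10 rows per chunk).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "works_demo.csv")
    pd.DataFrame({"tags": ["1+2"] * 25}).to_csv(src, index=False)
    files = split_csv(src, os.path.join(tmp, "chunks"), rows_per_chunk=10)
    sizes = [len(pd.read_csv(f)) for f in files]
    print(sizes)
```

The last chunk is simply whatever is left over, so the pieces are only equal in size when the row count divides evenly.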

Show me the Code

os.chdir(r'/Users/emilyzou/Desktop/final/chunks/results')
csv_files = []
filelist = glob.glob('*.csv')
for file in filelist: 
    _dg = pd.read_csv(file)
    csv_files.append(_dg)

merged = pd.concat(csv_files)

Then, I merged all 105 of my chunks into one big file. Let’s take a look at what we got!
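Concatenating per-chunk results leaves the same anime on several rows, one per chunk, so the counts still need to be added up. A minimal sketch with two invented chunk results (the numbers and tag names are placeholders):

```python
import pandas as pd

nan = float("nan")

# Invented per-chunk results: rows are anime, columns are co-occurring tags.
chunk_a = pd.DataFrame({"anime": ["Naruto", "One Piece"], "Fluff": [7, 5], "Angst": [6, nan]})
chunk_b = pd.DataFrame({"anime": ["Naruto"], "Fluff": [9], "Humor": [5]})

# Stacking keeps one row per (chunk, anime); tags missing from a chunk become NaN.
stacked = pd.concat([chunk_a, chunk_b], ignore_index=True)

# Summing within each anime collapses the chunks; sum() skips NaN values.
totals = stacked.groupby("anime").sum()
totals["total"] = totals.sum(axis=1)
print(totals)
```

Note that in the loop above the `topfive` cutoff runs on each chunk before this merge, so a tag that appears fewer than five times with an anime in every individual chunk never reaches the merged table, however often it appears overall.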

Show me the Code

merged = merged.rename(columns = {'Unnamed: 0': 'anime'})

Show me the Code

all = merged.groupby('anime').sum()
all ['total'] = all.sum(numeric_only = True, axis = 1)
all.sort_values(by = ['total'], ascending = False).to_csv('all.csv')

Dataframe with every tag as a column, and counts by how many times they appeared with an anime in one work

Show me the Code

all.sort_values(by = ['total'], ascending = False).head()

	General Audiences	One Piece	Roronoa Zoro	Teen And Up Audiences	Fluff	Nami (One Piece)	Monkey D. Luffy	Nico Robin	Unnamed: 9	Portgas D. Ace	...	Magic-Users	Kamui (Gintama)	Crossdressing	Major character death - Freeform	Riding	Fukawa Touko/Togami Byakuya	POV Outsider	Alternate Universe - Bakery	Alternate Universe - Neighbors	total
anime
Haikyuu!!	9880.0	0.0	0.0	10344.0	8847.0	0.0	0.0	0.0	67.0	0.0	...	0.0	0.0	5.0	5.0	6.0	0.0	5.0	6.0	5.0	310418.0
Naruto	2757.0	11.0	0.0	3569.0	1353.0	0.0	0.0	0.0	67.0	0.0	...	9.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	99093.0
Shingeki no Kyojin \| Attack on Titan	1379.0	0.0	0.0	1941.0	1208.0	0.0	0.0	0.0	39.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	76630.0
Dangan Ronpa - All Media Types	1494.0	0.0	0.0	2263.0	1357.0	0.0	0.0	0.0	81.0	0.0	...	0.0	0.0	0.0	0.0	0.0	6.0	0.0	0.0	0.0	48513.0
Miraculous Ladybug	3066.0	0.0	0.0	2933.0	1523.0	0.0	0.0	0.0	198.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	38693.0

5 rows × 1428 columns

Show me the Code

d = pd.read_csv('all.csv').set_index(['anime'])
d.T.sort_values(by = ['Haikyuu!!'], ascending = False).to_csv('trans.csv')

Looks like Haikyuu is the most popular fandom within anime… let’s see which tags get circulated the most!

Most commonly used tag with Haikyuu

Show me the Code

d.T.sort_values(by = ['Haikyuu!!'], ascending = False).iloc[:,0:1].head(20)

anime	Haikyuu!!
total	310418.0
Haikyuu!!	31388.0
M/M	24043.0
No Archive Warnings Apply	17782.0
Choose Not To Use Archive Warnings	11513.0
Teen And Up Audiences	10344.0
General Audiences	9880.0
Fluff	8847.0
Hinata Shouyou	8539.0
Oikawa Tooru	7225.0
Kuroo Tetsurou	7123.0
Kageyama Tobio	6927.0
Bokuto Koutarou	6823.0
Akaashi Keiji	6086.0
Tsukishima Kei	5806.0
Iwaizumi Hajime	5582.0
F/M	5217.0
Kozume Kenma	5177.0
Angst	4798.0
Sugawara Koushi	4593.0

Looks about right.

I’m going to transpose the dataframe just to see if there’s anything glaringly wrong.

Transposed Dataframe

Show me the Code

d.T

anime	Haikyuu!!	Naruto	Shingeki no Kyojin \| Attack on Titan	Dangan Ronpa - All Media Types	Miraculous Ladybug	One Piece	RWBY	Avatar: Legend of Korra	Hunter X Hunter	Hetalia: Axis Powers	...	Yu-Gi-Oh! ARC-V	Umineko no Naku Koro ni \| When the Seagulls Cry	Fate/stay night (Visual Novel)	Fate/Zero	No. 6 (Anime & Manga)	Pocket Monsters: Diamond & Pearl & Platinum \| Pokemon Diamond Pearl Platinum Versions	Psycho-Pass	Senki Zesshou Symphogear	Pokemon Mystery Dungeon	Saiyuki
General Audiences	9880.0	2757.0	1379.0	1494.0	3066.0	1604.0	1212.0	899.0	632.0	959.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
One Piece	0.0	11.0	0.0	0.0	0.0	5429.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
Roronoa Zoro	0.0	0.0	0.0	0.0	0.0	1794.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
Teen And Up Audiences	10344.0	3569.0	1941.0	2263.0	2933.0	1654.0	1641.0	914.0	869.0	1072.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
Fluff	8847.0	1353.0	1208.0	1357.0	1523.0	759.0	426.0	545.0	477.0	161.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
Fukawa Touko/Togami Byakuya	0.0	0.0	0.0	6.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
POV Outsider	5.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
Alternate Universe - Bakery	6.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
Alternate Universe - Neighbors	5.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
total	310418.0	99093.0	76630.0	48513.0	38693.0	37209.0	31140.0	20980.0	20486.0	19684.0	...	26.0	22.0	20.0	20.0	15.0	15.0	12.0	10.0	6.0	5.0

1428 rows × 80 columns

These numbers are a bit off… I know from looking at the initial overall ship data that Levi/Eren Yeager should make up a significant chunk, but that isn’t reflected here.

I guess most people don’t simultaneously tag a fandom and their ship, but that’s okay. We’re only using this step to grab a list of anime ships, by collecting every tag that gets tagged alongside an anime. It’d be pretty hard for a ship to fall through the cracks this way.

I made a list of every relationship tag name in the original tag CSV file, and then used a list comprehension to keep only the columns of the merged dataframe whose names are on that list. That gives us every ship tag that got mentioned alongside our list of anime.
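As a small illustration of that filtering step, here is a sketch on an invented count table. `relationship_names` stands in for the full list of relationship tag names; it is a set here so that each membership check is fast, which is a choice of this sketch rather than something the original code does.

```python
import pandas as pd

# Invented co-occurrence counts: rows are anime, columns are tags of any type.
counts = pd.DataFrame(
    {
        "Fluff": [8, 1],
        "Hinata Shouyou/Kageyama Tobio": [6, 0],
        "Uchiha Sasuke/Uzumaki Naruto": [0, 9],
    },
    index=["Haikyuu!!", "Naruto"],
)

# Stand-in for every relationship tag name in the tag file.
relationship_names = {
    "Hinata Shouyou/Kageyama Tobio",
    "Uchiha Sasuke/Uzumaki Naruto",
    "L/Yagami Light",
}

# Keep only the columns that are relationship tags.
ship_cols = [c for c in counts.columns if c in relationship_names]
ships = counts[ship_cols]

# For each anime, the ship tag with the highest count.
top_ship_per_anime = ships.idxmax(axis=1)
print(top_ship_per_anime)
```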

Show me the Code

ships = all[[c for c in all.columns if c in renames]]
ships.T.max().sort_values(ascending = False).head(20)

anime
Haikyuu!!                                   3664.0
Naruto                                      1245.0
Avatar: Legend of Korra                      945.0
RWBY                                         907.0
Dangan Ronpa - All Media Types               867.0
Hunter X Hunter                              800.0
Shingeki no Kyojin | Attack on Titan         690.0
InuYasha - A Feudal Fairy Tale               525.0
Dragon Ball                                  210.0
Super Dangan Ronpa 2                         209.0
Fairy Tail                                   151.0
Fullmetal Alchemist: Brotherhood & Manga     140.0
Bleach                                       128.0
Fruits Basket                                103.0
Gintama                                       89.0
Durarara!!                                    73.0
Kuroshitsuji | Black Butler                   58.0
One Piece                                     48.0
Yu-Gi-Oh! 5D's                                37.0
Tennis no Oujisama | Prince of Tennis         33.0
dtype: float64

This is what we’d get if we only looked at ships that got tagged with an anime title, which is unfortunately not realistic.

Show me the Code

ships.max().sort_values(ascending = False)

Iwaizumi Hajime/Oikawa Tooru        3664.0
Hinata Shouyou/Kageyama Tobio       3325.0
Tsukishima Kei/Yamaguchi Tadashi    2107.0
Sawamura Daichi/Sugawara Koushi     1797.0
Uchiha Sasuke/Uzumaki Naruto        1245.0
                                     ...  
Kisaragi Shintaro/Tateyama Ayano       5.0
Hyuuga Neji/Nara Shikamaru             5.0
Ishimaru Kiyotaka & Oowada Mondo       5.0
Roy Mustang/Riza Hawkeye               5.0
Iason Mink/Riki                        5.0
Length: 211, dtype: float64

But we don’t care about these numbers. With these column names, we can get our final list of relevant ship tags.

Show me the Code

shiplist = list(ships.columns.values)
shiplist[0:15]

['Higurashi Kagome/InuYasha',
 'Haruno Sakura/Uchiha Sasuke',
 'Hyuuga Hinata/Uzumaki Naruto',
 'Senju Hashirama/Senju Tobirama',
 'Hatake Kakashi/Umino Iruka',
 'Uchiha Sasuke/Uzumaki Naruto',
 'Hoshigaki Kisame/Uchiha Itachi',
 'Senju Tobirama/Uchiha Madara',
 'Naegi Makoto/Togami Byakuya',
 'Hinata Hajime/Komaeda Nagito',
 'Korra/Asami Sato',
 'Hisoka/Illumi Zoldyck',
 'Gon Freecs/Killua Zoldyck',
 'Gon Freecs & Killua Zoldyck',
 'Grimmjow Jaegerjaques/Kurosaki Ichigo']

Now, let’s filter it with our original tag CSV!

Show me the Code

animeships = ds[ds['name'].isin (shiplist)]
animeships.head(20)

	id	type	name	canonical	cached_count	merger_id
175042	267347	Relationship	Minor or Background Relationship(s)	True	35799	NaN
660211	976131	Relationship	Levi/Eren Yeager	False	21010	5582955.0
929282	1329922	Relationship	Iwaizumi Hajime/Oikawa Tooru	True	18027	NaN
494604	758209	Relationship	Hinata Shouyou/Kageyama Tobio	True	17150	NaN
11553	14303	Relationship	Uchiha Sasuke/Uzumaki Naruto	True	13393	NaN
376089	604125	Relationship	Other Relationship Tags to Be Added	True	11085	NaN
554549	836528	Relationship	Sawamura Daichi/Sugawara Koushi	True	8743	NaN
972783	1408234	Relationship	Tsukishima Kei/Yamaguchi Tadashi	True	8035	NaN
597975	893104	Relationship	Levi/Erwin Smith	False	7521	2403411.0
687472	1011847	Relationship	Marco Bott/Jean Kirstein	True	6755	NaN
11548	14298	Relationship	Hatake Kakashi/Umino Iruka	True	6741	NaN
577424	865991	Relationship	Nanase Haruka/Tachibana Makoto	True	6319	NaN
271602	471636	Relationship	Korra/Asami Sato	True	6188	NaN
8297	10230	Relationship	Edward Elric/Roy Mustang	True	6114	NaN
733271	1072769	Relationship	Blake Belladonna/Yang Xiao Long	True	5556	NaN
76081	110362	Relationship	Haruno Sakura/Uchiha Sasuke	True	5071	NaN
458156	711036	Relationship	Hinata Hajime/Komaeda Nagito	True	4572	NaN
667984	986370	Relationship	eruri	False	4273	2403411.0
953482	1362296	Relationship	Azumane Asahi/Nishinoya Yuu	True	4224	NaN
78454	113712	Relationship	Heiwajima Shizuo/Orihara Izaya	True	4127	NaN

And we’re done!

It looks like, as of 2021 at least, Levi and Eren Yeager from Attack on Titan were the most popular ship (skipping catch-all tags like “Minor or Background Relationship(s)” and “Other Relationship Tags to Be Added”, which aren’t ships). They’re followed by Iwaizumi Hajime and Oikawa Tooru from Haikyuu!!, by a margin of around 3,000. Hinata Shouyou and Kageyama Tobio, also from Haikyuu!!, take third place, lagging by fewer than 1,000. Fourth place is Uchiha Sasuke and Uzumaki Naruto from Naruto.

Fifth and sixth place also go to the Haikyuu!! fandom: fifth is Sawamura Daichi/Sugawara Koushi, and sixth is Tsukishima Kei/Yamaguchi Tadashi. Attack on Titan makes a comeback at seventh and eighth place with Levi/Erwin Smith and Marco Bott/Jean Kirstein, though both lag far behind Levi/Eren Yeager. Naruto takes ninth place with Hatake Kakashi/Umino Iruka.

Despite Attack on Titan taking first place, it looks like Haikyuu!! dominates the leaderboard overall, with numerous ships enjoying similar levels of popularity.

Tenth place goes to the first anime that isn’t Attack on Titan, Haikyuu!!, or Naruto: Nanase Haruka/Tachibana Makoto from Free!. Eleventh place sees the first non-Male/Male pairing, with Korra/Asami Sato from Legend of Korra. Twelfth is Edward Elric/Roy Mustang from Fullmetal Alchemist: Brotherhood, and thirteenth is Blake Belladonna/Yang Xiao Long from RWBY.

Was my brother or I more correct?

Show me the Code

animeships = animeships.reset_index()

Show me the Code

print (animeships[animeships['name'] == "L/Yagami Light"])

    index     id          type            name  canonical  cached_count  \
23   9812  12239  Relationship  L/Yagami Light       True          3881   

    merger_id  
23        NaN

Show me the Code

print (animeships[animeships['name'] == "Mikasa Ackerman/Eren Yeager"])

     index       id          type                         name  canonical  \
54  687489  1011866  Relationship  Mikasa Ackerman/Eren Yeager       True   

    cached_count  merger_id  
54          1622        NaN

Looks like I was closer to being correct! Even though we were both far, far from being right overall… Of course, this may be because of the way I chose to measure a ship’s popularity. It’s pretty easy to argue that a more popular ship would inspire more fanfiction, but fanfiction writers make up a specialized group within a fandom, and they tend to heavily favor Male/Male relationships. Thanks for reading! Apologies to any anime/AO3 enthusiasts if I made any obvious missteps… I can only be considered an amateur in both realms.