the most popular ships in anime

Author

Emily Zou

Published

March 11, 2023

The other day, my brother and I were arguing over which ship was the most popular in anime (overall and historically)… I said it was L and Yagami Light from Death Note, and he said it was Eren and Mikasa from Attack on Titan. Without rigorously defining “popular” and “in anime”, it’s hard to say which method is better than another, but since this is just for fun, I decided not to dwell on that. I just wanted to prove my brother wrong…

L and Light Yagami from Death Note (2007), the horse Emily is betting on
Eren Yeager and Mikasa Ackerman from Attack on Titan (2013), my brother’s bet

I decided to measure a ship’s “popularity” by how many fanfictions had been written about it (though there’s probably a lot of room to debate that). Archive of Our Own (AO3) helpfully published a data dump of all works posted on the site up to March 2021, so I used that. Feel free to check it out! Sadly, though, you won’t find any Demon Slayer, Spy x Family, or Jujutsu Kaisen.

import os 
import glob
import pandas as pd
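df = pd.read_csv('tag.csv')                              # the tag table from the AO3 dump
dg = df.sort_values(by='cached_count', ascending=False)  # most-used tags first
ds = dg[dg['name'] != "Redacted"]                        # drop tags whose names are redacted in the dump
dl = ds[ds['type'] == "Fandom"]                          # fandom tags only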

Their data is (very, very nicely) organized into two files: tags and works (the works file is a hefty 1GB). Each tag has an ID and is sorted into a category (fandom, relationship, character, etc.). Each tag also has a “cached_count”, which is just how many times it shows up, and that will be very helpful later.
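To make that layout concrete, here’s a tiny sketch of how the two files fit together. It assumes, as the processing loop later in this post does, that each work’s tags column holds tag IDs joined by “+”. The three tag rows are copied from outputs in this post; the works rows are made up.

```python
import pandas as pd

# A few real rows from the tag table shown in this post (columns trimmed)
tags = pd.DataFrame({
    "id": [758208, 13999, 1329922],
    "type": ["Fandom", "Fandom", "Relationship"],
    "name": ["Haikyuu!!", "Naruto", "Iwaizumi Hajime/Oikawa Tooru"],
})

# Hypothetical works rows: each work's tags column holds tag IDs joined by "+"
works = pd.DataFrame({"tags": ["758208+1329922", "13999"]})

# Map each ID to its name; keys are strings because split() yields strings
id_to_name = {str(k): v for k, v in zip(tags["id"], tags["name"])}

# Split each work's tag string and translate IDs, leaving unknown IDs unchanged
works["tag_names"] = works["tags"].map(
    lambda s: [id_to_name.get(t, t) for t in str(s).split("+")]
)
print(works["tag_names"].tolist())
# [['Haikyuu!!', 'Iwaizumi Hajime/Oikawa Tooru'], ['Naruto']]
```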

I first wanted to see which ships were most popular overall, and quickly realized the problem with finding the most popular ships in anime: the overall rankings mix in every other fandom, and nothing in the tag data marks which fandoms are anime.

Anime Titles only

dl.head()
            id    type                                  name  canonical  cached_count  merger_id
494603  758208  Fandom                             Haikyuu!!       True        130918        NaN
11296    13999  Fandom                                Naruto       True        105108        NaN
466212  721553  Fandom  Shingeki no Kyojin | Attack on Titan       True         60008        NaN
358410  582724  Fandom                    Miraculous Ladybug       True         55895        NaN
10317    12845  Fandom                  Hetalia: Axis Powers       True         43092        NaN
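def checker(s):
    # True if any tag ID in the work's tag list is an anime fandom ID
    # (animetags, a list of anime fandom tag IDs as strings, is built elsewhere; not shown in this post)
    return any(x in animetags for x in s)

def multichecker(s):
    # True if the string holds more than one comma-separated value
    return ',' in s

def nonan(d):
    # Drop NaN values from a dict (NaN is the only value not equal to itself)
    return {k: v for k, v in d.items() if v == v}

def topfive(d):
    # Despite the name, keeps every tag that co-occurred at least 5 times, not just the top five
    return {k: v for k, v in d.items() if v >= 5.0}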
# Map every tag ID (as a string, to match the split tag lists) to its name
tagdict = pd.Series(df.name.values, index=df.id).to_dict()
tdict = {str(k): v for k, v in tagdict.items()}

def replace_titles(s):
    return tdict[s]

Now I could pull out every work that had at least one anime tagged.
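The core filter in the loop below boils down to “does this work share at least one ID with the anime tags?” Here’s a minimal sketch of just that step on made-up data. The post’s animetags is a list of anime fandom IDs whose construction isn’t shown; here it’s a small hypothetical set, which makes each membership check fast instead of scanning a list.

```python
import pandas as pd

# Hypothetical anime fandom IDs, stored as strings to match the split tag lists
animetags = {"758208", "13999"}

# Hypothetical works: the second one carries no anime tag
works = pd.DataFrame({"tags": ["758208+1329922", "55555+66666", "13999+758208"]})
works["taglist"] = works["tags"].map(lambda s: str(s).split("+"))

# A work qualifies if its tag list shares at least one ID with animetags
has_anime = works["taglist"].map(lambda tl: not animetags.isdisjoint(tl))
animed = works[has_anime].copy()  # copy so the column assignment below doesn't warn

# Keep one anime per work; sorting makes the pick deterministic
animed["anime"] = animed["taglist"].map(lambda tl: sorted(animetags.intersection(tl))[0])
print(animed["anime"].tolist())  # ['758208', '13999']
```

The real loop also keeps only one anime per work, but it takes whichever element a set happens to yield first; sorting here is just one way to make that choice repeatable.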

os.chdir(r'/Users/emilyzou/Desktop/final/chunks')
os.makedirs('results', exist_ok=True)  # results go in their own folder, read back in the next step
animeset = set(animetags)
filelist = glob.glob('*.csv')
for file in filelist:
    da = pd.read_csv(file)
    # each work's tags column holds tag IDs joined by '+'
    da['taglist'] = da['tags'].map(lambda s: str(s).split('+'))
    da['checkisin'] = da['taglist'].apply(checker)
    # keep works with at least one anime tag; .copy() avoids SettingWithCopyWarning below
    animed = da[da['checkisin'] == True].copy()
    # if a work lists several anime, only one of them is kept
    animed['anime'] = animed['taglist'].apply(lambda x: list(set(x) & animeset)[0])
    anidf = animed[['creation date', 'language', 'word_count', 'taglist', 'anime']].reset_index()
    anidf['tag_length'] = anidf['taglist'].apply(len)
    # one row per (work, tag) pair, then count each tag per anime, with tags as columns
    explode = anidf.explode('taglist')
    exploded = explode.groupby(['anime', 'taglist']).size().unstack(fill_value=None).reset_index()
    exploded['anime'] = exploded['anime'].astype(pd.StringDtype())
    exploded['multicheck'] = exploded['anime'].apply(multichecker)
    data = exploded[exploded['multicheck'] == False]
    counts = data.set_index(['anime']).to_dict('index')
    dic = {k: nonan(v) for k, v in counts.items()}
    dic5 = {k: topfive(v) for k, v in dic.items()}
    # look up names for the co-occurring tag IDs
    dm = df
    tagged = list(data.columns)
    dm['id'] = dm['id'].astype(pd.StringDtype())
    othertags = dm[dm['id'].isin(tagged)]
    t2dict = {str(k): v for k, v in zip(othertags.id, othertags.name)}
    def replace_tags(s):
        return t2dict.get(s)  # None if the ID isn't in the tag table
    def dictreplace_tags(d):
        return {replace_tags(k): v for k, v in d.items()}
    # anime ID -> anime name, tag ID -> tag name
    dictt = {replace_titles(k): dictreplace_tags(v) for k, v in dic5.items()}
    pd.DataFrame.from_dict(dictt, orient='index').to_csv(os.path.join('results', 'results{}'.format(file)))

The works data, as I mentioned, is really, really big, so I split it into chunks of 10,000 lines each, which created 105 separate CSV files. The loop above runs on every one of those files and pulls out all the other tags that appeared alongside each anime, keeping only tags that show up with that anime at least five times within a chunk.
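The splitting step itself isn’t shown. One way to do it, sketched here as an assumption rather than the exact method used, is pandas’ chunksize option on read_csv, which streams a big CSV in fixed-size pieces without loading it all at once. The function name split_csv and the file names works.csv and chunk_000.csv are hypothetical.

```python
import os
import tempfile

import pandas as pd


def split_csv(path, out_dir, rows_per_chunk=10_000):
    """Write path out as numbered CSV chunks of at most rows_per_chunk rows each."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    # chunksize makes read_csv yield DataFrames of up to rows_per_chunk rows
    with pd.read_csv(path, chunksize=rows_per_chunk) as reader:
        for i, chunk in enumerate(reader):
            out = os.path.join(out_dir, f"chunk_{i:03d}.csv")
            chunk.to_csv(out, index=False)  # each chunk file gets its own header row
            written.append(out)
    return written


# Usage on a small synthetic file: 25 rows in chunks of 10 gives 3 files
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "works.csv")
    pd.DataFrame({"tags": ["1+2"] * 25}).to_csv(src, index=False)
    files = split_csv(src, os.path.join(tmp, "chunks"), rows_per_chunk=10)
    print(len(files))  # 3
```

Splitting on parsed rows rather than raw text lines also keeps a row intact if a quoted field happens to contain a newline.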

os.chdir(r'/Users/emilyzou/Desktop/final/chunks/results')
csv_files = []
filelist = glob.glob('*.csv')
for file in filelist: 
    _dg = pd.read_csv(file)
    csv_files.append(_dg)

merged = pd.concat(csv_files)

Then I merged the results from all 105 chunks into one big DataFrame. Let’s take a look at what we got!

merged = merged.rename(columns = {'Unnamed: 0': 'anime'})
all = merged.groupby('anime').sum()
all ['total'] = all.sum(numeric_only = True, axis = 1)
all.sort_values(by = ['total'], ascending = False).to_csv('all.csv')

DataFrame with one row per anime and one column per tag, counting how many works carried both that anime and that tag

all.sort_values(by = ['total'], ascending = False).head()
General Audiences One Piece Roronoa Zoro Teen And Up Audiences Fluff Nami (One Piece) Monkey D. Luffy Nico Robin Unnamed: 9 Portgas D. Ace ... Magic-Users Kamui (Gintama) Crossdressing Major character death - Freeform Riding Fukawa Touko/Togami Byakuya POV Outsider Alternate Universe - Bakery Alternate Universe - Neighbors total
anime
Haikyuu!! 9880.0 0.0 0.0 10344.0 8847.0 0.0 0.0 0.0 67.0 0.0 ... 0.0 0.0 5.0 5.0 6.0 0.0 5.0 6.0 5.0 310418.0
Naruto 2757.0 11.0 0.0 3569.0 1353.0 0.0 0.0 0.0 67.0 0.0 ... 9.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 99093.0
Shingeki no Kyojin | Attack on Titan 1379.0 0.0 0.0 1941.0 1208.0 0.0 0.0 0.0 39.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 76630.0
Dangan Ronpa - All Media Types 1494.0 0.0 0.0 2263.0 1357.0 0.0 0.0 0.0 81.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 6.0 0.0 0.0 0.0 48513.0
Miraculous Ladybug 3066.0 0.0 0.0 2933.0 1523.0 0.0 0.0 0.0 198.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 38693.0

5 rows × 1428 columns
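Here’s a small sketch of why concatenating the chunk results and then grouping by anime with sum() gives one combined count per anime: each chunk contributes its own row for an anime, concat lines the tag columns up by name, and sum adds them while treating gaps as zero. The two chunk frames and the anime names are made up.

```python
import pandas as pd

# Two hypothetical per-chunk results: one row per anime, one column per co-occurring tag
chunk_a = pd.DataFrame({"anime": ["Anime B", "Anime A"], "Fluff": [5, 9], "Angst": [None, 7]})
chunk_b = pd.DataFrame({"anime": ["Anime A"], "Fluff": [6], "M/M": [8]})

merged = pd.concat([chunk_a, chunk_b])    # columns align by name; missing ones become NaN
combined = merged.groupby("anime").sum()  # NaN is skipped, so missing counts add up as 0
combined["total"] = combined.sum(axis=1)
print(combined)
```

One consequence worth keeping in mind: the at-least-five filter runs per chunk before this sum, so a tag that appears four times with an anime in every chunk never makes it into the total.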

d = pd.read_csv('all.csv').set_index(['anime'])
d.T.sort_values(by = ['Haikyuu!!'], ascending = False).to_csv('trans.csv')

Looks like Haikyuu!! is the most popular fandom within anime… let’s see which tags show up with it most often!

Most commonly used tags with Haikyuu!!

d.T.sort_values(by = ['Haikyuu!!'], ascending = False).iloc[:,0:1].head(20)
anime Haikyuu!!
total 310418.0
Haikyuu!! 31388.0
M/M 24043.0
No Archive Warnings Apply 17782.0
Choose Not To Use Archive Warnings 11513.0
Teen And Up Audiences 10344.0
General Audiences 9880.0
Fluff 8847.0
Hinata Shouyou 8539.0
Oikawa Tooru 7225.0
Kuroo Tetsurou 7123.0
Kageyama Tobio 6927.0
Bokuto Koutarou 6823.0
Akaashi Keiji 6086.0
Tsukishima Kei 5806.0
Iwaizumi Hajime 5582.0
F/M 5217.0
Kozume Kenma 5177.0
Angst 4798.0
Sugawara Koushi 4593.0

Looks about right.
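Transposing the whole table works, but if you only want one anime’s top tags, a sketch like this skips the transpose and leaves the “total” entry out of the ranking. The anime names and counts are made up.

```python
import pandas as pd

# Hypothetical counts: anime as rows, tags as columns
d = pd.DataFrame(
    {"Fluff": [10.0, 3.0], "Angst": [4.0, 8.0], "M/M": [20.0, 1.0]},
    index=pd.Index(["Anime A", "Anime B"], name="anime"),
)
d["total"] = d.sum(axis=1)

# One anime's row, minus the total, ranked from most to least common tag
top = d.loc["Anime A"].drop("total").nlargest(2)
print(top)
```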

I’m going to transpose the dataframe just to see if there’s anything glaringly wrong.

Transposed Dataframe

d.T
anime Haikyuu!! Naruto Shingeki no Kyojin | Attack on Titan Dangan Ronpa - All Media Types Miraculous Ladybug One Piece RWBY Avatar: Legend of Korra Hunter X Hunter Hetalia: Axis Powers ... Yu-Gi-Oh! ARC-V Umineko no Naku Koro ni | When the Seagulls Cry Fate/stay night (Visual Novel) Fate/Zero No. 6 (Anime & Manga) Pocket Monsters: Diamond & Pearl & Platinum | Pokemon Diamond Pearl Platinum Versions Psycho-Pass Senki Zesshou Symphogear Pokemon Mystery Dungeon Saiyuki
General Audiences 9880.0 2757.0 1379.0 1494.0 3066.0 1604.0 1212.0 899.0 632.0 959.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
One Piece 0.0 11.0 0.0 0.0 0.0 5429.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Roronoa Zoro 0.0 0.0 0.0 0.0 0.0 1794.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Teen And Up Audiences 10344.0 3569.0 1941.0 2263.0 2933.0 1654.0 1641.0 914.0 869.0 1072.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Fluff 8847.0 1353.0 1208.0 1357.0 1523.0 759.0 426.0 545.0 477.0 161.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Fukawa Touko/Togami Byakuya 0.0 0.0 0.0 6.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
POV Outsider 5.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Alternate Universe - Bakery 6.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Alternate Universe - Neighbors 5.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
total 310418.0 99093.0 76630.0 48513.0 38693.0 37209.0 31140.0 20980.0 20486.0 19684.0 ... 26.0 22.0 20.0 20.0 15.0 15.0 12.0 10.0 6.0 5.0

1428 rows × 80 columns

These numbers are a bit off… I know from looking at the initial overall ship data that Levi/Eren Yeager should make up a significant chunk, but that isn’t reflected here.

I guess most people don’t tag both the fandom and their ship at the same time, but that’s okay. We’re only using this step to collect a list of anime ships, by grabbing every tag that appears alongside an anime, and it would be pretty hard for a ship to fall through the cracks this way.

I made a list of every relationship tag name in the original tag CSV file (the renames list in the code), then used a list comprehension to keep only the tag columns that match one of those relationships. That gives us every ship tag that was mentioned alongside our list of anime.
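The post doesn’t show how renames is built. Assuming it’s simply every tag name whose type is Relationship, a sketch of that plus the column filter could look like this. Both miniature tables are made up (the ship and fandom names are borrowed from this post).

```python
import pandas as pd

# Hypothetical miniature tag table (same column names as the tag CSV)
tags = pd.DataFrame({
    "type": ["Relationship", "Fandom", "Relationship", "Freeform"],
    "name": ["Uchiha Sasuke/Uzumaki Naruto", "Naruto", "Korra/Asami Sato", "Fluff"],
})

# Hypothetical co-occurrence table: anime rows, tag-name columns
table = pd.DataFrame(
    {"Uchiha Sasuke/Uzumaki Naruto": [12.0], "Fluff": [30.0], "Naruto": [40.0]},
    index=pd.Index(["Naruto"], name="anime"),
)

# Assumed stand-in for renames: every Relationship tag name, as a set for fast lookups
relationship_names = set(tags.loc[tags["type"] == "Relationship", "name"])

# Keep only the columns that are ship tags
ships = table[[c for c in table.columns if c in relationship_names]]
print(list(ships.columns))  # ['Uchiha Sasuke/Uzumaki Naruto']
```

Using a set means each “c in …” check is a hash lookup; with a plain list, every one of the ~1,400 columns would scan the whole list of relationship names.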

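# renames: every Relationship tag name from the tag CSV (built earlier; not shown in this post)
ships = all[[c for c in all.columns if c in renames]]
# for each anime, the count of its most frequently co-tagged ship
ships.T.max().sort_values(ascending=False).head(20)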
anime
Haikyuu!!                                   3664.0
Naruto                                      1245.0
Avatar: Legend of Korra                      945.0
RWBY                                         907.0
Dangan Ronpa - All Media Types               867.0
Hunter X Hunter                              800.0
Shingeki no Kyojin | Attack on Titan         690.0
InuYasha - A Feudal Fairy Tale               525.0
Dragon Ball                                  210.0
Super Dangan Ronpa 2                         209.0
Fairy Tail                                   151.0
Fullmetal Alchemist: Brotherhood & Manga     140.0
Bleach                                       128.0
Fruits Basket                                103.0
Gintama                                       89.0
Durarara!!                                    73.0
Kuroshitsuji | Black Butler                   58.0
One Piece                                     48.0
Yu-Gi-Oh! 5D's                                37.0
Tennis no Oujisama | Prince of Tennis         33.0
dtype: float64

This is each anime’s top ship count if we only counted works that tagged the ship together with the anime title, which unfortunately undercounts most ships.

ships.max().sort_values(ascending = False)
Iwaizumi Hajime/Oikawa Tooru        3664.0
Hinata Shouyou/Kageyama Tobio       3325.0
Tsukishima Kei/Yamaguchi Tadashi    2107.0
Sawamura Daichi/Sugawara Koushi     1797.0
Uchiha Sasuke/Uzumaki Naruto        1245.0
                                     ...  
Kisaragi Shintaro/Tateyama Ayano       5.0
Hyuuga Neji/Nara Shikamaru             5.0
Ishimaru Kiyotaka & Oowada Mondo       5.0
Roy Mustang/Riza Hawkeye               5.0
Iason Mink/Riki                        5.0
Length: 211, dtype: float64

But we don’t actually care about these numbers. What we need are the column names, which give us our final list of relevant ship tags.

shiplist = list(ships.columns.values)
shiplist[0:15]
['Higurashi Kagome/InuYasha',
 'Haruno Sakura/Uchiha Sasuke',
 'Hyuuga Hinata/Uzumaki Naruto',
 'Senju Hashirama/Senju Tobirama',
 'Hatake Kakashi/Umino Iruka',
 'Uchiha Sasuke/Uzumaki Naruto',
 'Hoshigaki Kisame/Uchiha Itachi',
 'Senju Tobirama/Uchiha Madara',
 'Naegi Makoto/Togami Byakuya',
 'Hinata Hajime/Komaeda Nagito',
 'Korra/Asami Sato',
 'Hisoka/Illumi Zoldyck',
 'Gon Freecs/Killua Zoldyck',
 'Gon Freecs & Killua Zoldyck',
 'Grimmjow Jaegerjaques/Kurosaki Ichigo']

Now, let’s look those ships up in our original tag CSV, whose cached_count covers every use of a tag, not just the ones alongside an anime title!

# look every ship up in the full tag table; ds is already sorted by cached_count, most-used first
animeships = ds[ds['name'].isin(shiplist)]
animeships.head(20)
id type name canonical cached_count merger_id
175042 267347 Relationship Minor or Background Relationship(s) True 35799 NaN
660211 976131 Relationship Levi/Eren Yeager False 21010 5582955.0
929282 1329922 Relationship Iwaizumi Hajime/Oikawa Tooru True 18027 NaN
494604 758209 Relationship Hinata Shouyou/Kageyama Tobio True 17150 NaN
11553 14303 Relationship Uchiha Sasuke/Uzumaki Naruto True 13393 NaN
376089 604125 Relationship Other Relationship Tags to Be Added True 11085 NaN
554549 836528 Relationship Sawamura Daichi/Sugawara Koushi True 8743 NaN
972783 1408234 Relationship Tsukishima Kei/Yamaguchi Tadashi True 8035 NaN
597975 893104 Relationship Levi/Erwin Smith False 7521 2403411.0
687472 1011847 Relationship Marco Bott/Jean Kirstein True 6755 NaN
11548 14298 Relationship Hatake Kakashi/Umino Iruka True 6741 NaN
577424 865991 Relationship Nanase Haruka/Tachibana Makoto True 6319 NaN
271602 471636 Relationship Korra/Asami Sato True 6188 NaN
8297 10230 Relationship Edward Elric/Roy Mustang True 6114 NaN
733271 1072769 Relationship Blake Belladonna/Yang Xiao Long True 5556 NaN
76081 110362 Relationship Haruno Sakura/Uchiha Sasuke True 5071 NaN
458156 711036 Relationship Hinata Hajime/Komaeda Nagito True 4572 NaN
667984 986370 Relationship eruri False 4273 2403411.0
953482 1362296 Relationship Azumane Asahi/Nishinoya Yuu True 4224 NaN
78454 113712 Relationship Heiwajima Shizuo/Orihara Izaya True 4127 NaN

And we’re done!

It looks like, as of 2021 at least, Levi and Eren Yeager from Attack on Titan were the most popular ship (setting aside the two catch-all tags, Minor or Background Relationship(s) and Other Relationship Tags to Be Added, which aren’t actual ships). They’re followed by Iwaizumi Hajime and Oikawa Tooru from Haikyuu!!, by a margin of around 3,000. Hinata Shouyou and Kageyama Tobio, also from Haikyuu!!, take third place, fewer than 900 behind. Fourth place is Uchiha Sasuke and Uzumaki Naruto from Naruto.

Fifth and sixth place also go to the Haikyuu!! fandom: fifth is the Sawamura Daichi/Sugawara Koushi pairing, and sixth is Tsukishima Kei/Yamaguchi Tadashi. Attack on Titan makes a comeback at seventh and eighth place with Levi/Erwin Smith and Marco Bott/Jean Kirstein, though both lag far behind Levi/Eren Yeager. Naruto takes ninth place with Hatake Kakashi/Umino Iruka.

Despite Attack on Titan’s spot in first place, Haikyuu!! dominates the leaderboard overall, with numerous ships enjoying similar levels of popularity.

Tenth place sees the first ship from outside AOT, Haikyuu!!, and Naruto: Nanase Haruka/Tachibana Makoto from Free! Eleventh place sees the first non-Male/Male pairing, Korra/Asami Sato from Legend of Korra. Twelfth is Edward Elric/Roy Mustang from Fullmetal Alchemist: Brotherhood, and thirteenth is Blake Belladonna/Yang Xiao Long from RWBY.

Was my brother or I more correct?

animeships = animeships.reset_index()
print (animeships[animeships['name'] == "L/Yagami Light"])
    index     id          type            name  canonical  cached_count  \
23   9812  12239  Relationship  L/Yagami Light       True          3881   

    merger_id  
23        NaN  
print (animeships[animeships['name'] == "Mikasa Ackerman/Eren Yeager"])
     index       id          type                         name  canonical  \
54  687489  1011866  Relationship  Mikasa Ackerman/Eren Yeager       True   

    cached_count  merger_id  
54          1622        NaN  
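To put a number on “closer”, here’s a small sketch that rebuilds just those two rows by hand (the values are copied from the printouts above) and looks both ships up by name in a single step instead of two filtered prints.

```python
import pandas as pd

# The two rows printed above, rebuilt by hand (only the columns needed here)
animeships = pd.DataFrame({
    "name": ["L/Yagami Light", "Mikasa Ackerman/Eren Yeager"],
    "cached_count": [3881, 1622],
})

# Look both ships up by name at once, then compare
counts = animeships.set_index("name").loc[
    ["L/Yagami Light", "Mikasa Ackerman/Eren Yeager"], "cached_count"
]
ratio = counts["L/Yagami Light"] / counts["Mikasa Ackerman/Eren Yeager"]
print(counts.idxmax(), round(ratio, 2))  # L/Yagami Light 2.39
```

So by this measure, L/Yagami Light was used roughly 2.4 times as often as Mikasa Ackerman/Eren Yeager.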

Looks like I was closer to being correct! Even though we were both far, far from being right overall… Of course, this may be because of the way I chose to measure a ship’s popularity. It’s pretty easy to argue that a more popular ship would inspire more fanfiction, but fanfiction writers make up a specialized group within a fandom… and they tend to heavily favor Male/Male relationships. Thanks for reading! Apologies to any anime/AO3 enthusiasts if I made any obvious missteps… I can only be considered an amateur in both realms.