{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "(gc_analysis)=\n",
    "# Datenaufbereitung"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In diesem Notebook werden die Daten aus Webscrapping und der Sentimentanalyse für die Analyse vorbereitetet und angereichert. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import pandas as pd\n",
    "import numpy as np \n",
    "from pymongo import MongoClient\n",
    "from scipy.stats import ttest_ind\n",
    "from difflib import SequenceMatcher\n",
    "\n",
    "#import plotly.express as px\n",
    "#import plotly.figure_factory as ff\n",
    "#from plotly.offline import init_notebook_mode\n",
    "#init_notebook_mode() # To show plotly plots when notebook is exported to html"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "mongodb_pass = json.load(open('API_Data.json'))['mongoDB_pass'] # password mongodb user\n",
    "client = MongoClient(mongodb_pass)\n",
    "db = client.gc_nfl\n",
    "mycoll = db.gc_games"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "````{margin}\n",
    "```{note} Da die Youtube-Kommentare sehr lang sein können, wird die maximale Zeichenlänge der Zellen im Dataframe erhöht.\n",
    "```\n",
    "````"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.options.display.max_colwidth = 1000"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Hole Daten von MongoDB"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Dafür werden zu nächste die Daten aus mongodb abgerufen und in einen Dataframe umgewandelt. Die explorative Datenanalyse wird am Beispiel der Season 2021/2022 gezeigt, da diese Season bereits abgeschlossen ist."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "pipeline = [\n",
    "        {'$match':{'year':'2021'}},\n",
    "        {'$project':{\n",
    "                '_id':0\n",
    "        }}\n",
    "        ]\n",
    "\n",
    "x = mycoll.aggregate(pipeline)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "df0 = pd.DataFrame.from_dict(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Vorbereitung Entitäten"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Für den Erkenntnisgewinn sind die Enitäten der Kommentare interessant, welche Mannschaftsnamen enthalten. Die Mannschaftsnamen der NFL-Teams bestehen aus zwei Bestandteilen: einen Namen des Heimatsort (z.B. Tampa Bay) und einem Teamnamen (z.B. Buccaneers). Ziel der Data Preperation ist es so viele Enitäten, wie möglich als Mannschaftsnamen zu identifizieren. Dafür müssen Synonyme wie z.B. Spitznamen oder nur der Heimatort als Mannschaftsnamen und Rechtschreibfehler erkannt werden. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>team1</th>\n",
       "      <th>score1</th>\n",
       "      <th>team2</th>\n",
       "      <th>score2</th>\n",
       "      <th>year</th>\n",
       "      <th>week</th>\n",
       "      <th>videoID</th>\n",
       "      <th>comment</th>\n",
       "      <th>entity</th>\n",
       "      <th>salience</th>\n",
       "      <th>score</th>\n",
       "      <th>magnitude</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Buccaneers</td>\n",
       "      <td>31</td>\n",
       "      <td>Cowboys</td>\n",
       "      <td>29</td>\n",
       "      <td>2021</td>\n",
       "      <td>Week 1</td>\n",
       "      <td>HzkUcSd3Utc</td>\n",
       "      <td>They made all the bad NFL players sign as Tampa Bay Buccaneers sometimes they get heated</td>\n",
       "      <td>players</td>\n",
       "      <td>0.800480</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Buccaneers</td>\n",
       "      <td>31</td>\n",
       "      <td>Cowboys</td>\n",
       "      <td>29</td>\n",
       "      <td>2021</td>\n",
       "      <td>Week 1</td>\n",
       "      <td>HzkUcSd3Utc</td>\n",
       "      <td>They made all the bad NFL players sign as Tampa Bay Buccaneers sometimes they get heated</td>\n",
       "      <td>NFL</td>\n",
       "      <td>0.136442</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Buccaneers</td>\n",
       "      <td>31</td>\n",
       "      <td>Cowboys</td>\n",
       "      <td>29</td>\n",
       "      <td>2021</td>\n",
       "      <td>Week 1</td>\n",
       "      <td>HzkUcSd3Utc</td>\n",
       "      <td>They made all the bad NFL players sign as Tampa Bay Buccaneers sometimes they get heated</td>\n",
       "      <td>Tampa Bay Buccaneers</td>\n",
       "      <td>0.063078</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Buccaneers</td>\n",
       "      <td>31</td>\n",
       "      <td>Cowboys</td>\n",
       "      <td>29</td>\n",
       "      <td>2021</td>\n",
       "      <td>Week 1</td>\n",
       "      <td>HzkUcSd3Utc</td>\n",
       "      <td>#🐰💀 vs #🤠🦆</td>\n",
       "      <td>#🐰</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.1</td>\n",
       "      <td>0.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Buccaneers</td>\n",
       "      <td>31</td>\n",
       "      <td>Cowboys</td>\n",
       "      <td>29</td>\n",
       "      <td>2021</td>\n",
       "      <td>Week 1</td>\n",
       "      <td>HzkUcSd3Utc</td>\n",
       "      <td>OMG!! I really thought we were gonna beat the goat!! Such an amazing game!!</td>\n",
       "      <td>game</td>\n",
       "      <td>0.528493</td>\n",
       "      <td>0.9</td>\n",
       "      <td>0.9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        team1 score1    team2 score2  year    week      videoID  \\\n",
       "0  Buccaneers     31  Cowboys     29  2021  Week 1  HzkUcSd3Utc   \n",
       "1  Buccaneers     31  Cowboys     29  2021  Week 1  HzkUcSd3Utc   \n",
       "2  Buccaneers     31  Cowboys     29  2021  Week 1  HzkUcSd3Utc   \n",
       "3  Buccaneers     31  Cowboys     29  2021  Week 1  HzkUcSd3Utc   \n",
       "4  Buccaneers     31  Cowboys     29  2021  Week 1  HzkUcSd3Utc   \n",
       "\n",
       "                                                                                    comment  \\\n",
       "0  They made all the bad NFL players sign as Tampa Bay Buccaneers sometimes they get heated   \n",
       "1  They made all the bad NFL players sign as Tampa Bay Buccaneers sometimes they get heated   \n",
       "2  They made all the bad NFL players sign as Tampa Bay Buccaneers sometimes they get heated   \n",
       "3                                                                                #🐰💀 vs #🤠🦆   \n",
       "4               OMG!! I really thought we were gonna beat the goat!! Such an amazing game!!   \n",
       "\n",
       "                 entity  salience  score  magnitude  \n",
       "0               players  0.800480    0.0        0.0  \n",
       "1                   NFL  0.136442    0.0        0.0  \n",
       "2  Tampa Bay Buccaneers  0.063078    0.0        0.0  \n",
       "3                    #🐰  1.000000    0.1        0.1  \n",
       "4                  game  0.528493    0.9        0.9  "
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df0.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Berücksichtigung von Rechtschreibungsfehler"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Insofern die Entitäten mit Rechtschreibfehler von Google NLP erkannt wurden, sollen diese mit Hilfe des SequenceMatchers und Casefold zu einem \"team_by_entity\"-Wert zugeordnet werden."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "df0['entity'] = df0['entity'].apply(lambda x: x.casefold())"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Der SequenceMatcher berechent einen Ähnlichkeitswert von zwei Strings. Für das Projekt wird der Grenzwert auf 0.8 festgelegt. Mit der Funktion `match_sequence()` können die Teamnamen mit den Enität aus den Kommentaren abgeglichen werden. Liegt Ähnlichkeit vor, werden die Teamnamen der Enitäten korrigiert. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.9230769230769231"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myStr1 = \"Titans\"\n",
    "myStr2 = \"tittans\"\n",
    "\n",
    "SequenceMatcher(a=myStr1.casefold(),b=myStr2.casefold()).ratio()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "def match_sequence(string1, string2):\n",
    "    '''\n",
    "    Input: string1 -> Entität\n",
    "           string2 -> Textstelle im Kommentar\n",
    "    '''\n",
    "    value = string2\n",
    "    if (SequenceMatcher(a = string1.casefold(), b = string2.casefold()).ratio() > 0.8):\n",
    "        value = string1\n",
    "    return value"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "df0['entity']= df0.apply(lambda x: match_sequence(x['team1'], x['entity']), axis=1)\n",
    "df0['entity']= df0.apply(lambda x: match_sequence(x['team2'], x['entity']), axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Mit der Anwendung des SequenceMatcher konnten 965 weitere Enitäten zugeordnet werden. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Erstellungen eines Synonym-Wörterbuchs "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Die Datenbasis des Synonym-Wörterbuchs ist eine Tabelle aus Github, welche die vollständigen Mannschaftsnamen enthält. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "nfl_teams = pd.read_csv('https://gist.githubusercontent.com/cnizzardini/13d0a072adb35a0d5817/raw/f315c97c7677845668a9c26e9093d0d550533b00/nfl_teams.csv')\n",
    "nfl_teams['Name_1'] = nfl_teams['Name'].apply(lambda x : x.split()[-1])\n",
    "nfl_teams['Name_2'] = nfl_teams['Name'].apply(lambda x : \" \".join(x.split()[0:-1]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ID</th>\n",
       "      <th>Name</th>\n",
       "      <th>Abbreviation</th>\n",
       "      <th>Conference</th>\n",
       "      <th>Division</th>\n",
       "      <th>Name_1</th>\n",
       "      <th>Name_2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>Arizona Cardinals</td>\n",
       "      <td>ARI</td>\n",
       "      <td>NFC</td>\n",
       "      <td>West</td>\n",
       "      <td>Cardinals</td>\n",
       "      <td>Arizona</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>Atlanta Falcons</td>\n",
       "      <td>ATL</td>\n",
       "      <td>NFC</td>\n",
       "      <td>South</td>\n",
       "      <td>Falcons</td>\n",
       "      <td>Atlanta</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>Baltimore Ravens</td>\n",
       "      <td>BAL</td>\n",
       "      <td>AFC</td>\n",
       "      <td>North</td>\n",
       "      <td>Ravens</td>\n",
       "      <td>Baltimore</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>Buffalo Bills</td>\n",
       "      <td>BUF</td>\n",
       "      <td>AFC</td>\n",
       "      <td>East</td>\n",
       "      <td>Bills</td>\n",
       "      <td>Buffalo</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>Carolina Panthers</td>\n",
       "      <td>CAR</td>\n",
       "      <td>NFC</td>\n",
       "      <td>South</td>\n",
       "      <td>Panthers</td>\n",
       "      <td>Carolina</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   ID               Name Abbreviation Conference Division     Name_1  \\\n",
       "0   1  Arizona Cardinals          ARI        NFC     West  Cardinals   \n",
       "1   2    Atlanta Falcons          ATL        NFC    South    Falcons   \n",
       "2   3   Baltimore Ravens          BAL        AFC    North     Ravens   \n",
       "3   4      Buffalo Bills          BUF        AFC     East      Bills   \n",
       "4   5  Carolina Panthers          CAR        NFC    South   Panthers   \n",
       "\n",
       "      Name_2  \n",
       "0    Arizona  \n",
       "1    Atlanta  \n",
       "2  Baltimore  \n",
       "3    Buffalo  \n",
       "4   Carolina  "
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "nfl_teams.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Mit den Spalten 'Name_1' und 'Name_2' wird der vollständige Mannschaftsname in den Teamname und den Heimatort getrennt, um die ersten Synonyme zu erhalten. Der Teamname dient in den nachfolgenden Betrachtungen als zentrale Entität (\"team_by_entity\") für den Mannschaftsnamen. Alle anderen Synonyme werden dem Teamnamen zugeordnetet. \n",
    "\n",
    "Nachfolgend wird die Tabelle mit den gebildeten Entitäten verschlankt. Die Werte für \"team_by_entity\" werden mit dem zugehörigen Synonymen kombiniert und unterhalb der bestehenden Tabelle angefügt. Schließlich werden die Spaltennamen umbenannt."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_name2 = nfl_teams[['Name_1','Name_2']].copy(deep=True)\n",
    "df_name1 = nfl_teams[['Name_1']].copy(deep=True)\n",
    "df_name1['Name'] = df_name1['Name_1']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "nfl_teams = nfl_teams.merge(df_name2.rename(columns={'Name_2':'Name'}),how='outer')\n",
    "nfl_teams = nfl_teams.merge(df_name1,how='outer')\n",
    "nfl_teams.drop(['ID','Abbreviation', 'Conference','Division','Name_2'], axis=1, inplace=True)\n",
    "nfl_teams.rename(columns={'Name':'entity','Name_1':'team_by_entity'}, inplace=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>entity</th>\n",
       "      <th>team_by_entity</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Green Bay Packers</td>\n",
       "      <td>Packers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>43</th>\n",
       "      <td>Green Bay</td>\n",
       "      <td>Packers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75</th>\n",
       "      <td>Packers</td>\n",
       "      <td>Packers</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               entity team_by_entity\n",
       "11  Green Bay Packers        Packers\n",
       "43          Green Bay        Packers\n",
       "75            Packers        Packers"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vikings = nfl_teams[nfl_teams['team_by_entity'] == \"Packers\"] \n",
    "vikings"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Für die Mannschaft \"Packers\" ist \"Cheeseheads\" ein sehr gängiger Spitzname. Solche Entitäten sollen ebenfalls berücksichtigt werden. Mit der nachfolgenden Funktion können derartige Synonyme hinzugeügt werden. Exemplarisch wird dies für drei Teams durchgeführt. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "def append_pairs(df, entity, team_by_entity, single=True):\n",
    "    '''\n",
    "    takes: pandas dataframe, entity and team_by_entity, trigger for multiple or single values\n",
    "    if single=False the input for entity and team_by_entity has to be a list.\n",
    "    returns a pandas dataframe object that includes old and new data.\n",
    "    '''\n",
    "    if single:\n",
    "        new_pair = {\n",
    "                    'entity':[entity],\n",
    "                    'team_by_entity':[team_by_entity]\n",
    "                }\n",
    "    else:\n",
    "        new_pair = {\n",
    "                    'entity':entity,\n",
    "                    'team_by_entity':team_by_entity\n",
    "                }   \n",
    "    return pd.concat([df,pd.DataFrame(new_pair)])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "nfl_teams = append_pairs(nfl_teams, \"Cheeseheads\", \"Packers\", single=True)\n",
    "nfl_teams = append_pairs(nfl_teams, \"Redskins\", \"Commanders\", single=True)\n",
    "nfl_teams = append_pairs(nfl_teams, \"Bucs\", \"Buccaneers\", single=True)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Es ist denkbar, die Synonyme noch deutlich weiter zu ergänzen, z.B. mit den Namen der Quaterbacks. Dem Team \"Buccaneers\" könnte so weitere 1700 Kommentare mit der Entität \"Brady\" zugeordnet werden. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Anwendung des Synonym-Wörterbuchs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Das Synonym-Wörterbuch wird im folgenden auf den gesamten Dataframe angewendet."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "df0['entity'] = df0['entity'].apply(lambda x: x.casefold())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "df0['team_by_entity'] = None\n",
    "for index, row in nfl_teams.iterrows():\n",
    "    df0['team_by_entity'] = np.where(df0['entity'] == row['entity'].casefold(),row['team_by_entity'], df0['team_by_entity'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([None, 'Buccaneers', 'Cowboys', 'Patriots', 'Jets', 'Rams',\n",
       "       'Broncos', '49ers', 'Eagles', 'Chiefs', 'Saints', 'Falcons',\n",
       "       'Steelers', 'Ravens', 'Browns', 'Dolphins', 'Texans', 'Lions',\n",
       "       'Commanders', 'Jaguars', 'Raiders', 'Giants', 'Bills', 'Packers',\n",
       "       'Titans', 'Colts', 'Vikings', 'Panthers', 'Cardinals', 'Bears',\n",
       "       'Chargers', 'Bengals', 'Seahawks'], dtype=object)"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df0['team_by_entity'].unique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "330066"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df0['team_by_entity'].isna().sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Die Dokumente, welche keiner \"team_by_enity\" zugeordnet werden konnten, werden entfernt. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>team1</th>\n",
       "      <th>score1</th>\n",
       "      <th>team2</th>\n",
       "      <th>score2</th>\n",
       "      <th>year</th>\n",
       "      <th>week</th>\n",
       "      <th>videoID</th>\n",
       "      <th>comment</th>\n",
       "      <th>entity</th>\n",
       "      <th>salience</th>\n",
       "      <th>score</th>\n",
       "      <th>magnitude</th>\n",
       "      <th>team_by_entity</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Buccaneers</td>\n",
       "      <td>31</td>\n",
       "      <td>Cowboys</td>\n",
       "      <td>29</td>\n",
       "      <td>2021</td>\n",
       "      <td>Week 1</td>\n",
       "      <td>HzkUcSd3Utc</td>\n",
       "      <td>They made all the bad NFL players sign as Tampa Bay Buccaneers sometimes they get heated</td>\n",
       "      <td>tampa bay buccaneers</td>\n",
       "      <td>0.063078</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>Buccaneers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Buccaneers</td>\n",
       "      <td>31</td>\n",
       "      <td>Cowboys</td>\n",
       "      <td>29</td>\n",
       "      <td>2021</td>\n",
       "      <td>Week 1</td>\n",
       "      <td>HzkUcSd3Utc</td>\n",
       "      <td>Though Dallas has nothing to be ashamed of, this game was stripped from them. That was definitely a push-off by Godwin. Outside of that, it was an awesome game to watch.</td>\n",
       "      <td>dallas</td>\n",
       "      <td>0.295658</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>Cowboys</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>Buccaneers</td>\n",
       "      <td>31</td>\n",
       "      <td>Cowboys</td>\n",
       "      <td>29</td>\n",
       "      <td>2021</td>\n",
       "      <td>Week 1</td>\n",
       "      <td>HzkUcSd3Utc</td>\n",
       "      <td>I’m very lost Tom brad 50mil contract  best in the league barely get hurts. But go to Dallas 160mil for dak couldn’t clutch a last year game and get hurt a lot df🤔going on here the goat get paid less but a person who get hurt a lot get paid more with no rings🤨</td>\n",
       "      <td>dallas</td>\n",
       "      <td>0.021111</td>\n",
       "      <td>-0.1</td>\n",
       "      <td>0.1</td>\n",
       "      <td>Cowboys</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36</th>\n",
       "      <td>Buccaneers</td>\n",
       "      <td>31</td>\n",
       "      <td>Cowboys</td>\n",
       "      <td>29</td>\n",
       "      <td>2021</td>\n",
       "      <td>Week 1</td>\n",
       "      <td>HzkUcSd3Utc</td>\n",
       "      <td>Brady the \"goat\" barely very barely  beating the cowboys.... I'd even go as far as to say struggled to beat them, barely beat them.... the goat.... ok sure... 2 points... :/</td>\n",
       "      <td>cowboys</td>\n",
       "      <td>0.089547</td>\n",
       "      <td>0.1</td>\n",
       "      <td>0.1</td>\n",
       "      <td>Cowboys</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50</th>\n",
       "      <td>Buccaneers</td>\n",
       "      <td>31</td>\n",
       "      <td>Cowboys</td>\n",
       "      <td>29</td>\n",
       "      <td>2021</td>\n",
       "      <td>Week 1</td>\n",
       "      <td>HzkUcSd3Utc</td>\n",
       "      <td>Can someone tell me how to get the Patriots game and watch it on my phone thank you</td>\n",
       "      <td>patriots</td>\n",
       "      <td>0.073656</td>\n",
       "      <td>0.2</td>\n",
       "      <td>0.2</td>\n",
       "      <td>Patriots</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         team1 score1    team2 score2  year    week      videoID  \\\n",
       "2   Buccaneers     31  Cowboys     29  2021  Week 1  HzkUcSd3Utc   \n",
       "12  Buccaneers     31  Cowboys     29  2021  Week 1  HzkUcSd3Utc   \n",
       "30  Buccaneers     31  Cowboys     29  2021  Week 1  HzkUcSd3Utc   \n",
       "36  Buccaneers     31  Cowboys     29  2021  Week 1  HzkUcSd3Utc   \n",
       "50  Buccaneers     31  Cowboys     29  2021  Week 1  HzkUcSd3Utc   \n",
       "\n",
       "                                                                                                                                                                                                                                                                 comment  \\\n",
       "2                                                                                                                                                                               They made all the bad NFL players sign as Tampa Bay Buccaneers sometimes they get heated   \n",
       "12                                                                                             Though Dallas has nothing to be ashamed of, this game was stripped from them. That was definitely a push-off by Godwin. Outside of that, it was an awesome game to watch.   \n",
       "30  I’m very lost Tom brad 50mil contract  best in the league barely get hurts. But go to Dallas 160mil for dak couldn’t clutch a last year game and get hurt a lot df🤔going on here the goat get paid less but a person who get hurt a lot get paid more with no rings🤨   \n",
       "36                                                                                         Brady the \"goat\" barely very barely  beating the cowboys.... I'd even go as far as to say struggled to beat them, barely beat them.... the goat.... ok sure... 2 points... :/   \n",
       "50                                                                                                                                                                                   Can someone tell me how to get the Patriots game and watch it on my phone thank you   \n",
       "\n",
       "                  entity  salience  score  magnitude team_by_entity  \n",
       "2   tampa bay buccaneers  0.063078    0.0        0.0     Buccaneers  \n",
       "12                dallas  0.295658    0.0        0.0        Cowboys  \n",
       "30                dallas  0.021111   -0.1        0.1        Cowboys  \n",
       "36               cowboys  0.089547    0.1        0.1        Cowboys  \n",
       "50              patriots  0.073656    0.2        0.2       Patriots  "
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df0.dropna(subset='team_by_entity').head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Ebenfalls werden die Dokumente entfernt, welche ein Teamnamen enthalten, welcher nicht für das betrachtete Spiel relevant ist und damit nicht die Stimmung der spielenden Mannschaften repräsentieren, z.B. ein Kommentar zu den Cowboys im Spiel der Packers gegen Buccaners."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Anreichung des DataFrames"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "df0['team_by_entity'] = np.where((df0['team_by_entity'] == df0['team1']) | (df0['team_by_entity'] == df0['team2']), df0['team_by_entity'], None )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Im nächsten Schritt werden den Größen 'Score' und 'Magnitude' ein \"Sentiment\" zugeordnet. Dafür werden Schwellenwerte für negativ, neutral und positiv festgelegt. \n",
    "\n",
    "|senitment|score|magnitude|\n",
    "|---------|-----|------|\n",
    "|positive| > 0.1| > 0.1|\n",
    "|negative| > -0.1| >0.1|\n",
    "|neutral| <= abs(0.1) | <= 0.1|"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Um an dieser Stelle den \"Floating-point error\" zu umgehen, werden die Fließkommazahlen mit 10 multipliziert und in einen Integer umgewandelt, sodass sie ohne Fehler verglichen werden können. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>team1</th>\n",
       "      <th>score1</th>\n",
       "      <th>team2</th>\n",
       "      <th>score2</th>\n",
       "      <th>year</th>\n",
       "      <th>week</th>\n",
       "      <th>videoID</th>\n",
       "      <th>comment</th>\n",
       "      <th>entity</th>\n",
       "      <th>salience</th>\n",
       "      <th>score</th>\n",
       "      <th>magnitude</th>\n",
       "      <th>team_by_entity</th>\n",
       "      <th>sentiment</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Buccaneers</td>\n",
       "      <td>31</td>\n",
       "      <td>Cowboys</td>\n",
       "      <td>29</td>\n",
       "      <td>2021</td>\n",
       "      <td>Week 1</td>\n",
       "      <td>HzkUcSd3Utc</td>\n",
       "      <td>They made all the bad NFL players sign as Tampa Bay Buccaneers sometimes they get heated</td>\n",
       "      <td>players</td>\n",
       "      <td>0.800480</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>None</td>\n",
       "      <td>neutral</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Buccaneers</td>\n",
       "      <td>31</td>\n",
       "      <td>Cowboys</td>\n",
       "      <td>29</td>\n",
       "      <td>2021</td>\n",
       "      <td>Week 1</td>\n",
       "      <td>HzkUcSd3Utc</td>\n",
       "      <td>They made all the bad NFL players sign as Tampa Bay Buccaneers sometimes they get heated</td>\n",
       "      <td>nfl</td>\n",
       "      <td>0.136442</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>None</td>\n",
       "      <td>neutral</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Buccaneers</td>\n",
       "      <td>31</td>\n",
       "      <td>Cowboys</td>\n",
       "      <td>29</td>\n",
       "      <td>2021</td>\n",
       "      <td>Week 1</td>\n",
       "      <td>HzkUcSd3Utc</td>\n",
       "      <td>They made all the bad NFL players sign as Tampa Bay Buccaneers sometimes they get heated</td>\n",
       "      <td>tampa bay buccaneers</td>\n",
       "      <td>0.063078</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>Buccaneers</td>\n",
       "      <td>neutral</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        team1 score1    team2 score2  year    week      videoID  \\\n",
       "0  Buccaneers     31  Cowboys     29  2021  Week 1  HzkUcSd3Utc   \n",
       "1  Buccaneers     31  Cowboys     29  2021  Week 1  HzkUcSd3Utc   \n",
       "2  Buccaneers     31  Cowboys     29  2021  Week 1  HzkUcSd3Utc   \n",
       "\n",
       "                                                                                    comment  \\\n",
       "0  They made all the bad NFL players sign as Tampa Bay Buccaneers sometimes they get heated   \n",
       "1  They made all the bad NFL players sign as Tampa Bay Buccaneers sometimes they get heated   \n",
       "2  They made all the bad NFL players sign as Tampa Bay Buccaneers sometimes they get heated   \n",
       "\n",
       "                 entity  salience  score  magnitude team_by_entity sentiment  \n",
       "0               players  0.800480    0.0        0.0           None   neutral  \n",
       "1                   nfl  0.136442    0.0        0.0           None   neutral  \n",
       "2  tampa bay buccaneers  0.063078    0.0        0.0     Buccaneers   neutral  "
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df0['sentiment'] = \"\" \n",
    "series_score = (df0['score']*10).astype(int)\n",
    "series_mag = (df0['magnitude']*10).astype(int)\n",
    "df0['sentiment'] = np.where((abs(series_score) <= 1) & (series_mag <= 1), 'neutral', df0['sentiment'] )\n",
    "df0['sentiment'] = np.where((series_score <= -1) & (series_mag > 1), 'negative', df0['sentiment'] )\n",
    "df0['sentiment'] = np.where((series_score >= 1) & (series_mag  > 1), 'positive', df0['sentiment'] )\n",
    "df0.head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Für die Analyse werden zwei weitere Spalten erzeugt. Die Spalte \"win_for_entity\" enthält die Attribute \"win\", \"draw\" oder \"defeat\", je nachdem ob die Entität aus der Zeile gewonnen oder verloren hat. \n",
    "Die Spalte \"winner\" enthält jeweils die Mannschaft, welche gewonnen hat. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>team1</th>\n",
       "      <th>score1</th>\n",
       "      <th>team2</th>\n",
       "      <th>score2</th>\n",
       "      <th>year</th>\n",
       "      <th>week</th>\n",
       "      <th>videoID</th>\n",
       "      <th>comment</th>\n",
       "      <th>entity</th>\n",
       "      <th>salience</th>\n",
       "      <th>score</th>\n",
       "      <th>magnitude</th>\n",
       "      <th>team_by_entity</th>\n",
       "      <th>sentiment</th>\n",
       "      <th>win_for_entity</th>\n",
       "      <th>winner</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Buccaneers</td>\n",
       "      <td>31</td>\n",
       "      <td>Cowboys</td>\n",
       "      <td>29</td>\n",
       "      <td>2021</td>\n",
       "      <td>Week 1</td>\n",
       "      <td>HzkUcSd3Utc</td>\n",
       "      <td>They made all the bad NFL players sign as Tampa Bay Buccaneers sometimes they get heated</td>\n",
       "      <td>players</td>\n",
       "      <td>0.800480</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>None</td>\n",
       "      <td>neutral</td>\n",
       "      <td>defeat</td>\n",
       "      <td>Buccaneers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Buccaneers</td>\n",
       "      <td>31</td>\n",
       "      <td>Cowboys</td>\n",
       "      <td>29</td>\n",
       "      <td>2021</td>\n",
       "      <td>Week 1</td>\n",
       "      <td>HzkUcSd3Utc</td>\n",
       "      <td>They made all the bad NFL players sign as Tampa Bay Buccaneers sometimes they get heated</td>\n",
       "      <td>nfl</td>\n",
       "      <td>0.136442</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>None</td>\n",
       "      <td>neutral</td>\n",
       "      <td>defeat</td>\n",
       "      <td>Buccaneers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Buccaneers</td>\n",
       "      <td>31</td>\n",
       "      <td>Cowboys</td>\n",
       "      <td>29</td>\n",
       "      <td>2021</td>\n",
       "      <td>Week 1</td>\n",
       "      <td>HzkUcSd3Utc</td>\n",
       "      <td>They made all the bad NFL players sign as Tampa Bay Buccaneers sometimes they get heated</td>\n",
       "      <td>tampa bay buccaneers</td>\n",
       "      <td>0.063078</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>Buccaneers</td>\n",
       "      <td>neutral</td>\n",
       "      <td>win</td>\n",
       "      <td>Buccaneers</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        team1  score1    team2  score2  year    week      videoID  \\\n",
       "0  Buccaneers      31  Cowboys      29  2021  Week 1  HzkUcSd3Utc   \n",
       "1  Buccaneers      31  Cowboys      29  2021  Week 1  HzkUcSd3Utc   \n",
       "2  Buccaneers      31  Cowboys      29  2021  Week 1  HzkUcSd3Utc   \n",
       "\n",
       "                                                                                    comment  \\\n",
       "0  They made all the bad NFL players sign as Tampa Bay Buccaneers sometimes they get heated   \n",
       "1  They made all the bad NFL players sign as Tampa Bay Buccaneers sometimes they get heated   \n",
       "2  They made all the bad NFL players sign as Tampa Bay Buccaneers sometimes they get heated   \n",
       "\n",
       "                 entity  salience  score  magnitude team_by_entity sentiment  \\\n",
       "0               players  0.800480    0.0        0.0           None   neutral   \n",
       "1                   nfl  0.136442    0.0        0.0           None   neutral   \n",
       "2  tampa bay buccaneers  0.063078    0.0        0.0     Buccaneers   neutral   \n",
       "\n",
       "  win_for_entity      winner  \n",
       "0         defeat  Buccaneers  \n",
       "1         defeat  Buccaneers  \n",
       "2            win  Buccaneers  "
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df0['score1'] = df0['score1'].astype('int')\n",
    "df0['score2'] = df0['score2'].astype('int')\n",
    "df0['win_for_entity'] = 'draw'\n",
    "df0['winner'] = np.where(df0['score1']>df0['score2'],df0['team1'],df0['team2'])\n",
    "df0['win_for_entity'] = np.where(df0['winner']==df0['team_by_entity'],'win','defeat')\n",
    "df0.head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Aggregation "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Die bisherige Bearbeitung dient als Grundlage, um für die Analyse ein Tabelle mit Kennzahlen pro Mannschaft und Spiel zu bilden. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Die aggregierte Analyse-Tabelle soll folgenden Aufbau haben: \n",
    "\n",
    "* Mannschaft\n",
    "* Spielwoche\n",
    "* Ausgang: Sieg/Niederlage/Unentschieden \n",
    "* Durchschnittlicher Sentiment Score\n",
    "* Anzahl Kommentare mit positiven Sentiment\n",
    "* Anzahl Kommentare mit negativen Sentiment\n",
    "* Anzahl Kommentare mit neutralen Sentiment"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Zum Zählen der Sentimentwerte wurde einen Funktion geschrieben, um Fehler zu vermeiden falls eine Kategorie (positive, negative, neutral) nicht vorkommt. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "def count_sentiment(inp_series, sentiment_value): \n",
    "    try: \n",
    "        count = inp_series.value_counts()[sentiment_value]\n",
    "    except: \n",
    "        count = 0\n",
    "    return count"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Die Aggragtion erfolgt über die Spalten \"team_by_entity\" und \"week\" mit den Funktionen: \n",
    "* first, zum Bestimmen des Ausgangs \n",
    "* mean, zur Bildung des durchschnittlichen Sentiment Scores\n",
    "* count_sentiment, zum Zählen der positiven, negativen und neutralen Kommentare"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "summary= df0.groupby(['team_by_entity','week']).agg(\n",
    "                                   outcome=('win_for_entity', 'first'),\n",
    "                                   avg_score=('score', 'mean'),\n",
    "                                   count_neg=('sentiment', lambda x: count_sentiment(x, 'negative')),\n",
    "                                   count_pos    =('sentiment',  lambda x: count_sentiment(x, 'positive')),\n",
    "                                   count_neutral=('sentiment', lambda x: count_sentiment(x, 'neutral')))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [],
   "source": [
    "summary = summary.reset_index()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Abschließen wird die Spielwoche in einen numerischen Wert umgewandelt. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [],
   "source": [
    "summary['week'] = summary['week'].apply(lambda x: int(x.split()[1]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [],
   "source": [
    "summary = summary.sort_values('week')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>team_by_entity</th>\n",
       "      <th>week</th>\n",
       "      <th>outcome</th>\n",
       "      <th>avg_score</th>\n",
       "      <th>count_neg</th>\n",
       "      <th>count_pos</th>\n",
       "      <th>count_neutral</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>49ers</td>\n",
       "      <td>1</td>\n",
       "      <td>win</td>\n",
       "      <td>0.046951</td>\n",
       "      <td>29</td>\n",
       "      <td>47</td>\n",
       "      <td>85</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>68</th>\n",
       "      <td>Broncos</td>\n",
       "      <td>1</td>\n",
       "      <td>win</td>\n",
       "      <td>0.232381</td>\n",
       "      <td>14</td>\n",
       "      <td>117</td>\n",
       "      <td>77</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>85</th>\n",
       "      <td>Browns</td>\n",
       "      <td>1</td>\n",
       "      <td>defeat</td>\n",
       "      <td>0.128121</td>\n",
       "      <td>125</td>\n",
       "      <td>359</td>\n",
       "      <td>386</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>102</th>\n",
       "      <td>Buccaneers</td>\n",
       "      <td>1</td>\n",
       "      <td>win</td>\n",
       "      <td>0.084448</td>\n",
       "      <td>87</td>\n",
       "      <td>178</td>\n",
       "      <td>281</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>119</th>\n",
       "      <td>Cardinals</td>\n",
       "      <td>1</td>\n",
       "      <td>win</td>\n",
       "      <td>0.193962</td>\n",
       "      <td>20</td>\n",
       "      <td>119</td>\n",
       "      <td>122</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>42</th>\n",
       "      <td>Bengals</td>\n",
       "      <td>18</td>\n",
       "      <td>defeat</td>\n",
       "      <td>0.017647</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>364</th>\n",
       "      <td>Panthers</td>\n",
       "      <td>18</td>\n",
       "      <td>defeat</td>\n",
       "      <td>0.325000</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>Bears</td>\n",
       "      <td>18</td>\n",
       "      <td>defeat</td>\n",
       "      <td>-0.125000</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>467</th>\n",
       "      <td>Seahawks</td>\n",
       "      <td>18</td>\n",
       "      <td>win</td>\n",
       "      <td>0.079167</td>\n",
       "      <td>1</td>\n",
       "      <td>7</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>177</th>\n",
       "      <td>Colts</td>\n",
       "      <td>18</td>\n",
       "      <td>defeat</td>\n",
       "      <td>-0.042105</td>\n",
       "      <td>7</td>\n",
       "      <td>7</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>543 rows × 7 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    team_by_entity  week outcome  avg_score  count_neg  count_pos  \\\n",
       "0            49ers     1     win   0.046951         29         47   \n",
       "68         Broncos     1     win   0.232381         14        117   \n",
       "85          Browns     1  defeat   0.128121        125        359   \n",
       "102     Buccaneers     1     win   0.084448         87        178   \n",
       "119      Cardinals     1     win   0.193962         20        119   \n",
       "..             ...   ...     ...        ...        ...        ...   \n",
       "42         Bengals    18  defeat   0.017647          3          4   \n",
       "364       Panthers    18  defeat   0.325000          0          2   \n",
       "25           Bears    18  defeat  -0.125000          2          1   \n",
       "467       Seahawks    18     win   0.079167          1          7   \n",
       "177          Colts    18  defeat  -0.042105          7          7   \n",
       "\n",
       "     count_neutral  \n",
       "0               85  \n",
       "68              77  \n",
       "85             386  \n",
       "102            281  \n",
       "119            122  \n",
       "..             ...  \n",
       "42              10  \n",
       "364              2  \n",
       "25               5  \n",
       "467             15  \n",
       "177              5  \n",
       "\n",
       "[543 rows x 7 columns]"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "summary"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Das Ergebniss wird in einer CSV-Datei gespeichert und im anschließenden Notebook zur Datenanalyse genutzt. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "summary.to_csv(\"summary_2021.csv\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "condatascience",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.13"
  },
  "vscode": {
   "interpreter": {
    "hash": "fd0a90840973ec53c960faac2ec5a80b0cd969d8e1e064cbc11596d09821ee2d"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}