Monthly Archives: February 2016

Project Euler

I was planning on posting my solutions here, but I just realized the site discourages that, so I took down the answers I had previously posted. I will likely still work on the problems when I have downtime.

dataviz part 6 – Increasing scope of data

I’ve updated the code to grab more QB information so I don’t have to manually input the QB names.

# python3
# footballStatScraper_v7.py - pull stats into a list

import json, bs4, requests
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

browser = webdriver.Firefox()
baseUrl = 'http://www.pro-football-reference.com/'
masterData = []

def getQBList():
    qbListUrl = 'http://www.pro-football-reference.com/players/qbindex.htm'
    soup = bs4.BeautifulSoup(requests.get(qbListUrl).text, "lxml")
    elems = soup.select('b > a')
    return {elem.text: elem.get('href') for elem in elems}
    
def goToQBGameLogs(qb, qbUrl):
    browser.get(baseUrl + qbUrl)    
    gameLogsElem = browser.find_element_by_link_text('Gamelogs [+]')
    gameLogsElem.click()
    browser.implicitly_wait(10)
    
def getCsv(seasonType, qb):
    browser.find_element_by_xpath("//div[@tabindex='2']/span[contains(text(),'CSV')]").click()
    statList = FormatcsvText(browser.find_element_by_id(seasonType).text.split('\n'))
    return statList

def goToPlayoffPage():
    browser.get(browser.current_url + '/post')
    
def FormatcsvText(csvTextList):
    return [row.split(',') for row in csvTextList]

def addToDataList(statList, qbName, seasonType):
    fieldNames = defineFieldNames(statList)
    for row in range(0,len(statList)):
        dataDict = {'Player': qbName, 'seasonType': seasonType}
        if statList[row][0].isnumeric() and row > 1:
            dataDict.update(dict(zip(fieldNames,statList[row])))
            masterData.append(dataDict)

def defineFieldNames(statList):
    fieldNames = ['%s%s' % t for t in zip(statList[0], statList[1])]
    return ['Where' if field == '' else field for field in fieldNames]

def saveAsJson():
    with open('data.json', 'w') as outfile:
        json.dump(masterData, outfile)

qbDict = getQBList()

for qb, url in qbDict.items():
    try:
        goToQBGameLogs(qb, url)
        addToDataList(getCsv('csv_stats', qb), qb, 'R')
        goToPlayoffPage()
        addToDataList(getCsv('csv_stats_playoffs', qb), qb, 'P')
    except NoSuchElementException:
        continue
    

saveAsJson()
print('Done')
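The trickiest part of the script above is defineFieldNames: the stat tables on the site use two header rows (a group row like "Passing" above column names like "Cmp" and "Att"), and the blank column between them is the home/away indicator. Here is a small illustration with made-up header rows showing how the zip joins them and names the blank column:

```python
# Hypothetical two-row header, mimicking the site's table layout
groupRow = ['', '', 'Passing', 'Passing']   # group labels (blank where none)
statRow  = ['Date', '', 'Cmp', 'Att']       # column names (blank = home/away)

# Pair up the two rows and concatenate each pair into one field name
fieldNames = ['%s%s' % t for t in zip(groupRow, statRow)]

# A field that is blank in both rows is the home/away column
fieldNames = ['Where' if field == '' else field for field in fieldNames]
print(fieldNames)  # ['Date', 'Where', 'PassingCmp', 'PassingAtt']
```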

I also wrote something that takes the data and writes it to a CSV file. I separated the two steps because the scraping part takes a really long time, and I didn’t want the program to crash before saving the data somewhere. I think there are ways to scrape more efficiently, but for now I want to move on to the next part of my project.

# python3
# convertToCsv.py - save json format to csv

import csv, json

def savetoCSV(data):
    with open('qbData.csv', 'w', newline='') as csvfile:
        fieldnames = list(set().union(*data))
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(data)

with open('data.json') as data_file:    
    masterData = json.load(data_file)
savetoCSV(masterData)
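The `set().union(*data)` line matters because regular-season and playoff rows can carry different columns, and DictWriter needs a header covering all of them; it fills any missing cell with an empty string by default. A minimal sketch with made-up rows (writing to an in-memory buffer instead of a file, and sorting the keys for a stable column order):

```python
import csv, io

# Hypothetical rows with non-identical key sets, like mixed season types
rows = [{'Player': 'Cam Newton', 'seasonType': 'R', 'PassingYds': '340'},
        {'Player': 'Cam Newton', 'seasonType': 'P', 'RushingYds': '47'}]

# Union of every row's keys gives a complete header; sorted() keeps it stable
fieldnames = sorted(set().union(*rows))

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)  # missing keys become empty cells
print(buf.getvalue())
```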

Cinesift

I was thinking about side project ideas and recently discovered that one of them already exists, while trying to learn how to scrape Rotten Tomatoes.

Cinesift is a movie rating aggregator. It combines ratings from multiple sources and supports advanced searches and filters. You can also mark which movies you are or aren’t interested in, so the list changes as you work through it. It’s really cool; I’ve already started using it.


Man vs Woman Size

I was at a bar the other night with some buddies, and I was really impressed with one of the bar’s strategies. They sold beers in “man” and “woman” sizes. It’s interesting how you can take something as normal as serving sizes and bring in new connotations that likely result in a decent amount of upselling. I was with a bunch of dudes and we laughed at the clever marketing tactic, but the bar got the last laugh since no one ordered the smaller size.

Putting Yourself on the Hook

I’ve been back home and spending time with friends so things have been pretty busy. I’ll likely keep the next few posts fairly short.

I revisited the podcast that made me start writing every day. It was James Altucher’s talk with Seth Godin (podcast notes here).

“Everyone should have a blog and everyone should blog everyday primarily because if you can cajole yourself to speak your truth, you will learn to dance with fear. You will learn to write until you are not afraid to write anymore.” – Seth Godin

I remember now that this quote was what made me start writing. A lot of people don’t like having a digital record of what they said, but that discomfort is exactly where we do our best work. Seth believes there are huge opportunities for people who are willing to put themselves on the hook.

He offered specific tactical advice as well.

  • Blog on something you care about
  • Write something useful about it everyday
  • Organize and coordinate other people around your community
  • Start an organization or discussion board
  • Give up three hours a week of television
  • You will have a whole network of people cheering you on
  • You will be able to write and change minds
  • Maybe you can sell something
  • Even if you don’t, you will know how to take responsibility
  • Your leverage will go up because people are desperate for someone who will take care of things
  • Climb the ladder of meaningfulness and generosity, but be willing to say “follow me” instead of waiting for permission


Commonplace Book

http://ryanholiday.net/how-and-why-to-keep-a-commonplace-book/

A commonplace book is a central resource or depository for ideas, quotes, anecdotes, observations and information you come across during your life and didactic pursuits.

I think having a system like the commonplace book is important. If you don’t record what you read, you’re likely to forget it. Learning without execution is useless, but you can’t execute if you keep forgetting what you learn. My commonplace book project is YCNotes.

dataviz part 5 – more code cleaning

I received some more feedback from my friend after my last iteration. He introduced me to list comprehensions, which make for loops much cleaner to write. There was also a section where I was entering the playerName and seasonType an unnecessary number of times, and I was setting dictionary keys and values manually even though there are built-in functions that make that much easier.
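The two tricks can be illustrated with a tiny made-up example before the full script: a list comprehension replacing an explicit append loop, and dict(zip(...)) replacing manual key-by-key assignment.

```python
# Hypothetical CSV text, like what the site's CSV widget produces
csvText = ['Date,Opp,Result', '2015-09-13,JAX,W']

# Before: an explicit loop building statList with .append(row.split(','))
# After: the same thing as a one-line list comprehension
statList = [row.split(',') for row in csvText]

# dict(zip(...)) pairs field names with values in a single call,
# instead of assigning each key manually
record = dict(zip(statList[0], statList[1]))
print(record)  # {'Date': '2015-09-13', 'Opp': 'JAX', 'Result': 'W'}
```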

# python3
# footballStatScraper_v5.py - pull stats into a list

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

qbList = ['Cam Newton']
browser = webdriver.Firefox()
browser.get('http://www.pro-football-reference.com/')
masterData = []

def qbSearch(qbName):
    searchElem = browser.find_element_by_name('search')
    searchElem.send_keys(qbName)
    searchElem.submit()

def testForQBProfile():
    try:
        goToGameLogs()
    except NoSuchElementException:
        browser.find_element_by_xpath("//div[@class='search-page-active']/div[contains(.,'QB')]//a").click()
        goToGameLogs()

def goToGameLogs():
    gameLogsElem = browser.find_element_by_link_text('Gamelogs [+]')
    gameLogsElem.click()
    browser.implicitly_wait(10)

def getCsv(seasonType):
    browser.find_element_by_xpath("//div[@tabindex='2']/span[contains(text(),'CSV')]").click()
    statList = FormatcsvText(browser.find_element_by_id(seasonType).text.split('\n'))
    return statList

def goToPlayoffPage():
    browser.get(browser.current_url + '/post')
    
def FormatcsvText(csvTextList):
    return [row.split(',') for row in csvTextList]

def addToDataList(statList, qbName, seasonType):
    fieldNames = defineFieldNames(statList)
    for row in range(0,len(statList)):
        dataDict = {'Player': qbName, 'seasonType': seasonType}
        if statList[row][0].isnumeric() and row > 1:
            dataDict.update(dict(zip(fieldNames,statList[row])))
            masterData.append(dataDict)

def defineFieldNames(statList):
    fieldNames = ['%s%s' % t for t in zip(statList[0], statList[1])]
    return ['Where' if field == '' else field for field in fieldNames]
    
for qb in qbList:
    qbSearch(qb)
    testForQBProfile()
    addToDataList(getCsv('csv_stats'), qb, 'R')
    goToPlayoffPage()
    addToDataList(getCsv('csv_stats_playoffs'), qb, 'P')

dataviz part 4 – Cleaning up the code

I met with a programming buddy of mine and got him to take a look at my code. He gave me some suggestions for best practices: organize different actions into functions, name them clearly so they can be read without comments, and reduce redundancy. I’m also compiling the data into a list of dictionaries now, which allows for more control in data manipulation.

# python3
# footballStatScraper_v4.py - pull stats into a list

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

qbList = ['Cam Newton']
browser = webdriver.Firefox()
browser.get('http://www.pro-football-reference.com/')
masterData = []

def qbSearch(qbName):
    searchElem = browser.find_element_by_name('search')
    searchElem.send_keys(qbName)
    searchElem.submit()

def testForQBProfile():
    try:
        goToGameLogs()
    except NoSuchElementException:
        browser.find_element_by_xpath("//div[@class='search-page-active']/div[contains(.,'QB')]//a").click()
        goToGameLogs()

def goToGameLogs():
    gameLogsElem = browser.find_element_by_link_text('Gamelogs [+]')
    gameLogsElem.click()
    browser.implicitly_wait(10)

def getCsv(seasonType):
    browser.find_element_by_xpath("//div[@tabindex='2']/span[contains(text(),'CSV')]").click()
    statList = FormatcsvText(browser.find_element_by_id(seasonType).text.split('\n'))
    return statList

def goToPlayoffPage():
    browser.get(browser.current_url + '/post')
    
def FormatcsvText(csvTextList):
    statList = []
    for row in csvTextList:
        statList.append(row.split(","))
    return statList

def addToDataList(statList, qbName, seasonType):
    fieldNamesList = []
    for row in range(0,len(statList)):
        dataDict = {}
        if row == 0:
            continue
        elif row == 1:
            fieldTitles = ', '.join('%s%s' % t for t in zip(statList[0], statList[1]))
            fieldNamesList = defineWhereField(fieldTitles.split(", "))
        else:
            if statList[row][0].isnumeric():
                for cell in range(0,len(fieldNamesList)):
                    dataDict['Player'] = qbName
                    dataDict['seasonType'] = seasonType
                    dataDict[fieldNamesList[cell]] = statList[row][cell]
                masterData.append(dataDict)
            else:
                continue

def defineWhereField(fieldList):
    for i in range(0,len(fieldList)):
        if fieldList[i] == '':
            fieldList[i] = 'Where'
        else:
            continue
    return fieldList
    
for qb in qbList:
    qbSearch(qb)
    testForQBProfile()
    addToDataList(getCsv('csv_stats'), qb, 'R')
    goToPlayoffPage()
    addToDataList(getCsv('csv_stats_playoffs'), qb, 'P')
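The payoff of the list-of-dicts structure is that slicing the data afterwards becomes a matter of simple comprehensions. A sketch with made-up rows (the field names here are hypothetical examples, not the site's actual column names):

```python
# Hypothetical scraped rows in the masterData format
masterData = [
    {'Player': 'Cam Newton', 'seasonType': 'R', 'PassingYds': '340'},
    {'Player': 'Cam Newton', 'seasonType': 'P', 'PassingYds': '265'},
]

# Filter to playoff games only
playoffGames = [row for row in masterData if row['seasonType'] == 'P']

# Pull one column across all games, converting the scraped strings to ints
passingYards = [int(row['PassingYds']) for row in masterData]

print(len(playoffGames), sum(passingYards))  # 1 605
```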

Online Course v2

My first landing page didn’t work out too well. I think my second page worked a bit better.

I ran it with the same budget, $20-$25 for a week, on both Facebook and Google. For Facebook, I changed the targeting from financial analysts to 16-23 year old college graduates who studied finance or accounting. For the same $20, my clicks went from 18 to 42. I kept the Google campaign the same, and clicks went up marginally from 100 to 118.

I also changed the wording in the ads slightly for beginners. Looking at this now, I should have used similar copy: I mentioned the free mini-course in the Facebook ad but not the Google ad.

[Facebook ad screenshot]

[Google ad screenshot]

In the first experiment, I included the free course on the landing page with a signup for more information (no one signed up). This time I changed it so that the viewer had to enter their email to receive the free mini-course. Because of this, I can’t compare that aspect of the experiment.

In the new version, 15 people put their email addresses down. The WP Email Capture plugin auto-responds to a signup with a request for authentication. In that email, I included a link to the free course and a message saying to click the authentication link for more courses like this. One person clicked the authentication link. In the future, I would create identical landing pages for both the Facebook and Google ads so I could see how signup rates differ by traffic source.

[Second landing page screenshot]

For now, I’m putting this project on hold, mostly because I’ve been finding myself really interested in other projects. I get the sense that the market for learning Excel through shortcut keys might be a bit too niche. It might be better to take a broader approach and focus on basic Excel tutorials.

Scott Adams has mentioned that things that turn out well usually start out well, and that none of his ideas that started out badly ever ended up working. The key is to learn from what you’re doing. I learned how to make a landing page with email signups. I learned how to make videos on my computer. I learned that I need to think about how to design experiments and measure the changes I make. And I got a beginner’s introduction to Facebook and Google ads. I think all of this will help me test and experiment with ideas in the future.