# Category Archives: Data Science

# Categorizing Listing Photos at Airbnb

- Posted by lhmay
- on Jun, 09, 2018
- in Data Science
- Blog No Comments.

https://medium.com/airbnb-engineering/categorizing-listing-photos-at-airbnb-f9483f3ab7e3

Large-scale deep learning models are changing the way we think about images of homes on our platform.

Authors: Shijing Yao, Qiang Zhu, Phillippe Siclait

Airbnb is a marketplace featuring millions of homes. Travelers around the world search on the platform and discover the best homes for their trips. Aside from location and price, listing photos […]

# Big data: Hadoop vs. Spark

- Posted by lhmay
- on Jun, 09, 2018
- in Data Science
- Blog No Comments.

Big data refers to data sets that are so voluminous and complex that traditional data-processing application software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy, and data source.

To understand the phenomenon that is big data, it is often described using five Vs: Volume, Velocity, Variety, Veracity, and Value.

Volume refers to the vast amounts of […]

# Simpson’s Paradox

- Posted by lhmay
- on May, 25, 2018
- in Data Science
- Blog No Comments.

Simpson’s paradox for quantitative data: a positive trend (blue line and red line) appears for two separate groups, whereas a negative trend (dotted line) appears when the groups are combined.

Simpson’s paradox, or the Yule–Simpson effect, is a phenomenon in probability and statistics in which a trend appears in several different groups of data but disappears or reverses […]
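The reversal described in the caption can be reproduced with a few numbers. The sketch below is only an illustration: the data points and the `slope` helper are made up for this example and do not come from the original post. Each group has a positive least-squares slope, while the pooled data has a negative one.

```python
# Illustrative, made-up data: not taken from the original post.
def slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx

# Two groups, each with an upward trend of its own.
group_a = [(1, 10), (2, 11), (3, 12)]
group_b = [(6, 3), (7, 4), (8, 5)]
combined = group_a + group_b

slope_a = slope(group_a)
slope_b = slope(group_b)
slope_combined = slope(combined)

# Within-group slopes are positive; the combined slope is negative.
print(slope_a, slope_b, slope_combined)
```

Group B sits lower on the y axis and further right on the x axis than group A, which is what pulls the pooled trend downward.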

# Experimentation Key Steps

- Posted by lhmay
- on May, 24, 2018
- in Data Science
- Blog No Comments.

Within webpages, nearly every element can be changed for a split test. Marketers and web developers may try testing:

- Visual elements: pictures, videos, and colors
- Text: headlines, calls to action, and descriptions
- Layout: arrangement and size of buttons, menus, and forms
- Visitor flow: how a website user gets from point A to B

Some […]

# Comparing Performance of 6 Classification Models

- Posted by lhmay
- on May, 22, 2018
- in Data Science
- Blog No Comments.

In the example below, 6 different algorithms are compared:

- Logistic Regression
- Linear Discriminant Analysis
- K-Nearest Neighbors
- Classification and Regression Trees
- Naive Bayes
- Support Vector Machines

Python Code

    # Compare Algorithms
    import pandas
    import matplotlib.pyplot as plt
    from sklearn import model_selection
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.discriminant_analysis […]

# Gradient Descent

- Posted by lhmay
- on May, 22, 2018
- in Data Science
- Blog No Comments.

Gradient descent is an optimization algorithm used to minimize some function by iteratively moving in the direction of steepest descent as defined by the negative of the gradient. In machine learning, we use gradient descent to update the parameters of our model. Parameters refer to coefficients in linear regression and weights in neural networks.

Learning rate

The size of these steps […]
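As a minimal sketch of the update rule described above, the snippet below minimizes a one-dimensional function. The function f(x) = (x - 3)^2, the starting point, the learning rate of 0.1, and the step count are all assumptions chosen for illustration, not values from the original post.

```python
def gradient_descent(grad, x0, learning_rate=0.1, n_steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(n_steps):
        # Move in the direction of the negative gradient.
        x = x - learning_rate * grad(x)
    return x

def grad_f(x):
    """Gradient of the illustrative function f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)

minimum = gradient_descent(grad_f, x0=0.0)
print(minimum)  # close to 3, the minimizer of f
```

With this function each step shrinks the distance to the minimizer by a constant factor, so a larger learning rate converges faster up to a point and diverges beyond it.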

# Sanity Check of Experiments & Propensity Score Matching

- Posted by lhmay
- on May, 05, 2018
- in Data Science
- Blog No Comments.

Here are a few questions to ask yourself to decide whether you should run an A/B test:

- Do I have an important question?
- Will answering this question make an impact worth the effort of running a test?
- What else could I be doing with my energy?

Running experiments and being creative and visionary are two completely different brain […]

# ANOVA, ANCOVA, MANOVA, & MANCOVA

- Posted by lhmay
- on May, 01, 2018
- in Data Science
- Blog No Comments.

Comparison Chart

| Basis for comparison | ANOVA | ANCOVA |
| --- | --- | --- |
| Meaning | ANOVA is a process of examining the difference among the means of multiple groups of data for homogeneity. | ANCOVA is a technique that removes the impact of one or more metric-scaled undesirable variables from the dependent variable before undertaking research. |
| Uses | Both linear and non-linear models are used. | […] |

# Data Visualization – How to Pick the Right Chart Type?

- Posted by lhmay
- on May, 01, 2018
- in Data Science
- Blog No Comments.

There are four basic presentation types that you can use to present your data:

- Comparison
- Composition
- Distribution
- Relationship

You are most likely using only the two most commonly used types of data analysis: Comparison or Composition.

To determine which chart is best suited for each of those presentation types, first you must answer a few […]

# Python Code for TTEST and ANOVA

- Posted by lhmay
- on May, 01, 2018
- in Data Science
- Blog No Comments.

TTEST

What is a t-score? The t-score is the ratio of the difference between two groups to the difference within the groups.

Types of t-tests: there are three main types of t-test:

1. An independent samples t-test compares the means for two groups.
2. A paired sample t-test compares means from the same group at different times (say, one year apart).
3. A one sample t-test tests the mean of […]
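The t-score definition above (difference between groups over difference within groups) can be sketched for the independent samples case using only the standard library. This is an illustration under stated assumptions: it uses the pooled-variance (equal variances) form, the helper name `independent_t_score` is hypothetical, and the sample values are made up rather than taken from the original post.

```python
from math import sqrt
from statistics import mean, variance

def independent_t_score(sample_a, sample_b):
    """Pooled-variance t-score for two independent samples (assumes equal variances)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Within-group spread: pooled sample variance of the two groups.
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    # Between-group difference divided by the within-group standard error.
    return (mean(sample_a) - mean(sample_b)) / standard_error

# Made-up samples for illustration only.
group_1 = [1, 2, 3, 4, 5]
group_2 = [2, 3, 4, 5, 6]
t_score = independent_t_score(group_1, group_2)
print(t_score)
```

A t-score far from zero means the gap between the group means is large relative to the variation inside the groups; turning it into a p-value additionally requires the t distribution with `n_a + n_b - 2` degrees of freedom.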
