## Programming Interview Questions 1: Array Pair Sum

It’s college recruiting season again, and tech companies have started interviewing for full-time and internship positions. I went through many interviews around this time last year for a summer internship. I ended up interning at Microsoft Bing, and I will be joining full time next summer. I won’t have any interviews this year, but since most of my friends are actively preparing for them, I thought it would be useful to share some good quality interview questions and provide my solutions. I have come across this particular question pretty often recently: Given an integer array, output all pairs that sum up to a specific value k.
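One common linear-time approach (a sketch, not necessarily the solution discussed in the full post) is to keep a set of the numbers seen so far and, for each element, check whether its complement with respect to k has already appeared:

```python
def pair_sum(arr, k):
    """Return all distinct pairs (a, b) from arr with a + b == k.

    Uses a set of seen values for O(1) lookups, giving O(n) time overall.
    """
    seen = set()
    pairs = set()
    for num in arr:
        target = k - num
        if target in seen:
            # store each pair in sorted order so (1, 3) and (3, 1) count once
            pairs.add((min(num, target), max(num, target)))
        seen.add(num)
    return sorted(pairs)

print(pair_sum([1, 3, 2, 2], 4))  # [(1, 3), (2, 2)]
```

If the array is already sorted, the same pairs can also be found with two pointers moving inward from both ends, trading the extra set for O(1) space.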



## How to Implement a Search Engine Part 3: Ranking tf-idf

Overview
We have come to the third part of our implementing a search engine project: ranking. The first part was about creating the index, and the second part was about querying it. We basically have a search engine that can answer search queries on a given corpus, but the results are not ranked. Now we will add ranking to obtain an ordered list of results, which is one of the most challenging and interesting parts. The first ranking scheme we will implement is tf-idf. In the following articles, we’ll analyze a variant of tf-idf, and we will also implement Google’s PageRank. Then we will explore Machine Learning techniques such as Support Vector Machines (SVM).

Tf-idf is a weighting scheme that assigns each term in a document a weight based on its term frequency (tf) and inverse document frequency (idf).  The terms with higher weight scores are considered to be more important. It’s one of the most popular weighting schemes in Information Retrieval.
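As a quick sketch of the idea (using log-scaled term frequency and idf = log(N / df), one common variant; the full post may use different scaling), tf-idf weights can be computed like this:

```python
import math

def tfidf_weights(docs):
    """Compute tf-idf weights for a list of tokenized documents.

    Uses log-scaled term frequency (1 + log tf) and idf = log(N / df),
    where df is the number of documents containing the term.
    """
    N = len(docs)
    # document frequency: how many documents contain each term
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    weights = []
    for doc in docs:
        # raw term frequency within this document
        tf = {}
        for term in doc:
            tf[term] = tf.get(term, 0) + 1
        weights.append({term: (1 + math.log(count)) * math.log(N / df[term])
                        for term, count in tf.items()})
    return weights
```

Note that a term appearing in every document gets idf = log(N/N) = 0, so very common words are automatically down-weighted.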


## How to Implement a Search Engine Part 2: Query Index

Overview
This is the second part of our implementing a search engine project. The first part was about creating the inverted index. Now, we will use the index to answer actual search queries.

Query Types
Let’s first remember the query types. Our search engine is going to answer the 3 types of queries that we generally use while searching.
1) One Word Queries (OWQ): An OWQ consists of only a single word, such as computer or university. The matching documents are the ones containing that single query term.
2) Free Text Queries (FTQ): An FTQ contains a sequence of words separated by spaces, like an actual sentence, such as computer science or Brown University. The matching documents are the ones that contain any of the query terms.
3) Phrase Queries (PQ): A PQ also contains a sequence of words just like an FTQ, but the words are typed within double quotes, such as “Turing Award” or “information retrieval and web search”. This means we want the matching documents to contain all of the query terms, in exactly the order specified.


## My Favorite Interview Question

I am working at Microsoft Bing as an intern this summer. To get the internship I had lots of interviews with various tech companies this year, and this was my favorite question: given an integer array with N elements (N is large), find the k smallest elements (k << N).
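One standard answer (a sketch; the full post likely compares several approaches) keeps a max-heap of the k best candidates, giving O(N log k) time and O(k) space — well suited to the k << N constraint, since the full array never needs to be sorted. Python’s heapq is a min-heap, so values are negated to simulate a max-heap:

```python
import heapq

def k_smallest(arr, k):
    """Find the k smallest elements in O(N log k) time.

    Maintains a max-heap of the k smallest values seen so far;
    heapq is a min-heap, so values are stored negated.
    """
    heap = []
    for num in arr:
        if len(heap) < k:
            heapq.heappush(heap, -num)
        elif -heap[0] > num:
            # the largest of our k candidates is bigger: replace it
            heapq.heapreplace(heap, -num)
    return sorted(-x for x in heap)

print(k_smallest([5, 1, 4, 2, 3], 2))  # [1, 2]
```

In practice heapq.nsmallest(k, arr) does the same job in one call; other common answers include quickselect (average O(N)) when the array may be modified in place.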



## Python Optimization

Fast, efficient code is always desirable, especially for programs that will operate on web-scale data. Small performance gains can lead to big improvements in running time due to the massive size of the input. Here I want to share Python optimization lessons I learned while coding the create index program of the previous post. I assume that you are familiar with Python basics such as list comprehensions, map, filter, etc.

Map vs List Comprehension
If you’re going to use a lambda function, use a list comprehension instead of map.
If you’re going to use a built-in function, use map instead of a list comprehension.
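The guideline is easy to check yourself with timeit (exact results vary by Python version and workload, so measure on your own setup before committing to either style):

```python
import timeit

setup = "data = range(1000)"

# with a lambda, map pays a Python function call per element,
# while the comprehension inlines the expression
t_map_lambda = timeit.timeit("list(map(lambda x: x + 1, data))",
                             setup=setup, number=1000)
t_comp_expr = timeit.timeit("[x + 1 for x in data]",
                            setup=setup, number=1000)

# with a built-in, map can skip the comprehension's per-element bytecode
t_map_builtin = timeit.timeit("list(map(str, data))",
                              setup=setup, number=1000)
t_comp_builtin = timeit.timeit("[str(x) for x in data]",
                               setup=setup, number=1000)

print(t_map_lambda, t_comp_expr, t_map_builtin, t_comp_builtin)
```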


## How to Implement a Search Engine Part 1: Create Index

Overview
Ok, let’s start! We will implement a search engine that answers queries on Wikipedia articles. The project has two main parts: first, creating the index by going through the documents, and second, answering search queries using the index we created. Later we will also add ranking, classification, compression, and duplicate detection mechanisms.
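To make the first part concrete, here is a minimal sketch of index creation (my simplification, not the full implementation: real tokenization, stopword removal, and on-disk storage are omitted). It builds a positional inverted index mapping each term to the documents and positions where it occurs, which is also what phrase queries will need later:

```python
import re
from collections import defaultdict

def create_index(documents):
    """Build a positional inverted index: term -> {doc_id: [positions]}.

    `documents` maps doc_id -> raw text; tokenization here is just a
    lowercase word split, as a placeholder for real preprocessing.
    """
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for pos, term in enumerate(re.findall(r"\w+", text.lower())):
            index[term].setdefault(doc_id, []).append(pos)
    return index

idx = create_index({1: "Computer science", 2: "computer"})
print(dict(idx))  # {'computer': {1: [0], 2: [0]}, 'science': {1: [1]}}
```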

What is our corpus
Our corpus (document collection) is Wikipedia articles. To simplify our work and focus on the core algorithms, we won’t write a crawler to get the articles from Wikipedia. We will use a prebuilt file which contains approximately 50,000 articles from various fields ranging from computer science to arts. The structure of the file is as follows:
