Wednesday, September 30, 2015

MapReduce 算法思想

对MapReduce的过程有一些了解,
Map: 对读到的每一条数据,产生一个<key, value> pair
Reduce: 以key为依据,将这些<key, value> pair汇总,并输出。
每一个Reducer 只能处理 同一个key 的 <key, value > data。

首先我们需要确定用多少个map的Machine, 多少个reduce的 Machine.
这些map的机器分别处理一部分input的数据,并且在local file上产生
《key, value》 pair, 然后有一个sort and shuffle的过程, 将产生的<key, value>
pair 按照key的顺序 shuffle到不同的机器上,并且使得Key的顺序是递增的,
这样做的目的是为了reduce的阶段产生的output file 里面的key, value 是有序的,
方便以后文件的利用。 最后 reduce 阶段, reduce的每个机器对shuffle 阶段产生的
intermedia file 按照Key进行汇总, 并且将结果存到  output file里面。

参照 Terasort的map reduce 思想, 我的理解是, 我们需要先确定 需要多少个amp 的machine, 多少个reduce的machine,  在每个map的machine, 他们在local 进行Merge sort,  然后有一个partition的过程,这个过程跟上面的shuffle过程类似,把每一个map machine 中处理的数据分成R个partition 块,并且保证下一个partition 块比上一个partition块的数据大, 依次递增的关系,每个partition处理好之后,需要把这些Intermedia files 通过一定的方式传给 reduce machine。 最后reduce阶段, 每个reduce的机器对每个partition 块进行汇总 Merge sort 处理,最后输出的处理后的output file 。 其实这个过程中, 每个map 机器做的事情跟reduce机器做的相同。

Map Reduce是一种 multiple machine parallel 处理一个问题的模式, 通过map reduce这种方式可以很高效地并行处理一些数据。 map 阶段其实就是把这个大规模的问题细分成几个小规模的问题, 分别在map 机器上进行小规模常规处理, 然后这些小规模问题的解 通过一些partition 加 shuffle 的处理,使得这些解的数据有一定的顺序,最后得出的最终结果能让用户方便地按照一定的方式去处理。 这个过程产生一些Intermedia 的文件,这些文件要被发送给 reduce task, reduce 阶段会根据key来分别对这些中间文件数据进行汇总处理, 最终就能得到大规模数据处理后的结果。 整个过程中, 这些map machine 和 reduce machine 可以看做是一个个的 slaves, 或者worker, 这些 worker 由一个 master 来进行协调和管理,


Master 的任务是给map worker 和 reduce worker来分配任务,并且当 其中一个 machine 不 work 或者 fail的时候,如果保证整个系统正常的运行。
通常,一个parallel 系统,需要能够处理: parallelization, fault-tolerance, data distribution, load balancing. 
其中, parallelization 通过multiple machine 实现, fault- tolerance通过master-slave来实现,data distribution 通过 平均分配 数据给每一个 machine,可以通过 sampling  +  num of reducer 来计算每个machine 分配几个任务, 

fault tolerance: 如果一个 worker failed, master 如何协调?
master pings every worker periodically. if no response is received from a worker in a certain amount of time, the master mask this worker as failed;   Any map tasks completed by the worker are reset back to their initial idle state and become eligible for scheduling on other workers. Completed map tasks are re-executed on a failure because their output is stored on the local tasks of the failed machine and is therefore inaccessible. Complete reduced tasks  don't need to be re-executed since their output is stored in a global file system. 

IF master dies, a new copy of the master will run. 

Task Graunularity:

We subdivide the map phase into M pieces and the re- duce phase into R pieces, as described above. Ideally, M and R should be much larger than the number of worker machines. 

Map Reduce 模式如果应用在 算法题中呢? 
数据规模大的时候, 如何实现 parallelization ? 

eg:  mapreduce in Dikjastra algorithm :

1. Map: 需要多少个machine, Reduce: 需要多少个machine .
2. Map 阶段怎么定义? 
3. Reduce 阶段怎么定义? 

Map 和 reduce 的 machine数目可以通过 graph 里面有多少的Node 来求。

每个 node离 start的最短距离 = 1 + min{distance(P), P: neighbors of N}

Map:  计算start 到这个Node的距离,
           key: node N, value: D( distance from start to P) S: neighbors of P

Reduce: minimal D , S: neighbors of P

由于graph 需要通过 bfs来进行 explore, 所以 第一步的map只有一个machine在进行,而reduce阶段所有machine 都进行了参与。 随着map阶段 explore的Nodes 越来越多, map上参与的machine也越来越多, reduce 会把map求出的这些distance, 根据Key node, 来求最小值。 不同 map上面处理的node不同, reduce处理的node也不同,这样就实现了parallel 处理大规模node的dikjastra 算法。 



Friday, September 25, 2015

Topic3: Tree

Tree: 
binary search tree as an example, to simulate insertion , preorder, inorder, and postorder.



#include <iostream>
#include <vector>
#include <stack>

using namespace std;

class TreeNode {
public:
    int val;
    TreeNode* left;
    TreeNode* right;
    TreeNode(int val){
        this->val = val;
        left = NULL;
        right = NULL;
    }
};

//preorder: root, left, right
//after pop root, you need to push root->right immediately

TreeNode* insert(TreeNode* root, int val) {
    TreeNode* newNode = new TreeNode(val);
    if (root == NULL) return newNode;
    if (root->val < val) {
        root->right = insert(root->right, val);
       //return insert(root->right, val); //just return the node, no reference 
    } else {
        root->left = insert(root->left, val);
       // return insert(root->left, val); //just return the node, no reference
    }
    return root;
}

//every time before push the left node into the stack, 
//you have to push the right node first, because the order of the preorder is that
//you have to pop left first and then pop right
vector<int> preorder(TreeNode* root) {
    vector<int> rst;
    if (root == NULL) return rst;
    stack<TreeNode*> buf;
    buf.push(root);
    while (!buf.empty()) {
        TreeNode* tmp = buf.top();
        buf.pop();
        rst.push_back(tmp->val);
        if(tmp->right) buf.push(tmp->right);
        if (tmp->left) buf.push(tmp->left);
    }
    return rst;
}

//for every node we currently visiting, we need to traverse to its left subtree first,
//if we reach the leftmost subtree, we need to pop up the top of the stack, that's the leftmost root node, then we have to traverse the right part of this root node, then we set the root to be the right node of the leftmost root node. 

vector<int> inorder(TreeNode* root) {
    vector<int> rst;
    if (root == NULL) return rst;
    stack<TreeNode*> buf;
    TreeNode* cur = root; //current node
    while (!buf.empty() || cur != NULL) {
        while (cur) {
            buf.push(cur);
            cur = cur->left;
        }
        TreeNode* tmp = buf.top();
        buf.pop();
        rst.push_back(tmp->val);
        cur = tmp->right;
    }
    return rst;
}

//the thought of below code is very straight forward.
//in post order traversal, we traverse left, right,root.
//in a real traverse, we have 3 cases, 
//1. we traverse down of the tree, if there is left subtree, we push left into stack, if right subtree, we push right into stack. otherwise, we reach a leaf node,just pop the top of the stack. 
//2. we traverse up the tree from left subtree, that means we finish traversing the left part of the tree, we need to traverse right part and root
//3. we traverse up the tree from right subtee, that means we finish traversing the right part of the tree, we need to pop the root. 

how to mark the 3 different cases? we can use a parent node and a current node
from up tree or down tree, the parent and current node has a different relationships. they have a reverse relationship. 

vector<int> postorder(TreeNode* root) {
    vector<int> rst;
    if (root == NULL) return rst;
    stack<TreeNode*> buf;
    buf.push(root);
    TreeNode* pre = NULL;
    while (!buf.empty()) {
        TreeNode* cur = buf.top();
        //traverse down the tree
        if ( !pre || pre->left == cur || pre->right == cur) {
            if (cur->left) {
                buf.push(cur->left);
            } else if (cur->right) {
                buf.push(cur->right);
            } else {
                rst.push_back(cur->val);
                buf.pop();
            }
        } else if (cur->left == pre) {
            //traverse from left up
            if (cur->right) {
                buf.push(cur->right);
            } else {
                rst.push_back(cur->val);
                buf.pop();
            }
        } else if (cur->right == pre) {
            //traverse from right up
            rst.push_back(cur->val);
            buf.pop();
        }
        pre = cur;
    }
    return rst;
}

//general DFS, usually used to traverse the tree/graph, the result has no specific order, 
//the general DFS is very important, because this can be used not only in tree, graph, or any DFS traversal. used very commonly in iterator traversal
//the process is the most simple one among other 3 dfs process.
// mostly like a BFS, but using a stack, the output of this general traversal, has no order. 


vector<int> generalDFS(TreeNode* root) {
    vector<int> rst;
    if (root == NULL) return rst;
    stack<TreeNode*> buf;
    buf.push(root);
    while (!buf.empty()) {
        TreeNode* tmp = buf.top();
        buf.pop();
        rst.push_back(tmp->val);
        if (tmp->left) buf.push(tmp->left);
        if (tmp->right) buf.push(tmp->right);
    }
    return rst;
}

int main(int argc, const char * argv[]) {
    // insert code here...
    std::cout << "Hello, World!\n";
    TreeNode* root = NULL;
    root = insert(root, 3);
    root = insert(root, 1);
    root = insert(root, 2);
    root = insert(root, 6);
    root = insert(root, 4);
    root = insert(root, 5);
    
    vector<int> rst = preorder(root);
    rst = inorder(root);
    rst = postorder(root);
    rst = generalDFS(root);
    return 0;

}
Tree 题型:
1. Binary tree concept :
   Q1: validate binary search tree
   Q2: symmetric tree / same tree
前面两题算是很直观的tree recursion,都是从up to bottom的遍历顺序,一边遍历一边进行条件判断, 用最基本的preorder的遍历就能解题。
   Q3: balanced binary tree
这个题目要使时间复杂度为O(N), 需要用到另一个tree 常用的 recursion 解法, bottom up的解法, 即从下往上返值, bottom up的解法在binary tree中是很重要的思想,能用于解决很复杂的问题,
   Q4:  complete binary tree
利用按照level order 层次遍历放在一个数组里面, 数组里面都是left, right, 中间不会出现为NULL的情况,所以complete binary tree的判断,需要用到level order 来遍历。
下面这个是写过这么多次isComplete函数以来,写的逻辑最清晰的一次, 先画下有几种可能 invalid 的complete binary tree, 然后可以总结到如下的解题规律:
1. 首先如果在遍历过程中, left subtree 不存在, 但是有right subtree的时候, 此时return false;
2. 需要有一个flag 来标记是否已经发现有Node缺失一个subtree,即在第一次发现缺失的时候将flag 赋值为true
3. 如果flag 已经为true了,那么后面再继续遍历的时候,不能再有node 入queue,如果有,说明不是complete的。
所以, 这个题目逻辑清晰的写法就是以flag 是否为true来进行判断, 如果为true, 说明已经有缺失了,后面再有node过来,直接返回true; 如果flag 为 false,那么就继续去看是否有node 缺失。
以前写总是按照过程走来进行判断,但是这样的话, 有多个变量控制,很容易逻辑混乱,但是如果我们拿出一个变量为分类依据的话,逻辑会清晰很多。

特别是这种 带有flag 判断的题目,逻辑最清晰的做法就是按照flag来进行分类。感觉我比较适应这种做法。

class Solution {
 public:
  bool isCompleted(TreeNode* root) {
    if (root == NULL) return true;
    queue<TreeNode*> explore;
    explore.push(root);
    bool flag = false;
    while (!explore.empty()) {
      TreeNode* tmp = explore.front();
      explore.pop();
      if (tmp->left == NULL && tmp->right != NULL) {
        return false;
      }
      if (flag == true) {
        if (tmp->left == NULL && tmp->right == NULL) {
          continue;
        }
        return false;
      } else {
        if (tmp->left == NULL || tmp->right == NULL) {
          flag = true;
        }
      }
      if (tmp->left) explore.push(tmp->left);
      if (tmp->right) explore.push(tmp->right);
    }
    return true;
  }
};

2. basic operation
   search / insert / delete

kth smallest node in binary search tree.  (inorder traversal )
kth largest node in binary search tree????
逆inorder traversal就可以了。inorder: 左中右, 逆inorder traversal: 右中左

在做 iterative的写法的时候要注意push进stack的顺序。
这两个小问题可以应用在  2sum in binary search tree 的题目中。


3. traversal (pre / in / post order)  + BFS  + 变种遍历 (:对角线遍历 +  竖着遍历)

inorder traversal :
follow up: inorder traversal, implement two function : findFirstNode(), findNextNode() (successor in inorder traversal of binary tree),

带有Parent的TreeNode:

class TreeNode {
public:
    int val;
    TreeNode* left;
    TreeNode* right;
    TreeNode* parent;
    TreeNode(int val){
        this->val = val;
        left = NULL;
        right = NULL;
        parent = NULL;
    }

};

TreeNode* findFirstNode(TreeNode* root) {
    if (root == NULL) {
        return NULL;
    }
    while (root->left) {
        root = root->left;
    }
    return root;
}

TreeNode* findNextNode(TreeNode* cur) {
    if (cur == NULL) {
        return NULL;
    }
    if (cur->right) {
        return findFirstNode(cur->right);
    }
    while (cur->parent != NULL || cur == cur->parent->right) {
        cur = cur->parent;
    }
    return cur->parent;
}
vector<int> InOrderTraversal(TreeNode* root) {
    vector<int> rst;
    if (root == NULL) {
        return rst;
    }
    TreeNode* firstNode = findFirstNode(root);
    while (firstNode != NULL) {
         rst.push_back(firstNode->val);
         firstNode = findNextNode(firstNode);
    }
    return rst;

}

recursion traversal:
对于tree的 recursion traversal:
recursion function:

[return type] recur(TreeNode* root, other arguments....) {

}

TreeNode* recur(TreeNode* root, other arguments...) {
  .............
这里这样递归的意思是, 左子树已经按照要求处理好了,返回给root->left
       root->left = recur(root->left, other arguments....);
     右子树已经按照要求处理好了, 返回给 root->right
       root->right = recur(root->right, other arguments...);
这是递归的基础的思想。
  .................
}

Q1 : Given a BST, retain all node in a range [min, max], and delete all nodes outside of the range, minimize change the tree structure.

TreeNode* retainRange(TreeNode* root, int minv, int maxv) {
    if (root == NULL) return NULL;
    if (root->val <= maxv && root->val >= minv) {
        root->left = retainRange(root->left, minv, maxv);
        root->right = retainRange(root->right, minv, maxv);
    } else if (root->val < minv) {
        //recursively delete  nodes in the right subtree. 
        root = root->right;
        return retainRange(root, minv, maxv);
    } else {
        //recursively delete nodes in the left subtree
        root = root->left;
        return retainRange(root, minv, maxv);
    }
    return root;

}

Q2: Recursively remove all the trailing zero nodes from leaves, binary tree only contain 0s and 1s.

TreeNode* removeTrailingZero(TreeNode* root) {
    if (root == NULL) return NULL;
    root->left = removeTrailingZero(root->left);
    root->right = removeTrailingZero(root->right);
    if (root->left == NULL && root->right == NULL && root->val == 0) {
        root = NULL;
    }
    return root;

}

4.  serialization & deserialization
将tree 线性化,preorder, inorder, postorder, level order, 
根据线性化的tree, 还原tree

binary tree:  preorder/postorder/level order + inorder 序列 ==》 还原tree
binary search tree: preorder / postorder / level order ==> 还原binary search tree

1. 如果可以利用额外空间, 利用hashmap 把inorder的元素和其index的对应关系 存在hashmap中,方便在preorder中O(1)找到root在inorder的index,时间复杂度O(n)
2. 如果不可以利用额外空间, 这种情况下我们需要在inorder中找到root的值,在找到root值之前的所有元素都是属于left subtree, 在找到root值之后的属于right subtree。下面的做法很巧妙, 利用两个index, 在构建树的recursion中同时进行查找。
时间复杂度:O(n^2) worst case: left skew binary tree, 每次都需要找到最后才能找到root的值。查找时间复杂度最坏 O(n).
这个过程其实是最左子树的leaf node开始创建,然后是右子树的Leaf node, 往上return 直到 root, 从bottom up的创建过程。
对于这种算法的思想广泛应用在tree serialization 和 tree deserialization 上面:

eg: construct binary tree from preorder and inorder traversal
class Solution {
private:
    TreeNode* helper(vector<int> &preorder, vector<int> &inorder, int &preIdx, int &inIdx, int inOrderTarget) {
        if(inIdx == inorder.size() || inorder[inIdx] == inOrderTarget) {
            return NULL;
        }
        TreeNode* root = new TreeNode(preorder[preIdx]);
        preIdx++;
        root->left = helper(preorder, inorder, preIdx, inIdx, root->val);
        inIdx++;
        root->right =  helper(preorder, inorder, preIdx, inIdx, inOrderTarget);
        return root;
    }
public:
    TreeNode* buildTree(vector<int>& preorder, vector<int>& inorder) {
        TreeNode* rst = NULL;
        if (preorder.size() == 0 || inorder.size() == 0) return rst;
        int preIdx = 0;
        int inIdx = 0;
        return helper(preorder, inorder, preIdx, inIdx, INT_MAX);
    }
};

eg2: convert binary tree from inorder and postorder traversal:

inorder , postorder 构建树的关系要对应,post order是先左右中,root在右边,index 往左边走,是先convert 右子树, Inorder是左中右, 在inorder中找到postorder中的root,也是需要先convert 右子树的,所以inorder的index也是从右边往左边走。

class Solution {
private:
    TreeNode* helper(vector<int>& inorder, vector<int>& postorder, int &inIdx, int &postIdx, int inorderTarget) {
        if ( inIdx < 0 || inorderTarget == inorder[inIdx]) {
            return NULL;
        }
        TreeNode* root = new TreeNode(postorder[postIdx]);
        postIdx--;
        root->right = helper(inorder, postorder, inIdx, postIdx, root->val);
        inIdx--;
        root->left = helper(inorder, postorder, inIdx,  postIdx, inorderTarget);
        return root;
    }
public:
    TreeNode* buildTree(vector<int>& inorder, vector<int>& postorder) {
        TreeNode* rst = NULL;
        if (inorder.size() == 0 || postorder.size() == 0) return rst;
        int inIdx = inorder.size()-1;
        int postIdx = postorder.size()-1;
        return helper(inorder, postorder, inIdx, postIdx, INT_MAX);
    }
};

eg3: convert binary search tree from preorder traversal

class Solution {
private:
    TreeNode* helper(vector<int> &preorder, int& preIdx, int targetMax) {
        if (preIdx == preorder.size() || preorder[preIdx] >= targetMax) {
            return NULL;
        }
        TreeNode* root = new TreeNode(preorder[preIdx]);
        preIdx++;
        root->left = helper(preorder,preIdx, root->val);
        root->right = helper(preorder, preIdx, targetMax);
        return root;
    }
public:
    TreeNode* buildTree(vector<int>& preorder) {
        TreeNode* rst = NULL;
        if (preorder.size() == 0) return rst;
        int preIdx = 0;
        return helper(preorder,preIdx, INT_MAX);
    }

};

eg4: known pre-order and inorder, return postorder without constructing the binary tree
这个题目其实就是利用 上面的那种不用额外空间的 recursion 来 deserialization binary tree的特征,从最下面那层的left subtree 开始构建,然后是right subtree, 这个其实就是postorder的顺序。

class Solution {
private:
    void helper(vector<int> &preorder, vector<int> &inorder, int &preIdx, int &inIdx, int inOrderTarget, vector<int>& postorder) {
        if(inIdx == inorder.size() || inorder[inIdx] == inOrderTarget) {
            return NULL;
        }
        TreeNode* root = new TreeNode(preorder[preIdx]);
        preIdx++;
        helper(preorder, inorder, preIdx, inIdx, root->val,postorder);
        inIdx++;
        helper(preorder, inorder, preIdx, inIdx, inOrderTarget, postorder);
        postorder.push_back(root->val);
        return root;
    }
public:
    vector<int>  buildTree(vector<int>& preorder, vector<int>& inorder) {
        vector<int> postorder;
        if (preorder.size() == 0 || inorder.size() == 0) return postorder;
        int preIdx = 0;
        int inIdx = 0;
        helper(preorder, inorder, preIdx, inIdx, INT_MAX, postorder);
        return postorder;
    }
};
eg5: convert sorted list to binary search tree
Given a singly linked list where elements are sorted in ascending order, convert it to a height balanced BST.
这里linked list的preorder traversal, head 是 从 left->right的链表的头, 需要将这部分链表转化成Binary search tree。 而在sorted linked list里面, left, mid-1以head 为链表是root的左子树,左子树构造完成时, head此时处于mid的位置,  mid+1, right以head ->next为链表是root的右子树。

时间复杂度: O(N)
class Solution {
private:
    TreeNode* helper(ListNode* &head, int left, int right) {
        if (left > right) {
            return NULL;
        }
        int mid = left + (right - left)/2;
        TreeNode* leftC = helper(head, left, mid-1);
        TreeNode* rootC = new TreeNode(head->val);
        head = head->next;
        TreeNode* rightC = helper(head,mid+1, right);
        root->left = leftC;
        root->right = rightC;
    }
public:
    TreeNode* sortedListToBST(ListNode* head) {
        if (head == NULL) return NULL;
        int len = 0;
        ListNode* cur = head;
        while (cur) {
            cur = cur->next;
            len++;
        }
        return helper(head, 0 ,len-1);
    }

};
eg6: Tree  Serialize 化成Linked list. 
Prev node, head node, cur node
binary tree to double linked list
flattern binary tree

Tree : DFS:  Tree Path 

类型1: from up to down 直上直下的path   -->   inorder dfs, 
特别要注意:root to leaf node, 在tree里面遍历node有三种情况, 
case1: root == NULL
case2: root->left == NULL && root->right == NULL (leaf node)
case3: 一般Node
当直上直下, 涉及leaf node的时候,一定要注意leaf node这个条件要单独进行判断。

Q1: all visible nodes - if the node is the largest one on the path from root to leaf, then the node is visible, find all the nodes in a binary tree.

首先对于这种 tree 中存在元素比较的题目, 要问清面试官 是否存在重复的元素, 即是否有duplicates。

对于这个题目而言,有duplicates的话,那个duplicate 的元素也算是整条path上面的最大值。

void helper(TreeNode* root, vector<TreeNode*> &rst, int maxv) {
    if (root == NULL) return;
    if (root->val >= maxv) {
        rst.push_back(root);
        maxv = root->val;
    }
    helper(root->left, rst, maxv);
    helper(root->right, rst, maxv);
}

vector<TreeNode*> visibleNode(TreeNode* root) {
    vector<TreeNode*> rst;
    if (root == NULL) return rst;
    helper(root, rst, INT_MIN);
    return rst;

}

Q2:Given a binary tree, return all root-to-leaf paths.
For example, given the following binary tree:
   1
 /   \
2     3
 \
  5
All root-to-leaf paths are:
["1->2->5", "1->3"]
这种从 root -> leaf的path, 要考虑分两种情况, case1: root == NULL, case2 : leaf node, root->left == NULL && root->right == NULL.
因为在preorder的时候,leaf node的left subtree 和right subtree都去检测了,会有两条重复的路径, 所以我们在做的时候, 对于leaf node只考虑一次就好, 所以Leaf node要特别对待。 
上面的思想,在很多种情况下都要用到,特别是从 root->leaf node path 的问题。 

class Solution {
private:
    void helper(vector<string> &rst, TreeNode* root, string buf) {
        if (root == NULL) return;
        if (root->left == NULL && root->right == NULL) {
            buf+= to_string(root->val);
            rst.push_back(buf);
            return;
        }
        string tmp = buf;
        buf += to_string(root->val);
        buf += "->";
        helper(rst, root->left,buf);
        helper(rst, root->right,buf);
        buf = tmp;
    }
public:
    vector<string> binaryTreePaths(TreeNode* root) {
        vector<string> rst;
        if (root == NULL) return rst;
        helper(rst, root,"");
        return rst;
    }
};

Q3: Path sum I
计算从 root->leaf的path sum 是够与一个target 相等, 如果相等,返回true
Q4: Path sum II
计算从root->leaf的path sum 与 target相等的 所有可能路径。
这两个题目其实是一样的,pass一个path value计算路径值,分三种node的情况进行讨论即可。
class Solution {
private:
    void helper(TreeNode* root, int sum, int path, vector<vector<int>> &rst, vector<int> solu) {
        if (root == NULL) {
            return;
        }
        if (root->left == NULL && root->right == NULL && sum == path + root->val) {
            solu.push_back(root->val);
            rst.push_back(solu);
            return;
        }
        solu.push_back(root->val);
        helper(root->left, sum, path+root->val, rst, solu);
        helper(root->right, sum, path+root->val, rst, solu);
        solu.pop_back();
    }
public:
    vector<vector<int>> pathSum(TreeNode* root, int sum) {
        vector<vector<int>> rst;
        vector<int> solu;
        helper(root, sum, 0, rst, solu);
        return rst;
    }
};
Q5: Sum root to lead node
For example,
    1
   / \
  2   3
The root-to-leaf path 1->2 represents the number 12.
The root-to-leaf path 1->3 represents the number 13.
Return the sum = 12 + 13 = 25.
这个题目在leaf node的时候,要获得整条path的值,所以需要传入一个计算这条path值的参数val 在recursion 函数中。 
class Solution {
private:
    void helper(TreeNode* root, int &sum, int val) {
        if (root == NULL) {
            return;
        }
        if (root->left == NULL && root->right == NULL) {
            val += root->val;
            sum += val;
            return;
        }
        val += root->val;
        helper(root->left, sum, val*10);
        helper(root->right, sum, val*10);
    }
public:
    int sumNumbers(TreeNode* root) {
        if (root == NULL) return 0;
        int sum = 0;
        helper(root, sum, 0);
        return sum;
    }
};
类型2: 人字形路径 、 subpath from root to leaf node
Bottom up recursion
bottom up recursion 方法在解决tree的题目中非常常用, 但是要注意一下几点:
1. bottom up recursion 感觉这种recursion用起来十分巧妙,因为这个recursion 函数可以携带多个信息, return type 可以携带对整个过程有用的信息,函数参数里面也可以携带在bottom up recursion过程中的一些需要的信息。
2. 需要弄清楚整个recursion 函数的return type是什么含义, recursion的参数是什么含义
3. 一般bottom up recursion的思考过程: 
  • what current node want from its children
  • what current node do in current level
  • what current node return to its parent. 

Path related:
Q1: maximum path sum from one leaf node to another leaf node
因为是leaf node, 所以必须对root ->left, root->right来分情况进行讨论。即后面分三种情况return,如果部分情况return 结果是错误的,因为对于只有一个child的node, return 的是child + root,由于NULL的时候是0, 当child 为负数的时候,会把NULL那边的数返回,而这个NULL并不是leaf node, 导致结果错误。 
class Solution {
 private:
 //return the max sum from one way of path
  int helper(TreeNode* root, int &gmax) {
    if (root == NULL) return 0;
    //current node wants the max path from its left child and right child
    int left = helper(root->left, gmax);
    int right = helper(root->right, gmax);
    //current node do in current level
    if (root->left != NULL && root->right != NULL) {
      gmax = max(gmax, left + right + root->value);
    }
    //return what to its parent, the max sum from left or right
    if (root->left == NULL) {
      return right + root->value;
    }
    if (root->right == NULL) {
      return left + root->value;
    }
    return max(left, right) + root->value;
  }
 public:
  int maxPathSum(TreeNode* root) {
    if (root == NULL) return INT_MIN;
    int gmax = INT_MIN;
    helper(root, gmax);
    return gmax;
  }
};
Q2:maximum path sum from any node to any node
这个题目和上面的题目相比,不同点在于不需要是Leaf node to leaf node, 所以对于有一边NULL的情况也是可以直接return的, 对于任何一个node, 以这个Node为path的最大值可能是:node->value, left +right+ node->value, max(left, right) + node->value;在current level的时候进行判断就好了。  往parent return value的时候直接return max(max(left, right) + node->value, node->value), 就是current node 能返给root的最大可能sum了。 
class Solution {
 private:
 //return max sum from one way path
  int helper(TreeNode* root, int &gmax) {
    if (root == NULL) return 0;
    //get max path sum from left and right child
    int left = helper(root->left, gmax);
    int right = helper(root->right, gmax);
    //what current node do in current level
    gmax = max(gmax, max(root->value,max(max(left, right) + root->value, left+right+root->value)));
    //return max sum to parent
    return max(max(left, right) + root->value, root->value);
  }
 public:
  int maxPathSum(TreeNode* root) {
    int gmax = INT_MIN;
    helper(root, gmax);
    return gmax;
  }
};
Q3:maximum subpath sum, the path should be subpath from root to leaf node
subpath 其实也是属于直上直下的path里面的一种,但是因为其start node不一定是root, 所以也是可以通过bottom up recursion来完成的。但是这个跟上面那个题目的区别在于往上传值只能从一边传,不存在left+right+root->val的情况了。
class Solution {
 private:
 //return max path sum from one way path
  int helper(TreeNode* root, int &gmax) {
    if (root == NULL) return 0;
    //current node want the max path sum from its children
    int left = helper(root->left, gmax);
    int right = helper(root->right, gmax);
    //do what in current level
    gmax = max(gmax, max(root->value, max(left, right) + root->value));
    //return what to its parent
    return  max(root->value, max(left, right) + root->value);
  }
 public:
  int maxPathSum(TreeNode* root) {
    int gmax = INT_MIN;
    helper(root, gmax);
    return gmax;
  }
};
Q4: subpath sum equal to target 
这个题目是上面那个题目的扩展,需要找到所有的subpath sum 看是否跟target 相等。
所以我们需要通过递归求出所有可能subpath的sum, 同时跟target比较,如果找到就返回。
由于是subpath, 所以往上返值的时候只有三种可能, from left child , from right child, from root node itself.
class Solution {
 private:
 //all possible subpath sum from one way path
   vector<int> helper(TreeNode* root, int target, bool &flag) {
     vector<int> rst;
     if (root == NULL) return rst;
     //what current node want from its children
     vector<int> left = helper(root->left, target, flag);
     vector<int> right = helper(root->right, target, flag);
     //what current node do in current level
     //return what to its parent
     if (root->value == target) {
       flag = true;
     }
     rst.push_back(root->value);
     for (int i = 0; i < left.size(); i++) {
       if (left[i] + root->value == target) {
         flag =  true;
       }
       rst.push_back(left[i] + root->value);
     }
     for (int j = 0; j < right.size(); j++) {
       if (right[j] + root->value == target) {
         flag = true;
       }
       rst.push_back(right[j] + root->value);
     }
     return rst;
   }
 public:
  bool exist(TreeNode* root, int target) {
    bool flag = false;
    helper(root, target, flag);
    return flag;
  }
};
Lowest Common ancestor: 各种变种
Q1:  two nodes are in the tree
其实是在tree里面找 target node.
class Solution {
 public:
//return the lowest common ancestor of root. 
  TreeNode* solve(TreeNode* root, TreeNode* one, TreeNode* two) {
    if (root == NULL) return NULL;
    if (root == one || root == two) return root;
    TreeNode* left = solve(root->left, one, two);
    TreeNode* right = solve(root->right, one, two);
    if (left != NULL && right != NULL) {
      return root;
    }
    return (left) ? left : right;
  }
};
Q2:  nodes may not in the tree
class Solution {
 private:
  TreeNode* helper(TreeNode* root, TreeNode* one, TreeNode* two, bool &flag1, bool &flag2) {
    if (root == NULL) return NULL;
    TreeNode* left = helper(root->left, one, two, flag1, flag2);
    TreeNode* right = helper(root->right, one, two, flag1, flag2);
    if (root == one) {
      flag1 = true;
      return root;
    }
    if (root == two) {
      flag2 = true;
      return root;
    }
    if (left != NULL && right != NULL) {
      return root;
    }
    return left ? left : right;
  }
 public:
  TreeNode* solve(TreeNode* root, TreeNode* one, TreeNode* two) {
    bool flag1 = false;
    bool flag2 = false;
    TreeNode* rst = helper(root, one, two, flag1, flag2);
    return (flag1 && flag2) ? rst : NULL;
  }
};

counter methods can also be applied here.

Q3:  k nodes lowest common ancestor
2 --> k: use counter to count how many nodes are appear in the tree. 
assumption1: nodes are all in the tree
class Solution {
 public:
  TreeNode* solve(TreeNode* root, vector<TreeNode*> nodes) {
    if (root == NULL || nodes.size() == 0) return root;
    if (find(nodes.begin(), nodes.end(), root) != nodes.end()) {
      return root;
    }
    TreeNode* left = solve(root->left, nodes);
    TreeNode* right = solve(root->right, nodes);
    if (left != NULL && right != NULL) {
      return root;
    }
    return left ? left : right;
  }
};
assumption2: nodes are not all in the tree
class Solution {
 private:
  int helper(TreeNode* root, TreeNode* &rst, vector<TreeNode*> nodes) {
    if (root == NULL) return 0;
    int numLeft = helper(root->left, rst, nodes);
    int numRight = helper(root->right, rst, nodes);
    int num = numLeft + numRight;
    if (find(nodes.begin(), nodes.end(), root) != nodes.end()) {
      num += 1;
    }
    if (num == nodes.size() && rst == NULL) {
      rst = root;
    }
    return num;
  }
 public:
  TreeNode* solve(TreeNode* root, vector<TreeNode*> nodes) {
    if (root == NULL || nodes.size() == 0) return root;
    TreeNode* rst = NULL;
    helper(root, rst, nodes);
    return rst;
  }
};



DFS: preorder / inorder / postorder的应用

Q1: Kth smallest number in binary search tree

在构建tree的时候,需要同时记录numLeftElement, 和 numRightElement. 
如果需要count, 在node里面添加新的field来记录left subtree有多少Nodes, right subtree有多少nodes。 这样的话特别方便 frequent query  。

class TreeNode2 {
public:
    int val;
    TreeNode2* left;
    TreeNode2* right;
    int numLeftElements;
    int numRightElements;
};

int kthSmallest(TreeNode2* root, int k) {
    if (root == NULL) {
        return -1;
    }
    if (root->numLeftElements == k-1) {
        return root->val;
    }
    if (root->numLeftElements < k-1) {
        return kthSmallest(root->right, k - root->numLeftElements-1);
    }
    return kthSmallest(root->left, k);
}

Q2: 利用DFS 来实现level order traversal

DFS preorder 遍历也是从root 到 leaf一层层下来,利用vector来实现一个map的功能, 遍历到相应层的时候把相应的元素push 进该层对应的vector. 

class Solution {
private:
    void preOrder(vector<vector<int>> &rst, TreeNode* root, int level) {
        if (root == NULL) return;
        if (rst.size() == level) {
            vector<int> tmp;
            rst.push_back(tmp);
        }
        rst[level].push_back(root->val);
        preOrder(rst, root->left, level+1);
        preOrder(rst, root->right, level+1);
    }
public:
    vector<vector<int>> levelOrder(TreeNode* root) {
        vector<vector<int>> rst;
        preOrder(rst, root, 0);
        return rst;
    }
};


 树 traverse 的变形 (new topic in tree problems)

1. 2SUM in Binary search Tree
M1: in-place: double-linked list
class TSUM_BST {
private:
    void helper(TreeNode* &head, TreeNode* &prev, TreeNode* cur) {
        if ( cur == NULL) {
            return;
        }
        helper(head, prev, cur->left);
        if (prev == NULL && head == NULL) {
            head = cur;
        } else {
            prev-> right= cur;
            cur->left = prev;
        }
        prev = cur;
        helper(head, prev, cur->right);

    }
public:
    bool tsum_BST(TreeNode* root, int target) {
        if (root == NULL) return false;
        TreeNode* head = NULL;
        TreeNode* prev = NULL;
        TreeNode* tail = NULL;
        helper(head, prev, root);
        tail = prev;
        while (head !=tail) {
            if (head->val + tail->val < target) {
                head = head->right;
            } else if (head->val + tail->val > target) {
                tail = tail->left;
            } else {
                return true;
            }
        }
        return false;
    }

};
M2: 2 stacks to traverse the binary tree
class TSUM_BST {
public:
    bool tsum_BST(TreeNode* root, int target) {
        if (root == NULL) return false;
        stack<TreeNode*> left_half;
        stack<TreeNode*> right_half;
        TreeNode* cur1 = root;
        TreeNode* cur2 = root;
        int left = 0;
        int right = 0;
        bool l_flag = true;
        bool r_flag = true;
        while (1) {
            while (l_flag == true && (cur1 != NULL || !left_half.empty())) {
                while (cur1) {
                    left_half.push(cur1);
                    cur1 = cur1->left;
                }
                TreeNode* tmp = left_half.top();
                left = tmp->val;
                l_flag = false;
                left_half.pop();
                cur1 = tmp->right;
            }
            while (r_flag == true && (cur2 != NULL || !right_half.empty())) {
                while (cur2) {
                    right_half.push(cur2);
                    cur2 = cur2->right;
                }
                TreeNode*tmp = right_half.top();
                right = tmp->val;
                r_flag = false;
                right_half.pop();
                cur2 = tmp->left;
            }
            if (left + right < target) {
                l_flag = true;
            } else if (left + right > target) {
                r_flag = true;
            } else if (left >= right) { //left value > right value, no such pair, terminate in advance.
                return false;
            } else {
                return true;
            }
        }
    }

};
2. right view of a binary tree / top view of a binary tree / bottom view of a binary tree/ left view of a binary tree
right view of a binary tree:  level order 每一层最后那个数,单层遍历
left view of a binary tree: level order 每一层的第一个数, 单层遍历
top view of a binary tree: 每个node距离root的水平距离,如果必须从左到右,可以利用map,key: hd, value: node, Preorder traversal,  value只存第一个在map里面的数。如果可以是any order,其实map可以直接换成 unordered_set。O(N)
bottom view of a binary tree: 与top view of binary tree类似,但是是取同一个hd的最后那个元素。

3.  diagonal printing a binary tree:
tree的哪种打印,无外乎就是preorder, inorder, postorder, level order这几种general 遍历的应用,应用这些最基本的树的遍历来进行其他的遍历。go right level 不变, go left level + 1, 直接用vector<vector<int>>来存储就可以了,不需要用unordered_map,因为这里level的值是从0开始+1的。vector<vector<int>>可以直接存储结果。
diagonal sum of a  binary tree.
这个问题和上面的diagonal printing binary tree是一样的, 每一层的值求sum就可以了。
4. two nodes are cousin in binary tree
two nodes are in the same level but with different parents
step1: check if two nodes are in the same level
step2: get the LCA and check if the two nodes are left child/right child of their LCA

5.













bottom up recursion ,  max (left height, right height)


Binary Tree 的基本解法:
通常利用 recursion 的解法:
1. up to bottom traversal: preorder, 经常用于解一些很直观的从上往下遍历的问题,或者是从root to leaf node路径问题。
2. bottom to up traversal: postorder,  经常用于解决一些较复杂的tree的问题,它的特点在于 recursion function 在往上返回值的同时还能进行其他的条件判断,比up to bottom traversal 更加灵活。
也有 BFS level order 才能解决的问题:check if complete tree, bipartite,
3.