Wednesday, July 11, 2018

Troubleshooting OOM Errors Caused by Data Skew in MR/TEZ Jobs

With a SQL query like the following, the last few reducer tasks may fail, and the logs show an OutOfMemory error:
select *
from a
join b
on a.uid = b.uid


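This is usually caused by severe skew in the join key, so the first step is to check how uid is distributed in table a and in table b separately (the query below is for a; run the same query against b):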
select uid, count(1) as cnt
from a
group by uid
order by cnt desc
limit 1000
The actual result may look like the figure below.

[Figure: example output of the uid distribution query]
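To make the effect of a skewed distribution concrete, here is a minimal Python sketch on toy data. It mimics the `group by uid` count and then assigns rows to reducers with a simple `uid % num_reducers` partitioner. The partitioner, the toy data, and the function names `top_keys` and `reducer_loads` are assumptions for illustration only, not the actual MR/TEZ implementation.

```python
from collections import Counter


def top_keys(uids, limit=1000):
    """Rough equivalent of: select uid, count(1) as cnt ... order by cnt desc limit N."""
    return Counter(uids).most_common(limit)


def reducer_loads(uids, num_reducers):
    """Rows received by each reducer under an assumed uid % num_reducers partitioner."""
    loads = [0] * num_reducers
    for uid in uids:
        loads[uid % num_reducers] += 1
    return loads


# Toy data (hypothetical): uid 0 is a hot key, every other uid appears once.
uids = [0] * 9000 + list(range(1, 1001))

hottest = top_keys(uids, limit=3)
loads = reducer_loads(uids, num_reducers=10)

print("hottest keys:", hottest)
print("rows per reducer:", loads)
```

All rows sharing a join key go to the same reducer, so in this toy run one reducer receives the hot key's rows on top of its normal share, while the others stay small. This matches the symptom of only a few reducer tasks running out of memory.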


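If these uids are removed with a where filter in the SQL, the job succeeds. The skewed values can then be handled separately on their own (for example, by increasing reducer memory).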
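The split-and-handle-separately idea can be sketched in Python on in-memory rows. This is only an illustration under stated assumptions: the `threshold` used to call a key skewed, the helper names `hash_join`, `find_skewed`, and `split_join`, and the toy rows are all hypothetical. In Hive the two halves would be two queries with complementary where filters on uid.

```python
from collections import Counter, defaultdict


def hash_join(a_rows, b_rows):
    """Inner join on uid; each row is a (uid, payload) tuple."""
    index = defaultdict(list)
    for uid, b_val in b_rows:
        index[uid].append(b_val)
    return [(uid, a_val, b_val)
            for uid, a_val in a_rows
            for b_val in index.get(uid, [])]


def find_skewed(rows, threshold):
    """Keys whose row count reaches the (assumed) skew threshold."""
    counts = Counter(uid for uid, _ in rows)
    return {uid for uid, cnt in counts.items() if cnt >= threshold}


def split_join(a_rows, b_rows, threshold):
    """Join non-skewed and skewed keys in two separate passes."""
    hot = find_skewed(a_rows, threshold) | find_skewed(b_rows, threshold)
    normal = hash_join([r for r in a_rows if r[0] not in hot],
                       [r for r in b_rows if r[0] not in hot])
    skewed = hash_join([r for r in a_rows if r[0] in hot],
                       [r for r in b_rows if r[0] in hot])
    return normal, skewed


# Toy data (hypothetical): uid 0 is heavily repeated in table a.
a = [(0, f"a{i}") for i in range(50)] + [(1, "x"), (2, "y")]
b = [(0, "b0"), (0, "b1"), (1, "p"), (3, "q")]

normal, skewed = split_join(a, b, threshold=10)
print(len(normal), "rows from the normal pass,", len(skewed), "rows from the skewed pass")
```

The two passes together produce the same rows as the single join, but the skewed pass is isolated, so it can be run as its own job with more reducer memory (for instance via settings such as `mapreduce.reduce.memory.mb` on MR or `hive.tez.container.size` on Tez) without affecting the rest.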
