SET hive.groupby.skewindata = true;
可以使执行逻辑自动优化避免data skew. 但如果有两个及以上则上述参数不会起作用。
具体case:
如下sql每天会执行11h,切换成优化后的
select count(1) from (select * from ... group by ...) tmp
之后,只需要28min即可。
--before optimization
select
COUNT(DISTINCT (CASE WHEN field_a = 1 THEN field_b ELSE NULL END)) / COUNT(DISTINCT field_b)
from
table_name
where
label_spam_user = 0
--after optimization
with tmp1 as(
select
field_b
from table_name
where field_a = 1
group by field_b
)
, tmp2 as(
select
count(1) as cnt_1
from tmp1
)
, tmp3 as(
select
field_b
from table_name
group by field_b
)
, tmp4 as(
select
count(1) as cnt_2
from tmp3
)
select
cnt_1/cnt_2
from tmp2
join tmp4 on 1=1;
I would like to thank you for your nicely written post, its informative and your writing style encouraged me to read it till end. Thanks
ReplyDeleteangularjs Training in chennai
angularjs Training in chennai
angularjs-Training in tambaram
angularjs-Training in sholinganallur
angularjs-Training in velachery
I would really like to read some personal experiences like the way, you've explained through the above article. I'm glad for your achievements and would probably like to see much more in the near future. Thanks for share.
ReplyDeletePython training in marathahalli | Python training institute in pune
It would have been the happiest moment for you,I mean if we have been waiting for something to happen and when it happens we forgot all hardwork and wait for getting that happened.
ReplyDeleteJava training in Marathahalli | Java training in Btm layout
Java training in Marathahalli | Java training in Btm layout
Thank you so much for ding the impressive job here, everyone will surely like your post. Order Form
ReplyDeleteYou’ve written a really great article here. Your writing style makes this material easy to understand.. I agree with some of the many points you have made. Thank you for this is real thought-provoking content
ReplyDeleteMicrosoft Azure online training
Selenium online training
Java online training
Python online training
uipath online training
This comment has been removed by the author.
ReplyDeleteThis comment has been removed by the author.
ReplyDeleteThis comment has been removed by the author.
ReplyDeleteI perceived a lots of Knowledge from this Resource. This is a Good way to Circulate the Educated Things...Hope people will also Like the Below Information's also...
ReplyDeleteLooking For BEST JAVA TRAINING IN CHENNAI WITH PLACEMENT Visit Below...
Java training in chennai | Java training in annanagar | Java training in omr | Java training in porur | Java training in tambaram | Java training in velachery
Informative blog post. Thanks for this wonderful Post.
ReplyDeleteSAP Training in Chennai
AWS Training in Chennai
Hardware and Networking Training in Chennai
QTP Training in Chennai
CCNA Training in Chennai
Great Blog. Thnaks.
ReplyDeleteSAP Training in Chennai
Java Training in Chennai
Software Testing Training in Chennai
.Net Training in Chennai
Hardware and Networking Training in Chennai
AWS Training in Chennai
Azure Training in Chennai
Selenium Training in Chennai
QTP Training in Chennai
Android Training in Chennai
Mindblowing blog very useful thanks
ReplyDeleteAWS Training in Velachery
AWS Training in Chennai
I feel very grateful that I read this. It is very helpful and very informative and I really learned a lot from it.
ReplyDeleteMLSU BA 1st Year Exam Result
MLSU BA 2nd Year Exam Result
MLSU BA 3rd Year Exam Result
ReplyDeleteGreat post. keep sharing such a worthy information.
Python Institute In Chennai
Superb Information, I really appreciated with it, This is fine to read and valuable pro potential, I really bookmark it, pro broaden read. Appreciation pro sharing. I like it.
ReplyDeleteBest React-js Training Institute in Hyderabad
Great insights on optimizing the COUNT DISTINCT function in Hive SQL when dealing with data skew. This is a common issue in big data processing, and your solution can greatly improve query performance. It's fascinating how optimizing data operations is critical, much like optimizing digital marketing strategies for better ROI. If anyone is looking to enhance their skills, consider exploring Digiperform's link text for practical insights into data-driven marketing techniques. Thanks for sharing these valuable tips!
ReplyDelete