At Cloudera, we believe that data can make what is impossible today, possible tomorrow. We empower people to transform complex data into clear and actionable insights. Cloudera delivers an enterprise data cloud for any data, anywhere, from the Edge to AI. Powered by the relentless innovation of the open source community, Cloudera advances digital transformation for the world’s largest enterprises. Learn more at Cloudera.com.
Please join us for a unique Tech Meetup hosted by Cloudera, the enterprise data cloud company, on July 24th at 1:30 PM at the Novotel Bengaluru Outer Ring Road.
This Meetup will unveil and demo Cloudera's flagship product, the Cloudera Data Platform (CDP), which is scheduled for release in early Fall 2019. It will also feature talks on YuniKorn (a next-generation scheduler for Apache YARN and Kubernetes) and Ozone (scaling HDFS to trillions of objects).
Our distinguished panel consists of Cloudera's Apache PMC members and committers, along with our India Site Leader.
If you are a technology professional, or simply interested in learning more about Cloudera's new products, we encourage you to register for and attend this event.
Cloudera Data Platform (CDP) is a new offering from Cloudera that will enable enterprise customers to consume and manage data in cloud and on-prem environments through a consolidated suite of applications, including Data Engineering, Analytics, and Machine Learning. Whether customers want to keep their data entirely on-prem, burst some workloads to the cloud for short-term capacity expansion, or migrate from on-prem to cloud (or from one cloud to another) for cost efficiency or corporate policy, CDP promises to partner with them throughout their data journey.
CDP will launch first as a cloud offering, followed by an on-prem release. It provides a microservice-based control plane through which customers can manage their hybrid compute and data environments and, most importantly, a security and governance framework for managing those environments consistently. CDP brings together the Hortonworks Data Platform (HDP) and the Cloudera Distribution of Hadoop (CDH), and further improves the usability of the platform through a fresh product experience and ruthless automation of infrastructure setup.
The resource scheduler of a container orchestration system such as YARN or Kubernetes is a critical component that users rely on to plan resources and manage applications. YARN has two powerful schedulers (the Fair Scheduler and the Capacity Scheduler), both of which serve many strong use cases in the big data ecosystem. The default Kubernetes (K8s) scheduler is an industry-proven solution for efficiently managing long-running services.
Fragmented resource scheduling is a major obstacle to a seamless big data user experience across container orchestrators. Today, no existing solution provides a unified resource scheduling experience across platforms, which makes it extremely difficult to manage workloads running in different environments, from on-premises to cloud.
YuniKorn is a unified scheduler that builds on the existing capabilities of the YARN and K8s schedulers while extending them toward cloud use cases. YuniKorn will be a common scheduler for both YARN and Kubernetes.
Ozone is an object store for Hadoop. It solves the small-file problem of HDFS, allowing users to store trillions of files in Ozone and access them as if they were on HDFS. Ozone plugs into existing Hadoop deployments seamlessly, and programs such as Hive, LLAP, and Spark work without any modifications. This talk looks at the architecture, reliability, and performance of Ozone.
In this talk, we will also explore the Hadoop distributed storage layer, the block storage layer that makes this scaling possible, and how we plan to use it to scale HDFS.
Data Warehouse Experience is a new offering from Cloudera that will allow enterprise customers to run their data warehouses on cloud infrastructure. It uses Apache Hive at its core to provide a strong foundation for a variety of warehouse use cases, combined with the flexibility of a cloud service that seamlessly autoscales clusters up and down.
The service will include tools that let customers identify which workloads are running in their clusters, debug problems with those workloads, and report on how well their data model suits their workloads.
Chid Kollengode has served as VP of Engineering and Country Head of Cloudera since January 2019. Chid is an engineering and management professional with more than 25 years of experience. He assembled the big data team at Nokia and centralized all of the company's worldwide data.
Previously, he led the open source Hadoop MapReduce team at Yahoo!, building the scalable platform that drove Yahoo! Search and user data analytics. As Senior Manager/Architect on Amazon's A9 team, he built the company's first non-Oracle system, one of the early big data systems, to store web search and advertising data for rigorous analysis.
Hemanth currently leads the effort at Cloudera Bangalore to build a set of next-generation capabilities called DataPlane Services, a platform for building hybrid, multi-cluster data and infrastructure management applications. He also leads the team that builds Hortonworks Data Steward Studio, an application that aims to solve data governance and security problems for large organisations using the Hadoop stack.
His primary interest is building large-scale distributed systems, and he has experience building both frameworks and the applications that use them. He was an early contributor, committer, and project lead for Hadoop MapReduce; Hadoop on Demand, the earliest system for provisioning Hadoop on a shared cluster; and the first version of the Capacity Scheduler, which remains one of the main schedulers in Hadoop today.
Vinod Kumar Vavilapalli has been contributing to the Apache Hadoop project full-time since mid-2007. At the Apache Software Foundation, he is V.P. of Apache Hadoop, a long-term Hadoop contributor and committer, a member of the Project Management Committee, and an ASF member. He is Director of Engineering at Cloudera and runs the Compute Platform teams there. Before Hortonworks (now part of Cloudera), he was at Yahoo!, working in the Grid team that made Hadoop what it is today, running at large scale on up to tens of thousands of nodes.
Vinod loves reading books of all kinds and is passionate about using computers to change the world for the better, bit by bit. He has a bachelor's degree in computer science and engineering from the Indian Institute of Technology Roorkee.
Sunil Govindan is an Engineering Manager at Cloudera, leading the Compute Platform team in Bengaluru, India. He has contributed to the Apache Hadoop project since 2013 in various roles: contributor, committer, and member of the Project Management Committee (PMC). His major contributions are in YARN scheduling, including intra-queue resource preemption, support for multiple resource types in YARN with resource profiles, and absolute resource configuration for queues.
Mukul is an Engineering Manager at Cloudera, where he leads the HDFS team. He has worked on storage systems and file systems for 9 years in various roles, including open source contributor, PMC member, researcher, and software developer.
He has also worked at Nimble Storage and NetApp, on the CASL and WAFL file systems respectively. He graduated from Carnegie Mellon University, where his thesis was on a file system for shingled magnetic recording (SMR) disks.
Anishek has more than 15 years of experience in the software industry and is currently an Engineering Manager based in our Bangalore office. He oversees several teams at Cloudera, including those working on replication for Apache Hive, Data Analytics Studio, the Hive Warehouse Connector, and the DWX UI. He has worked in the big data space for about 8 years, including building entire data platforms, and is an Apache Hive committer.