Why Attend
Big data is a change agent that challenges the ways in which organizational leaders have traditionally made decisions. This course gives participants the confidence to articulate big data architectures that support analytics-driven solutions within their organizations. The course also provides hands-on experience with key big data technologies used to deploy data-intensive applications. Participants will gain the knowledge and skills they need to assemble and manage a large-scale big data analytics project. Lastly, participants will receive a conceptual introduction to the data structures that support machine learning algorithms and artificial intelligence use cases.
Participants will work to identify areas within their organization that can be improved through big data-driven implementations, and the types of improvements that can be made through analytical processes. Participants will be led through a series of hands-on exercises and workshops, where they will have the opportunity to apply the test methods and practical approaches that they learn throughout the course. At the end of the course, participants will produce an actionable big data plan and architectural diagram to be used as a blueprint proposal within their own organizations.
Course Methodology
This course is highly interactive, centered on group discussions, case studies, hands-on practical exercises, and group activities.
Course Objectives
By the end of the course, participants will be able to:
- Design big data implementation plans and create strategies for data-driven solutions
- Explain the challenges that big data poses for traditional technologies like Excel
- Discuss the main challenges and advantages of the Hadoop ecosystem and other distributed big data architectures
- Demonstrate and discuss key technologies for big data storage and compute, such as PostgreSQL and MongoDB
- Discuss popular machine learning algorithms and the importance of ethics in data analytics and artificial intelligence
- Deliver an architectural diagram for analytics-focused use cases
Target Audience
This course is ideal for data professionals, such as database administrators, system administrators, business analysts, or business intelligence specialists. It is also ideal for less technically inclined management and administrative professionals seeking to understand big data strategies and technologies. Recommended pre-knowledge includes experience analyzing data in Excel, knowledge of basic database technologies, and awareness of analytics-driven business initiatives.
Target Competencies
- Big data implementation planning
- Big data analytics structures and technologies
- Ethics and integrity for big data analytics
- Big data storage and compute system implementation
- Architecture diagram design
Course Outline
- Storing Big Data
- What is big data?
- 5 “V’s” of big data
- How big data relates to data analytics
- Big data impact on technologies
- Open source revolution
- Key big data concepts and data types
- Text, audio, images
- Big data professional roles
- Big data architectures and paradigms
- The Hadoop Ecosystem
- Overview of Hadoop
- Hadoop Distributed File System (HDFS)
- Massively parallel processing (MPP) versus distributed in-memory applications
- RDBMSs vs NoSQL DBs
- PostgreSQL, MongoDB, Cassandra
- Streaming data
- Data warehouse vs data mart
- Lambda Architecture vs Kappa Architecture
- Computing Big Data
- How to access big data
- Role of cloud computing
- Data movement risk
- Networking and co-location
- Big data extract, transform, load (ETL)
- Big data compute technologies
- Hadoop continued
- MapReduce and beyond
- Distributed compute
- High performance clusters
- Spark
- Streaming: Storm, Spark Structured Streaming
- Other big data technologies: Kafka, etc.
- Cloud applications for big data
- Introducing Big Data Analytics and Artificial Intelligence (AI)
- Basics of data analytics
- Roles and objectives
- Key math and statistics concepts
- Supervised vs unsupervised learning
- Key technologies and applications
- Analytics architecture
- Cloud vs On-premise
- Data storage
- Analytics Tools
- Databricks
- SAS Viya
- Cloud ML & AI solutions
- Introduction to Artificial Intelligence
- Linear Algebra 101
- Image classification
- Importance of Ethics
- Planning A Big Data Project For Analytics
- How big data projects meet organizational needs
- Big data case studies:
- Netflix
- Orbitz
- Dell
- And others
- Best practices in project design
- Assessing the current state of your organization
- Vertical data teams and discussions
- Considerations for big data project plans
- Brainstorm a data-driven strategy
- Practice designing architecture diagrams
- Architecting Big Data Solutions
- Identifying analytical opportunities
- Define and assess the problem
- Describe the impact and use of data to address the problem
- Identify potential data sources
- Brainstorm an analytics strategy to implement
- Storage and compute
- Identify a cloud environment strategy
- Brainstorm key storage systems and compute environments