My studies of Java EE and preparation for Oracle Certified Java Expert 1Z0-897 exam

Posted on December 30, 2015 by admin

The third level of Java certification is the Oracle Certified Java Expert. Unlike the previous 2 levels, learners have a choice of passing either the “Enterprise JavaBeans Developer”, “Web Component Developer”, or “Web Services Developer” exam. I chose to concentrate on the Web Services exam as the one covering the most in-demand material, which also aligns well with my job at Deutsche Bank. I spent about 1hr every day at work going through the tutorials and code samples, and I’m also exposed to Web Services within my work projects. It still took about 1 year to prepare, as the exam is quite hard.

Here is what I did to prepare:
Step 1. Made a study plan based on my preparations for the professional certification. I wanted to read an official Oracle tutorial, a book on Web Services, some online study guides, look through and run some sample code, also check all use cases of Web Services within my applications at work.

Step 2. Read the Java EE 7 Oracle tutorial. Only about 25% of the topics are relevant for the exam, but I wanted to get a feel for all the enterprise technology that comes out of the box with Java. Some constructs, e.g. listeners and interceptors, are widely used in software development. Some non-trivial concepts of Enterprise Java Beans (EJB) do find their way into the Web Services exam.

Step 3. Bought and worked through “Java Web Services: Up and Running” book by Martin Kalin. The book describes the development of Web Services from XML RPC to state transfer and method invocations on proxies. A variety of examples provide hands-on experience with Web Servers (Tomcat), build tools (Ant), generation and usage of WSDL files & XSD schemas. I was able to transition most code from GlassFish to Tomcat and successfully run it via IntelliJ IDEA.

Step 4. Read through Oracle Certified Expert in Web Services guide by Mikalai Zaikin. This one is specifically aimed at the exam. It has a good coverage of Jersey client and asynchronous JAX-WS method invocations.

Step 5. Skimmed through the SCDJWS 5 exam study guide by Ivan A. Kriszan, which can be found here. The guide is for the previous version of the exam, but it is still very informative. It covers in detail XML and WSDL as languages and schemas + Web Services definitions as special instances of objects written in those languages. This guide was the first to describe the business registry (UDDI) as well as XML parsing in great detail. It has whole chapters dedicated to security, interoperability, and design principles.

Step 6. Read through the specifications of JAX-RS 1.1 and JAX-WS 2.2 used on the exam. The specifications themselves are very dry and can only be used to clarify unclear concepts, e.g. how precisely a RESTful server chooses how to parse the URI and which method to invoke with which input and output content types. The JAX-WS specification does a great job elaborating on the precise rules for SOAP/Logical handler invocations in case of chain interruptions/exceptions.

Step 7. Read through the APIs and implementations of relevant classes and packages from URLConnection to WebServiceRef: Jersey Core, Jersey Server, Jersey Client, JAX-WS.

Step 8. In parallel to steps 6 and 7 did Enthuware practice exams. I was lucky to be able to take 5 exams on consecutive Saturdays and work through the answers. The exams served to create some mnemonic rules on, e.g. which SEI and SIB methods are exposed; what the names are of Service, Port, PortType, Operation etc in relation to names of classes/methods and customizations. The exams did motivate me to study vague topics such as patterns for authentication and authorization + WS-I basic profile. Mock tests revealed the great variety of non-trivial high-level design questions. They helped me master Jersey Client and RESTful services questions, which aren’t all that hard.

Step 9. Developed a strategy for the exam. Split the 55 questions into 5 groups of 11 and split the allotted 90 minutes into 18min intervals, so that I could be in full control of time. Mastered the elimination technique and the technique of remembering and reusing answers to similar questions. The real exam was harder than the mock tests and some design questions were quite ambiguous. Fortunately, I gained enough knowledge and practical experience over the course of a year to get a passing score of 72% on the first try!
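The time budget from Step 9 can be sketched as a small calculation. This is only an illustration of the arithmetic; the function name `checkpoints` is made up for this example, and it assumes the questions and minutes divide evenly into groups, as they do for 55 questions and 90 minutes.

```python
def checkpoints(total_questions, total_minutes, groups):
    """Return (last_question, minute_deadline) pairs, one per group of questions."""
    if total_questions % groups or total_minutes % groups:
        raise ValueError("questions and minutes must divide evenly into groups")
    questions_per_group = total_questions // groups
    minutes_per_group = total_minutes // groups
    return [
        (questions_per_group * i, minutes_per_group * i)
        for i in range(1, groups + 1)
    ]


# 55 questions and 90 minutes split into 5 groups, as in Step 9.
plan = checkpoints(55, 90, 5)
for last_question, deadline in plan:
    print(f"finish question {last_question} by minute {deadline}")
```

Checking the clock only at these five checkpoints is enough to notice early if one group of questions is running over.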

In sum, it was a tough year, but I do now feel like a Java Expert. For some time, this concludes my largely theoretical studies of programming in favor of more practical applications of my knowledge.

Course Review – Introduction to Big Data Analytics (Natasha Balac, Paul Rodriguez, Andrea Zonca)

Posted on December 30, 2015 by admin

Here is my review of the Introduction to Big Data Analytics course offered on Coursera in Nov 2015. The course has the worst ranking of 1 out of 5 (very bad), while I passed with a 100% grade.

Technologies/Material: The course goes into more detail on Big Data databases + the SQL/HQL-style languages used to operate on data, such as Pig and Hive. Splunk appears as a disjoint tool, whose primary purpose is to provide ease of use to data analysts & business analysts. It is obligatory to sign up for Splunk e-mails, and you will get those e-mails. In contrast, the introduction to Spark DataFrames by Andrea Zonca in Week 5 is very well done and can serve as a starting point for writing your own Big Data workflows. In addition, some real Pig and Hive scripts are provided for reference. Homework assignments are more interesting and more numerous than in the previous course. They already hint at real-world applications. At the time the course was offered, the Big Data specialization was downgraded in difficulty from Intermediate to Beginner. As an improvement over the previous course, the lecture slides are finally provided. Quiz questions are edited without any notification to students(!), for which the course rightly deserves its ranking. I posted hints in a course forum on how some quiz questions should be modified in order for the answers marked by the grader as correct to actually be correct. Hopefully, this made the experience of students slightly more bearable.

Instructor/lectures: the course has a large number of instructors (3), which does not help with coherent elaboration of the material. For this course, even the material in the same week is sometimes delivered by 2 people. All instructors are affiliated with the University of California, San Diego. Lectures often turn into reading the tutorials and even leaving an example half-way through without achieving a positive result(!) – see HBase tutorial in Week 1. Lectures end with 11min of black screen (!). Following very negative student feedback for this course the next course “Machine Learning With Big Data” is postponed.

Course Review – Hadoop Platform and Application Framework (Natasha Balac, Mahidhar Tatineni, Paul Rodriguez, Andrea Zonca)

Posted on December 30, 2015 by admin

Here is my review of the Hadoop Platform and Application Framework course offered on Coursera in Oct 2015. The course has a ranking of 2 out of 5 (bad), while I passed with a 99% grade.

Technologies/Material: The course briefly touches on every major Big Data tool present in Cloudera Virtual Machine: MapReduce framework + HDFS constituting the core of Hadoop; YARN, Spark, Pig, Hive, HBase. While some material in the early weeks is not useful and/or repeats Cloudera tutorials, the introduction to Spark by Andrea Zonca in Week 5 is very well done and can serve as a reference guide. Non-native Python interface to Hadoop is chosen for sparse exercises. In turn, presented Python interface to Spark is native and provides great interactive capabilities. There is a substantial number of code examples, which are often shown only in lecture videos without any ability to copy them(!), e.g. for Hive. The course name fluctuated and eventually converged to a longer version to reflect the broader set of topics. Programming assignments are very simplistic and require mostly copy-pasting from examples. At the time the course was offered, the Big Data specialization was assigned Intermediate difficulty.

Instructor/lectures: the course has an unusually large number of instructors (4), which hinders coherent elaboration of the material. There is a substantial overlap between different weeks and substantial differences in focus as well as in presentation style: Natasha Balac is largely discussing management of Big Data zoo, Andrea Zonca provides highly technical lectures, while the other two guys are somewhere in between. All instructors are affiliated with the University of California, San Diego.

Course Review – Introduction to Big Data (Natasha Balac)

Posted on December 29, 2015 by admin

Here is my review of the Introduction to Big Data course offered on Coursera in Sep-Oct 2015. The course has a controversial ranking of 3 out of 5 (poor), while I passed with a 100% grade.

Technologies/Material: the course is highly unusual and experimental in nature. It is aimed to entice and encourage newcomers to Big Data with a collection of factual information. Any meaningful homework is absent. Out of stated ~15hrs, the course required under 3hrs to complete. The effective cost of $18/hr might only be justified, when course is viewed as a prerequisite for completing a Big Data specialization. The specialization has received “intermediate” difficulty ranking, but later was rightfully downgraded to a “beginner” specialization.

Instructor/lectures: Natasha Balac has held IT manager positions since receiving her PhD from Vanderbilt in 2002. She has business experience rather than technical experience with Big Data, hence the course is oriented toward the business context.

Course Review – Cloud Computing Applications (Roy Campbell, Reza Farivar)

Posted on December 29, 2015 by admin

Here is my review of the Cloud Computing Applications course offered on Coursera in Aug-Oct 2015. The course has a controversial ranking of 3 out of 5 (poor). Having chosen the programming track, I readily got a 100% grade.

Technologies/Material: the course provides a hands-on guide on developing Big Data applications. After a general cloud computing introduction, the various Big Data tools (used in Yahoo) are described in detail, including Hadoop, YARN, Pig, MapReduce framework, Storm, HBase, Spark, ZooKeeper, Mahout, Pregel, and Giraph. For each tool, a theoretical foundation is given, followed by the implementation details and the sample applications. The course, and especially the homework, serves as an excellent starting point for creating your own Big Data applications.

Programming assignments are fully self-contained and do not require looking at any other material, which allows passing the course with spending barely 5hr per week. However, watching all the lectures (which I eventually accomplished) and taking quizzes takes another 5+hrs. The course staff experimented with various completion tracks: only quizzes, only programming assignments, and the mix of the two. The experiments backfired with some people choosing programming track and not receiving their certificates, since programming assignments are submitted without verification. Fortunately, Coursera positively resolved this issue.

Instructor/lectures: Roy Campbell is a full professor at UIUC with substantial practical experience in cloud computing, while Reza Farivar is a senior software developer at Yahoo working directly on a Big Data platform. Both provide a lot of useful material. Reza is very technical and intense in his lectures, which might deter people with less experience. However, such lectures would serve as a reference guide for more experienced folks. Reza seamlessly shifts between multiple languages: Java, Scala, and Clojure, which provides an “out-of-comfort zone” immersion experience with new languages. In sum, this is one of the hardest courses, but also one of the most useful.

Course Review – Algorithms, Part II (Robert Sedgewick, Princeton)

Posted on December 29, 2015 by admin

Here is my review of the Algorithms, Part II course offered on Coursera in Oct-Dec 2015. The course has a ranking of 5 out of 5 (excellent). Programming assignments are of moderate to high difficulty, but I managed to get at least 100% on each assignment.

Technologies/Material: the course provides the core set of advanced algorithms, which every serious software developer should know. Represented are graphs with depth-first search, breadth-first search, minimum spanning trees, shortest-path trees (Dijkstra's algorithm), maxflow/mincut problem; radix string/number sorting; tries; substring search, regular expressions; data compression; tractability (P vs. NP). Homework assignments elaborate on the vast network of applications of graphs, on radix sorting 5x faster than Java’s Arrays.sort(), and on state-of-the-art compression algorithms.
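To give a flavor of the radix sorting mentioned above, here is a minimal sketch of LSD (least-significant-digit first) radix sort for fixed-length strings, built on key-indexed counting. It is an illustration only, not the course's assignment code; the function name `lsd_sort` is made up for this example, and it assumes all strings have the same length and contain only characters with code points below 256.

```python
def lsd_sort(words, width, radix=256):
    """Sort equal-length strings by LSD radix sort; returns a new list."""
    if any(len(w) != width for w in words):
        raise ValueError("all strings must have the same length")
    a = list(words)
    # Process character positions from right to left. Each pass is a stable
    # key-indexed counting sort on a single character.
    for d in range(width - 1, -1, -1):
        count = [0] * (radix + 1)
        for w in a:
            count[ord(w[d]) + 1] += 1      # frequency of each character
        for r in range(radix):
            count[r + 1] += count[r]       # frequencies -> start indices
        aux = [None] * len(a)
        for w in a:
            c = ord(w[d])
            aux[count[c]] = w              # place at next free slot for c
            count[c] += 1
        a = aux
    return a


print(lsd_sort(["dab", "cab", "add", "bad", "ace"], 3))
```

Each pass is linear in the number of strings, so the total work grows with the number of strings times their width, with no pairwise comparisons at all.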

Instructor/lectures: Robert Sedgewick is recognized for his seminal contributions to CS and is probably the most famous author of books on algorithms after his PhD advisor Donald Knuth. Prof. Sedgewick is very enthusiastic to discuss the fascinating world of algorithms.

Course Review – Data Mining Capstone (UIUC)

Posted on December 23, 2015 by admin

Here is my review of Aug-Oct 2015 Coursera incarnation of Data Mining Capstone, which is the final project in Data Mining specialization. To be enrolled, students need to pass all other courses in specialization. I passed the Capstone with 91.5% grade being ranked 4th out of 186 on the competition leaderboard.

Technologies/Material: In contrast to the theoretical courses in the Data Mining specialization, this course has only an applied part – students are expected to solve a research problem and write a report every (!) week + write a final report of 10+ pages. The course is very demanding and one can only pass having prior experience in academic research/data mining. The techniques exercised for research problems are topic mining, comparative text mining, clustering, supervised contextual learning, and machine learning. The base dataset is that of the Yelp data challenge. Learners are free to choose their own programming language, since only the report and the produced knowledge are graded. The suggested solution paths are sometimes ineffective and provide just the first step towards the results, e.g. the suggested sentiment analysis algorithm underperforms and is not usable, the suggested selection of small numbers of reviews hinders efficient topic modeling, and it is unclear that comparative text analysis is the only way to reliably compare similar datasets.

Instructor/lectures: The course has no lectures, but only instructions on how to complete the research assignments + grading rubric. User guides, tutorials, and reading materials are provided for the suggested data mining tools, e.g. TopMine and SegPhrase. Peer review is employed for grading.

Course Review – Data Visualization (John Hart, UIUC)

Posted on December 23, 2015 by admin

Here is my review of Jul-Aug 2015 Coursera incarnation of Data Visualization, which is the last course in Data Mining specialization. It is ranked 3.2 out of 5 (poor), while I passed with 100% grade.

Technologies/Material: This course is very different from the other heavily mathematical courses in the specialization and rather presents the art of conveying meaning with graphical representation of results. The course discusses how humans perceive information and makes recommendations based on that. Most of the proposed techniques would be intuitively familiar to students. The non-trivial techniques presented are Focus + Context, Gantt charts, StreamGraphs, and stacked bar charts/line graphs. The course has relatively little material and no homework.

Instructor/lectures: John Hart is a professor of CS at UIUC. He is academically active with almost 100 publications. Some sort of renormalization of grades was applied; my real score was around 90%.

Course Review – Text Mining and Analytics (ChengXiang Zhai, UIUC)

Posted on December 23, 2015 by admin

Here is my review of Jun-Jul 2015 Coursera incarnation of Text Mining and Analytics, which is the 4th course in Data Mining specialization. It is ranked 3.5 out of 5, while I passed with 97.5% grade.

Technologies/Material: While Text Retrieval and Search Engines course concentrates on structuring big textual data, this course emphasizes extraction of knowledge from relevant processed sets. Various Natural Language Processing (NLP) techniques are presented such as topic modeling, mixture models, Probabilistic Latent Semantic Analysis (PLSA), Latent Dirichlet Allocation (LDA), entropy-based models. The course concentrates on clustering, text categorization, opinion mining and sentiment analysis, which are the topics on the forefront of NLP.

Instructor/lectures: ChengXiang Zhai is a prominent researcher in Information Retrieval and Natural Language Processing. He worked in the industry as a Research Scientist. Lectures are well structured and are self-contained. It’s easy to follow the lecture slides to reconstruct the material.

Course Review – Cluster Analysis in Data Mining (Jiawei Han, UIUC)

Posted on December 23, 2015 by admin

Here is my review of Apr-Jun 2015 Coursera offering of Cluster Analysis in Data Mining, which is the 3rd course in Data Mining Specialization. The course is ranked 2.7 out of 5 (poor), while I passed with 100% grade.

Technologies and Material. Cluster analysis is an essential unsupervised learning technique widely employed in deriving new knowledge. The course describes mathematically and outlines the examples for most modern cluster analysis methods. Discussed are partitioning algorithms (K-means and its derivatives), hierarchical methods (BIRCH), density + grid-based (DBSCAN), probabilistic models (Gaussian mixture), graph algorithms (KNN), and many more. Clustering of various kinds of data is outlined. Expectation-Maximization (EM) algorithm is first mentioned in this course. One optional programming assignment is provided as an experiment.
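As an illustration of the partitioning methods mentioned above, here is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. It is not taken from the course materials; the function names, the sample points, and the initial centroids are all made up for this example, and the starting centroids are passed in explicitly to keep the run deterministic.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def kmeans(points, centroids, iterations=10):
    """Alternate assignment and update steps until centroids stop moving."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        updated = [mean(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, [(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

The result depends on the initial centroids, which is one reason the derivatives of K-means covered in the course exist.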

Instructor/lectures. Jiawei Han is a world-leading researcher in data mining. He is enthusiastic to describe the details of techniques. The course appears to be quite academic with only a single programming assignment, thus the material is hard to internalize.

Roman Shcherbakov

Technical Blog: Software Development, Big Data, and Data Science

Monthly Archives: December 2015
