E-mail server hosting on Amazon EC2

In the previous post I described how to set up web hosting with HTTPS and WordPress. All of those steps combined require less work than setting up a fully secured e-mail server.

Technologies

For e-mail self-hosting we need:

  • postfix as the message transfer agent (MTA)
  • dovecot as the POP3 server
  • cyrus SASL (Simple Authentication and Security Layer) for SMTP relay security
  • Amazon SES (Simple Email Service) as the SMTP relay authority and for reverse DNS lookup
  • the SSL certificate from Let's Encrypt described in the previous post

Base setup

  1. Install postfix, dovecot, and cyrus SASL, start them, enable the corresponding services (postfix, dovecot, saslauthd), and remove sendmail.
    1. sudo yum install postfix dovecot cyrus-sasl
    2. sudo yum remove sendmail
    3. sudo service postfix start # repeat for dovecot and saslauthd
    4. sudo chkconfig postfix on # repeat for dovecot and saslauthd
  2. Create a real user with a password and a mail directory (or a virtual user with a virtual mailbox).
    1. sudo useradd admin
    2. sudo passwd admin
    3. sudo mkdir /home/admin/mail/
    4. sudo chown admin /home/admin/mail
  3. Configure postfix for basic SMTP on port 25
    1. Edit /etc/postfix/main.cf to specify
      1. myhostname=yourhostname.com
      2. mydomain=yourhostname.com
      3. inet_interfaces=all
      4. inet_protocols=all
      5. home_mailbox=mail/
      6. message_size_limit=10485760 # for 10MB
      7. mailbox_size_limit=1073741824 # for ~1GB
      8. smtpd_recipient_restrictions=permit_mynetworks, permit_auth_destination, permit_sasl_authenticated, reject
  4. Configure dovecot for basic POP3 on port 110
    1. Edit /etc/dovecot/conf.d/10-auth.conf to specify
      1. disable_plaintext_auth=no
      2. auth_mechanisms=plain login
    2. Edit /etc/dovecot/conf.d/10-mail.conf to specify
      1. mail_location=maildir:~/mail
    3. Edit /etc/dovecot/conf.d/10-ssl.conf to specify
      1. ssl=no
  5. Open ports 25 and 110 in the EC2 security group, restart dovecot and postfix, and check that you can send e-mail to yourself and receive it via your favorite e-mail client (SMTP and POP3 hosts are yourhostname.com, no encryption, no SSL/TLS).
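
To sanity-check the setup from the server itself before involving an external client, here is a minimal sketch (assuming the mailx package is installed; install it with yum if not):

  echo "test body" | mail -s "test subject" admin@yourhostname.com
  sudo tail /var/log/maillog   # look for a line like "status=sent (delivered to maildir)"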

Authenticated SMTP

The above setup is not secure at all. The first step to amend that is to require authentication for SMTP. For that, use dovecot as the SASL authentication backend for the SMTP server (smtpd).

  1. Edit /etc/postfix/main.cf to specify
    1. smtpd_sasl_type = dovecot
    2. smtpd_sasl_path = private/auth
    3. smtpd_sasl_auth_enable = yes
    4. smtpd_sasl_security_options = noanonymous
    5. smtpd_sasl_local_domain=$myhostname
    6. broken_sasl_auth_clients=yes
    7. smtpd_sasl_authenticated_header = yes
  2. Edit /etc/dovecot/10-master.conf to specify
    1. unix_listener /var/spool/postfix/private/auth  {
    2. mode = 0666
    3. user = postfix
    4. group = postfix
    5. }
  3. In your favorite e-mail application set "My outgoing server (SMTP) requires authentication" -> "Use same settings as my incoming mail server" and test that the new setup can send and receive e-mails to yourself and to/from one external account.
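
A quick way to confirm that postfix now offers authentication (a sketch; assumes a telnet client is available):

  telnet yourhostname.com 25
  EHLO yourhostname.com
  QUIT

The EHLO reply should now include a line like "250-AUTH PLAIN LOGIN".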

Secure SMTP and POP3

The above setup doesn't allow anonymous access to the e-mail server. However, the established connections are not secure. Both POP3 and SMTP can be secured with the same SSL certificate we used for HTTPS, as long as the server names configured in the clients coincide with the certificate's domain name.

  1. Enable SMTP port 587 (the submission port), which makes life easier as an addressee, as many popular mail servers prefer to send to port 587. Note that the SMTP port number itself has little to do with the use of SSL.
    1. Edit /etc/postfix/master.cf and uncomment the "submission inet n …" line.
  2. Configure smtpd setting to require SSL by editing /etc/postfix/main.cf:
    1. smtpd_tls_cert_file=/etc/letsencrypt/live/yourhostname.com/fullchain.pem
    2. smtpd_tls_key_file=/etc/letsencrypt/live/yourhostname.com/privkey.pem
    3. smtpd_tls_security_level = encrypt # this is the main setting to require SSL
    4. smtpd_tls_loglevel = 1 # raise to 2 or 3 if you plan to dig through logs /var/log/maillog
    5. smtpd_tls_received_header=yes
  3. Configure dovecot to require SSL:
    1. Edit /etc/dovecot/conf.d/10-auth.conf to specify
      1. disable_plaintext_auth = yes
    2. Edit /etc/dovecot/conf.d/10-master.conf to specify
      1. service pop3-login { …
      2. inet_listener_pop3s {
      3. port = 995
      4. ssl = yes
      5. }
      6. }
    3. Edit /etc/dovecot/conf.d/10-ssl.conf. Mind “<” signs for ssl_cert and ssl_key.
      1. ssl = required
      2. ssl_cert=</etc/letsencrypt/live/yourhostname.com/fullchain.pem
      3. ssl_key=</etc/letsencrypt/live/yourhostname.com/privkey.pem
  4. Restart postfix and dovecot, open ports 587 and 995 on the EC2 instance, configure SMTP in your client to use port 587 and "Use the following type of encrypted connection = TLS", and configure POP3 in your client to use port 995. The earlier tests should still pass.
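
Both secured ports can also be checked from the command line before trusting an e-mail client (a sketch; assumes the openssl client is installed, as it is on Amazon Linux):

  openssl s_client -connect yourhostname.com:587 -starttls smtp   # submission with STARTTLS
  openssl s_client -connect yourhostname.com:995                  # POP3 over SSL

Both commands should print the Let's Encrypt certificate chain and finish the handshake without verification errors.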

Relaying outgoing SMTP messages to Amazon SES

The above SMTP and POP3 client setup looks identical to the one for Gmail, which brings the false sense that we are done. Your first e-mail from such a self-hosted SMTP server to Gmail will end up in the Spam folder. I know because I tried it. The problem is that your own SMTP server doesn't have an authority standing behind it to certify that the sender is good. Amazon SES serves as such an authority after you promise them you won't be doing anything bad. In short, an e-mail from your SMTP server needs to be relayed to the Amazon SES server in the correct hosting zone (region). Amazon SES then provides the reverse DNS lookup.

  1. Sign up with Amazon SES, verify your primary e-mail on yourhostname.com and an e-mail on Gmail, obtain the correct relay host for your hosting zone (region), obtain SMTP credentials, and verify DKIM. Generally, follow the guide for integrating Amazon SES with postfix.
  2. Configure the smtp client for relay. As a rule of thumb, the "smtpd" settings apply when the server handles e-mail by itself (receiving), while the "smtp" settings apply when it asks someone else to handle its e-mail (sending) => we need "smtp" settings, and many smtpd options need to be duplicated as smtp options:
    1. Edit /etc/postfix/main.cf to specify
      1. relayhost = email-smtp.us-east-1.amazonaws.com:25 # the port doesn't matter – 587 is as good as 25; the server name depends on the hosting zone (region)
      2. smtp_sasl_auth_enable = yes
      3. smtp_sasl_security_options = noanonymous
      4. smtp_tls_security_level = encrypt #outgoing connection must be secure as well
      5. smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
      6. smtp_use_tls = yes
      7. smtp_tls_note_starttls_offer = yes
      8. smtp_sasl_mechanism_filter = plain, login # essential, but not found in the official guide
      9. smtp_tls_CAfile = /etc/ssl/certs/ca-bundle.crt # used to verify the authenticity of the Amazon SES server
      10. smtp_sasl_type = cyrus # which is the default
    2. It may come as a surprise, but the postfix "smtp" client does not support dovecot SASL, so we have to use cyrus-sasl. The relay credentials can be stored in an indexed file, which is simpler than a database:
      1. Ensure saslauthd service is running and is set to start automatically.
      2. Create the /etc/postfix/sasl_passwd file with the Amazon SES SMTP server and the SMTP credentials (see the sketch after this list).
      3. Run "sudo postmap hash:/etc/postfix/sasl_passwd" to generate the lookup file referenced by smtp_sasl_password_maps above.
  3. Restart postfix and test sending/receiving e-mail between your 2 verified accounts.
  4. Apply to Amazon SES for production access, which allows sending e-mail to unverified addresses (a.k.a. clients).
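
For reference, a minimal sketch of the credentials file mentioned above (the user name and password are placeholders for the SMTP credentials generated by Amazon SES; the first token should mirror the relayhost value, including its port):

  # /etc/postfix/sasl_passwd
  email-smtp.us-east-1.amazonaws.com:25 SES_SMTP_USERNAME:SES_SMTP_PASSWORD

  sudo postmap hash:/etc/postfix/sasl_passwd
  sudo chmod 600 /etc/postfix/sasl_passwd /etc/postfix/sasl_passwd.db

Keeping the file readable by root only is a good idea, since it contains plain-text credentials.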

This is basically it! We now have a production e-mail system, which is fully secured and can send 50,000 high-authority e-mails per day. Depending on the use case, you may consider forwarding incoming e-mails to Gmail.

Website self-hosting on Amazon EC2 cloud

The upcoming 3-yr renewal of my website hosting plan on Hostgator and the desire to learn the AWS cloud made me think about self-hosting my personal website http://astroman.org.

Costs

Hostgator gradually increased the regular cost of its Hatchling shared plan from $3.95/mo to $6.95/mo and the cost of the domain to $15/yr => the total for a 3-yr term is about $300. For a Positive SSL certificate one has to pay $50/yr + upgrade to the next tier of shared hosting plans => the total cost over 3 yrs readily rises to $600.
Amazon cloud prices are predictably lower. At present t2.nano EC2 instances are priced at $0.0059/hour without a long-term commitment and at $69/3 yrs = $2.9/mo for a 3-yr dedicated instance. A standard 8 GiB of EBS storage goes for $0.8/mo. Thus, this beats even the most discounted pricing of Hostgator… except one has to do much more work!

Technologies

Typical web hosting consists of a lot of static content in the form of HTML pages, images, and CSS + a WordPress blog + e-mail. SSL support is a premium paid feature. Under the hood, web hosting implies:

  • a lot of HTML/PDF/CSS/JPEG/PHP/etc in a folder on a Linux host
  • a domain name with adequate DNS service
  • Apache web server routing to the content
  • PHP engine + MySQL database to run WordPress
  • Postfix SMTP and Dovecot POP3 servers for e-mail
  • CA-signed SSL certificate with a mechanism for certificate renewal

I aimed to replicate all those features on an EC2 instance and succeeded in about 2 weeks, working on it about 5 evenings a week.

Base setup

A t2.nano EC2 instance has only 512 MB of memory, which prohibits installation of WHM/cPanel => more manual work. Luckily, all other software runs on such a box without a hitch on the chosen Amazon Linux AMI distribution. Provisioning of an EC2 instance is fairly standard, except I got a discounted dedicated instance on the AWS Marketplace with $2.9/mo pricing, but with only a 2-yr commitment. An instance should have an associated Elastic IP address, which is free for as long as the instance is running. Regardless of the domain registrar, the DNS service can be provided by Amazon via the Route 53 service, which offers seamless integration with other AWS services and the best possible access to your domain. A hosted zone costs an extra $0.50/month, and I decided to pay that.

Web hosting

Amazon Linux is based on Red Hat and has standard tools like yum available. However, be careful, as the default versions of packages may need to be abandoned in favor of compatible versions, e.g. use httpd24 instead of httpd:

  1. Configure DNS, edit /etc/sysconfig/network to set HOSTNAME=yourhostname.com, and restart the server with "sudo reboot".
  2. Install Apache: “sudo yum install httpd24”.
  3. Copy files to /var/www/html with the entry point file named index.html.
  4. Edit /etc/httpd/conf/httpd.conf and comment out “AddDefaultCharset UTF-8” unless you have a Unicode-compatible website.
  5. Start Apache and set the service to autostart: "sudo service httpd start" and "sudo chkconfig httpd on".

The website should now be accessible at http://yourhostname.com.
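
A quick check that Apache actually serves the content (curl ships with Amazon Linux):

  curl -I http://yourhostname.com   # expect "HTTP/1.1 200 OK" in the response headers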

WordPress

A WordPress blog can either be installed at the website root or made available at a specific URL such as http://yourhostname.com/blog. It requires a PHP engine and a MySQL database. Using the PHP 5.6 (php56) packages avoids conflicts with other versions. I migrated WordPress from a different hosting provider.

  1. Install PHP and MySQL
    1. sudo yum install php56-mysqlnd php56-gd php56 php56-common
      sudo yum install mysql-server mysql
  2. Start and enable autostart of “mysqld” service.
  3. Secure MySQL installation with “sudo mysql_secure_installation”, set root password etc.
  4. Connect to MySQL from command line and create a database for WordPress.
    1. mysql -u root -p # enter the root password at the prompt
      CREATE DATABASE wordpress;
      CREATE USER wordpressuser@localhost IDENTIFIED BY 'password';
      GRANT ALL PRIVILEGES ON wordpress.* TO wordpressuser@localhost IDENTIFIED BY 'password';
      FLUSH PRIVILEGES;
  5. On the old WordPress instance install the Duplicator plugin and create the archives, then copy the archives to the relevant folder on the new host.
  6. Access installer.php and follow the prompts to hook up the MySQL database, unpack the archive, and make the selections.
  7. If the website address/folder changes at a later time, make the necessary changes in the MySQL database (see the sketch after this list).
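
As a sketch for step 7, the usual place to change is the wp_options table (assuming the default wp_ table prefix; substitute your actual new address):

  mysql -u wordpressuser -p wordpress
  UPDATE wp_options SET option_value = 'http://yourhostname.com/blog' WHERE option_name IN ('siteurl', 'home');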

SSL certificate

Free self-signed certificates cannot be used for anything other than testing. SSL certificates signed by a trusted CA were always a paid premium feature, but not anymore. A new certificate authority, Let's Encrypt, now provides free SSL certificates for anyone! The certificates are cross-signed by IdenTrust, whose Certificate Authority public key is already present in most major browsers/operating systems. Steps to get the certificate and use it with Apache:

  1. Get Let’s Encrypt project
    1. sudo yum install git
    2. sudo git clone https://github.com/letsencrypt/letsencrypt /opt/letsencrypt
  2. Obtain a certificate. Amazon Linux AMI support is experimental, but the --debug flag successfully forces installation of the relevant dependencies.
    1. sudo -H /opt/letsencrypt/letsencrypt-auto certonly --standalone -d astroman.org --debug
  3. The resulting 3 certificate files are referenced for Apache in the /etc/httpd/conf.d/ssl.conf file as:
    1. SSLCertificateFile /etc/letsencrypt/live/yourhostname.com/cert.pem
    2. SSLCertificateKeyFile /etc/letsencrypt/live/yourhostname.com/privkey.pem
    3. SSLCertificateChainFile /etc/letsencrypt/live/yourhostname.com/fullchain.pem
  4. Include a permanent redirect to HTTPS in Apache config file /etc/httpd/conf/httpd.conf
    1. <VirtualHost *:80>
    2. ServerName yourhostname.com:80
    3. Redirect permanent / https://yourhostname.com/
    4. </VirtualHost>
  5. Open port 443 on the EC2 instance and restart Apache. All links on your website, including the main page and WordPress, will now be HTTPS (see the checks sketched after this list).
  6. Set up automatic certificate renewal to run daily in the root crontab and redirect the output to a file so you can check that the renewal command runs. Most of the time the renewal script will skip renewal because the certificate is not yet due; it will only renew roughly once every 60 days.
    1. sudo crontab -e
    2. 30 2 * * * /home/ec2-user/renewal.sh
  7. The renewal.sh file logs the date and attempts to renew the certificates without restarting Apache and without updating dependencies
    1. sudo echo `date` >> /home/ec2-user/renew.log
    2. sudo /opt/letsencrypt/certbot-auto renew --webroot -w /var/www/html --no-bootstrap >> /home/ec2-user/renew.log
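
Two hedged sanity checks once everything is in place (certbot's --dry-run talks to the staging environment and does not touch the real certificates):

  curl -I http://yourhostname.com                      # expect a 301 redirect to https://yourhostname.com/
  sudo /opt/letsencrypt/certbot-auto renew --dry-run   # exercises the renewal path without renewing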

First experience with Elasticsearch

Many modern enterprise applications rely on search to some extent. As of Nov 2016 the most popular search engine is Elasticsearch, an open-source engine based on Apache Lucene. The need to perform search arose in my home project as well. I chose Elasticsearch for the engine and readily dived into the tutorials. My methodology for writing interactions with third-party systems is to create facade APIs within a test-driven development process. The tests for indexing and retrieving documents worked flawlessly, but the test results for the search queries got me puzzled. I have formal training in search engines from the Coursera Data Mining specialization, so I know concepts like TF-IDF. The hope was to get the relevance scores and match them precisely to the numbers computed by the formulas in the tutorials.

A basic index of 4 test documents returned numbers vastly different from my expectations… After some googling I turned on the "explain" functionality and was in for an even bigger shock: the returned scores didn't match the scores in the explain section. I started suspecting the unthinkable: the relevance calculations are broken! The Elasticsearch tutorial confirmed my worst fears… well, rather, it explained how little I knew about real search engines. After a couple more hours of comparing numbers, the discrepancies decomposed into an optimization feature, a bug pretending to be a feature, and a bug. The optimization feature is that several shards are created for each index and documents are randomly distributed among those shards. The relevance calculations are only performed within each shard for the DEFAULT search type. Setting the search type to DFS_QUERY_THEN_FETCH forces shard statistics to be combined into a single IDF calculation, thus leading to values closer to the expected numbers. However, the "explain" functionality always employs the DEFAULT search type, leading to a mismatch, hence a bug. The bug pretending to be a feature is the really coarse-grained rounding of the relevance norm: the discrepancies reach 15%, which hurts testing.
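
For illustration, a hedged example of the kind of request that produces the global IDF statistics (index and field names are made up; syntax as of Elasticsearch 2.x):

  curl -s "http://localhost:9200/myindex/_search?search_type=dfs_query_then_fetch&pretty" -d '
  {
    "query": { "match": { "body": "relevance" } }
  }'

Creating the test index with a single shard ("number_of_shards": 1) is another way to make the scores reproducible.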

Course Review – Big Data – Capstone Project (Ilkay Altintas, Amarnath Gupta)

Here is my review for the Big Data Capstone Project course offered on Coursera in Jul 2016. The course represents the final project for the Big Data specialization; it does not have separate rankings, and I passed with a 98.2% score.
Technologies/Material: As a final project, the course does not have lectures, but rather brief descriptions of the relevant project parts each week. The project is about making suggestions on how to increase the revenue of a company promoting a fictional game, "Catch the Pink Flamingo". A lot of simulated game data is made available to the learners. The part assigned each week represents a separate area of big data analytics: data exploration, classification, clustering, and graph analysis. The suggested technologies are Splunk, KNIME, Apache Spark, and Neo4j, respectively. As usual within the specialization, instead of free exploration a "correct" path is given, along with substantial help on the way. The assignment each week is peer graded, with the ability to submit multiple times and get regraded. Grading asks to compare learners' numbers with the correct numbers, which means that almost everyone gets correct answers on their second attempt. Unfortunately, many people slack off on their first attempt or simply submit an empty report. At the end of the course a final report with a PowerPoint presentation is submitted and also peer graded.
Instructor/lectures: The task instructions are given by Amarnath Gupta and Ilkay Altintas. The course offers a realistic view of the job of a Data Scientist: analyze all available data to increase the revenue of a company, improve retention rates, suggest directions for development, and, most importantly, make presentations to the management. The instructors emphasize each week that the company's bottom line is of the utmost importance. Even though the specialization is called Big Data, there is no emphasis on especially large volumes of data or on distributed computations, so we are in the Data Science realm.

Course Review – Graph Analytics for Big Data (Amarnath Gupta)

Here is my review for the Graph Analytics for Big Data course offered on Coursera in Feb 2016. The course is ranked 2.5 out of 5, while I passed with a 99.4% score.
Technologies/Material: The course provides an introduction to graph theory with practical examples of graph analytics. Most of the examples and homework are done in Neo4j, a leading graph database. The last assignment employs the GraphX API in Spark. Since graph databases are so different from regular databases, a special graph query language called Cypher was developed for writing Neo4j code. An extensive Cypher tutorial and executable code samples grouped by topic are given. Graph analytics offers simple answers to many questions. The discussed graph techniques are Path Analytics, the Dijkstra algorithm and its variations, Connectivity Analytics, Community Analytics, and Centrality Analytics.
Instructor/lectures: The course is taught by Amarnath Gupta, an Associate Director of the San Diego Supercomputer Center. Amarnath is an amazing instructor. The course is well taught, with just the right speed and the right amount of material given. In my view, he succeeded in providing an introduction to graphs while not oversimplifying the concepts.

Course Review – Machine Learning with Big Data (Natasha Balac, Paul Rodriguez)

Here is my review for the Machine Learning with Big Data course offered on Coursera in Jan 2016. The course is ranked 2.0 out of 5, and I passed it with a 100% score.
Technologies/Material: The course provides basic theory and some exercises on popular machine learning techniques after presenting the business justification and the ML pipeline. The presented techniques are decision trees, association rules, and clustering. Exercises are largely done in KNIME, with some parts in Apache Spark. Thankfully, the course has copyable code samples and provides basic information on how to get started with KNIME. The assignments require digging into non-trivial details of KNIME from its documentation/Internet/forums. For me the course provided valuable insights and examples of decision trees and association rules, which not many other courses offer.
Instructor/lectures: The course is taught by Natasha Balac, who provides most of the business background, and Paul Rodriguez, who is a technical person. The presentation is organized better than in the previous courses, though the depth of the material is often not sufficient for solid learning. Some slides can be reused to present Big Data to managers.

First week in a semi-autonomous 2016 Volvo S60

Autonomous driving technology is still early in its development, but more and more self-driving cars appear on the road. To facilitate progress, many manufacturers release some elements as driver-assist features: adaptive cruise control, collision warning, emergency braking, lane keeping aid, park assist, road sign recognition, pedestrian and cyclist recognition.
I took the risk of getting one of those vehicles with many driver-assist features to try them out myself. Here are the impressions from the first week in a 2016 Volvo S60 Platinum.

1. Extremely relaxing to not push gas or brake pedals even in stop&go traffic!
2. Can readily go 5mph above the preset speed on steep declines – bummer.
3. A person can see several cars ahead and predict if the car right ahead will suddenly decelerate – cruise control isn't that smart yet.
4. Cruise control’s braking power is insufficient in ~3% of cases for the shortest following distance.
5. Luckily, collision warning kicks in very early on if the person ahead suddenly brakes.
6. Lane keeping aid applies short and efficient bursts of torque – funny to find my hands in a slightly different position after 0.5 second.
7. Following through turns works just fine meaning adaptive cruise control performs well in the city.
8. No, the car doesn't accelerate uncontrollably into turns even at a high preset speed with nobody around – thanks, Volvo engineers, for resolving this safety concern.
9. Picking out stationary vehicles and red lights is the only weak point, but yes, those are hard to resolve.
10. The car always complains about me coming too close to the curb while parking – such a fussy baby.
11. Rear cross-traffic alert is a must-have addition to the rear view camera, if one doesn’t really want to turn their head for backing.
12. BLIS indicator at the base of the rearview mirror is infinitely better than on the mirror itself.
13. No, I don't really want to test emergency braking for pedestrians and bicyclists, and hopefully it won't ever need to engage…
14. Not sure if the Speed Limit Pilot in the new Mercedes is useful – my car sometimes reads the signs wrong + naturally the old speed limit will display till it sees another sign.

No, my car is not fully autonomous, but here is the Volvo vision http://www.volvocars.com/intl/about/our-innovation-brands/intellisafe/intellisafe-autopilot/this-is-autopilot.

Evolution of functional programming in Java

For many years I performed most of my data manipulation in Wolfram Mathematica.
With broad support for functional programming, it was simple to apply a function to any arbitrary vector or matrix. The resulting code was really nice and short. A "for" loop was very rarely needed. Having switched to Java 6 and 7, I find such manipulations of data substantially less pleasant. Given the standard packages of Java 7, a "for" loop is unavoidable and the code becomes substantially longer.
Google Guava partially solves the problem with methods like Lists.transform.
However, in this case the function still has to be explicitly defined.
Let me illustrate the transition with the example of parsing search results from the YouTube API. Java 7 code:

List<String> videoIds = new ArrayList<>();
for(SearchResult searchResult: searchListResponse.getItems())
          videoIds.add(searchResult.getId().getVideoId());

Java 7 code with functional programming in Guava:

Function<SearchResult, String> func = new Function<SearchResult, String>() {
            @Override
            public String apply(SearchResult searchResult) {
                  return searchResult.getId().getVideoId();
            }
     };
List<String> videoIds = Lists.transform(searchListResponse.getItems(), func);

The code became longer and less transparent rather than simpler. This is a known caveat of working with Google Guava under Java 6 and 7.
Luckily, Java 8 brings the needed simplifications, making the above code truly a one-liner.
Lambda expressions save the day:

List<String> videoIds = 
   Lists.transform(searchListResponse.getItems(), d -> d.getId().getVideoId());

A simple method can instead be passed as a method reference:

List<ResourceId> resourceIds = 
        Lists.transform(searchListResponse.getItems(), SearchResult::getId);
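
For completeness, the same transformation can be written with plain Java 8 streams, with no Guava dependency (a sketch assuming the same YouTube API types as above; requires an import of java.util.stream.Collectors):

List<String> videoIds = searchListResponse.getItems().stream()
        .map(searchResult -> searchResult.getId().getVideoId())
        .collect(Collectors.toList());

Unlike Lists.transform, which returns a lazy view, the collected list is materialized immediately.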

My studies of Java EE and preparation for Oracle Certified Java Expert 1Z0-897 exam

The third level of Java certification is becoming an Oracle Certified Java Expert. Unlike for the previous 2 levels, the learners have a choice of passing either the "Enterprise JavaBeans Developer", the "Web Component Developer", or the "Web Services Developer" exam. I chose to concentrate on the Web Services exam as the one covering the most in-demand material, which also aligns well with my job at Deutsche Bank. I spent about 1 hr every day at work working through the tutorials and code samples + I'm exposed to Web Services within my work projects. It still took about 1 year to prepare, as the exam is quite hard.

Here is what I did to prepare:
Step 1. Made a study plan based on my preparations for the professional certification. I wanted to read the official Oracle tutorial, a book on Web Services, and some online study guides, look through and run some sample code, and also check all use cases of Web Services within my applications at work.

Step 2. Read the Java EE 7 Oracle tutorial. Only about 25% of the topics are relevant for the exam, but I wanted to get a feel for all the enterprise technology that comes out of the box with Java. Some constructs, e.g. listeners and interceptors, are widely used in software development. Some non-trivial concepts of Enterprise JavaBeans (EJB) do find their way into the Web Services exam.

Step 3. Bought and worked through the "Java Web Services: Up and Running" book by Martin Kalin. The book describes the development of Web Services from XML-RPC to state transfer and method invocations on proxies. A variety of examples provide hands-on experience with web servers (Tomcat), build tools (Ant), and the generation and usage of WSDL files & XSD schemas. I was able to transition most code from GlassFish to Tomcat and successfully run it via IntelliJ IDEA.

Step 4. Read through the Oracle Certified Expert Web Services guide by Mikalai Zaikin. This one is specifically aimed at the exam. It has good coverage of the Jersey client and asynchronous JAX-WS method invocations.

Step 5. Skimmed through the SCDJWS 5 exam study guide by Ivan A. Kriszan, which can be found online. The guide is for the previous version of the exam, but it is still very informative. It covers in detail XML and WSDL as languages and schemas + Web Services definitions as special instances of objects written in those languages. This guide was the first to describe the business registry (UDDI) as well as XML parsing in great detail. It has whole chapters dedicated to security, interoperability, and design principles.

Step 6. Read through the specifications of JAX-RS 1.1 and JAX-WS 2.2 used on the exam. The specifications themselves are very dry and can only be used to clarify unclear concepts, e.g. how precisely a RESTful server chooses how to parse the URI and which method to invoke with which input and output content types. The JAX-WS specification does a great job elaborating on the precise rules for SOAP/logical handler invocations in case of chain interruptions/exceptions.

Step 7. Read through the APIs and implementations of relevant classes and packages from URLConnection to WebServiceRef: Jersey Core, Jersey Server, Jersey Client, JAX-WS.

Step 8. In parallel with steps 6 and 7, did Enthuware practice exams. I was lucky to be able to take 5 exams on consecutive Saturdays and work through the answers. The exams served to create some mnemonic rules on, e.g., which SEI and SIB methods are exposed, and what the names of Service, Port, PortType, Operation, etc. are in relation to the names of classes/methods and customizations. The exams did motivate me to study vague topics such as patterns for authentication and authorization + the WS-I Basic Profile. The mock tests revealed a great variety of non-trivial high-level design questions. They also helped me master the Jersey client and RESTful services questions, which aren't all that hard.

Step 9. Developed a strategy for the exam. Split the 55 questions into 5 groups of 11 and the 90 minutes of allotted time into 18-min intervals, so that I could be in full control of the time. Mastered the elimination technique and the technique of remembering and reusing answers to similar questions. The real exam was harder than the mock tests, and some design questions were quite ambiguous. Fortunately, I gained enough knowledge and practical experience over the course of a year to get a passing score of 72% on the first try!

In sum, it was a tough year, but I do now feel like a Java Expert. For some time, this concludes my largely theoretical studies of programming in favor of more practical applications of my knowledge.

Course Review – Introduction to Big Data Analytics (Natasha Balac, Paul Rodriguez, Andrea Zonca)

Here is my review of the Hadoop Platform and Application Framework course offered on Coursera in Nov 2015. The course has the worst ranking of 1 out of 5 (very bad), while I passed with a 100% grade.

Technologies/Material: The course goes into more detail on Big Data databases + SQL/HQL languages to operate on data, such as Pig and Hive. Splunk appears as a disjoint tool whose primary purpose is to provide ease of use to data analysts & business analysts. It is obligatory to sign up for Splunk e-mails, and you will get those e-mails. In contrast, the introduction to Spark DataFrames by Andrea Zonca in Week 5 is very well done and can serve as a starting point for writing your own Big Data workflows. In addition, some real Pig and Hive scripts are provided for reference. Homework assignments are more interesting and more numerous than for the previous course. They already hint at real-world applications. At the time the course was offered, the Big Data specialization was downgraded in difficulty from Intermediate to Beginner. As an improvement over the previous course, the lecture slides are finally provided. Quiz questions are edited without any notification to students (!), for which the course rightly deserves its ranking. I posted hints in the course forum on how some quiz questions should be modified in order for the answers marked by the grader as correct to actually be correct. Hopefully, this made the experience of students slightly more bearable.

Instructor/lectures: The course has a large number of instructors (3), which does not help with coherent elaboration of the material. For this course, even the material in the same week is sometimes delivered by 2 people. All instructors are affiliated with the University of California, San Diego. Lectures often turn into reading the tutorials, and even leave an example halfway through without achieving a positive result (!) – see the HBase tutorial in Week 1. Lectures end with 11 min of black screen (!). Following very negative student feedback for this course, the next course, "Machine Learning With Big Data", was postponed.