What happens when you type https://www.google.com in your browser and press Enter
Initial Typing ⌨️
The moment you start typing https://g, the browser receives the key events and its auto-complete functions chip in. Depending on your browser, these algorithms sort and prioritize results based on your search history, bookmarks, cookies, and popular searches from the internet as a whole. Many blocks of code run as you type each letter, and the suggestions are refined with each keypress. The browser may even suggest https://www.google.com before you finish typing it.
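To make the idea concrete, here is a minimal sketch of prefix-based suggestions. It is not how any particular browser works: the `Candidate` fields, the ranking rule (bookmarks first, then visit count) and the sample history are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    url: str
    visits: int       # hypothetical signal: how often the user opened this URL
    bookmarked: bool  # hypothetical signal: whether the user bookmarked it


def _normalize(text: str) -> str:
    """Drop the scheme and a leading 'www.' so 'https://g' can match google.com."""
    for prefix in ("https://", "http://", "www."):
        if text.startswith(prefix):
            text = text[len(prefix):]
    return text


def suggest(typed: str, candidates: list, limit: int = 3) -> list:
    """Return up to `limit` URLs whose normalized form starts with what was typed."""
    needle = _normalize(typed)
    matches = [c for c in candidates if _normalize(c.url).startswith(needle)]
    # Assumed ranking: bookmarks outrank plain history, ties go to the most visited.
    matches.sort(key=lambda c: (c.bookmarked, c.visits), reverse=True)
    return [c.url for c in matches[:limit]]


# Made-up browsing history for the example.
HISTORY = [
    Candidate("https://github.com", visits=80, bookmarked=False),
    Candidate("https://www.google.com", visits=120, bookmarked=True),
    Candidate("https://gitlab.com", visits=5, bookmarked=False),
]
```

Calling `suggest("https://g", HISTORY)` ranks the bookmarked Google entry first, and typing one more letter (`"https://gi"`) narrows the list to the two remaining hosts.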
What is HTTP? 🔗
HTTP is a protocol that allows two applications to communicate by exchanging plain text messages in a client-server architecture: the client makes contact first and the server responds. HTTP is an application layer protocol that runs on top of the TCP/IP protocol suite, using the functionality that TCP and IP provide to send its messages.
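Since HTTP messages are plain text, they are easy to build and read by hand. The sketch below assembles a simple HTTP/1.1 GET request and reads the status line of a response. The helper names `build_request` and `parse_status_line` are made up for this example, and a real browser sends many more headers.

```python
def build_request(host: str, path: str = "/") -> str:
    """Build a minimal HTTP/1.1 GET request as plain text."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, protocol version
        f"Host: {host}",         # the Host header is mandatory in HTTP/1.1
        "Connection: close",
        "",                      # an empty line ends the header section
        "",
    ]
    return "\r\n".join(lines)


def parse_status_line(response: str) -> tuple:
    """Split the first line of a response into (version, status code, reason)."""
    status_line = response.split("\r\n", 1)[0]
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason
```

The client speaks first with the text from `build_request`, and the server answers with text whose first line `parse_status_line` can take apart.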
When your computer sends a message on the local network, it is seen by the devices listening on that network at the time, and a device responds if the destination IP address is its own. The router is one of these devices. It keeps messages meant for the local network on the local network and relays messages meant for other networks onward, through a series of intermediate devices, along the best route it can find. Once the packet is delivered onto the destination network, the device it is addressed to sees it and can reply.
DNS lookup Journey ☁️
Before the browser can send a message to the server, it needs to resolve the domain name (the server name) to an IP address. If the browser has contacted this server by name before, it has probably cached the address, and so has the OS. The browser therefore searches its own cache first and, if the name is not there, asks the underlying OS. If the OS finds the name neither in its cache nor in the hosts file, which maps names to IP addresses, it sends the query out to the DNS.
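The lookup order described above can be sketched as a chain of dictionaries. This is only an illustration: real browsers and operating systems also honor expiry times, and the function name `lookup` and its parameters are assumptions.

```python
def lookup(name, browser_cache, os_cache, hosts_file, query_dns):
    """Try each local source in order and fall back to the DNS only at the end."""
    for source in (browser_cache, os_cache, hosts_file):
        if name in source:
            return source[name]
    # Nothing local knows the name, so ask the DNS (passed in as a function here).
    address = query_dns(name)
    # Remember the answer so the next lookup stays local.
    os_cache[name] = address
    browser_cache[name] = address
    return address
```

Passing the DNS query in as a function keeps the sketch runnable without any network access.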
The DNS (Domain Name System) is a hierarchical, distributed system whose goal is to resolve domain names to addresses. It is the phone book of the internet that we can query. Our computer is configured with the IP address of a DNS server, typically handed out over DHCP and often operated by our ISP. The name query is sent to this server, which we call the first-hop DNS server or the resolver. By default, it queries the DNS recursively on behalf of your computer. If the resolver does not already have the answer, it asks a root server of the DNS. The root server tells the resolver where to find the ‘com’ name servers, so the resolver sends its query to a ‘com’ top-level domain name server. The ‘com’ server tells the resolver the IP address of the authoritative name server for google.com, and it is this server that tells the resolver the IP address of www.google.com. The resolver returns the response to your computer, and your computer gives the IP address to the browser. The browser now has the IP address of the host server and caches it.
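Here is a toy model of that walk from the root down to the authoritative server. Everything in it is simulated with dictionaries: the server names (`tld-com`, `ns-google`) and the address `192.0.2.10` (taken from a range reserved for documentation) are made up, and no real DNS traffic is sent.

```python
# Toy zone data. Server names and the address are invented for illustration.
ROOT = {"com": "tld-com"}
TLD = {"tld-com": {"google.com": "ns-google"}}
AUTHORITATIVE = {"ns-google": {"www.google.com": "192.0.2.10"}}


class Resolver:
    """A pretend first-hop resolver that walks the hierarchy and caches answers."""

    def __init__(self):
        self.cache = {}
        self.upstream_queries = 0

    def resolve(self, name: str) -> str:
        if name in self.cache:
            return self.cache[name]  # answered without leaving the resolver
        labels = name.split(".")
        # 1. The root server points to the name server for the top-level domain.
        tld_server = ROOT[labels[-1]]
        self.upstream_queries += 1
        # 2. The TLD server points to the authoritative server for the domain.
        auth_server = TLD[tld_server][".".join(labels[-2:])]
        self.upstream_queries += 1
        # 3. The authoritative server knows the address of the full name.
        address = AUTHORITATIVE[auth_server][name]
        self.upstream_queries += 1
        self.cache[name] = address
        return address
```

The first `resolve("www.google.com")` costs three upstream queries, and a repeat of the same question is served from the resolver's cache.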
The Connection 🔌
Now, the HTTPS part means the browser wants secure communication with the www.google.com server. Instead of sending plain text messages, the browser wants those messages scrambled so that anyone who taps into the communication cannot understand or make use of them.
It's time to use the TCP/IP protocol suite to establish a connection between the browser and the server. The suite defines a set of protocols that work together to make communication possible on the internet, and HTTPS runs on top of it as an application layer protocol.
HTTPS needs a reliable transport layer protocol so that packets lost on their way can be re-sent. That is why TCP is usually the transport layer protocol.
The browser tries to set up a TCP connection with the server, via a mechanism known as the TCP 3-way handshake.
Summary of the handshake: 🤝
- SYN: The browser sends a SYN (synchronize) message to the server along with a randomly generated sequence number A.
- SYN-ACK: The server sees the message and sends a SYN-ACK response back to the browser, with an acknowledgment number of A + 1 and a random sequence number B.
- ACK: The browser sends an ACK message with an acknowledgment number B + 1 back to the server.
Steps 1 and 2 establish the connection in one direction, and the last step completes it in the opposite direction, so in all we have a full-duplex connection between the client and the server.
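The number bookkeeping in the three steps can be simulated in a few lines. This sketch only models the sequence and acknowledgment numbers. It sends nothing over the network, and the dictionary layout of each message is an assumption made for readability.

```python
import random

SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(rng=None):
    """Return the three handshake messages as plain dictionaries."""
    if rng is None:
        rng = random.Random()
    a = rng.randrange(SEQ_SPACE)  # browser's random initial sequence number A
    b = rng.randrange(SEQ_SPACE)  # server's random initial sequence number B

    syn = {"flags": "SYN", "seq": a}
    syn_ack = {"flags": "SYN-ACK", "seq": b, "ack": (a + 1) % SEQ_SPACE}
    ack = {"flags": "ACK", "seq": syn_ack["ack"], "ack": (b + 1) % SEQ_SPACE}
    return [syn, syn_ack, ack]
```

Each side acknowledges the other's number plus one, which is how both ends know their SYN arrived.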
The server has to bind to a port and listen on it to be open for connections. Otherwise the connection cannot succeed: the browser receives no SYN-ACK response because nothing on the server is listening for packets addressed to that port.
The messages would not reach their destination without the IP part of the suite, because IP defines how messages are addressed and routed between and within networks. IPv4 addressing uses 32 bits to identify each device communicating on the network, so each device is given a number of the format 188.8.131.52. IP also defines the routing rules.
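The dotted format is just a friendlier way of writing one 32-bit number, which Python's standard `ipaddress` module can show. The helper names and the `192.168.1.0/24` network used in the usage note are assumptions for illustration.

```python
import ipaddress


def to_int(dotted: str) -> int:
    """Convert a dotted IPv4 address to the single 32-bit number it stands for."""
    return int(ipaddress.IPv4Address(dotted))


def is_local(destination: str, local_network: str) -> bool:
    """Decide whether a destination sits on the local network or must be routed."""
    return ipaddress.ip_address(destination) in ipaddress.ip_network(local_network)
```

For example, `is_local("192.168.1.20", "192.168.1.0/24")` is true, so that packet stays on the local network, while `is_local("188.8.131.52", "192.168.1.0/24")` is false, so the router has to relay it.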
The Firewall 🔥
Firewalls come in handy to filter threats. They can be software on a computer system or dedicated hardware. They filter connections by blocking some and allowing others, based on a set of defined rules.
We were able to establish a connection because the firewalls on our network and on Google’s allowed it.
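A rule-based filter of this kind can be sketched as a list that is checked from top to bottom, where the first matching rule wins. The `Rule` fields, the default-deny policy and the sample rules are assumptions, and the addresses come from ranges reserved for documentation.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    action: str                 # "allow" or "deny"
    source: str = "0.0.0.0/0"   # network the rule applies to (default: anyone)
    port: Optional[int] = None  # destination port, None means any port


def decide(rules, source_ip: str, port: int, default: str = "deny") -> str:
    """Return the action of the first rule that matches the connection attempt."""
    address = ipaddress.ip_address(source_ip)
    for rule in rules:
        in_network = address in ipaddress.ip_network(rule.source)
        port_matches = rule.port is None or rule.port == port
        if in_network and port_matches:
            return rule.action
    return default  # nothing matched: fall back to the default policy


# Example policy: block one network entirely, then allow web traffic.
RULES = [
    Rule("deny", source="203.0.113.0/24"),
    Rule("allow", port=443),
    Rule("allow", port=80),
]
```

With these rules a connection to port 443 is allowed unless it comes from the blocked network, and a connection to any other port, say 22, falls through to the default and is denied.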
The scheme of the URL we typed in the browser says HTTPS, so the browser requests not just a plain TCP connection but an extra layer of security known as SSL (Secure Sockets Layer) or, in its modern form, TLS (Transport Layer Security). In this layer data is encrypted, and the basis of this security is the PKI (Public Key Infrastructure) system.
In the PKI, we have a public key and a private key. What is encrypted with the public key can only be decrypted with the private key, and the opposite is also true.
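The key-pair property can be tried out with textbook RSA and tiny numbers. This is a classroom sketch only: the primes 61 and 53 are far too small to be secure, real systems add padding, and the helper name `apply_key` is made up.

```python
# Textbook RSA with tiny primes: for illustration only, never for real security.
P, Q = 61, 53
N = P * Q                          # modulus, part of both keys
E = 17                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def apply_key(value: int, exponent: int) -> int:
    """Encrypt or decrypt a number smaller than N with the given exponent."""
    return pow(value, exponent, N)


message = 65
ciphertext = apply_key(message, E)    # locked with the public key
recovered = apply_key(ciphertext, D)  # unlocked with the private key

signature = apply_key(message, D)     # the opposite direction: private key first
verified = apply_key(signature, E)    # anyone with the public key can undo it
```

Both `recovered` and `verified` come back as the original `message`, which is the two-way property described above.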
A secured website has obtained an SSL certificate, which contains its public key. The matching private key stays secret on the server. The owner shares only the certificate, and with it the public key, when a secured connection is requested.
Your browser sends a request for a secured communication channel, along with the cipher suites it supports. The server responds with the cipher suite it has chosen and with its certificate, which carries the public key and the digital signature of the certificate authority. The browser verifies that the certificate comes from a trusted certificate authority. If it passes, the browser encrypts a random secret with the server's public key and sends it to the server, which decrypts it with its private key. From this shared secret, together with random values exchanged earlier in the handshake, both sides generate the same symmetric key, the session key. They never send this key to each other: they used the same recipe and ingredients, so they arrive at the same key. They then confirm the key by exchanging messages encrypted with it. This session key encrypts all subsequent data for the session, and thus a secured layer is established.
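The "same recipe and ingredients" idea can be sketched with Python's standard `hmac` and `hashlib` modules. This is a simplification and not the key derivation that TLS actually specifies: the label `b"session key"`, the single HMAC-SHA256 step and the function names are assumptions chosen to keep the example short.

```python
import hashlib
import hmac
import secrets


def derive_session_key(shared_secret: bytes, client_random: bytes,
                       server_random: bytes) -> bytes:
    """Mix the shared secret with both random values into one symmetric key."""
    ingredients = b"session key" + client_random + server_random
    return hmac.new(shared_secret, ingredients, hashlib.sha256).digest()


def demo():
    """Both sides run the same recipe on the same ingredients."""
    client_random = secrets.token_bytes(32)  # sent in the clear by the browser
    server_random = secrets.token_bytes(32)  # sent in the clear by the server
    shared_secret = secrets.token_bytes(48)  # sent encrypted with the public key

    browser_key = derive_session_key(shared_secret, client_random, server_random)
    server_key = derive_session_key(shared_secret, client_random, server_random)
    return browser_key, server_key
```

The two keys returned by `demo()` are equal even though the key itself never travelled between the two sides, while an eavesdropper who saw only the two random values cannot reproduce it without the shared secret.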
Load Balancer ⚖️
A single computer cannot handle and process millions of requests concurrently. It makes sense to use multiple computers (server machines) to share the load of the many connections coming into the network. A system is deployed between the clients and the servers that receives requests on behalf of the servers and shares them among the servers based on some conditions and algorithms. So when we set up the connection, we hit Google’s load balancer, which routed us to the appropriate server to handle our requests. Google has servers all across the globe, and load balancing makes sure we are connected to servers near us. The load balancing mechanism also removes a single point of failure from the system, helping keep Google services available on top of handling concurrent connections. The load balancer exposes one IP address while hiding the actual web servers' addresses. The balancer might also decrypt the messages if SSL termination is enabled on it, lifting the heavy duty of decrypting messages off the actual servers.
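One of the simplest sharing algorithms is round robin, where each new request goes to the next server in line. The sketch below is a generic illustration, not Google's setup: the class name, the health-tracking set and the server names in the usage note are all made up.

```python
import itertools


class RoundRobinBalancer:
    """Hand out servers in turn, skipping any that are marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        """Return the next healthy server in rotation."""
        if not self.healthy:
            raise RuntimeError("no healthy servers available")
        # At least one server is healthy, so this loop always terminates.
        for server in self._cycle:
            if server in self.healthy:
                return server
```

With three servers `"a"`, `"b"` and `"c"`, successive calls to `pick()` return them in turn. After `mark_down("b")` the rotation simply skips that server, which is how a balancer keeps the service reachable when one machine fails.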
Web Server 🗄
A web server is a computer that runs websites. More precisely, it is a computer program that distributes web pages as they are requested. Its basic objective is to store, process, and deliver web pages to users, and this communication is done using the Hypertext Transfer Protocol (HTTP).
Now we have found the IP address of https://www.google.com, a TCP connection has been established because all firewalls allowed us, and an SSL tunnel has been established as well. We now send our HTTP request message over that channel to the load balancer, and the balancer hands our request off to the right server.
It is on this computer that our website is hosted, where Google has a web server process running and listening on port 80 or 443 (HTTP or HTTPS respectively). Software such as the Apache HTTP Server or Nginx can play this role. Web servers serve static content such as HTML pages, images, and plain text files. For static content, the server maps the request URI (Uniform Resource Identifier) to a file system path. For dynamic content, it delegates the request to an application server, which handles it and gives back a response.
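The URI-to-path mapping for static content can be sketched with the standard library. The document root `/var/www/html`, the `index.html` default and the function name are assumptions. Real web servers make all of these configurable and apply more checks.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit

DOCUMENT_ROOT = PurePosixPath("/var/www/html")  # hypothetical document root


def resolve_static_path(uri: str):
    """Map a request URI to a path under the document root, or None if unsafe."""
    path = unquote(urlsplit(uri).path)  # drop the query string, decode %xx escapes
    if path.endswith("/"):
        path += "index.html"            # assumed default page for a directory
    parts = [p for p in PurePosixPath(path).parts if p != "/"]
    if ".." in parts:
        return None                     # refuse attempts to climb out of the root
    return DOCUMENT_ROOT.joinpath(*parts)
```

A request for `/` maps to `/var/www/html/index.html`, a request for `/images/logo.png?v=2` maps to `/var/www/html/images/logo.png`, and a request such as `/../etc/passwd` is rejected.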
It is the application server that communicates with databases and handles the application's business logic, such as logins and signups, among many other things.
The Database 🗄
The application server cannot be intelligent without the database, and the entire system cannot operate without it. A database is a collection of data. The database management system makes managing that data possible: storing user information, updating location and time zone information, managing session data, and so on. It keeps the data organized in an efficient structure. The application server talks to the database through the database server to query, store, update, and delete data. Since we are requesting the home page of Google, the database will be queried for location information based on our IP address, so the application server can generate and serve us a more personalized HTML page.
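As a small illustration of an application server querying a database, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table layout, the prefix-based lookup and the sample rows are invented for the example, and the addresses come from ranges reserved for documentation.

```python
import sqlite3


def open_demo_database() -> sqlite3.Connection:
    """Create an in-memory database with a made-up IP-to-location table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ip_locations ("
        "network_prefix TEXT PRIMARY KEY, country TEXT, timezone TEXT)"
    )
    conn.executemany(
        "INSERT INTO ip_locations VALUES (?, ?, ?)",
        [
            ("203.0.113", "Ghana", "Africa/Accra"),   # invented sample data
            ("192.0.2", "Japan", "Asia/Tokyo"),       # invented sample data
        ],
    )
    conn.commit()
    return conn


def locate(conn: sqlite3.Connection, ip: str):
    """Look up the location row for the first three octets of an IPv4 address."""
    prefix = ip.rsplit(".", 1)[0]
    return conn.execute(
        "SELECT country, timezone FROM ip_locations WHERE network_prefix = ?",
        (prefix,),  # parameterized query: never paste user input into SQL
    ).fetchone()
```

The application server could use the row returned by `locate` to pick a language or time zone for the page, and fall back to a generic page when the lookup returns `None`.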
Finally, thanks for reading this article. I enjoyed writing it, and I hope you enjoyed reading it as well. Let me know what you think in the comments box! Till next time 👋, happy coding 👩‍💻