New kid on the block – Homomorphic encryption.

Healthcare data poses an important challenge from a cryptography standpoint: it has to be both private and useful. At first glance, these two requirements appear completely contradictory. Data encrypted with traditional techniques loses its usability; in a traditional encryption scheme, unless the end user holds the decryption key, the data is completely useless.

But what if there were a new way of encrypting data: a technique where the end user can perform certain relevant functions on the encrypted data without ever decrypting it? As it turns out, there is a mechanism for accomplishing this. It is called a homomorphic encryption scheme.

This scheme was first proposed by Ronald L. Rivest, Leonard Adleman and Michael Dertouzos. The general expression for a fully homomorphic encryption scheme is:
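Informally, for an encryption function $E$ and any two messages $m_1$ and $m_2$,

$$E(m_1) \otimes E(m_2) = E(m_1 \oplus m_2),$$

that is, combining two ciphertexts with a suitable operation yields the encryption of the corresponding combination of the plaintexts. A fully homomorphic scheme supports enough such operations (both addition and multiplication) to evaluate arbitrary computations on encrypted data.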

There are currently a few cryptographic libraries that implement fully homomorphic encryption schemes.

The key advantage of a fully homomorphic encryption scheme is the ability to perform mathematical calculations on the ciphertext. For healthcare data to be useful, one needs to perform these calculations on the data. Using a fully homomorphic encryption scheme, these computations can be performed without ever decrypting the data.
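To make this concrete, here is a toy sketch in Python of the Paillier cryptosystem, an additively homomorphic scheme: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. It is an illustration with tiny, insecure parameters, not a fully homomorphic scheme and not production code; real workloads rely on vetted FHE libraries.

```python
# Toy Paillier cryptosystem: additively homomorphic encryption.
# Tiny, insecure parameters -- for illustration only.
from math import gcd
import random

def lcm(a, b):
    return a * b // gcd(a, b)

# Key generation with two small primes
p, q = 293, 433
n = p * q
n_sq = n * n
g = n + 1
lam = lcm(p - 1, q - 1)

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n_sq)), -1, n)   # modular inverse (Python 3.8+)

def encrypt(m):
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(c):
    return (L(pow(c, lam, n_sq)) * mu) % n

# Homomorphic addition: multiplying ciphertexts adds the underlying plaintexts.
c1, c2 = encrypt(20), encrypt(22)
print(decrypt((c1 * c2) % n_sq))        # 42, computed without decrypting c1 or c2
```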

Homomorphic encryption is the next big step in big data and artificial intelligence. As more and more healthcare organizations look to reduce the cost of their IT infrastructure by adopting cloud computing, a fully homomorphic encryption scheme will not only protect the data, but also allow useful insights to be drawn from these massive data sets without ever compromising privacy.

Some of this work is done as part of our startup project nanoveda. For continuing nanoveda’s wonderful work, we are running a crowdfunding campaign using gofundme’s awesome platform. Donation or not, please share our crowdfunding campaign and support our cause.

Donate here: gofundme page for nanoveda.

(The picture of rows of protein data creating a colorful sequence display at the Genomic Research Center in Gaithersburg, Maryland. This image was created by © Hank Morgan – Rainbow/Science Faction/Corbis and obtained from public domain via nationalgeographic.com)

Research and development – When is a good time to invest?

Businesses have limited resources, and how to manage them efficiently is an art. A perennially controversial area of spending is research and development (R&D). As a start-up we are even more constrained than a regular, well established business. Therefore the question I often encounter is: is it a good idea to invest in research and development?

I have given some thought to that question. My answer is a resounding yes. I am pitching this idea on top of Amar G. Bose’s vision of the role of research and development in business. A trendy approach for most corporations is to keep research and development efforts to a bare minimum, mostly in the name of shareholder or investor value. Most businesses don’t view themselves as flag bearers of innovation. They orient themselves to protect the status quo of their commercial enterprise. When the going gets tough, these enterprises cut spending on R&D to shift the blame away from having had a poor product portfolio in the first place.

This is a counter-intuitive, yet widely adopted practice in the world of business. Amar Bose had a very different take. According to Bose, when the economy is going through a recession, or when a company is struggling to find a better place in the market, that is the best time to invest in research and development. His reasoning: cutting money from R&D starves the company of the oxygen it needs to come up with the newer products and innovations it lacked in the first place. By the time the recession is over, or when customers realize there is a gap between their expectations and what the product delivers, the company will no longer be in a position to meet the increased expectations of its customers or the business environment. New competitors will fill the gap.

My suggestion is: always invest in R&D and be bullish about those investments. Even if the business is just a mom and pop store in a highly popular tourist neighborhood, R&D will work. Especially in an era when social media and data science have become the lifeblood of business, all types of businesses, whether small or large, need to invest in R&D. By R&D I don’t mean running a lab with a bunch of scientists in white coats. Research and development includes how to improve supply chain efficiency, how to improve communication and PR, how to improve cash inflow, how to develop better targeted marketing, and so on.

Science and business go hand in hand. Science takes an empirical view of the world; businesses need to take an empirical view of financial performance. When the two merge, it is a recipe for growing into a great business. An R&D-heavy investment approach helps businesses spot emerging blind spots in the marketplace and address them as quickly as possible.

A great example is ExxonMobil. Despite being heavily invested in fossil fuels, the company poured billions of dollars into scientific research on climate science. When the results started coming out, they were a completely unexpected outcome for the executives. But the research still provided a valuable tool for foreseeing the evolving energy market. How ExxonMobil dealt with those unexpected results is highly controversial, but I admire the ability of an organization to fund scientific research that had far-reaching consequences for its traditional business model.

My view of R&D is that it is the stethoscope of the marketplace. It allows us to listen for small shifts in rhythm well before those shifts turn into a disastrous event. This listening tool helps enterprises avoid being blindsided by large scale disruptive changes in the marketplace.

This work is done as part of our startup project nanoveda. For continuing nanoveda’s wonderful work, we are running a crowdfunding campaign using gofundme’s awesome platform. Donation or not, please share our crowdfunding campaign and support our cause.

Donate here: gofundme page for nanoveda.

(The picture of the International Space Station (ISS) taken on 19 Feb. 2010, backdropped by Earth’s horizon and the blackness of space. This image was photographed by a Space Transportation System (STS) -130 crew member on space shuttle Endeavour after the station and shuttle began their post-undocking relative separation. Undocking of the two spacecraft occurred at 7:54 p.m. (EST) on Feb. 19, 2010. The picture was obtained from public domain via nasa.gov)

Understanding brain imaging data – 65,000 shades of gray.

I wanted to explain how the structural and functional MRI image format NIFTI works. NIFTI stands for Neuroimaging Informatics Technology Initiative. It supports data types from 8 to 128 bits, covering signed and unsigned integers as well as floating-point values; the most common implementation is 16 bit signed integer storage. NIFTI images usually have the file extension *.nii or come as (*.hdr and *.img) pairs. This image format is important for two reasons: 1) the format stores the spatio-temporal imaging details and 2) it supports compression to allow better space management.
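As a minimal Python sketch, one common way to read these files is with the nibabel library (the file name here is a placeholder):

```python
# Inspect a NIFTI volume with nibabel; works for .nii and compressed .nii.gz files.
import nibabel as nib

img = nib.load("subject01_t1.nii.gz")
print(img.shape)                      # e.g. (256, 256, 180) voxels, or 4D for fMRI
print(img.header.get_data_dtype())    # commonly int16, i.e. 16 bit signed integers
print(img.header.get_zooms())         # voxel size in mm (and TR for 4D series)
print(img.affine)                     # voxel-to-world mapping: the spatial details

data = img.get_fdata()                # the full volume as a NumPy array for analysis
```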

Anyone who has undergone a brain scan knows that the picture of the brain from an MRI or CT scanner is usually a grayscale picture with up to 65,536 shades of gray. The raw files from the scanner are usually in DICOM (Digital Imaging and Communications in Medicine) format with a *.dcm extension. The DICOM format is similar to the RAW image format for cameras: instead of the pixel readouts stored in RAW images, DICOM images store scanner readouts.

Each scan of a subject usually contains several DICOM files. This is both an advantage and a disadvantage. For sharing specific image slices, DICOM is extremely useful. But for most interpretation purposes, the analysis often requires full image sets, and a few slices from the scanner become much less useful. This is where the NIFTI format comes to the rescue.
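One way this conversion is commonly scripted, assuming the dicom2nifti Python package (the directory names below are placeholders):

```python
# Convert a folder of per-slice *.dcm files into compressed NIFTI volumes.
import dicom2nifti

dicom2nifti.convert_directory(
    "scans/subject01_dicom",    # input folder full of .dcm slices
    "scans/subject01_nifti",    # output folder for the .nii.gz volumes
    compression=True,
    reorient=True,
)
```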

Since the format stores the entire sequence in a single file, the issues of managing a large number of files are eliminated. Interpreting a specific image in the light of the images preceding and succeeding it also becomes easier, thanks to the ordered arrangement of the images.

There is another important advantage of NIFTI. From an analytical point of view, brain imaging data is most useful as a 3D data structure. Even though the individual components of a NIFTI file are 2D images, interpretation becomes more reproducible if we treat them as a 3D volume. For this purpose, NIFTI is the best format to work with.

An example is the use of a machine learning tool called 3D convolutional neural networks (CNNs). A 3D CNN provides the 3D spatial context of a voxel. For image sequences like brain scans, identifying various structures or any abnormalities requires that 3D spatial context. The 3D CNN approach is very similar to watching a video and trying to identify what the scene is about; instead of video scene recognition, a 3D CNN can be trained to detect specific features in a brain scan.
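A minimal sketch of such a network, assuming TensorFlow/Keras; the input shape, filter counts and two-class output are placeholder choices, not a tuned architecture:

```python
# A small 3D CNN that classifies fixed-size brain volumes into two classes.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(64, 64, 64, 1)),              # depth, height, width, channels
    layers.Conv3D(16, kernel_size=3, activation="relu"),
    layers.MaxPooling3D(pool_size=2),
    layers.Conv3D(32, kernel_size=3, activation="relu"),
    layers.MaxPooling3D(pool_size=2),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(2, activation="softmax"),             # e.g. abnormality vs. no abnormality
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```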

This work is done as part of our startup project nanoveda. For continuing nanoveda’s wonderful work, we are running a crowdfunding campaign using gofundme’s awesome platform. Donation or not, please share our crowdfunding campaign and support our cause.

Donate here: gofundme page for nanoveda.

Virtualization – Matryoshka dolls of computing.

A few weeks back I talked about various open operating systems for efficiently running some of the deep learning and simulation models. I switched back and forth between six different flavors of linux before finally settling on one. This experimentation phase is helpful in the long run.

But for folks who want to run one particular toolkit within the convenience of their preferred operating system environment, there is an alternate option. It is virtualization, and one piece of software in particular: Docker.

Virtualization is the computing equivalent of Matryoshka dolls. A host computer can have multiple operating systems running inside it, or one can nest a virtual machine within a virtual machine within a physical machine. This layering approach to operating systems has made software applications somewhat platform agnostic.

I love Docker on Windows 10. The caveat is that the OS has to be 64 bit and the processor should support the extensions that allow hardware level virtualization, often referred to as Intel VT-x (or AMD-V). Docker prefers Microsoft Hyper-V to run its linux virtual machine inside Windows. On systems that do not meet these requirements, it is possible to force Docker to use VirtualBox’s implementation of virtual machines inside Windows instead.
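Once the Docker engine is running, containers can also be driven programmatically; here is a minimal sketch assuming the Python docker SDK is installed:

```python
# Launch a throwaway linux container and read its output via the Docker SDK.
import docker

client = docker.from_env()                        # connect to the local Docker engine
output = client.containers.run("ubuntu:16.04",    # image is pulled if not present
                               "uname -a",        # command executed inside the container
                               remove=True)       # delete the container afterwards
print(output.decode())                            # the linux kernel string reported inside the container
```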

The backbone of modern internet applications, including cloud computing, is virtualization. With the advent of hardware extensions supporting virtual machines, the performance difference between a physical machine and a well configured virtual machine on the right host is nearly non-existent.

For newcomers to linux who are still more comfortable with off-the-shelf consumer hardware, virtualization is an easy entry point to start using some of the awesome tools for deep learning and computational simulations. Once you learn the tricks of the trade, there is always the option to move all your applications to a physical machine or run them with a cloud provider.

This work is done as part of our startup project nanoveda. For continuing nanoveda’s wonderful work, we are running a crowdfunding campaign using gofundme’s awesome platform. Donation or not, please share our crowdfunding campaign and support our cause.

Donate here: gofundme page for nanoveda.

(The image of Docker whale logo is from Docker blog)

Democracy and science – What South Sudan teaches us.

I had an incredible opportunity to participate in a Doctors Without Borders (MSF) initiative to fill in missing geographic information on satellite images. This is an incredibly important process for figuring out a huge number of operational details for aid agencies and non-profit organizations like MSF.

These details feed into resource allocation, rapid disaster relief, quick responses to public health crises like epidemic outbreaks, administration of vaccines to children and so many other important life saving efforts. Our mission for the day was to help fill in housing details for a region called Aweil in South Sudan.

I consider MSF one of the most important organizations in the world. When MSF won the Nobel Peace Prize in 1999, I was a teenager hoping to find my next mission in life, and reading about some of MSF’s incredible life saving efforts was one of the strong motivators for me to become a doctor.

But yesterday’s mission of helping fill in the missing mapping information in South Sudan made me realize another important fact: the importance of investing in transparent democracy, science and technology, even in resource poor settings.

According to 2013 World Bank data, South Sudan has a per-capita GDP of $1,044.99. The country has a steady source of revenue from exporting oil, which accounts for nearly 40% of GDP. This gives South Sudan another important distinction: the most oil dependent economy in the world.

At a time when most nations around the world are pledging to invest more in renewable energy and as the global economy is shifting away from oil, how can a young upstart like South Sudan cope with these changes? The secret is: early investments in science and technology.

Despite robust oil revenues, due to systemic inefficiencies in the South Sudanese economy, most of that money never benefits the citizens of this East African nation. Even today, because of these inefficiencies, oil revenue contributes almost nothing to economic development in South Sudan.

The financial infrastructure in South Sudan is virtually non-existent, and the military acts as a bank to distribute currency to the public. This creates a huge conflict of interest. Due to the lack of financial transparency in how the country’s revenues are handled, most of the cash distribution system fails to address the poverty and social issues that riddle South Sudanese society.

At a time when democracies around the world are racing to reinvent themselves as opaque, protectionist and self-serving institutions, this little East African nation serves as a warning beacon against such policies.

Despite all the challenges, I see hope for a small country like South Sudan. With a little bit of external help and guidance, the democratic and financial institutions can be made more efficient. The much needed access to healthcare is currently provided by brilliant organizations like MSF. But, developing local skills and training will be extremely important for South Sudanese society to flourish and be healthy.

There is an incredible opportunity for this nation to invest in good educational infrastructure. This will create more empowered citizens, a much needed resource for a fairly new country. Investment in education is a necessity for a healthy democracy.

Another key area that needs investment is the development of a technological backbone to support independent public and private financial institutions. This will create a more accountable economy and reduce financial inefficiencies.

These are very hard tasks, even for developed economies. But investing in these key basic goals will elevate South Sudan from a languishing new democracy back to the beacon of hope it was just a few years ago.

Read more about MSF activities in South Sudan. Here is a short movie explaining what a day in Aweil is like for members of MSF.

Please consider making a donation to Médecins Sans Frontières (Doctors without borders) to help MSF continue their incredible work of bringing accessible healthcare to some of the poorest societies in the world. Organizations like MSF are the epitome of hope and inspiration for free societies around the world.

Donate here: through MSF website page for donations.

(The picture of a mother with a new born taken at Aweil, Médecins Sans Frontières hospital, South Sudan, retrieved from MSF UK website, © Diana Zeyneb Alhindawi.)

Linux distros – The art of selecting one.

I have decided to migrate all of my programming environments to linux. The reason is the simplicity of running Python and R on linux. I am often befuddled by common dependency issues, which linux seems to avoid. This is especially true for Python. An added advantage is the ability to run very sophisticated deep learning tools, including NVIDIA DIGITS and Google TensorFlow. If one is serious about machine learning, embrace linux.

Right now, I am divided between two major linux distros: Ubuntu and Fedora. Ubuntu has the advantage of wider support. Fedora has Wayland, which makes Fedora 24 more secure than X-based linux distros. For now, the decision is a virtual tie. At the moment, I am experimenting with a very minimalist Ubuntu-based OS called Elementary OS.

I initially ran three TensorFlow experiments in Elementary. The OS has major issues with Anaconda and Docker. But since I don’t care much about either of those, I was very happy as long as the TensorFlow experiments performed well. The chief attraction of Elementary OS was the distraction free, minimalist UI. Being a Sherlock fan, I also found the name subjectively attractive.

After two days of experimentation with Elementary, I decided to stick with plain vanilla Ubuntu 16.10. The biggest issue for me was the lack of a stable package manager: a simple Docker installation routine broke it, and then came the errors in TensorFlow. The UI in Elementary is beautiful, and it is a very beginner friendly distro. But for advanced applications, I have decided to stick with Ubuntu.

This work is done as part of our startup project nanoveda. For continuing nanoveda’s wonderful work, we are running a crowdfunding campaign using gofundme’s awesome platform. Donation or not, please share our crowdfunding campaign and support our cause.

Donate here: gofundme page for nanoveda.

(The image of Elementary OS Loki 0.4 desktop from Elementary OS blog.)

Quantifying trust – An essential necessity for any business.

This post is an evolving set of ideas for quantifying trust in decision making systems and processes. For me, an empirical definition of trust is the relative distance of a decision from the ground truth. Quantifying and optimizing trust in decision making systems is therefore highly important. It will help systems involved in decision making perform consistently and closer to the future reality.

The first step to optimizing such systems, human or computational, will be to develop an algorithmic approach to quantify and optimize trust.

The first experiment uses a measurement of distance from the center. The idea is that, as the overall trustworthiness of a decision making system improves over time, its decisions sit at a very short distance from the mean. Patterns that delineate systems which consistently lag behind in real world prediction problems can also be easily identified.

For a starting point, I experimented with k-centroids cluster analysis, for which CRAN offers ready-made implementations; a minimal sketch of the distance-from-center idea follows.
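As a rough illustration with synthetic data, and with scikit-learn's KMeans standing in for a k-centroids routine, each decision can be scored by its distance from the center of its cluster:

```python
# Score decisions by their distance from the cluster center they belong to.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
decisions = rng.normal(size=(200, 5))        # 200 synthetic decisions, 5 features each

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(decisions)
centers = kmeans.cluster_centers_[kmeans.labels_]
distances = np.linalg.norm(decisions - centers, axis=1)

# Small distances flag decisions that track the consensus center;
# consistently large distances flag systems that lag behind.
print(distances.mean(), distances.max())
```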

Another approach to quantifying decision systems is to use log-loss. Log-loss is very interesting because of the increased penalty it assigns to systems that are very far off from the ground truth.

Here is a simple implementation of the log-loss function. But this function has a series of downsides, which I will discuss below.
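A minimal sketch in Python, assuming NumPy and binary 0/1 ground-truth labels:

```python
# Naive log-loss: average cross-entropy between 0/1 truth and predicted probabilities.
import numpy as np

def log_loss(actual, predicted):
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return -np.mean(actual * np.log(predicted)
                    + (1 - actual) * np.log(1 - predicted))

print(log_loss([1, 0, 1], [0.9, 0.2, 0.6]))   # modest penalty for decent predictions
```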

The function log(1-predicted) is the part I am wary of. What if the algorithm used for making predictions returns a value greater than 1? For most applications, simply restricting prediction values to the range between 0 and 1 will fix the issue of values greater than 1. But there are circumstances where values greater than 1 are needed as outputs of prediction problems. An excellent scenario is regression problems using machine learning.

In regression problems, there is no clean way to tell whether a probability function returning a value slightly higher than 1, when plugged into another equation, matches the real observation or not. To handle such values greater than 1, I have modified the code to include an absolute value function. This prevents the log function from returning imaginary values, which show up as NaN in most programming environments. The modified code is included below:
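A sketch of that modification, again assuming NumPy:

```python
# Log-loss with an absolute value guard, so predictions slightly above 1
# do not push log() into undefined territory (NaN).
import numpy as np

def log_loss_abs(actual, predicted):
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return -np.mean(actual * np.log(np.abs(predicted))
                    + (1 - actual) * np.log(np.abs(1 - predicted)))

print(log_loss_abs([1, 0, 1], [0.9, 0.2, 1.02]))   # no NaN despite the 1.02 prediction
```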

Quantifying trust in decision processes, especially in artificial intelligence systems, is important. I visualize AI systems as very similar to a bridge constructed across a deep ravine with a river flowing at breakneck speed below.

If someone builds a rickety rope bridge (very low trust scores), people have the intuition not to use the bridge to cross the ravine. On the other hand, when we build a strong steel suspension bridge with a service lifespan of 300 years and a load carrying capacity far higher than anything currently imaginable (very high trust scores), folks will use the bridge without ever thinking about the risks. The reason is quite simple: the statistical probability of the well engineered steel suspension bridge failing is very close to zero.

But the problem for AI systems currently is that there are no straightforward and intuitive solutions for quantifying the trustworthiness of these systems. The metrics I am trying to develop will help visualize and quantify the trustworthiness of AI systems. It is very similar to the human cognitive approach to the bridge crossing problem, but applied to AI and decision systems.

Note: This post is evolving; I will add more content over time.

This work is done as part of our startup project nanoveda. For continuing nanoveda’s wonderful work, we are running a crowdfunding campaign using gofundme’s awesome platform. Donation or not, please share our crowdfunding campaign and support our cause.

Donate here: gofundme page for nanoveda.

(The image of “Mother and Child, 1921” by Pablo Picasso, Spanish, worked in France, 1881–1973, from Art Institute of Chicago and published under fair use rights.
© 2016 Estate of Pablo Picasso / Artists Rights Society (ARS), New York, 

The image of island rope bridge, Sa Pa, Vietnam, is an edited version of a public domain photograph obtained through Google image search. )

How to think like a scientist – A five step process.

Richard Buckland, a professor of computer science, has a very elegant explanation of how to think like a scientist. Even though these steps are aimed at his computer science students, for me they have a more general meaning and purpose. Richard’s simple five step process for getting into a scientific mindset is:

  1. The moment you encounter an interesting problem, always try to solve it and never give up.
  2. While solving the problem, always ask more questions.
  3. To make the process of solving problems easier, break it into different parts.
  4. Profit from your path to solving the problem, and from the solutions you created.
  5. Look for more problems to solve.

I love this five step process because it concisely summarizes the thought process of a scientist. What we are doing at nanoveda with nanotechnology and cancer research is essentially a real world embodiment of this five step process. As a group, when we first met, we identified a highly disturbing problem: the lack of accessible, precision cancer therapy. This problem has a very personal meaning to us at nanoveda.

Since each of us had our own unique expertise, we broke the problem down into components and came up with a solution. We didn’t stop there. We asked more questions: how do we improve the process, how do we gather more data and information, how else can we solve this problem, and who else can benefit from this work? Then we started looking for more problems to solve.

To bring the benefits to a broader cross section of society, we decided to form a company and scale up our idea. Up until recently, we never formally articulated how our process worked. Then I discovered this video of Richard and his five step process. It made me realize: this is exactly what we had been doing subconsciously, all along the way.

I am a huge admirer of the course Richard teaches at the University of New South Wales in Sydney, Australia. I am posting the video below, where he describes the five steps outlined above.

This video is part of a lecture series from his course on computing: Computing 1 – The Art of Programming. The course is also available online. I highly encourage everyone to check this video out and the entire course too. It will help everyone start thinking like a scientist.

For continuing nanoveda’s wonderful work, we are running a crowdfunding campaign using gofundme’s awesome platform. Donation or not, please share our crowdfunding campaign and support our cause.

Donate here: gofundme page for nanoveda.

(The illustration of the structure of deoxyribonucleic acid (DNA) is from Wikipedia.)

If I were a time traveler – Lessons learned.

If I were a time traveler, I would travel back exactly 121 years to witness one of the most important discoveries of modern medicine: the x-ray. The photograph above on the left is the first ever recorded x-ray photograph of human anatomy, and on the right is a photo of Wilhelm Röntgen, the man who took the picture. The hand in the picture is thought to be that of Röntgen’s wife Bertha, wearing her wedding ring.

X-rays were recognized back then as an unexplained physical phenomenon. Sir William Crookes noticed an odd phenomenon of blurry photographic papers that he used to wrap the vacuum tubes he was studying. No one could explain the phenomenon well enough, despite Hermann von Helmholtz’s formulation of mathematical equations predicting the existence of x-rays without ever experimenting with them.

With the ‘Hand mit Ringen’ photograph, Röntgen demonstrated unequivocally the unique property of x-rays to travel through structures that were otherwise thought to be impenetrable. This discovery started an important era in modern medicine: the era of diagnostic medicine. The x-ray is, in many ways, one of the earliest diagnostic tools. The physics behind x-rays was difficult to understand, but a simple picture of the bones of the hand captivated the imagination of scientists as well as the general public. Immediately after the publication of the ‘Hand mit Ringen’ photograph, there was a huge uptick in the number of scientific publications dealing with these mysterious and powerful rays. Even the general public was not immune to the charm of the mysterious x-rays; among the public, x-rays were often considered a magical phenomenon bordering on the paranormal.

An important lesson I would have learned from my time travelling adventure: the importance of capturing the public’s imagination. As the old adage goes, a picture is worth a thousand words. Surprisingly, modern medicine still has a lot of work to do when it comes to communicating some of its incredible feats to the public: clearly, eloquently and without distorting the facts.

In an age where hyperbole and mythical fantasies dominate the news cycle, a simple time travelling thought experiment reveals the same pattern of events happening to us as human beings at every juncture in our history. As individuals who understand the physical world better than most, we have a responsibility to convey our ideas, messages and findings with great conviction and confidence.

Another lesson I would have learned from my journey through the pages of history is the importance of understanding physical phenomena in order to master the biological world. We often venture into experimentation and trial and error with very little grasp of the underlying phenomena. In this day and age of powerful computing, developing simulation models to test physical theories of interaction is easier than ever, and ever more important for creating sophisticated biological experiments. Simulations and mathematical models often help us solve problems faster, despite the initial sweat and elbow grease associated with them.

To create the next generation of cancer cures, I am confident that the first step is to build simulation and mathematical models of how some of these sophisticated drugs work. As part of nanoveda’s ongoing efforts to advance cancer therapy, we are using advanced simulation tools to build models that predict how some of the drugs we have designed will work.

For continuing this work, we are running a crowdfunding campaign using gofundme‘s awesome platform. Donation or not, please share our crowdfunding campaign and support our cause.

Donate here: gofundme page for nanoveda.

What is next – The future of research.

From 2010 onward, RedMonk has published a twice-yearly comparison of the popularity of programming languages relative to one another, using data from GitHub and Stack Overflow. One list is compiled for the summer and another for the spring. Among the top 15 programming languages in the spring 2016 list, only one exclusively scientific and statistical programming language is featured: R.

Within a six month period, between January 2016 and June 2016, R climbed from 13th place to its current 12th place. The only other popular scientific and statistical programming language ranked higher than R is Python. But, mind you, Python is a general purpose programming language with broader applications than R’s primary focus on scientific and statistical computation.

I see this list as a bellwether, a canary in the coal mine, a harbinger. The message is clear. Since R and Python are open-source programming languages, any startup with a heavy focus on research and development should start investing in open platforms now. Doing so provides two benefits: 1) the number of application developers, packages, platforms and programmers working with the language will be high; and 2) the relative stability over time of the names featured in the top 15 programming languages points to an easy way to future-proof the research and development side of the business.

Even if the underpinning technologies are open source, there are mechanisms to protect and commercialize the intellectual property. If there is not much competition in the field, or if resources and management strategy allow a focus on 1) the cascading effect of lowered supply chain costs due to increased competition and 2) maintaining developer momentum, then one can always choose the path of open sourcing every new piece of research and development. A great example here is the electric car manufacturer Tesla.

In conclusion, open platforms are the next big bet for research and development focused startups like nanoveda. Our strategy for technology development is based on what the technology landscape will look like over the medium term of the next 5 to 10 years. Cancer research is a very competitive field, and the monetization potential of startups in this field is measured by intellectual property. Once we reach the position of a self sustaining business, nanoveda will open source all our patents, similar to Elon Musk’s approach with Tesla.

Right now we need your support. We are running a crowdfunding campaign using gofundme‘s awesome platform. Donation or not, please share our crowdfunding campaign and support our cause.

Donate here: gofundme page for nanoveda.