Doing a 10-minute task in 2 hours with ChatGPT

We’ve all seen plenty of articles where AI tools do in minutes a job that could easily take a day. Particularly impressive are the examples where the work (successfully) goes beyond human competence, that is, where AI lets you do something a person simply could not have done alone. But today I had a slightly different case:

The task was fairly trivial: generate a look-up table (LUT) for a per-channel contrast increase (an S-curve) and apply that curve to a photograph from a microscope’s USB camera. We ask ChatGPT (even 3.5 should cope, there is only one formula involved), in 10 seconds we get Python code with a typical S-curve (ChatGPT even helpfully added plotting via matplotlib), we tune the “curvature” parameter of the curve to taste, and the result can go straight into the final program. Out of the many S-curves/sigmoids, ChatGPT chose the logistic function, one of the simplest. It does have a drawback: its “curvature” parameter cannot be turned from increasing contrast into decreasing it (or vice versa). I didn’t particularly need that, though; in the target application the contrast never has to be decreased (at worst you leave everything as it is, and the logistic function can do that).
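
For illustration, here is my own minimal sketch (not the generated code, which is quoted below) of how the steepness parameter of such a logistic LUT behaves:

import numpy as np

# Logistic S-curve over the 0..255 input range, stretched back to 0..255.
def logistic_curve(a, b=127.5):
    x = np.arange(256, dtype=np.float64)
    y = 1.0 / (1.0 + np.exp(-a * (x - b) / 255.0))
    return (y - y.min()) / (y.max() - y.min()) * 255.0

# a = 10   -> pronounced S shape (more contrast)
# a = 0.1  -> almost the identity line (image left nearly unchanged)
# a = -10  -> a mirrored, decreasing curve: the image gets inverted rather
#             than softened, so contrast reduction is not reachable this way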

But it turned out to be too early to celebrate: the generated code stubbornly drew a curve that reduced the contrast instead of increasing it. When I pointed this out to ChatGPT, it apologized and produced ever more broken code with each attempt. It turned out that simply changing the curve’s parameter would not do: the formula is not universal enough. Well, it looked like ChatGPT was not going to cut it here, but that’s not the end of the world. After all, the limits of where LLMs apply are well known, and the narrower the domain, the easier it is to step outside them. But the story does not end there.

Graphs of variants of S-curves for increasing contrast

For reference, here is the original (best) version from ChatGPT (you can practice spotting the errors yourself):

import numpy as np
import matplotlib.pyplot as plt

def create_s_curve_lut():
    # Define parameters for the sigmoid curve
    a = 10.0  # Adjust this parameter to control the curve's shape
    b = 127.5  # Midpoint of the curve (127.5 for 8-bit grayscale)

    # Create the S-curve LUT using the sigmoid function
    lut = np.arange(256)
    lut = 255 / (1 + np.exp(-a * (lut - b) / 255))

    # Normalize the LUT to the 0-255 range
    lut = (lut - np.min(lut)) / (np.max(lut) - np.min(lut)) * 255

    return lut.astype(np.uint8)

# Create the S-curve LUT
s_curve_lut = create_s_curve_lut()

# Plot the S-curve for visualization
plt.plot(s_curve_lut, range(256))
plt.xlabel("Input Values (0-255)")
plt.ylabel("Output Values (0-255)")
plt.title("S-curve Contrast Enhancement LUT")
plt.show()

# You can access the S-curve LUT with s_curve_lut

Unable to get the curve shape I wanted, I gave up on the ChatGPT LUT and rewrote it (if a couple of lines found via Google deserve that word) using the regularized incomplete beta function. I tweaked the a=b parameters until the plot showed a curve to my liking (similar to what I usually set by hand in a graphics editor) and finally applied the LUT to a test image using OpenCV. To my complete surprise, the function reduced the contrast rather than increasing it. How so?
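
Roughly, the rework looked like the sketch below. The parameter value and file names are hypothetical, and scipy.special.betainc is used for the regularized incomplete beta function:

import numpy as np
import cv2
from scipy.special import betainc

def beta_s_curve_lut(a=2.5):
    # Regularized incomplete beta function I_x(a, a) as an S-curve:
    # a > 1 increases contrast, a = 1 is the identity, a < 1 decreases it.
    x = np.linspace(0.0, 1.0, 256)
    y = betainc(a, a, x)                       # already spans exactly 0..1
    return np.round(y * 255.0).astype(np.uint8)

lut = beta_s_curve_lut(a=2.5)                  # tune the parameter to taste
img = cv2.imread("microscope_frame.png")       # hypothetical file name
out = cv2.LUT(img, lut)                        # the same LUT for every channel
cv2.imwrite("microscope_frame_contrast.png", out)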

Staring at the code did not reveal the error quickly. I wrote a test piecewise linear LUT for increasing contrast, and on the image it gave the expected result. Only after adding the piecewise linear LUT to the plot did the root of the problem become visible: when I threw out ChatGPT’s S-curve LUT generation function, I kept its plotting code. There ChatGPT had painstakingly written out the plot title and the axis labels, but quietly passed the X-axis data as Y and vice versa. Since plt.plot takes its arguments positionally, without names, it is very easy to miss this mistake unless you use (and remember) the parameter order all the time.
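
Such a test LUT is a few lines with np.interp; the knee points below are made up, just enough to steepen the midtones:

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(256)
# Knee points: flatten the shadows/highlights, steepen the midtones
# (the values here are arbitrary, just for the test).
linear_lut = np.round(np.interp(x, [0, 64, 192, 255],
                                   [0, 32, 224, 255])).astype(np.uint8)

# Plot it next to the S-curve (note the argument order: x first, then y).
plt.plot(x, linear_lut)
plt.xlabel("Input Values (0-255)")
plt.ylabel("Output Values (0-255)")
plt.title("Piecewise linear test LUT")
plt.show()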

So while I was adjusting the curve’s shape factor, I was adjusting it on an inverted plot, and I actually tuned it to decrease the contrast. When I accused ChatGPT of giving me the wrong formula, it agreed with me and frantically tried to fix something; it never had a chance, because the formula was correct and the error was in how the plot was drawn. Of course, if you point out to ChatGPT that the error is in the plotting code, it happily agrees and fixes it (but I can do that myself). This is a booby trap on the level of sneaking #define true false onto a public computer.

It reminds me of my analytic geometry professor, who had a final boss move at the last exam: while you walked him through a proof, he could suddenly disagree with one of the steps and claim it was wrong, and to get the top grade you had to defend your solution without panicking. I hope that someday language models will learn to disagree more often with a user who is not always right.

The code with the error:
plt.plot(s_curve_lut, range(256)) #It's a TRAP!
plt.plot(range(256), s_curve_lut) #This is correct

But that was not the end either: looking at the plot, you can notice a slight asymmetry of the GPT-TRAP curve near 255. Because the fractional part is simply discarded, the values come out a fraction of a unit (about 0.25%) lower than they should, and pixels with full brightness (255) become much rarer. What makes this bug interesting is that it shows up in code generated by ChatGPT, Bing Copilot, and Google Bard alike. Apparently such code is very common in the training data, and all the models decided that “multiply by 255 and cast to uint8” counts as success. Technically we do land in the right range, but the result is not perfect.

The code with the error:
#Rounding error, fractional part is discarded
lut = (lut - np.min(lut)) / (np.max(lut) - np.min(lut)) * 255 

#This is correct
lut = np.round((lut - np.min(lut)) / (np.max(lut) - np.min(lut)) * 255) 
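
A quick sanity check of my own (not from the original code) makes the effect visible: with the same parameters ChatGPT used, count how many LUT entries end up at full brightness with and without rounding.

import numpy as np

a, b = 10.0, 127.5
x = np.arange(256)
raw = 255 / (1 + np.exp(-a * (x - b) / 255))
norm = (raw - raw.min()) / (raw.max() - raw.min()) * 255

truncated = norm.astype(np.uint8)            # what the generated code does
rounded = np.round(norm).astype(np.uint8)    # with the fix

print((truncated == 255).sum())   # only the very top entry reaches 255
print((rounded == 255).sum())     # several entries reach full brightness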

Conclusions:

  • Language models are like junior developers: they can and will make unexpected mistakes, and they need clear instructions and guidance. The difference is that juniors grow, while with models you have to wait for the next generation. As with juniors, keep your expectations realistic.

  • All code from language models has to be checked; no one can be trusted. And the less well-trodden the path, the more carefully it should be checked. Language models generate code that looks very much like the truth, which means the errors can cost you dearly.

  • If the result does not work, it is often easier to ask several different models and compare their answers: besides ChatGPT (3.5/4) there are now Copilot, Bard, Replit and others. None of them produced perfect code on the first try, but comparing them would at least have spared me the wrestling with the plot.

  • Some errors may be systematic across many language models, apparently inherited from partially overlapping training data where those errors were widely represented (for now, language models trust their training examples unconditionally, unlike humans). In other words, models cannot yet outperform their training data. It is unclear how much work is still needed to “outdo the teacher”; this may turn out to be the case where the last 10% of the work takes 90% of the time. But progress has been rapid over the last 3 years, and surprises are possible.
