CloudCIX ML API

This page describes a number of stateless ML APIs that can be called from within any CloudCIX project. These API calls are protected by an IP-address-based access list and cannot be used from outside CloudCIX projects.

An address-based API key, available from within the CloudCIX Membership App, is required to use these APIs so that usage can be billed.

Large Language Models

UCCIX-Instruct

Our flagship Irish-English bilingual chatbot model. It understands both languages and outperforms much larger models on Irish language tasks (by up to 12% compared to models 10 times its size). Additionally, the model is 50% more efficient on Irish tokens than other models, which reduces computing time and cost.

Free preview available here.

Sample Usage


import requests

url = 'https://ml.cloudcix.com/uccix_instruct/'
data = {
    'api_key': 'Put your API key here',
    'max_tokens': 100, # Optional, default value is 100
    'prompt': 'Put any text here that you want to use to prompt UCCIX-Instruct',
    'temperature': 0.0, # Optional, default is 0.0, max is 1.0
}
response = requests.post(
    url=url,
    json=data,
)

if response.status_code == 200:
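    # Decode incrementally so that multi-byte UTF-8 characters (such as Irish
    # accented vowels) split across chunk boundaries are not corrupted.
    response.encoding = 'utf-8'
    for item in response.iter_content(chunk_size=128, decode_unicode=True):
        print(item, end='')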
else:
    print('message:', response.text, 'status_code:', response.status_code)
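
The optional parameters above can be checked before a request is sent. The following is a minimal sketch, not part of the API itself: `build_llm_payload` is a hypothetical helper name, and the lower bound of 0.0 for `temperature` and the requirement that `max_tokens` be a positive integer are assumptions, since this page only states the defaults and the maximum temperature.

```python
def build_llm_payload(api_key, prompt, max_tokens=100, temperature=0.0):
    """Return the JSON body for an LLM request (hypothetical helper).

    Defaults mirror the documented ones: max_tokens=100, temperature=0.0.
    """
    if not api_key:
        raise ValueError('api_key is required')
    # Assumption: max_tokens must be a positive integer.
    if not isinstance(max_tokens, int) or max_tokens < 1:
        raise ValueError('max_tokens must be a positive integer')
    # Documented maximum is 1.0; the 0.0 lower bound is an assumption.
    if not 0.0 <= temperature <= 1.0:
        raise ValueError('temperature must be between 0.0 and 1.0')
    return {
        'api_key': api_key,
        'max_tokens': max_tokens,
        'prompt': prompt,
        'temperature': temperature,
    }


payload = build_llm_payload('Put your API key here', 'Dia duit!', temperature=0.2)
print(payload['max_tokens'], payload['temperature'])
```

The returned dictionary can be passed as `json=payload` to `requests.post`, exactly as `data` is in the sample above.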
            


GPT-4o

Most advanced model from OpenAI, advertised as useful for complex, multi-step tasks.

Sample Usage


import requests

url = 'https://ml.cloudcix.com/chatgpt4/'

prompt = [
    {'role': 'user', 'content': 'Put any text here that you want to use to prompt ChatGPT'},
]

data = {
    'api_key': 'Put your API key here',
    'max_tokens': 100, # Optional, default value is 100
    'prompt': prompt,
    'temperature': 0.0, # Optional, default is 0.0, max is 1.0
}

response = requests.post(
    url=url,
    json=data,
)

if response.status_code == 200:
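    # Chunks are arbitrary byte slices rather than words, so decode them
    # incrementally and print them without a separator.
    response.encoding = 'utf-8'
    for answer in response.iter_content(chunk_size=128, decode_unicode=True):
        print(answer, end='')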
else:
    print('message:', response.text, 'status_code:', response.status_code)

                    


Embedding Models

Encode text as a vector (sequence of numbers) that represents the meaningful concepts within the input content. These vectors can be used for a variety of tasks, such as semantic search, clustering, and classification.
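
As an illustration of the semantic search use case, the sketch below ranks documents by cosine similarity to a query vector. It uses toy 3-dimensional vectors in place of real embeddings, and the function names are hypothetical. Cosine similarity is a common choice for comparing embeddings, but this page does not state which similarity measure each model was trained for, so treat that choice as an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError('vectors must have the same length')
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat as unrelated
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, document_vectors):
    """Return (index, score) pairs, most similar document first."""
    scores = [
        (index, cosine_similarity(query_vector, vector))
        for index, vector in enumerate(document_vectors)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings.
query = [1.0, 0.0, 0.0]
documents = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [-1.0, 0.0, 0.0]]
ranking = rank_by_similarity(query, documents)
print([index for index, _ in ranking])  # prints [1, 0, 2]
```

In practice the query vector would come from a query encoder and the document vectors from the matching chunk encoder.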

CIX Paragraph Encoder (chunk/context encoder)

Our flagship Irish-English embedding model. It is the first-ever embedding model with support for the Irish language, while retaining state-of-the-art performance on English.

Sample Usage


import requests

url = 'https://ml.cloudcix.com/cix_chunk_encoder/'

prompt = 'Put any paragraph/chunk here that you want to convert to a vector'

response = requests.post(
    url=url,
    json={
        'list': [prompt],
        'api_key': 'Put your API key here',
    },
)

if response.status_code == 200:
    print(response.json())
else:
    print('message:', response.text, 'status_code:', response.status_code)
                    


CIX Sentence Encoder (query encoder)

Our flagship Irish-English embedding model. It is the first-ever embedding model with support for the Irish language, while retaining state-of-the-art performance on English.

Sample Usage


import requests

url = 'https://ml.cloudcix.com/cix_question_encoder/'

prompt = 'Put any text here that you want to convert to a Vector'

response = requests.post(
    url=url,
    json={
        'list': [prompt],
        'api_key': 'Put your API key here',
    },
)

if response.status_code == 200:
    print(response.json())
else:
    print('message:', response.text, 'status_code:', response.status_code)
                    


Google Universal Sentence Encoder (USE) 4

Embedding model from Google. Not recommended, as our flagship embedding model yields superior results in our benchmarks.

Cost: €0.005 per API call.
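
Because pricing is per API call, the number of requests drives the bill. The sketch below estimates the cost of embedding a batch of texts. `estimate_use4_cost` is a hypothetical helper, and it assumes (this page does not confirm it) that a request carrying several texts in 'list' is billed as a single call.

```python
from decimal import Decimal

COST_PER_CALL_EUR = Decimal('0.005')  # price per API call stated above


def estimate_use4_cost(num_texts, texts_per_call):
    """Return (number_of_calls, total_cost_in_euro).

    Assumption: one request with several texts in 'list' counts as one call.
    """
    if num_texts < 0 or texts_per_call < 1:
        raise ValueError('num_texts must be >= 0 and texts_per_call >= 1')
    # Ceiling division: a partially filled final request still costs a call.
    calls = (num_texts + texts_per_call - 1) // texts_per_call
    return calls, calls * COST_PER_CALL_EUR


calls, cost = estimate_use4_cost(num_texts=1000, texts_per_call=50)
print(calls, cost)  # prints 20 0.100
```

`Decimal` is used instead of `float` so that currency amounts are not affected by binary rounding.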

Sample Usage


import requests

url = 'https://ml.cloudcix.com/use4/'

prompt = 'Put any text here that you want to convert to a Use4 vector'

response = requests.post(
    url=url,
    json={
        'list': [prompt],
        'api_key': 'Put your API key here',
    },
)

if response.status_code == 200:
    print(response.json()['embeddings'][0])
else:
    print('message:', response.text, 'status_code:', response.status_code)
                


Dragon+ Sentence Encoder (query encoder)

Smaller size than our flagship model. Good for efficiency.

Sample Usage


import requests

url = 'https://ml.cloudcix.com/dragonplus_question/'

prompt = 'Put any text here that you want to convert to a Dragon+ vector'

response = requests.post(
    url=url,
    json={
        'list': [prompt],
        'api_key': 'Put your API key here',
    },
)

if response.status_code == 200:
    print(response.json())
else:
    print('message:', response.text, 'status_code:', response.status_code)
                


Dragon+ Paragraph Encoder (chunk/context encoder)

Smaller size than our flagship model. Good for efficiency.

Sample Usage


import requests

url = 'https://ml.cloudcix.com/dragonplus_vector/'

prompt = 'Put any paragraph/chunk here that you want to convert to a Dragon+ vector'

response = requests.post(
    url=url,
    json={
        'list': [prompt],
        'api_key': 'Put your API key here',
    },
)

if response.status_code == 200:
    print(response.json())
else:
    print('message:', response.text, 'status_code:', response.status_code)
                


Scraping

This scraping API allows you to collect data from any public website. At the moment, it supports scraping webpages and PDFs (note that a PDF file must be available through a public web URL).

Input:

  • list: a list of URLs to PDF documents or webpages.
  • document_type: one of 'html', 'pdf', or 'pdf_hi_res'.

Output:

A list of dictionaries with the fields:

  • 'source': the URL of the document.
  • 'error': an error message if the document could not be loaded. Optional.
  • 'page_content': a list of dictionaries with the fields:
    • 'page_number': the page number of the current page for pdf, or 0 for webpages.
    • 'text': the text content of the current page.
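
The output structure above can be flattened into one record per page, with failed documents collected separately. This is a minimal sketch with a hypothetical helper name and a hand-written sample in the documented shape (not real API output). It assumes that a document carrying an 'error' field has no usable 'page_content'.

```python
def split_scraping_results(results):
    """Separate scraped pages from failed documents.

    Returns (pages, errors): pages is a list of (source, page_number, text)
    tuples and errors maps each failed source to its error message.
    """
    pages, errors = [], {}
    for document in results:
        source = document['source']
        if document.get('error'):
            # Assumption: a document with an error has no usable pages.
            errors[source] = document['error']
            continue
        for page in document.get('page_content', []):
            pages.append((source, page['page_number'], page['text']))
    return pages, errors


# Hand-written sample in the documented shape (not real API output).
sample = [
    {
        'source': 'https://example.com/',
        'page_content': [{'page_number': 0, 'text': 'Welcome'}],
    },
    {'source': 'https://example.com/missing', 'error': 'Could not load document'},
]
pages, errors = split_scraping_results(sample)
print(pages)
print(errors)
```

With a real response, `results` would be the value returned by `response.json()`.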

HTML Scraping

For HTML scraping, we also support filtering out unwanted tags and classes in the HTML through the 'exclusions' parameter.

Sample Usage


import requests

url = 'https://ml.cloudcix.com/scraping/'

websites_to_scrape = ['https://docs.cloudcix.com']
# Tags and CSS classes to strip from the page before the text is extracted
exclusions = {
    'exclusion_tags': ['script', 'style'],
    'exclusion_classes': ['footer', 'header'],
}
document_type = 'html'

response = requests.post(
    url=url,
    json={
        'list': websites_to_scrape,
        'exclusions': exclusions,
        'document_type': document_type,
        'api_key': 'Put your API key here',
    },
)

if response.status_code == 200:
    print(response.json())
else:
    print('message:', response.text, 'status_code:', response.status_code)
                        


Basic PDF Scraping

Sample Usage


import requests

url = 'https://ml.cloudcix.com/scraping/'

websites_to_scrape = ['https://arxiv.org/pdf/2405.13010.pdf']
document_type = 'pdf'

response = requests.post(
    url=url,
    json={
        'list': websites_to_scrape,
        'document_type': document_type,
        'api_key': 'Put your API key here',
    },
)

if response.status_code == 200:
    print(response.json())
else:
    print('message:', response.text, 'status_code:', response.status_code)
                        


High Resolution PDF Scraping

High-resolution PDF scraping uses a more advanced algorithm that preserves the layout and structure of complex documents. Unlike basic PDF scraping, this approach supports multi-column formats, tables, and images by generating HTML with <tr>, <td>, and other relevant tags for structural fidelity. Although image data is detected, it is excluded from the final output so that processing focuses on text and tabular content.

Sample Usage


import requests

url = 'https://ml.cloudcix.com/scraping/'

websites_to_scrape = ['https://arxiv.org/pdf/2405.13010.pdf']
document_type = 'pdf_hi_res'

response = requests.post(
    url=url,
    json={
        'list': websites_to_scrape,
        'document_type': document_type,
        'api_key': 'Put your API key here',
    },
)

if response.status_code == 200:
    print(response.json())
else:
    print('message:', response.text, 'status_code:', response.status_code)
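
Since the high-resolution output marks tables up with <tr> and <td> tags, the rows can be recovered with Python's standard `html.parser` module. The sketch below is illustrative only: the class and function names are hypothetical, the sample HTML is hand-written, and it assumes the HTML arrives in the 'text' field of each page. Nested tables are not handled.

```python
from html.parser import HTMLParser


class TableRowExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def _close_cell(self):
        if self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None

    def _close_row(self):
        if self._row is not None:
            self._close_cell()
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._close_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._close_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th'):
            self._close_cell()
        elif tag == 'tr':
            self._close_row()


def extract_table_rows(html_text):
    """Return a list of rows, each a list of cell strings."""
    parser = TableRowExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.rows


sample_html = (
    '<p>Results</p>'
    '<table><tr><th>Model</th><th>Score</th></tr>'
    '<tr><td>A</td><td>0.71</td></tr></table>'
)
print(extract_table_rows(sample_html))  # prints [['Model', 'Score'], ['A', '0.71']]
```

Text outside table cells, such as the paragraph in the sample, is ignored by this extractor.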
                        


Parsing

This parsing API allows you to parse files and convert their contents into a machine-readable format (HTML).

Input:

  • data: a JSON-encoded dictionary with 2 keys:
    • api_key: your CloudCIX API key.
    • filenames: a list of the names of the files that you are sending to the API.
  • files: a list of the files, opened as binary file objects.

Output:

A list of dictionaries with the fields:

  • 'source': the file name of the document.
  • 'error': an error message if the document could not be loaded. Optional.
  • 'page_content': a list of dictionaries with the fields:
    • 'page_number': the page number of the current page for pdf, or None for other files.
    • 'text': the text content of the current page.

Supported File Types

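Sample Usage


import json
from contextlib import ExitStack

import requests

url = 'https://ml.cloudcix.com/parsing/'
api_key = 'API_KEY_HERE'
filenames = ['test.html', '2405.13010v1.pdf']

# The form field 'data' carries the API key and file names as a JSON string
data = {
    'data': json.dumps({'api_key': api_key, 'filenames': filenames}),
}

# Open each file in binary mode; ExitStack closes them all after the request
with ExitStack() as stack:
    files = [('file', stack.enter_context(open(name, 'rb'))) for name in filenames]
    response = requests.post(url, data=data, files=files)

print(response.status_code)
print(response.text)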